''Bringing some sanity to database maintenance.''
XML to DDL is a set of python programs to convert an XML representation of a database into a set of SQL (or DDL: Data Definition Language) statements.
In addition XML to DDL can examine the difference between two XML files and output a sequence of ALTER statements that will update the database to conform to the new schema.
If you install the required python package 1 you can download the XML schema from the database directly (in the SVN repository).
Finallly, XML to DDL can generate HTML documentation of your schema.
You can find more information and download files at the Berlios page
And you can find documentation through the many test cases.
You can add comments or help in the wiki pages.
I you like you can read this document in PDF form.
This code was written with Agile Data Modeling techniques in mind see also Ten Steps to a Robust DB, Agile Data Modeling.
[1] | For PostgreSQL you need psycopg, For MySQL you need MySQLdb and for Firebird you need kinterbasdb. |
The following is a simple schema XML definition of a database:
<schema> <table name="students" fullname="List of Students" desc="List of students with their full names"> <columns> <column name="id" fullname="Primary Key" type="integer" key="1" desc="Primary key for the table"/> <column name="student_name" fullname="Student Name" type="varchar" size="80" desc="The full name of the student"/> </columns> </table> </schema>
Here we run the program indicating output for PostgreSQL:
python xml2ddl.py --dbms postgres schema1.xml
We get the following output:
DROP TABLE students; CREATE TABLE students ( id integer, student_name varchar(80), CONSTRAINT pk_students PRIMARY KEY (id)); COMMENT ON TABLE students IS 'List of students with their full names'; COMMENT ON COLUMN students.id IS 'Primary key for the table'; COMMENT ON COLUMN students.student_name IS 'The full name of the student';
If we run the program again for Firebird:
python xml2ddl.py --dbms firebird schema1.xml
we'll get different output:
DROP TABLE students; CREATE TABLE students ( id integer, student_name varchar(80), CONSTRAINT pk_students PRIMARY KEY (id)); UPDATE RDB$RELATIONS SET RDB$DESCRIPTION = 'List of students with their full names' WHERE RDB$RELATION_NAME = upper('students'); UPDATE RDB$RELATION_FIELDS SET RDB$DESCRIPTION = 'Primary key for the table' WHERE RDB$RELATION_NAME = upper('students') AND RDB$FIELD_NAME = upper('id'); UPDATE RDB$RELATION_FIELDS SET RDB$DESCRIPTION = 'The full name of the student' WHERE RDB$RELATION_NAME = upper('students') AND RDB$FIELD_NAME = upper('student_name');
The example shows a feature of XML to DDL, database independence. Currently the program supports the Firebird, PostgreSQL, and MySQL databases, but more will probably become available.
Another key feature is the ability to examine the changes done to the XML and generate the DDL statements necessary to perform the changes to the database. If this is a new XML schema (schema2.xml):
<schema> <table name="students" fullname="List of Students" desc="List of students"> <columns> <column name="id" fullname="Primary Key" type="integer" key="1" desc="Primary key for the table"/> <column name="student_name" fullname="Student Name" type="varchar" size="100" desc="The full name of the student"/> <column name="email" fullname="Electronic mail address" type="varchar" size="100" desc="The primary email for the student"/> </columns> </table> </schema>
Running this program:
python diffxml2ddl.py --dbms postgres schema1.xml schema2.xml
Produces the following DDL output:
ALTER TABLE students ALTER student_name TYPE varchar(80); ALTER TABLE students DROP email; COMMENT ON TABLE students IS 'List of students with their full names';
However, an older version of PostgreSQL doesn't support altering the column type:
python diffxml2ddl.py --dbms postgres7 schema1.xml schema2.xml
The a temporary column needs to be created, the data copied over and the old column dropped:
ALTER TABLE students ADD tmp_student_name varchar(80); UPDATE students SET tmp_student_name = student_name; ALTER TABLE students DROP student_name; ALTER TABLE students RENAME tmp_student_name TO student_name; ALTER TABLE students DROP email; COMMENT ON TABLE students IS 'List of students with their full names';
You can find a complete list of examples here: http://xml2ddl.berlios.de/testdetails.html
If you find yourself repeating the same attributes in your XML schema over and over you can put these in a dictionary:
<dictionary name="column"> <dict class="key" name="id" fullname="Primary Key" type="integer" null="no" key="1" desc="Primary key for the table" /> </dictionary>
In this example we are telling the parser that the dictionary is for the nodes called column and when it sees the class key, it should put in the the other attributes listed. So using this dictionary this would be equivalent::
... <columns> <column class="key"/> </columns> ...
as:
... <columns> <column name="id" fullname="Primary Key" type="integer" null="no" key="1" desc="Primary key for the table"/> </columns> ...
In addition you can override any attributes in the dictionary, for example this:
... <columns> <column class="key" name="student_id"/> </columns> ...
would then be equivalent to:
... <columns> <column name="student_id" fullname="Primary Key" type="integer" null="no" key="1" desc="Primary key for the table"/> </columns> ...
The dictionaries can also support multiple inheritance through the inherits attribute. Here's a rather contrived example:
<dictionary name="column"> <dict class="index" type="integer" null="no"/> <dict class="pk key="1"/> <dict class="key" inherits="index,pk" name="id" fullname="Primary Key" desc="Primary key for the table"/> </dictionary>
You can download the XML schema directly from the database. Requires a connection that conforms to the Database API. Without changing the code you can use the following connectivities:
Database | Connectivity |
---|---|
PostgreSQL | psycopg |
MySQL | MySQLdb |
Firebird | kinterbasdb |
Here's how to use the command:
python downloadXml.py --dbms <dbms> --database <database> --user <user> --pass <pass> > <filename.xml>
dbms can be one of postgres, mysql, or firebird. Defaults to "postgres".
database the name of the database, defaults to "postgres"
user the user name to connect to the database, defaults to "postgres"
pass the password to user, defaults to "postgres"
filename.xml by default it sends the XML to the console (stdout) you can pipe the output to a file as shown above.
Some of the attributes in the XML are used solely for documentation purposes. For example, fullname has no equivalent in most DBMSs. Another, it deprecated which indicates that a column or table should no longer be used, but hasn't been deleted yet.
Here's how to output the HTML document:
python xml2html.py --file schema.html schema.xml
The following is a list of the tags and attributes that xml2ddl accepts or is planned to be accepted in the future. The attributes enclosed in [square brackets] are optional. Also there are lot of thing not supported yet, and are so indicated. Basically, all the tags below except for <schema> is optional. Note, as all XML files the tag names and attribute name (eg. <table>) is case sensitive (i.e. <Table> will not work!). Attribute, values are case insensitive, (eg. dotschema="Yes" and dotschema="yes" should both work).
<schema> <include/> ... <dictionary> <dict/> ... </dictionary> ... <table> <columns> <column/> ... </columns> <indexes> <index/> ... </indexes> <relations> <relation/> ... </relations> <contraints> <contraint/> ... </contraints> <triggers> <trigger> .. </trigger> ... </triggers> </table> ... <view> -- view contents </view> ... <function> -- function contents. </function> ... </schema>
Here are the details of each of the XML tags.
<schema [name="1"] [dotshema="2"] [generated="3"]>
Not all databases have schemas, but you still need the tag.
<include href="1"/>
You can use includes to break a large XML schema into smaller pieces.
<dictionary name="1"> <dict class="2" 3="4"/> </dictionary>
The dictionary is a general system for adding attributes.
<table name="1" [oldname="2" [fullname="3"] [desc="4"] [inherits="5"]>
Create a table definition. The order may be important since xml2ddl isn't too careful about creating contraints before the table exists.
<columns> <column name="1" [oldname="2"] [fullname="3"] [desc="4"] type="5" [size="6"] [precision="7"] [null="8"] [unique="9"] [key="10"] [default="11"] [autoincrement="12"] [deprecated="13"]/> </column> </columns>
The <columns> tag gives an order list of attributes. Currently, xml2ddl doesn't reorder the columns if you move things around.
<colums> <column ....> <enumeration [name="1"] [fullname="2"] [desc="3"] [constraint="4"]> <enum val="5" [display="6"] [desc="7"]/> ... </enumeration> </column> </columns>
- Not supported Enumerations is a limited list of values that a column can contain. One purpose of enumerations is to aid in coding, to automaticaly create an enum in code, forcing the developer to use one of the enumerated types.
<relations> <relation [name="1"] [oldname="2"] column="3" table="4" [fk="5"] [ondelete="6"] [onupdate="7"]/> </relations>
Relations is an unordered list of foreign key contraints to other tables and columns. For DBMS that don't support this, the relations would be used only for documentation purposes.
<indexes> <index [name="1"] [oldname="2"] columns="3" [unique="4"] [using="5"] [where="6"]/> </indexes>
Index are an unorder list of indexes on a table (i.e. the order of the <index/> tags does not matter).
<constraints> <constraint [name="1"] [oldname="2"] [longname="3"] [desc="4"] columns="5" [unique="6"] [check="7"]/> </constraints>
The <constraints> tag lists an unorder list of contraint rules, if the database supports it. - Not supported
<triggers> <trigger name="1" [oldname="2"] [longname="3"] [desc="4"] timing="5" events="6" [fire="7"] [function="8"]> (9) </trigger> </triggers>
- Not supported The <triggers> tag lists an unorder list of triggers for the table, if the database supports it.
<dataset [only="1"]> <val 2="3"/> </dataset>
A dataset is a set of data that should be in the table. Useful, when you need to store a small set of values in the table.
<view name="1" [fullname="2"] [oldname="3"] [desc="4"] [columns="5"]> (6) </view>
Create a view to the table. (new)
<function name="1" [oldname="2"] [fullname="3"] [desc="4"] [arguments="5"] [returns="6"] [language="7"] [dbms="8"] [volatile="9"]> (10) </function>
You can specify the body of a stored procedure or function.
Storing the schema in this form has some advantages:
Here are the major directions I see XML to DDL going:
I've been pointed to another project which looks similar calle ERW A quick look shows that it tries to work at a higher level than my XML does (i.e. more abstract). It also generates code for PHP and produces nicer documentation.
Written in the Perl programming language is SQL Fairy. It seems to do a lot of what I'm doing but is even larger in scope.