Skip to content

An XML schema compiler written in C++, which generates Python code for marshalling and unmarshalling objects

License

Notifications You must be signed in to change notification settings

Tjoppen/pyjames

Repository files navigation

pyjames - A simple XML schema compiler for Python, written in C++
This is a fork of james (by Tomas Härdin), while the fork itself was initially
created by Ludvig Widman

Purpose/goals
=============

The purpose of this program is to transform a subset of XSL into Python code for marshalling and unmarshalling documents conforming to one or more schemas.
The generated code should not be needlessly complicated (no getters or setters).
The number of dependencies should be kept to a minimum.

Compilation
===========

pyjames makes use of CMake.
This allows project files and Makefiles to be generated for a variety of system without having to maintain said files.
The program is known to compile in *NIX like system (GNU/Linux, Mac OS X) via Makefiles and Windows via Visual Studio 2008.

Dependencies
------------

pyjames depends on Xerces-C++ 2.8.x or 3.1.x.

Building
--------

The program is built like a typical CMake project.
A normal build and install will look something like this (output omitted):

 ~/pyjames$ mkdir build
 ~/pyjames$ cd build
 ~/pyjames/build$ cmake ..
 ~/pyjames/build$ make
 ~/pyjames/build$ sudo make install

A short guide to usage
======================

Running the program without arguments produces the following usage information:

 USAGE: pyjames [-v] [--dry-run] output-dir list-of-XSL-documents
  -v         Verbose mode
  --dry-run  Perform generation but don't write anything to disk - instead does exit(1) if any file changes

 Generates Python classes for marshalling and unmarshalling XML to Python objects according to the given schemas.
 Files are output in the specified output directory and are named type.py

The --dry-run switch can be used to test whether running pyjames would change any files on disk.
This includes modifying existing files or creating new ones, but not no longer generating files for types that no longer exist.

Generation example
------------------

For a small project with a single schema, generating Python code for handling said schema looks like this:

 ~/foo$ mkdir generated
 ~/foo$ pyjames generated schema.xsd

To make generated into a proper package you must then create the empty file generated/__init__.py and copy pyjames/py/JamesXMLObject.py in there.
Note that outputing the files in the current directory is not recommended since that makes cleaning up difficult.

Using the generated code
========================

Here is a small example schema and code that makes use of the generated classes:

example.xsd
-----------

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://example.com/" elementFormDefault="qualified" xmlns="http://example.com/">
  <xs:element name="PersonDocument" type="tns:PersonType"/>
  <xs:element name="PersonListDocument">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="tns:PersonType" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="PersonType">
    <xs:attribute name="name"      type="xs:string" use="required"/>
    <xs:attribute name="address"   type="xs:string" use="required"/>
    <xs:attribute name="birthYear" type="xs:int"    use="required"/>
  </xs:complexType>
</xs:schema>

person.xml
----------
<?xml version="1.0" encoding="UTF-8"?>
<PersonDocument name="Foo Oldperson" address="Bar Street 123" birthYear="1900"/>

example.py
-----------

import sys
from xml.dom.minidom import *
from generated.PersonDocument import *
from generated.PersonListDocument import *

def main():
    #unmarshal personIn from stdin
    personIn = PersonDocument.fromxml(sys.stdin)

    #create a list containing personIn and some other person
    list = PersonListDocument()
    list.person.append(personIn)
    list.person.append(PersonType("Some Otherguy", "Somewhere 999", 1985))

    #finally, marshal the list to stdout
    print list.toxml()

if __name__ == "__main__":
    main()

Generating a package for handling the schema
--------------------------------------------

The following creates a self-contained package for handling the schema.
This is a part of pyjames that will probably be improved in the near future.

~/example$ mkdir generated
~/example$ pyjames generated example.xsd
A generated/PersonDocument.py
A generated/PersonListDocument.py
A generated/PersonListDocumentType.py
A generated/PersonType.py
~/example$ touch generated/__init__.py
~/example$ cp ~/pyjames/py/JamesXMLObject.py generated

Running
-------

$ python example.py < person.xml | xmllint --format -
<?xml version="1.0" encoding="UTF-8"?>
<PersonListDocument xmlns="http://example.com/">
  <person address="Bar Street 123" birthYear="1900" name="Foo Oldperson"/>
  <person address="Somewhere 999" birthYear="1985" name="Some Otherguy"/>
</PersonListDocument>

Limitations
===========

Some schemas, like that of XHTML or XSL itself, will never be supported by pyjames.
The intent is to only support schemas that can be compiled into classes that consist of plain C++ objects.
Any schema that allows arbitrary order of elements in any type is explicitly not supported.

TODO
====

* Support more XSL features (within reason)
* Make switching between Xerces-C++ 2.8.x and 3.1.x easier
* Create output directory if it doesn't exist
* Create proper self-contained packages
 - touch __init__.py
 - output JamesXMLObject.py

About

An XML schema compiler written in C++, which generates Python code for marshalling and unmarshalling objects

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published