WIP: First attempt for DITA -> DocBook with Python #23

tomschr · 2017-12-20T20:51:58Z

After thinking about implementing the extension functions in pyproc.py, I realized this has to be done at a much deeper level. Therefore I tried to "rewrite" the shell script in Python as ditatodocbook.py. Hope nobody will kill me.

Sorry, but I can't stand the shell syntax while doing complex XML juggling at the same time. For such a complex task (and it will get worse), shell is not the best tool. I think it limits us a lot.

I see the following benefits with the Python approach:

* More readable than this shell syntax gibberish
* Easier to extend; it's Python
* Easier to deal with string operations, list operations, etc.
* Logging with the -v option
* A configuration file in INI format
* A DITA summary of all conrefs, keywords, etc.
* Uses the lxml library
* No global variables 😉
* Reuses code from pyproc.py (used as a "library")
* No dependencies other than lxml
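To illustrate the `-v` logging and the INI configuration mentioned in the list above, here is a minimal sketch using only the standard library. It is not the code from ditatodocbook.py: the repeatable `-v` counter, the `--config` option, and the `[paths]` section with its `tmpdir` key are assumptions made for this example.

```python
import argparse
import configparser
import logging

# Assumption: each additional -v raises the verbosity by one step.
LOG_LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG]


def parse_cli(argv=None):
    """Parse the command line; option names other than -v and --tmpdir are hypothetical."""
    parser = argparse.ArgumentParser(description="DITA to DocBook conversion (sketch)")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase verbosity (can be repeated)")
    parser.add_argument("--tmpdir", default=None,
                        help="use your own temp directory")
    parser.add_argument("--config", default="conversion.cfg",
                        help="INI-style configuration file")
    return parser.parse_args(argv)


def log_level(verbosity):
    """Map the -v count to a logging level, capping at DEBUG."""
    return LOG_LEVELS[min(verbosity, len(LOG_LEVELS) - 1)]


def load_config(text):
    """Read INI-formatted text into a ConfigParser object."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config


if __name__ == "__main__":
    args = parse_cli()
    logging.basicConfig(level=log_level(args.verbose))
    logging.getLogger(__name__).info("Using config file %s", args.config)
```

A call like `parse_cli(["-vv"])` would then select `logging.DEBUG` through `log_level()`.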

Currently, the script almost does its job. The design is mostly based on the shell script, but moves things into functions to make it easier to call them separately.

However, there are still some things to do:

* The script creates a ditasummary.xml file in the temp directory. It is an excerpt of all the important properties we need to deal with (keywords, conrefs, ...) in a flat structure. Maybe we can load this file into our stylesheets to make the transformation a bit easier. Not sure if this is feasible...
* Implement convert2db to really do the transformation from DITA to DocBook. Currently, this is done by the saxon9 script. For some reason, xsltproc (or lxml) creates empty DocBook files.
* Implement the copyimages() and collect_linkends() functions
* Add some Python extension functions to the XSLT stylesheets to simplify some tasks. Making these feasible was the main reason for the rewrite.
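For the last item, a rough sketch of how a Python extension function can be exposed to an XSLT 1.0 stylesheet through lxml's `FunctionNamespace`. The namespace URI, the `py` prefix, the `slug` function, and the tiny stylesheet are all made up for this example; none of them exist in the repository.

```python
from lxml import etree

# Hypothetical namespace URI, chosen for this sketch only.
EXT_NS = "urn:x-example:pyext"

ns = etree.FunctionNamespace(EXT_NS)
ns.prefix = "py"


def slug(context, value):
    """Turn a string into a lowercase, dash-separated ID."""
    return "-".join(str(value).lower().split())


# Register the Python function under the name used in the stylesheet.
ns["slug"] = slug

# Minimal XSLT 1.0 stylesheet that calls py:slug() in an attribute value template.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:py="urn:x-example:pyext"
    exclude-result-prefixes="py">
  <xsl:template match="/topic">
    <section id="{py:slug(string(title))}">
      <title><xsl:value-of select="title"/></title>
    </section>
  </xsl:template>
</xsl:stylesheet>"""


def transform(xml_bytes):
    """Apply the sketch stylesheet to a small DITA-like document."""
    transformer = etree.XSLT(etree.XML(STYLESHEET))
    return transformer(etree.XML(xml_bytes))


if __name__ == "__main__":
    result = transform(b"<topic><title>Hello  World</title></topic>")
    print(str(result))
```

This only works when the stylesheet is run through lxml; saxon9 and xsltproc would not know the `py:` functions.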

This DITA summary file (by default: $TMPDIR/ditasummary.xml) contains all essential information about the whole directory, for example the xml:base of the directory, the href of the current DITA file, the list of keywords, and the list of conrefs. The structure is flat: each DITA file gets a <ditafile> element with all the necessary information.
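A flat summary of this kind could be built roughly as sketched below. Only the `<ditafile>` element, `xml:base`, and `href` come from the description; the root element name `ditasummary` and the child elements `keyword` and `conref` are guesses for illustration. The sketch uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_summary(basedir, files):
    """Build a flat summary tree.

    `files` maps each DITA file's href to a dict with optional
    "keywords" and "conrefs" lists (a hypothetical input shape).
    """
    root = ET.Element("ditasummary")
    root.set(f"{{{XML_NS}}}base", basedir)
    for href, info in files.items():
        # One <ditafile> per DITA file, no nesting between files.
        entry = ET.SubElement(root, "ditafile", href=href)
        for keyword in info.get("keywords", []):
            ET.SubElement(entry, "keyword").text = keyword
        for conref in info.get("conrefs", []):
            ET.SubElement(entry, "conref", href=conref)
    return root


if __name__ == "__main__":
    summary = build_summary(
        "/tmp/dita/",
        {"a.dita": {"keywords": ["install"], "conrefs": ["b.dita#b/note"]},
         "b.dita": {}},
    )
    print(ET.tostring(summary, encoding="unicode"))
```

Because the structure is flat, a stylesheet could look up one file's data with a single predicate on `ditafile/@href`.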

tomschr added 16 commits December 20, 2017 13:34

Create links from xslt directory

985721b

Change stylesheet version from 2.0 -> 1.0

2a2fc87

Add .gitignore

487e8aa

First attempt for DITA -> DocBook with Python

1684ee1

Add missing paths.xsl

5d3528c

Change stylesheets from 2.0 -> 1.0

4cad14a

Rename default config file to conversion.cfg

297ffd9

Avoids conflicts with the standard Dietrich config file (which ends in .config)

xmlparser_args(): Create Namespace arguments

e08a552

We need the same XMLParser properties again, so create an argparse.Namespace object
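The idea can be sketched as follows. This is not the actual `xmlparser_args()` from the commit: the selected option names and their defaults are assumptions, picked because they are keyword arguments that `lxml.etree.XMLParser` accepts.

```python
import argparse

# Assumption: these parser properties are the ones that need to be reused.
PARSER_DEFAULTS = {
    "resolve_entities": True,
    "load_dtd": False,
    "no_network": True,
}


def xmlparser_args(args):
    """Extract only the XML parser properties from the CLI arguments.

    Returns an argparse.Namespace, so the same properties can be handed
    around and later expanded with vars() when a parser is created.
    """
    values = {name: getattr(args, name, default)
              for name, default in PARSER_DEFAULTS.items()}
    return argparse.Namespace(**values)


if __name__ == "__main__":
    cli = argparse.Namespace(verbose=1, load_dtd=True)
    parser_ns = xmlparser_args(cli)
    # With lxml installed: etree.XMLParser(**vars(parser_ns))
    print(vars(parser_ns))
```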

Implement make_unique_ids()

38dfba0

Rename pyproc_xmlparser() -> create_xmlparser()

d49a181

Add --tmpdir to define your own temp directory

66069df

Improve comments, add FIXMEs

a5daf10

Introduce two new functions (unfinished)

6bf88fa

* collect_linkends: Collect all linkends of the converted files
* copyimages: Copy all image files
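Since both functions are still unfinished, here is one possible shape for them, as a sketch only. The signatures, the list of image suffixes, and the choice to copy everything into one flat target directory are assumptions, not what the commit contains.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: files with these suffixes count as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".svg"}


def collect_linkends(xmlfiles):
    """Return the set of all linkend attribute values in the given XML files."""
    linkends = set()
    for xmlfile in xmlfiles:
        for element in ET.parse(xmlfile).iter():
            value = element.get("linkend")
            if value is not None:
                linkends.add(value)
    return linkends


def copyimages(srcdir, destdir):
    """Copy all image files below srcdir into destdir; return the new paths."""
    destdir = Path(destdir)
    destdir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(srcdir).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            target = destdir / path.name
            shutil.copy2(path, target)
            copied.append(target)
    return copied
```

Note that images with the same file name in different subdirectories would overwrite each other in this flat layout, and destdir should not be located inside srcdir.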

convert2db: First attempt

4d333e4

Currently, applying dita2docbook_template.xsl to DITA files creates empty outputs. Would it be better to call the saxon9 script instead?

Fix some pass statements

22acaa9

tomschr self-assigned this Dec 20, 2017

tomschr changed the title ~~First attempt for DITA -> DocBook with Python~~ WIP: First attempt for DITA -> DocBook with Python Dec 20, 2017
