WIP: First attempt for DITA -> DocBook with Python #23

Open. Wants to merge 16 commits into base: pyproc


tomschr (Contributor) commented Dec 20, 2017

After thinking about implementing the extension functions in pyproc.py, I realized that this has to be done at a much deeper level. Therefore I tried to "rewrite" the shell script in Python as ditatodocbook.py. Hope nobody will kill me.

Sorry, but I can't stand the shell syntax while doing complex XML juggling at the same time. For such a complex task (and it will get worse), shell is not the best tool. I think it limits us a lot.

I see the following benefits with the Python approach:

  • More readable than this shell syntax gibberish
  • Easier to extend, it's Python
  • Easier to deal with string operations, list operations etc.
  • Logging with the -v option
  • A configuration file in INI format
  • A DITA summary of all conrefs, keywords etc.
  • Uses the lxml library
  • No global variables 😉
  • Already uses code from pyproc.py (as a "library")
  • No dependencies other than lxml
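
To illustrate the logging and configuration points, here is a minimal sketch of how a repeatable -v option and an INI file could be wired up with the standard library only. The option names, the [general] section, and the verbosity-to-level mapping are assumptions made for this example; they are not taken from ditatodocbook.py.

```python
import argparse
import configparser
import logging

log = logging.getLogger("ditatodocbook")

# Assumed mapping: no -v gives warnings only, -v gives info, -vv gives debug.
LOGLEVELS = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}


def parse_cli(argv=None):
    """Parse the command line (option names are illustrative)."""
    parser = argparse.ArgumentParser(description="Convert DITA to DocBook")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase verbosity (can be repeated)")
    parser.add_argument("-c", "--config", default=None,
                        help="path to an INI configuration file")
    return parser.parse_args(argv)


def setup_logging(verbosity):
    """Set the logger level from the -v count and return the chosen level."""
    # Anything above the highest known count falls back to DEBUG.
    level = LOGLEVELS.get(verbosity, logging.DEBUG)
    log.setLevel(level)
    return level


def load_config(text):
    """Read INI-formatted text into a ConfigParser object."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config
```

Keeping the parsed options and the configuration as return values, rather than module-level state, fits the "no global variables" point above.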

Currently, the script almost does its job. The design is mostly based on the shell script, but shifts things into functions to make it easier to call them separately.

However, there are still some things to do:

  • The script creates a ditasummary.xml file in the temp directory. This is an excerpt of all the important properties we need to deal with (keywords, conrefs, ...) in a flat structure. Maybe we can load this file into our stylesheets to make the transformation a bit easier. Not sure if this is feasible...
  • Implement convert2db to really do the transformation from DITA to DocBook. Currently, this is done by the saxon9 script. For some reason, xsltproc (or lxml) creates empty DocBook files.
  • Implement the copyimages() and collect_linkends() functions
  • Add some Python extension functions to the XSLT stylesheets to simplify some tasks. Making these feasible was the main reason for the rewrite.
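
For the last point, a rough sketch of what a Python extension function could look like with lxml. It registers a Python callable through the `extensions` argument of `etree.XSLT` and calls it from a stylesheet. The namespace URI, the `basename` function, and the toy stylesheet and input are all made up for this example and do not come from the actual stylesheets.

```python
from lxml import etree

# Hypothetical namespace URI, chosen for this sketch only.
EXT_NS = "urn:x-example:dita-ext"


def basename(context, path):
    """Return the last component of a slash-separated path."""
    if isinstance(path, list):
        # XPath node-sets arrive as lists; use the first item's string value.
        path = str(path[0]) if path else ""
    return path.rsplit("/", 1)[-1]


# Toy stylesheet: turns <topic href="..."/> into a <section> with a <title>.
STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:py="urn:x-example:dita-ext"
    exclude-result-prefixes="py">
  <xsl:template match="/topic">
    <section>
      <title><xsl:value-of select="py:basename(string(@href))"/></title>
    </section>
  </xsl:template>
</xsl:stylesheet>
"""


def transform(source_xml):
    """Apply the toy stylesheet and return the result's root element."""
    transformer = etree.XSLT(
        etree.XML(STYLESHEET),
        extensions={(EXT_NS, "basename"): basename},
    )
    return transformer(etree.XML(source_xml)).getroot()
```

String handling like this is awkward in XSLT 1.0 but trivial in Python, which is the kind of task such functions could take over.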

Avoids conflicts with the standard Dietrich config file (which ends in .config)

We need the same XMLParser properties again, so create an
argparse.Namespace object

This DITA summary file (by default: $TMPDIR/ditasummary.xml) contains
all essential information about the whole directory; for example,
xml:base of the directory, href of the current DITA file, list of
keywords, list of conrefs.

The structure is flat; each DITA file is represented by a <ditafile> element
with all the necessary information.
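
A minimal sketch of how such a flat summary could be assembled. Only the `<ditafile>` element, `href`, and `xml:base` come from the description above; the `<ditasummary>` root and the `<conref>` and `<keyword>` child elements are assumed names. To stay dependency-free the sketch uses the standard library's ElementTree instead of lxml and works on XML strings rather than files.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def summarize(href, xml_text):
    """Build one flat <ditafile> entry for a single DITA document."""
    entry = ET.Element("ditafile", {"href": href})
    for node in ET.fromstring(xml_text).iter():
        conref = node.get("conref")
        if conref:
            # Assumed child element name for a content reference.
            ET.SubElement(entry, "conref").text = conref
        if node.tag == "keyword" and node.text:
            # Assumed child element name for a keyword.
            ET.SubElement(entry, "keyword").text = node.text.strip()
    return entry


def build_summary(base_uri, sources):
    """Collect one <ditafile> per document under an assumed root element.

    `sources` maps each file's href to its XML text.
    """
    root = ET.Element("ditasummary", {XML_BASE: base_uri})
    for href in sorted(sources):
        root.append(summarize(href, sources[href]))
    return root
```

The result can be serialized with `ET.tostring(root, encoding="unicode")` and written to the temp directory.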
* collect_linkends: Collect all linkends of the converted files
* copyimages: Copy all image files
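
As a rough idea of what collect_linkends could do once implemented, the sketch below gathers every `linkend` attribute value from a set of converted DocBook documents. The `collect_ids` helper and the idea of comparing both sets to find unresolved targets are assumptions about the purpose, not something stated in this PR. The sketch uses the standard library's ElementTree and XML strings to stay self-contained.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_linkends(documents):
    """Return the set of all linkend values in the given XML strings."""
    linkends = set()
    for text in documents:
        for node in ET.fromstring(text).iter():
            target = node.get("linkend")
            if target:
                linkends.add(target)
    return linkends


def collect_ids(documents):
    """Return the set of all xml:id (or plain id) values (assumed helper)."""
    ids = set()
    for text in documents:
        for node in ET.fromstring(text).iter():
            ident = node.get(XML_ID) or node.get("id")
            if ident:
                ids.add(ident)
    return ids
```

With both sets available, `collect_linkends(docs) - collect_ids(docs)` yields the link targets that no converted file defines.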
Currently, applying dita2docbook_template.xsl to DITA files
creates empty outputs. Would it be better to call the saxon9
script instead?
tomschr self-assigned this Dec 20, 2017
tomschr changed the title from "First attempt for DITA -> DocBook with Python" to "WIP: First attempt for DITA -> DocBook with Python" Dec 20, 2017