Asciidoctor Literate Programming extension

1. What is this?

This is a (Ruby) module to add literate programming support to Asciidoctor.

In short, literate programming is an approach to writing software and its documentation that prioritizes human language. A literate programming “source” is composed of text in some documentation system (in this case, Asciidoctor’s flavor of AsciiDoc) that describes the logic, with interspersed code snippets whose union constitutes the source code as it will be passed to the compiler.

The process of creating the code from the snippets is known as Tangling, while the extraction of the documentation is known as Weaving. The literate programming support we’re introducing in Asciidoctor with this module does not require a separate weaving step, since the source document is assumed to be a valid Asciidoctor document, and should therefore be processable as-is by a standard AsciiDoc processor, even without special modules. In our case, we call Weaving the process that enhances the document processing by improving the appearance and functionality of the chunk references.

(In fact, this README.adoc file is itself the literate-programming source of the module.)

2. Why is this?

There have been previous efforts to introduce literate programming features in the AsciiDoc format, including eWEB, nowasp and the Model Realization Tools' aweb (not to be confused with the Ada-centered one). So why do we need another one?

Integration

we hook directly into Asciidoctor, allowing single-pass processing of a document to produce both the documentation and the source; this also allows us to include proper cross-referencing and indexing for the code chunks at the documentation output level;

More features

most importantly, we support the creation of multiple files from a single document, a feature that is missing from some of the existing tools; other features include improved navigation between chunks in the woven documentation, and the option to create a DOT graph of the chunk structure.

Syntax compatibility

the existing tools have slightly different syntax; the obvious solution to this is to introduce a new, incompatible syntax, but the actual plan is to also support the syntax from the other tools.

2.1. A note on syntax

The syntax we currently support is a small extension to the aweb syntax. Similarly to noweb, chunks in aweb are defined by an introductory line in the form <<Chunk title>>=, and are referenced by using the chunk title between double angle brackets: <<Chunk title>>.

It’s interesting to note how both AsciiDoc and noweb/aweb use the <<...>> syntax for references.

In contrast to noweb, aweb relies on AsciiDoc syntax to separate chunk definitions from documentation, and it does not support inline chunk references. In particular, this means that a source line is either a chunk reference (optionally surrounded by whitespace), or a text line (to be taken verbatim), and that an at symbol (@) at the beginning of the line has no special meaning.

The downside of this simplified syntax, aside from the restriction about chunk reference usage, is that there is no markup to indicate definition and usage of symbols. The upside is that the aweb syntax virtually eliminates the need for escaping chunk references or @ symbols.

(It is also impossible to have an actual code line that begins with << and ends with >>, so if your language needs those, you’re (currently) out of luck.)

Another minor difference between the two syntaxes is that in the aweb syntax chunks with the same name are automatically concatenated, so there is no need for the <<Chunk title>>+= notation.

This module supports the standard (“legacy”) aweb syntax (with the caveats below). In addition, we interpret every source block as the definition of a (single) chunk, using the block’s own title as the title of the chunk.

NOTE: While we do support the “legacy” aweb syntax, the output is not guaranteed to match atangle's output. We output line directives more aggressively, and the behavior with empty definitions is slightly different.
IMPORTANT: The root chunk auto-detection mechanism we employ with the “legacy” aweb syntax is quite aggressive, and may be subject to changes in the future.
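
To make the two syntaxes concrete, here is a small, purely illustrative example (the chunk title and output file name are made up). In the new style, a generic chunk is simply a titled source block, while a source block with an output attribute (detailed later) defines a root chunk, i.e. an output file:

.Greet the user
[source,ruby]
----
puts "Hello, #{name}!"
----

[source,ruby,output=greeter.rb]
----
name = ARGV.first || "world"
<<Greet the user>>
----

In the “legacy” aweb syntax, the same generic chunk would instead be defined in an anonymous listing block starting with an assignment line:

----
<<Greet the user>>=
puts "Hello, #{name}!"
----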

3. How is this?

The module is implemented as an Asciidoctor extension. Since we are mainly interested in producing secondary outputs, the core of the module will be a TreeProcessor that traverses the document tree to gather all blocks that define chunks to be output to the secondary files.

The processor needs to track some state, so it needs to override the default constructor (initialize method) to set things up properly. Asciidoctor’s processors take a configuration Hash on construction, so we follow the convention, even though we do not (at present) make use of any configuration, and remember to call the superclass' constructor (otherwise, the extension won’t work properly).

Main class definition
class LiterateProgrammingTreeProcessor < Asciidoctor::Extensions::TreeProcessor
  include Asciidoctor::Logging

  <<Plugin version>>
  def initialize config = {}
    super config
    <<Declare and initialize variables needed by the processor>>
  end

  <<Support methods>>
  <<Tangling methods>>
  <<Weaving methods>>
  <<Processing methods>>
end

Of course, we need to require the asciidoctor/extensions Ruby module to have the Asciidoctor::Extensions::TreeProcessor class available:

Requires
require 'asciidoctor/extensions'

And the plugin version corresponds to the document one:

Plugin version
VERSION = '2.3'

The Weaving process will introduce cross-referencing between the chunks as well as navigation links between blocks contributing to the same chunk. We want to be able to provide default styling for these links, which we can do using a Docinfo processor that will insert the needed CSS in the document head.

Main class…​
class LiterateProgrammingDocinfoProcessor < Asciidoctor::Extensions::DocinfoProcessor
  <<Plugin version>>

  use_dsl
  at_location :head
  def process doc
%(<style>
<<Styling for woven links>>
</style>)
  end
end

3.1. Chunk management

Each chunk is identified by a title, and the corresponding source code may be split across multiple blocks. The (final) content of a chunk is obtained by the concatenation of all the blocks with the same title.

The title of the chunk is used as a handle that can be referenced by other chunks to declare that the content of the referenced chunk should be inlined in the referencing chunk (this inlining process is known as Tangling). A special kind of chunk is the root chunk, which is not referenced by any other chunk and represents the starting point for the tangling process. We support the creation of multiple files from the same source, so we can have multiple root chunks, and we use the chunk title to represent the name of the file to be created by each root chunk.

The natural data structure to store chunks (be they generic or root chunks) is a Hash that maps the title (a String) to the content (an Array). For the processor we need to declare two such hashes: @chunks will hold the generic code chunks, while @roots will hold root chunks.

Since the source code associated with a generic chunk can be spread out over multiple blocks, we define a default value constructor for @chunks: this will simplify the process of appending new lines to a value each time we come across a new block.

The root chunk is assumed to be unique per output file (i.e. per title), but we still provide the same default value constructor, since this will allow us to handle the extraction in the same way for both types. Uniqueness of root chunks will be handled explicitly during block processing.

Declare…​
@roots = Hash.new { |hash, key| hash[key] = [] }
@chunks = Hash.new { |hash, key| hash[key] = [] }

Chunk titles can be nearly arbitrary strings, but are conventionally natural-language (synthetic) descriptions of the chunk's intended use. As these can get on the longish side, and typing them multiple times can be time-consuming and error-prone, additional uses of the same title can be shortened to any unambiguous prefix followed by an ellipsis of three literal dots (...). For example, a chunk may be titled Automagical creation of bug-free code, and this may be shortened to Automagic... if there are no other chunks whose title begins with Automagic.

We do require that the first time a chunk title is encountered (be it to define it or as a reference in another chunk) it must be written out in full. Moreover, since the trailing ellipsis is taken to be a shorthand notation, a chunk title cannot naturally end with it.

To assist in the handling of shortened chunk titles, we keep track of all the (full) titles we’ve come across so far:

Declare…​
@chunk_names = Set.new

and we provide a support method that will take a (possibly shortened) chunk title and return the full title, raising an exception if we do not find one (and only one) chunk title starting with the given prefix:

Support…​
def full_title string
  pfx = string.chomp("...")
  # nothing to do if title was not shortened
  return string if string == pfx
  hits = @chunk_names.find_all { |s| s.start_with? pfx }
  raise ArgumentError, "No chunk #{string}" if hits.length == 0
  raise ArgumentError, "Chunk title #{string} is not unique" if hits.length > 1
  hits.first
end

3.2. Chunk contents and metadata

The chunk content is stored as an Array whose elements are either Strings (the actual chunk lines), Asciidoctor::Reader::Cursors (an Asciidoctor-provided structure that carries information about the origin, i.e. file and line number, of a block), or Hashes (the attributes of the block that originated this component).

Since, as we mentioned, a chunk may span multiple blocks, we can easily track information about the origin of each of the component blocks by storing the corresponding Cursor before the corresponding lines, as detailed in the Collecting chunks section.

We also track separately which chunks are referred to by which other chunks (and in which block) to be able to provide a relationship graph if requested.

Declare…​
@chunk_backrefs = Hash.new { |hash, key| hash[key] = [] }

Updates to @chunk_backrefs are abstracted by the add_chunk_ref function:

Support…​
def add_chunk_ref includer, includer_block_id, included
  @chunk_backrefs[included].push [includer, includer_block_id]
end

3.3. Metadata output

The origin information for a block can be used to add appropriate metadata to the output files. The format with which this information is output is set by the litprog-line-template document attribute, a string where the %{line} and %{file} keywords will be replaced by the source line number and file name, respectively. As an example, for languages that do not have built-in support for a line directive, a vim-friendly solution for code navigation would be:

Example of line template setting
:litprog-line-template: # %{file}:%{line}

The default value for this template produces a C-style #line directive:

Set default attributes
doc.set_attr 'litprog-line-template', '#line %{line} "%{file}"', false

Syntax-specific line templates can be specified through an attribute named litprog-line-template-lang, where lang is the language name as it would be used to specify the syntax highlighting language of a source block. The module comes with a specialization for CSS:

Set default attributes
doc.set_attr 'litprog-line-template-css', '/* %{file}:%{line} */', false
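
In a document, a per-language override is set the same way as the global template. For example (the language and comment style here are chosen purely for illustration), a Ruby-specific template could be:

:litprog-line-template-ruby: # %{file}:%{line}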

In the tree processor, the templates used to print the line information are stored in the member variable @line_directive_template, a hash mapping the language to the template. During Tangling, line directives may change based on the language of the chunk block being output, so we keep track of active directives in the @active_line_directive_template stack:

Declare…​
@line_directive_template = { }
@active_line_directive_template = []

These variables are initialized at the beginning of the tangling phase, with the special key _ used for the default template.

Set line directive
@line_directive_template['_'] = doc.attr('litprog-line-template').dup
doc.attributes.each do |key, value|
  lang = key.dup
  if lang.delete_prefix! 'litprog-line-template-'
    @line_directive_template[lang] = value unless lang.empty?
  end
end
@active_line_directive_template.push @line_directive_template['_']

The actual output of the line directive is encapsulated in the output_line_directive method:

Support…​
def output_line_directive file, fname, lineno
  template = @active_line_directive_template.last
  file.puts( template % { line: lineno, file: fname}) unless template.nil_or_empty?
end

3.4. Tangling

Tangling is the process of “stitching together” all the code blocks, recursively following the referenced chunks starting from the root chunk, for each file.

References to other chunks are identified by a chunk title written between double angle brackets (e.g. <<(Possibly shortened) chunk title>>) on a line of its own, optionally surrounded by whitespace. When processing chunks line by line, we may want to check if a particular line is a chunk reference, and if so we’ll want the full name of the chunk, as well as any indenting that precedes the reference:

Support…​
def is_chunk_ref line
  if line.match /^(\s*)<<(.*)>>\s*$/
    return full_title($2), $1
  else
    return false
  end
end

The recursive tangling of chunks is achieved by starting at the root chunk, outputting any line that is not a reference to another chunk, and recursively calling the function any time a reference is encountered.

The state we need to keep track of during the recursion is composed of:

the output stream

to which we are writing the lines,

the title of the chunk being processed

to detect circular references and produce meaningful error messages,

the current indent

added to all lines being output,

the contents of the chunk being processed

this could be obtained knowing the chunk name and the chunk type, but by passing the chunk contents itself we can simplify the logic of the method,

the names of the chunks we’re in the middle of processing

this is a Set to which chunk names are added when entering the method and removed on exit, and it is used to detect circular references.

As mentioned in Chunk contents and metadata, the chunk is an Array whose elements are either Strings (the actual chunk lines), Hashes of attributes, or Asciidoctor::Reader::Cursors (that provide source line information). We handle the three cases separately, and raise an appropriate exception if we come across something unexpected.

We return the number of times the active line directive template was pushed, so that it can be popped as many times by the caller.

Tangling…​
def recursive_tangle file, chunk_name, indent, chunk, stack
  stack.add chunk_name
  fname = ''
  lineno = 0
  line_directive_template_push = 0
  chunk.each do |line|
    case line
    <<Hash case>>
    <<Cursor case>>
    <<String case>>
    else
      raise TypeError, "Unknown chunk element #{line.inspect}"
    end
  end
  stack.delete chunk_name
  return line_directive_template_push
end

In the Hash case, we only care about finding the source language of the block, if defined, to set the @active_line_directive_template appropriately:

Hash case
when Hash
  lang = line.fetch('language', '_')
  lang = '_' unless @line_directive_template.key? lang
  @active_line_directive_template.push @line_directive_template[lang]
  line_directive_template_push += 1

A Cursor always precedes the content lines it refers to. We use it to update the filename (fname) and line number (lineno) information, and we output a line directive, since the upcoming text lines will have a different origin compared to what has been output so far:

Cursor case
when Asciidoctor::Reader::Cursor
  fname = line.file
  lineno = line.lineno + 1
  output_line_directive file, fname, lineno

If the chunk element we’re processing is a String, this can be either a reference to another chunk, or an actual content line. In both cases, we update the current origin line number lineno, so that the origin information is correct if we need to output a new line directive.

If the line is not a reference, we just output it as-is, preserving indent, except for empty strings, in which case the indent is not added.

String case
when String
  lineno += 1
  ref, new_indent = is_chunk_ref line
  if ref
    <<Reference case>>
  else
    file.puts line.empty? ? line : indent + line
  end

In the reference case, we check for circular references or references to undefined chunks (raising appropriate exceptions), and then recurse into the referenced chunk. After returning from the referenced chunk, we output a new line directive, so that subsequent lines from the current chunk have correct origin information metadata. If the line directive template was changed during the recursion, we pop it after outputting the new line directive, under the assumption that the language change will not be in effect until the next actual line of output.

NOTE: The rationale for this is that language changes happen in an embedded-language context, with the fences that delimit the new-language part of the block being in the original language. An example of this is the CSS embedded by the Docinfo Processor of this module.
Reference case
# must not be in the stack
raise RuntimeError, "Recursive reference to #{ref} from #{chunk_name}" if stack.include? ref
# must be defined
raise ArgumentError, "Found reference to undefined chunk #{ref}" unless @chunks.has_key? ref
# recurse and get line directive stack growth
to_pop = recursive_tangle file, ref, indent + new_indent, @chunks[ref], stack
output_line_directive file, fname, lineno
# pop line directive stack
@active_line_directive_template.pop to_pop

The recursive tangling process must be repeated for each root chunk defined by the document. Each root chunk will use the root name as the output file name, unless overridden. The special root chunk name * indicates that the chunk content is to be streamed to the standard output.

Tangling…​
def tangle doc
  <<Set line directive>>
  <<Prepare output directory>>
  <<Root name map creation>>
  @roots.each do |name, initial_chunk|
    <<Remap file name if requested>>
    if name == '*'
      to_pop = recursive_tangle STDOUT, name, '', initial_chunk, Set[]
      @active_line_directive_template.pop to_pop
    else
      <<Convert name to full_path>>
      File.open(full_path, 'w') do |f|
        to_pop = recursive_tangle f, name, '', initial_chunk, Set[]
        @active_line_directive_template.pop to_pop
      end
    end
  end
end

We allow users to specify where the output files should be placed by overriding the litprog-outdir document attribute. If set, this must be a path relative to the docdir. If unset, the docdir will be used directly. The output directory is created if not present (and if different from the docdir).
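
For example, a document that wants its tangled output under a (hypothetical) src subdirectory of the docdir would set:

:litprog-outdir: src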

Prepare…​
docdir = doc.attributes['docdir']
outdir = doc.attributes['litprog-outdir']
if outdir and not outdir.empty?
  outdir = File.join(docdir, outdir)
  FileUtils.mkdir_p outdir
else
  outdir = docdir
end

Accessing FileUtils introduces a new requirement:

Requires
require 'fileutils'

When tangling a new file, the name provided by the user is considered relative to the (literate programming) output directory:

Convert…​
full_path = File.join(outdir, name)

3.4.1. Output file name mapping

Root chunk names are used as output file names by default, but this behavior can be overridden on a name-by-name basis by setting the litprog-file-map document attribute. If not empty, this is a colon-separated list of entries in the chunk_name > file_name form. Whitespace around the file and chunk names is optional and will be stripped. The user is warned if either the chunk or the file name is empty, and if a mapped chunk name does not correspond to any root chunk in the document. Identity maps (mapping a root chunk name to itself) are ignored.
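
For example (chunk and file names here are hypothetical), a document with two root chunks could remap them with:

:litprog-file-map: module.rb > lib/module.rb : cli.rb > bin/cli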

Root name…​
root_name_map = {}
doc.attr('litprog-file-map').to_s.split ':' do |entry|
  entry.strip!
  cname, fname = entry.split '>', 2
  cname.strip!
  fname.strip!
  if cname.empty? or fname.empty?
    logger.warn 'empty chunk name in litprog-file-map ignored' if cname.empty?
    logger.warn 'empty file name in litprog-file-map ignored' if fname.empty?
    next
  end
  unless @roots.include? cname
    logger.warn "non-existent chunk #{cname} in litprog-file-map ignored"
    next
  end
  next if cname == fname # nothing to remap
  <<Check for fname uniqueness>>
  root_name_map[cname] = fname
end

We want output file names to be unique, i.e. different both from other file names and from root chunk names. This is to avoid one output overwriting another.

NOTE: Due to the way this check is done, it’s not possible to swap two chunk names with an A > B : B > A file map.
Check for fname…​
raise ArgumentError, "#{cname} remapped to existing #{fname}" if @roots.include? fname
mapped_already = root_name_map.key fname
raise ArgumentError, "#{cname} remapped to #{fname}, same as #{mapped_already}" if mapped_already

Once the root_name_map hash is constructed, its use is trivial:

Remap…​
name = root_name_map.fetch name, name

3.5. Collecting chunks

3.5.1. New style

AsciiDoc’s syntax allows us to forgo special syntax to identify code chunks: we assume that any listing block with the source style is (part of) a single code chunk.

Processing of a single block requires us to identify the chunk type (root or generic) and title, add the title to the known chunk titles (if necessary) and append the block lines to the chunk contents.

Since the default value for missing chunks is an empty Array, we can append the new lines directly using the Array#+= method, without special-casing the first block that defines a chunk.

We also need to check if the new lines reference other chunks, and if so we add the referenced titles to the list of known titles, to allow shortened names to be used henceforth. This information can also be used for cross-referencing chunks, in which case the ID of the block is necessary to identify exactly which block in a chunk references another chunk. This block ID is described below.

Processing…​
def add_to_chunk chunk_hash, chunk_title, block_lines, block_id
  @chunk_names.add chunk_title
  chunk_hash[chunk_title] += block_lines

  <<Check for references and prime the chunk names>>
end

We want to be able to reference blocks by the title of the chunk(s) they define, so we generate a chunk-specific ID and assign it to the block if appropriate.

To simplify management, we keep track of the blocks that contribute to each chunk:

Declare…​
@chunk_blocks = Hash.new { |hash, key| hash[key] = [] }

Since a source block contributes to a single chunk, this map would be sufficient to trivially reconstruct the whole chunk contents with origin information. However, since the “legacy” aweb syntax has a more complex many-to-many correspondence between chunks and blocks, we need to separate the two pieces of information.

The chunk-specific block ID is always generated when a block is added to a chunk, but since Asciidoctor does not support having multiple IDs referring to the same block, it is assigned as the block ID only if the block does not already have a user-defined ID. The chunk-specific ID is generated using the method Asciidoctor uses for sections, but prepending _chunk_ and appending _block_N, where N is the block number (1-based, computed after appending the current block to @chunk_blocks). The map between title and block ID is also registered in the document catalog, for use in the weaving process.

Support…​
def add_chunk_block_with_id chunk_title, block
  block_count = @chunk_blocks[chunk_title].append(block).size
  title_for_id = "_chunk_#{chunk_title}_block_#{block_count}"
  new_id = Asciidoctor::Section.generate_id title_for_id, block.document
  # TODO error handling
  block.document.register :refs, [new_id, block]
  block.id = new_id unless block.id
  block.document.catalog[:lit_prog_chunks][chunk_title] << new_id
  return new_id
end
NOTE: Since the chunk-specific block ID is only assigned to the block if it doesn’t have an ID already, it should not be used in cross-references directly. An auxiliary function is defined to help remap from the chunk-based ID to the Asciidoctor ID:
Support…​
def remap_chunk_block_id doc, chunk_block_id
  return doc.catalog[:refs][chunk_block_id].id
end

To allow document metadata to be used in source blocks (e.g. to share author and version information) we allow the :attributes substitutions (and only those) to be applied to the block lines:

Support…​
def apply_supported_subs block
  if block.subs.include? :attributes
     block.apply_subs block.lines, [:attributes]
  else
     block.lines
  end
end
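
For example, a document could propagate its revision number into the tangled output with something along these lines (block title and contents are purely illustrative):

.Version constant
[source,ruby,subs=attributes+]
----
VERSION = '{revnumber}'
----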

A source block contributes to a single chunk. This will be a root chunk if the block has an output attribute, or a generic chunk otherwise. The chunk_hash local variable is used to track which of the @roots and @chunks collections this block needs to be added to.

Processing…​
def process_source_block block
  chunk_hash = @chunks
  if block.attributes.has_key? 'output'
    <<Handle root chunk>>
  else
    <<Handle generic chunk>>
  end
  <<Track source location information>>
  block_lines = apply_supported_subs block
  block_id = add_chunk_block_with_id chunk_title, block
  add_to_chunk chunk_hash, chunk_title, block_lines, block_id
end

For a root chunk, chunk_hash must be set to @roots, and we take the output block attribute as the chunk_title.

Handle root chunk
chunk_hash = @roots
chunk_title = block.attributes['output']
<<Ensure root chunk title is unique>>

Root chunks are unique (we do not append to them), so we need to check that there are no root chunks already defined with the given chunk_title:

Ensure root…​
raise ArgumentError, "Duplicate root chunk for #{chunk_title}" if @roots.has_key?(chunk_title)

For a generic chunk, chunk_hash is left at the default value (@chunks), and the chunk_title is set from the title attribute of the block. We want to use the raw block title for this, which is not exposed by Asciidoctor directly. Because of this, we need to “monkey patch” the Block class to provide an appropriate method:

Monkey patch the Block class
module Asciidoctor
  class Block
    def litprog_raw_title
      @title
    end
  end
end

We can use this method to retrieve the raw block title, and if the block title was shortened, we also replace it with the full chunk title, to improve the legibility of the documentation.

Handle generic chunk
# We use the block title (TODO up to the first full stop or colon) as chunk name
title = block.litprog_raw_title
chunk_title = full_title title
block.title = chunk_title if title != chunk_title

Regardless of the chunk type, processing of the block is finished by scanning the lines of the block, to add any referenced chunk name to @chunk_names:

Check for references…​
block_lines.each do |line|
  mentioned, _ = is_chunk_ref line
  if mentioned
    @chunk_names.add mentioned
    add_chunk_ref chunk_title, block_id, mentioned
  end
end

For each block composing a chunk we want to keep track of where it was defined, so that this information can be added to the output file if requested, and also of the source language for the block, to control the way the location is output. We do this by pushing the attributes and the source_location metadata of each block into the corresponding chunk Array, right before the corresponding lines:

Track source location…​
chunk_hash[chunk_title].append block.attributes
chunk_hash[chunk_title].append block.source_location

The source_location is only tracked correctly when the sourcemap feature is enabled for the document. This must be done at the preprocessing stage, during which we can also set the defaults for our custom attributes:

Enable sourcemap and set default attributes
preprocessor do
  process do |doc, reader|
    doc.sourcemap = true
    <<Set default attributes>>
    nil
  end
end

3.5.2. Legacy aweb compatibility

In aweb, chunk definition is done in anonymous listing blocks (without special attributes or styles). A listing block is assumed to define a chunk if it begins with a chunk assignment line, i.e. a line that contains only a <<Chunk title>>=, without leading whitespace and optionally followed by trailing whitespace.

Processing…​
CHUNK_DEF_RX = /^<<(.*)>>=\s*$/
def process_listing_block block
  <<Filter legacy listing block>>
  <<Define listing block processing variables>>
  <<Legacy block processing>>
end

If the block does not begin with a chunk definition, we can bail out early:

Filter legacy listing block
return if block.lines.empty?
return unless block.lines.first.match(CHUNK_DEF_RX)

A single block can define multiple chunks: each definition spans from the line following the assignment line to the end of the block or the next chunk assignment line. We know however that we have at least one chunk (since otherwise the block is skipped):

Define listing block processing variables
chunk_titles = [ full_title($1) ]

Since we can have multiple chunks defined in the same block, we cannot use the block’s source_location directly: we need to track the offset (in lines) where each chunk definition begins from the block source location.

Define listing block…​
block_location = block.source_location
chunk_offset = 0
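
For concreteness, a legacy block defining two (hypothetical) chunks might look like this in the AsciiDoc source:

----
<<Read the input>>=
data = File.read ARGV.first
<<Parse the input>>=
records = data.lines.map(&:chomp)
----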

To group the block lines into chunk definitions, we can leverage Ruby’s Enumerable#slice_when method. A new slice starts when the second line in the pair is a chunk assignment. In this case, the match will give us the chunk title, that we store in chunk_titles, and the block_lines we’re interested in are the lines in the slice, except for the first one (that holds the chunk assignment expression).

Legacy block processing
block.lines.slice_when do |l1, l2|
  l2.match(CHUNK_DEF_RX) and chunk_titles.append(full_title $1)
end.each do |lines|
  chunk_title = chunk_titles.shift
  block_lines = lines.drop 1
  chunk_hash = @chunks
  <<Detect legacy chunk type>>
  <<Track legacy chunk location information>>
  block_id = add_chunk_block_with_id chunk_title, block
  add_to_chunk chunk_hash, chunk_title, block_lines, block_id
end

In aweb, the root chunk is determined by the user from the command line, and by default it is identified by the special chunk title *. Multiple root chunks are supported, but require multiple passes (one per root) to extract. We extend the root chunk auto-detection by assuming that any chunk whose title does not contain spaces is a root chunk.

Detect legacy chunk type
unless chunk_title.include? " "
  chunk_hash = @roots
  <<Ensure root chunk title is unique>>
end

The actual location of the chunk being processed can be obtained from the block location by adding the chunk_offset, plus one to skip the chunk assignment line. After we’ve set the origin for the current chunk lines, we can increment the chunk_offset for the next chunk.

Track legacy chunk location…​
chunk_location = block_location.dup
chunk_location.advance(chunk_offset + 1)
chunk_hash[chunk_title].append(chunk_location)
chunk_offset += lines.size

3.6. Weaving

Since our documents are natively AsciiDoc documents, the literate source itself can be processed by any AsciiDoc processor, even without support for the special syntax that defines chunks. The weaving process in this case is limited to a manipulation of the source blocks to improve the appearance and functionality of chunk references. Additionally, the graph describing chunk inclusion is also output during this phase, if requested.

To support chunk cross-referencing, we manipulate all the blocks associated with a chunk, adding links to the other blocks that define the same chunk, and replacing chunk references with AsciiDoc hyperlinks, in addition to the block title normalization done during the processing.

For each block we need to know whether it is the last block in the list, to determine if it needs a “next” link or not, so we cache the last block index to speed up the check.

Weaving…​
def weave doc
  @chunk_blocks.each do |chunk_title, block_list|
    last_block_index = block_list.size - 1
    block_list.each_with_index do |block, i|
      <<Add chunk navigation links>>
    end
  end
  if doc.attr('litprog-dot-graph')
    <<Output chunk reference graph>>
  end
end

The chunk navigation links are added to the title of the block if there are preceding/following blocks in the same list. We also include a link to the chunk block(s) that include the chunk this block belongs to: for these, we have to remember that the chunk-specific block ID may not correspond to the actual block ID known to Asciidoctor.

Add chunk nav…​
links = []
# link to previous block in this chunk
links << "xref:\##{block_list[i-1].id}[⮝,role=prev]" if i > 0
# link to next block in this chunk
links << "xref:\##{block_list[i+1].id}[⮟,role=next]" if i != last_block_index
# link to block(s) that include the chunk this block belongs to
if @chunk_backrefs.key? chunk_title
  # uplinks are placed using unshift, so process them in reverse order
  @chunk_backrefs[chunk_title].reverse_each do |inc|
    includer, includer_block_id = inc
    if count_chunk_blocks(doc, includer) > 1
      includer_block_num = includer_block_id.split('_').last
      desc = "Used in: #{includer} [#{includer_block_num}]"
    else
      desc = "Used in: #{includer}"
    end
    # remap from the chunk-specific block ID to the Asciidoctor block ID
    includer_block_id = remap_chunk_block_id doc, includer_block_id
    links.unshift '|' if links.length > 0
    # TODO apparently AsciiDoc(tor) doesn't support anchor titles?
    # links.unshift "xref:\##{includer_block_id}[⏚,role=up,title=\"${desc}\"]"
    desc.gsub!("'",'&apos;')
    links.unshift "+++<a href='\##{includer_block_id}' class='up' title='#{desc}'>⏚</a>+++"
  end
end
if links.length > 0
  # protect against a nil title ---------v
  block.title = (block.litprog_raw_title || '') + ' [.litprog-nav]#' + (links * ' ') + '#'
end

The default style for the navigation links floats them to the end of the line (we fall back to right floating for older user agents), prints them in an upright font, and removes the text underline:

Styling…​
span.litprog-nav {
  float: right;
  float: inline-end;
  font-style: normal;
}
span.litprog-nav a {
  text-decoration: none;
}

3.6.1. Turning chunk references into in-doc references

The final part of the weaving process is to turn chunk references found inside chunks into hyperlinks to the corresponding chunk definition(s). Since the code snippets in the document are handled by the syntax highlighter, we need to hook into the syntax highlighting mechanism to be able to capture and manage the chunk references.

Currently we implement support only for the rouge syntax highlighter, which we extend with a custom derived class that overrides the lexer and formatter:

Override rouge highlighter
class LitProgRouge < (Asciidoctor::SyntaxHighlighter.for 'rouge')
  register_for 'rouge'

  def create_lexer node, source, lang, opts
    <<Custom lexer>>
  end

  def create_formatter node, source, lang, opts
    <<Custom formatter>>
  end
end

The new lexer overrides whatever lexer would normally be used by Asciidoctor, but extends the step method (used by RegExp lexers in rouge) to look for whole lines that match a chunk reference and yield a Comment::Special token instead of whatever tokens the original lexer would produce:

Custom lexer
lexer = super
class << lexer
  def step state, stream
    if state == get_state(:root) or stream.beginning_of_line?
      if stream.scan /((?:^|[\r\n]+)\s*)(<<.*>>)(\s*)$/
        yield_token Text::Whitespace, stream.captures[0]
        yield_token Comment::Special, stream.captures[1]
        yield_token Text::Whitespace, stream.captures[2]
        return true
      end
    end
    super
  end
end
lexer

The custom formatter looks for Comment::Special tokens and turns them into hyperlinks if the comment content matches a chunk reference.

To resolve the chunk references, the formatter needs to query the document catalog, which we make available by creating a new @litprog_catalog instance variable.

If multiple blocks contribute to a chunk, separate numbered links are created for each block past the first.

NOTE: This formatter only works as expected for HTML output.

IMPORTANT: We overload the span method rather than the safe_span method, to simplify title matching. Otherwise we would need to unescape the special characters <, >, &, and then re-escape them again when creating the links.
Custom formatter
formatter = super
# make the document catalog accessible to the formatter
formatter.instance_variable_set :@litprog_catalog, node.document.catalog[:lit_prog_chunks]

class << formatter
  include Asciidoctor::Logging
  <<Define function to link to a literate programming chunk>>
  def span tok, val
    special = tok.matches? ::Rouge::Token::Tokens::Comment::Special
    if special
      m = val.match /<<(.*)>>/
      if m
        title = m[1]
        <<Query the document catalog of literary programming chunks>>
        if hits.empty?
          logger.warn "Unresolved chunk reference #{title.inspect} found in special comment while formatting source"
        else
          first, *rest = *hits
          safe_val = "&lt;&lt;" + litprog_link(first, title)
          if rest.length > 0
            safe_val += "<sup> " + rest.each_with_index.map { |hit, index|
              litprog_link(hit, index+2)
            }.join(' ') + "</sup>"
          end
          safe_val += "&gt;&gt;"
          return safe_span tok, safe_val
        end
      end
    end
    super
  end
end
formatter

The function to generate the link is trivial: it simply returns an HTML a (anchor) element with a litprog-nav class.

Define function to link…​
def litprog_link id, text
  target = '#' + id
  "<a class='litprog-nav' href='#{target}'>#{text}</a>"
end

These are also styled without underline:

Styling…​
a.litprog-nav {
   text-decoration: none;
}

The map between title and link targets is retrieved from the document catalog, and we use an ad-hoc version of the full_title function, because we expect any duplicate or missing chunks to have been detected at previous stages. This section of the code also applies escape_special_html_chars to the title, which takes care of any <, > and & in the text, as the standard rouge HTML formatter would do.

Query the document catalog…​
pfx = title.chomp("...")
if pfx != title
  fulltitle, hits = @litprog_catalog.find { |k, v| k.start_with? pfx }
  fulltitle = fulltitle.gsub("'", '&apos;')
  title = "<abbr title='#{fulltitle}'>#{escape_special_html_chars title}</abbr>"
else
  hits = @litprog_catalog[title]
  title = escape_special_html_chars title
end

3.6.2. Producing the reference graph

If the litprog-dot-graph attribute is set, we produce in the output directory a DOT source, named after the document source, with a .litprog.dot extension. This DOT file describes the inclusion graph between the chunks, output with a left-to-right orientation (included chunks on the left and including chunks on the right).
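
Requesting the graph is simply a matter of setting the attribute in the document header:

:litprog-dot-graph: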

IMPORTANT: The mechanism is currently very barebone. Several possible improvements that are being considered are presented in the TODO list.
Output chunk reference graph
dotfile = doc.attr('docname') + '.litprog.dot'
dotdir = doc.attr('outdir', '.', 'docdir')
File.open(File.join(dotdir, dotfile), 'w') do |f|
  f.puts %(
digraph {
  rankdir=LR;
  nodesep="1";
  overlap=false;
)

  <<Output DOT connections>>
  <<Output DOT chunks>>

  f.puts '}'
end

The DOT file uses the same symbolic naming convention as the block IDs, but without the block count.

Support…​
def dot_chunk_id doc, chunk_name
  block_id = doc.catalog[:lit_prog_chunks][chunk_name].first
  return block_id.gsub(/_block_\d+$/,'')
end

We use record structures in DOT, to identify chunks composed of multiple blocks. For this, we need to frequently determine how many blocks a chunk is composed of.

Support…​
def count_chunk_blocks doc, chunk_name
  doc.catalog[:lit_prog_chunks][chunk_name].length
end

Chunk names used as labels in the DOT file have to be properly quoted, and we limit their length to prevent the nodes in the graph from getting too wide.

The wrap-around is implemented by adding newlines whenever adding a word to a non-empty line would exceed the line length. The “non-empty line” condition is added to allow words longer than the limit to be added.

Support…​
def limit_line_length text, maxlen
  words = text.split ' '
  ret = []
  line = ''
  words.each { |word|
    if line.length > 0 and line.length + word.length > maxlen
      ret.push line
      line = ''
    end
    line += ' ' if line.length > 0
    line += word
  }
  ret.push line
  ret.join("\\n")
end

Quoting actually does more than just quoting: it also adds the record structure for chunks composed of multiple blocks:

Support…​
def quote_for_dot doc, chunk_name
  nblocks = count_chunk_blocks doc, chunk_name
  # start by escaping the name proper
  base = limit_line_length(chunk_name, 33).gsub(/["<>|]/) { |c| "\\" + c }
  # add a <chunk> port to the base name
  base = "<chunk> #{base}"
  # add the other ports for multi-block chunks
  if nblocks > 1
    base += "| { " + 1.upto(nblocks).map { |i| "<block_#{i}> #{i}" }.join(' | ') + " }"
  end
  return '"' + base + '"'
end

The connections between the graph nodes are obtained by iterating over the chunk references, extracting the ID of both the referencing and the referenced chunk, and connecting the primary record of the referenced chunk to the appropriate block record of the referencing chunk.

Output DOT connections
@chunk_backrefs.each { |chunk, refs|
  this_id = dot_chunk_id doc, chunk
  refs.each { |ref, block_id|
    ref_id = dot_chunk_id doc, ref
    port = count_chunk_blocks(doc, ref) == 1 ? "chunk" : block_id.match(/block_\d+$/)[0]
    f.puts "#{this_id}:chunk:e -> #{ref_id}:#{port}:w"
  }
}

Chunk node definitions are output to the DOT file after all connections, with proper quoting, and forcing a monospace font for the root chunks. Since for code chunks we want to output the full (quoted, wrapped) chunk names, we iterate over the @chunk_names set.

Output DOT chunks
@chunk_names.each { |chunk|
  chunk_id = dot_chunk_id doc, chunk
  quoted_chunk = quote_for_dot doc, chunk
  fontspec = @roots.key?(chunk) ? ",fontname=\"Monospace\"" : ""
  f.puts "#{chunk_id} [shape=record,label=#{quoted_chunk}#{fontspec}]"
}

3.7. Document processing

The document as a whole is processed simply by processing all the listing blocks, Tangling the output files, and Weaving the documentation, after initializing the catalog of literate programming chunks, which maps chunk titles to block IDs.

Processing…​
def process doc
  doc.catalog[:lit_prog_chunks] = Hash.new { |h, k| h[k] = [] }
  doc.find_by context: :listing do |block|
    if block.style == 'source'
      process_source_block block
    else
      process_listing_block block
    end
  end
  tangle doc
  weave doc
  doc
end

3.8. The module

The complete module simply assembles what we’ve seen so far, and registers the extension with Asciidoctor:

The module structure
<<Licensing statement>>

<<Requires>>

<<Override...>>

<<Monkey patch...>>

<<Main class...>>

Asciidoctor::Extensions.register do
  <<Enable sourcemap...>>
  tree_processor LiterateProgrammingTreeProcessor
  docinfo_processor LiterateProgrammingDocinfoProcessor
end

The software is copyright © 2021–2024 by Giuseppe Bilotta, and is made available under the MIT license. See the LICENSE file for further details.

Licensing…​
# Copyright (C) 2021–2024 Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
# This software is licensed under the MIT license. See LICENSE for details

5. Missing features and known issues

5.1. Known issues

Known issues and limitations so far:

  • we need to monkey-patch the Block class to access raw (unconverted) titles

  • conversion of chunk references to hyperlinks during weaving is specific to the syntax highlighter; currently we only support the rouge syntax highlighter

  • other aspects of the weaving process introduce converter-specific output. These include:

    • uplinks to including chunks (due to a limitation in Asciidoctor’s link management that prevents adding a title attribute for tooltips)

    • links to the included chunks within source blocks

    • expansion of abbreviated chunk titles to the full form in source blocks

      For these features, the only supported converter is the HTML converter.

5.2. TODO list

improve chunk title parsing

the block title should only be used up to the first full stop or colon; the biggest problem in implementing this is arguably the ambiguity of the full stop vs ellipsis.

support for the eWEB and nowasp syntax

the nowasp/noweb syntax support in particular will require support for inline chunk reference expansion, escaping of inline << / >> pairs as well as start-of-line @ symbols (see the test/noweb-alike.adoc test file); this will probably require some flag to enable/disable (probably a document attribute :litprog-syntax: with possible values aweb and noweb).

lineno configuration
  • ✓ global setting implemented via litprog-line-template document attribute;

  • ✓ per-language overrides (possibly with good defaults);

  • ❏ per-file overrides; this should be doable adding other keys to the @line_directive_template hash.

auto-indent configuration

the preservation of leading whitespace during tangling should be optional (again, globally + per-file / per-language and possibly per-chunk overrides). We see the need for this in this very source file, in the graph output code that outputs headers with an indent.

selective writing

in particular, avoid overwriting the destination file if the content would be unchanged; this is important to support large-scale projects where we want to avoid recompiling unchanged modules.

support other kinds of formatters

presently, chunks are not hyperlinked when using syntax highlighters other than rouge.

allow swapping file names in litprog-file-map

as pointed out in the relevant note, this is currently not supported due to the way the check for uniqueness is done, but could be supported with a smarter check.

graph output tuning

currently the graph output is very barebone. Several possible improvements include:

  • ❏ customizable header

  • ❏ customizable node width (currently wrapping is hard-coded at 33)

  • ❏ take file mapping into consideration for the root chunks output

  • ✓ output chunks by their (HTML) ID and only quote/limit the label

  • ✓ output multi-block chunks differently from single-block ones (maybe as records?)

  • ❏ add links in the chunks to the source in the HTML documentation