boot.janet `loop` docstring to markdown #507

uvtc · 2020-11-25T14:04:22Z

Format `loop` docstring as markdown, fenced with double-backtick quotes.

uvtc · 2020-11-25T14:13:43Z

Is there any interest in formatting docstrings in markdown like that?

Benefits:

* sites like JanetDocs could possibly render them as lovely html
* easier for committers to edit
* clarifies some docstrings, differentiating keywords and args from the prose surrounding them
* possibly even easier and nicer to read in the terminal

Drawbacks:

* makes docstrings a little longer
* docstrings could then contain typos that would goof up rendering if converting to html

Concern:

* I don't know what janet does with indentation of docstrings. I'm hoping that it sees that everything is indented (in this example) 2 spaces, and so strips off the leading 2 spaces from each line before doing anything.
* I don't know if janet would reformat any docstrings, possibly messing up the markdown formatting.

Note, in the PR commit above, there are two types of lists in there:

* The first one is a Pandoc-markdown definition list (each definition list item is prefixed with `:`).
* The second one is a regular unordered list with hyphens in there to separate the item being discussed from the comments on it. This seems ok since each list item is pretty short.

uvtc · 2020-11-25T14:29:56Z

A couple of examples of other spots where markdown formatting in docstrings would improve readability:

Docs for `unless`:

Before: Shorthand for (when (not condition) ;body).

After: Shorthand for `(when (not condition) ;body)`.

Docs for `let`:

Before: Create a scope and bind values to symbols. Each pair in bindings is
assigned as if with def, and the body of the let form returns the last
value.

After: Create a scope and bind values to symbols. Each pair in `bindings` is
assigned as if with `def`, and the body of the `let` form returns the last
value.

For that last one, keywords like `if`, `for`, and `let`, mixed together with prose, can
be tricky to read without formatting separating out the keywords. Also, `bindings`
here is an arg, rather than prose, and so gets marked as code.

Looks better without each of those keywords in backticks.

uvtc · 2020-11-25T14:56:24Z

Here's that docstring converted to html by Pandoc with zero css styling applied, and also as a pdf.

bakpakin · 2020-11-25T15:08:50Z

There is some formatting to docstrings already, although it can be a bit hard to read just reading the code. Not really against improving this, but there needs to be some way of formatting this when using the doc macro to view documentation. The current doc macro is able to do line-wrapping for you when printing to the terminal, so the new docstring style should be able to do this too. (doc loop) with the old macro should actually look ok in a terminal. The new docstring looks a bit worse IMO - poor use of screen space and strange indentation.

The current docstring formatting rules are very simple - any number of spaces or a single newline is considered a word break so we can wrap on them, but tabs are preserved for indentation. Multiple newlines in a row are also preserved. This is how we can do formatting without any complicated syntax - this lets you do lists and such, albeit a bit cumbersomely. I think tweaking these rules to work with markdown might help. We could probably just do something like textwrap.dedent from python on the docstring, and then only drop spaces as long as they are not the first characters on a line (in which case they would be for indentation). Some care also needs to be taken for defn vs def, as defn prepends some stuff to the docstring that may mess with the dedent process.
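To make the rules described above concrete, here is a minimal Python sketch of them (Python only because the comment already points at `textwrap.dedent`). The function name `reflow` and its `width` parameter are made up for illustration; this is not the code in boot.janet, just one reading of "spaces and single newlines are word breaks, tabs and blank lines are preserved".

```python
import re


def reflow(doc: str, width: int = 80) -> str:
    """Re-wrap `doc` under the rules described above (illustrative only)."""
    # Split on runs of two or more newlines, keeping the separators so
    # that blank lines survive unchanged.
    parts = re.split(r"(\n{2,})", doc)
    out = []
    for part in parts:
        if part.startswith("\n\n"):
            out.append(part)  # multiple newlines in a row are preserved
            continue
        # Any number of spaces, or a single newline, is a word break.
        # Tabs are not, so they stay attached to the neighbouring word.
        words = [w for w in re.split(r"[ \n]+", part) if w]
        lines, line = [], ""
        for word in words:
            if line and len(line) + 1 + len(word) > width:
                lines.append(line)
                line = word
            else:
                line = f"{line} {word}" if line else word
        if line:
            lines.append(line)
        out.append("\n".join(lines))
    return "".join(out)
```

Note how leading spaces are swallowed as word breaks, which is why a space-indented markdown list would lose its indentation under these rules while a tab-indented one would not.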

So yeah, I'm fine with merging this after we can make the doc macro more intelligently handle markdown - which means a markdown parser (at least a subset of that) in boot.janet. Might just be fixing/adding support for lists - and maybe even optional support for VT100 ansi escape codes for nicer formatting. As for preference, I think the bulleted list is much nicer looking than a definition list.

uvtc · 2020-11-25T17:37:01Z

Ah, ok, I see. That's why the doc macro currently works when each line is indented by 2 spaces. It's condensing that leading whitespace into a single space, then wrapping.

One thing I like about the current docstrings is that you don't have to make them flush left (at column zero). They're indented (usually by 2 spaces), but that doesn't show up in the terminal window doc output. This makes the code look nice (docstrings are indented the same amount as the code starts below them).

Yes, I think it would be an excellent idea to dedent docstrings --- find out whatever leading space all non-blank lines have in common, and lstrip that (and only that) away. That way, the author can be confident that, say, leading space before a list marker won't be corrupted.
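For reference, this is what that dedent behaviour looks like with Python's `textwrap.dedent`, which was mentioned earlier in the thread. The docstring text here is a made-up example; the point is only that the common 2-space margin is removed while the extra indentation before the list markers is kept.

```python
import textwrap

# A docstring as it might appear in source, indented 2 spaces to match the code.
docstring = (
    "  Verbs:\n"
    "\n"
    "    * :keys -- iterate over keys.\n"
    "    * :in -- iterate over values.\n"
)

# Only the whitespace common to all non-blank lines is stripped.
dedented = textwrap.dedent(docstring)
print(dedented)
```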

It sounds like it would also be good if the doc macro were changed to not condense down multiple spaces into one space, since indentation with spaces is part of what makes markdown (and any preformatted text in general) easy to read.

So, if you have `doc`:

* de-dent,
* preserve whitespace, and also
* continue to rewrap long lines to fit the terminal window,

then that would leave any docstring markdown formatting alone and the result should look good in the terminal (as well as having the docstring be suitable for conversion to html via something like Pandoc).
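The three steps above can be sketched in Python roughly as follows. The name `show_doc` and its `width` argument are hypothetical, and this wraps each line on its own rather than reflowing paragraphs, which is an assumption about what "rewrap long lines" would mean here.

```python
import textwrap


def show_doc(docstring: str, width: int = 80) -> str:
    """Dedent, keep leading whitespace, and wrap only lines that are too long."""
    out = []
    for line in textwrap.dedent(docstring).splitlines():
        if not line.strip():
            out.append("")  # keep blank lines as paragraph separators
            continue
        # Reuse the line's own leading spaces for any continuation lines,
        # so wrapped text stays under its original indentation.
        indent = line[: len(line) - len(line.lstrip(" "))]
        out.extend(textwrap.wrap(line, width=width, subsequent_indent=indent))
    return "\n".join(out)
```

A line that already fits within `width` passes through with its spacing intact, so short list items and code-like lines are left alone.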

Note, I don't mean to suggest that the doc macro should do more work, nor that it should even touch or know about markdown. Other tools are already really excellent at that. Just have doc de-dent and re-wrap for terminal window width.

Regarding the Pandoc definition list format, while maybe not ideal, IIRC they went through a lot of discussions on how to choose something that:

* looks ok (doesn't look too much like markup), and is readable
* doesn't interfere with existing markdown syntax
* most markdown implementers could agree on

I think it looks a little suboptimal here because:

* the list items are colon-prefixed, so there's a lot of colons, and
* the syntax requires a blank line between each definition list item, so it may look kinda spaced-out when you have short definitions.

But for list items where the definition takes up multiple lines, it works out pretty well.

uvtc · 2020-11-25T17:58:34Z

More regarding the definition list syntax, using unordered lists instead for this will make it look too dense. Here's the definition list:

and here it is rendered as a bullet-list:

uvtc · 2020-11-25T18:06:56Z

As plain (markdown) text, to make it less dense, you could go with a bullet list separated by blank lines:


  * :iterate -- repeatedly evaluate and bind to the expression while it is truthy.

  * :range -- loop over a range. The object should be a two-element tuple with a start
    and end value, and an optional positive step. The range is half open, [start, end).

  * :range-to -- same as :range, but the range is inclusive [start, end].

  * :down -- loop over a range, stepping downwards. The object should be a two-element tuple
    with a start and (exclusive) end value, and an optional (positive!) step size.

  * :down-to -- same as :down, but the range is inclusive [start, end].

  * :keys -- iterate over the keys in a data structure.

  * :pairs -- iterate over the key-value pairs as tuples in a data structure.

  * :in -- iterate over the values in a data structure.

  * :generate -- iterate over values yielded from a fiber. Can be paired with the generator
    function for the producer/consumer pattern.

but I think the definition list syntax, even though it contains that extra colon, still looks better:

:iterate
  : repeatedly evaluate and bind to the expression while it is truthy.

:range
  : loop over a range. The object should be a two-element tuple with a start
    and end value, and an optional positive step. The range is half open, [start, end).

:range-to
  : same as :range, but the range is inclusive [start, end].

:down
  : loop over a range, stepping downwards. The object should be a two-element tuple
    with a start and (exclusive) end value, and an optional (positive!) step size.

:down-to
  : same as :down, but the range is inclusive [start, end].

:keys
  : iterate over the keys in a data structure.

:pairs
  : iterate over the key-value pairs as tuples in a data structure.

:in
  : iterate over the values in a data structure.

:generate
  : iterate over values yielded from a fiber. Can be paired with the generator
    function for the producer/consumer pattern.

Having those definitions all indented the same amount makes it look more consistent and easier to scan, IMO.

And Pandoc (or other markdowns that support this syntax) can render it nicely as html.

uvtc · 2020-11-25T19:10:34Z

Another alternative (the best-looking one, IMO) to the definition list, is a table:

Where `binding` is a binding as passed to def, `:verb` is one of a set of keywords,
and `object` is any expression. The available verbs are:

----------  --------------------------------------------------------------------
:iterate    repeatedly evaluate and bind to the expression while it is truthy.

:range      loop over a range. The object should be a two-element tuple with a
            start and end value, and an optional positive step. The range is
            half open, [start, end).

:range-to   same as :range, but the range is inclusive [start, end].

:down       loop over a range, stepping downwards. The object should be a
            two-element tuple with a start and (exclusive) end value, and an
            optional (positive!) step size.

:down-to    same as :down, but the range is inclusive [start, end].

:keys       iterate over the keys in a data structure.

:pairs      iterate over the key-value pairs as tuples in a data structure.

:in         iterate over the values in a data structure.

:generate   iterate over values yielded from a fiber. Can be paired with the
            generator function for the producer/consumer pattern.
----------  --------------------------------------------------------------------

`loop` also accepts conditionals to refine the looping further. Conditionals are
of the form:

From which Pandoc yields what you'd expect (this is styled a bit on my system):

Though, any line-wrapping on that for a narrow terminal window would mess it up.

What do you think about setting a hard minimum where no wrapping would happen for lines, say, 80 chars or shorter? That way, if an author wanted to make sure a table (or any docstring line) would remain untouched by line wrap, they could just keep lines to 80 columns or less.
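As a rough illustration of that hard-minimum idea, in Python: `MIN_WRAP_WIDTH` and `wrap_line` are hypothetical names, and 80 is just the figure floated above. Lines at or under the minimum are returned untouched even when the requested width is narrower.

```python
import textwrap

MIN_WRAP_WIDTH = 80  # hypothetical constant; 80 is the figure suggested above


def wrap_line(line: str, requested_width: int) -> list[str]:
    """Wrap `line`, but never touch lines that fit within the hard minimum."""
    if len(line) <= MIN_WRAP_WIDTH:
        return [line]  # table rows and other short lines pass through as-is
    # Longer lines are wrapped, but never narrower than the minimum.
    return textwrap.wrap(line, width=max(requested_width, MIN_WRAP_WIDTH))
```

With this, a table row such as the `:keys` line above would survive a 40-column request unchanged, at the cost of overflowing a terminal that really is narrower than 80 columns.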

As for comparison, the Python docs (in the terminal) are pretty difficult to read below 80 col width, and so I keep it much wider than that if I plan on reading any of its docs in the terminal.

uvtc · 2020-11-25T21:25:20Z

What do you think about not doing any wrapping at all in the terminal?

If I have a really wide terminal window, and I open a man page, I don't really appreciate it line-wrapping to that huge width.

I also just noticed that Python doesn't line-wrap its docs in the terminal. I opened a wide window, ran python3, help(str), and the output just stays regular reading width. It only starts to wrap if I resize/narrow the window enough that there's no more room (but I assume that's just the terminal doing its job). Works fine.

Could Janet simply not do any line-wrapping, and just spit out the docstring as-is? That way, if it looks good in the code, it will look good in the terminal, as long as the terminal is at least 80 chars wide.

uvtc · 2020-11-25T21:26:36Z

^^ That is, spit out the docstring after dedenting it.

uvtc · 2020-11-25T22:02:18Z

I'm sorry for the high volume of comments here. Hope they're useful.

I just experimented some more with reading Janet docs in the terminal. And regardless of whether I make the terminal window narrow, or quite wide, the docs display at the same reading width. Whatever line-wrapping doc is doing, it seems to be doing it simply to neaten up the docstring. I'd just assumed it was trying to match the width of my terminal window, but it is not at all doing that (unless maybe my terminal window is misconfigured?).

So, it seems even more reasonable now to suggest that the Janet doc system be simplified and just de-dent and spit out docstrings as-is. If a docstring needs formatting, it's easy enough to M-q or reflow text or whatever on the docstring next time you're editing that file.

The only drawback to this I see is if some third-party Janet lib has docs that need formatting help, then they will display poorly in the terminal until tidied up --- which may actually work out as a net positive (more attention to docs).

sogaiu · 2020-11-26T04:19:45Z

I don't know how to thread comments so it's a bit difficult to reply, but here's a bit.

For: #507 (comment) I agree the final result looks a bit better with definition lists, but not enough to be worth using them over ordinary lists. So overall I prefer the first thing you showed in your comment (ordinary list with space between each list item) over the definition list.

sogaiu · 2020-11-26T04:34:12Z

I looked through the PR and the specific "Markdown" constructs I found included:

* code blocks (the backtick stuff)
* paragraphs
* ordinary lists
* definition lists

Did I miss any?

(I put "Markdown" in quotes because "definition lists" are not common to all flavors, while the other three were present in the original implementation and possibly exist in most/all flavors.)

bakpakin · 2020-11-26T15:51:46Z

@uvtc there is a dynamic binding `:doc-width` to set the amount of wrapping. `(setdyn :doc-width 120)` will make documentation wrap at 120 characters - no fancy terminal auto detection here.

As for making rendered output look nice with Pandoc, that should not really be a concern - the docstring format needs to be as universally consumable as possible (hence why we are using dumb strings to start with). Markdown is good because it looks nice even if you just treat it as dumb text.

bakpakin · 2020-11-26T16:01:43Z

As for not doing line wrapping, the problem is that most docstrings will contain lots of leading white space.

For example:

(defn my-fn
  ``
  Some documentation here.
  ``
  [x]
  (+ x x))

Will end up as "  Some documentation here.\n  " (2 leading and trailing spaces). I guess just printing that out would look ok, but would make markdown parsing a bit strange, hence the desire for dedent. Semantically, it also irks me a bit because that author clearly doesn't actually want that extra indentation in the docstring.

EDIT:
This was the original reasoning for a lot of this functionality in older versions, but things may have changed enough where we can deal with this in a better way now. It might even be worth changing how backtick strings interpret leading whitespace in the parser to avoid this problem in the first place.

pepe · 2020-11-26T19:13:05Z

The backtick string dedent sounds very reasonable to me! I can remember dedent in the core and its purge later. Now I think I can understand it much better.

bakpakin · 2020-11-27T01:13:06Z

The longstring-autoindent branch was created to try and deal with some of these underlying issues - it contains a change in the parser that gives the author-desired indentation for long strings (but probably breaks spork/fmt), as well as changes to the (doc-format str) function that try to preserve formatting a bit better - leading indents are left as is and are not unindented.

I think that has all of the changes that would be needed to get some of this markdown formatting into boot.janet.

uvtc · 2020-11-27T01:19:59Z

@bakpakin wrote:

> As for making rendered output look nice with Pandoc, that should not really be a concern - the docstring format needs to be as universally consumable as possible (hence why we are using dumb strings to start with). Markdown is good because it looks nice even if you just treat it as dumb text.

Right. A core value of markdown is that it's a little more work to write, but it looks good and natural in plain text. It looks like what it means, and hardly looks like markup. So, that makes it excellent for docs viewable in the terminal.

That said, it looks even better when viewed rendered as html. So if you're reading docs outside of the terminal, it's a nice bonus to have them in html (and easy if the docstring is already in markdown).

Another thing is, if you want to include mathematics in your markdown docstring, LaTeX isn't really readable in plain text, but as html Pandoc will use MathJax to get beautiful real math output.

uvtc · 2020-11-27T01:32:51Z

@bakpakin wrote:

> As for not doing line wrapping, the problem is that most docstrings will contain lots of leading white space.

Right. I expect a docstring will be written indented to match the code around it, so I'm expecting that the doc tools will do that smart "textwrap.dedent" behaviour automatically when displaying the docstring.

Another option might be to have a special string prefix literal that indicates the string should be automatically dedented when read in... ex., something like:

(defn ...
  dd``this string
  will automatically
  be dedented.``
  ...)
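As a point of comparison (not a proposal for Janet's syntax), Python handles this same shape of string, where the first line starts right after the opening delimiter, with `inspect.cleandoc` rather than `textwrap.dedent`. The sketch below uses the text from the example above; it also hints at why anything prepended to a docstring's first line can upset a plain dedent.

```python
import inspect
import textwrap

# First line is flush against the delimiter; later lines carry the code's indent.
raw = "this string\n  will automatically\n  be dedented."

# textwrap.dedent finds no common margin, because the first line has none.
plain = textwrap.dedent(raw)

# inspect.cleandoc strips the first line separately, then removes the
# margin shared by the remaining lines.
cleaned = inspect.cleandoc(raw)
print(cleaned)
```

Here `plain` comes back unchanged, while `cleaned` has the two continuation lines moved to column zero.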

uvtc · 2020-11-27T02:15:57Z

@sogaiu wrote:

> I looked through the PR and the specific "Markdown" constructs I found included: {snip} Did I miss any?

Not sure how to answer this in just a few words... Markdown of course has two types of markup: inline (span) and block (div).

Inline: *italics* **bold** `code` and some variations on those (like _italics_)

And there's two types of block syntax: side-marked and delimited (or "fenced").

~~~
fenced
code block
~~~

    indented code block
    where the side-mark is 4 spaces

> blockquote is
> side-marked with ">".

And, IMO, another rule of well-written markdown is always indent things a multiple of 4 spaces --- or, more specifically, 4 places (column nums). That is:

Lists:

  * Text of list item starts at 4 places in (list marker counts as a place)

    Second paragraph of first list item. Note the 4 space indent,
    and how everything is lining up vertically in a most delightful
    way.

  * Second list item.

        code block within
        second list item

    Second paragraph of 2nd list item.

      * 8 places in --- this is the start of a new list within the 2nd list item
      * and so on.

  * Third item.

Numbered list (compact):

 1. Again, starting at 4 places in.
     a. foo (indented list, so content starts at 8 places in)
     b. bar
     c. baz
 2. This is two!
 3. Tree.

Definition list follows.

foo
  : Again, content of list item starts at 4 places in.

    Second paragraph
    for definition of foo.

bar
  : Lorem ipsum...

Done.

(Edit: removed mistaken note about blockquotes and 4-space rule.)

Sticking to that 4-space rule is consistent and lets you create lists with multiple paragraphs and nested sublists and not get confused.

So, to answer your question, this particular PR contains some inline code, and also short (single line only) code blocks (indented, not fenced). It also contains some lists (unordered and definition).

sogaiu · 2020-11-27T02:28:25Z

Thanks for spelling things out :)

The background for my comment for looking at the specifics of the PR is that "Markdown" is not a specific enough target -- there is no spec, just an implementation. There are many flavors. "CommonMark" is a specific target.

I would guess that if one can choose a good subset of features that works across many flavors one may get the benefit of more tooling working well.

Another aspect is that the fewer features are chosen, the less work there is for creation, maintenance, and testing.

Of course one needs a sufficient set of features to accomplish enough :)

Does that make sense?

uvtc · 2020-11-27T02:51:55Z

Hi @sogaiu . You're welcome!

I'm not sure I understand. It sounds like you may be implying that Janet should/would implement some amount of Markdown. But why? Why not instead just dedent and print out docstrings to the terminal as-is? They already look good --- and would just require some manual minor tweaking (ex, to replace backslash-escaped tabs and newlines with literal ones).

My 2 cents, I expect CommonMark to become the de-facto standard markdown if it isn't already. It comes standard with C and JS reference implementations, with many more available. Note further that one of the principals behind CommonMark is also the author of Pandoc. And, aside, the extensions that Pandoc makes to markdown are generally conservative and very carefully thought out. The thread to add div syntax went on for 6 years! :)

sogaiu · 2020-11-27T04:14:48Z

If it's not necessary for Janet to have any Markdown-ish thing in it, I'm not bothered by that :)

Just reading the discussion, it seemed possible that support for some Markdown(-ish?) constructs in Janet was on the table. Maybe I misunderstood.

bakpakin · 2020-11-27T18:28:18Z

With the merging of #511, I think this is ready to go. That patch contains the changes here so I think we can close this out.

uvtc · 2020-11-27T19:47:04Z

Thanks. Great to see this! Now I get it: you wanted Janet to know just enough markdown so it can linewrap docstrings that contain some markdown formatting.

boot.janet loop docstring to markdown

201527d

Format `loop` docstring as markdown, fenced with double-backtick quotes.

Update boot.janet

0e409cc

Looks better without each of those keywords in backticks.

pyrmont mentioned this pull request Nov 27, 2020

Add formatter for Markdown-formatted docstrings #511

Merged

bakpakin closed this Nov 28, 2020

uvtc deleted the patch-1 branch November 28, 2020 19:03
