boot.janet loop docstring to markdown #507

Closed

uvtc
Contributor

@uvtc uvtc commented Nov 25, 2020

Format `loop` docstring as markdown, fenced with double-backtick quotes.
@uvtc
Contributor Author

uvtc commented Nov 25, 2020

Is there any interest in formatting docstrings in markdown like that?

Benefits:

  • sites like JanetDocs could possibly render them as lovely html
  • easier for committers to edit
  • clarifies some docstrings, differentiating between keywords and args from the prose surrounding them.
  • possibly even easier and nicer to read in the terminal

Drawbacks:

  • makes docstrings a little longer
  • docstrings could then contain typos that would goof up rendering if converting to html

Concern:

  • I don't know what janet does with indentation of docstrings. I'm hoping that it sees that everything is indented (in this example) 2 spaces, and so strips off the leading 2 spaces from each line before doing anything.
  • I don't know if janet would reformat any docstrings, possibly messing up the markdown formatting.

Note, in the PR commit above, there's two types of lists in there:

  • The first one is a Pandoc-markdown definition list (each definition list item is prefixed with :).
  • The second one is a regular unordered list with hyphens in there to separate the item being discussed with comments on it. This seems ok since each list item is pretty short.

@uvtc
Contributor Author

uvtc commented Nov 25, 2020

A couple of examples of other spots where markdown formatting in docstrings would improve readability:

  • Docs for unless:

    Before: Shorthand for (when (not condition) ;body).

    After: Shorthand for `(when (not condition) ;body)`.

  • Docs for let:

    Before: Create a scope and bind values to symbols. Each pair in bindings is
    assigned as if with def, and the body of the let form returns the last
    value.

    After: Create a scope and bind values to symbols. Each pair in `bindings` is
    assigned as if with `def`, and the body of the `let` form returns the last
    value.

For that last one, keywords like if, for, and let, mixed together with prose can
be tricky to read without formatting separating out the keywords. Also, bindings
here is an arg, rather than prose, and so gets marked as code.

Looks better without each of those keywords in backticks.
@uvtc
Contributor Author

uvtc commented Nov 25, 2020

Here's that docstring converted to html by Pandoc with zero css styling applied, and also as a pdf.

@bakpakin
Member

bakpakin commented Nov 25, 2020

There is some formatting to docstrings already, although it can be a bit hard to read when just reading the code. Not really against improving this, but there needs to be some way of formatting this when using the doc macro to view documentation. The current doc macro is able to do line-wrapping for you when printing to the terminal, so the new docstring style should be able to do this too. (doc loop) with the old macro should actually look ok in a terminal. The new docstring looks a bit worse IMO - poor use of screen space and strange indentation.

The current docstring formatting rules are very simple - any number of spaces or a single newline is considered a word break so we can wrap on them, but tabs are preserved for indentation. Multiple newlines in a row are also preserved. This is how we can do formatting without any complicated syntax - this lets you do lists and such, albeit a bit cumbersomely. I think tweaking these rules to work with markdown might help. We could probably just do something like textwrap.dedent from python on the docstring, and then only drop spaces as long as they are not the first characters on a line (in which case they would be for indentation). Some care also needs to be taken for defn vs def, as defn prepends some stuff to the docstring that may mess with the dedent process.
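
The wrapping rules described above can be sketched in Python. This is only an illustration of the stated rules, not Janet's actual `doc` code; the function name `reflow` and the default width of 80 are assumptions made for the example.

```python
import re


def reflow(docstring, width=80):
    """Re-wrap text: runs of spaces and single newlines are word breaks,
    tabs stay attached to their word, and two or more newlines are kept."""
    out = []
    # Split on runs of 2+ newlines, keeping the separators in the result.
    for chunk in re.split(r"(\n{2,})", docstring):
        if re.fullmatch(r"\n{2,}", chunk):
            out.append(chunk)  # paragraph break, preserved as written
            continue
        # Only spaces and newlines separate words, so tabs survive.
        words = [w for w in re.split(r"[ \n]+", chunk) if w]
        lines, current = [], ""
        for word in words:
            if current and len(current) + 1 + len(word) > width:
                lines.append(current)
                current = word
            else:
                current = f"{current} {word}" if current else word
        if current:
            lines.append(current)
        out.append("\n".join(lines))
    return "".join(out)


print(reflow("  Loop over\n  a range.\n\n  Second   paragraph."))
```

Under these rules the two-space source indentation disappears, but so does the indentation before a markdown list marker, and consecutive list items separated by a single newline are joined into one line.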

So yeah, I'm fine with merging this after we can make the doc macro more intelligently handle markdown - which means a markdown parser (at least a subset of that) in boot.janet. Might just be fixing/adding support for lists - and maybe even optional support for VT100 ansi escape codes for nicer formatting. As for preference, I think the bulleted list is much nicer looking than a definition list.

@uvtc
Contributor Author

uvtc commented Nov 25, 2020

Ah, ok, I see. That's why the doc macro currently works when each line is indented by 2 spaces. It's condensing that leading whitespace into a single space, then wrapping.

One thing I like about the current docstrings is that you don't have to make them flush left (at column zero). They're indented (usually by 2 spaces), but that doesn't show up in the terminal window doc output. This makes the code look nice (docstrings are indented the same amount as the code starts below them).

Yes, I think it would be an excellent idea to dedent docstrings --- find out whatever leading space all non-blank lines have in common, and lstrip that (and only that) away. That way, the author can be confident that, say, leading space before a list marker won't be corrupted.

It sounds like it would also be good if the doc macro were changed to not condense down multiple spaces into one space, since indentation with spaces is part of what makes markdown (and any preformatted text in general) easy to read.

So, if you have doc:

  • de-dent,
  • preserve whitespace, and also
  • continue to rewrap long lines to fit the terminal window,

then that would leave any docstring markdown formatting alone and the result should look good in the terminal (as well as having the docstring be suitable for conversion to html via something like Pandoc).

Note, I don't mean to suggest that the doc macro should do more work, nor that it should even touch or know about markdown. Other tools are already really excellent at that. Just have doc de-dent and re-wrap for terminal window width.
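
A rough Python sketch of the three steps proposed above (de-dent, preserve whitespace, re-wrap only lines that are too long). The name `format_doc` and the width default are hypothetical, and Python's `textwrap` stands in for whatever Janet would use.

```python
import textwrap


def format_doc(docstring, width=80):
    # Step 1: strip only the indentation shared by all non-blank lines.
    text = textwrap.dedent(docstring).strip("\n")
    out = []
    for line in text.split("\n"):
        # Step 2: lines that already fit are left exactly as written.
        if len(line) <= width:
            out.append(line)
            continue
        # Step 3: wrap an overlong line, keeping its own indentation
        # on every continuation line.
        body = line.lstrip(" ")
        indent = " " * (len(line) - len(body))
        out.extend(
            textwrap.wrap(
                body,
                width=width,
                initial_indent=indent,
                subsequent_indent=indent,
            )
        )
    return "\n".join(out)


doc = "\n  Verbs:\n\n    * :keys -- iterate over the keys in a data structure.\n  "
print(format_doc(doc, width=30))
```

Because nothing here interprets the markdown, list markers and other formatting pass through untouched; only the common leading indent and overlong lines are affected.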


Regarding the Pandoc definition list format, while maybe not ideal, IIRC they went through a lot of discussions on how to choose something that:

  • looks ok (doesn't look too much like markup), and is readable
  • doesn't interfere with existing markdown syntax
  • most markdown implementers could agree on

I think it looks a little suboptimal here because:

  1. the list items are colon-prefixed, so there's a lot of colons, and
  2. the syntax requires a blank line between each definition list item, so it may look kinda spaced-out when you have short definitions.

But for list items where the definition takes up multiple lines, it works out pretty well.

@uvtc
Contributor Author

uvtc commented Nov 25, 2020

More regarding the definition list syntax, using unordered lists instead for this will make it look too dense. Here's the definition list:

[image: rendered definition list]


and here it is rendered as a bullet-list:

[image: rendered bullet list]

@uvtc
Contributor Author

uvtc commented Nov 25, 2020

As plain (markdown) text, to make it less dense, you could go with a bullet list separated by blank lines:


  * :iterate -- repeatedly evaluate and bind to the expression while it is truthy.

  * :range -- loop over a range. The object should be a two-element tuple with a start
    and end value, and an optional positive step. The range is half open, [start, end).

  * :range-to -- same as :range, but the range is inclusive [start, end].

  * :down -- loop over a range, stepping downwards. The object should be a two-element tuple
    with a start and (exclusive) end value, and an optional (positive!) step size.

  * :down-to -- same as :down, but the range is inclusive [start, end].

  * :keys -- iterate over the keys in a data structure.

  * :pairs -- iterate over the key-value pairs as tuples in a data structure.

  * :in -- iterate over the values in a data structure.

  * :generate -- iterate over values yielded from a fiber. Can be paired with the generator
    function for the producer/consumer pattern.

but I think the definition list syntax, even though it contains that extra colon, still looks better:

:iterate
  : repeatedly evaluate and bind to the expression while it is truthy.

:range
  : loop over a range. The object should be a two-element tuple with a start
    and end value, and an optional positive step. The range is half open, [start, end).

:range-to
  : same as :range, but the range is inclusive [start, end].

:down
  : loop over a range, stepping downwards. The object should be a two-element tuple
    with a start and (exclusive) end value, and an optional (positive!) step size.

:down-to
  : same as :down, but the range is inclusive [start, end].

:keys
  : iterate over the keys in a data structure.

:pairs
  : iterate over the key-value pairs as tuples in a data structure.

:in
  : iterate over the values in a data structure.

:generate
  : iterate over values yielded from a fiber. Can be paired with the generator
    function for the producer/consumer pattern.

Having those definitions all indented the same amount makes it look more consistent and easier to scan, IMO.

And Pandoc (or other markdowns that support this syntax) can render it nicely as html.

@uvtc
Contributor Author

uvtc commented Nov 25, 2020

Another alternative (the best-looking one, IMO) to the definition list, is a table:

Where `binding` is a binding as passed to def, `:verb` is one of a set of keywords,
and `object` is any expression. The available verbs are:

----------  --------------------------------------------------------------------
:iterate    repeatedly evaluate and bind to the expression while it is truthy.

:range      loop over a range. The object should be a two-element tuple with a
            start and end value, and an optional positive step. The range is
            half open, [start, end).

:range-to   same as :range, but the range is inclusive [start, end].

:down       loop over a range, stepping downwards. The object should be a
            two-element tuple with a start and (exclusive) end value, and an
            optional (positive!) step size.

:down-to    same as :down, but the range is inclusive [start, end].

:keys       iterate over the keys in a data structure.

:pairs      iterate over the key-value pairs as tuples in a data structure.

:in         iterate over the values in a data structure.

:generate   iterate over values yielded from a fiber. Can be paired with the
            generator function for the producer/consumer pattern.
----------  --------------------------------------------------------------------

`loop` also accepts conditionals to refine the looping further. Conditionals are
of the form:

From which Pandoc yields what you'd expect (this is styled a bit on my system):

[image: table as rendered by Pandoc]


Though, any line-wrapping on that for a narrow terminal window would mess it up.

What do you think about setting a hard minimum where no wrapping would happen for lines, say, 80 chars or shorter? That way, if an author wanted to make sure a table (or any docstring line) would remain untouched by line wrap, they could just keep lines to 80 columns or less.
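
One way to picture that hard minimum, as a Python sketch. The constant `MIN_WRAP_WIDTH` and the helper `wrap_line` are made up for illustration and are not part of Janet.

```python
import textwrap

MIN_WRAP_WIDTH = 80  # hypothetical floor: lines this short are never wrapped


def wrap_line(line, terminal_width):
    # Never wrap narrower than the floor, even in a narrow terminal.
    width = max(terminal_width, MIN_WRAP_WIDTH)
    if len(line) <= width:
        return [line]  # left untouched, so table rows stay aligned
    return textwrap.wrap(line, width=width)


table_row = ":iterate    repeatedly evaluate and bind to the expression while it is truthy."
print(wrap_line(table_row, terminal_width=40))
```

With this floor, an author who keeps every table row at 80 columns or less gets the row back unchanged regardless of the terminal width.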

As for comparison, the Python docs (in the terminal) are pretty difficult to read below 80 col width, and so I keep it much wider than that if I plan on reading any of its docs in the terminal.

@uvtc
Contributor Author

uvtc commented Nov 25, 2020

What do you think about not doing any wrapping at all in the terminal?

If I have a really wide terminal window, and I open a man page, I don't really appreciate it line-wrapping to that huge width.

I also just noticed that Python doesn't line-wrap its docs in the terminal. I opened a wide window, ran python3, help(str), and the output just stays regular reading width. It only starts to wrap if I resize/narrow the window enough that there's no more room (but I assume that's just the terminal doing its job). Works fine.

Could Janet simply not do any line-wrapping, and just spit out the docstring as-is? That way, if it looks good in the code, it will look good in the terminal, as long as the terminal is at least 80 chars wide.

@uvtc
Contributor Author

uvtc commented Nov 25, 2020

^^ That is, spit out the docstring after dedenting it.

@uvtc
Contributor Author

uvtc commented Nov 25, 2020

I'm sorry for the high volume of comments here. Hope they're useful.

I just experimented some more with reading Janet docs in the terminal. And regardless of whether I make the terminal window narrow, or quite wide, the docs display at the same reading width. Whatever line-wrapping doc is doing, it seems to be doing it simply to neaten up the docstring. I'd just assumed it was trying to match the width of my terminal window, but it is not at all doing that (unless maybe my terminal window is misconfigured?).

So, it seems even more reasonable now to suggest that the Janet doc system be simplified and just de-dent and spit out docstrings as-is. If a docstring needs formatting, it's easy enough to M-q or reflow text or whatever on the docstring next time you're editing that file.

The only drawback to this I see is if some third-party Janet lib has docs that need formatting help, then they will display poorly in the terminal until tidied up --- which may actually work out as a net positive (more attention to docs).

@sogaiu
Contributor

sogaiu commented Nov 26, 2020

I don't know how to thread comments so it's a bit difficult to reply, but here's a bit.

For: #507 (comment) I agree the final result looks a bit better with definition lists, but not enough to be worth using them over ordinary lists. So overall I prefer the first thing you showed in your comment (ordinary list with space between each list item) over the definition list.

@sogaiu
Contributor

sogaiu commented Nov 26, 2020

I looked through the PR and the specific "Markdown" constructs I found included:

  • codeblocks (the backtick stuff)
  • paragraphs
  • ordinary lists
  • definition lists

Did I miss any?

(I put "Markdown" in quotes because "definition lists" are not common to all flavors, while the other three were present in the original implementation and possibly exist in most/all flavors.)

@bakpakin
Member

@uvtc there is a dynamic binding :doc-width to set the amount of wrapping. (setdyn :doc-width 120) will make documentation wrap at 120 characters - no fancy terminal auto detection here.

As for making rendered output look nice with Pandoc, that should not really be a concern - the docstring format needs to be as universally consumable as possible (hence why we are using dumb strings to start with). Markdown is good because it looks nice even if you just treat it as dumb text.

@bakpakin
Member

bakpakin commented Nov 26, 2020

As for not doing line wrapping, the problem is that most docstrings will contain lots of leading white space.

For example:

(defn my-fn
  ``
  Some documentation here.
  ``
  [x]
  (+ x x))

Will end up as "  Some documentation here.\n  " (2 leading and trailing spaces). I guess just printing that out would look ok, but would make markdown parsing a bit strange, hence the desire for dedent. Semantically, it also irks me a bit because that author clearly doesn't actually want that extra indentation in the docstring.
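
For comparison, this is what Python's `textwrap.dedent` (the function mentioned earlier in the thread) does with a string of that shape. The string contents are assumed from the example above; this shows Python's behavior only, not what Janet's parser or `doc` macro does.

```python
import textwrap

# Assumed docstring contents: two spaces before the text and two spaces
# on the final line before the closing delimiter.
raw = "  Some documentation here.\n  "

dedented = textwrap.dedent(raw)
print(repr(dedented))  # 'Some documentation here.\n'
```

The shared two-space margin is removed, and the whitespace-only last line is reduced to an empty line.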

EDIT:
This was the original reasoning for a lot of this functionality in older versions, but things may have changed enough where we can deal with this in a better way now. It might even be worth changing how backtick strings interpret leading whitespace in the parser to avoid this problem in the first place.

@pepe
Member

pepe commented Nov 26, 2020

The backtick string dedent sounds very reasonable to me! I can remember dedent in the core and its purge later. Now I think I can understand it much better.

@bakpakin
Member

The longstring-autoindent branch was created to try and deal with some of these underlying issues. It contains a parser change that gives long strings the author-desired indentation (but probably breaks spork/fmt), as well as changes to the (doc-format str) function that try to preserve formatting a bit better - leading indents are left as is and are not unindented.

I think that has all of the changes that would be needed to get some of this markdown formatting into boot.janet.

@uvtc
Contributor Author

uvtc commented Nov 27, 2020

@bakpakin wrote:

As for making rendered output look nice with Pandoc, that should not really be a concern - the docstring format needs to be as universally consumable as possible (hence why we are using dumb strings to start with). Markdown is good because it looks nice even if you just treat it as dumb text.

Right. A core value of markdown is that it's a little more work to write, but it looks good and natural in plain text. It looks like what it means, and hardly looks like markup. So, that makes it excellent for docs viewable in the terminal.

That said, it looks even better when viewed rendered as html. So if you're reading docs outside of the terminal, it's a nice bonus to have them in html (and easy if the docstring is already in markdown).

Another thing is, if you want to include mathematics in your markdown docstring, LaTeX isn't really readable in plain text, but as html Pandoc will use MathJax to get beautiful real math output.

@uvtc
Contributor Author

uvtc commented Nov 27, 2020

@bakpakin wrote:

As for not doing line wrapping, the problem is that most docstrings will contain lots of leading white space.

Right. I expect a docstring will be written indented to match the code around it, so I'm expecting that the doc tools will do that smart "textwrap.dedent" behaviour automatically when displaying the docstring.

Another option might be to have a special string prefix literal that indicates the string should be automatically dedented when read in... ex., something like:

(defn ...
  dd``this string
  will automatically
  be dedented.``
  ...)

@uvtc
Contributor Author

uvtc commented Nov 27, 2020

@sogaiu wrote:

I looked through the PR and the specific "Markdown" constructs I found included: {snip} Did I miss any?

Not sure how to answer this in just a few words... Markdown of course has two types of markup: inline (span) and block (div).

Inline: *italics* **bold** `code` and some variations on those (like _italics_)

And there's two types of block syntax: side-marked and delimited (or "fenced").

~~~
fenced
code block
~~~

    indented code block
    where the side-mark is 4 spaces

> blockquote is
> side-marked with ">".

And, IMO, another rule of well-written markdown is always indent things a multiple of 4 spaces --- or, more specifically, 4 places (column nums). That is:

Lists:

  * Text of list item starts at 4 places in (list marker counts as a place)

    Second paragraph of first list item. Note the 4 space indent,
    and how everything is lining up vertically in a most delightful
    way.

  * Second list item.

        code block within
        second list item

    Second paragraph of 2nd list item.

      * 8 places in --- this is the start of a new list within the 2nd list item
      * and so on.

  * Third item.

Numbered list (compact):

 1. Again, starting at 4 places in.
     a. foo (indented list, so content starts at 8 places in)
     b. bar
     c. baz
 2. This is two!
 3. Tree.

Definition list follows.

foo
  : Again, content of list item starts at 4 places in.

    Second paragraph
    for definition of foo.

bar
  : Lorem ipsum...

Done.

(Edit: removed mistaken note about blockquotes and 4-space rule.)

Sticking to that 4-space rule is consistent and lets you create lists with multiple paragraphs and nested sublists and not get confused.

So, to answer your question, this particular PR contains some inline code, and also short (single line only) code blocks (indented, not fenced). It also contains some lists (unordered and definition).

@sogaiu
Contributor

sogaiu commented Nov 27, 2020

Thanks for spelling things out :)

The background for my comment for looking at the specifics of the PR is that "Markdown" is not a specific enough target -- there is no spec, just an implementation. There are many flavors. "CommonMark" is a specific target.

I would guess that if one can choose a good subset of features that works across many flavors one may get the benefit of more tooling working well.

Another aspect is that the fewer features are chosen, the less work there is for creation, maintenance, and testing.

Of course one needs a sufficient set of features to accomplish enough :)

Does that make sense?

@uvtc
Contributor Author

uvtc commented Nov 27, 2020

Hi @sogaiu . You're welcome!

I'm not sure I understand. It sounds like you may be implying that Janet should/would implement some amount of Markdown. But why? Why not instead just dedent and print out docstrings to the terminal as-is? They already look good --- and would just require some manual minor tweaking (ex, to replace backslash-escaped tabs and newlines with literal ones).

My 2 cents, I expect CommonMark to become the de-facto standard markdown if it isn't already. It comes standard with C and JS reference implementations, with many more available. Note further that one of the principals behind CommonMark is also the author of Pandoc. And, aside, the extensions that Pandoc makes to markdown are generally conservative and very carefully thought out. The thread to add div syntax went on for 6 years! :)

@sogaiu
Contributor

sogaiu commented Nov 27, 2020

If it's not necessary for Janet to have any Markdown-ish thing in it, I'm not bothered by that :)

Just reading the discussion, it seemed possible that support for some Markdown(-ish?) constructs in Janet was on the table. Maybe I misunderstood.

@bakpakin
Member

With the merging of #511, I think this is ready to go. That patch contains the changes here so I think we can close this out.

@uvtc
Contributor Author

uvtc commented Nov 27, 2020

Thanks. Great to see this! Now I get it: you wanted Janet to know just enough markdown so it can linewrap docstrings that contain some markdown formatting.

@bakpakin bakpakin closed this Nov 28, 2020
@uvtc uvtc deleted the patch-1 branch November 28, 2020 19:03