list item continuation line versus indented code block versus subsequent list #497

Closed
mity opened this issue Sep 16, 2017 · 4 comments

mity commented Sep 16, 2017

(Note: This issue is about how to disambiguate the following snippet under the current specification, 0.28. Issue #495 separately discusses whether future versions of the specification should be changed to interpret it as a nested list.)

How should this be interpreted?

100. foo
    * bar

Please note that the 4 spaces of indentation on the 2nd line are essential to the question.

Using some common sense I can see two possible interpretations, marked A and B below:

Interpretation A: List item continuation line

<ol start="100">
<li>foo
* bar</li>
</ol>

Rationale: * is not indented enough to start a nested list. And if there were no * on the 2nd line, it would certainly be a continuation line. See also http://spec.commonmark.org/0.28/#example-201, which is in some ways analogous to this interpretation.

Interpretation B: An indented code block

<ol start="100">
<li>foo</li>
</ol>
<pre><code>* bar
</code></pre>

Rationale: If we do not see the 2nd line as a continuation line, it has to start a new top-level block. By the principle of uniformity, the 2nd line should then be interpreted the same way as if there were no 1st line at all, and four spaces imply an indented code block.

Interpretation C: Subsequent list

CMark currently interprets it differently, as follows:

<ol start="100">
<li>foo</li>
</ol>
<ul>
<li>bar</li>
</ul>

However, I find that interpretation troublesome, as it is imho clearly against the principle of uniformity as explained in interpretation B.
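The column arithmetic behind interpretations A and B can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `leading_spaces` and `content_offset` are hypothetical (they do not come from cmark, MD4C, or any other parser), tabs are ignored, and only simple markers such as `100.` or `*` followed by text are recognized.

```python
import re

# Hypothetical helpers for illustration; not taken from any real parser.
# Assumption: a list marker may be indented by at most 3 spaces, and an item's
# content starts right after the marker and the 1-4 spaces that follow it.
LIST_ITEM = re.compile(r"^ {0,3}(?:\d{1,9}[.)]|[-+*]) {1,4}(?=\S)")


def leading_spaces(line: str) -> int:
    """Count leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def content_offset(line: str):
    """Column where the item's content begins, or None if no item starts here."""
    match = LIST_ITEM.match(line)
    return match.end() if match else None


first, second = "100. foo", "    * bar"

offset = content_offset(first)        # where the content of "100. foo" starts
indent = leading_spaces(second)       # indentation of the disputed line

nests_in_item = indent >= offset      # enough indentation for a nested list?
starts_list_alone = content_offset(second) is not None
code_block_indent = indent >= 4       # indented-code threshold

print(offset, indent, nests_in_item, starts_list_alone, code_block_indent)
```

With these numbers, the second line is indented too little to belong to the item's content as a nested list and too much to open a list on its own, which is the gap that interpretations A and B each try to fill.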

Member

jgm commented Sep 16, 2017

I agree that cmark's current behavior isn't supportable by
the spec. I think the spec is unclear about what should
happen here.

Reasoning in Example 201 is (using periods to represent
spaces):

foo
....- bar

is just a paragraph, because the - bar is indented too far to start a list, and indented code can't interrupt a paragraph.

So, the result of prepending >. to each line,

>.foo
>.....- bar

is a block quote with a paragraph inside, and so is its lazy abbreviation

>.foo
....- bar

That seems right to me still. Let's try parallel reasoning.

foo
....* bar

is just a paragraph "foo * bar", so

100. foo
.........* bar

is a list item with this paragraph inside. Since * bar is a paragraph continuation line in this paragraph, we can delete some leading whitespace by the laziness rule, without changing the meaning:

100. foo
....* bar

So this argues for an interpretation where * bar is a continuation line. How to modify the parsing algorithm to get this result is another matter.
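The chain of rewrites above is purely a matter of counting columns, which the following Python sketch reproduces. The function names `wrap_in_list_item` and `drop_indent` are made up for this illustration, and the code only manipulates strings; it does not parse Markdown or decide what counts as a paragraph continuation line.

```python
def wrap_in_list_item(marker: str, lines: list) -> list:
    """Put `lines` inside a list item: marker plus one space on the first line,
    and the same number of spaces in front of every following line."""
    width = len(marker) + 1
    first, *rest = lines
    return [f"{marker} {first}"] + [" " * width + line for line in rest]


def drop_indent(line: str, count: int) -> str:
    """Remove up to `count` leading spaces, as the laziness rule permits for
    paragraph continuation lines."""
    removable = min(count, len(line) - len(line.lstrip(" ")))
    return line[removable:]


# A plain paragraph, following the reasoning of Example 201.
paragraph = ["foo", "    * bar"]

full = wrap_in_list_item("100.", paragraph)   # continuation line gets 5 more columns
lazy = [full[0], drop_indent(full[1], 5)]     # laziness: delete those 5 columns again

print(full)
print(lazy)
```

The fully indented form carries 9 leading spaces (5 for the item plus the original 4), and the lazy form is exactly the snippet from the issue report.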

Author

mity commented Sep 16, 2017

> So this argues for an interpretation where * bar is a continuation line.

Thanks for the feedback. I am doubly pleased, because this is exactly how MD4C sees it right now.

> How to modify the parsing algorithm to get this result is another matter.

I don't know cmark's internals well enough to help in this regard. But I can say I was quite surprised to see that in cmark the continuation line works differently in list items and in block quotes, as this example shows. In MD4C, block quotes and lists are treated in a mostly uniform way at the level of block start/end/nesting recognition.

Contributor

aidantwoods commented Sep 17, 2017

Perhaps related to this is the following example, with • representing a space (all results obtained from the reference JS parser):

>•••••foo
••••* bar

<blockquote>
<pre><code>foo
</code></pre>
</blockquote>
<pre><code>* bar
</code></pre>

The above is inspired by http://spec.commonmark.org/0.28/#example-199 and by the reasoning given in example 201. Here * bar is a code block because it does not interrupt a paragraph (foo is a code block as well).

However, somewhat inconsistently:

100.•••••foo
••••* bar

<ol start="100">
<li>
<pre><code>foo
</code></pre>
</li>
</ol>
<ul>
<li>bar</li>
</ul>

Again, no paragraph is being interrupted (foo is indented just far enough to be a code block); this time, however, * bar is a list.
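One way to see why the two cases look inconsistent is to strip the container marker from the first line of each and compare what is left. The sketch below does that in Python; `strip_container_marker` and `to_spaces` are hypothetical helpers, and the code assumes the spec's rule that a container marker consumes at most one following space when the content that follows is indented code.

```python
import re

# Hypothetical helpers for illustration only. Assumption: '>' or an ordered
# list marker consumes the marker itself plus at most one following space.
CONTAINER = re.compile(r"^(?:>|\d{1,9}[.)]) ?")


def strip_container_marker(line: str) -> str:
    """Return what remains of the line after the container marker."""
    return CONTAINER.sub("", line, count=1)


def to_spaces(text: str) -> str:
    """Convert the thread's bullet notation into real spaces."""
    return text.replace("•", " ")


quote_rest = strip_container_marker(to_spaces(">•••••foo"))
item_rest = strip_container_marker(to_spaces("100.•••••foo"))

print(repr(quote_rest), repr(item_rest))
```

Both first lines leave four spaces before foo, that is, an indented code block inside the container, so the two examples differ only in how the second line is treated.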

It may be tempting to justify this with the following line from the spec (http://spec.commonmark.org/0.28/#indented-code-blocks):

If there is any ambiguity between an interpretation of indentation as a code block and as indicating that material belongs to a list item, the list item interpretation takes precedence

However, since this list item occurs adjacent to (and not nested inside) the ordered list, the indentation of the unordered list in question is unambiguous.
Furthermore, note that on its own,

••••* bar

<pre><code>* bar
</code></pre>

is indented too far to be a list, which raises the question: why are 4 spaces of indentation acceptable only when a list precedes the line?

Member

jgm commented Feb 18, 2018

I think this is related.
http://spec.commonmark.org/0.28/#example-274

1. a

  2. b

    3. c

The interpretation SHOULD be:

<ol>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
</ol>
<pre><code>3. c
</code></pre>

but the current spec has:

<ol>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
<li>
<p>c</p>
</li>
</ol>

Here there's not even an issue about paragraph continuations, so this is a simpler case.
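The indentation involved in this example can be tabulated with a short Python sketch. `marker_columns` is a hypothetical helper that deliberately places no limit on how far a marker may be indented, so that the third line can be measured too; the 3-space limit and the comparison against the previous item's content offset are applied afterwards as explicit assumptions.

```python
import re

# Hypothetical helper for illustration; no indentation limit is enforced here.
MARKER = re.compile(r"^( *)(\d{1,9}[.)]|[-+*])( +)\S")


def marker_columns(line: str):
    """Return (marker indent, content offset) for a list-item-like line, else None."""
    match = MARKER.match(line)
    if match is None:
        return None
    indent = len(match.group(1))
    return indent, indent + len(match.group(2)) + len(match.group(3))


source = ["1. a", "", "  2. b", "", "    3. c"]
columns = [marker_columns(line) for line in source if line]

b_content_offset = columns[1][1]   # where the content of "2. b" starts
c_indent = columns[2][0]           # indentation of the "3." marker

nested_in_b = c_indent >= b_content_offset   # assumption: nesting needs this much
within_marker_limit = c_indent <= 3          # assumption: markers allow 0-3 spaces

print(columns, nested_in_b, within_marker_limit)
```

Under these assumptions `3. c` is indented too little to nest inside the `b` item and too much to be a sibling marker, which is consistent with the reading that it should become an indented code block.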
