amsthm support #1608

Closed
kasperpeulen opened this issue Sep 6, 2014 · 36 comments

Comments

@kasperpeulen

Is there any chance that pandoc will ever support the amsthm package,
so that I can convert my mathematical articles to HTML?

@mpickering
Collaborator

What kind of support do you envisage?

@timtylin
Contributor

timtylin commented Sep 7, 2014

A prerequisite for this would be some sort of delimiter for formatted blocks of text. Unfortunately, no major flavor of Markdown currently offers one, aside from HTML div blocks.

@timtylin
Contributor

timtylin commented Sep 7, 2014

(You can obviously insert begin/end theorem statements directly as LaTeX commands to get theorem environments.)

@kasperpeulen
Author

As HTML output I would suggest something like this: http://drz.ac/2013/01/17/latex-theorem-like-environments-for-the-web/

I thought that pandoc doesn't support LaTeX environments, so supporting them would probably be necessary before amsthm can be implemented.

@kasperpeulen
Author

I also noticed that a very basic version of amsthm support has already been written as a filter:
https://github.com/jgm/pandocfilters/blob/master/examples/theorem.py

@mpickering
Collaborator

I'm still very confused about what precise feature you are suggesting. This is what I'm guessing -

  • LaTeX Reader recognises amsthm environments
  • HTML Writer does something special with theorem environments

Is this correct?

@kasperpeulen
Author

@mpickering Yeah, that is correct.

I'm using the tool Authorea, which uses pandoc to convert the LaTeX I write to HTML. For example:
https://www.authorea.com/users/9325/articles/9455/_show_article

So what I would like is for pandoc to recognize the amsthm environments and convert them to HTML that looks as close as possible to the PDF output.

The amsthm package doesn't have predefined theorem environments, if I recall correctly; you have to define them yourself, so I think it would be good if pandoc worked the same way.

\usepackage{amsthm}
\newtheorem{thm}{Theorem}  
\newtheorem{lem}{Lemma}

And then you can create a theorem or lemma like:

\begin{thm}Here is a theorem
\end{thm}
\begin{lem}Here is a lemma.
\end{lem}

Well, amsthm gives those theorems a specific style, so I would like the HTML writer to apply similar styling to those blocks. Something like:

.theorem {
    display: block;
    margin: 12px 0;
    font-style: italic;
}
.theorem:before {
    content: "Theorem [theorem count].";
    font-weight: bold;
    font-style: normal;
}

@j2kun

j2kun commented Sep 21, 2015

This is a sorely needed feature. Literally every math paper I have ever read uses amsthm, and I cannot convert any paper without it.

@ickc
Contributor

ickc commented Apr 21, 2016

I'm interested in such a feature as well.

A Working Demo of Using amsthm in pandoc markdown to HTML and LaTeX generations

I opened a repository in ickc/pandoc-amsthm: provide amsthm environments in pandoc with valid output in LaTeX and HTML.

It provides CSS, a LaTeX file, and some pandoc filters to target HTML and LaTeX output, using native pandoc divs with the amsthm environment given by the class of the div. Support for 13 environments is provided. The CSS is unfinished, though. See https://ickc.github.io/pandoc-amsthm/index.pdf for a demo.

For example, in the markdown, the following will define a theorem:

<div class="theorem">
...
</div>

It's only a proof of concept (and used personally). I am thinking about using YAML front matter together with pandoc native divs instead to create general amsthm environments. But there are some obstacles.

Generalizing it by defining the amsthm environments through YAML front matter?

For example, looking at my TeX file:

\usepackage{amsthm}

\theoremstyle{plain} % default
\newtheorem{theorem}{Theorem}[chapter]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem*{corollary}{Corollary}

\theoremstyle{definition}
\newtheorem{definition}{Definition}[chapter]
\newtheorem{conjecture}{Conjecture}[chapter]
\newtheorem{example}{Example}[chapter]
\newtheorem{postulate}{Postulate}[chapter]
\newtheorem{problem}{Problem}[chapter]

\theoremstyle{remark}
\newtheorem*{remark}{Remark}
\newtheorem*{note}{Note}
\newtheorem{case}{Case}

% proof is predefined, see documentation

We could define YAML variables like:

amsthm: true
amsthmplain:    [theorem, lemma, proposition, corollary]
amsthmdef:  [definition,conjecture,example,postulate,problem]
amsthmremark:   [remark,note,case]

It is very easy to use a for loop in the templates to create an inline style sheet in the HTML template and to create those definitions in the LaTeX template.

But the problem is how to specify the optional arguments through YAML front matter. (Nested arrays?)

Another problem is how to write a filter that uses the information in the metadata to decide which divs are converted into amsthm environments. I'm inexperienced in writing pandoc filters, so I'm not sure if it can be done. If the 3 arrays amsthmplain, amsthmdef and amsthmremark can be extracted in the filter, it should be quite easy to look for the divs with such classes.
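A minimal sketch of that extraction step, assuming the filter receives the document metadata in pandoc's JSON form, where a YAML list arrives as a MetaList of MetaInlines items. The helper names (stringify, meta_names, amsthm_classes) are hypothetical, and only plain Str and Space inlines are handled:

```python
def stringify(inlines):
    """Flatten a list of pandoc inline elements into plain text."""
    parts = []
    for el in inlines:
        if el["t"] == "Str":
            parts.append(el["c"])
        elif el["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def meta_names(meta, key):
    """Return the names listed under a metadata key as plain strings."""
    node = meta.get(key)
    if node is None:
        return []
    # A single YAML scalar arrives as MetaInlines, a YAML list as MetaList.
    items = node["c"] if node["t"] == "MetaList" else [node]
    return [stringify(item["c"]) for item in items if item["t"] == "MetaInlines"]


def amsthm_classes(meta):
    """Collect every environment name declared in the three amsthm lists."""
    names = set()
    for key in ("amsthmplain", "amsthmdef", "amsthmremark"):
        names.update(meta_names(meta, key))
    return names
```

A filter could then compare each Div's classes against the returned set to decide which divs to convert.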

Another possible way is to provide a general mapping from divs to LaTeX environments. For example,

<div class="theorem" latex="true">
...
</div>

Then a filter is written to look for any div that has latex="true" and turn it into a \begin{theorem}...\end{theorem} pair. (I wish such a universal switch could make it into official pandoc too.)

Summary

The current approach is totally manual: each environment is defined one by one, in all 3 formats: .css, .tex, .py (although a script could automate it a bit further than I already did). Is there any interest in making this a general way of using amsthm? And if there is such a general way of working with amsthm, would @jgm be interested in making it part of official pandoc?

@ickc
Contributor

ickc commented Apr 21, 2016

I made a filter, pandocfilters/latexdivs.py at master · ickc/pandocfilters, that does what I described above:

Pandoc filter to convert divs with latex="true" to LaTeX
environments in LaTeX output. The first class
will be regarded as the name of the latex environment
e.g.
<div latex="true" class="theorem etc">...</div>
becomes
\begin{theorem}...\end{theorem}
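The transformation described above can be sketched on pandoc's JSON representation of a Div. This is an illustration under stated assumptions, not the code of latexdivs.py: the function names are hypothetical, and a real filter would also check that the output format is LaTeX before rewriting anything.

```python
def raw_latex(text):
    """Build a pandoc RawBlock element carrying LaTeX source."""
    return {"t": "RawBlock", "c": ["latex", text]}


def div_to_environment(block):
    """Wrap the contents of a latex="true" Div in \\begin/\\end blocks.

    Returns a list of blocks; any other block is returned unchanged.
    """
    if block.get("t") != "Div":
        return [block]
    (_ident, classes, keyvals), contents = block["c"]
    if not classes or dict(keyvals).get("latex") != "true":
        return [block]
    env = classes[0]  # the first class names the environment
    return (
        [raw_latex("\\begin{%s}" % env)]
        + contents
        + [raw_latex("\\end{%s}" % env)]
    )
```

Because the contents stay as ordinary pandoc blocks between the two raw blocks, any markdown inside the div is still converted by the LaTeX writer as usual.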

I wish this syntax could make it into official pandoc so that a filter is not needed.

Anyway, this filter can be used instead to target amsthm in LaTeX and HTML output.

The filters I mentioned previously do something in the HTML output as well. But I think a better practice is to leave the native div as a native div and use CSS and a CSS counter to do the trick. The CSS I used above did that, without the CSS counter yet.

@amacfie

amacfie commented Apr 21, 2016

@ickc That looks really useful. Is there support for environment numbering?

@ickc
Contributor

ickc commented Apr 21, 2016

@amacfie

Yes, I just updated the CSS to have basic numbering via a CSS counter. The environments defined in the .tex are numbered as well.

@ickc
Contributor

ickc commented Apr 22, 2016

Hi, guys, I rewrote my pandoc amsthm package; see if any of you are interested. It is again at ickc/pandoc-amsthm: provide amsthm environments in pandoc with valid output in LaTeX and HTML.

It provides a general way (with some limitations) to set up the use of amsthm through the YAML front matter. Using some templates and a general filter that turns divs into LaTeX environments, it can output to HTML and LaTeX.

You can check out the example in

@j2kun

j2kun commented Apr 22, 2016

Can you include an example with non-offset text included in the environment? I think most theorems/conjectures are mostly text.

@ickc
Contributor

ickc commented Apr 22, 2016

Done. Click the links above again. There are some typographical and numbering differences, because the HTML output is produced by CSS and a simple CSS counter.

Another pandoc + amsthm-like fusion

There's another amsthm-like pandoc filter by @chdemko:

I've also implemented a kind of automatic numbering filter similar to theorem-like environment https://pypi.python.org/pypi/pandoc-numbering/

His approach guarantees the output is the same across different output formats, but it doesn't use the amsthm package in LaTeX output.

Mine is primarily for LaTeX output, using amsthm and defining the environments in the YAML, but the HTML numbering is more basic: each counter only increments by itself. (I think it is possible to reproduce the same numbering as in LaTeX; it just needs more counters, which makes the CSS more complicated.) So it is like writing primarily for LaTeX output while optionally providing a way to share the document through HTML with "sensible" output. If a lot of nesting is used, one should probably improve the CSS counters.
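To make concrete what "the same numbering as in LaTeX" would require, here is a small sketch of parent-counter numbering, where each environment counter restarts whenever the parent (for instance chapter) advances. The function name and the event-list input are hypothetical and only model the scheme; this is not CSS and not part of pandoc-amsthm.

```python
def number_blocks(events, parent="chapter"):
    """Label numbered environments as '<parent>.<n>', resetting n per parent.

    `events` is a sequence of strings: the parent name marks a new
    chapter, anything else is the class of a numbered environment.
    """
    chapter = 0
    counts = {}
    labels = []
    for name in events:
        if name == parent:
            chapter += 1
            counts = {}  # every child counter restarts with the parent
        else:
            counts[name] = counts.get(name, 0) + 1
            labels.append("%s %d.%d" % (name.capitalize(), chapter, counts[name]))
    return labels
```

For the events chapter, theorem, theorem, definition, chapter, theorem this yields Theorem 1.1, Theorem 1.2, Definition 1.1 and Theorem 2.1, which is the behaviour an HTML version would have to imitate with one extra counter per nesting level.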

Manual & versatile vs Automated & more restrictive?

Over these 2 days I tried both approaches, but I'm still undecided between

  1. manually written CSS and TeX files (the latter defining the amsthm environments in the preamble); or
  2. templates for both HTML and LaTeX that generate the CSS and the TeX preamble from YAML variables.

The former is less restrictive but more manual (although one might set it up once and forget it). The latter is convenient if different sets of amsthm environments are needed from case to case, but it is more restrictive on the optional arguments (e.g. in the YAML approach the first optional argument is omitted, otherwise the YAML would become very complicated).

Which one do you guys prefer?

@j2kun

j2kun commented Apr 22, 2016

In general, I'd favor more manual customizability but with sensible defaults.

@ickc
Contributor

ickc commented Apr 22, 2016

Approach 1: amsthm environment definition

The manual TeX file I inserted:

\usepackage{amsthm}

\theoremstyle{plain}
\newtheorem{theorem}{Theorem}[chapter]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem*{corollary}{Corollary}

\theoremstyle{definition}
\newtheorem{definition}{Definition}[chapter]
\newtheorem{conjecture}{Conjecture}[chapter]
\newtheorem{example}{Example}[chapter]
\newtheorem{postulate}{Postulate}[chapter]
\newtheorem{problem}{Problem}[chapter]

\theoremstyle{remark}
\newtheorem*{remark}{Remark}
\newtheorem*{note}{Note}
\newtheorem{case}{Case}

Approach 2: amsthm defined by YAML:

amsthm: true
amsthm-plain:   Theorem
amsthm-plain-unnumbered:    [Lemma, Proposition, Corollary]
amsthm-def: [Definition,Conjecture,Example,Postulate,Problem]
amsthm-def-unnumbered:  []
amsthm-remark:  [Case]
amsthm-remark-unnumbered:   [Remark,Note]
amsthm-parentcounter:   chapter

will define the following (by template):

\usepackage{amsthm}

\theoremstyle{plain}
\newtheorem{Theorem}{Theorem}[chapter]
\newtheorem*{Lemma}{Lemma}
\newtheorem*{Proposition}{Proposition}
\newtheorem*{Corollary}{Corollary}

\theoremstyle{definition}
\newtheorem{Definition}{Definition}[chapter]
\newtheorem{Conjecture}{Conjecture}[chapter]
\newtheorem{Example}{Example}[chapter]
\newtheorem{Postulate}{Postulate}[chapter]
\newtheorem{Problem}{Problem}[chapter]

\theoremstyle{remark}
\newtheorem{Case}{Case}[chapter]
\newtheorem*{Remark}{Remark}
\newtheorem*{Note}{Note}

Basically only the first optional argument is missing. If that were included, the YAML syntax could become much more complicated.
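The mapping from these YAML variables to the preamble can be sketched in a few lines. This is an illustration of the expansion shown above, not the actual template: the function names are hypothetical, and the metadata is assumed to be already loaded into a plain dictionary.

```python
STYLES = [
    ("plain", "amsthm-plain"),
    ("definition", "amsthm-def"),
    ("remark", "amsthm-remark"),
]


def as_list(value):
    """Accept a single name or a list of names, as the YAML allows both."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def amsthm_preamble(meta):
    """Return the LaTeX preamble lines implied by the amsthm-* variables."""
    parent = meta.get("amsthm-parentcounter")
    suffix = "[%s]" % parent if parent else ""
    lines = ["\\usepackage{amsthm}"]
    for style, key in STYLES:
        lines.append("\\theoremstyle{%s}" % style)
        # Numbered environments are all tied to the parent counter.
        for name in as_list(meta.get(key)):
            lines.append("\\newtheorem{%s}{%s}%s" % (name, name, suffix))
        # Starred definitions take no counter argument.
        for name in as_list(meta.get(key + "-unnumbered")):
            lines.append("\\newtheorem*{%s}{%s}" % (name, name))
    return lines
```

Each name serves as both the environment name and the printed title, which is why the first optional argument (sharing a counter with another environment) has no place in this scheme.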

@j2kun is the YAML definition good enough?

Motivation of the 2nd approach through YAML

And I forgot to explain the primary motivation for the YAML approach: it has a better chance of going into official pandoc. The only new syntax is the YAML variables and the syntax that marks a native pandoc Div as a LaTeX environment, like <div class="Theorem" latex="true">. It is then up to @jgm to decide if this kind of syntax (not necessarily the same, but a similar idea) can make it into official pandoc.

Even if it cannot become official pandoc syntax, the syntax is simple and general and lets the templates and filter do the heavy lifting. The document is then future-proof in the sense that improvements to the templates, filters, or even pandoc syntax can improve the output without changing the document. Another benefit: with something like <div class="Theorem" latex="true" docx="true">, someone could write a filter and a template to target docx output.

But if it does become official pandoc syntax, it is even more powerful. E.g. one could improve the LaTeX reader to recognize the amsthm environments defined, and correspondingly write the YAML front matter in Markdown output and the corresponding native pandoc divs in the main content.

In contrast, the manual approach has none of these benefits. Its only benefit is more freedom in defining amsthm environments, such as using the first optional argument.

@j2kun

j2kun commented Apr 22, 2016

@ickc I think the YAML is good enough.

@ickc
Contributor

ickc commented Apr 23, 2016

Hi, I updated the YAML front matter definition to this:

---
amsthm:
  plain:    
    numbered:   Theorem
    unnumbered: [Lemma, Proposition, Corollary]
  definition:   
    numbered:   [Definition,Conjecture,Example,Postulate,Problem]
    unnumbered: []
  remark:   
    numbered:   [Case]
    unnumbered: [Remark,Note]
  parentcounter:    chapter
---

Compared to the old one, this one has a better syntax since it is nested (and there is no need to say amsthm: true). It's again over at ickc/pandoc-amsthm: provide amsthm environments in pandoc with valid output in LaTeX and HTML.

@ickc
Contributor

ickc commented Apr 23, 2016

Hi, all.

I finalized the YAML syntax. The latest one looks like this:

amsthm:
  plain:    [Theorem]
  plain-unnumbered: [Lemma, Proposition, Corollary]
  definition:   [Definition,Conjecture,Example,Postulate,Problem]
  definition-unnumbered:    []
  remark:   [Case]
  remark-unnumbered:    [Remark,Note]
  proof:    [proof]
  parentcounter:    chapter

The reason is related to a comment by @jgm in #2542:

I want to avoid generating environments and commands that
aren't defined (and similarly for styles in Word and ICML).
If we parse the styles, and thus know what is available,
that may not be a big problem. In LaTeX it's harder,
because commands and environments may be defined in
included packages. The idea of having a special prefix
like style- might be a good one.

So it seems unlikely that a general syntax for mapping a native pandoc div to an environment will be defined in official pandoc (e.g. like the one I used with latex="true").

Because of this, I threw away the idea of a generally defined div syntax. Instead, I used the idea of @chdemko mentioned above. And since I have already defined the amsthm environments in the YAML front matter, there's no need to declare again which divs should be converted into LaTeX environments. So I rewrote the filter, pandoc-amsthm.py, to do just that.

So currently, the use of amsthm environment in the markdown is then just this:

<div class="proof">
A Proof.
</div>
<div class="Theorem">
A Theorem.
</div>
<div class="Theorem boxed">
A boxed theorem if you define so in CSS. (Of course other filters are needed if you want it boxed in LaTeX too.)
</div>

i.e. very minimal.

Note that pandoc-amsthm depends on native pandoc divs, and a markdown-style syntax for them is being considered in #168. Once that new syntax is out, using amsthm environments in pandoc markdown will be even cleaner.

So I consider the tool finished and released it at Release Finalizing the syntax · ickc/pandoc-amsthm. Also see the documentation in ickc/pandoc-amsthm: provide amsthm environments in pandoc with output in LaTeX and HTML. I tested it thoroughly so it should be stable. But there could be unknown problems; please let me know if you find one.

Lastly, @jgm: would you consider including it in some way in official pandoc, e.g. mentioning it in the documentation or even including the filter in the official release? (I also submitted a pull request for those templates in pandoc-template.)

@ickc
Contributor

ickc commented Jan 5, 2017

Hi, all,

After a few months of personally using pandoc-amsthm v1, I decided to completely rewrite it, partly due to its limitations and partly due to the newly developed panflute (announced around the time I finished pandoc-amsthm v1).

I now have a working prototype of pandoc-amsthm v2 in the panflute branch, supporting LaTeX output only for the time being.

The goal is to support all predefined amsthm styles and commands (previously only a subset of the simpler commands was supported) and all output formats (previously only LaTeX- and HTML-related output, the latter via CSS, which is limited and doesn't look identical to LaTeX's output). Hence it will break backward compatibility. E.g. in the past, unnumbered environments were defined under, say, plain-unnumbered; in the new syntax they are simply defined under plain with an * appended to the environment name. Another bonus is that it eliminates the need for a custom template, i.e. the filter is standalone.

You can see all the new syntax in ickc/pandoc-amsthm/panflute/tests/model-source.md.

Here are a few questions for you:

  1. In order to support the optional arguments allowed in amsthm, an additional attribute is needed on the Div. Currently I use info as the attribute key, e.g. <div class="proof" info="Proof of the Main Theorem">. This is open to discussion, and perhaps a more specific key could be chosen to avoid collisions (I have thought about using just amsthm, but that doesn't say what its content means).

  2. The amsthm environments are defined as native pandoc Divs with classes. The problem is that a class containing a space means several different classes. So right now my approach for environments with a space in the name is to write a normal space in the YAML definition (With Space) and to replace the space with an underscore in the Div class (With_Space), in the hope that underscores typically won't be used in the actual wording (remember that the environment name is also used as the display name of the "Theorem"). This is open to discussion, and I wonder if anyone sees a problem with this approach.

  3. The remaining work is to support the other output formats. This involves replicating all the predefined amsthm styles with native pandoc elements, so it might take a while. Some of the known difficulties are (feel free to collaborate or make suggestions):

    1. There is no way to get the top-level-division in filters. This makes matching the LaTeX output in the other output formats difficult and fragile.

    2. In some amsthm styles, the whole block within the environment is italicized. The naive approach is to walk through each element in the Div and emphasize it. But I'm not sure this is the best or most reliable approach (e.g. what if some elements are already emphasized?).
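The naming conventions in questions 1 and 2, together with the trailing * for unnumbered environments, can be written down as three small helpers. This is a sketch of the conventions as described in this comment only; the function names are hypothetical and the code is not taken from pandoc-amsthm v2.

```python
def parse_env_name(declared):
    """Split a declared name such as 'Remark*' into (display name, numbered)."""
    if declared.endswith("*"):
        return declared[:-1], False
    return declared, True


def div_class(display_name):
    """Class used on the Div: a class cannot contain spaces, so use underscores."""
    return display_name.replace(" ", "_")


def begin_command(env_name, attributes):
    """LaTeX opening for a Div, passing `info` as the optional argument."""
    info = attributes.get("info")
    option = "[%s]" % info if info else ""
    return "\\begin{%s}%s" % (env_name, option)
```

With these, a Div written as <div class="proof" info="Proof of the Main Theorem"> would open with \begin{proof}[Proof of the Main Theorem], and an environment declared as With Space would be looked up under the class With_Space.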

In the end, I hope that after pandoc-amsthm v2 is finished, this issue can finally be closed (i.e. the situation will be similar to pandoc-citeproc, where the feature is supported via a filter rather than natively in pandoc).

@grea09

grea09 commented Feb 14, 2017

Well, I wrote my own LaTeX-oriented support for general scientific writing with pandoc:
https://github.com/grea09/pancake
You might want specifically https://github.com/grea09/pancake/blob/master/filter/pandoc-science.py
Everything is configured from the YAML block. It supports smart refs and custom blocks.

@jgm
Owner

jgm commented Jul 22, 2020

I've just pushed some support for amsthm in the LaTeX reader.
This includes numbering and cross-references.
I don't use amsthm myself so I have no idea how complete it is; it could use some eyes.

@ickc
Contributor

ickc commented Jul 23, 2020

Wow, that's great. Are the commits involved only 65865b3 and 9d07d18? Does it only concern the LaTeX reader, or will it also affect raw LaTeX in markdown? Also, will the LaTeX writer be left as it is (i.e. letting LaTeX parse it, not pandoc)?

I'll try to make some time to test it in a couple of days (but if you don't hear from me in a few days, then likely I couldn't make time). The example I'll probably use is https://github.com/ickc/pandoc-amsthm/blob/master/docs/test/test.tex if anyone is interested (test.md was the markdown input).

@jgm
Owner

jgm commented Jul 23, 2020

It only affects the LaTeX reader.
Raw LaTeX in Markdown will be passed through as raw LaTeX as always.
I haven't yet added support to the LaTeX writer that would allow this to round-trip.

Notes:

  • I'm still working on the numbering options provided by the optional arguments
  • \theoremstyle isn't yet supported

@jgm
Owner

jgm commented Jul 23, 2020

Updates:

  • The first optional argument is implemented (so, you can have Corollary and Theorem in the same number sequence, for example).
  • \theoremstyle is implemented
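As an illustration of what the first optional argument means for numbering, the sketch below models a definition such as \newtheorem{cor}[thm]{Corollary}, where cor borrows the counter of thm. It is a toy model with hypothetical names, not the logic of pandoc's LaTeX reader.

```python
def number_environments(definitions, uses):
    """Number a sequence of environments the way \\newtheorem would.

    `definitions` maps an environment name to (title, shared_with), where
    shared_with is the first optional argument or None.
    """
    counters = {}
    labels = []
    for env in uses:
        title, shared_with = definitions[env]
        counter = shared_with or env  # borrow the other environment's counter
        counters[counter] = counters.get(counter, 0) + 1
        labels.append("%s %d" % (title, counters[counter]))
    return labels
```

With thm and cor sharing a counter and lem on its own, the sequence thm, cor, lem, thm is labelled Theorem 1, Corollary 2, Lemma 1, Theorem 3.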

@grea09

grea09 commented Jul 23, 2020

If it helps, here is what I did in the template file of my own version of amsthm support:

$if(amsthm)$
	\usepackage{amsthm}
	\usepackage{mfirstuc}
	\theoremstyle{plain}
	$for(amsthm.plain)$
	\newtheorem{$amsthm.plain$}{\capitalisewords{$amsthm.plain$}}[$amsthm.parentcounter$]
	$endfor$
	$for(amsthm.plain-unnumbered)$
	\newtheorem*{$amsthm.plain-unnumbered$}{\capitalisewords{$amsthm.plain-unnumbered$}}
	$endfor$
	\theoremstyle{definition}
	$for(amsthm.definition)$
	\newtheorem{$amsthm.definition$}{\capitalisewords{$amsthm.definition$}}[$amsthm.parentcounter$]
	$endfor$
	$for(amsthm.definition-unnumbered)$
	\newtheorem*{$amsthm.definition-unnumbered$}{\capitalisewords{$amsthm.definition-unnumbered$}}
	$endfor$
	\theoremstyle{remark}
	$for(amsthm.remark)$
	\newtheorem{$amsthm.remark$}{\capitalisewords{$amsthm.remark$}}[$amsthm.parentcounter$]
	$endfor$
	$for(amsthm.remark-unnumbered)$
	\newtheorem*{$amsthm.remark-unnumbered$}{\capitalisewords{$amsthm.remark-unnumbered$}}
	$endfor$
$endif$

@jgm
Owner

jgm commented Jul 23, 2020

This issue mixes the question of LaTeX reader support, LaTeX writer support, and a markdown syntax for using these environments. These are all quite different issues and should perhaps be separated.

@jgm
Owner

jgm commented Aug 14, 2020

I'm going to close this issue, as it was originally about latex parsing.

@jgm jgm closed this as completed Aug 14, 2020
@chtenb

chtenb commented Nov 19, 2020

I'm unsure about the status of this thread. Is pandoc supposed to natively support the theorem environments when exporting to HTML? A quick try with the newest version of pandoc doesn't seem to yield anything.

@jgm
Owner

jgm commented Nov 19, 2020

Yes. If something isn't working, please give an example.

@jgm
Owner

jgm commented Nov 19, 2020

Here's an example:

% pandoc -t html -f latex
\usepackage{amsthm}
\newtheorem{thm}{Theorem}  
\newtheorem{lem}{Lemma}
\begin{thm}Here is a theorem
\end{thm}
\begin{lem}Here is a lemma.
\end{lem}

which yields

<div class="thm">
<p><strong>Theorem 1</strong>.  <em>Here is a theorem</em></p>
</div>
<div class="lem">
<p><strong>Lemma 1</strong>.  <em>Here is a lemma.</em></p>
</div>

@jgm
Owner

jgm commented Nov 19, 2020

If you find something that doesn't work, you can open a new issue.

@chtenb

chtenb commented Nov 19, 2020

My bad, I got linked here from a place that was about markdown to HTML conversion, which gave me the impression that it was covered by this thread. Am I correct that there is no support for such a thing then?

@jgm
Owner

jgm commented Nov 19, 2020

not in markdown -> html
though there may be third-party filters that add something like this

@jschlatow

@Chiel92 you can have a look at pandoc-theoremnos.
