Citation Typing Ontology (CiTO) annotation support #420

Open
agitter opened this issue Mar 19, 2021 · 14 comments

@agitter
Member

agitter commented Mar 19, 2021

This issue summarizes some of the discussion in a recent Twitter thread about options for supporting CiTO annotations in Manubot.

The most promising lead is the cito lua filter for pandoc. It uses a colon-separated prefix for the CiTO annotation, for example [@cites_as_evidence:Li95], so we'll have to see how that interacts with Manubot's identifier prefixes, which are also colon-separated prefixes. Manubot's recent ability to infer the prefix further complicates the pattern matching. If the cito filter is run first and no cito properties overlap with the identifiers Manubot supports, there's a chance it could work without modifications.

A pandoc-scholar example demonstrates the usage of this filter.

@tarleb

tarleb commented Mar 19, 2021

Also linking an experimental repo where the CiTO info is added to the reference entry. It requires a second filter which is run after the citeproc processor completed its work.

@agitter
Member Author

agitter commented Mar 20, 2021

I did some initial testing with this version of cito.lua. I confirmed it worked as I expected on sample.md from the same repository. I ran into minor problems trying to combine it with the pandoc-manubot-cite filter.

I believe the issue is that the pattern used to find a CiTO property is ^(.+):(.+)$. When I test agrees_with:doi:10.1371/journal.pcbi.1007128, the first group is agrees_with:doi and the second is 10.1371/journal.pcbi.1007128.

The pattern ^([^:]+):(.+)$ may work better. The first group is agrees_with and the second is doi:10.1371/journal.pcbi.1007128. Making that change in cito.lua looks promising. I modified sample.md to use two Manubot-style citations-by-identifier, one with a prefix and one with an inferred prefix. I get a reasonable looking output.md. The full command was:

 pandoc --lua-filter=cito.lua --filter=pandoc-manubot-cite --output=output.md --standalone sample.md

Input and output files (with .txt extension for GitHub):

I haven't tried running the cito filter in the Manubot build script or checked the filters in the example showing how to insert the CiTO properties in the bibliography.
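To make the difference between the two patterns concrete, here is a minimal sketch using Python's `re` module. This is an assumption-laden stand-in: cito.lua uses Lua patterns rather than Python regular expressions, but both quantifiers are greedy in the same way for this input. The helper name `split_with` is hypothetical.

```python
import re

# Pattern reported above as the one used in cito.lua (greedy first group).
GREEDY = re.compile(r"^(.+):(.+)$")
# Proposed replacement: the first group stops at the first colon.
FIRST_COLON = re.compile(r"^([^:]+):(.+)$")

def split_with(pattern, citation_id):
    """Return (cito_property, remaining_key), or None if there is no match."""
    match = pattern.match(citation_id)
    return match.groups() if match else None

citation_id = "agrees_with:doi:10.1371/journal.pcbi.1007128"
print(split_with(GREEDY, citation_id))       # first group swallows "agrees_with:doi"
print(split_with(FIRST_COLON, citation_id))  # first group is only "agrees_with"
```

The greedy `.+` backtracks only as far as the last colon, which is why the Manubot prefix ends up in the CiTO group.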

Does this seem promising? If so, what should we consider for next steps? I'll note that this adds the CiTO support per-reference instead of per-citation. Is that correct @tarleb?

@tarleb

tarleb commented Mar 20, 2021

That is correct. I changed the behavior of the filters in the jcheminf repo such that it's per citation.

Thanks for debugging the pattern, I should make that change in pandoc-scholar as well.

@duerrsimon

I'd love to see this fully supported in Manubot. Is there anything that one could be of help with?

@dhimmel
Member

dhimmel commented Sep 2, 2021

> Is there anything that one could be of help with?

One approach would be to try to integrate the existing cito lua filter as discussed above. However, I have two worries about that approach:

  1. I think it makes most sense for citation typing to occur at the citation rather than reference level. For example, you can agree with a study in one section and disagree in another.
  2. I think the cito term would be best encoded into the citation suffix rather than key. For example, [@doi:10.1371/journal.pcbi.1007128, cito:agrees_with] rather than [@agrees_with:doi:10.1371/journal.pcbi.1007128]. Or possibly even the prefix like [cito:agrees_with @doi:10.1371/journal.pcbi.1007128].
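As a rough sketch of the suffix idea in point 2, the snippet below pulls `cito:<property>` terms out of a citation suffix string and returns whatever suffix text is left. The `cito:` suffix convention and the function name `extract_cito_from_suffix` are hypothetical; nothing here reflects an existing Manubot or pandoc feature.

```python
import re

# Hypothetical convention: "cito:<property>" may appear anywhere in the suffix.
CITO_IN_SUFFIX = re.compile(r"\bcito:([A-Za-z_]+)")

def extract_cito_from_suffix(suffix):
    """Return (cito_properties, suffix_without_cito_terms)."""
    properties = CITO_IN_SUFFIX.findall(suffix)
    remainder = CITO_IN_SUFFIX.sub("", suffix)
    # Drop the empty comma-separated pieces left behind by the removal.
    parts = [part.strip() for part in remainder.split(",") if part.strip()]
    return properties, ", ".join(parts)

print(extract_cito_from_suffix(", cito:agrees_with"))
print(extract_cito_from_suffix(", p. 12, cito:disagrees_with"))
```

Keeping the rest of the suffix intact matters because suffixes can already carry locators such as page numbers.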

So a precursor to contextualizing citations with CiTO would be to figure out how we can pass citation prefixes and suffixes into the pandoc HTML output. Some initial notes on this at #423 (comment). I am not currently sure what happens to citation prefixes/suffixes... they might just drop out in the pandoc conversion. If we figure that out, we could then make a filter that interprets CiTO prefixes/suffixes and validates and renders them.

@duerrsimon

duerrsimon commented Sep 2, 2021

> 1. I think it makes most sense for citation typing to occur at the citation rather than reference level. For example, you can agree with a study in one section and disagree in another.

Totally agree. In the example, the bibliography entry lists all the different uses. So if I cite @disagrees_with:X_2020 and @uses_method_in:X_2020, the output is X (2020) .... [cito:usesMethodIn] [cito:disagreesWith]
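The per-reference behavior described here can be sketched as follows: every CiTO property used for a given key is collected onto that key's single bibliography entry. This is only an illustration, not the filter's actual code; the function names are hypothetical, and the order of the terms in real output may differ from what this sketch prints.

```python
from collections import defaultdict

def to_cito_term(snake_case):
    """Convert e.g. 'uses_method_in' to 'usesMethodIn'."""
    first, *rest = snake_case.split("_")
    return first + "".join(word.capitalize() for word in rest)

def collect_per_reference(citations):
    """Group CiTO terms by reference key, first-seen order, no duplicates."""
    per_reference = defaultdict(list)
    for citation in citations:
        # Split at the first colon only: property before, reference key after.
        prop, _, key = citation.partition(":")
        term = to_cito_term(prop)
        if term not in per_reference[key]:
            per_reference[key].append(term)
    return dict(per_reference)

citations = ["disagrees_with:X_2020", "uses_method_in:X_2020"]
for key, terms in collect_per_reference(citations).items():
    print(key, " ".join(f"[cito:{term}]" for term in terms))
```

Because the terms are merged per key, the information about which citation carried which intent is lost, which is the concern raised in point 1.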

For the Manubot HTML output, I guess it would be nice if we could add a data-cito attribute to the span elements that contain the references so that this information can be displayed in the tooltip. In more static output (docx, pdf), I guess one could work with symbols and render a table at the end explaining the mapping of the symbols: [🗸1] is agrees_with, [✗1] is disagrees_with, etc. In the bibliography, the full CiTO keys could be rendered. This would be like some journals that use dots and double dots to indicate references of special importance (some Cell Press journals do this, afaik).

> 2. I think the cito term would be best encoded into the citation suffix rather than key. For example, [@doi:10.1371/journal.pcbi.1007128, cito:agrees_with] rather than [@agrees_with:doi:10.1371/journal.pcbi.1007128]. Or possibly even the prefix like [cito:agrees_with @doi:10.1371/journal.pcbi.1007128].

Not sure if I'd prefer the suffix. I like @agrees_with:doi:10.1371/journal.pcbi.1007128 for being somewhat human readable even for multiple citations.

@dhimmel
Member

dhimmel commented Sep 2, 2021

> For the Manubot HTML output, I guess it would be nice if we could add a data-cito attribute to the span elements that contain the references

Agreed, but I think you mean the span elements that contain the citations rather than the references. For example this:

<p>Citation by DOI <span class="citation" data-cites="IhliSZDo">[<a href="#ref-IhliSZDo" role="doc-biblioref">1</a>]</span>.</p>

One challenge appears to be how multiple citations are encoded like data-cites="YuJbg3zO 126Wi5Us4 mSMVRkoc PZMP42Ak":

<p>Manubot plugins provide easier, more convenient visualization of and navigation between citations <span class="citation" data-cites="YuJbg3zO 126Wi5Us4 mSMVRkoc PZMP42Ak">[<a href="#ref-mSMVRkoc" role="doc-biblioref">2</a>,<a href="#ref-126Wi5Us4" role="doc-biblioref">3</a>,<a href="#ref-PZMP42Ak" role="doc-biblioref">7</a>,<a href="#ref-YuJbg3zO" role="doc-biblioref">8</a>]</span>.</p>

So perhaps we'd need something like data-citos that defaults to the general cites when no cito term is specified.
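One way to picture that idea: keep `data-citos` position-aligned with `data-cites`, so the n-th CiTO term belongs to the n-th reference ID. The sketch below is purely illustrative. The `data-citos` attribute is only the name floated in this comment, and `citation_span_attributes` and `DEFAULT_CITO` are hypothetical.

```python
from html import escape

DEFAULT_CITO = "cites"  # assumed fallback when no CiTO term is specified

def citation_span_attributes(citations):
    """Build span attributes from (reference_id, cito_term_or_None) pairs."""
    cites = " ".join(ref_id for ref_id, _ in citations)
    # One term per reference, in the same order as data-cites.
    citos = " ".join(term or DEFAULT_CITO for _, term in citations)
    return (
        f'class="citation" data-cites="{escape(cites)}" '
        f'data-citos="{escape(citos)}"'
    )

attrs = citation_span_attributes([("YuJbg3zO", "agreesWith"), ("126Wi5Us4", None)])
print(f"<span {attrs}>...</span>")
```

A position-aligned list only works if each citation carries at most one term; multiple terms per citation would need a different separator scheme.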

> Not sure if I'd prefer the suffix. I like @agrees_with:doi:10.1371/journal.pcbi.1007128 for being somewhat human readable even for multiple citations.

No hard opinion on my end. My motivation is to do what will be most compatible with the Pandoc syntax and design. Including the cito term in the citation key would mean that it might be easier for our pandoc-manubot-cite filter to add CiTO support, since we're already extracting and updating citation keys. We'd just need to strip the CiTO portion and then see if there is a way to put that in the HTML output.
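A minimal sketch of that stripping step, assuming the CiTO property is only recognized when it comes from a known list, so that Manubot prefixes like `doi` are never mistaken for a CiTO term. The set below is a small illustrative subset in the snake_case form used in this thread, and `strip_cito` is a hypothetical helper, not part of pandoc-manubot-cite.

```python
# Illustrative subset only; a real list would come from the CiTO ontology.
KNOWN_CITO_PROPERTIES = {
    "agrees_with",
    "disagrees_with",
    "cites_as_evidence",
    "uses_method_in",
}

def strip_cito(citation_key):
    """Return (cito_property_or_None, citation_key_without_cito)."""
    head, sep, tail = citation_key.partition(":")
    if sep and tail and head in KNOWN_CITO_PROPERTIES:
        return head, tail
    # No recognized CiTO property: leave the key untouched.
    return None, citation_key

for key in [
    "agrees_with:doi:10.1371/journal.pcbi.1007128",  # CiTO + explicit prefix
    "agrees_with:10.1371/journal.pcbi.1007128",      # CiTO + inferred prefix
    "doi:10.1371/journal.pcbi.1007128",              # no CiTO term
]:
    print(strip_cito(key))
```

Checking against a known list sidesteps the ambiguity with inferred prefixes, at the cost of breaking if a CiTO property name ever collides with a Manubot identifier prefix.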

@Adafede

Adafede commented Nov 1, 2021

Any advances on the topic? Happy to crash test if needed

@Adafede

Adafede commented Nov 10, 2021

In case it might help:
biohackrxiv/bhxiv-gen-pdf#10

@Adafede

Adafede commented Nov 26, 2021

@egonw I had a look at it. It's nice because they also insert the CiTO annotations into the bibliography afterwards!

I was only able to make it run correctly with the short IDs generated by manubot to avoid clashes...

doi:12.345/678 -> short ID ABCD
foo bar [@doi:12.345/678] # so that ABCD is created in references.json
lorem ipsum [@agrees_with:ABCD] # works correctly
... [@agrees_with:doi:12.345/678] # does not work with any pattern

@larsgw any hints on this as you helped with the lua filters implementation?

@egonw

egonw commented Nov 27, 2021

[@agrees_with:doi:12.345/678] does not work because : is used to separate CiTO annotation from the key. So, here it finds two annotations (agrees_with and doi) and the BibTeX key 12.345/678, which does not exist in your database, I guess.
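The behavior described here can be mimicked in a few lines, under the assumption that everything before the last colon-separated part is treated as a CiTO annotation. This is a sketch of the described behavior rather than the filter's code; the function name and the one-entry `bibliography` are hypothetical, reusing the placeholder short ID ABCD from the earlier comment.

```python
def split_on_every_colon(citation_id):
    """Treat all but the last colon-separated part as CiTO annotations."""
    *annotations, key = citation_id.split(":")
    return annotations, key

# Hypothetical bibliography keyed by the short ID from the example above.
bibliography = {"ABCD": "entry for doi:12.345/678"}

for citation_id in ["agrees_with:ABCD", "agrees_with:doi:12.345/678"]:
    annotations, key = split_on_every_colon(citation_id)
    status = "found" if key in bibliography else "missing"
    print(citation_id, "->", annotations, key, status)
```

With the short ID, the key survives the split and is found; with the full DOI, `doi` is consumed as a second annotation and the remaining key has no match.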

@Adafede

Adafede commented Jul 1, 2022

Any news on this?
Did anyone manage to hack something together? 😊

larsgw added a commit to larsgw/lua-filters that referenced this issue Jul 1, 2022
Fix input parsing when the second part includes a colon.

See manubot/rootstock#420 (comment)
@agitter
Member Author

agitter commented Jul 1, 2022

On the Manubot side, no one has been actively working on this. We like the idea but don't have a lot of time for developing new features right now.

tarleb pushed a commit to pandoc/lua-filters that referenced this issue Jul 1, 2022
Fix input parsing when the second part includes a colon.

See manubot/rootstock#420 (comment)
tarleb pushed a commit to pandoc-ext/cito that referenced this issue Aug 24, 2022
Fix input parsing when the second part includes a colon.

See manubot/rootstock#420 (comment)