Add cleanup action for "LaTeX to LaTeX aware Unicode" #8715

Open
JasonGross opened this issue Apr 23, 2022 · 2 comments
@JasonGross

Problem:

  • There is no cleanup action that converts (old) bibliographic data still formatted in LaTeX with non-Unicode characters into Unicode-aware LaTeX formatting, even though current LaTeX setups (e.g. recent LaTeX2e releases) can read most Unicode characters directly.
  • The current workaround is to convert from LaTeX to Unicode and then back to LaTeX, while manually checking whether any characters were converted wrongly. This is inefficient and takes a long time.

Desired Solution:

  • Create a cleanup action for "LaTeX to Unicode aware LaTeX".

Example workflow:

  1. Have the following entry (BEFORE using the cleanup action):

    @Article{Testkey,
      author   = {Testauthor},
      title    = {Bibliographic data that can be read by LaTeX engines},
      a = {Here is a backslashed percentage sign \% and it should be excluded from conversion},
      b = {Here is a \textcopyright{} and it should be converted to Unicode}, 
    }
    

    (Comment: with UTF-8 input handled by the inputenc package, LaTeX reads © as \textcopyright{}, so the two forms are equivalent. The LaTeX to Unicode aware LaTeX cleanup action should therefore convert \textcopyright{} to ©.)

  2. Use cleanup action "LaTeX to Unicode aware LaTeX"

  3. AFTER using the cleanup action, the following result should emerge:

    @Article{Testkey,
      author   = {Testauthor},
      title    = {Bibliographic data that can be read by LaTeX engines},
      a = {Here is a backslashed percentage sign \% and it should be excluded from conversion},
      b = {Here is a © and it should be converted to Unicode}, 
    }
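
The expected behaviour of the example workflow can be sketched in a few lines of Python. This is only an illustration, not how JabRef implements cleanup actions: the function name `latex_to_unicode_aware_latex` and the three-entry table are made up for this example, and only commands written as `\name{}` are handled. Escapes such as `\%` stay untouched simply because they are not in the table.

```python
import re

# Hypothetical, deliberately tiny table; a real cleanup action would need far more entries.
COMMAND_TO_UNICODE = {
    "textcopyright": "\u00a9",   # ©
    "textregistered": "\u00ae",  # ®
    "textsection": "\u00a7",     # §
}

# One token is either a control word closed by an empty group (\name{})
# or a backslash plus the single character it escapes (\%, \\, ...).
TOKEN = re.compile(r"\\(?:([A-Za-z]+)\{\}|.)", re.DOTALL)


def latex_to_unicode_aware_latex(value: str) -> str:
    """Replace known \\name{} commands with Unicode and keep everything else as is."""

    def replace(match):
        # group(1) is None for single-character escapes, so they fall through unchanged.
        return COMMAND_TO_UNICODE.get(match.group(1), match.group(0))

    return TOKEN.sub(replace, value)
```

Applied to the two fields above, `a` comes back unchanged and the `\textcopyright{}` in `b` becomes ©. Consuming `\\` as one token keeps a line break directly followed by a command name from being mistaken for that command.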
    

"Special Symbols" that would need to be excluded from conversion:

  • The list should be similar to the symbols mentioned in "Add integrity check for LaTeX special characters" (#8712).
  • At the very least, it should cover Table 1 on page 15, which lists the escapable special characters in LaTeX.
  • Maybe also Table 2 on page 15 and Table 3 on page 16.
  • There might be a lot more, but I am not knowledgeable enough to list them here. If you know of any, just post them in this thread.
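
To make the exclusion list concrete, here is a rough Python sketch that finds the escapes a conversion would have to leave alone. It assumes that Table 1 consists of the seven escapes `\$ \% \_ \} \& \# \{`; the names `ESCAPABLE_SPECIALS` and `find_protected_escapes` are invented for this example, and Tables 2 and 3 are not covered.

```python
import re
from typing import List

# Assumption: the escapable special characters of Table 1 are $ % _ } & # {
ESCAPABLE_SPECIALS = frozenset("$%_}&#{")

# A backslash plus the single character that follows it. Scanning left to right
# consumes "\\" as one unit, so the "%" in "\\%" is not reported as escaped.
ESCAPE = re.compile(r"\\.", re.DOTALL)


def find_protected_escapes(value: str) -> List[str]:
    """Return, in order, every escaped special character found in a field value."""
    return [tok for tok in ESCAPE.findall(value) if tok[1] in ESCAPABLE_SPECIALS]
```

For instance, calling it on `50\% of \$10 \& more` should report the three escapes `\%`, `\$` and `\&`, while a control word such as `\textcopyright{}` is not reported.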

Additional Information

  • When working on this, The Comprehensive LaTeX Symbol List will be of help, especially the chapters about "Unicode" (page 272) and "Special Characters" (pages 15-16).
  • JabRef currently uses https://github.com/tomtung/latex2unicode; maybe it can be adapted internally in JabRef (e.g. with some pre-processing). Another solution would be to fork it or to ask tomtung about creating a LaTeX2UnicodeAwareLaTeX converter.
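
The pre-processing idea could look roughly like the Python sketch below: hide the protected escapes behind placeholder characters, run any existing LaTeX to Unicode converter, then put the escapes back. Everything here is an assumption made for illustration. `convert_with_protection` is a made-up name, the exclusion list is the seven escapes `\$ \% \_ \} \& \# \{`, and `toy_converter` is only a stand-in for a real converter such as latex2unicode, whose actual API is not used here.

```python
import re
from typing import Callable, List

SPECIALS = frozenset("$%_}&#{")  # assumed exclusion list
ESCAPE = re.compile(r"\\.", re.DOTALL)
# Private-use code points serve as placeholders; they should not occur in normal data.
PLACEHOLDER = re.compile("[\ue000-\uf8ff]")


def convert_with_protection(value: str, convert: Callable[[str], str]) -> str:
    """Run `convert` on `value` while keeping escaped special characters intact."""
    if PLACEHOLDER.search(value):
        raise ValueError("value already contains private-use characters")
    saved: List[str] = []

    def mask(match):
        token = match.group(0)
        if token[1] not in SPECIALS:
            return token  # not protected, let the converter handle it
        saved.append(token)
        return chr(0xE000 + len(saved) - 1)

    converted = convert(ESCAPE.sub(mask, value))
    # Restore each placeholder to the escape it replaced.
    return PLACEHOLDER.sub(lambda m: saved[ord(m.group(0)) - 0xE000], converted)


def toy_converter(text: str) -> str:
    """Stand-in converter: knows one command and naively drops all other backslashes."""
    return text.replace(r"\textcopyright{}", "\u00a9").replace("\\", "")
```

On its own, `toy_converter` would turn `\%` into a bare `%`, which LaTeX then treats as the start of a comment. Wrapped in `convert_with_protection`, the escape survives while `\textcopyright{}` is still converted. The sketch assumes the wrapped converter passes private-use characters through unchanged.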

Originally posted by @ThiloteE in #8490 (comment)

@ThiloteE added the cleanup-ops and unicode labels on Apr 23, 2022
@zkl-ai
Contributor

zkl-ai commented Apr 30, 2022

Hello, can I take this issue? I have done something related to cleanup actions.

@ThiloteE
Member

Sure you can!

Status: Free to take
Status: Normal priority