Warning should be displayed when using illegal/unescaped characters in bibtex fields (e.g., %, #, &) #1188
Comments
In this case, it is an issue in JabRef 2.1 (from August 9th, 2006), which we will not fix as we focus on JabRef 3.x. 😇
To clarify @koppor: I used Ver 2.1 because it is a version I still have installed. While I was checking the search delay issue, the message appeared, requiring me to remove the # before saving. My thought was "WHY DOES AN OLDER VERSION PICK UP A BibTeX file ERROR" while the current one does not! The older version isn't the problem; it is IDENTIFYING AN ERROR in the bib file that the current version accepts. I am no expert, but I assume it may cause issues if you attempt to use the citation in a LaTeX paper. Lots of features are being removed, and I wondered if parsing the bib file for errors was one that should be retained. Maybe BibLaTeX handles non-BibTeX characters, so the check has been intentionally removed from the current versions. I hope someone who knows can identify whether this is a problem in Ver 3.2 and JabRef_windows-x64_3_3dev--snapshot--2016-04-05--fast-search--e0380b7, or an intentional removal of code.
@ajbelle Please write me a personal email stating the features you miss. I personally fight for keeping all available issues and support everyone wanting to add an issue. The only thing I currently see being removed and really affecting someone is #496. I know that other removals are listed at https://github.com/JabRef/jabref/blob/master/CHANGELOG.md, but does any of them affect you? Everything else seems to be issues being raised because of other issues (affecting other users).

I made a minimal example:
The error is
Another example:
Result:
So, you are right: JabRef allows saving files that pdflatex cannot process.
Is this something for our integrity check? Should the integrity check be run "on save"?
THX @koppor, I hoped someone clever could see if what I noticed was a problem. I am never sure it isn't my old brain.
Could this change be related to the above issue, and possibly part of the solution given – is HTML for – It would be good if the integrity check could make sure all 'unacceptable characters' are eliminated/converted rigorously.
Correct HTML is I'm surprised to see that JabRef now allows saving fields with a single # in it. This was definitely not allowed in earlier versions and I do not really know when this bug was introduced. If you run the HTML to LaTeX converter it should be replaced with Another interesting aspect is that if you happen to have two of these, say |
Regarding %, I am not convinced that it should be automatically escaped. Apart from #, which JabRef deals with explicitly, what is written in the field is LaTeX code. If you had written – in a text editor you would have got exactly that behaviour. JabRef cannot check that you don't write

With that said, I still believe that JabRef should have functionality to automatically escape & etc., just not blindly on save. There may be cases where I actually want to have an & in my file, say, if there is a tabular in the abstract. Note that the errors for @koppor are LaTeX errors, not BibTeX errors, similar to whatever bad LaTeX code you might have written in your entries. (Btw, the file saved with the
Regarding LaTeX, all reserved chars should be escaped:
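For readers following along, here is a minimal Python sketch of what such escaping could look like. It is not JabRef's code; the function name `escape_reserved` is made up for illustration. LaTeX's reserved characters are `# $ % & ~ _ ^ \ { }`, but the sketch only handles the five that take a plain backslash escape (`# $ % & _`) and leaves characters that are already escaped alone. As noted earlier in the thread, fields contain LaTeX code, so escaping blindly can be wrong (math mode needs `$`, a tabular needs `&`).

```python
import re

# A run of backslashes followed by one of the characters we escape.
# Assumption for this sketch: only # $ % & _ are handled.
_RESERVED = re.compile(r"(\\*)([#$%&_])")


def escape_reserved(value: str) -> str:
    """Backslash-escape # $ % & _ unless they are already escaped."""

    def repl(match: re.Match) -> str:
        backslashes, char = match.group(1), match.group(2)
        # An odd number of preceding backslashes means the character
        # is already escaped, so the text is returned unchanged.
        if len(backslashes) % 2 == 1:
            return match.group(0)
        return backslashes + "\\" + char

    return _RESERVED.sub(repl, value)


print(escape_reserved("Smith & Jones, 50% of #1"))
```

Running the function twice gives the same result as running it once, which matters if it were ever applied on every save.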
To clarify, as a user, I do not input HTML codes and ligatures on purpose. They get pulled in with the various imports, often without my awareness. THX for the HTML to LaTeX converter hint @oscargus. The Ver 2.1 behaviour that identified the offending character for manual intervention was excellent, making it obvious to me as a user what the problem was and where to fix it without RTFM. Where it can't be made automatic, can JabRef highlight possible problem characters, allowing the user to decide? (A Regex Search entry, written by someone smarter than me, is all I would need.) Not the issue discussed, but related in a user's mind as extra characters you have to deal with: issue #1153 means I have many junk characters in some imports, and it would be nice to reverse their utf8-ASCII conversion on a per entry basis using
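The kind of regex search asked for here could look roughly like the Python sketch below. The helper name `find_unescaped` is hypothetical, and the set of characters (`#`, `%`, `&`) is taken from the issue title rather than from any JabRef feature. It only reports positions so that the user can decide what to do with each hit.

```python
import re

# Any run of backslashes directly followed by #, % or &.
_SUSPECT = re.compile(r"(\\*)([#%&])")


def find_unescaped(value: str) -> list[tuple[int, str]]:
    """Return (index, character) for each unescaped #, % or & in value."""
    hits = []
    for match in _SUSPECT.finditer(value):
        # An even number of preceding backslashes (including zero)
        # means the character itself is not escaped.
        if len(match.group(1)) % 2 == 0:
            hits.append((match.start(2), match.group(2)))
    return hits


for index, char in find_unescaped(r"R&D grew 50% \& more"):
    print(f"unescaped {char!r} at position {index}")
```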
@Siedlerchr: No. Obvious from the example

@ajbelle Are the imports JabRef searches, or BibTeX files provided to you by colleagues and websites? For the first case, it would be good to know about it, as we should provide conversion automatically. For the second case, one can think of having general import converters. The idea in the later versions of JabRef, though, is to implement this as "save actions", i.e., conversions/clean-ups that are always applied on saving so that, when activated, HTML-encoded characters are never saved but converted to LaTeX sequences. Clearly, JabRef could warn about characters that are unlikely to be what the user actually wants, such as a single #, unescaped & and %, etc. It is not, I believe, in the general case possible to figure out which two consecutive 8-bit characters actually should be combined to a 16-bit character. I see the point though and it would be nice if it did work...
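To illustrate the "save action" idea of converting HTML-encoded characters to LaTeX sequences, here is a small Python sketch. It is an illustration only, not JabRef's converter: the function name `html_entities_to_latex` and the tiny mapping table are assumptions, and a real converter would need a far larger table. Entities without a mapping are simply decoded to their Unicode character.

```python
import html
import re

# Hypothetical, deliberately tiny mapping from decoded characters
# to LaTeX sequences.
LATEX_FOR_CHAR = {
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "\u2013": "--",   # en dash
    "\u2014": "---",  # em dash
}

# Named (&amp;), decimal (&#35;) and hexadecimal (&#x23;) entities.
_ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def html_entities_to_latex(value: str) -> str:
    """Replace each HTML entity with a LaTeX sequence where one is known."""

    def repl(match: re.Match) -> str:
        decoded = html.unescape(match.group(0))
        # Unknown entity names come back unchanged from html.unescape.
        return LATEX_FOR_CHAR.get(decoded, decoded)

    return _ENTITY.sub(repl, value)


print(html_entities_to_latex("pp. 10&ndash;20, Smith &amp; Jones"))
```

Only complete entities are touched, so a field that already contains `\&` is left as it is.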
@oscargus Oh yes, you are correct about automatic reverse conversion to 16-bit. Could it be coded for a user-selected character sequence, as manual correction requires you to know what the symbol should be? There are usually only a few mashed characters. A suggested feature, requiring no reply, as I doubt the team has time for such a specialist feature given the current philosophy.

@oscargus All of my offending imports are from downloaded .bib files, from reputable sources that should know better. I am amazed at the entry content and formatting served up as BibTeX, and see JabRef as offering an authoritative implementation of the BibLaTeX standard. Specific to the 16-bit encoding issue, see #1153. I have tried to reset the encodings before import, but it doesn't always seem to work on my Windows box. A cut and paste does not suffer the encoding translation problem. I could be making a mistake, but I am sure everyone at some time encounters this issue.
It should be quite feasible to, say, mark two characters, right-click and with some magic ( In the latest master there is an integrity checker that checks for an odd number of unescaped Both these might come in a master build in the near future. I'll update here if it happens. I'm surprised (but somehow not) to hear that bad .bib entries are produced by knowledgeable sources. If you happen to search directly from such a source in JabRef, just let us know and we'll at least add automatic conversion there.
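A check for an odd number of unescaped `#` signs could be sketched in Python as follows. This is not the checker in JabRef master, and the function names are invented for illustration. The reasoning is that JabRef wraps BibTeX string references in a pair of `#` signs, so an odd count suggests a stray one.

```python
import re

# A run of backslashes directly followed by a # sign.
_HASH = re.compile(r"(\\*)#")


def count_unescaped_hashes(value: str) -> int:
    """Count # signs not escaped by an odd number of backslashes."""
    return sum(
        1 for match in _HASH.finditer(value) if len(match.group(1)) % 2 == 0
    )


def has_odd_hash_count(value: str) -> bool:
    """True when the field holds an odd number of unescaped # signs."""
    return count_unescaped_hashes(value) % 2 == 1


for field in ["Issue #1188", "#jan# 2016", r"Issue \#1188"]:
    print(field, "->", "warn" if has_odd_hash_count(field) else "ok")
```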
The current master has checkers for an odd number of #-signs in a field and for any HTML characters, which should help in many of the cases, if nothing else. I've also implemented the conversion of two 8-bit characters to one 16-bit character, but after reading up on it, it seems to be applicable only to UTF-16. Do you have any example string that I can try it on? Writing dd gives 摤 which is correct (
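The two kinds of repair touched on here can be sketched in Python. The function names are hypothetical and this is not the JabRef implementation. The first function pairs up 8-bit characters as big-endian UTF-16 code units, which matches the `dd` example (two bytes 0x64 give U+6464). The second assumes the more common corruption, where UTF-8 bytes were read as Cp1252, and undoes it by re-encoding; that only works while no bytes were lost along the way.

```python
def merge_byte_pairs(text: str) -> str:
    """Treat each pair of 8-bit characters as one UTF-16 (big-endian) unit."""
    raw = text.encode("latin-1")  # fails if a character is above U+00FF
    if len(raw) % 2 != 0:
        raise ValueError("need an even number of 8-bit characters")
    return raw.decode("utf-16-be")


def undo_utf8_read_as_cp1252(text: str) -> str:
    """Reverse mojibake from UTF-8 bytes that were decoded as Cp1252."""
    return text.encode("cp1252").decode("utf-8")


# "dd" is two bytes 0x64 0x64, which pair up to U+6464.
print(merge_byte_pairs("dd") == "\u6464")

# A degree sign saved as UTF-8 (bytes C2 B0) and read as Cp1252 shows
# up as two characters; re-encoding restores the single character.
print(undo_utf8_read_as_cp1252("\u00c2\u00b0") == "\u00b0")
```

Both functions raise an exception when the input does not fit the assumed corruption, which is preferable to silently producing more garbage.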
And for example string: just copy from the field editor in JabRef and paste here. That should work. |
@oscargus I think you have it understood (better than me anyway). Since you said copy and paste I enclose examples from my file. I am not sure if this is what you wanted. My encoding is set to utf-8 at all times in all editors. I do have entries from Endnote file import (original sources unknown) into various versions of JabRef. Due to their size and deeper coverage (including maths) the Abstracts are normally the problem. Embedded # have been a common annoyance (JabRef wouldn't save), along with the unbalanced {} which JabRef picks up. As previously mentioned ° came in as ° which is very annoying for me given I use it regularly as a special file marker. I just retested import of utf-8 to utf-8 and it worked perfectly! Previously nothing I did seemed to get it in unchanged. Maybe it is a setting save glitch. Punctuation: Maths symbols: OCD pdf due to fi, ligature http://ilovetypography.com/2007/09/09/decline-and-fall-of-the-ligature/ Basically any character beyond the 8bit encoding as the following indicate Accented characters |
What is the status of this issue? Can it be closed? What else is missing?
What are the first two lines of the bib file? Could it be that the encoding information is wrong there?
@koppor if your question is to me: I always have JabRef set to UTF-8 and try to ensure the input file is UTF-8 (except when trying to figure out what is wrong). The first line in my files is:

The garbage content posted is a function of importing from many sources (including content obviously from PDF OCR) and is not necessarily a single translation fault. It is simply a list of the sort of garbage JabRef has to deal with, which could/will cause problems. The list is not supposed to represent the original problem I identified, which has been fixed per @tobiasdiez.

I have experienced situations where changing the encoding format to UTF-8 with Notepad++/Notepad2/CrimsonEditor... didn't seem to correct the import translation problem (but cut and paste on my Win7 box was fine!). I didn't have time to work out with certainty whether the PFEs were actually changing the format, so I did not report it as an issue with JabRef. While I changed the file format with the PFEs, I did not change the encoding line in the file, so I guess JabRef was being told it was CP1252 when in fact I had converted it to UTF-8. If true, that would finally explain why I could never make it import without character corruption, and maybe a clarification should be added to a help file somewhere. Sorry if I have missed it in the manual.

The bottom line from my experience is that all users at some stage will get this sort of character corruption, and if JabRef could help in any way that would be great. Reversing the one-way translation isn't usually possible, I understand. IMHO @oscargus understands the situation best, and I believe he thinks everything that can be implemented has been. I have therefore closed the issue, on the understanding that no-one can see how to strengthen JabRef further. I am not sure if I am/was expected to close it.
JabRef version <3.2 and 2.1> on <Windows 7>
Steps to reproduce:
It may not be a fault in BibLaTeX which I have selected, but I report it in case it indicates that the BibTeX save parsing algorithm is no longer working/deleted in ver3.2 and later.