Support all BibLaTeX date formats for parsing #2753

Closed
LinusDietz opened this issue Apr 16, 2017 · 21 comments · Fixed by #10511
Labels
bib(la)tex, entry-editor, FirstTimeCodeContribution, good first issue, import

Comments

@LinusDietz
Member

LinusDietz commented Apr 16, 2017

This would be a follow-up to #2731, raised in #2731 (comment) by @Siedlerchr:

  • Provide complete biblatex support for parsing dates

A PR would have to investigate the possible date patterns from the standard and then implement them, along with some tests.

@tobiasdiez
Member

Further examples that should be parsed successfully:
November 2013 and January 31 2005
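A minimal sketch of how these two examples could be recognized with `java.time`, assuming two additional patterns, `MMMM d uuuu` and `MMMM uuuu`, that are not in JabRef's current list. For simplicity it tries each hypothetical pattern in turn (instead of chaining them with `appendOptional` as JabRef's `Date` class does) and uses `parseBest` to keep the precision of the input: a full date becomes a `LocalDate`, month plus year a `YearMonth`. The class name `MonthNameDateSketch` is made up for illustration.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class MonthNameDateSketch {
    // Hypothetical patterns for "January 31 2005" and "November 2013", longest first
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("MMMM d uuuu", Locale.US),
            DateTimeFormatter.ofPattern("MMMM uuuu", Locale.US));

    static Optional<TemporalAccessor> parse(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                // parseBest returns the first query that succeeds, so the most precise type comes first
                return Optional.of(format.parseBest(text, LocalDate::from, YearMonth::from));
            } catch (DateTimeException e) {
                // This pattern does not match; try the next one
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("November 2013"));   // Optional[2013-11]
        System.out.println(parse("January 31 2005")); // Optional[2005-01-31]
    }
}
```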

@Siedlerchr
Member

Some examples from the biblatex manual, Section 2.3.8.


Biblatex follows the EDTF standard.

@wujastyk

wujastyk commented Jul 20, 2017

I don't know if this report belongs here under #2753, because I think it's not about date parsing as such, but about the entry editor. But here goes:

Biblatex permits strings in the date field of the form "1899~", "1899?" and even "1899?~". The current snapshot of JabRef doesn't allow such strings to be entered in the "Required fields" tab: if one tries, the content of the field disappears. However, one can enter the strings in the "biblatex source" tab, and the data is then preserved and even displayed in the "Required fields" tab.

Desired behaviour: allow the entry editor's "Required fields" tab to accept "date" field strings in all legal Biblatex formats, including Biblatex's "Enhanced date specifications" as listed in the Biblatex manual, table 5 (as well as table 3, mentioned in the previous post by @Siedlerchr, and table 4, "unspecified date parsing").

Best,
Dominik

[Screenshot from 2017-07-20 10-59-34]

JabRef 4.0-dev--snapshot--2017-07-19--master--2f12fecfb
Linux 4.8.0-58-generic amd64
Java 1.8.0_131

@crystalfp

crystalfp commented Sep 17, 2017

JabRef 4.0-beta3
Windows 10 10.0 amd64
Java 1.8.0_131
Check integrity complains about this date field: 2008-07/2008-08 (JabRef mode: biblatex)
It is a journal article published in the July/August 2008 issue.
The complete entry is:

@Article{Carr2008,
  author       = {Nicholas Carr},
  title        = {Is Google Making Us Stupid?},
  journaltitle = {The Atlantic},
  date         = {2008-07/2008-08},
  url          = {http://www.theatlantic.com/magazine/archive/2008/07/is-google-making-us-stupid/306868/},
}
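As a rough sketch (not JabRef's actual code), a range like `2008-07/2008-08` could be split at the `/` and each side parsed at whatever precision it has. The names `DateRangeSketch`, `DateRange` and `parseRange` are hypothetical, and treating an empty side such as `2015/` as open-ended is an assumption about how such ranges are written.

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Optional;

public class DateRangeSketch {
    // Extended format with optional month and day: 2008, 2008-07, 2008-07-15
    private static final DateTimeFormatter REDUCED_PRECISION = DateTimeFormatter.ofPattern("uuuu[-MM[-dd]]");

    record DateRange(TemporalAccessor start, Optional<TemporalAccessor> end) {}

    static TemporalAccessor parseSingle(String text) {
        // parseBest returns the first query that succeeds, so the most precise type comes first
        return REDUCED_PRECISION.parseBest(text, LocalDate::from, YearMonth::from, Year::from);
    }

    static DateRange parseRange(String text) {
        // limit -1 keeps a trailing empty string, so "2015/" yields ["2015", ""]
        String[] sides = text.split("/", -1);
        if (sides.length != 2) {
            throw new IllegalArgumentException("Not a date range: " + text);
        }
        Optional<TemporalAccessor> end = sides[1].isEmpty()
                ? Optional.empty()
                : Optional.of(parseSingle(sides[1]));
        return new DateRange(parseSingle(sides[0]), end);
    }

    public static void main(String[] args) {
        DateRange range = parseRange("2008-07/2008-08");
        System.out.println(range.start() + " to " + range.end().orElseThrow()); // 2008-07 to 2008-08
    }
}
```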

Thanks!
mario

@moewew

moewew commented Dec 5, 2017

Please note that the EDTF draft standard has been superseded by the upcoming revision of ISO8601, so biblatex will cease to support the EDTF format and will move to ISO8601. biblatex 3.9 already ships a deprecation notice for EDTF, and the next release (due in a few days/weeks) will support the new ISO8601 syntax and drop support for EDTF.
The changes are backwards incompatible, but we hope that they are niche enough not to cause too many problems. Only a few rarely used bits of EDTF are not ISO8601 compatible.
Since the new ISO8601 has not been released yet, things may still have to change later on, possibly at short notice from the biblatex/Biber side.

See https://github.com/plk/biblatex/wiki

@Siedlerchr
Member

Siedlerchr commented Dec 5, 2017

@moewew Thank you very much for the information, it helps a lot. I suggest we make JabRef compatible.
Here is the current draft: ISO8601-2 Clause 4, Level 1 Extended Format
https://www.loc.gov/standards/datetime/ISO_DIS%208601-2.pdf

Summary:

4.1.1 Extended format
For features described in this part of ISO 8601, Clause 4, only the extended format (YYYY-MM-DD) SHALL BE used. Basic format (YYYYMMDD) SHALL NOT be used.

The character '?' (question mark) is used to mean "uncertain". The character '~' (tilde) is used to mean "approximate". The character '%' (percent) is used to mean "both uncertain and approximate".
"Uncertain" and/or "approximate" may apply to full representations as well as representations with reduced precision.
4.2.1 Level 1
For level 1, '?', '~', or '%' may only occur at the end of the date string, and it applies to the entire date.
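For Level 1 this suggests a simple approach: look only at the last character, record the qualifier, and hand the rest of the string to the normal date parser. A minimal sketch with hypothetical names (`QualifierSketch`, `Qualifier`, `QualifiedDate`); how JabRef would actually store the qualifier is left open.

```java
public class QualifierSketch {
    enum Qualifier { NONE, UNCERTAIN, APPROXIMATE, UNCERTAIN_AND_APPROXIMATE }

    record QualifiedDate(String date, Qualifier qualifier) {}

    // Level 1: '?', '~' or '%' may only appear at the end and applies to the whole date
    static QualifiedDate splitQualifier(String input) {
        if (input.isEmpty()) {
            return new QualifiedDate(input, Qualifier.NONE);
        }
        Qualifier qualifier = switch (input.charAt(input.length() - 1)) {
            case '?' -> Qualifier.UNCERTAIN;
            case '~' -> Qualifier.APPROXIMATE;
            case '%' -> Qualifier.UNCERTAIN_AND_APPROXIMATE;
            default -> Qualifier.NONE;
        };
        // Strip the qualifier so the remaining string can go to the regular date parser
        String date = qualifier == Qualifier.NONE ? input : input.substring(0, input.length() - 1);
        return new QualifiedDate(date, qualifier);
    }

    public static void main(String[] args) {
        System.out.println(splitQualifier("1899?"));    // QualifiedDate[date=1899, qualifier=UNCERTAIN]
        System.out.println(splitQualifier("2004-06~")); // QualifiedDate[date=2004-06, qualifier=APPROXIMATE]
    }
}
```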

4.3 Unspecified
The character 'X' may be used as a replacement character, in place of a digit to indicate that the value of that digit is unspecified.
4.3.1 Level 1
The replacement character ‘X’ may be substituted for the right-most digits in the following cases:

Year and month specified, day unspecified.

Year specified, day and month unspecified.

Entire date unspecified.
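The Level 1 rule for 'X' maps fairly naturally onto reduced precision: a trailing all-X component just means that component is not known. A minimal sketch under that reading, with a hypothetical `parseWithUnspecified` helper; it only handles whole components in the extended format (no partially unspecified years, no negative years) and returns an empty `Optional` when the entire date is unspecified.

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.temporal.Temporal;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class UnspecifiedDateSketch {
    static Optional<Temporal> parseWithUnspecified(String date) {
        List<String> specified = new ArrayList<>();
        boolean sawUnspecified = false;
        for (String component : date.split("-")) {
            if (component.matches("X+")) {
                sawUnspecified = true;
            } else if (sawUnspecified) {
                // Level 1 only allows X for the right-most digits, so 2004-XX-11 is rejected
                throw new IllegalArgumentException("Specified component after unspecified one: " + date);
            } else {
                specified.add(component);
            }
        }
        if (specified.isEmpty()) {
            return Optional.empty(); // entire date unspecified
        }
        String reduced = String.join("-", specified);
        Temporal parsed = switch (specified.size()) {
            case 1 -> Year.parse(reduced);      // year specified, month and day unspecified
            case 2 -> YearMonth.parse(reduced); // year and month specified, day unspecified
            case 3 -> LocalDate.parse(reduced); // fully specified
            default -> throw new IllegalArgumentException("Too many components: " + date);
        };
        return Optional.of(parsed);
    }

    public static void main(String[] args) {
        System.out.println(parseWithUnspecified("2004-06-XX")); // Optional[2004-06]
        System.out.println(parseWithUnspecified("2004-XX-XX")); // Optional[2004]
        System.out.println(parseWithUnspecified("XXXX-XX-XX")); // Optional.empty
    }
}
```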

@moewew

moewew commented Dec 5, 2017

In case you want to read up on what PLK has done to implement these changes, a few comments are scattered in plk/biblatex#540, plk/biblatex#644, plk/biblatex#656. The dev branch of biblatex has the changes for ISO8601 applied, so you can have a look at the documentation and change files there.

@LinusDietz
Member Author

LinusDietz commented Dec 8, 2017

We decided in the dev call that we will follow the biblatex team and use the new ISO8601 standard without backwards compatibility with EDTF. https://github.com/plk/biblatex/wiki

Additionally, there is an EBNF grammar in Annex A of the spec document:
https://www.loc.gov/standards/datetime/ISO_DIS%208601-2.pdf

@koppor koppor added the PE1718 label Dec 16, 2017
@stefan-kolb stefan-kolb added import and removed parsing labels Feb 6, 2018
@koppor koppor added the good first issue label and removed the PE1718 label Mar 9, 2018
@LinusDietz
Member Author

this doesn't make sense @LinusDietz

@wujastyk

This came up again here: #7864

@Siedlerchr Siedlerchr mentioned this issue Feb 24, 2022
@Siedlerchr
Member

To solve this issue, one needs to add new patterns to JabRef's Date class and write tests.

static {
    List<String> formatStrings = Arrays.asList(
            "uuuu-MM-dd'T'HH:mm:ss[xxx][xx][X]", // covers 2018-10-03T07:24:14+03:00
            "uuuu-M-d",                          // covers 2009-1-15
            "uuuu-M",                            // covers 2009-11
            "d-M-uuuu",                          // covers 15-1-2012
            "M-uuuu",                            // covers 1-2012
            "M/uuuu",                            // covers 9/2015 and 09/2015
            "M/uu",                              // covers 9/15
            "MMMM d, uuuu",                      // covers September 1, 2015
            "MMMM, uuuu",                        // covers September, 2015
            "d.M.uuuu",                          // covers 15.1.2015
            "uuuu.M.d",                          // covers 2015.1.15
            "uuuu",                              // covers 2015
            "MMM, uuuu");                        // covers Jan, 2020
    // Chain all patterns as optional sections of a single formatter
    SIMPLE_DATE_FORMATS = formatStrings.stream()
            .map(DateTimeFormatter::ofPattern)
            .reduce(new DateTimeFormatterBuilder(),
                    DateTimeFormatterBuilder::appendOptional,
                    (builder, formatterBuilder) -> builder.append(formatterBuilder.toFormatter()))
            .toFormatter(Locale.US);
}

The order of the date patterns seems to matter as well, as I recently learned when extending the class to support timestamps: apparently the longest pattern needs to come first.
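The order effect can be reproduced in isolation. This sketch chains patterns with `DateTimeFormatterBuilder.appendOptional`, much like the `reduce(...)` in the snippet above, and compares two orders of `uuuu-M-d` and `uuuu-M`. Each optional section consumes what it can match and the next section continues from there, so when `uuuu-M` comes first it consumes `2009-1` and the remaining `-15` is reported as unparsed text. `PatternOrderSketch`, `chain` and `parses` are hypothetical names.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;

public class PatternOrderSketch {
    // Chains each pattern as an optional section, similar to the reduce(...) in the Date class
    static DateTimeFormatter chain(List<String> patterns) {
        DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder();
        for (String pattern : patterns) {
            builder.appendOptional(DateTimeFormatter.ofPattern(pattern));
        }
        return builder.toFormatter(Locale.US);
    }

    // True if the whole text is consumed and the parsed fields resolve without error
    static boolean parses(DateTimeFormatter formatter, String text) {
        try {
            formatter.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        DateTimeFormatter longestFirst = chain(List.of("uuuu-M-d", "uuuu-M"));
        DateTimeFormatter shortestFirst = chain(List.of("uuuu-M", "uuuu-M-d"));
        System.out.println(parses(longestFirst, "2009-1-15"));  // true
        // "uuuu-M" consumes "2009-1" and the second section cannot match "-15"
        System.out.println(parses(shortestFirst, "2009-1-15")); // false
    }
}
```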

See Section 2.3.9 of the biblatex manual http://mirrors.ibiblio.org/CTAN/macros/latex/contrib/biblatex/doc/biblatex.pdf

@calixtus
Member

JDK 19 feature:
https://bugs.openjdk.org/browse/JDK-8282277

@yanyanliu1400
Contributor

Hi, I would like to make an attempt at resolving this issue. Could it please be assigned to me?

I have followed the instructions and set up a local workspace.

I have also read the contribution guide.

It would be great if I could receive some advice/pointers on where to begin with this issue.

@ThiloteE
Member

@yanyanliu1400 I think you could start here:

public class Date {
    private static final DateTimeFormatter NORMALIZED_DATE_FORMATTER = DateTimeFormatter.ofPattern("uuuu[-MM][-dd]");
    private static final DateTimeFormatter SIMPLE_DATE_FORMATS;

    static {
        List<String> formatStrings = Arrays.asList(
                "uuuu-MM-dd'T'HH:mm[:ss][xxx][xx][X]", // covers 2018-10-03T07:24:14+03:00
                "uuuu-MM-dd'T'HH:m[:ss][xxx][xx][X]",  // covers 2018-10-03T17:2
                "uuuu-MM-dd'T'H:mm[:ss][xxx][xx][X]",  // covers 2018-10-03T7:24
                "uuuu-MM-dd'T'H:m[:ss][xxx][xx][X]",   // covers 2018-10-03T7:7
                "uuuu-MM-dd'T'HH[:ss][xxx][xx][X]",    // covers 2018-10-03T07
                "uuuu-MM-dd'T'H[:ss][xxx][xx][X]",     // covers 2018-10-03T7
                "uuuu-M-d",                            // covers 2009-1-15
                "uuuu-M",                              // covers 2009-11
                "d-M-uuuu",                            // covers 15-1-2012
                "M-uuuu",                              // covers 1-2012
                "M/uuuu",                              // covers 9/2015 and 09/2015
                "M/uu",                                // covers 9/15
                "MMMM d, uuuu",                        // covers September 1, 2015
                "MMMM, uuuu",                          // covers September, 2015
                "d.M.uuuu",                            // covers 15.1.2015
                "uuuu.M.d",                            // covers 2015.1.15
                "uuuu",                                // covers 2015
                "MMM, uuuu");                          // covers Jan, 2020
        // Chain all patterns as optional sections of a single formatter
        SIMPLE_DATE_FORMATS = formatStrings.stream()
                .map(DateTimeFormatter::ofPattern)
                .reduce(new DateTimeFormatterBuilder(),
                        DateTimeFormatterBuilder::appendOptional,
                        (builder, formatterBuilder) -> builder.append(formatterBuilder.toFormatter()))
                .toFormatter(Locale.US);
    }

    // ... (rest of the class omitted)
}

and address #2753 (comment)

Additional info: There was a proposal to use JDK 19 features, but JabRef is still using JDK 18, so this proposal can be ignored for now.

@GuyPuts
Contributor

GuyPuts commented Mar 16, 2023

I would like to take this issue.

@yanyanliu1400 yanyanliu1400 removed their assignment Mar 16, 2023
@ThiloteE
Member

Meanwhile, the newest development version of JabRef uses JDK 19. See d3bb827.

@Siedlerchr
Copy link
Member

Current state: the following date formats do not work yet. They need to be added and tested, and the parsing might need to be adjusted:

    "u G",                    // covers 1 BC
    "u G / u G",              // covers 30 BC / 5 AD
    "uuuu G / uuuu G",        // covers 0030 BC / 0005 AD
    "uuuu-MM G / uuuu-MM G",  // covers 0030-01 BC / 0005-02 AD
    "u'-'",                   // covers 2015-
    "u'?'",                   // covers 2023?
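One possible reason the era patterns misbehave: in `java.time` patterns, `u` is the proleptic year, while `G` (era) is combined with `y` (year-of-era) when fields are resolved. A minimal sketch, assuming `y G` instead of `u G` and a hypothetical `EraRangeSketch` class; in proleptic years 1 BC is year 0, so 30 BC comes out as -29. The month variants from the list would need analogous patterns.

```java
import java.time.Year;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class EraRangeSketch {
    // 'y' = year-of-era, resolved together with the era 'G';
    // Locale.US yields the era texts "BC" and "AD"
    private static final DateTimeFormatter YEAR_WITH_ERA = DateTimeFormatter.ofPattern("y G", Locale.US);

    static Year parseYearWithEra(String text) {
        return YEAR_WITH_ERA.parse(text.trim(), Year::from);
    }

    // Hypothetical helper for ranges written as "30 BC / 5 AD"
    static Year[] parseEraRange(String text) {
        String[] sides = text.split("/");
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected '<start> / <end>': " + text);
        }
        return new Year[] {parseYearWithEra(sides[0]), parseYearWithEra(sides[1])};
    }

    public static void main(String[] args) {
        Year[] range = parseEraRange("30 BC / 5 AD");
        System.out.println(range[0].getValue() + " to " + range[1].getValue()); // -29 to 5
    }
}
```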
         

@ThiloteE ThiloteE added the good first issue label and removed the good first issue label Oct 9, 2023
@XiaotongHe123
Contributor

Hi, I'm interested in this issue and would love to work on it. Can I get assigned to this issue?

@ThiloteE
Member

Sure, go ahead :-)

@ThiloteE ThiloteE added the FirstTimeCodeContribution label Oct 16, 2023
@github-actions
Contributor

As general advice for newcomers: check out Contributing for a start. Also, the guidelines for setting up a local workspace are worth having a look at.

Feel free to ask here at GitHub, if you have any issue related questions. If you have questions about how to setup your workspace use JabRef's Gitter chat. Try to open a (draft) pull-request early on, so that people can see you are working on the issue and so that they can see the direction the pull request is heading towards. This way, you will likely receive valuable feedback.

Projects
Archived in project