Datetimes. The draft ISO 8601-201x and EDTF. #656

Closed
JohnLukeBentley opened this issue Nov 21, 2017 · 31 comments

Comments

@JohnLukeBentley

JohnLukeBentley commented Nov 21, 2017

Intro

Biblatex formerly (between v3.5 and v3.8, inclusive) supported an "Extended Date/Time Format (EDTF)" specification (to "level 1") located at http://www.loc.gov/standards/datetime/pre-submission.html. A new International Organization for Standardization (ISO) draft specification has emerged that incorporates, with minor changes, that specification and "supersedes" it (thanks @moewew for flagging the change).

I've read the recent draft ISO and would like to clarify the differences between the old EDTF spec, former and current biblatex support, and the new draft ISO.

@plk indulge me addressing you in the third person even though you'll be the chief audience member in virtue of being the person who has implemented all this datetime stuff ...

@plk, with laudable enthusiasm, has gone ahead and started implementing the draft ISO for v3.9. @plk's partial implementation of the draft ISO spec has established that backwards compatibility is impossible (at least, and probably at most, with respect to open and unknown date ranges). And it seems that @plk has broken backwards compatibility in other ways on the basis that ....

[conforming to the old EDTF spec] syntax [was] too messy to do so and few people were using this anyway since it is a relatively new and specialised feature. https://github.com/plk/biblatex/blob/dev/doc/latex/biblatex/CHANGES.md

I think that's reasonable. However, I would have suggested holding off for now or, at least, having biblatex not break the old EDTF spec for now, given that the new draft ISO warns "it is subject to change without notice" and that, indeed, it is we who might very well want to lobby ISO to make such changes.

Therefore, for biblatex, given the option between:

  • Waiting until the draft ISO is ratified before making the jump from the old EDTF; and
  • Chasing the changing draft ISO until ratification.

... I suppose we are going to roll with the latter, having started down that path. That need not be too onerous: the ISO revisions are probably going to be slow and few. And I also suppose that I'm happy that @plk has been prepared to break backwards compatibility in all sorts of ways. For that better allows us to evaluate the new draft ISO as a general matter freed, to a greater extent, from parochial worries about biblatex.

Anyway I suggest the purposes of this thread be:

  • To clarify the former and current relationship between biblatex and the EDTF specs.
  • To see if the new draft ISO throws up biblatex specific problems for us.
  • To see if we, any of us that feel so inclined, want to participate in the ISO standard review process and lobby that community for changes to the draft ISO. That is, either because of biblatex specific issues; or just because we have become EDTF experts to some extent and are well placed to form views about an EDTF spec as a general matter.

And by "we" I mean to invite anyone who reads this post to participate. But I have in mind specific individuals who may find this of interest: @plk, @moewew, @retorquere, @njbart. ... but, as always, feel free not to engage this thread.

In this first post I largely confine myself to describing the issues. In the post to follow I'll make recommendations about those issues.

Semantics

To speak about this I'll use the following terms for the different standards/specifications:

  • "EdtfLoc", the draft EDTF specification hosted at the Library of Congress: http://www.loc.gov/standards/datetime/pre-submission.html. That is, the spec we've been going off until recently.
  • "EdtfIso201x", the EDTF profile as found in the draft ISO documents: ISO DIS 8601-201x Part 1; and ISO DIS 8601-201x Part 2 (as made available from http://www.loc.gov/standards/datetime/pre-submission.html). The current versions of these are dated 2016-10-26. Essentially what matters for EDTF purposes is Part 2 (that is, you could get away with only reading Part 2). Within Part 2, "Annex C" stipulates the EDTF profile. However, that profile references earlier parts of the Part 2 document.
  • "Iso201x", the full draft datetime specification as found in the above ISO documents. That is, ISO DIS 8601-201x Part 1 and ISO DIS 8601-201x Part 2.
  • "Iso2004", the old ISO ... ISO 8601:2004 https://www.iso.org/standard/40874.html. This standard doesn't stipulate an EDTF profile.
  • "IsoOrg". The International Organization for Standardization (ISO) itself. For the sake of making recommendations for that organisation.

As mentioned in a former version of biblatex.pdf ("2.3.8 Date and Time Specifications", p36) biblatex supported EdtfLoc levels 0 and 1. EdtfIso201x also uses the concept of levels but, naturally enough, what falls under those levels is (slightly) different with respect to EdtfLoc. A further subtle difference occurs between EdtfIso201x levels and Iso201x (the larger document) levels:

This [EDTF] profile specifies three levels: level 0, level 1, and level 2. Level 0 specifies features of ISO 8601 Part 1.
Levels 1 and 2 specify features of Part 2/level 1 and Part2/level 2 respectively. ("Annex C", Iso201x Part 2, p28)

All this is largely to make the point that we need to be specific when referencing levels. For example EdtfLoc level 1, EdtfIso201x level 1, and Iso201x level 1 are (slightly) different.

Note also that generally, unless exceptions are mentioned, in all of those standards or profiles support for a higher level entails support for a lower level. E.g. To claim that an application, like biblatex, supports "EdtfLoc level 1" would be to say that it supports "EdtfLoc level 0 and level 1" (because this is what the specs oblige of an implementer). I have no objection, however, to anyone explicitly enumerating level support (i.e. speaking of support for "EdtfLoc level 0 and level 1" rather than "EdtfLoc level 1"). Indeed it is probably clearer to be explicit.

The differences

By and large EdtfIso201x (and the overall Iso201x) looks well thought through (as was EdtfLoc) and feature complete. But the overarching issue of interest is: What are the relevant differences between EdtfLoc and EdtfIso201x?

Levels

The first difference is on the matter of levels. And the relevant issue is what EdtfIso201x level should biblatex, in the future, support?

Both EdtfLoc and EdtfIso201x categorize datetime formats into levels: 0, 1, 2. As mentioned above the levels in each spec are slightly different in terms of what they reference. However, there is a rough correspondence.

Biblatex formerly supported EdtfLoc level 1 (with a few extras). EdtfLoc level 2 seemed to target rare edge cases.

EdtfIso201x level 2 seems to target highly rare edge cases. For example,

Uncertain and/or approximate date ...
Level 2 ...
2004-06~-11
year and month are both approximate; day known
...
Unspecified ...
Level 2 ...
15XX-12-XX
Some day in December in some year during the 1500s

So both EdtfLoc level 2 and EdtfIso201x level 2 seem to target rare, or highly rare, edge cases; although there's a slight variation in what those edge cases are.

Unknown and open ranges

EdtfLoc supports "unknown" and "open" ranges at EdtfLoc Level 1 as follows:

  • 'unknown' may be used for the start or end date when it is unknown.
  • 'open' may be used when no end date is specified, either because there is none or for any other reason.
    ...

Examples

  • 2004-06-01/unknown
    beginning June 1, 2004, end unknown
  • 2004-01-01/open
    beginning January 1 2004 with no end date
    ...

(See http://www.loc.gov/standards/datetime/pre-submission.html#extendedintervall1)

EdtfIso201x supports "unknown" and "open" ranges (at EdtfIso201x Level 0) as follows:

  • Unknown
    Start or end date unknown. The start or end date may be left blank to indicate “unknown”.
  • Open Start or End
    Double-dot (..) may be used for the start/end date when there is no start/end date.

(See "4.4 Enhanced time interval", ISO DIS 8601-201x Part 2, p11).

Biblatex formerly supported "unknown" and "open" ranges as follows:

  • Unknown: "unknown", "*";
  • Open: "open", [blank]

In summary ...

|         | EdtfLoc   | EdtfIso201x | Biblatex (formerly) |
|---------|-----------|-------------|---------------------|
| Unknown | "unknown" | [blank]     | "unknown", "*"      |
| Open    | "open"    | ".."        | "open", [blank]     |

That's a problem. If biblatex is to transition to support EdtfIso201x (as currently specified) the meaning of blank, in the biblatex context, has to change from open to unknown.
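
To make the clash concrete, here is a minimal Python sketch of how the same interval string would be read under each convention in the summary above. The `CONVENTIONS` table and the function names are hypothetical, invented for illustration. This is not how biber or any EDTF library actually parses intervals, and the sketch does not validate the date side at all.

```python
from typing import Dict, Set, Tuple

# Endpoint markers per convention, copied from the summary table above.
# The empty string stands for a blank endpoint.
CONVENTIONS: Dict[str, Dict[str, Set[str]]] = {
    "EdtfLoc": {"unknown": {"unknown"}, "open": {"open"}},
    "EdtfIso201x": {"unknown": {""}, "open": {".."}},
    "BiblatexFormer": {"unknown": {"unknown", "*"}, "open": {"open", ""}},
}


def classify_endpoint(text: str, convention: str) -> str:
    """Return 'unknown', 'open', or 'date' for one side of an interval."""
    markers = CONVENTIONS[convention]
    for kind in ("unknown", "open"):
        if text in markers[kind]:
            return kind
    # Anything else is assumed to be a date; it is not validated here.
    return "date"


def classify_interval(interval: str, convention: str) -> Tuple[str, str]:
    """Classify both endpoints of a 'start/end' interval string."""
    start, separator, end = interval.partition("/")
    if not separator:
        raise ValueError(f"not an interval: {interval!r}")
    return classify_endpoint(start, convention), classify_endpoint(end, convention)
```

Under these assumptions, `classify_interval("2004-06-01/", "BiblatexFormer")` reads the blank end as open, while `classify_interval("2004-06-01/", "EdtfIso201x")` reads the same string as end unknown, which is exactly the incompatibility described above.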

Approximate and uncertain dates

EdtfIso201x preserves, from EdtfLoc, the representation of approximate (e.g. "1985-04-12~") and uncertain (e.g. "1985-04-12?") dates.

However, the representation of dates that are both approximate and uncertain has changed:

Century

EdtfIso201x supports a mere two digits as representing a century ...

... which is the hundred year time interval consisting of years beginning with those two digits.

For example ‘19’ may be used to indicate the time interval represented by ‘1900/1999’. (See "C.4.4 Century", ISO DIS 8601-201x Part 2, p30).

EdtfLoc doesn't support this. But under EdtfLoc a century could be represented: with the unspecified range as in "19uu"; in addition to an explicit range like "1900/1999".
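
The equivalence between these century notations can be sketched in a few lines of Python. The function name `century_to_interval` is hypothetical, and the sketch only covers the three spellings discussed in this thread: the bare two-digit form, the EdtfLoc "19uu" form, and the "19XX" form that EdtfIso201x uses for unspecified digits.

```python
import re


def century_to_interval(value: str) -> str:
    """Expand a century expression to an explicit 'YYYY/YYYY' interval.

    Accepts '19' (two-digit century), '19uu' (EdtfLoc unspecified digits)
    and '19XX' (EdtfIso201x unspecified digits).
    """
    match = re.fullmatch(r"(\d{2})(?:uu|XX)?", value)
    if match is None:
        raise ValueError(f"not a century expression: {value!r}")
    prefix = match.group(1)
    return f"{prefix}00/{prefix}99"
```

All three spellings expand to the same explicit interval, which is why the two-digit form adds no expressive power.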

Other issues

There are other issues that arise that are not due to differences between EdtfLoc and EdtfIso201x. Rather, they arise from the current form of EdtfIso201x itself.

Reduced time precision

When a time is present, shouldn't EdtfIso201x support minute level precision?

EdtfLoc doesn't appear to support precisions coarser than a second, when a time is present at all. That is, something like "09:30:01" appears to be permitted but minute level precision, "09:30", appears to be forbidden.

EdtfLoc stipulates:

A date/time string MUST be composed according to one of three representations as illustrated in the following three examples:

2001-02-03T09:30:01
2004-01-01T10:10:10Z
2004-01-01T10:10:10+05:00

Note: 'T' separating date and time must be upper case.

The date/time string MUST use 8601 extended form, i.e. date with hyphen, time with colon. Zone-offset may be omitted or included. 8601 extended format time zone designation consists of either a 'Z' to indicate UTC, or a '+' or '-' to indicate "ahead of UTC" or "behind UTC", followed by a 2-digit hour, followed optionally by a colon and the 2-digit minutes.

And the EdtfLoc BNF stipulates

dateAndTime = date "T" time
time = baseTime zoneOffset?
baseTime = hour ":" minute ":" second | "24:00:00"

So, as written, a reduced-precision datetime like "2017-10-28T04:47" appears to be forbidden under EdtfLoc.

Biblatex currently (and formerly) reflects this with a message like:

WARN - Entry 'musk_2017_picture' (Biblatex-Tester-EdtfFull-FromZotero.bib): Invalid format '2017-10-28T14:30' of date field 'date' - ignoring

... although biblatex permits output where the seconds are dispensed with.

EdtfIso201x essentially repeats the specification on this issue (See "C.4.2 Date and Time", ISO DIS 8601-201x Part 2, p29). That is, EdtfIso201x also appears to forbid minute level precision, where a time is expressed.
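
The effect of the quoted EdtfLoc grammar can be approximated with a regular expression. This is a rough sketch rather than a full EDTF validator. It assumes a plain YYYY-MM-DD date, restricts hours to 00-23 and minutes and seconds to 00-59 (ranges the quoted BNF fragment does not itself spell out), and the names `DATE_AND_TIME` and `is_valid_date_and_time` are made up for the example.

```python
import re

# Assumption: the date part is a full calendar date in extended form.
DATE = r"\d{4}-\d{2}-\d{2}"
# baseTime = hour ":" minute ":" second | "24:00:00"
BASE_TIME = r"(?:(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d|24:00:00)"
# zoneOffset = "Z", or a sign, a 2-digit hour, and optionally ":" plus 2-digit minutes
ZONE_OFFSET = r"(?:Z|[+-]\d{2}(?::\d{2})?)"
# dateAndTime = date "T" time, with time = baseTime zoneOffset?
DATE_AND_TIME = re.compile(rf"{DATE}T{BASE_TIME}{ZONE_OFFSET}?")


def is_valid_date_and_time(text: str) -> bool:
    """Check a string against the sketched dateAndTime production."""
    return DATE_AND_TIME.fullmatch(text) is not None
```

With this pattern the three examples from the spec match, while "2017-10-28T04:47" and "2017-10-28T14:30" do not, because seconds are mandatory in `baseTime`.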

Move EDTF to its own Part 3?

Should the EDTF section of EdtfIso201x be an annex in part 2, or be moved to its own part 3?

Take the (currently existing) sentence like ...

This [EDTF] profile [in Annex C of Part 2] specifies three levels: level 0, level 1, and level 2. Level 0 specifies features of ISO 8601 Part 1.
Levels 1 and 2 specify features of Part 2/level 1 and Part2 /level 2 respectively.

("Annex C (informative) The Extended Date/Time Format - A Profile of ISO 8601 (Parts 1 and 2)", ISO DIS 8601-201x Part 2, p28)

... I think that (quoted) sentence would be easier to follow if the EDTF profile were in its own part 3.

Participating in the ISO forum?

Participating in an ISO forum seems rather labyrinthine. However, the relevant clues seem to be ...

From How we develop standards there is ...

Do you want to get involved in standards development? ... Whether you’re a consumer or in business you can be part of the next generation of standards. ...

https://www.iso.org/get-involved.html

Standards are developed by groups of experts called technical committees. These experts are put forward by ISO’s national members. If you are interested in getting involved, contact your national member. Contact details can be found in the list of national members. [Link original]

From ISO/DIS 8601-1 the relevant technical committee is: ISO/TC 154 Processes, data elements and documents in commerce, industry and administration

... perhaps some of you are already participating and have a shortcut route to join?

In any case this very thread might serve as a calling card for those of us that want to join the ISO process.

The ISO 8601 landing page

Just to have it to hand:

Edit: 2017-12-28 Added "IsoOrg" term.

@JohnLukeBentley
Author

JohnLukeBentley commented Nov 21, 2017

The following are my recommendations about:

  1. What to do about biblatex in relation to EdtfIso201x;
  2. EdtfIso201x in general; and
  3. Iso201x in general.
  4. IsoOrg: the International Organization for Standardization itself.

I don't necessarily feel strongly about all of my recommendations. That is, I'd be very much open to be persuaded differently. Especially since creative solutions might provide a way out of problems. So I provide the recommendations as something, at least, to push against.

Biblatex and EdtfIso201x

Levels

Recommendation:

Future biblatex support be directed at EdtfIso201x level 1 (and so also EdtfIso201x level 0).

Biblatex formerly supported EdtfLoc level 1 and level 0 (with a few extras). Given the rough correspondence between the levels between EdtfLoc and EdtfIso201x, and that both specs seem only to support rare, or very rare, edge cases at level 2: probably coding time could be saved in the biblatex context by not bothering with EdtfIso201x level 2.

It is pleasing that the EdtfIso201x profile should attempt an exhaustive (and novel) handling of possibilities at level 2. If the need to support the relevant rare edge cases arose, then EdtfIso201x looks well built to provide the relevant stipulation to implementers.

EdtfIso201x

Unknown and open ranges

Recommendation:

EdtfIso201x should use:

* For open: blank, as in "1982/" or "/1572".
* For unknown: Caret "^", as in "^/1970" or "1890/^".

The designators used for unknown and open in EdtfLoc, the strings "unknown" and "open", have an obvious problem: they are English words. That's not a good thing for internationalization reasons. So it is right that EdtfIso201x should address this by having, instead, characters (counting blank as a character).

In EdtfIso201x the double dot ".." for open seems OK. Although it risks looking unintentional at the end of a sentence 2006/... Given, that is, that the end of a sentence might be marked by ellipses ... So for those reasons I think a double dot ".." should be rejected.

Blank for unknown seems like an odd choice conceptually. Biblatex historically uses blank not for unknown but for open. And that seems right conceptually. Blank simply seems better suited to indicate what open means: "some date that shouldn't be fixed".

So I recommend blank for open dates. In making this recommendation I give no weight to the convenience of preserving biblatex backward compatibility.

An argument against this recommendation, or the use of blank in any date context, is that blank might be confused for a value that has not yet been entered. That is, that the author of "/1572" has not yet asserted anything about the start date (neither that it is open nor unknown). If that argument has purchase then it speaks in favour of having explicit (non blank) characters for any kind of value (an explicit symbol for both open and unknown dates).

But assuming blank should be for open, what, then, should be the symbol for one of the dates in a range that is unknown?

It can't be "?" on its own, as in "2006/?", for "?" is already used to designate that the whole term is "uncertain". I take it that in this bibliographic context "uncertain" and "unknown" are conceptually distinct. "Uncertain", expressed as the question mark in "1587?", meaning "date whose source is considered dubious" (See "3 Terms and definitions", ISO DIS 8601-201x Part 2, p6). But an "unknown" date seems fitted to identify some date you wouldn't hazard a guess at, the "unknown" in "1783/unknown". So for those reasons the question mark "?" should be preserved for uncertain and not be used for unknown (whether alone or in combination with other characters).

What about, for unknown, the asterisk "*", as formerly supported by biblatex? For me the connotations of "*" come from SQL where it means: all possible values. That's a little bit jarring because the asterisk "*" in "2006/*" doesn't mean "all years" but "all years greater than 2006". However, that's no large objection. One could come to understand that, in an EDTF context, the asterisk "*" takes on a slightly different meaning.

However I think I'd rather another character that didn't carry those connotations. Perhaps the caret "^": e.g. "^/1982"; "2006/^". I think that lends itself more readily to a stipulated connotation of being a pointer to a series of dates either above or below the date that is specified. So, for the sake of presenting alternatives, that's my recommendation.

Century

Recommendation:

In EdtfIso201x remove support for two digit centuries. 

EdtfIso201x's provision of a two digit number to represent a century ...

For example ‘19’ may be used to indicate the time interval represented by ‘1900/1999’. ("C.4.4 Century", ISO DIS 8601-201x Part 2, p30).

... is superfluous if an implementation (like biblatex) supports EdtfIso201x to level 1. For at EdtfIso201x level 1 the unspecified "A year with one or two (rightmost) unspecified digits." is supported ("4.3 Unspecified", ISO DIS 8601-201x Part 2, p10). That is, "19XX" could be used to represent "1900/1999".

Even more significantly, even if an implementation only supported EdtfIso201x to level 0, then a century can be represented by the EdtfIso201x level 0 supported time interval "1900/1999".

Any spec should be made simpler if it can. So I think removing support for centuries as two digits ought be done to make the spec simpler, given that centuries can be expressed at EdtfIso201x:

  • Level 0 via time intervals (e.g. "1900/1999"); or
  • Level 1 via unspecifieds (e.g. "19XX").

Reduced time precision

Recommendation:

Permit minute level precision at EdtfIso201x level 0 
(Or, if it is to remain prohibited, make it explicit that it is so).

Times with minute precision are common; and forcing such a time to go from "2017-10-28T04:47" to "2017-10-28T04:47:00" misrepresents the time.
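
The point about misrepresentation can be illustrated with Python's standard `datetime.fromisoformat`, which happens to accept both forms. This is only a sketch: `TimedValue` and `parse_with_precision` are hypothetical names, and it assumes the time part is written in extended form as hh:mm or hh:mm:ss.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimedValue:
    moment: datetime
    precision: str  # "minute" or "second"


def parse_with_precision(text: str) -> TimedValue:
    """Parse 'YYYY-MM-DDThh:mm[:ss]' and remember the stated precision."""
    time_part = text.partition("T")[2]
    # Seconds are present only if a second colon follows 'hh:mm'.
    has_seconds = len(time_part) >= 8 and time_part[5] == ":"
    precision = "second" if has_seconds else "minute"
    return TimedValue(datetime.fromisoformat(text), precision)
```

Both "2017-10-28T04:47" and "2017-10-28T04:47:00" produce the same `datetime`, so once seconds are forced onto the string the information that the source was only precise to the minute is lost unless it is tracked separately, as the `precision` field does here.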

It remains possible that the EdtfLoc writers didn't intend to forbid reduced precision times. If their intention was to forbid they could have, and should have, written in the spec something like "omitting seconds in a time is forbidden". And so it remains possible that EdtfIso201x has simply inherited this oversight.

Iso201x

Move EDTF to its own part 3?

Recommendation:

Yes, with regard to Iso201x move the EDTF profile to its own part 3.

As mentioned in the post above: this would make referring to sections in part 2 that aren't part of EDTF, conceptually easier.

Bookmarks

Recommendation:

Include bookmarks in the output.

... for the love of easy navigation.

IsoOrg

Github

Recommendation:

That Iso201x, and all other draft ISO standards, be hosted on Github to take advantage
of the open issue tracker.

The github issue tracker is a good one. And, moreover, provides an easy route to participation.

No cost

Recommendation:

That Iso201x, and all other ISO standards, be made available to the world without cost.

It is an absurd state of affairs that any standard, being a thing that we want to be adopted as widely as possible, should have a barrier to adoption like cost.

Edit: "supports" to "supported"; added commas to fix grammar.

Edit: 2017-12-28 Grouped some recommendations under the "IsoOrg" term.

@JohnLukeBentley JohnLukeBentley changed the title Datetimes. The draft ISO 8601-201x. Datetimes. The draft ISO 8601-201x and EDTF. Nov 21, 2017
@retorquere

retorquere commented Nov 21, 2017

I have no strong opinions on the matter other than that:

  • I will have a pragmatic preference for whatever EDTF.js can parse, because I would very much prefer not to get into the EDTF parsing business
  • I have a conceptual preference for explicit over implicit, so I'd object to blank for that reason; "missing" is not a value. It also makes it rather annoying to detect what spec a date is in and to reformat it to a different spec. I'm less sure about rejecting "..". I dislike ".." because it doesn't stand out, but are people really going to have EDTF dates in prose where the EDTF date should be parsed?
  • I lean towards going with "symbols over English words" indeed because it's a little odd to build language-specific keywords into a spec (remembering with displeasure here that e.g. Excel's functions would be localised and would thus be tied to that-language office... "Oh, GEMIDDELDE does not work in your Excel? How quaint"), but there's a real risk of running out of non-word symbols, and in the final analysis, words are symbols.

@JohnLukeBentley
Author

JohnLukeBentley commented Nov 21, 2017

Note to @inukshuk (the EDTF.js author).

@plk
Owner

plk commented Nov 21, 2017

I have just released biblatex/biber 3.9/2.9 to correct some bugs but these do not have the ISO changes. I did put in a deprecation notice in the changes file to warn people what is coming in terms of syntax changes, which are the only things users really need to care about.

@JohnLukeBentley
Author

@plk thanks, that better gives us breathing room given the state of flux/uncertainty.

@retorquere all your reflections seem reasonable.

On ...

but are people really going to have EDTF dates in prose where the EDTF date should be parsed?

... while I think we can be confident about some of the contexts where EDTF is likely to be applied, I don't think we can be confident about all of the contexts where these sorts of standards might end up being applied. And the EDTF standard really ought be built to be context robust, as far as we can anticipate. EDTF datetimes might end up in contexts where they are parsed or just presented. They might end up in URLs, EPUB documents, and future protocols and tools not yet conceived.

In the specific context of presented prose it seems entirely plausible (although extremely rare given open date ranges are extremely rare) someone might write ...

The set of essays in the anthology analyzing Great trucking song of the renaissance are divided into two volumes: Vol. 1 for ../2020-02-01, and Vol. 2 for 2020-02-02/... Form and meaning reach ultimate communion in all of these essays. Lorem Ipsum ...

And so I think we need to test our candidate symbols by imagining as many plausible contexts as we can, and ask "Will that symbol break in that context?". Would a caret "^", or a double dot "..", cause parsing difficulties when using regex to parse? And, does a double dot ".." look ok in prose? etc.
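
On the regex question specifically, both candidate symbols are regex metacharacters, so a naive pattern misbehaves unless the marker is escaped. The following Python sketch shows the difference. `open_end_pattern` is a hypothetical helper, and the four-digit year is a simplification for the example.

```python
import re

YEAR = r"\d{4}"  # simplification: a bare four-digit year


def open_end_pattern(marker: str, escape: bool = True) -> re.Pattern:
    """Build a pattern for 'YYYY/<marker>', optionally escaping the marker."""
    token = re.escape(marker) if escape else marker
    return re.compile(rf"{YEAR}/{token}")
```

Unescaped, ".." matches any two characters (so "2006/ab" is wrongly accepted), and an unescaped "^" is read as a start-of-string anchor, so "2006/^" fails to match at all. With `re.escape` both markers parse as intended, so neither symbol is a real obstacle for regex-based parsers as long as the implementer remembers to escape it.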

I don't think any of the above is likely to be a novel thought for you (nor anyone else likely to read this far in the thread): it's just what leaks out of my head in response to your question. :)

To all: I've made an email request via my national standards body to join the relevant ISO committee (i.e. to get access to the relevant online forum).

@JohnLukeBentley
Author

JohnLukeBentley commented Nov 30, 2017

To join up to the relevant ISO committee I emailed the committee secretary directly. I have received no response from them.

But there has been some progress in my joining up via my national standards body (which seems to be the prescribed method). But I'll be waiting on a further response from that body not before 2017-12-07.

Edit: "2017-11-07" to "2017-12-07" (whoops).

@JohnLukeBentley
Author

JohnLukeBentley commented Nov 30, 2017

So I see from #665 you've (@plk and @moewew) decided to release changes to conform to EdtfIso201x fairly soon, on the basis that:

  • Holding off is starting to create, from a coding point of view, branch juggling problems;
  • "we are anyway currently running an non-standard draft which I’d [plk] rather not gave people [to] adopt ..."; and
  • moewew: "I believe that the features likely to change [as EdtfIso201x progresses toward and beyond ratification] are niche enough that this does not cause too much upheaval." and plk: "I'm fine with making the ISO change now and tweaking as necessary".

All that seems reasonable.

Perhaps @plk you could let us know in this thread when EdtfIso201x features are merged to the master (and released).

Edit: "ratification"; "[to]".

@moewew
Collaborator

moewew commented Dec 18, 2017

We're hoping to release the next version of biblatex this week. That release will also feature the EDTF/ISO (draft) transition.

@JohnLukeBentley
Author

Thanks @moewew. Great. I'm still progressing through the ISO committee joining up process.

@moewew
Collaborator

moewew commented Jan 23, 2018

@JohnLukeBentley Any news here? The LOC has removed its link to the draft standard at the request of the ISO (https://www.loc.gov/standards/datetime/). And apparently the ISO 8601 draft is near the end of the enquiry phase; judging from how long things took the last time, it will take another six months until the norm is published. And then of course we need to lay our hands on the standard. My university library does not have a subscription for standards any more.

@JohnLukeBentley
Author

JohnLukeBentley commented Jan 25, 2018

@moewew (and all) I thought I replied to this minutes after your post: it probably got lost in a browser mishap at my end ...

Read my post first at inukshuk/edtf.js#12 (comment). That thread in general also demonstrates: that others in the ecosystem are depending on the standard; and it demonstrates an issue that requires a change in the standard (the specific case is to make it clear whether seasons in intervals are permitted ... if not to accept seasons in intervals).

I haven't heard back from my national standards body since that application I made 10 days ago (but I took some weeks to send in the application; and the national standards body was quite welcoming towards my making the application).

As noted in that other post of mine, one of the first things I've intended to do, on the assumption that I'd be accepted, is suggest that the standards process be opened up. E.g., by putting the standard on github (or an equivalent open platform).

But ISO ...

The LOC has removed its link to the draft standard at the request of the ISO (https://www.loc.gov/standards/datetime/)

... has made an obnoxious move.

Evidently EDTF was created as an open standard, then apparently given to ISO to be the shepherds of. ISO has turned around to the EDTF community and denied them (and therefore all) access to the evolved standard. It's like giving your cousin land by the creek only for your cousin to kick you off as a trespasser when you wander down for a swim.

Given that obnoxious move I'm tempted to suggest that we, all parties in the relevant ecosystem who have an interest in it, create an open EDTF datetime standard (which would probably have to march under a new name).

Against that suggestion is:

  • It's generally bad to create multiple standards;
  • That ISO is THE world standards body ... whose standards are much more likely to be adopted in the face of whatever any other small group does;
  • EdtfIso201x, being nearly complete, has done most of the hard work. A few issues aside it is in a fairly robust state;
  • We'd have to do a lot of work.

So the issue is: as weighty as those reasons are, might they nevertheless be overridden by the odiousness of a closed standards development process and the fact that ISO standards are not made available at zero cost, once published?

I mean, by comparison, I found something (trivially) wrong with the W3C HTML 5.2 standard. I posted the error on github: w3c/html#1119 (comment). Three days later it was all fixed. There was no need to submit detailed applications to a standards body (detailing relevancy to stakeholders, importance for the nation, the industries that will benefit from the standard, etc.) simply to be allowed to express the wrong. And, of course, at zero cost we can read the W3C HTML 5.2 standard via a simple link https://www.w3.org/TR/html52.

(A similar real world example happened when I found a trivial error with the WHATWG standard: I'm not, here, favouring one side in the W3C V WHATWG split).

Edit: Added "(and all)" ... as all posts on github are ... the half-baked suggestion is very much to all.

Edit: Note to @njbart, @inukshuk, @retorquere,

Edit: "who's" to "whose".

@njbart
Contributor

njbart commented Jan 26, 2018

“Obnoxious move” quite nails it – what the **** were these people thinking? Evading scrutiny by hiding a document most of us have downloaded long ago anyway?

Still, I don’t think creating our own, independent standard is a realistic project. But what we could do is to create and maintain an ISO8601 profile. Quoting from ISO/DIS 8601-2:2016(e) (2016-10-26), p. 27 (my emph.):

Different communities may define different profiles. Community is used loosely to mean a group with a common interest in 8601. It is not intended that 8601 profiles be approved by any formal body; any person or community can develop a profile. There should however be a unique name for every profile so that it may be referenced. The registration agency for ISO 8601 should register profiles upon request, and help to assure uniqueness of names.

According to p. 26, a profile may specify the following:

  1. It may list features of 8601 to be supported.
  2. In cases where there are multiple methods specified in 8601 to support a particular function, the profile may select a single method.
  3. In cases where there are different interpretations of a particular function, the profile may select a single interpretation, or provide clarification.
  4. It might list features that are not relevant and need not be supported.
  5. It might specify several levels of support.

Unless we feel that cases such as clarifying the validity of season intervals are already covered by 3., we should of course reserve our right to amend the profile accordingly.

What would be needed is a name – possibly “ISO 8601 – biblatex profile” would do – and a clear idea of which features to permit – possibly simply the date/time expressions currently valid in biblatex.

Ideally, functions for dealing with ISO 8601 profiles could be added to parsers such as edtf.js, so they would not only be able to parse specific ISO 8601 levels (“1”, “2”, or the [experimental] “3” in the case of edtf.js), but also specific ISO 8601 profiles.

@JohnLukeBentley
Author

.... I don’t think creating our own, independent standard is a realistic project. But what we could do is to create and maintain an ISO8601 profile. [Emphasis original].

That lateral and clever suggestion is very helpful. It probably avoids two problems:

  • The risk of copyright violations (I find it difficult to entertain the thought that a date time standard might violate copyright ... but in the circumstances it is a thought that must be entertained); and
  • A great deal of work (the existing EDTF standard, as found in ISO 8601:201X, can be leveraged).

So, from the parts you quoted, it seems perfectly consistent with ISO 8601:201X for the profile to be open.

I'm a bit unclear on what you mean by ...

Unless we feel that cases such as clarifying the validity of season intervals are already covered by 3., we should of course reserve our right to amend the profile accordingly.

Of course season intervals were a mere example to make your point. But, on that example, I do think the status of season intervals is ambiguous in ISO/DIS 8601-2:2016(e) (2016-10-26) and could therefore, in a profile, be clarified under 3. But by "we should of course reserve our right to amend the profile accordingly" I suppose that could mean, whatever your intentions, either:

  • That we reserve the right to amend the profile by maintaining and amending it as an open standard; and/or
  • That we reserve the right to amend the profile by maintaining and amending it in a manner that extends ISO 8601-2:2016; and/or
  • That we reserve the right to amend the profile by maintaining and amending it in a manner that contradicts ISO 8601-2:2016.

On my reading of ISO/DIS 8601-2:2016(e) (2016-10-26), essentially from the parts you quoted, an ISO 8601 profile can clarify and define a subset of ISO 8601. So, for example, we could clarify that seasons are permitted in intervals; and drop support for two digit centuries.

But, on my reading, an ISO 8601 profile can't extend ISO 8601 (Part 2 is already an extension to which a profile could refer): for that would make it cease to be an ISO 8601 profile ... and become an independent standard.

For example, ISO/DIS 8601:2016(e) (2016-10-26), Part 1 or 2, doesn't appear to support quarters, e.g. 2018-Q1 (see ISO/DIS 8601-2:2016(e), p. 14, "Divisions of a year").

Also it doesn't seem that an ISO 8601 profile could contradict ISO 8601, e.g. by changing the "uncertain and approximate" symbol from % to something else (like the former ?~).

Let's assume that's the case, that under an ISO 8601 profile extensions and contradictions of that sort are not permitted. Does that necessarily cause us problems? Let's test it against the suggestions on the table.

We have, from you:

  • Seasons in interval.

From @plk:

  • An alternative symbol for "uncertain and approximate" than %.

From me, there were originally only three relevant suggestions:

  • To use particular symbols for open and unknown in intervals.
  • Remove support for two digit centuries.
  • Permit minute level precision at EdtfIso201x level 0 (Or, if it is to remain prohibited, make it explicit that it is so).

And from me lately:

  • Permit quarters e.g. 2018-Q1 (which would seem to open the door to citing financial reports).

Taking these in order from easiest to hardest:

  • Remove support for two digit centuries seems straightforwardly permitted under 4 ("It might list features that are not relevant and need not be supported.")
  • I'm not so wedded to my ideal symbols for open and unknown in intervals that it needs to be an obstacle. So we could just drop that and go with the existing ISO symbols.
  • If I recall correctly @plk has already implemented % for "uncertain and approximate" and, apparently, avoided the problems he initially feared.
  • Seasons in intervals, as mentioned, seem permissible under 3 ("In cases where there are different interpretations of a particular function, the profile may select a single interpretation, or provide clarification.")
  • Minute level precision is also, arguably, ambiguous under EdtfIso201x and could therefore be regarded as permitted in our profile also under 3 (as a clarifying interpretation).
  • Quarters. That seems really good to have. But as the last remaining obstacle I think we could just jettison it as a worry.

In short it seems we could get away with creating an ISO profile, if that requires merely clarifying or removing features in the larger ISO 8601.

Alternatively we could create an "ISO 8601 profile-but". That is, an ISO profile but for a few extensions (e.g. quarters) or contradictions (e.g. a chosen symbol for "uncertain and approximate"). In other words, bite the bullet and create what is, in effect, an independent standard - although for the most part it looks like an ISO 8601 profile.

On a name for either kind of profile I think we'd want it to be broadly useful, not just for biblatex.

From ISO/DIS 8601-2:2016(e) (2016-10-26), p28 ...

The Extended Date/Time Format (EDTF) profile of ISO 8601 was developed by the bibliographic community along with the participation of communities with related interests.

Our profile ought have the same usefulness, I'd suggest: to the bibliographic community and any other community that would find it useful. For what's being suggested here is a minor tweaking to EDTF (in either its old or new form). We'd also want to invite the EDTF crowd from http://www.loc.gov/standards/datetime/pre-submission.html.

I'll hold off suggesting any specific name until we know what kind of profile we'd want.

Of course, there remains the third alternative: just suffer under the limits and ambiguities of EdtfIso201x.

By the way looking at https://www.iso.org/obp/ui#iso:std:iso:8601:-2:dis:ed-1:v1:en (see "Preview our standards" on https://www.iso.org/iso-8601-date-and-time-format.html) we can see that the "Reference number" remains as ISO/DIS 8601:2016(e) although the document is laid out differently to ISO/DIS 8601:2016(e) (2016-10-26). Presumably this latest ISO/DIS 8601:2016(e) is substantially identical to ISO/DIS 8601:2016(e) (2016-10-26).

@njbart
Contributor

njbart commented Jan 27, 2018

I'm a bit unclear on what you mean by ...

Unless we feel that cases such as clarifying the validity of season intervals are already covered by 3., we should of course reserve our right to amend the profile accordingly.

My point is, even if ISO unambiguously stated they consider season/season intervals, or minute level precision to be invalid according to the norm, I’d still want them to be considered valid according to the biblatex (or whatever) profile. Technically, this would be a contradiction, but pragmatically I’d just do what @inukshuk did and declare that “Seasons in intervals are supported at the experimental/non-standard level 3” (see https://github.com/inukshuk/edtf.js). (Worst case is, we’d have to call it “The profile formerly known as ISO 8601/biblatex” …)

As to quarters, ISO/DIS 8601-2:2016(e) (2016-10-26), p. 14, “4.7 Divisions of a year”, lists “33-36 = Quarter 1, Quarter 2, Quarter 3, Quarter 4 (3 months each)” as a level 2 feature – apart from your 2018-Q1 notation, how is that different?
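To make the comparison concrete, here is a minimal sketch of translating the familiar `2018-Q1` notation into the draft's division-of-year codes. The codes 33 to 36 come from the passage quoted above; the function name is hypothetical, and the assumption that the code sits in the month position (as season codes do) should be checked against the draft.

```python
import re

# Offset chosen so that Q1 -> 33 ... Q4 -> 36, per the draft text quoted above.
QUARTER_BASE = 32


def quarter_to_division_code(text):
    """Convert 'YYYY-Qn' to 'YYYY-NN' (hypothetical helper, not a library API)."""
    match = re.fullmatch(r"(\d{4})-Q([1-4])", text)
    if match is None:
        raise ValueError(f"not a YYYY-Qn quarter expression: {text!r}")
    year, quarter = match.groups()
    # Assumption: the division code takes the month position, as season codes do.
    return f"{year}-{QUARTER_BASE + int(quarter)}"
```

On this reading the two notations carry the same information, so the difference is one of surface syntax only.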

@JohnLukeBentley
Author

On the profile scope it does seem you want what I described as the second alternative ...

Alternatively we could create an "ISO 8601 profile-but". That is, an ISO profile but for a few extensions (e.g. quarters) or contradictions (e.g. a chosen symbol for "uncertain and approximate"). In other words, bite the bullet and create what is, in effect, an independent standard - although for the most part it looks like an ISO 8601 profile.

I do like it that this is what you are going into bat for. You are probably right that we don't want to abandon the right to extend or contradict ISO 8601 - that's probably unnecessarily tying our own hands. Presumably if we extend or contradict ISO 8601 then ISO can't complain about copyright violation. Because to the extent it is a profile this is allowed to be openly published and to the extent that we express something new, a contradiction or extension, ... well you don't violate the copyright of an author for expressions they didn't make.

(Worst case is, we’d have to call it “The profile formerly known as ISO 8601/biblatex” …)

... yes, or something like: "ISO 8601 Bibliographic [or whatever] Datetime Profile With Extensions And Contradictions".

That might make it an "independent standard" after all. However, adjudicating between competing abstract natures, "independent standard" or "ISO 8601 Profile-Like-Thing", might be a bit arbitrary and for that reason unnecessary. It's probably more important to get clear on whether we'd want it to be ISO 8601 profile conformant (we don't necessarily) or largely 8601 profile conformant with room for extensions and contradictions (the current position is that we (you and I at least) do want this).

On the incidental matter of quarters, thanks for pointing to it in ISO/DIS 8601-2:2016(e) (2016-10-26). I simply missed it.

Tony Benedetti has a very worthy suggestion with regard to divisions of a year (DATETIME@LISTSERV.LOC.GOV > "Division of Year codes", Thu, 4 Jan 2018 20:01:04 -0500): a mnemonic scheme that, even though it would reject the traditional Q1 ... Q2 format, probably makes more sense given the existence of other divisions (Trimesters (wrongly and currently called "Quadrimester") and Semesters), and Tony's proposed notation for them.

This is worth mentioning here as evidence of continuing ideas about the EDTF standard (whichever one we have in mind). Moreover, these are ideas that contradict the existing EDTF standard (whichever one we have in mind). It does seem right that if we created an open standard we'd want to do so to allow that standard to openly evolve, to be open to evaluate plausibly good ideas like Tony's, in a way that could very well extend or contradict ISO 8601.

@JohnLukeBentley
Author

@plk (biblatex datetime implementer) and @inukshuk (edtf.js implementer upon which zotero-better-bibtex depends) what say you ...

Would you, in principle, be interested in supporting a "Bibliographic [or some more general identifier] Datetimes: An ISO 8601 Profile with Extensions And Contradictions"?

That is:

  • As far as possible an "ISO 8601 Profile" of ISO 8601-201x. An "ISO 8601 Profile" can only define a subset of ISO 8601-201x and clarify any ambiguities in ISO 8601-201x (this clever idea that our standard, if we are to have one, should be based on an ISO 8601 Profile comes from @njbart a few posts above); but
  • We'll "Extend" ISO 8601-201x where necessary, introducing datetime kinds where we think ISO 8601-201x should have included them. E.g. ISO 8601-201x doesn't mention seasons in date ranges. We might, therefore, include seasons in date ranges.
  • And we'll "Contradict" ISO 8601-201x where necessary. E.g. ISO 8601-201x uses "%" for uncertain and approximate dates. We might therefore continue with the EDTF traditional "?~".

Those examples, of course, are mere examples. I don't mean to open debate on them here.

To save you reading the above thread, the ISO process just seems too odiously closed. So the motivation for creating our own standard, which nevertheless attempts to track parts of ISO 8601-201x as closely as possible, is to allow for open development and publication. Namely, to develop the standard on Github, thereby enabling anybody to raise issues about it and make suggestions for its improvement, and to allow anybody to read the standard without cost.

For the moment I think we ought be, at least I am, agnostic on:

  • Governance (who makes decisions about the standard);
  • The form of publication (E.g. A link to a pdf doc on github, or a website with its own url, etc);
  • The source language (e.g. XHTML, LaTeX, Markdown, or a Wiki (noting a Wiki is not really a "source" language));
  • Etc.

If you both approved I'd then make invitations to the EDTF folk about what they think of such a standard. Approval from both of you I think would be necessary, and also sufficient, for we who are interested to proceed.

Note also to @njbart, @moewew (thanks for the stack overflow answers in recent days), @retorquere. Any further thoughts from you fine folk would be most welcome.

@plk
Owner

plk commented Feb 9, 2018

I think that this is a good idea and I support it.

@JohnLukeBentley
Author

@plk: awesome.

@retorquere

I'll be happy to pitch in where I can, but you guys know a lot more than I do about this. I'm just a consumer of the standard, and I usually lean on @njbart to assess whether what I'm doing with it makes much sense.

@JohnLukeBentley
Author

@retorquere, great. I'll note you'll still be busy with the start of your PhD, among other things in life. So I'm essentially wanting to keep you in the loop rather than place further time burdens on you.

And am I right to understand you essentially rely (at the moment) on whatever EDTF.js permits? That is, with respect to all those test datetime formats that I threw at you, that essentially entailed checking that zotero-better-bibtex would delegate to EDTF.js properly?

@Crissov

Crissov commented Feb 9, 2018

I'd be interested in joining an effort to develop a public open-source profile of ISO 8601. For a v1.0 release, we'd obviously need to wait for ISO to finally release the two parts of its date and time notation standard. Meanwhile, work could be done under the umbrella of https://standards.github.io.

Full disclosure: I've been exploring the possibilities of extensions to ISO 8601 notation for years, resulting in the International Calendar. Much of this would be beyond the scope of a proper profile of ISO 8601, but it may be in scope of a community standard for date and time notation.

@JohnLukeBentley
Author

@Crissov, great!

To be clear, an "ISO 8601 profile with Extension and Contradictions" would not be an "ISO 8601 profile" as such. For to be an ISO 8601 profile (the nature of such a profile is defined in ISO 8601:201x) you couldn't have extensions or contradictions of ISO 8601:201x.

That confusion might bear on the name of our standard.

So that you have been interested in extensions "beyond the scope of a proper profile of ISO 8601" means your aims are shared here.

Naturally there would be a future discussion around the scope of our standard: whether we want to keep it small, but large enough to cope with bibliographic, or other, needs, versus being permissive about what extensions are permitted. At the moment I'd be on the small side of the fence, in keeping with the heritage from EDTF.

we'd ... need to wait for ISO to finally release the two parts of its date and time notation standard.

For a v1.0 release, yes.

But the ISO draft already in the wild looks to be near completion. I think it unlikely any substantive changes will be made before release. And if there are changes, whether substantive or minor, on release they could be readily incorporated. So, as you suggest, I don't think we need to wait for ISO release to begin developing our standard.

https://standards.github.io.

Thanks very much for that! I was unaware of its existence. It looks like a new thing. Very timely. At the moment it essentially looks like a meta standard (there is https://standards.github.io/explore/meta/) for using github for standards.

http://calendars.wikia.com/wiki/International_Calendar

I'll have to give that a proper look later! Good to learn there's another standard around which a great deal of work and thought has been already done.

@retorquere

retorquere commented Feb 9, 2018

@JohnLukeBentley thanks for understanding. I'm stretched pretty thin these days.

BBT uses EDTF.js as one of the ways to detect and parse dates, but it's not the only thing I do. BBT gets offered quite a bit of input that is easily recognizable by humans as dates but is pretty gnarly to parse. To parse dates:

  1. the input is passed through a number of heuristic detectors first (textual dates, fields in weird order, some patterns that are valid EDTF but where that is really never the intended meaning, such as 2017/08 being a valid EDTF date range), and if that fails
  2. it goes through EDTF.js, and if that fails,
  3. I translate months and seasons from their textual representation to English best I can and it goes through edtfy + EDTF.js, and if that fails
  4. I test whether I can split on a number of separators with both sides getting positive detection through steps 1-3 and treat it as a range, and if that fails
  5. I assume it's literal text.

Step 2 (with EDTF.js) is important because it picks up the bulk of the parsing work, but the other 3 do substantial work nonetheless. It's not perfect, and there are known problems (such as that you can't specify the EDTF range 11/2010, just 0011/2010), but it gets the job done most of the time.
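The five-step cascade described above can be sketched as a chain of detectors, each tried only when the previous one fails. This is not BBT's code: every function below is a hypothetical stand-in (a tiny regex in place of the heuristics, of EDTF.js, and of edtfy), and the month table and separators are illustrative assumptions.

```python
import re

# Illustrative month-name table for the translation step (an assumption).
MONTHS = {"mars": "03", "märz": "03", "mai": "05"}


def heuristic(text):
    # Stand-in for step 1: read "YYYY/MM" as a year-month, not as a range.
    m = re.fullmatch(r"(\d{4})/(0[1-9]|1[0-2])", text)
    return {"kind": "date", "year": int(m[1]), "month": int(m[2])} if m else None


def strict(text):
    # Stand-in for step 2 (the real step delegates to EDTF.js): YYYY or YYYY-MM.
    m = re.fullmatch(r"(\d{4})(?:-(0[1-9]|1[0-2]))?", text)
    if m is None:
        return None
    return {"kind": "date", "year": int(m[1]), "month": int(m[2]) if m[2] else None}


def translated(text):
    # Stand-in for step 3: translate a textual month, then reuse the strict parser.
    parts = text.lower().split()
    if len(parts) == 2 and parts[0] in MONTHS:
        return strict(f"{parts[1]}-{MONTHS[parts[0]]}")
    return None


def detect(text):
    # Steps 1-3: the first detector that succeeds wins.
    for step in (heuristic, strict, translated):
        result = step(text)
        if result is not None:
            return result
    return None


def parse_date(text):
    result = detect(text)
    if result is not None:
        return result
    # Step 4: split on a separator and require both sides to pass steps 1-3.
    for sep in (" - ", " to ", "/"):
        if sep in text:
            left, right = text.split(sep, 1)
            start, end = detect(left.strip()), detect(right.strip())
            if start is not None and end is not None:
                return {"kind": "range", "start": start, "end": end}
    # Step 5: give up and keep the input as literal text.
    return {"kind": "literal", "text": text}
```

Putting the heuristics ahead of the strict parser is what lets `2017/08` come out as August 2017 rather than as the range the grammar would otherwise allow.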

@inukshuk

inukshuk commented Feb 9, 2018

EDTF.js was written primarily for its use in Tropy and (hopefully / eventually) Zotero; it was picked mostly because CSL / citeproc was leaning towards adopting EDTF some time ago. While the parser is mostly complete, we'll be improving locale-aware formatting and want to design a dedicated input widget (again, those needs arise mostly from its use in Tropy, but would extend to Zotero as well). Having said all that, I'm happy to support the format which will be most relevant for users of Tropy, Zotero and CSL-aware tools in general. As I'm not too keen on standards proliferation, I still hope that the upcoming ISO standard will manage to incorporate all your requirements, but if that's not possible I'm happy to make the necessary adjustments to EDTF.js. BibLaTeX, Tropy, Zotero, CSL – this is pretty much the target audience of the EDTF.js parser.

@retorquere

Assuming tropy is an electron app, the widget would become relevant to zotero after their port to electron. It's currently Firefox (xul) based. That, and users entering date information manually isn't the primary way (in my experience) that date information gets into zotero; much is scraped from websites, and that date info is usually not EDTF. They have a hacky heuristic parser just like mine.

@moewew
Collaborator

moewew commented Feb 9, 2018

Obligatory XKCD: https://xkcd.com/927/

Setting up a new standard means a lot of work. At least that is what I imagine - and it takes a lot of time if done properly. Even if that is done, a standard that no one adopts is hardly of any use. Even if biblatex and a few other bibliography-related tools were to use it, it would still just be 'that loony bibliography date format used by those tools that couldn't be bothered to adopt the proper ISO date format'.

I'm also worried about feature creep, but that might just be me. As soon as someone insists on being able to input their time zone not as UTC offset, but using their favourite zone abbreviation, all hell will break loose. I assume that the ISO people have had a good think about what would be 'too much'.

If the new standard is not supposed to be an ISO profile it would probably need to distance itself from ISO to avoid infringing upon their rights (whatever that would mean in practice).

@Crissov

Crissov commented Feb 10, 2018

If this community standard was a small superset of a big subset of ISO 8601, the goal would probably be that it could and would become a proper profile of the international standard after its next revision in a couple of years. For instance, ISO probably won’t add established quarter notation 2018-Q1 this late in the game to ISO 8601-2:2018, but an extension that supported it would not break anything.

@retorquere

(for the same reasons as I outlined above, locale-aware won't help in many use-cases; in western philosophy eg it is quite common to import French or German (textual) dates even if my locale is set to English. One of the major bonuses of using a tool like Zotero is that it automates a lot of crap-cleaning when scraping references, but it doesn't touch dates, probably to avoid unintentional data loss on misparse. Date and name parsing... Oy)

@JohnLukeBentley
Author

JohnLukeBentley commented Feb 17, 2018

I've created a new thread "Should we create an Open Datetime Format?" in a new repository, open-datetime-standard-bootstrap. I'll link to that thread below. To segue there ...

@inukshuk wrote:

... I'm happy to make the necessary adjustments to EDTF.js ...

Terrific @inukshuk, thanks!

@inukshuk wrote:

I still hope that the upcoming ISO standard will manage to incorporate all your requirements.

At this stage, that's unlikely to happen. I've had no response back from my national standards body, Standards Australia, about my application to join the relevant ISO technical committee. Standards Australia have been most welcoming and encouraging during the application process. So I'm making the assumption that my email ended up in a spam trap, was missed in the inbox, or some such innocent explanation.

However, as mentioned earlier in this thread (which I'm not expecting you to have read) ISO pulled the pdf copy of their draft standard from the EDTF Library of Congress site (http://www.loc.gov/standards/datetime/). That move by ISO doesn't express a move away from developing standards under a closed regime. So that's become enough for me to not chase up my application to join ISO. So, as far as I'm aware, ISO has not seen my recommendations. Therefore my recommendations are unlikely to be incorporated.

Moreover, I didn't necessarily want my recommendations to be incorporated. Rather I'd want them discussed because there may well be good reasons not to incorporate them.

@retorquere

[EDTF.js] picks up the bulk of the parsing work, but the other 3 [zotero-better-bibtex tasks] do substantial work nonetheless.

Well then, I'll need to tar you with the "implementer" brush. It will be therefore necessary to get your provisional endorsement, as has been given by @plk and @inukshuk, to proceed. Provisional on further details of the proposed standard continuing to be feasibly a good idea. But I'd invite you to do that in the new thread, which provides some further details.

One of the major bonuses of using a tool like Zotero is that ... it doesn't touch dates,

Yes, that helps us here. We don't need to convince the Zotero devs of anything, although I do intend to keep them in the loop, if we go ahead with the standard.

@Crissov

If this community standard was a small superset of a big subset of ISO 8601,

Yes, that's the current intention. The "subset" (it's hard to say whether it is "big" or not) of ISO 8601:201x is the ISO 8601:201x Profile - Annex C The Extended Date/Time Format.

the goal would probably be that it could and would become a proper profile of the international standard ... ISO probably won’t add established quarter notation 2018-Q1 this late in the game to ISO 8601-2:2018, but an extension that supported it would not break anything.

I address this sort of idea in the new thread, proposing a version of it. I've become much clearer about a possible relationship between our standard and ISO 8601 profiles.

@inukshuk and @moewew have raised the standards proliferation concern. @moewew further raised concerns about: the work needed to be done; copyright; that our standard would languish as incidental; and feature creep.

Apart from feature creep I've mentioned those concerns earlier in the thread. And (apart from feature creep) it's good for these issues to be raised again, for that has prompted me to flesh them out in the new thread.

Feature creep is also a legitimate concern. I'd agree that just because someone proposes a datetime format, it doesn't follow it should be admitted. I repeat this concern in the new thread.

So given the further details in the new thread, I'll invite @plk (biblatex datetime (and many other aspects of biblatex) implementer), @inukshuk (EDTF.js implementer), and @retorquere (zotero-better-bibtex implementer) to make, in that new thread, a further provisional endorsement of the proposal. A "Go: let's create it"/"No-go: let's not create it" decision.

Of course in that thread from all, @plk, @inukshuk, @retorquere, @njbart, @moewew, @Crissov, and any other folk who are interested, you should feel free to: press objections; mention creative solutions; explore ideas, etc.

Continues at "Should we create an Open Datetime Format?" JohnLukeBentley/open-datetime-standard-bootstrap#1

@danbri

danbri commented Jul 12, 2018

Hi folks. Checking in here from Schema.org, where we are hoping to find an answer to the longstanding question of how to express open-ended data ranges (e.g. for datasets that are currently being generated/extended live). Currently we've been hoping an updated ISO 8601 will address this. Is that naive? Do you have any better suggestions?

@JohnLukeBentley
Author

@njbart, @plk, @inukshuk, and @moewew.

Have you seen "First Draft Uploaded. Does it relate to the ISO standards desirably?" JohnLukeBentley/open-datetime-standard-bootstrap#2? I'm worried that github notifications haven't been getting through to you from that thread; and I'm hoping they might from this thread.

To save you having to read the whole thread: I'm essentially wanting to know if BibliographicDatetimeFormat.pdf (direct download) looks worth developing. It would be better to reply in that thread.
