Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve -00:00 offset #1042

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Conversation

pitdicker
Copy link
Collaborator

@pitdicker pitdicker commented Apr 28, 2023

RFC 3339 and RFC 2822 make a distinction between an offset of +00:00 and -00:00.

RFC 2822:

The form "+0000" SHOULD be used to indicate a time zone at Universal Time. Though "-0000" also indicates Universal Time, it is used to indicate that the time was generated on a system that may be in a local time zone other than Universal Time and therefore indicates that the date-time contains no information about the local time zone.

From RFC3339:

If the time in UTC is known, but the offset to local time is unknown, this can be represented with an offset of "-00:00". This differs semantically from an offset of "Z" or "+00:00", which imply that UTC is the preferred reference point for the specified time.

Currently chrono drops the distinction, but I would like to preserve it.

  • I noticed format/scan.rs was already written with the functionality of this RFCs in mind, but did not go all the way. It now follows the specs better and can sometimes give better parser errors.
  • Modifying Parsed to preserve this information is a bit messy as all the methods are public. Adding a public constant NO_OFFSET_INFO seemed the cleanest solution. Parsing doesn't support more than two digits for the offset hours, limiting the values that are passed to Parsed::set_offset to 99 * 3600 + 59 * 60 = 359940. The value of NO_OFFSET_INFO is way outside that range, so no chance for conflicts.
  • FixedOffset can be modified in a reasonably clean way to encode -0: shift the value 2 bits left and encode this special state as an OFFSET_UNKNOWN flag in the rightmost bits. Advantage of this encoding is that it only takes a single shift right to get the offset in seconds, so this should not measurably impact performance.
  • The same shifting trick enables using NonZeroI32, as long as we set one of the rightmost bits to something non-null with the OFFSET_NORMAL flag. This gives FixedOffset a niche. Option<DateTime<FixedOffset>> is now the same size as DateTime<FixedOffset>: 16 bytes instead of 20.
  • For comparisons and hashing the flags are ignored. This way -00:00 is treated as equal to +00:00, which is usually what is expected. Also this way we remain backwards compatible.
  • FixedOffset::no_offset_info returns whether the offset is -00:00.
  • FixedOffset::OFFSET_UNKNOWN can be used to initialize an offset to -00:00.

src/format/parse.rs Outdated Show resolved Hide resolved
@pitdicker pitdicker changed the title No offset info Preserve -00:00 offset Apr 28, 2023
@djc
Copy link
Member

djc commented Apr 28, 2023

Currently chrono drops the distinction, but I would like to preserve it.

Do you have an actual use case for this?

@pitdicker
Copy link
Collaborator Author

Ai, why do you have to ask that 🤣 ?
As a hobby I am making my own feed reader, and I would like to know the time an entry was published by the author in his own timezone. If the timestamp indicates it knows the time in UTC but the relation to the local time is unknown, I find that interesting. But currently chrono offers no easy way to know.

@pitdicker
Copy link
Collaborator Author

As a better reason: we would follow the RFCs more closely, and be able to round-trip these timestamps.

@djc
Copy link
Member

djc commented Apr 28, 2023

In general, I would concur we want to follow the RFCs closely. But at the same time, "you find it interesting" to know this doesn't seem like a strong justification for adding ~90 lines of code on net.

@pitdicker
Copy link
Collaborator Author

That is fair.

Honestly the changes felt small, so i did some counting:

  • 17 lines for testing the size of common types in chrono.
  • 30 would also be there if we wanted to use the part of encoding an offset in an NonZeroI32 to get the niche optimization.
  • that leaves ca 45 netto for the feature and documentation

@@ -853,7 +855,7 @@ fn test_rfc2822() {
("Tue, 20 Jan 2015 17:35:90 -0800", Err(OUT_OF_RANGE)), // bad second
("Tue, 20 Jan 2015 17:35:20 -0890", Err(OUT_OF_RANGE)), // bad offset
("6 Jun 1944 04:00:00Z", Err(INVALID)), // bad offset (zulu not allowed)
("Tue, 20 Jan 2015 17:35:20 HAS", Err(NOT_ENOUGH)), // bad named time zone
("Tue, 20 Jan 2015 17:35:20 HAS", Err(INVALID)), // bad named time zone
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would now report the correct error here.

// named single-letter timezone "J" is specifically not valid
("Tue, 20 Jan 2015 17:35:20 J", Err(NOT_ENOUGH)),
("Tue, 20 Jan 2015 17:35:20 J", Err(INVALID)),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

Copy link
Contributor

@jtmoon79 jtmoon79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very compelled by this standards-following PR. I thought something wasn't quite right about how the various RFC and +00:00 and -00:00 were being handled but I couldn't put my finger on it (and it seemed like a difficult bramblebush to crawl through).

Thanks much @pitdicker , this is a great change!

src/format/parse.rs Outdated Show resolved Hide resolved
@@ -853,7 +855,7 @@ fn test_rfc2822() {
("Tue, 20 Jan 2015 17:35:90 -0800", Err(OUT_OF_RANGE)), // bad second
("Tue, 20 Jan 2015 17:35:20 -0890", Err(OUT_OF_RANGE)), // bad offset
("6 Jun 1944 04:00:00Z", Err(INVALID)), // bad offset (zulu not allowed)
("Tue, 20 Jan 2015 17:35:20 HAS", Err(NOT_ENOUGH)), // bad named time zone
("Tue, 20 Jan 2015 17:35:20 HAS", Err(INVALID)), // bad named time zone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add extra asserts

("20 Jan 2015 17:35:20 +0000", Ok("Tue, 20 Jan 2015 17:35:20 +0000")),
("20 Jan 2015 17:35:20 -0001", Ok("Tue, 20 Jan 2015 17:35:20 -0001")),
 ("Tue, 20 Jan 2015 17:35:20 -9900", Err(OUT_OF_RANGE)),  // bad offset

(adjusted to whatever the correct value is)

src/format/scan.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated

#[test]
#[allow(deprecated)]
fn test_type_sizes() {
Copy link
Contributor

@jtmoon79 jtmoon79 Apr 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if this has been tested on an x86 (32 bit) platform?

IME FreeBSD x86 and OpenBSD in-general is always little weird about sizings of things (often subtle differences from Linux anyway).

If I get around to it, I'll try to compile this PR on a FreeBSD 13 x86 VM.

// of +0:00 and -0:00.
// Advantage of this encoding is that it only takes a single shift right to get offset in
// seconds.
local_minus_utc: NonZeroI32,
Copy link
Contributor

@jtmoon79 jtmoon79 Apr 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this code comment a docstring?
In the docstring, would you tell the user what helper functions they should call to extract various infos? Can you provide those helper functions? Can you have this PR use those bit-shift helper functions?
Even if it's only chrono developers that need to know, (even if those docstrings aren't actually published) IMO it's still worthwhile to write it down in "official-seeming" manner.

Otherwise I might assume I have to do the bit shifts and masks myself... and those are programming approaches I would like everyone to move on from. 😬


On the other hand ...

honestly, I'd prefer to have discrete struct members for each piece of information embedded here. I'm not persuaded the runtime efficiency-savings is worth the mental overhead.

Also, I'm not sure, but I'm concerned it might be a problem on less-common Big Endian platforms (I don't recall where and when Rust gives guarantees around bit endianness).

(IIUC, combining various information into one NonZeroI32 member is only done for a little less memory usage at runtime?)

Copy link
Collaborator Author

@pitdicker pitdicker May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly, I'd prefer to have discrete struct members for each piece of information embedded here. I'm not persuaded the runtime efficiency-savings is worth the mental overhead.

This is somewhat comparable to NaiveDate, which fits a year, ordinal, leap-year flag and weekday flags in an i32.

I agree that using individual struct members is usually better. But chrono's DateTime<FixedOffset> is a pretty fundamental type in the rust ecosystem. It should not increase in size without very good reason.

Also, I'm not sure, but I'm concerned it might be a problem on less-common Big Endian platforms (I don't recall where and when Rust gives guarantees around bit endianness).

In safe Rust, we do not have to worry about this. As soons as we start transmuting it becomes messy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this code comment a docstring?
In the docstring, would you tell the user what helper functions they should call to extract various infos? Can you provide those helper functions? Can you have this PR use those bit-shift helper functions?
Even if it's only chrono developers that need to know, (even if those docstrings aren't actually published) IMO it's still worthwhile to write it down in "official-seeming" manner.

I would like to keep the implementation details private.
But we definitely have helper functions which this PR uses: local_minus_utc() and no_offset_info().

I'll mention them in a comment:

    // Use `local_minus_utc()` to get the offset in seconds, and `no_offset_info()` to inspect the
    // `OFFSET_UNKNOWN` flag.

src/offset/fixed.rs Show resolved Hide resolved
@jtmoon79
Copy link
Contributor

jtmoon79 commented Apr 29, 2023

RFC 3339 and RFC 2822 make a distinction between an offset of +00:00 and -00:00.

RFC 2822:

The form "+0000" SHOULD be used to indicate a time zone at Universal Time. Though "-0000" also indicates Universal Time, it is used to indicate that the time was generated on a system that may be in a local time zone other than Universal Time and therefore indicates that the date-time contains no information about the local time zone.

From RFC3339:

If the time in UTC is known, but the offset to local time is unknown, this can be represented with an offset of "-00:00". This differs semantically from an offset of "Z" or "+00:00", which imply that UTC is the preferred reference point for the specified time.

Currently chrono drops the distinction, but I would like to preserve it.

* I noticed `format/scan.rs` was already written with the functionality of this RFCs in mind, but did not go all the way. It now follows the specs better and can sometimes give better parser errors.

* Modifying `Parsed` to preserve this information is a bit messy as all the methods are public. Adding a public constant `NO_OFFSET_INFO` seemed the cleanest solution. Parsing doesn't support more than two digits for the offset hours, limiting the values that are passed to `Parsed::set_offset` to `99 * 3600 + 59 * 60 = 359940`. The value of `NO_OFFSET_INFO` is way outside that range, so no chance for conflicts.

* `FixedOffset` can be modified in a reasonably clean way to encode `-0`: shift the value 2 bits left and encode this special state as an `OFFSET_UNKNOWN` flag in the rightmost bits. Advantage of this encoding is that it only takes a single shift right to get the offset in seconds, so this should not measurably impact performance.

* The same shifting trick enables using `NonZeroI32`, as long as we set one of the rightmost bits to something non-null with the `OFFSET_NORMAL` flag. This gives `FixedOffset` a niche. `Option<DateTime<FixedOffset>>` is now the same size as `DateTime<FixedOffset>`: 16 bytes instead of 20.

* For comparisons and hashing the flags are ignored. This way `-00:00` is treated as equal to `+00:00`, which is usually what is expected. Also this way we remain backwards compatible.

* `FixedOffset::no_offset_info` returns whether the offset is `-00:00`.

* `FixedOffset::OFFSET_UNKNOWN` can be used to initialize an offset to `-00:00`.

Can you find an appropriate docstring to place this information (especiially the quoted RFC stuff)? This is a great explanation of little finicky area that I suspect quite a few users will run into.
You might find it appropriate to split up the info and place within various user-facing functions. Whatever you think is good. So long as this subtlety is explained to the user.

@esheppa
Copy link
Collaborator

esheppa commented Apr 30, 2023

Interestingly, this was a common case that was causing failures in tests/dateutils.rs::verify_against_date_command_format_local, if this is merged, we may be able to ensure the offsets match as well, rather than just the date/time values.

@pitdicker
Copy link
Collaborator Author

Can you find an appropriate docstring to place this information (especially the quoted RFC stuff)? This is a great explanation of little finicky area that I suspect quite a few users will run into.
You might find it appropriate to split up the info and place within various user-facing functions. Whatever you think is good. So long as this subtlety is explained to the user.

Good idea, I have written an introduction for the FixedOffset type, and included a small reference in the docstring of the Parsed::offset field.

@pitdicker pitdicker force-pushed the no_offset_info branch 3 times, most recently from c31316b to e070c8a Compare May 4, 2023 09:15
@pitdicker
Copy link
Collaborator Author

Interestingly, this was a common case that was causing failures in tests/dateutils.rs::verify_against_date_command_format_local, if this is merged, we may be able to ensure the offsets match as well, rather than just the date/time values.

Great if this would help with an existing test 😄.

@pitdicker pitdicker force-pushed the no_offset_info branch 3 times, most recently from 60c0f59 to f6f0495 Compare May 10, 2023 11:54
@pitdicker pitdicker force-pushed the no_offset_info branch 3 times, most recently from f59ed7b to d82ca0e Compare May 17, 2023 15:21
src/lib.rs Outdated Show resolved Hide resolved
src/datetime/tests.rs Show resolved Hide resolved
@pitdicker pitdicker force-pushed the no_offset_info branch 2 times, most recently from fd30238 to 5ecc449 Compare May 19, 2023 15:07
Copy link
Contributor

@jtmoon79 jtmoon79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve still, with suggestion.

src/offset/fixed.rs Outdated Show resolved Hide resolved
Copy link

codecov bot commented Feb 4, 2024

Codecov Report

Attention: Patch coverage is 91.95402% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 92.16%. Comparing base (e292d9b) to head (cc18784).

Files Patch % Lines
src/offset/fixed.rs 79.31% 6 Missing ⚠️
src/format/parsed.rs 87.50% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1042   +/-   ##
=======================================
  Coverage   92.16%   92.16%           
=======================================
  Files          40       40           
  Lines       18052    18101   +49     
=======================================
+ Hits        16637    16683   +46     
- Misses       1415     1418    +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants