Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move timezone conversion logic to DatetimeColumn #15545

Merged
merged 7 commits into from
May 3, 2024

Conversation

mroeschke
Copy link
Contributor

Description

Moves methods/logic in python/cudf/cudf/core/_internals/timezones.py to the newly created DatetimeColumn.tz_localize and DatetimeColumn.tz_convert.

Additionally adds typing and improves an error message when doing tz_convert(None) on a tz-naive Series/Index to raise a TypeError (like pandas) instead of an AttributeError

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 16, 2024
@mroeschke mroeschke requested a review from a team as a code owner April 16, 2024 22:16
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor cleanups, looks good to me (though I am definitely tz-naive).



@lru_cache(maxsize=20)
def get_tz_data(zone_name):
def get_tz_data(zone_name: str) -> Tuple[DatetimeColumn, TimeDeltaColumn]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Please update the docstring description of the return value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. Updated

@@ -77,99 +76,23 @@ def _find_and_read_tzfile_tzdata(zone_name):
raise zoneinfo.ZoneInfoNotFoundError(zone_name)


def _read_tzfile_as_frame(tzdir, zone_name):
def _read_tzfile_as_frame(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

polish: rename to _read_tzfile_as_columns, since we no longer return a frame?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Change it to that

cond = clock_1 > clock_2
nonexistent_begin = clock_2.apply_boolean_mask(cond)
return (as_column([min_date]), as_column([np.timedelta64(0, "s")]))
return tuple(transition_times_and_offsets) # type: ignore[return-value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: This type ignore is because the cython wrapper has no typing information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I believe so. mypy is picking up the call from the cython function as list[Any]

) -> DatetimeTZColumn:
def check_ambiguous_and_nonexistent(
ambiguous: Literal["NaT"], nonexistent: Literal["NaT"]
) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: parse don't validate, check_ambiguous_and_nonexistent(...) -> tuple[Literal["NaT"], Literal["NaT"]] and return those values (and then use them in callers).

Though here it's a bit of a wash since you're not doing any type narrowing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing can change it to parse (although these arguments are not actually used)

@@ -688,6 +702,119 @@ def _with_type_metadata(self, dtype):
)
return self

def _find_ambiguous_and_nonexistent(
self, zone_name: str
) -> Tuple[NumericalColumn | bool, NumericalColumn | bool]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> Tuple[NumericalColumn | bool, NumericalColumn | bool]:
) -> Tuple[NumericalColumn, NumericalColumn] | Tuple[bool, bool]:

Comment on lines 712 to 713
size as `data`, that respectively indicate ambiguous and
nonexistent timestamps in `data` with the value `True`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
size as `data`, that respectively indicate ambiguous and
nonexistent timestamps in `data` with the value `True`.
size as `self`, that respectively indicate ambiguous and
nonexistent timestamps in `self` with the value `True`.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Changed to self

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

python/cudf/cudf/core/column/datetime.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/datetime.py Outdated Show resolved Hide resolved
@wence-
Copy link
Contributor

wence- commented May 3, 2024

/merge

@rapids-bot rapids-bot bot merged commit ce6902f into rapidsai:branch-24.06 May 3, 2024
71 checks passed
@mroeschke mroeschke deleted the ref/tz_localize_convert branch May 3, 2024 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants