initial timezone provider (location based + fallbacks) #96
Cool, looks like a good starting point for something else I wanted to do. I've parsed the google takeout a bit differently (so I'm not sure if the dateinfo on mine is the same): I use lxml so I could keep my sanity, and specify lots of the other attributes from the takeout. It does take about a minute to parse the first time I launch it; after that it's behind cachew. I also only have data going 3 years back from there, while I have data from facebook/forums going back to 2013 (when I was in a different timezone). So I'm unsure if I can just copy/paste this; I think I'll have to create some system that doesn't primarily use the google takeout. I still like the idea of being able to localize a datetime from some source into the timezone I was in at the time. Personally, I don't change timezones (i.e. physically move) that often, and I haven't for the past few years. So, related to location data, my plans were to:
|
Ah I see! Makes sense, just somehow didn't occur to me to do it via
Yep, that's implemented here as well! https://github.com/karlicoss/HPI/pull/96/files#diff-17afdef3985dd7b19cb4ffc9fb86179dR71-R76 Timezones are kind of indirect (determined via coordinates), but I thought it makes sense because that way you have both location data and timezone (as opposed to if only the timezone was specified).
Yeah, gpslogger is great, I just need to play with its outputs for a bit, and maybe I'll even switch from google location completely.
Yep, I've seen it, great idea! Don't think I have any ip data going back further than the location data I have, but it's certainly a kind of interaction that would be very cool to support. If implemented, it would probably belong here (I'm thinking now that
    from . import via_location
    from . import via_ip

    def localize(dt):
        for provider in [via_location, via_ip]:
            dt = provider.localize(dt)
            if dt.tzinfo is not None:
                return dt
        return dt  # or raise error? depending on what user prefers
Alternatively, ipgeocache could be integrated into the location provider, and then IP data would be automatically handled by |
Ah, nice. I'll do the
nice! this is pretty much what I was planning to do
Yeah, I agree. I had looked at
👍
I think I'd agree, yeah. someone familiar with python scanning the directory for the first time is more likely to know
Only thing to watch out with the 99% of the time the location data from the IP data is fine though. Summarizing, in order, my 'resolution order' for location (and therefore timezone) for some timeframe (not sure if this would just be day-based, or maybe one could specify a
|
Indeed, good point. In many cases it's also the ISP's address, so not super accurate either (still good enough for timezone). Can be combated with some sort of outlier detection, but either way it's possible to fine tune by filtering out the offending ip ranges.
Hm I guess there are two usecases for manually marked stuff. One is the 'default/home' which I've implemented in this PR, used as a fallback when you have a gap in automatic data. |
Had a look at
diff --git a/my/location/home.py b/my/location/home.py
index 896ebca..641a872 100644
--- a/my/location/home.py
+++ b/my/location/home.py
@@ -56,7 +56,17 @@ def get_location(dt: datetime) -> LatLon:
     """
     Interpolates the location at dt
     """
+    if not config._past:
+        return config.current
+    prev_dt: datetime = datetime.now()
     for loc, pdt in config._past:
-        if dt <= pdt:
+        # iterating moving from today to the past,
+        # if this datetime is in between the last time reported
+        # and this one, return the location of the last time reported LatLon
+        # (prev_dt would be the next place I moved to)
+        if prev_dt >= dt and pdt < dt:
             return loc
+        prev_dt = pdt
+    from ..core.warnings import medium
+    medium("Don't have any location going back further than {}, using current location".format(prev_dt))
     return config.current
If one just did |
Ah! I've specified them in order from oldest to newest in my config, whereas it seems that you specified them from newest to oldest? I guess the safest thing would be to sort in |
Now that I look back at it, sorting from oldest to newest makes the loop a lot nicer, yeah. Removes both of my checks/the warning message. Will add a sort to |
Hmm. I still feel like I need to have two pointers through the loop though. Sorted and reverted back to your
In [4]: from my.location.home import get_location, config
In [5]: config
Out[5]: home(current='here_now', past=[('location_one', '2019-05-15'), ('location_two', '2018-09-01')])
In [6]: config._past
Out[6]:
[('location_two', datetime.datetime(2018, 9, 1, 0, 0)),
('location_one', datetime.datetime(2019, 5, 15, 0, 0))]
In [7]: late_2018 = datetime.replace(datetime.now(), year=2018)
In [8]: late_2019 = datetime.replace(datetime.now(), year=2019)
In [11]: late_2018
Out[11]: datetime.datetime(2018, 10, 9, 16, 27, 4, 143679)
In [12]: late_2019
Out[12]: datetime.datetime(2019, 10, 9, 16, 27, 12, 221254)
In [9]: get_location(late_2019) # should be location_one
Out[9]: 'here_now'
In [10]: get_location(late_2018) # should be location_two
Out[10]: 'location_one'
My past block specifies:
So, in english:
So any time between I may be misunderstanding something, dates always confuse me. |
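The session output above is consistent with reading each date in `past` as the day a place was left, not the day it was moved into. A minimal sketch of that reading, assuming the original `dt <= pdt` loop over entries sorted oldest to newest (`PAST`, `CURRENT`, and `get_location_until` are hypothetical names that only mirror the config shown in the session):

```python
from datetime import datetime
from typing import List, Tuple

# (location, date the location was left), sorted oldest to newest
PAST: List[Tuple[str, datetime]] = [
    ("location_two", datetime(2018, 9, 1)),
    ("location_one", datetime(2019, 5, 15)),
]
CURRENT = "here_now"


def get_location_until(dt: datetime) -> str:
    """Return the first past location whose 'left on' date is at or after dt."""
    for loc, left_on in PAST:
        if dt <= left_on:
            return loc
    # dt is later than every 'left on' date, so it falls in the current home
    return CURRENT


print(get_location_until(datetime(2018, 10, 9)))  # location_one
print(get_location_until(datetime(2019, 10, 9)))  # here_now
print(get_location_until(datetime(2018, 1, 1)))   # location_two
```

Under this reading, late 2018 falls after leaving `location_two` and before leaving `location_one`, which is why the session returns `location_one` rather than the expected `location_two`.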
Ah.
I guess it's a little counter-intuitive, but it's the only 'natural' way of interpreting past + current without duplicating any information. IMO it makes it easier to think if you sort the entries and squash together entries: |
I agree that this is technically the best way to represent it, but if I asked someone to write down 'places they've lived', they almost always list the dates as when they moved there, not when they moved away. I don't know if that's some bias I have, or if you think the same way.
I think this makes it more confusing, switching between types. Sort of annoying to document/explain as well
What about listing it like that instead? I think it is nice to have the Could instead have an interface like:
class location:
    home = [
        ((42.342, 120.13), '....199X'),  # i.e. when you were born, first house you want to list
        ((..., ...), '2015-...'),  # moved to this place
        ((43.934, 110.43), '2019-..'),  # moved to this place in 2019
    ]
The last item in that list is your current location. That way there's no duplication. If the user doesn't want to specify multiple items, but just their current location, they could instead do:
class location:
    home = (43.324, 120.32)
and that would be treated as
Something like
LatLon = Tuple[float, float]
DateIsh = Union[datetime, str]
LocationEntry = Tuple[LatLon, DateIsh]
class location(user_config):
    # either just one LatLon (i.e. 'current'), or a list with dates of when you moved there
    home: Union[Sequence[LocationEntry], LatLon] |
Yeah, agree, seems like a very good compromise! Will implement. |
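For illustration, a lookup over the "date you moved there" format proposed above could look roughly like this. It is a sketch under assumptions, not the code that ended up in the PR: the helper names (`_to_dt`, `normalize`, `get_location`), the ISO date strings, and the choice to fall back to the earliest entry for dates before the first move are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Sequence, Tuple, Union

LatLon = Tuple[float, float]
DateIsh = Union[datetime, str]
LocationEntry = Tuple[LatLon, DateIsh]
Home = Union[Sequence[LocationEntry], LatLon]


def _to_dt(d: DateIsh) -> datetime:
    # assumption: string dates are ISO formatted, e.g. '2019-05-15'
    return datetime.fromisoformat(d) if isinstance(d, str) else d


def normalize(home: Home) -> List[Tuple[datetime, LatLon]]:
    """Turn either accepted form into a list of (moved_in, latlon), oldest first."""
    if len(home) == 2 and all(isinstance(x, (int, float)) for x in home):
        # a bare LatLon: treat it as the only home, valid for all time
        return [(datetime.min, (float(home[0]), float(home[1])))]
    return sorted((_to_dt(d), latlon) for latlon, d in home)


def get_location(dt: datetime, home: Home) -> LatLon:
    entries = normalize(home)
    moved_in = [d for d, _ in entries]
    # index of the last entry whose move-in date is at or before dt
    idx = bisect_right(moved_in, dt) - 1
    # before the first known move: fall back to the earliest entry
    return entries[max(idx, 0)][1]


# illustrative values only
HOME = [
    ((42.342, 120.13), "1995-01-01"),
    ((40.0, 100.0), "2015-06-01"),
    ((43.934, 110.43), "2019-05-15"),
]
print(get_location(datetime(2018, 10, 9), HOME))  # (40.0, 100.0)
print(get_location(datetime(2019, 10, 9), HOME))  # (43.934, 110.43)
```

With this layout the last entry doubles as the current location, so there is no separate `current` field to keep in sync.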
Yeah, I think I agree, sort of like: 'on
|
done! Here's the new format in the test |
Updates related to this; (changes here) Did the
Also created a basic Have started using Think it'd be nice if this was more flexible, accepting any date, not just naive/UTC ones. A lot of my datetimes are 'by default' serialized from epoch time (what they're stored as) to Not sure if you want to change that function, or add another one, which:
Not sure what your view on logs is, I don't like them showing up in |
Nice! Also glad you managed to plug in ipgeocache
Yeah, I wasn't really sure what would make sense at first, but now that I played with it a bit more, feels like it's getting in the way more often than not, so a hard assert is definitely too annoying. Perhaps it could be configurable (i.e. assert/warn & localize/warn & keep the timezone), but silently converting the timezone makes the most sense.
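A rough sketch of what a configurable policy might look like, covering three of the behaviours mentioned here. The function name `localize` comes from the PR, but the `policy` parameter, its values, and `LOCAL_TZ` are hypothetical; a fixed UTC offset stands in for the timezone that would really be looked up from location data.

```python
import warnings
from datetime import datetime, timedelta, timezone, tzinfo

# hypothetical stand-in for "the timezone I was in at the time"
LOCAL_TZ = timezone(timedelta(hours=2))


def localize(dt: datetime, tz: tzinfo = LOCAL_TZ, policy: str = "convert") -> datetime:
    if dt.tzinfo is None:
        # naive: assume it is already local wall-clock time and attach the zone
        return dt.replace(tzinfo=tz)
    if policy == "throw":
        # the strict behaviour: refuse anything that is already aware
        raise ValueError(f"expected a naive datetime, got {dt!r}")
    if policy == "keep":
        warnings.warn("datetime is already timezone-aware, keeping its timezone")
        return dt
    if policy == "convert":
        # silently express the same instant in the local zone
        return dt.astimezone(tz)
    raise ValueError(f"unknown policy: {policy}")


aware = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
print(localize(datetime(2020, 1, 1, 12, 0)))  # 2020-01-01 12:00:00+02:00
print(localize(aware))                        # 2020-01-01 14:00:00+02:00
```

Converting keeps the instant unchanged and only changes how it is displayed, which is why it is the least surprising default for datetimes deserialized from epoch time.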
Not sure if I got what you mean here? The way it's implemented now is always assuming naive (i.e. 'local') timestamp, and attaching the timezone then. What I'm not sure about the function is this cause, which could indeed benefit from assuming UTC: https://github.com/seanbreckenridge/HPI/blob/14a5dabe73bd734c6da3e3e371e67aef75ce3432/my/time/tz/via_location.py#L163-L164
Yep, agree, it's a bit spammy for me at times as well! The main reason I like them is because it gives a sense of progress. I was also thinking of sort of 'progress bar' via tqdm with a custom logging handler hooked to it. So you get both a sense of progress, but at the same time it's only taking a single line of screen space. Overall just need to give it a bit of a think how to make it more consistent. An env variable to toggle debug logs definitely makes sense, agree! Personally, for me perhaps the only non-warning stuff I actually look at often is the log from |
Oh, yeah, never mind, I think this was just me being confused a bit. If it receives some naive datetime, it should assume it's local to the timezone you were in at the time, so Think it would be nice to put a comment here explaining:
Yeah, I think cachew is a majority of the spam/logs when I enable For reference, here are my current logs when using sometimes I have to change some external logger after it's been imported to respect |
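One way the environment-variable toggle discussed above could be sketched with only the standard library. The variable name `HPI_LOGS`, the helper names, and the default level are assumptions made for illustration, not something settled in this thread.

```python
import logging
import os
from typing import Iterable

ENV_VAR = "HPI_LOGS"  # hypothetical name for the toggle

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def level_from_env(default: int = logging.WARNING) -> int:
    """Read the log level from the environment, falling back to a quiet default."""
    raw = os.environ.get(ENV_VAR, "").strip().upper()
    return LEVELS.get(raw, default)


def setup_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(level_from_env())
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("[%(levelname)s %(name)s] %(message)s"))
        logger.addHandler(handler)
    return logger


def sync_external_loggers(names: Iterable[str]) -> None:
    # loggers created by third-party libraries at import time can be
    # adjusted afterwards by looking them up by name
    level = level_from_env()
    for name in names:
        logging.getLogger(name).setLevel(level)
```

Calling `sync_external_loggers(["cachew"])` after the imports would then apply the same level to a library logger of that name, assuming the library registers its logger under that name.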
Have continued to use gpslogger, seems to work great; Created |
Oh nice! I wonder if you can also use some library for gpx files... less code and might be useful for other data providers! |
Ah, for some reason I thought this was GPSLogger specific. Will try optionally importing this and parsing with that, falling back to basic XML parsing if that isn't installed |
Damn, we discussed a lot of stuff in this pull request 😂
Actually gave it a go in Promnesia karlicoss/promnesia@01ab844 Looks kinda cool: https://twitter.com/karlicoss/status/1323122289517408259, but have a feeling that properly doing that, so it doesn't break for random people would be a lot of work, so not really worth it... |
@seanbreckenridge this is what I've mentioned in #90 (comment)
The idea is that you can use the localize method against a timezone-unaware date, and it will determine it from google location (later can think of using other location provider too, I've got some gpslogger stuff). If it can't do it based on the precise location data, it determines based on my.location.home.
For now the resolution is day-based, but I'll experiment a bit more and later will implement it to be more precise.
So far, I've only been using it on caller sites, whereas it would be kind of cool to do it from within HPI itself (e.g. with sms provider, we already know it's UTC so no point in forcing the user to figure that out). But I feel like it would be good to experiment a bit more first to make sure. Also would need to make it a bit more defensive, so if the user doesn't have location data, it's not crashing everything.
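To make the day-based resolution and the home fallback concrete, here is a small sketch. It is not the PR's implementation: `TZ_BY_DAY` and `HOME_TZ` are hypothetical stand-ins for data that would be derived from location history and from my.location.home, and fixed UTC offsets are used in place of real timezone lookups.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo
from typing import Dict

# hypothetical: timezone observed on specific days, derived elsewhere from location points
TZ_BY_DAY: Dict[date, tzinfo] = {
    date(2020, 7, 1): timezone(timedelta(hours=2)),
    date(2020, 7, 2): timezone(timedelta(hours=3)),
}
# hypothetical: timezone of the configured home location, used when a day has no data
HOME_TZ: tzinfo = timezone(timedelta(hours=1))


def localize(dt: datetime) -> datetime:
    if dt.tzinfo is not None:
        raise ValueError("expected a timezone-unaware datetime")
    # day-based resolution: every timestamp on a given day gets that day's timezone
    tz = TZ_BY_DAY.get(dt.date(), HOME_TZ)
    return dt.replace(tzinfo=tz)


print(localize(datetime(2020, 7, 1, 23, 59)))  # 2020-07-01 23:59:00+02:00
print(localize(datetime(2020, 8, 1, 9, 0)))    # 2020-08-01 09:00:00+01:00
```

Day-based lookup is imprecise on travel days, when the timezone changes partway through the day, which is presumably what the more precise version would address.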