Look into analytics discrepancy on 2014-11-02 #147

Closed
GUI opened this issue Nov 6, 2014 · 4 comments

GUI commented Nov 6, 2014

There are currently no logs showing up for this past Sunday in the "Filter Logs" view:

[Screenshot: "Filter Logs" view, captured 2014-11-05 at 11:54:36 PM]

Oddly, there are hits showing up in the "API Drilldown" view:

[Screenshot: "API Drilldown" view, captured 2014-11-05 at 11:55:30 PM]

So there's either something weird going on with the "Filter Logs" query, or we are missing data. Some current suspicions:

  • It's related to the daylight savings time change that happened on Sunday
  • It's related to the fact that log rotation isn't currently happening on the servers, and that's messing with our analytics gathering. The log rotation issue has been fixed in code; I just need to deploy it to our servers: NREL/api-umbrella-router@7fd29fc

GUI commented Nov 24, 2014

So this is slightly less of an issue than I first thought: all the data is there, but there are 2 points showing up for the Sunday date in the "Filter Logs" view (the real metric for Sunday is the higher point to the left).

The underlying issue appears to be a bug in Elasticsearch's date histogram aggregations that leads to an extra "0" result for a day when daylight savings time rolls around. It seems related to one of these two issues:

elastic/elasticsearch#8209
elastic/elasticsearch#8209

There are ways we could work around this on the display-side of things, but since the data actually is present, this doesn't seem like a huge deal, and I'm somewhat inclined to wait until ElasticSearch fixes this on their end.
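One possible display-side workaround, sketched here only for illustration, is to merge any buckets that land on the same local calendar date before charting them. Everything in this sketch is an assumption rather than the project's actual code: the bucket shape (`key` in epoch milliseconds plus `doc_count`), the choice of US Mountain Time with hard-coded offsets, the helper names, and the sample counts.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Mountain Time, where DST ended at 08:00 UTC on 2014-11-02.
DST_END_UTC = datetime(2014, 11, 2, 8, 0, tzinfo=timezone.utc)


def to_ms(instant):
    """Convert an aware datetime to epoch milliseconds (the assumed bucket key format)."""
    return int(instant.timestamp() * 1000)


def local_date(key_ms):
    """Map a bucket key to its local calendar date using the hard-coded offsets above."""
    instant = datetime.fromtimestamp(key_ms / 1000, tz=timezone.utc)
    offset = timedelta(hours=-6) if instant < DST_END_UTC else timedelta(hours=-7)
    return (instant + offset).date()


def merge_buckets_by_local_date(buckets):
    """Sum doc_count for buckets on the same local date, keeping first-seen order."""
    totals = {}
    for bucket in buckets:
        day = local_date(bucket["key"]).isoformat()
        totals[day] = totals.get(day, 0) + bucket["doc_count"]
    return [{"date": day, "doc_count": count} for day, count in totals.items()]


# Hypothetical response: two buckets for Nov 2, one of them the spurious "0" result.
buckets = [
    {"key": to_ms(datetime(2014, 11, 1, 6, 0, tzinfo=timezone.utc)), "doc_count": 120},
    {"key": to_ms(datetime(2014, 11, 2, 6, 0, tzinfo=timezone.utc)), "doc_count": 95},
    {"key": to_ms(datetime(2014, 11, 2, 7, 0, tzinfo=timezone.utc)), "doc_count": 0},
    {"key": to_ms(datetime(2014, 11, 3, 7, 0, tzinfo=timezone.utc)), "doc_count": 130},
]
merged = merge_buckets_by_local_date(buckets)
```

Summing rather than dropping the zero bucket would also cover the case where the extra bucket carries real hits instead of 0.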


GUI commented Dec 8, 2014

Elasticsearch appears to have fixed things on their end, and I believe this will be fixed by the Elasticsearch 1.3.6 or 1.4.1 updates (released Nov. 26). However, since this issue isn't super critical (the data's all there, it just adds this extra "0" result, and we're now further away from this DST oddity), I'll probably wait to do the Elasticsearch upgrade. There is some other new stuff in the pipeline that will also require lower-level server upgrades, so I'll probably wait until all that goes live and we can test all the upgraded components together (which I think should happen in the month-ish timeframe and definitely before we get hit by DST again in March).

@gbinal gbinal added the bug label Dec 10, 2014
@GUI GUI added this to the Sprint 14 (1/26-2/6) milestone Jan 26, 2015
@GUI GUI self-assigned this Jan 26, 2015

GUI commented Feb 9, 2015

Hm, I thought the Elasticsearch 1.4.2 upgrade would fix this, but apparently not. It seems to have changed the bucketing behavior slightly (I think the first bucket for Sunday, Nov 2 now holds the hits from midnight to 2 AM, rather than just reporting 0), but there are still two buckets for that date. I believe there may still be issues on Elasticsearch's end related to DST (it may be related to our specific use of pre_zone_adjust_large_interval: elastic/elasticsearch#9491).
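For readers unfamiliar with that option, here is a rough sketch of what a daily date histogram aggregation using it might look like on Elasticsearch 1.x. The actual query is not shown in this thread, so the aggregation name, the `request_at` field, the time zone, and the helper function are all hypothetical; only `pre_zone_adjust_large_interval` is named above, and it is paired here with the 1.x `pre_zone` option on the assumption that the two were used together.

```python
import json


def daily_histogram_aggregation(field="request_at", zone="America/Denver"):
    """Build an Elasticsearch 1.x style aggregation body (hypothetical names and values)."""
    return {
        "hits_over_time": {
            "date_histogram": {
                "field": field,
                "interval": "day",
                # Shift timestamps into the given zone before bucketing.
                "pre_zone": zone,
                # Ask for day-sized buckets to be reported at local midnight
                # rather than at UTC midnight.
                "pre_zone_adjust_large_interval": True,
                # Include empty days so the chart has no gaps.
                "min_doc_count": 0,
            }
        }
    }


body = json.dumps({"size": 0, "aggregations": daily_histogram_aggregation()}, indent=2)
```

With settings like these, a day that is 25 hours long in local time is where an extra bucket could plausibly appear.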

Since I don't think this is a super-critical issue, just a little odd and annoying twice a year, I'm going to remove this from the milestone and hope that a future ElasticSearch update more completely fixes this. But if anyone feels this is more important, let us know, and there are probably workarounds we could do on our end.

@GUI GUI removed this from the Sprint 14 (1/26-2/6) milestone Feb 9, 2015
GUI added a commit to NREL/api-umbrella-web that referenced this issue Apr 26, 2015
This underlying bug in Elasticsearch got fixed by our recent upgrade to
Elasticsearch 1.5. But let's add some tests to ensure we continue to get
the expected behavior around daylight savings time. See
18F/api.data.gov#147

This also uncovered a slight issue, in that the API Drilldown graphs
weren't taking into account time zones, so the daily totals might have
appeared different than "Filter Logs" charts (since the hours for each
day were shifted around).
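
To illustrate why ignoring time zones changes daily totals, here is a small sketch that counts the same hits per UTC day and per local day. The fixed UTC-7 offset, the sample timestamps, and the `daily_totals` helper are assumptions made for the example, not values from the project.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 local zone, which keeps DST out of this example.
LOCAL = timezone(timedelta(hours=-7))


def daily_totals(hits, tz):
    """Count hits per calendar day as seen in the given time zone."""
    return Counter(hit.astimezone(tz).date().isoformat() for hit in hits)


# Hypothetical hits, stored in UTC.
hits = [
    datetime(2014, 11, 4, 1, 30, tzinfo=timezone.utc),  # 18:30 on Nov 3, local time
    datetime(2014, 11, 4, 5, 0, tzinfo=timezone.utc),   # 22:00 on Nov 3, local time
    datetime(2014, 11, 4, 15, 0, tzinfo=timezone.utc),  # 08:00 on Nov 4, local time
]

utc_totals = daily_totals(hits, timezone.utc)
local_totals = daily_totals(hits, LOCAL)
```

Here the UTC view puts all three hits on Nov 4, while the local view splits them into two on Nov 3 and one on Nov 4, so two charts built from the same data can disagree.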
@GUI GUI added this to the Sprint 20 (4/20-5/1) milestone Apr 26, 2015

GUI commented Apr 26, 2015

It turns out this was fixed last week when we upgraded to ElasticSearch 1.5. To ensure this doesn't unexpectedly change in future ElasticSearch upgrades, I've added more specific tests surrounding the beginning and ending of daylight savings time in NREL/api-umbrella-web@981e31d
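
As a rough idea of what checks around the DST boundaries can look like, the sketch below generates one hit per hour across both transitions and verifies the per-day counts. It is not the test code from the referenced commit; the US Mountain Time transition instants are hard-coded, and the helper names are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: US Mountain Time transitions, expressed in UTC.
DST_END_2014 = datetime(2014, 11, 2, 8, 0, tzinfo=UTC)   # 02:00 MDT falls back to 01:00 MST
DST_START_2015 = datetime(2015, 3, 8, 9, 0, tzinfo=UTC)  # 02:00 MST jumps to 03:00 MDT


def utc_offset(instant):
    """Return the local offset for an instant (only valid near the two transitions above)."""
    if DST_END_2014 <= instant < DST_START_2015:
        return timedelta(hours=-7)
    return timedelta(hours=-6)


def daily_counts(hits):
    """Count hits per local calendar date, so each date yields exactly one bucket."""
    return Counter((hit + utc_offset(hit)).date().isoformat() for hit in hits)


def hourly_hits(start, end):
    """Generate one hit per hour in the half-open range [start, end)."""
    hits = []
    current = start
    while current < end:
        hits.append(current)
        current += timedelta(hours=1)
    return hits


# Local midnight Nov 1 through local midnight Nov 4, 2014 (DST ends Nov 2).
fall = daily_counts(hourly_hits(datetime(2014, 11, 1, 6, tzinfo=UTC),
                                datetime(2014, 11, 4, 7, tzinfo=UTC)))
# Local midnight Mar 7 through local midnight Mar 10, 2015 (DST starts Mar 8).
spring = daily_counts(hourly_hits(datetime(2015, 3, 7, 7, tzinfo=UTC),
                                  datetime(2015, 3, 10, 6, tzinfo=UTC)))
```

The day DST ends should contain 25 hourly hits and the day it starts 23, each in a single bucket for its date.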

@GUI GUI closed this as completed Apr 26, 2015