More Efficient Feature Generation #172

Open
4 tasks
bitsofbits opened this issue Jun 28, 2017 · 0 comments

bitsofbits commented Jun 28, 2017

We should be able to speed up feature generation a lot even without using windowing by:

  • Passing in, or sniffing, the end_date of the old data, and choosing start_date to be
    dt (1? 7?) days earlier than the date we care about.

  • Computing features only from start_date onward.

  • Merging the new features with the old data.

  • All of this may get simpler if we switch to the 5-minute rule that David uses: pick
    one point out of every 5-minute interval of the day (perhaps the median across values,
    or the point closest to the median time-wise).
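
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not the project's actual code: points are `(timestamp, value)` tuples sorted by time, old features are kept in a dict keyed by timestamp, "the date we care about" is taken to be the `end_date` of the old data, and the names `thin_points`, `update_features`, and `delta_features` are hypothetical. The thinning keeps the point at the median position time-wise within each 5-minute slot.

```python
from datetime import datetime, timedelta


def thin_points(points, interval=timedelta(minutes=5)):
    """Keep one (timestamp, value) point per interval of each day."""
    buckets = {}
    for t, v in points:
        midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
        slot = (t - midnight) // interval  # index of the interval within the day
        buckets.setdefault((midnight, slot), []).append((t, v))
    thinned = []
    for key in sorted(buckets):
        group = sorted(buckets[key], key=lambda p: p[0])
        thinned.append(group[len(group) // 2])  # median position time-wise
    return thinned


def delta_features(points):
    """Toy stand-in for a real feature: change in value since the previous point."""
    features = []
    previous = None
    for t, v in points:
        features.append((t, 0.0 if previous is None else v - previous))
        previous = v
    return features


def update_features(points, old_features, compute_features,
                    lookback=timedelta(days=1)):
    """Recompute features only near the end of the old data, then merge.

    old_features: dict mapping timestamp -> feature (assumed layout).
    lookback: the "dt" from the issue; 1 day here is a guess, not a decision.
    """
    if not old_features:
        return dict(compute_features(points))
    end_date = max(old_features)        # "sniff" the end of the old data
    start_date = end_date - lookback    # extra context for the first new points
    recent = [(t, v) for t, v in points if t >= start_date]
    merged = dict(old_features)
    for t, feature in compute_features(recent):
        if t > end_date:                # keep old features where they already exist
            merged[t] = feature
    return merged
```

The lookback exists only to give the feature function enough context; features computed inside the lookback window are discarded in favour of the old ones, so the result should match a full recomputation as long as no feature depends on more than `lookback` of history.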


It may be better, or at least simpler, to switch to sharding by date and then have a utility that converts date shards into mmsi shards for a given date range.
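
A minimal sketch of what such a conversion utility might look like, assuming in-memory shards: a dict mapping each `date` to a list of record dicts that carry `mmsi` and `timestamp` keys. The function name `date_shards_to_mmsi_shards` and the record layout are hypothetical; real shards would presumably be files read and written one at a time.

```python
from collections import defaultdict
from datetime import date, datetime


def date_shards_to_mmsi_shards(date_shards, start, end):
    """Regroup records from date shards in [start, end] into per-mmsi shards."""
    mmsi_shards = defaultdict(list)
    for day in sorted(date_shards):
        if start <= day <= end:  # inclusive date range
            for record in date_shards[day]:
                mmsi_shards[record["mmsi"]].append(record)
    # Each mmsi shard ends up ordered by time, whatever the order inside a date shard.
    for records in mmsi_shards.values():
        records.sort(key=lambda r: r["timestamp"])
    return dict(mmsi_shards)


if __name__ == "__main__":
    shards = {
        date(2017, 6, 27): [{"mmsi": 1, "timestamp": datetime(2017, 6, 27, 12)}],
        date(2017, 6, 28): [{"mmsi": 2, "timestamp": datetime(2017, 6, 28, 9)},
                            {"mmsi": 1, "timestamp": datetime(2017, 6, 28, 3)}],
    }
    by_mmsi = date_shards_to_mmsi_shards(shards, date(2017, 6, 27), date(2017, 6, 28))
    print(sorted(by_mmsi))
```

With date shards, adding a new day only touches one shard, and the per-mmsi view needed for feature generation can be rebuilt for just the range of interest.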
