Consider query when optimizing date rounding #63403

Merged: 11 commits on Oct 12, 2020

Commits on Oct 7, 2020

  1. Consider query when optimizing date rounding

    Before this change we inspected the index when optimizing
    `date_histogram` aggregations, precalculating the divisions for the
    buckets for the entire range of dates on the index so long as there
    aren't a ton of these buckets. This works very well when you query all
    of the dates in the index which is quite common - after all, folks
    frequently want to query a week of data and have daily indices.
    
    But it doesn't work as well when the index is much larger than the
    query. This is quite common when dumping data into ES just to
    investigate it but less common in the traditional time series use case.
    But even there it still happens; it is just less impactful. Consider
    the default query produced by Kibana's Discover app: a range of 15
    minutes and an interval of 30 seconds. This optimization saves something
    like 3 to 12 nanoseconds per document, so that 15 minutes would have to
    have hundreds of millions of documents for it to be impactful.
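
    To make that arithmetic concrete, here is a tiny sketch. The class and
    method names are made up for illustration, and the document count is an
    assumed stand-in for "hundreds of millions"; only the 3 to 12 nanosecond
    range comes from the text above.

    ```java
    /** Hypothetical helper for estimating the total time saved by the optimization. */
    final class SavingsEstimate {
        /** Returns the total saving in milliseconds for a per-document saving in nanoseconds. */
        static double savedMillis(long docs, double nanosPerDoc) {
            // 1 millisecond = 1,000,000 nanoseconds
            return docs * nanosPerDoc / 1_000_000.0;
        }
    }
    ```

    With an assumed 100 million documents, `savedMillis(100_000_000L, 3)`
    and `savedMillis(100_000_000L, 12)` bracket the saving at roughly 0.3 to
    1.2 seconds, which is why smaller result sets barely notice it.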
    
    Anyway, this commit takes the query into account when precalculating the
    buckets. Mostly this is good when you have "dirty data". Imagine
    loading 80 billion docs in an index to investigate them. Most of them
    have dates around 2015 and 2016 but some have dates in 1970 and
    others have dates in 2030. These outlier dates are "dirty" "garbage".
    Well, without this change a `date_histogram` across many of these docs
    is significantly slowed down because we don't precalculate the range due
    to the outliers. That's just rude! So this change takes the query into
    account.
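
    The effect of taking the query into account can be sketched roughly as
    follows. This is a minimal illustration, not the actual Elasticsearch
    code: `QueryAwareRounding`, `MAX_BUCKETS`, and every method name are made
    up here, the cap of 128 is an arbitrary assumption, and it ignores time
    zones and calendar intervals entirely.

    ```java
    import java.util.Arrays;

    /** Hypothetical sketch of query-aware precalculation; not the real Rounding API. */
    final class QueryAwareRounding {
        /** Assumed cap on how many bucket boundaries we are willing to precompute. */
        static final int MAX_BUCKETS = 128;

        /**
         * Returns the sorted bucket start times covering [min, max], or null
         * when that range would need more than MAX_BUCKETS boundaries.
         */
        static long[] precompute(long min, long max, long interval) {
            long first = Math.floorDiv(min, interval) * interval;
            long count = (max - first) / interval + 1;
            if (count > MAX_BUCKETS) {
                return null; // too many buckets: the caller falls back to dividing per value
            }
            long[] starts = new long[(int) count];
            for (int i = 0; i < starts.length; i++) {
                starts[i] = first + i * interval;
            }
            return starts;
        }

        /** Intersects the index's date range with the query's range before precomputing. */
        static long[] precomputeForQuery(long indexMin, long indexMax,
                                         long queryMin, long queryMax, long interval) {
            long min = Math.max(indexMin, queryMin);
            long max = Math.min(indexMax, queryMax);
            if (min > max) {
                return new long[0]; // the query cannot match any date in the index
            }
            return precompute(min, max, interval);
        }

        /**
         * Rounds value down to its bucket start with a binary search instead of
         * a division. The value must be at or after starts[0].
         */
        static long round(long[] starts, long value) {
            int idx = Arrays.binarySearch(starts, value);
            if (idx < 0) {
                idx = -idx - 2; // insertion point minus one: the bucket at or before value
            }
            return starts[idx];
        }
    }
    ```

    With dates spanning 1970 to 2030 and a daily interval, `precompute` over
    the whole index gives up and returns null, while `precomputeForQuery`
    with a one-week query in 2015 returns just seven boundaries.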
    
    The bulk of the code change here is plumbing the query into place. It
    turns out that it's a *ton* of plumbing, so instead of just adding a
    `Query` member in hundreds of args, this replaces `QueryShardContext`
    with a new `AggregationContext` which does two things:
    1. Has the top level `Query`.
    2. Exposes just the parts of `QueryShardContext` that we actually need
       to run aggregation. This lets us simplify a few tests now and will
       let us simplify many, many tests later.
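
    The shape of that refactoring might look something like the sketch
    below. Every type here is a hypothetical stand-in (`RangeQuerySketch`,
    `ShardContextSketch`, `AggregationContextSketch`); the real
    `AggregationContext` and `QueryShardContext` expose different and much
    larger surfaces than this.

    ```java
    import java.util.Map;

    /** Stand-in for the real query type; hypothetical and deliberately minimal. */
    final class RangeQuerySketch {
        final String field;
        final long from;
        final long to;

        RangeQuerySketch(String field, long from, long to) {
            this.field = field;
            this.from = from;
            this.to = to;
        }
    }

    /** Stand-in for the broad shard-level context that aggregations used to receive. */
    final class ShardContextSketch {
        private final Map<String, String> fieldTypes;

        ShardContextSketch(Map<String, String> fieldTypes) {
            this.fieldTypes = fieldTypes;
        }

        String fieldType(String field) {
            return fieldTypes.get(field);
        }
        // Imagine many more methods here that aggregations never call.
    }

    /** Narrow view: the top-level query plus only what aggregations need. */
    interface AggregationContextSketch {
        RangeQuerySketch query();

        String fieldType(String field);

        /** Adapter that delegates to the broad context and carries the query along. */
        static AggregationContextSketch of(ShardContextSketch shard, RangeQuerySketch query) {
            return new AggregationContextSketch() {
                @Override
                public RangeQuerySketch query() {
                    return query;
                }

                @Override
                public String fieldType(String field) {
                    return shard.fieldType(field);
                }
            };
        }
    }
    ```

    Because the interface is small, a test can implement it directly
    without building a full shard context, which is the simplification the
    second point is getting at.
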
    nik9000 committed Oct 7, 2020
    e4c4f68
  2. 3c6953c
  3. Update after merge

    nik9000 committed Oct 7, 2020
    b25aae8
  4. Ooops

    nik9000 committed Oct 7, 2020
    788846f

Commits on Oct 12, 2020

  1. 569e0e7
  2. Update javadoc

    nik9000 committed Oct 12, 2020
    3d7e639
  3. 04caadd
  4. Iter

    nik9000 committed Oct 12, 2020
    3316c9f
  5. Hit it with the formatter

    nik9000 committed Oct 12, 2020
    4b3073b
  6. Iter

    nik9000 committed Oct 12, 2020
    6d8cd95
  7. Ooops

    nik9000 committed Oct 12, 2020
    18ac471