
[Proposal] Query Profiling #539

Closed
jkowall opened this issue Apr 13, 2021 · 20 comments
Labels
  • community: Issues raised by community members and contributors
  • discuss: Issues intended to help drive brainstorming and decision making
  • enhancement: Enhancement or improvement to existing feature or request
  • feature: New feature or request
  • idea: Things we're kicking around
  • untriaged

Comments

@jkowall (Contributor) commented Apr 13, 2021

Requirements - What kind of business use case are you trying to solve?

  • Fixing performance issues in OpenSearch is difficult due to limited observability within the database itself.
  • To understand the performance and behavior of the database, and to act on it, we need to understand the workload that a specific node or cluster is running.
  • This data would be collected in sampled form to allow for analysis of the data.
  • In the future this data would be used to block, throttle, or otherwise control the queries.

Problem - What blocks you from solving this today?

Today the profiling capabilities are deep but basic, and they are not part of the open-source codebase. The only other way to profile OpenSearch is Java/JVM profiling, which is highly inefficient and expensive from a performance perspective.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Add a basic profiler capability within OpenSearch. This would allow for basic data collection, with a pluggable architecture so the profiler can be extended with additional capabilities. Future capabilities would include acting on queries, for example blocking, throttling, or prioritizing the queries being executed.

Selection of queries to profile

  1. The initial proposal is to allow queries to be named so that they can be flagged as "always profile".
  2. Profiling can be configured to collect data on a specific query type, or to collect data on a percentage of the total volume.
  3. A named query should always be collected unless a sampling percentage is specified (e.g. 10%).
  4. If a query has no name, it should be normalized to a general definition, defined by a query match.
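
The selection rules above can be sketched in a few lines. This is a minimal sketch, not proposed API: the function and parameter names are invented, and the logic is just items 1-3 restated as code.

```python
import random

def should_profile(query_name, sample_rates, default_rate=0.0):
    """Decide whether to collect profiling data for one query.

    sample_rates maps a query name to an optional sampling fraction;
    a named query with no explicit rate is always profiled (rule 3).
    Unnamed queries fall back to the cluster-wide sampling rate (rule 2).
    """
    if query_name is not None:
        rate = sample_rates.get(query_name)
        if rate is None:
            return True                    # named => always collected...
        return random.random() < rate      # ...unless a percentage is set
    return random.random() < default_rate  # e.g. 0.10 for 10% of volume
```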

Data collected on queries

  1. Number of times the query was run
  2. Number of documents returned by the query
  3. Total time
  4. Number of shards scanned
  5. success/failed across the shards
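
As a sketch only, the per-query record implied by the list above might look like the following; all field names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class QueryProfileRecord:
    """One normalized query's collected stats (field names invented)."""
    query_key: str           # query name, or its normalized definition
    run_count: int           # 1. number of times the query was run
    docs_returned: int       # 2. number of documents returned
    total_time_millis: int   # 3. total time
    shards_scanned: int      # 4. number of shards scanned
    shards_successful: int   # 5. success/failure across the shards
    shards_failed: int
```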

Data storage

  1. The data should be stored in OpenSearch to be accessed by other plugins or the engine itself
  2. Alternatively, the metrics could be exposed in an endpoint and be scraped with Prometheus for monitoring/observability use cases
  3. Another option is to create a new kind of node for this purpose - a profiling node. This way the performance of the other nodes would remain the same and this new profiling node would have its own indexing settings optimized for the profiling data. Additionally this could allow admins to add such a node and remove it whenever they want.
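
If option 2 were chosen, the endpoint would serve samples in the Prometheus text exposition format. A rough sketch of rendering one sample follows; the metric and label names are invented, not a proposed schema:

```python
def prometheus_sample(metric, labels, value):
    """Render one sample line in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{label_str}}} {value}"

# A scraper would then read lines like the one produced here.
line = prometheus_sample(
    "opensearch_query_time_millis_total",            # invented metric name
    {"context": "dashboard-98765", "index": "logs"},  # invented labels
    1234,
)
```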

Assumptions

We assume the user is running OpenSearch and we can define a new index for the data storage.

Any open questions to address

What did we miss? We don’t want this feature to get over-engineered, but I am sure more things should be added.

One open item is where this should live. This could be an internal plugin, or it could be an external self-contained node or cluster itself. There are pros and cons to each design decision.

We appreciate your input on this concept before design and coding start.

Thank you from @jkowall and @AmiStrn

@jkowall added the enhancement label Apr 13, 2021
@rursprung (Contributor) commented:

having this as an external cluster or self-contained node would create additional maintenance overhead when managing the cluster (esp. if it's not managed with a k8s operator; i.e. either run on bare metal or on k8s but w/o an operator). thus i'd vote for this being part of the nodes (be that as a built-in feature or a plugin).

storing this in opensearch vs. exposing it to prometheus comes down to whether you want to have run-time profiling (=> then it'd probably be something for prometheus; but that'd require people to actually have prometheus (which you do in a big setup, esp. with k8s)) or want to have performance metrics of a single query you're currently working on.
for those of us working with traditional RDBMS this could be compared to looking at the explain plan of a single query (while actively working on it, i.e. literally EXPLAIN PLAN FOR SELECT * FROM DUAL) or looking at things like ASH or AWR reports.

@AmiStrn (Contributor) commented Apr 27, 2021

@rursprung I agree with your point regarding the extra node idea proposed. I was thinking this could simplify the architecture by separation of concerns, but as you correctly point out:

[it] would create additional maintenance overhead when managing the cluster

Initially, I had been aiming for run-time profiling. Aggregating data by query context over time and exposing it as a metric. This way using something like a task throttling feature we could keep harmful query contexts at bay.

WDYT about exposing the metrics this way?

@jkowall changed the title from "Proposal: Query Profiling" to "[Proposal] Query Profiling" May 3, 2021
@malpani (Contributor) commented May 3, 2021

Couple of questions/suggestions

The initial proposal would be to allow queries to be named in order to be flagged as always profile. If the query is named it should always be collected unless specified with a percentage

Would it be better to have this as a query param? eg. there is already a "profile": true
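
For reference, the existing flag is set in the search request body. A minimal sketch; the query itself is invented example data:

```python
import json

# Search body using the existing boolean "profile" flag.
search_body = {
    "profile": True,
    "query": {"match": {"message": "error"}},
}
payload = json.dumps(search_body)
```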

Data collected

num-shards, latency, doc-counts, and frequency will provide a nice high-level view. Will this include time spent on queues and shard-level latency? Is the use case here more analytics of the kinds of queries running on the cluster, or is the end goal to leverage this for query planning? If it is the latter, is the long-term plan to add more granular metrics, e.g. memory footprint, and should this proposal include them? If it is the former, then I would prefer something more generic, like a netty interceptor that could track not just search but all requests, including bulk.

@AmiStrn (Contributor) commented May 4, 2021

Would it be better to have this as a query param? eg. there is already a "profile": true

As stated at the top of the doc in the link you provided:

The Profile API is a debugging tool and adds significant overhead to search execution.

So the idea is to have the context of the query added to the query request (perhaps as a query param, but that is already more of an implementation detail).
I am currently working on a breakdown of this proposal into phases; the first phase would add a nice way to monitor the queries in the slow logs, grouped by their respective context params.
Regarding the profiler: it is far more robust than what we are going for here, and no one should expect it to run all the time on each and every query.


Will this include time spent on queues and shard level latency?

Is that something you would like monitored per query context provided?


Is the use case here more on analytics of the kind of queries running on the cluster or is the end goal is to leverage this for query planning?

The first couple of phases add the basic metrics required to track queries by context, not by their actual query. So it won't be for query planning or for analytics on the kinds of queries, but rather analytics on the query contexts running in the cluster.

For example, you could say that a query context is the origin of the request, and then scrape the metrics and see how the queries marked with that context performed. An origin could be a dashboard and its id, or a direct API call... this way if the cluster is really slow and you see that there is a massive spike in queries from a certain dashboard - you can act on this information in some way, and then fix that dashboard.

Later phases focus on the ability to derive part of the context from the query itself (categorizing them is an open question here).

@malpani (Contributor) commented May 5, 2021

Thanks for the examples; now I understand the use case better. So the target audience for this feature is primarily multi-tenant/multi-user clusters, where queries land from different dashboards/tools. The feature would allow labeling queries and tracking a source-referrer 'context', and in the first phase collect shard counts and overall latency per 'context'. My guess is this is to give insight into which dashboards/referrers are popular or slow? Is it more apt to rename the issue to something like 'query source analytics'?

Is that something you would like monitored per query context provided?

The reason I asked about time spent in the queue (at a shard level) and other finer metrics is that end-to-end time could show up as slow but might actually be due to other queries, as search queues are currently FCFS.

@AmiStrn (Contributor) commented May 6, 2021

So the target audience for this feature is primarily multi-tenant/multi-user clusters

This is a pain point for those audiences; however, labeling the searches performed by the alerting system is also a good idea, since it would let you see which alerts are causing strain on the cluster. But generally speaking, yes, this is mostly useful for monitoring multi-tenant/multi-user clusters.

Is it more apt to update the issue name to something like 'query source analytics'?

I'm fine with that idea, @jkowall WDYT? ☝️

e2e time could show up as slow but might actually be due to other queries, as currently search queues are FCFS

Thanks for this input, I'll mention it in the phase breakdown 👍

@jkowall (Contributor, Author) commented May 6, 2021

@AmiStrn I think the idea is to make some additional issues off this master issue as we break things up.

The capability to analyze the source of a query is merely one way the data will be used.

We are also going to create metrics that summarize the performance, utilization, and errors of specific queries, or of the OpenSearch engine as a whole, which goes well beyond "query source". That is the core of a query profiler, which is why we gave the feature that name. We could go deeper and generate more data that helps optimize or tune queries, or shows where time is best spent optimizing them.

@AmiStrn (Contributor) commented May 30, 2021

Breaking this feature down into several phases:

  1. Enhance existing search stats groups feature to accommodate the query contexts

    • Index stats are currently summed up from the shard level. They need to become level-aware:
      • At the index level, we should get the max query_time_in_millis of the shard-level stats.
      • At the _all level, we should see the max query_time_in_millis across all indices, which is the max of any of the shards.
  2. Add metrics collecting (by context) for queries

  3. Enable automatic query context resolving so all queries can be profiled by either the defined context header OR an auto resolved context

    • Create contexts on the fly defined by the actual query contents, for example:
      • Does the query contain a Wildcard (leading or trailing)?
      • Does the query contain nested aggregations? How many levels of aggregation are there? (“...,nested-agg-levels:3,...”)
    • Enable profiling a sample of the queries (e.g. 10% of them) if they do not contain a query context header.

(This feature is optional - cluster settings will contain a flag to enable this feature, as well as definitions regarding sampling)
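
A rough sketch of the phase-3 auto-resolution, under assumed query-DSL shapes (a top-level wildcard clause and standard "aggs" nesting); the context key names are illustrative only:

```python
def derive_context(body):
    """Build a context string from query contents (illustrative rules)."""
    parts = []
    query = body.get("query", {})
    if "wildcard" in query:
        pattern = next(iter(query["wildcard"].values()))
        if isinstance(pattern, dict):          # long form: {"value": "..."}
            pattern = pattern.get("value", "")
        parts.append("leading-wildcard" if pattern.startswith("*") else "wildcard")
    depth = agg_depth(body.get("aggs", {}))
    if depth:
        parts.append(f"nested-agg-levels:{depth}")
    return ",".join(parts)

def agg_depth(aggs):
    """How deeply aggregations nest under sub-"aggs" keys."""
    if not aggs:
        return 0
    return 1 + max(agg_depth(a.get("aggs", {})) for a in aggs.values())
```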

Future features that can build off this one - the ability to reject queries based on their context (i.e. reject all queries from dashboard-98765). This will be highly beneficial for managed environments.

(Edited based on comments by @malpani.)

@malpani (Contributor) commented Jun 2, 2021

Search DSL today already provides a stats parameter, an optional string array for tracking SearchStats per group/label. It is already integrated with the stats APIs and the slow log. Instead of a new header and new APIs, should the idea be to enhance the current stats groups functionality?
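
For reference, tagging a search with the existing stats groups is just an extra array in the request body; the group names here are invented:

```python
import json

# Search body tagged with stats groups; SearchStats are then kept per
# group and surfaced through the index stats API and the slow log.
search_body = {
    "stats": ["dashboard-98765", "tenant-acme"],
    "query": {"match_all": {}},
}
payload = json.dumps(search_body)
```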

@AmiStrn (Contributor) commented Jun 2, 2021

Thanks! To be honest, I had never used this tag. After reading the docs, it looks much better than adding a new header.
I will check it out and update the feature breakdown accordingly.

The idea is to enhance this and also, in the 3rd phase, to add the capability to add stats on the fly.

@AmiStrn (Contributor) commented Jun 8, 2021

@malpani oddly enough, the stats data is not really useful in its current state unless you want the shard-level data (which can be a massive response given a high shard count and a multi-index search).
The reason I say this is that the stats appear to sum up the data:
if you query an index with 10 shards once, the stats at the index level say your query_total is 10. Worse yet, the query_time_in_millis is a SUM (!!!) of the per-shard query_time_in_millis. So anything above shard level is being summed up.
The first phase would enhance the stats by adding a max-value metric. I will update the phases accordingly :)
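
The summing behavior described above, and the proposed max enhancement, can be illustrated with a toy rollup (the per-shard timings are invented):

```python
# Per-shard query_time_in_millis for ONE search against a 10-shard index.
shard_times = [12, 9, 15, 11, 8, 14, 10, 13, 9, 12]

# Current behavior: everything above shard level is summed, so one search
# reports query_total == 10 and an inflated cumulative time.
query_total = len(shard_times)   # 10 "queries" for a single search
summed_time = sum(shard_times)   # 113 ms reported, which no caller saw

# Phase 1 proposal: also track the max across shards; the slowest shard
# is much closer to the latency the caller actually experienced.
max_time = max(shard_times)      # 15 ms
```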

@nknize (Collaborator) commented Jun 17, 2021

Taking a quick look at this: query profiling typically has to disable the query cache to ensure valid results (i.e., you're profiling the actual query rather than measuring a cache hit). This is one of the reasons "profiling adds significant overhead". Is the same intended here?

Can you edit the original issue and include the phased approach as a check list in the description? It'll make it easier to convert this to a meta issue.

@nknize added the discuss, community, feature, idea, and untriaged labels Jun 17, 2021
@jkowall (Contributor, Author) commented Jun 17, 2021

@nknize That depends on your use case for the query profiler. Sometimes it is used as an "explain" tool for developing efficient queries; other times it is used for observability and debugging. In the latter case, I don't think any caching changes or cache disabling is required; in fact, you wouldn't want to do that at all.

I would think of the latter as more of what we were looking at, especially with the Prometheus integration we want to build for scraping the data. Being observability people, I think this is the use case we are looking to fulfill.

@AmiStrn (Contributor) commented Jun 17, 2021

@nknize I am going to use the indices search stats groups for the measurements. Cache hits and misses are a function of how the query is sent (if it is the same query you get hits, so it is usually not something that should worry us in terms of cluster health).
I second what @jkowall mentioned regarding explain vs observability.

What would you suggest regarding making this more clear in the issue description?

@nknize (Collaborator) commented Jun 18, 2021

Depends on your use case for the query profiler. Sometimes this is used as an "explain" use case for developing efficient queries while other times it's used for observability and debugging.

So this is a great point. The Explain API in OpenSearch and Lucene is something different from what you're describing; it's a way to measure how a specific document was scored within a query context. That differs from the "way to write efficient queries" use case, which aligns more with the profile parameter: an "all or nothing" query analyzer that looks not just at query performance (CPU time) but at how Lucene's rewrite mechanism formulated the final Query object.

I think what you're after here for the "observability" use case is either

  1. a new TRACE API (maybe start as a new search parameter) that enables users to trace the query request through the server and disclose "exactly" what happened without disabling query cache.

or

  2. An enhancement to the profile parameter, like @malpani was getting at, to achieve something like the TRACE described in 1. It would be nice to enhance the profile parameter to take a JSON object instead of a primitive Boolean, so users can execute selective profiling, which would include this new trace feature. In this manner you'd only disable query caching if you want "full scope" performance profiling.
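
A sketch of what that richer profile object might look like; every field name here is hypothetical (today's profile parameter accepts only a boolean):

```python
import json

# Hypothetical object-valued "profile" enabling selective profiling:
# trace the request without disabling the query cache, and only turn
# the cache off for "full scope" performance profiling.
search_body = {
    "profile": {
        "trace": True,           # hypothetical TRACE-style tracing
        "timings": False,        # hypothetical full CPU-time profiling
        "disable_cache": False,  # hypothetical; only for full profiling
    },
    "query": {"match": {"title": "opensearch"}},
}
payload = json.dumps(search_body)
```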

@nknize (Collaborator) commented Jun 18, 2021

What would you suggest regarding making this more clear in the issue description?

You can take your "Breaking the feature down into several phases" comment above and either:

  1. update the description of this issue to include it as a checklist that can link to other feature implementation issues.

Or,

  2. open a new meta issue with that as the description and use this one as the incubated "discuss" issue.

I like 1. so everything is in one place but that's just my opinion.

@nknize (Collaborator) commented Jun 18, 2021

Today the profiling capabilities are deep but basic. They are also not part of the open source codebase.

Just a minor correction that Profiling is part of the OSS codebase. Were you referring to something different?

@AmiStrn (Contributor) commented Jun 20, 2021

  1. a new TRACE API (maybe start as a new search parameter) that enables users to trace the query request through the server and disclose "exactly" what happened without disabling query cache.

This is basically what the stats param gives us. You "tag" your searches with it so that the shard search stats begin to aggregate according to the groups. The stats param expects a list of these tags.

Regarding an enhancement to the profile parameter, I will play around with the cache enable/disable to see what we could learn in such cases.

I like 1. so everything is in one place but that's just my opinion.

Thanks, I also like this option.

Profiling is part of the OSS codebase

True; the part that is not OSS is the Kibana side of this API.

@AmiStrn (Contributor) commented Jun 23, 2021

I will be opening a meta issue for the stats and profiling pain points raised in this thread.

@jkowall (Contributor, Author) commented Jun 30, 2021

We decided that this issue was not easily solvable, at least not to get the outcome we wanted. I am going to close this out for now. We are going to use some of the query naming capabilities in the code to provide more control. We will be opening an issue/proposal on query control soon.
