
[Proposal] dump query processing performance metrics from various stages #3324

Closed
himanshug opened this issue Aug 4, 2016 · 5 comments

himanshug (Contributor) commented Aug 4, 2016

While executing a query, the Druid Broker and Historicals (and realtime tasks) publish very useful metrics, such as:

at the Broker:
query/time
query/bytes
query/node/time
query/node/ttfb

at the Historicals:
query/time
query/bytes
query/segment/time
query/wait/time
...

All of these metrics carry queryId, host, etc. in their dimensions, so if Druid's metrics are ingested into another Druid cluster, users can see where all the time for a query execution was spent. We do run such a Druid cluster (the "metrics cluster") to debug performance issues.

However,

  1. Some users do not have the bandwidth to maintain another Druid cluster for metrics, and instead push aggregated metrics to monitoring systems like Graphite. With aggregation, it becomes difficult to understand performance issues for a specific queryId.

  2. Even with a Druid "metrics" cluster, it takes some time for metrics to get ingested into that cluster. Sometimes we want to do the debugging interactively, that is, send a query and see all of its performance metrics in one place. #3319 (introduce /druid/v3 query endpoint that gives query responseContext) and #3323 (WIP: optionally configure DirectDruidClient to use /druid/v3 instead of /druid/v2) enable the ability to return a large responseContext from the Broker (with the same accumulated from all Historicals).

This proposal is to enable dumping the query performance metrics into the responseContext when the query context contains a flag, "dumpPerformance".
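For illustration, a minimal sketch of a native query carrying the proposed flag; the timeseries query body here is hypothetical, and only the "dumpPerformance" context key is what this proposal would add:

{
    "queryType": "timeseries",
    "dataSource": "wikipedia",
    "intervals": ["2016-08-01/2016-08-02"],
    "granularity": "all",
    "aggregations": [{ "type": "count", "name": "rows" }],
    "context": {
        "queryId": "debug-query-1",
        "dumpPerformance": true
    }
}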

With the flag set, the end user would see a responseContext like the one below, which would be very useful for debugging query performance problems:

{
    "result": [ .... ],
    "context": {
        ....
        "broker": {
            "query/time": 783,
            "query/bytes": 1234,
            "historical1": {
                "query/node/ttfb": 124,
                "query/node/time": 567,
                "query/node/bytes": 3564
            },
            "historical2": {
                "query/node/ttfb": 379,
                "query/node/time": 685,
                "query/node/bytes": 5632
            }
        },
        "historical1": {
            "query/time": 554,
            "query/bytes": 3564,
            "segments": {
                "segment_id1": {
                     "query/segment/time": 324,
                     "query/wait/time": 87
                },
                "segment_id2": {
                     "query/segment/time": 314,
                     "query/wait/time": 79
                }
            }
        },
        "historical2": { .... }
    }
}

Depends on #3319 and #3323

himanshug self-assigned this Aug 4, 2016
erikdubbelboer (Contributor) commented
Yes please, this would be something we're interested in. See: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/druid-development/MNHqZl7weLw/3CoM1MrgBAAJ

navis (Contributor) commented Aug 4, 2016

👍

himanshug (Contributor, Author) commented
For some queries, the number of segments scanned might be very large, and that could blow up the context, so I will probably limit the number of segments reported per Historical to something like 10 (assuming the other segments behaved similarly, that much information should be enough); see the sketch below.

Also, maybe have a separate flag for only the Broker reporting its performance, plus a "detailed" flag to include the reports from Historicals too.
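A hypothetical sketch of how such a per-Historical cap could be applied while accumulating per-segment entries; the class, field, and method names are illustrative, not existing Druid code:

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: accumulate per-segment metric entries for the
// responseContext, but stop admitting new segments once the cap is hit.
class PerSegmentMetricsAccumulator
{
    private static final int MAX_SEGMENTS_REPORTED = 10; // proposed cap

    private final Map<String, Map<String, Long>> segments = new LinkedHashMap<>();

    void report(String segmentId, String metricName, long value)
    {
        // Metrics for segments beyond the first MAX_SEGMENTS_REPORTED are dropped.
        if (segments.size() >= MAX_SEGMENTS_REPORTED && !segments.containsKey(segmentId)) {
            return;
        }
        segments.computeIfAbsent(segmentId, id -> new LinkedHashMap<>())
                .put(metricName, value);
    }

    // Snapshot to be merged into the per-Historical section of the responseContext.
    Map<String, Map<String, Long>> snapshot()
    {
        return segments;
    }
}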

github-actions (bot) commented

This issue has been marked as stale due to 280 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If this issue is still
relevant, please simply write any comment. Even if closed, you can still revive the
issue at any time or discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

github-actions bot added the stale label May 30, 2023
github-actions (bot) commented

This issue has been closed due to lack of activity. If you think that
is incorrect, or the issue requires additional review, you can revive the issue at
any time.

github-actions bot closed this as not planned Jun 27, 2023