-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making tq stats visible from cli #638
Conversation
|
@@ -15,7 +15,7 @@ require ( | |||
github.com/stretchr/testify v1.9.0 | |||
github.com/temporalio/ui-server/v2 v2.29.2 | |||
go.temporal.io/api v1.36.0 | |||
go.temporal.io/sdk v1.27.0 | |||
go.temporal.io/sdk v1.28.2-0.20240816033637-2a02c483658a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upgraded sdk to point to the version where TQ stats were introduced in DescribeTaskQueueEnhanced (#1553)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot be merged to main
until it is a stable tag on the dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show what the output of this command w/ stats for a normal task queue (no versioning) looks like in text and JSON form? Can just be as a comment here in the PR. It's hard to get a feel for what the user sees from the code alone.
@@ -15,7 +15,7 @@ require ( | |||
github.com/stretchr/testify v1.9.0 | |||
github.com/temporalio/ui-server/v2 v2.29.2 | |||
go.temporal.io/api v1.36.0 | |||
go.temporal.io/sdk v1.27.0 | |||
go.temporal.io/sdk v1.28.2-0.20240816033637-2a02c483658a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot be merged to main
until it is a stable tag on the dependency
@antlai-temporal @Shivs11 I think it's better to change the short and long descriptions of the I'd suggest we mention poller info and stats as the main output. Reachability can be mentioned as a experimental part of the API with proper warning that it may change in the future. Note that there is a chance that we want to move reachability API under |
I was not aware that the reachability side of things could be moved over later and knowing this now, it makes sense to not over-focus on it. I am for the act of keeping pollers and stats as the main output but shall wait for @antlai-temporal to give us the green light 🚥 |
I saw the markdown table in PR description but I think we want to see what the raw table looks like in text (and will wait until after y'all's discussion on text/json format to review what latest output looks like and approve). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but would like to wait for other approvers because I think they may have stronger opinions on the output structure. And please don't merge until we have a tagged SDK dependency.
The main reason to over focus on reachability was the feedback that it was hard to find which command would give us that info. I think that once we move reachaibility to versioning there will be no issue, but until we do, it makes sense to give it priority in the short/long description. |
* `--disable-stats` (bool) - Disable task queue statistics. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for reachability we use report-reachability
as opposed to disable-reachability
because it is expensive, and we wanted to avoid calls.
Is getting the stats also expensive? In that case we should use also use report-stats
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stats is not the cheapest thing (because it need partition fan-out) but still relatively cheap. I'm comfortable with having it enabled by default. wdyt @dnr ?
We may want to add caching for stats in the root partition in the future (if SDK starts to query it for poller scaling).
temporalcli/commandsmd/commands.md
Outdated
information for the requested versions and all task types, which can be used to safely retire Workers with old code versions, | ||
provided that they were assigned a Build ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part of the sentence is only about reachability and it should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, however we should have the part : the requested versions and all task types
.
All in all, the sentence should look like:
The temporal task-queue describe command provides poller information, backlog statistics and task reachability (experimental) information for the requested versions and task types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
task reachability (experimental)
information
"information" seems redundant?
What was changed
temporal task-queue describe --task-queue xxx
Task Queue Statistics
temporal task-queue describe --task-queue xxx -o json
"stats": [ { "buildId": "UNVERSIONED", "taskQueueType": "workflow", "approximateBacklogCount": 0, "approximateBacklogAge": 0, "backlogIncreaseRate": 0, "tasksAddRate": 0, "tasksDispatchRate": 0 }, { "buildId": "UNVERSIONED", "taskQueueType": "activity", "approximateBacklogCount": 0, "approximateBacklogAge": 0, "backlogIncreaseRate": 0, "tasksAddRate": 0, "tasksDispatchRate": 0 } ]
Why?
Checklist
Closes
How was this tested: