Bump node-clinic to tier 3 #234

Closed
BridgeAR opened this issue Sep 19, 2018 · 23 comments

@BridgeAR
Member

Right now node-clinic is not in any diagnostic support tier. However, we recently ran into issues with 10.10 that could have been detected early on if the clinic test suite were run on the Node.js infrastructure.

The tool is becoming more and more popular, and bumping it to tier 3 would likely be the right move at this point. That way we are able to guarantee a better level of stability overall.

I opened the issue here to discuss this in the working group first, instead of opening a PR to change the tier right away. If there are any questions or if any further information is necessary, please ask.

@mcollina please also weigh in.

@mcollina
Member

Recently we had a case where the generation of flame graphs was broken in both Node.js 10.10.0 (in all cases) and 8.12.0 (in certain cases). I think we are not testing those parts of our runtime enough.

@mike-kaufman
Contributor

Discussed on the Diagnostics WG call on 2018-09-19. Here's a summary:

  • There's some concern here that clinic is too broad and touches too many things, whereas smaller, more targeted tools would be better scoped here. Is there some existing flame graph tool that could be added to the matrix that would be more targeted and prevent clinic's flame graphs from breaking?
  • Is it appropriate to test this via citgm?
  • Can we add tests at an API level to ensure that the flame graph APIs are working correctly? I.e., the tests wouldn't validate clinic itself, but rather the APIs clinic depends on (see the sketch after this comment). API tests would have a better chance of moving up to higher tiers (e.g., blocking releases).

Can we get some details on the exact APIs being used by clinic?
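
For reference, here is a minimal sketch of what such an API-level smoke test could look like, assuming the test is allowed to spawn child node processes. The busy-loop workload and the loose assertions are illustrative only, not taken from clinic:

```js
'use strict';
const assert = require('assert');
const { execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Work in a temporary directory so the isolate-*.log file is easy to find.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'prof-smoke-'));

// Generate a V8 log by running a small busy-loop workload under --prof.
execFileSync(process.execPath,
  ['--prof', '-e', 'let s = 0; for (let i = 0; i < 1e7; i++) s += Math.sqrt(i);'],
  { cwd: tmp });

const logName = fs.readdirSync(tmp).find((f) => /^isolate-.+\.log$/.test(f));
assert(logName, 'expected --prof to write an isolate-*.log file');

// Post-process the log; this is the step that broke on 10.10.0 and 8.12.0.
const output = execFileSync(process.execPath,
  ['--prof-process', path.join(tmp, logName)], { encoding: 'utf8' });

// Loose sanity checks on the tick-processor output.
assert(output.includes('ticks'), 'expected a ticks column in the output');
assert(output.includes('[Summary]:'), 'expected a summary section');
console.log('--prof-process smoke test passed');
```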

@mmarchini
Contributor

@mcollina could you give a little more context on what broke and why? This would help us understand how to keep it from breaking in the future.

@richardlau
Member

is it appropriate to test this via citgm?

FYI there's a PR to add it to the default citgm lookup: nodejs/citgm#603

@mcollina
Member

The following commits not being backported broke Node 8.12.0 (I did a brittle workaround on the 0x side): nodejs/node#19285, nodejs/node#19260, nodejs/node#22825.

At more or less the same time, flame graphs broke on Node 10.10 because of nodejs/node#22786; it was fixed in nodejs/node#22790.

It seems that whole area of Node core is neither tested nor validated before a release.
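
To make the dependency concrete: tick-processor-based flame graph tools (0x, clinic flame) ultimately rely on the raw V8 log records written by --prof keeping a stable shape. Here is a rough sketch of a regression check over those record types, given an isolate log produced by `node --prof` (the script name and CLI usage are hypothetical):

```js
'use strict';
// Hypothetical usage: node check-v8-log.js isolate-0xXXXXXXXXXXXX-v8.log
const assert = require('assert');
const fs = require('fs');

const log = fs.readFileSync(process.argv[2], 'utf8');

// Record types that tick-processor-based flame graph tools depend on.
for (const record of ['shared-library', 'code-creation', 'tick']) {
  assert(log.split('\n').some((line) => line.startsWith(record + ',')),
         `expected at least one "${record}" record in the V8 log`);
}
console.log('V8 log contains the expected record types');
```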

@joyeecheung
Member

Do the tests of clinic pass on all the Node.js release lines that it supports right now?
The majority of --prof-process comes from V8, and the breakage may come from V8 as well. CITGM has been very red for a long time (https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/buildTimeTrend), and I am concerned that adding something that deeply depends on V8 (hence a fix may have to come from V8) would make the situation worse.

@mhdawson
Member

Would a test for the tick profile processor have caught the problem? Just wondering if we could add a smaller test to cover this case?

@ofrobots
Contributor

This case needs a test that runs as part of the CI so that V8's integration builds pick it up too. That way we can avoid V8 breaking this functionality in the first place. I would imagine that Clinic could be part of CitGM, but not the CI.

@richardlau
Member

Would a test for the tick profile processor have caught the problem? Just wondering if we could add a smaller test to cover this case?

Our existing tick processor tests don't catch nodejs/node#22825, so we at least have a coverage gap: nodejs/node#22825 (comment)
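
A sketch of what filling that gap could look like: instead of only checking that --prof-process produces output, assert on the structure of the report. The section names below match the tick-processor output format; the helper name is hypothetical:

```js
'use strict';
const assert = require('assert');

// `output` is assumed to be the text produced by `node --prof-process isolate-*.log`.
// assertTickSections is a hypothetical helper name.
function assertTickSections(output) {
  // The tick processor groups its report into named sections.
  for (const section of ['[JavaScript]:', '[C++]:', '[Summary]:']) {
    assert(output.includes(section), `missing ${section} section`);
  }
  // At least one row should attribute a nonzero number of ticks.
  assert(/^\s*[1-9]\d*\s+\d+(\.\d+)?%/m.test(output),
         'expected at least one row with a nonzero tick count');
}

module.exports = { assertTickSections };
```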

@mcollina
Member

Filling that coverage gap would solve the immediate problem, yes.

So, we can talk about bumping --prof-process up in the Tier hierarchy instead (T3).

@joyeecheung
Member

Bumping --prof-process to T3 first sounds more reasonable to me.

@mhdawson
Member

+1 to bump up --prof-process

@cjihrig

cjihrig commented Sep 21, 2018

+1 to bumping up --prof-process

@Fishrock123
Contributor

Please bump --prof-process. It's shipped with node and should obviously not ship broken.

@mmarchini
Contributor

I can include --prof-process in nodejs/node#22915

@BridgeAR
Member Author

I am fixing the tick-processor tests and moving them to the sequential tests so that they run with each CI run. That way we should be able to detect failures early.

Refs: nodejs/node#23100

@BridgeAR
Member Author

Please definitely bump --prof-process to at least T3. To me it seems like it should be even higher since this feature is used frequently.

@mhdawson
Member

@BridgeAR will the updated tick-processor tests adequately test --prof-process? I think that is what you are saying but want to be sure.

@BridgeAR
Member Author

@mhdawson sadly the tests do not yet cover everything properly, but it's a good improvement nevertheless, as we'll run them on each CI run afterwards. I am also going to write further tests to ensure that all the issues we have faced so far would have been detected by our tests.

@BridgeAR
Member Author

BridgeAR commented Oct 3, 2018

Just as a note from the meeting today:

I am going to gather information on which systems --prof-process works properly on and on which it does not.

We can then go ahead and either document that something is not supported or, ideally, improve the situation for the specific platform.
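
As a sketch of the kind of per-platform report that could feed such a support matrix, assuming a standalone smoke test like the one sketched earlier exists (prof-smoke-test.js is a hypothetical file name):

```js
'use strict';
const { spawnSync } = require('child_process');

// Run the (hypothetical) smoke test and record the result with the platform details.
const result = spawnSync(process.execPath, ['prof-smoke-test.js'], { encoding: 'utf8' });

console.log(JSON.stringify({
  platform: process.platform,   // e.g. 'linux', 'darwin', 'win32'
  arch: process.arch,           // e.g. 'x64', 'arm64'
  nodeVersion: process.version,
  profProcessWorks: result.status === 0,
}, null, 2));
```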

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions bot added the stale label Jul 26, 2022
@RafaelGSS
Member

Fixed in: #537
