TRAP Caching: Add timeouts to upload/download operations #1280

Merged
merged 1 commit into from
Sep 30, 2022

Conversation

edoardopirovano
Contributor

Per @henrymercer's suggestion, this PR adds timeouts to both the TRAP cache download and upload operations. He observed one instance where a TRAP cache download hung and caused a whole run to fail.
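For readers skimming the thread, here is a minimal sketch of the general technique: racing an operation against a timer and carrying on without a result if the timer wins. The helper name `withTimeout`, its parameters, and its return type are assumptions for illustration and are not necessarily what this PR adds to `src/util.ts`.

```typescript
// Hypothetical helper (illustrative only): resolves with the operation's
// result, or with undefined after calling onTimeout if timeoutMs elapses first.
// Note that the underlying operation is not cancelled; its result is ignored.
function withTimeout<T>(
  timeoutMs: number,
  operation: Promise<T>,
  onTimeout: () => void
): Promise<T | undefined> {
  return new Promise<T | undefined>((resolve, reject) => {
    const timer = setTimeout(() => {
      onTimeout();
      resolve(undefined);
    }, timeoutMs);
    operation.then(
      (value) => {
        // The operation finished first: cancel the timer and pass the value on.
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        // The operation failed first: cancel the timer and propagate the error.
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}
```

A caller would wrap the upload or download promise and treat an `undefined` result as "skip the cache and continue", logging from the `onTimeout` callback.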

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.
  • Confirm the changelog has been updated if necessary.

@edoardopirovano edoardopirovano requested a review from a team as a code owner September 30, 2022 09:54
henrymercer
henrymercer previously approved these changes Sep 30, 2022
Contributor

@henrymercer henrymercer left a comment

LGTM, just some minor comments.

src/trap-caching.ts (outdated)
src/util.ts
src/trap-caching.ts (outdated)
@jbj

jbj commented Sep 30, 2022

Will we get sufficient telemetry to tell how often the timeout is reached?

@edoardopirovano
Contributor Author

> Will we get sufficient telemetry to tell how often the timeout is reached?

Our telemetry will record an upload/download time that's very near the timeout. That should let us pick out timed-out runs fairly easily by filtering for telemetry rows near this value. I suppose a few runs that finished just before the timeout might get mistakenly attributed as timeouts this way. If that becomes an issue, we can add an extra telemetry field that records whether we hit a timeout. But as long as that's fairly rare (which I expect it to be, because most normal uploads/downloads we've seen so far are very far from the timeout), I don't think it warrants an extra field and the complexity of threading it through the code.
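The filtering heuristic described above could be sketched roughly as follows. The row shape, the function name `likelyTimedOut`, and the tolerance parameter are all hypothetical; the real telemetry schema is not shown in this thread.

```typescript
// Hypothetical shape of a telemetry row (an assumption for illustration).
interface CacheTelemetryRow {
  runId: string;
  durationMs: number; // recorded upload or download time
}

// Returns the rows whose duration is within toleranceMs of the timeout.
// Runs that finished just before the timeout are also flagged, which is the
// false-positive case mentioned in the comment.
function likelyTimedOut(
  rows: CacheTelemetryRow[],
  timeoutMs: number,
  toleranceMs: number
): CacheTelemetryRow[] {
  return rows.filter(
    (row) => Math.abs(row.durationMs - timeoutMs) <= toleranceMs
  );
}
```

The tolerance trades missed timeouts against false positives; a dedicated boolean telemetry field would remove the ambiguity at the cost of extra plumbing.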

@henrymercer
Contributor

> Our telemetry will record an upload/download time that's very near to the timeout.

IIRC we measure the aggregate upload / download time over all languages. So I think we could tell whether we timed out when analyzing a single language, but not when analyzing multiple languages (unless they all timed out). It should be an unlikely case though, so I wouldn't personally push for extra telemetry fields at this time.
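To make the single-language versus multi-language point concrete, here is a rough sketch of what can be inferred from an aggregate time alone. It assumes each per-language duration lies between 0 and the timeout; the function name, parameters, and tolerance are hypothetical.

```typescript
type TimeoutInference = "none" | "all" | "ambiguous";

// Classifies what an aggregate duration over languageCount languages implies,
// assuming every per-language duration is between 0 and timeoutMs.
function inferTimeouts(
  totalMs: number,
  languageCount: number,
  timeoutMs: number,
  toleranceMs: number
): TimeoutInference {
  // The total is too small for any single language to have reached the timeout.
  if (totalMs < timeoutMs - toleranceMs) return "none";
  // The total is so large that every language must have run close to the timeout.
  if (totalMs >= languageCount * (timeoutMs - toleranceMs)) return "all";
  // Otherwise one language may have timed out, or several may simply have been slow.
  return "ambiguous";
}
```

With one language the result is never `"ambiguous"`, which matches the observation above; with two or more, any total between the two thresholds cannot be attributed.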

Comment on lines +178 to +182
() => {
  logger.info(
    `Timed out waiting for TRAP cache for ${language} to upload, will continue without uploading`
  );
}
Contributor

@henrymercer henrymercer Sep 30, 2022

Minor, optional: You can remove the curly braces here and in other calls like this. I'm surprised the linter doesn't complain about it.

Contributor Author

Let's leave them in. I personally prefer how it reads with the curly braces, since it makes it clearer that this is part of the callback. If we ever want to add a second statement here (e.g. to record a telemetry field saying we timed out), it will also make that diff cleaner.

@edoardopirovano
Contributor Author

> So I think we could tell whether we timed out when analyzing a single language, but not when analyzing multiple languages (unless they all timed out).

Yes, it will be a little trickier to infer from the timings whether timeouts occurred when there's more than one language. We may want to consider adding an extra field then, although I hope this will be rare enough not to warrant it.

@edoardopirovano edoardopirovano merged commit 4cf8004 into main Sep 30, 2022
@edoardopirovano edoardopirovano deleted the edoardo/add-timeout branch September 30, 2022 13:13
@github-actions github-actions bot mentioned this pull request Oct 6, 2022