Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] Add termination period for static code upload #4250

Merged

Conversation

gchhablani
Copy link
Collaborator

@gchhablani gchhablani commented Dec 28, 2023

This PR adds a graceful termination period time for static code upload jobs.

The issue is that while the sidecar container polls for the submission file, the submission container exists and initiates a TERM signal to all the other containers (sidecar). What this does is that after 30 seconds kills the running sidecar container and the submission never reaches EvalAI backend.
This is fixed by adding a graceful termination period for the sidecar container which is slightly higher than the sleep interval.

Scenarios

  • Sidecar container terminates first - only happens when the submission time limit is reached. In this case, no file will be uploaded to EvalAI, although the submission container will get 360 secons of grace period.
  • Submission container terminates first - more likely to happen, the sidecar container may be in sleep mode - a maximum of 300 seconds - after 360 seconds of submission container terminates it should have found the submission file and can exit without any issues.

There might be a better way to fix this to need as little time as possible while exiting (maybe a preStop hook or something else). This may be investigated later on. We also should look into sidecare containers on EKS: https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/

@gchhablani gchhablani force-pushed the add_termination_grace_period_static branch from 7480845 to d40cee5 Compare December 28, 2023 23:12
@gchhablani gchhablani force-pushed the add_termination_grace_period_static branch from d40cee5 to dd21172 Compare December 28, 2023 23:12
@codecov-commenter
Copy link

Codecov Report

Merging #4250 (7480845) into master (96968d6) will decrease coverage by 3.63%.
Report is 1096 commits behind head on master.
The diff coverage is 46.30%.

❗ Current head 7480845 differs from pull request most recent head dd21172. Consider uploading reports for the commit dd21172 to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4250      +/-   ##
==========================================
- Coverage   72.93%   69.30%   -3.63%     
==========================================
  Files          83       20      -63     
  Lines        5368     3574    -1794     
==========================================
- Hits         3915     2477    -1438     
+ Misses       1453     1097     -356     
Files Coverage Δ
frontend/src/js/controllers/ChallengeInviteCtrl.js 100.00% <100.00%> (ø)
frontend/src/js/controllers/SubmissionFilesCtrl.js 95.45% <100.00%> (ø)
frontend/src/js/controllers/analyticsCtrl.js 80.72% <ø> (ø)
frontend/src/js/controllers/changePwdCtrl.js 100.00% <100.00%> (ø)
frontend/src/js/controllers/contactUsCtrl.js 100.00% <100.00%> (ø)
frontend/src/js/controllers/dashCtrl.js 99.07% <100.00%> (+0.14%) ⬆️
...ontend/src/js/controllers/featuredChallengeCtrl.js 68.99% <100.00%> (ø)
frontend/src/js/controllers/hostedChallengeCtrl.js 100.00% <100.00%> (ø)
frontend/src/js/controllers/webCtrl.js 90.90% <100.00%> (ø)
...ntend/src/js/controllers/challengeHostTeamsCtrl.js 70.50% <66.66%> (-1.18%) ⬇️
... and 7 more

... and 63 files with indirect coverage changes

Files Coverage Δ
frontend/src/js/controllers/ChallengeInviteCtrl.js 100.00% <100.00%> (ø)
frontend/src/js/controllers/SubmissionFilesCtrl.js 95.45% <100.00%> (ø)
frontend/src/js/controllers/analyticsCtrl.js 80.72% <ø> (ø)
frontend/src/js/controllers/changePwdCtrl.js 100.00% <100.00%> (ø)
frontend/src/js/controllers/contactUsCtrl.js 100.00% <100.00%> (ø)
frontend/src/js/controllers/dashCtrl.js 99.07% <100.00%> (+0.14%) ⬆️
...ontend/src/js/controllers/featuredChallengeCtrl.js 68.99% <100.00%> (ø)
frontend/src/js/controllers/hostedChallengeCtrl.js 100.00% <100.00%> (ø)
frontend/src/js/controllers/webCtrl.js 90.90% <100.00%> (ø)
...ntend/src/js/controllers/challengeHostTeamsCtrl.js 70.50% <66.66%> (-1.18%) ⬇️
... and 7 more

... and 63 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd9260b...dd21172. Read the comment docs.

@gchhablani gchhablani merged commit 31e803f into Cloud-CV:master Dec 28, 2023
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants