-
Notifications
You must be signed in to change notification settings - Fork 349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add log retention for database logger #8622
Conversation
✅ Deploy Preview for determined-ui ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
For reference we had a discussion about the golang duration & postgres interval stuff, and decided to keep duration as days. This is because golang duration parsing does not support units larger than hours, we would need to create or find a wrapper for postgres support, and postgres interval math is complex and could be problematic. Here is a blog post with some more info: |
5c263ed
to
518963d
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8622 +/- ##
==========================================
+ Coverage 47.14% 47.17% +0.03%
==========================================
Files 1150 1152 +2
Lines 141671 141773 +102
Branches 2415 2417 +2
==========================================
+ Hits 66785 66888 +103
+ Misses 74696 74695 -1
Partials 190 190
Flags with carried forward coverage won't be shown. Click here to find out more.
|
fc8b731
to
78625da
Compare
cd1e9f2
to
10767f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
talked offline but leaving this review still just incase i lost something
@@ -34,7 +34,7 @@ stages: | |||
set -ex | |||
mkdir -p /tmp/det | |||
cat >/tmp/det/agent.yaml <<EOF | |||
master_host: 127.0.0.1 | |||
master_host: host.docker.internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't forget to revert this when you land?
StartTime: time.Now().UTC(), | ||
JobID: &jobID, | ||
LogVersion: model.CurrentTaskLogVersion, | ||
LogRetentionDays: taskSpec.LogRetentionDays, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you leave a really good comment on the field about how nulls work, etc?
i'm coming back to this review after a while and thought: what happens if the admin wants to retain most of the logs for a lot shorter time and sets the cluster wide retention time to a week from a month, and what does this field mean, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to add some comments to this method:
https://github.com/determined-ai/determined/pull/8622/files#diff-0aff34e1ab2c0076df8608480773216fd6b78a64d46d3f485aee6eb7586659faR87-R89
Please let me know if you think it needs more clarification or comments.
master/internal/db/postgres_tasks.go
Outdated
r, err := db.sql.Exec(fmt.Sprintf(` | ||
WITH log_retention_tasks AS ( | ||
SELECT task_id, end_time, COALESCE(log_retention_days, %d) AS current_retention_days FROM tasks | ||
WHERE task_id IN (SELECT DISTINCT task_id FROM task_logs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a minor optimization to not try to delete from tasks without logs, or more than that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, it seems like this would help if we have a database with a very large number of tasks but fewer task logs, which is likely to be the situation once retention is enabled.
this might be a good candidate for scale testing & optimization tho.
3a6bb1a
to
7874b8a
Compare
f5e91cb
to
147bb54
Compare
|
||
.. code:: yaml | ||
|
||
log_retention_days: 90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It introduces a bit of complexity to have differing syntax for whether you're setting the default retention policy in the master or setting the retention policy for this one particular experiment. I think either syntax works for me, I just don't want to have to remember both and keep them straight.
Checking in, is this the desired choice? For comparison, log_policies
has both the same syntax and same name in both master config and experiment config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names are somewhat unfortunate but due to collisions with log/logging naming in the configs this seems like the best compromise. The server can also take a schedule and default number of days for retention, while the experiment can only configure the number of days and not the schedule.
c97db92
to
d816d35
Compare
@erikwilson this does not support the CLI or WebUI modifications of retention on individual experiments right? Is there a reason that wasn't done? |
This does contain some of the CLI modifications, the rest of the changes can be added in another PR. For the most part we have tried to separate backend and WebUI PRs, as they have historically been done by different teams and developers. |
Thanks! One concern here is whether the CLI/WebUI/SDK can modify the retention periods for experiments once submitted? If not, I'd rather have a simpler interface where all you can configure for an experiment is whether it is subject to log deletion or not (and we can decide if we need to give users more control later). |
69839b2
to
6210319
Compare
6210319
to
3fb43da
Compare
6451894
to
6cf216e
Compare
6cf216e
to
76f85f0
Compare
harness/determined/cli/master.py
Outdated
@@ -119,6 +124,9 @@ def logs(args: Namespace) -> None: | |||
help="number of lines to show, counting from the end " | |||
"of the log (default is all)") | |||
]), | |||
Cmd("cleanup-logs", cleanup_logs, "cleanup expired task logs", [ | |||
Group(cli.output_format_args["json"], cli.output_format_args["yaml"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not be necessary to have Group(cli.output_format_args["json"], cli.output_format_args["yaml"])
as cleanup_logs()
prints one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we relocate this command to task.py
, since this is related to task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the yaml/json options. But I think this should stay in master because that's what product wanted
76f85f0
to
3524b9a
Compare
3524b9a
to
ad0d3cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
da7f1dc
to
89c07fb
Compare
89c07fb
to
ee1e140
Compare
Description
Adds log retention for database logger.
Test Plan
Automated tests added.
Manual testing:
Commentary (optional)
Checklist
docs/release-notes/
.See Release Note for details.
Ticket
DET-9995