perf: update proto_checkpoint_view to use index #8793

Merged: NicholasBlaskey merged 2 commits into main from fix_model_versions_queries on Feb 2, 2024

Conversation

NicholasBlaskey
Contributor

Description

proto_checkpoint_view was casting checkpoints_v2.uuid to text, which prevented the planner from using the index on checkpoints_v2.uuid, so model queries were doing a full sequential scan of checkpoints_v2.
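
To illustrate the effect described above, here is a minimal sketch that uses Python's built-in sqlite3 module rather than PostgreSQL, so the planner output will not match the plans posted in this PR. The table and index names are borrowed from this PR, but the schema is a simplified assumption and not the real checkpoints_v2 definition. It compares the plan for a filter on the bare column with the plan for a filter on the same column wrapped in a cast.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row is a 4-tuple whose last element is the plan detail text.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: the real checkpoints_v2 table has more columns.
conn.executescript(
    """
    CREATE TABLE checkpoints_v2 (id INTEGER PRIMARY KEY, uuid TEXT, metadata TEXT);
    CREATE UNIQUE INDEX checkpoints_v2_uuid_key ON checkpoints_v2 (uuid);
    """
)

# Filtering on the bare column lets the planner use the index on uuid.
indexed_plan = query_plan(conn, "SELECT * FROM checkpoints_v2 WHERE uuid = 'abc'")

# Wrapping the column in a cast hides it from the index, forcing a table scan.
cast_plan = query_plan(
    conn, "SELECT * FROM checkpoints_v2 WHERE CAST(uuid AS TEXT) = 'abc'"
)

print("bare column:", indexed_plan)
print("cast column:", cast_plan)
```

The exact wording of the plan text varies between SQLite versions, but the bare-column plan should name checkpoints_v2_uuid_key while the cast plan should report a scan. PostgreSQL behaves analogously in this respect: an index on a column is generally not usable for a comparison against a cast of that column unless a matching expression index exists.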

Test Plan

TestCheckpointReturned should cover this well.

Commentary (optional)

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

@NicholasBlaskey NicholasBlaskey requested a review from a team as a code owner February 2, 2024 19:25
@cla-bot cla-bot bot added the cla-signed label Feb 2, 2024

netlify bot commented Feb 2, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit b1fe644
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/65bd432e01defb00080f4f93

@stoksc
Contributor

stoksc commented Feb 2, 2024

@NicholasBlaskey can you post the new plan?


codecov bot commented Feb 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (abd590d) 47.72% compared to head (b1fe644) 53.93%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8793      +/-   ##
==========================================
+ Coverage   47.72%   53.93%   +6.20%     
==========================================
  Files        1049      616     -433     
  Lines      167258    70734   -96524     
  Branches     2243        0    -2243     
==========================================
- Hits        79816    38147   -41669     
+ Misses      87284    32587   -54697     
+ Partials      158        0     -158     
Flag Coverage Δ
backend 43.24% <ø> (+<0.01%) ⬆️
harness 63.19% <ø> (-1.13%) ⬇️
web ?

Flags with carried forward coverage won't be shown. Click here to find out more.

see 454 files with indirect coverage changes

@NicholasBlaskey
Contributor Author

New query plan (with this change):

                                                                                       QUERY PLAN                                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=13.11..64.55 rows=1 width=266) (actual time=2.409..2.434 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=2.73..54.09 rows=1 width=2461) (actual time=1.778..1.784 rows=1 loops=1)
         ->  Nested Loop Left Join  (cost=2.15..45.76 rows=1 width=1947) (actual time=1.741..1.745 rows=1 loops=1)
               ->  Nested Loop Left Join  (cost=1.71..37.74 rows=1 width=1730) (actual time=1.646..1.649 rows=1 loops=1)
                     ->  Nested Loop Left Join  (cost=1.42..29.91 rows=1 width=1555) (actual time=1.206..1.209 rows=1 loops=1)
                           ->  Nested Loop Left Join  (cost=1.00..21.94 rows=1 width=839) (actual time=1.191..1.194 rows=1 loops=1)
                                 ->  Nested Loop  (cost=0.57..17.73 rows=1 width=835) (actual time=1.160..1.162 rows=1 loops=1)
                                       ->  Nested Loop Left Join  (cost=0.14..9.28 rows=1 width=218) (actual time=0.014..0.016 rows=1 loops=1)
                                             ->  Seq Scan on model_versions  (cost=0.00..1.05 rows=1 width=204) (actual time=0.006..0.007 rows=1 loops=1)
                                                   Filter: (model_id = 1)
                                                   Rows Removed by Filter: 3
                                             ->  Index Scan using users_pkey on users  (cost=0.14..8.16 rows=1 width=18) (actual time=0.006..0.006 rows=1 loops=1)
                                                   Index Cond: (id = model_versions.user_id)
                                       ->  Index Scan using checkpoints_v2_uuid_key on checkpoints_v2 c  (cost=0.43..8.45 rows=1 width=633) (actual time=1.144..1.144 rows=1 loops=1)
                                             Index Cond: (uuid = model_versions.checkpoint_uuid)
                                 ->  Index Only Scan using idx_checkpoint_id_run_id on run_checkpoints rc  (cost=0.43..4.21 rows=1 width=20) (actual time=0.014..0.015 rows=1 loops=1)
                                       Index Cond: (checkpoint_id = c.uuid)
                                       Heap Fetches: 0
                           ->  Index Scan using trials_pkey on runs r  (cost=0.42..7.97 rows=1 width=720) (actual time=0.014..0.014 rows=1 loops=1)
                                 Index Cond: (id = rc.run_id)
                     ->  Index Scan using experiments_pkey on experiments e  (cost=0.29..7.83 rows=1 width=179) (actual time=0.439..0.439 rows=1 loops=1)
                           Index Cond: (id = r.experiment_id)
               ->  Index Scan using validations_trial_id_total_batches_run_id_unique on raw_validations v  (cost=0.44..8.01 rows=1 width=225) (actual time=0.012..0.013 rows=1 loops=1)
                     Index Cond: ((trial_id = r.id) AND (total_batches = ((c.metadata ->> 'steps_completed'::text))::integer))
                     Filter: (NOT archived)
         ->  Index Scan using steps_trial_id_total_batches_run_id_unique on raw_steps s  (cost=0.57..8.31 rows=1 width=522) (actual time=0.010..0.011 rows=1 loops=1)
               Index Cond: ((trial_id = r.id) AND (total_batches = ((c.metadata ->> 'steps_completed'::text))::integer))
               Filter: (NOT archived)
   ->  Subquery Scan on m  (cost=10.38..10.41 rows=1 width=191) (actual time=0.057..0.059 rows=1 loops=1)
         ->  GroupAggregate  (cost=10.38..10.40 rows=1 width=211) (actual time=0.048..0.049 rows=1 loops=1)
               Group Key: m_1.id, u.id
               ->  Sort  (cost=10.38..10.38 rows=1 width=207) (actual time=0.043..0.044 rows=1 loops=1)
                     Sort Key: u.id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Nested Loop Left Join  (cost=0.14..10.37 rows=1 width=207) (actual time=0.011..0.014 rows=1 loops=1)
                           Join Filter: (mv.model_id = m_1.id)
                           ->  Nested Loop  (cost=0.14..9.31 rows=1 width=203) (actual time=0.008..0.010 rows=1 loops=1)
                                 ->  Seq Scan on models m_1  (cost=0.00..1.07 rows=1 width=185) (actual time=0.004..0.005 rows=1 loops=1)
                                       Filter: (id = 1)
                                       Rows Removed by Filter: 5
                                 ->  Index Scan using users_pkey on users u  (cost=0.14..8.16 rows=1 width=18) (actual time=0.003..0.003 rows=1 loops=1)
                                       Index Cond: (id = m_1.user_id)
                           ->  Seq Scan on model_versions mv  (cost=0.00..1.05 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=1)
                                 Filter: (model_id = 1)
                                 Rows Removed by Filter: 3
 Planning Time: 4.010 ms
 Execution Time: 2.714 ms
(47 rows)


@NicholasBlaskey
Contributor Author

Old query plan (before this change):

                                                                                        QUERY PLAN                                                                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=7427.00..508363.94 rows=16359 width=266) (actual time=440.571..1443.003 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Nested Loop Left Join  (cost=6427.00..505728.04 rows=6816 width=266) (actual time=1098.314..1431.299 rows=0 loops=3)
         ->  Nested Loop Left Join  (cost=6426.43..448657.93 rows=6816 width=2138) (actual time=1098.043..1431.017 rows=0 loops=3)
               ->  Parallel Hash Left Join  (cost=6425.99..393969.19 rows=6816 width=1921) (actual time=1098.014..1430.987 rows=0 loops=3)
                     Hash Cond: (r.experiment_id = e.id)
                     ->  Nested Loop Left Join  (cost=20.57..387545.87 rows=6816 width=1746) (actual time=1078.823..1411.793 rows=0 loops=3)
                           ->  Nested Loop Left Join  (cost=20.15..333237.67 rows=6816 width=1030) (actual time=1078.816..1411.786 rows=0 loops=3)
                                 ->  Hash Join  (cost=19.72..304518.72 rows=6816 width=1026) (actual time=1078.798..1411.766 rows=0 loops=3)
                                       Hash Cond: ((c.uuid)::text = (model_versions.checkpoint_uuid)::text)
                                       ->  Parallel Seq Scan on checkpoints_v2 c  (cost=0.00..292502.45 rows=1363245 width=633) (actual time=0.005..492.461 rows=1090596 loops=3)
                                       ->  Hash  (cost=19.70..19.70 rows=1 width=409) (actual time=0.125..0.132 rows=1 loops=3)
                                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                             ->  Nested Loop  (cost=10.52..19.70 rows=1 width=409) (actual time=0.112..0.120 rows=1 loops=3)
                                                   ->  Subquery Scan on m  (cost=10.38..10.41 rows=1 width=191) (actual time=0.098..0.103 rows=1 loops=3)
                                                         ->  GroupAggregate  (cost=10.38..10.40 rows=1 width=211) (actual time=0.068..0.071 rows=1 loops=3)
                                                               Group Key: m_1.id, u.id
                                                               ->  Sort  (cost=10.38..10.38 rows=1 width=207) (actual time=0.057..0.060 rows=1 loops=3)
                                                                     Sort Key: u.id
                                                                     Sort Method: quicksort  Memory: 25kB
                                                                     Worker 0:  Sort Method: quicksort  Memory: 25kB
                                                                     Worker 1:  Sort Method: quicksort  Memory: 25kB
                                                                     ->  Nested Loop Left Join  (cost=0.14..10.37 rows=1 width=207) (actual time=0.032..0.037 rows=1 loops=3)
                                                                           Join Filter: (mv.model_id = m_1.id)
                                                                           ->  Nested Loop  (cost=0.14..9.31 rows=1 width=203) (actual time=0.023..0.025 rows=1 loops=3)
                                                                                 ->  Seq Scan on models m_1  (cost=0.00..1.07 rows=1 width=185) (actual time=0.013..0.014 rows=1 loops=3)
                                                                                       Filter: (id = 1)
                                                                                       Rows Removed by Filter: 5
                                                                                 ->  Index Scan using users_pkey on users u  (cost=0.14..8.16 rows=1 width=18) (actual time=0.007..0.007 rows=1 loops=3)
                                                                                       Index Cond: (id = m_1.user_id)
                                                                           ->  Seq Scan on model_versions mv  (cost=0.00..1.05 rows=1 width=8) (actual time=0.006..0.008 rows=1 loops=3)
                                                                                 Filter: (model_id = 1)
                                                                                 Rows Removed by Filter: 3
                                                   ->  Nested Loop Left Join  (cost=0.14..9.28 rows=1 width=218) (actual time=0.012..0.014 rows=1 loops=3)
                                                         ->  Seq Scan on model_versions  (cost=0.00..1.05 rows=1 width=204) (actual time=0.003..0.004 rows=1 loops=3)
                                                               Filter: (model_id = 1)
                                                               Rows Removed by Filter: 3
                                                         ->  Index Scan using users_pkey on users  (cost=0.14..8.16 rows=1 width=18) (actual time=0.007..0.007 rows=1 loops=3)
                                                               Index Cond: (id = model_versions.user_id)
                                 ->  Index Only Scan using idx_checkpoint_id_run_id on run_checkpoints rc  (cost=0.43..4.21 rows=1 width=20) (actual time=0.028..0.028 rows=1 loops=1)
                                       Index Cond: (checkpoint_id = c.uuid)
                                       Heap Fetches: 0
                           ->  Index Scan using trials_pkey on runs r  (cost=0.42..7.97 rows=1 width=720) (actual time=0.015..0.016 rows=1 loops=1)
                                 Index Cond: (id = rc.run_id)
                     ->  Parallel Hash  (cost=6183.52..6183.52 rows=17752 width=179) (actual time=18.723..18.724 rows=14202 loops=3)
                           Buckets: 65536  Batches: 1  Memory Usage: 9632kB
                           ->  Parallel Seq Scan on experiments e  (cost=0.00..6183.52 rows=17752 width=179) (actual time=0.004..6.354 rows=14202 loops=3)
               ->  Index Scan using validations_trial_id_total_batches_run_id_unique on raw_validations v  (cost=0.44..8.01 rows=1 width=225) (actual time=0.021..0.023 rows=1 loops=1)
                     Index Cond: ((trial_id = r.id) AND (total_batches = ((c.metadata ->> 'steps_completed'::text))::integer))
                     Filter: (NOT archived)
         ->  Index Scan using steps_trial_id_total_batches_run_id_unique on raw_steps s  (cost=0.57..8.31 rows=1 width=522) (actual time=0.014..0.017 rows=1 loops=1)
               Index Cond: ((trial_id = r.id) AND (total_batches = ((c.metadata ->> 'steps_completed'::text))::integer))
               Filter: (NOT archived)
 Planning Time: 2.267 ms
 Execution Time: 1443.133 ms
(56 rows)
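
For a quick comparison of the two plans above, the following sketch pulls the "Execution Time" figure out of EXPLAIN ANALYZE output and computes the ratio. The helper name execution_time_ms is hypothetical, and the two input strings are just the final lines copied from the plans posted in this PR.

```python
import re


def execution_time_ms(plan_text: str) -> float:
    """Extract the 'Execution Time' value (in ms) from EXPLAIN ANALYZE output."""
    match = re.search(r"Execution Time:\s*([\d.]+)\s*ms", plan_text)
    if match is None:
        raise ValueError("no 'Execution Time' line found")
    return float(match.group(1))


# Final lines copied from the two plans posted in this PR.
new_plan_tail = " Planning Time: 4.010 ms\n Execution Time: 2.714 ms"
old_plan_tail = " Planning Time: 2.267 ms\n Execution Time: 1443.133 ms"

new_ms = execution_time_ms(new_plan_tail)
old_ms = execution_time_ms(old_plan_tail)
speedup = old_ms / new_ms
print(f"old: {old_ms} ms, new: {new_ms} ms, speedup: {speedup:.0f}x")
```

On these two runs the ratio works out to roughly 530x. Each figure comes from a single execution against one database, so it is an indication of the improvement rather than a benchmark.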

Contributor

@stoksc stoksc left a comment

LGTM. I checked it out locally and hit some APIs to make sure nothing untested was using that view.

@NicholasBlaskey NicholasBlaskey merged commit f1a45ae into main Feb 2, 2024
73 of 87 checks passed
@NicholasBlaskey NicholasBlaskey deleted the fix_model_versions_queries branch February 2, 2024 19:54