perf: improve resource agg performance #8735

Merged: 4 commits, Feb 2, 2024

@NicholasBlaskey NicholasBlaskey commented Jan 23, 2024

Description

Improve the performance of the daily resource aggregation query.

Test Plan

The existing integration tests should cover this change.

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

@cla-bot cla-bot bot added the cla-signed label Jan 23, 2024

netlify bot commented Jan 23, 2024

Deploy Preview for determined-ui ready!

Name Link
🔨 Latest commit 1bd2edb
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/65bcec87e7f67000088129bc
😎 Deploy Preview https://deploy-preview-8735--determined-ui.netlify.app/

codecov bot commented Jan 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e873381) 47.70% compared to head (1bd2edb) 47.72%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8735      +/-   ##
==========================================
+ Coverage   47.70%   47.72%   +0.01%     
==========================================
  Files        1049     1049              
  Lines      167250   167250              
  Branches     2241     2242       +1     
==========================================
+ Hits        79792    79812      +20     
+ Misses      87300    87280      -20     
  Partials      158      158              
Flag Coverage Δ
backend 43.21% <ø> (+0.05%) ⬆️
harness 64.32% <ø> (ø)
web 42.54% <ø> (ø)

see 4 files with indirect coverage changes

@NicholasBlaskey
Contributor Author

Query plan after the change, on a large database:

   Group Key: (generate_series((GREATEST('2000-01-01'::date, $1))::timestamp with time zone, (LEAST('3000-01-01'::date, $3))::timestamp with time zone, '1 day'::interval))
   ->  Merge Left Join  (cost=2126.01..2305.15 rows=5110 width=88) (actual time=9.540..20.090 rows=24830 loops=1)
         Merge Cond: ((generate_series((GREATEST('2000-01-01'::date, $1))::timestamp with time zone, (LEAST('3000-01-01'::date, $3))::timestamp with time zone, '1 day'::interval)) = ra_username.date)
         ->  Merge Left Join  (cost=1548.19..1646.49 rows=1626 width=68) (actual time=7.364..11.500 rows=4767 loops=1)
               Merge Cond: ((generate_series((GREATEST('2000-01-01'::date, $1))::timestamp with time zone, (LEAST('3000-01-01'::date, $3))::timestamp with time zone, '1 day'::interval)) = ra_resource_pool.date)
               ->  Merge Left Join  (cost=1161.52..1229.34 rows=1180 width=48) (actual time=5.943..8.451 rows=2993 loops=1)
                     Merge Cond: ((generate_series((GREATEST('2000-01-01'::date, $1))::timestamp with time zone, (LEAST('3000-01-01'::date, $3))::timestamp with time zone, '1 day'::interval)) = ra_experiment_label.date)
                     ->  Merge Join  (cost=795.06..838.92 rows=1000 width=28) (actual time=4.682..6.381 rows=1652 loops=1)
                           Merge Cond: ((generate_series((GREATEST('2000-01-01'::date, $1))::timestamp with time zone, (LEAST('3000-01-01'::date, $3))::timestamp with time zone, '1 day'::interval)) = resource_aggregates.date)
                           ->  Merge Left Join  (cost=413.94..437.12 rows=1000 width=28) (actual time=1.765..2.696 rows=1652 loops=1)
                                 Merge Cond: ((generate_series((GREATEST('2000-01-01'::date, $1))::timestamp with time zone, (LEAST('3000-01-01'::date, $3))::timestamp with time zone, '1 day'::interval)) = ra_total.date)
                                 ->  Sort  (cost=65.53..68.03 rows=1000 width=8) (actual time=0.436..0.555 rows=1652 loops=1)
                                       Sort Key: (generate_series((GREATEST('2000-01-01'::date, $1))::timestamp with time zone, (LEAST('3000-01-01'::date, $3))::timestamp with time zone, '1 day'::interval))
                                       Sort Method: quicksort  Memory: 49kB
                                       ->  ProjectSet  (cost=0.68..5.70 rows=1000 width=8) (actual time=0.034..0.331 rows=1652 loops=1)
                                             InitPlan 2 (returns $1)
                                               ->  Result  (cost=0.33..0.34 rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=1)
                                                     InitPlan 1 (returns $0)
                                                       ->  Limit  (cost=0.29..0.33 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=1)
                                                             ->  Index Only Scan using resource_aggregates_keys_unique on resource_aggregates resource_aggregates_1  (cost=0.29..545.50 rows=12646 width=4) (actual time=0.019..0.019 rows=1 loops=1)
                                                                   Index Cond: (date IS NOT NULL)
                                                                   Heap Fetches: 0
                                             InitPlan 4 (returns $3)
                                               ->  Result  (cost=0.33..0.34 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
                                                     InitPlan 3 (returns $2)
                                                       ->  Limit  (cost=0.29..0.33 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
                                                             ->  Index Only Scan Backward using resource_aggregates_keys_unique on resource_aggregates resource_aggregates_2  (cost=0.29..545.50 rows=12646 width=4) (actual time=0.007..0.007 rows=1 loops=1)
                                                                   Index Cond: (date IS NOT NULL)
                                                                   Heap Fetches: 1
                                             ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
                                 ->  Sort  (cost=348.40..352.49 rows=1636 width=24) (actual time=1.326..1.443 rows=1652 loops=1)
                                       Sort Key: ra_total.date
                                       Sort Method: quicksort  Memory: 152kB
                                       ->  Seq Scan on resource_aggregates ra_total  (cost=0.00..261.08 rows=1636 width=24) (actual time=0.008..1.050 rows=1652 loops=1)
                                             Filter: (aggregation_type = 'total'::text)
                                             Rows Removed by Filter: 10994
                           ->  Sort  (cost=381.12..385.21 rows=1636 width=4) (actual time=2.915..3.013 rows=1652 loops=1)
                                 Sort Key: resource_aggregates.date
                                 Sort Method: quicksort  Memory: 49kB
                                 ->  HashAggregate  (cost=261.07..277.44 rows=1636 width=4) (actual time=2.534..2.699 rows=1652 loops=1)
                                       Group Key: resource_aggregates.date
                                       Batches: 1  Memory Usage: 193kB
                                       ->  Seq Scan on resource_aggregates  (cost=0.00..229.46 rows=12646 width=4) (actual time=0.004..0.794 rows=12646 loops=1)
                     ->  Sort  (cost=366.46..371.29 rows=1931 width=24) (actual time=1.258..1.380 rows=1924 loops=1)
                           Sort Key: ra_experiment_label.date
                           Sort Method: quicksort  Memory: 204kB
                           ->  Seq Scan on resource_aggregates ra_experiment_label  (cost=0.00..261.08 rows=1931 width=24) (actual time=0.034..0.929 rows=1924 loops=1)
                                 Filter: (aggregation_type = 'experiment_label'::text)
                                 Rows Removed by Filter: 10722
               ->  Sort  (cost=386.67..392.30 rows=2255 width=24) (actual time=1.419..1.688 rows=4469 loops=1)
                     Sort Key: ra_resource_pool.date
                     Sort Method: quicksort  Memory: 245kB
                     ->  Seq Scan on resource_aggregates ra_resource_pool  (cost=0.00..261.08 rows=2255 width=24) (actual time=0.005..0.979 rows=2247 loops=1)
                           Filter: (aggregation_type = 'resource_pool'::text)
                           Rows Removed by Filter: 10399
         ->  Sort  (cost=577.82..590.67 rows=5139 width=24) (actual time=2.173..3.421 rows=24530 loops=1)
               Sort Key: ra_username.date
               Sort Method: quicksort  Memory: 559kB
               ->  Seq Scan on resource_aggregates ra_username  (cost=0.00..261.08 rows=5139 width=24) (actual time=0.004..1.241 rows=5121 loops=1)
                     Filter: (aggregation_type = 'username'::text)
                     Rows Removed by Filter: 7525
 Planning Time: 0.584 ms
 Execution Time: 85.455 ms

@NicholasBlaskey
Contributor Author

Old query plan:

                                                   QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=665.69..665.84 rows=63 width=168) (actual time=5611.072..5611.161 rows=1652 loops=1)
   Sort Key: (to_char((starts.period_start)::timestamp with time zone, 'YYYY-MM-DD'::text))
   Sort Method: quicksort  Memory: 569kB
   CTE days
     ->  Seq Scan on resource_aggregates  (cost=0.00..261.08 rows=63 width=34) (actual time=0.010..1.615 rows=12646 loops=1)
           Filter: ('[2000-01-01,3000-01-02)'::daterange @> date)
   ->  Subquery Scan on starts  (cost=1.42..402.73 rows=63 width=168) (actual time=7.981..5605.735 rows=1652 loops=1)
         ->  HashAggregate  (cost=1.42..2.05 rows=63 width=4) (actual time=5.021..5.880 rows=1652 loops=1)
               Group Key: days.period_start
               Batches: 1  Memory Usage: 209kB
               ->  CTE Scan on days  (cost=0.00..1.26 rows=63 width=4) (actual time=0.011..3.234 rows=12646 loops=1)
         SubPlan 2
           ->  Limit  (cost=0.00..1.58 rows=1 width=8) (actual time=0.371..0.371 rows=1 loops=1652)
                 ->  CTE Scan on days days_1  (cost=0.00..1.58 rows=1 width=8) (actual time=0.371..0.371 rows=1 loops=1652)
                       Filter: ((aggregation_type = 'total'::text) AND (period_start = starts.period_start))
                       Rows Removed by Filter: 4814
         SubPlan 3
           ->  Aggregate  (cost=1.58..1.59 rows=1 width=32) (actual time=1.055..1.055 rows=1 loops=1652)
                 ->  CTE Scan on days days_2  (cost=0.00..1.58 rows=1 width=40) (actual time=0.536..1.048 rows=3 loops=1652)
                       Filter: ((aggregation_type = 'username'::text) AND (period_start = starts.period_start))
                       Rows Removed by Filter: 12643
         SubPlan 4
           ->  Aggregate  (cost=1.58..1.59 rows=1 width=32) (actual time=0.968..0.968 rows=1 loops=1652)
                 ->  CTE Scan on days days_3  (cost=0.00..1.58 rows=1 width=40) (actual time=0.782..0.965 rows=1 loops=1652)
                       Filter: ((aggregation_type = 'experiment_label'::text) AND (period_start = starts.period_start))
                       Rows Removed by Filter: 12645
         SubPlan 5
           ->  Aggregate  (cost=1.58..1.59 rows=1 width=32) (actual time=0.991..0.991 rows=1 loops=1652)
                 ->  CTE Scan on days days_4  (cost=0.00..1.58 rows=1 width=40) (actual time=0.506..0.986 rows=1 loops=1652)
                       Filter: ((aggregation_type = 'resource_pool'::text) AND (period_start = starts.period_start))
                       Rows Removed by Filter: 12645
 Planning Time: 0.171 ms
 Execution Time: 5611.343 ms
(33 rows)
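
A rough reading of the old plan above: each SubPlan rescans the `days` CTE once per output day (`loops=1652`), and EXPLAIN ANALYZE reports `actual time` as a per-loop average. Multiplying the two gives an estimate of where the time goes. The sketch below only redoes that arithmetic with numbers copied from the plan; the variable names are illustrative and not part of the PR.

```python
# Numbers copied from the old plan: per-loop end time (ms) of each SubPlan
# and the number of times each SubPlan ran.
loops = 1652
subplan_ms_per_loop = {
    "total": 0.371,             # SubPlan 2
    "username": 1.055,          # SubPlan 3
    "experiment_label": 0.968,  # SubPlan 4
    "resource_pool": 0.991,     # SubPlan 5
}
execution_ms = 5611.343  # "Execution Time" reported by the plan

# Estimated time spent inside the correlated subplans overall.
subplan_total_ms = sum(ms * loops for ms in subplan_ms_per_loop.values())
share = subplan_total_ms / execution_ms

print(f"subplans: {subplan_total_ms:.0f} ms of {execution_ms:.0f} ms ({share:.1%})")
```

By this estimate roughly 5592 ms of the 5611 ms is spent in the four per-day CTE scans, which is the cost the rewrite targets.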

@NicholasBlaskey NicholasBlaskey marked this pull request as ready for review January 23, 2024 22:00
@NicholasBlaskey NicholasBlaskey requested a review from a team as a code owner January 23, 2024 22:00
Contributor

@eecsliu eecsliu left a comment

Looks good, just one question about the sum of seconds.

),
-- We divide by count since we have multiple rows with the same total.
(SUM(ra_total.seconds) FILTER (WHERE ra_total.aggregation_key IS NOT NULL) /
COUNT(*) FILTER (WHERE ra_total.aggregation_key IS NOT NULL)) AS seconds,
Contributor

Question: are the rows always guaranteed to have the same seconds value?

Contributor Author

Yes, they always will. That said, the LEFT JOIN method was somewhat awkward, so I changed my mind and rewrote the query to use the same method as the original while making sure index scans happen.
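
For readers following this thread, here is a minimal sketch of why the LEFT JOIN version divided by the count. It uses SQLite through Python's `sqlite3` as a stand-in for PostgreSQL, the sample rows are hypothetical, and the `FILTER` clauses from the reviewed snippet are left out to keep it small. Joining the single `total` row to several `username` rows repeats the total once per match, so `SUM(seconds) / COUNT(*)` recovers the original value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_aggregates (
    date TEXT, aggregation_type TEXT, aggregation_key TEXT, seconds REAL
);
-- Hypothetical sample data: one 'total' row and three 'username' rows.
INSERT INTO resource_aggregates VALUES
    ('2024-01-01', 'total',    'total', 100.0),
    ('2024-01-01', 'username', 'alice',  60.0),
    ('2024-01-01', 'username', 'bob',    30.0),
    ('2024-01-01', 'username', 'carol',  10.0);
""")

# The join fans the single total row out to one row per matching username.
joined_rows, summed, recovered = conn.execute("""
SELECT COUNT(*), SUM(ra_total.seconds), SUM(ra_total.seconds) / COUNT(*)
FROM resource_aggregates AS ra_total
LEFT JOIN resource_aggregates AS ra_username
       ON ra_username.date = ra_total.date
      AND ra_username.aggregation_type = 'username'
WHERE ra_total.aggregation_type = 'total'
""").fetchone()

print(joined_rows, summed, recovered)
```

Here the total of 100 seconds appears three times, sums to 300, and divides back to 100. If the duplicated rows could carry different values, the same expression would return their mean instead, which is the case the question above is probing.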

@NicholasBlaskey
Contributor Author

NicholasBlaskey commented Jan 24, 2024

Query plan for the new strategy:

                                                                                                                       QUERY PLAN                                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=446.66..39965.36 rows=1000 width=176) (actual time=3.397..34.403 rows=1653 loops=1)
   Merge Cond: ((generate_series((GREATEST('2002-01-01'::date, $5))::timestamp with time zone, (LEAST('2040-02-04'::date, $7))::timestamp with time zone, '1 day'::interval)) = resource_aggregates.date)
   ->  Sort  (cost=65.53..68.03 rows=1000 width=8) (actual time=0.383..0.487 rows=1653 loops=1)
         Sort Key: (generate_series((GREATEST('2002-01-01'::date, $5))::timestamp with time zone, (LEAST('2040-02-04'::date, $7))::timestamp with time zone, '1 day'::interval))
         Sort Method: quicksort  Memory: 49kB
         ->  ProjectSet  (cost=0.68..5.70 rows=1000 width=8) (actual time=0.032..0.305 rows=1653 loops=1)
               InitPlan 6 (returns $5)
                 ->  Result  (cost=0.33..0.34 rows=1 width=4) (actual time=0.020..0.022 rows=1 loops=1)
                       InitPlan 5 (returns $4)
                         ->  Limit  (cost=0.29..0.33 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1)
                               ->  Index Only Scan using resource_aggregates_keys_unique on resource_aggregates resource_aggregates_5  (cost=0.29..545.50 rows=12646 width=4) (actual time=0.018..0.019 rows=1 loops=1)
                                     Index Cond: (date IS NOT NULL)
                                     Heap Fetches: 0
               InitPlan 8 (returns $7)
                 ->  Result  (cost=0.33..0.34 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
                       InitPlan 7 (returns $6)
                         ->  Limit  (cost=0.29..0.33 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=1)
                               ->  Index Only Scan Backward using resource_aggregates_keys_unique on resource_aggregates resource_aggregates_6  (cost=0.29..545.50 rows=12646 width=4) (actual time=0.006..0.006 rows=1 loops=1)
                                     Index Cond: (date IS NOT NULL)
                                     Heap Fetches: 1
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
   ->  Sort  (cost=381.12..385.21 rows=1636 width=4) (actual time=2.955..3.052 rows=1653 loops=1)
         Sort Key: resource_aggregates.date
         Sort Method: quicksort  Memory: 49kB
         ->  HashAggregate  (cost=261.07..277.44 rows=1636 width=4) (actual time=2.586..2.733 rows=1653 loops=1)
               Group Key: resource_aggregates.date
               Batches: 1  Memory Usage: 193kB
               ->  Seq Scan on resource_aggregates  (cost=0.00..229.46 rows=12646 width=4) (actual time=0.005..0.790 rows=12648 loops=1)
   SubPlan 1
     ->  Limit  (cost=0.29..8.30 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1653)
           ->  Index Scan using resource_aggregates_keys_unique on resource_aggregates resource_aggregates_1  (cost=0.29..8.30 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1653)
                 Index Cond: ((date = (generate_series((GREATEST('2002-01-01'::date, $5))::timestamp with time zone, (LEAST('2040-02-04'::date, $7))::timestamp with time zone, '1 day'::interval))) AND (aggregation_type = 'total'::text))
   SubPlan 2
     ->  Aggregate  (cost=14.54..14.55 rows=1 width=32) (actual time=0.006..0.006 rows=1 loops=1653)
           ->  Index Scan using resource_aggregates_keys_unique on resource_aggregates resource_aggregates_2  (cost=0.29..14.53 rows=3 width=20) (actual time=0.002..0.002 rows=3 loops=1653)
                 Index Cond: ((date = (generate_series((GREATEST('2002-01-01'::date, $5))::timestamp with time zone, (LEAST('2040-02-04'::date, $7))::timestamp with time zone, '1 day'::interval))) AND (aggregation_type = 'username'::text))
   SubPlan 3
     ->  Aggregate  (cost=8.31..8.32 rows=1 width=32) (actual time=0.004..0.004 rows=1 loops=1653)
           ->  Index Scan using resource_aggregates_keys_unique on resource_aggregates resource_aggregates_3  (cost=0.29..8.30 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=1653)
                 Index Cond: ((date = (generate_series((GREATEST('2002-01-01'::date, $5))::timestamp with time zone, (LEAST('2040-02-04'::date, $7))::timestamp with time zone, '1 day'::interval))) AND (aggregation_type = 'experiment_label'::text))
   SubPlan 4
     ->  Aggregate  (cost=8.31..8.32 rows=1 width=32) (actual time=0.004..0.004 rows=1 loops=1653)
           ->  Index Scan using resource_aggregates_keys_unique on resource_aggregates resource_aggregates_4  (cost=0.29..8.30 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=1653)
                 Index Cond: ((date = (generate_series((GREATEST('2002-01-01'::date, $5))::timestamp with time zone, (LEAST('2040-02-04'::date, $7))::timestamp with time zone, '1 day'::interval))) AND (aggregation_type = 'resource_pool'::text))
 Planning Time: 0.294 ms
 Execution Time: 34.634 ms
(46 rows)
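
The plan above keeps the per-day subqueries of the original but resolves each one with an index lookup on `date` and `aggregation_type` instead of a full CTE scan. Below is a small sketch of that shape, again using SQLite via Python's `sqlite3` as a stand-in for PostgreSQL. The index name `ra_demo_idx`, its column list, and the sample data are assumptions for illustration; the actual definition of `resource_aggregates_keys_unique` is not shown in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_aggregates (
    date TEXT NOT NULL,
    aggregation_type TEXT NOT NULL,
    aggregation_key TEXT NOT NULL,
    seconds REAL NOT NULL
);
-- Hypothetical index; the real resource_aggregates_keys_unique may differ.
CREATE UNIQUE INDEX ra_demo_idx
    ON resource_aggregates (date, aggregation_type, aggregation_key);
""")

# Hypothetical sample data: 28 days, one total row and two username rows each.
rows = []
for day in range(1, 29):
    date = f"2024-01-{day:02d}"
    rows += [
        (date, "total", "total", 100.0),
        (date, "username", "alice", 60.0),
        (date, "username", "bob", 40.0),
    ]
conn.executemany("INSERT INTO resource_aggregates VALUES (?, ?, ?, ?)", rows)

# One correlated subquery per aggregation type, each filtering on the
# leading index columns so it can be answered by an index search.
query = """
SELECT d.date,
       (SELECT t.seconds FROM resource_aggregates AS t
         WHERE t.date = d.date AND t.aggregation_type = 'total'
         LIMIT 1) AS seconds,
       (SELECT COUNT(*) FROM resource_aggregates AS u
         WHERE u.date = d.date AND u.aggregation_type = 'username') AS n_users
FROM (SELECT DISTINCT date FROM resource_aggregates) AS d
ORDER BY d.date
"""
result = conn.execute(query).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
index_steps = [step for step in plan if "SEARCH" in step and "INDEX" in step]
for step in plan:
    print(step)
```

The exact wording of SQLite's plan output varies by version, but the subqueries should show up as index searches rather than table scans, which mirrors the `Index Scan using resource_aggregates_keys_unique` nodes in the PostgreSQL plan.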

Contributor

@eecsliu eecsliu left a comment

Not super familiar with query planning, but from what I can tell it looks good!

@NicholasBlaskey NicholasBlaskey enabled auto-merge (squash) February 2, 2024 13:24
@NicholasBlaskey NicholasBlaskey merged commit ef656bc into main Feb 2, 2024
69 of 84 checks passed
@NicholasBlaskey NicholasBlaskey deleted the perf_resource_aggs branch February 2, 2024 13:36