Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of the latest_of_each method of the manager #1360

Merged
merged 5 commits into from
Jul 22, 2024

Conversation

bagababaoglu
Copy link
Contributor

@bagababaoglu bagababaoglu commented Jun 26, 2024

Description

latest_of_each method of the manager is currently getting all the latest pks and using a pk__in filter to determine final results. But in filter is performing bad, especially when there are a lot of rows in the db. Every item in the database needs to be compared by this long list. This can be avoided by using another subquery which would annotate the existence of later items and then it can be used to filter the query by this field to get latest of each item.

This also removes the need to have database specific implementation as the resulting query should work on each.

Related Issue

Currently there are no related issues.

Motivation and Context

Improves performance for latest_of_each method which is necessary for history tables which are having a lot of data.

How Has This Been Tested?

Existing tests for the method has been used.

Results from query analysis which ran on postgresql. The table 1.8 million entries.

This is the cost analysis for the improved query

Limit  (cost=130063.36..131511.02 rows=50 width=5438) (actual time=327.427..343.020 rows=50 loops=1)

This is the cost analysis for the normal query

Limit  (cost=483323.35..483323.47 rows=50 width=5438) (actual time=2142.676..2148.425 rows=50 loops=1)

According to these results, there is %84,7338936 improvement.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Improvement

Checklist:

  • I have run the pre-commit run command to format and lint.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have added my name and/or github handle to AUTHORS.rst
  • I have added my change to CHANGES.rst
  • All new and existing tests passed.

@bagababaoglu bagababaoglu marked this pull request as draft June 26, 2024 15:37
@bagababaoglu bagababaoglu force-pushed the improve-latest-of-each branch 2 times, most recently from 433da4a to 804dee3 Compare June 26, 2024 15:57
@bagababaoglu bagababaoglu marked this pull request as ready for review June 26, 2024 16:12
@tim-schilling
Copy link
Contributor

@bagababaoglu did you happen to also test this on a mysql database?

@tim-schilling
Copy link
Contributor

@bagababaoglu did you compare the Subquery approach that's not vendor specific to this approach?

@bagababaoglu
Copy link
Contributor Author

@bagababaoglu did you happen to also test this on a mysql database?

@tim-schilling yes, I tested it for all databases using make test command. All tests were successful.

simple_history/__init__.py                                                                                                                           13      0      2      0   100%
simple_history/admin.py                                                                                                                             172      2     40      0    99%
simple_history/exceptions.py                                                                                                                          8      0      0      0   100%
simple_history/management/__init__.py                                                                                                                 0      0      0      0   100%
simple_history/management/commands/__init__.py                                                                                                        0      0      0      0   100%
simple_history/management/commands/clean_duplicate_history.py                                                                                        76      1     36      2    97%
simple_history/management/commands/clean_old_history.py                                                                                              40      0     14      0   100%
simple_history/management/commands/populate_history.py                                                                                              100      0     40      4    97%
simple_history/manager.py                                                                                                                           108      1     36      2    98%
simple_history/middleware.py                                                                                                                         23      5      2      1    76%
simple_history/models.py                                                                                                                            545     12    204     10    97%
simple_history/registry_tests/__init__.py                                                                                                             0      0      0      0   100%
simple_history/registry_tests/migration_test_app/__init__.py                                                                                          0      0      0      0   100%
simple_history/registry_tests/migration_test_app/migrations/0001_initial.py                                                                           7      0      0      0   100%
simple_history/registry_tests/migration_test_app/migrations/0002_historicalmodelwithcustomattrforeignkey_modelwithcustomattrforeignkey.py             8      0      0      0   100%
simple_history/registry_tests/migration_test_app/migrations/0003_alter_historicalmodelwithcustomattrforeignkey_options_and_more.py                    4      0      0      0   100%
simple_history/registry_tests/migration_test_app/migrations/0004_history_date_indexing.py                                                             4      0      0      0   100%
simple_history/registry_tests/migration_test_app/migrations/0005_historicalmodelwithcustomattronetoonefield_modelwithcustomattronetoonefield.py       8      0      0      0   100%
simple_history/registry_tests/migration_test_app/migrations/0006_alter_historicalmodelwithcustomattronetoonefield_options_and_more.py                 4      0      0      0   100%
simple_history/registry_tests/migration_test_app/migrations/0007_alter_historicalmodelwithcustomattrforeignkey_options_and_more.py                    4      0      0      0   100%
simple_history/registry_tests/migration_test_app/migrations/__init__.py                                                                               0      0      0      0   100%
simple_history/registry_tests/migration_test_app/models.py                                                                                           37      0      4      2    95%
simple_history/registry_tests/tests.py                                                                                                              101      4     12      0    96%
simple_history/signals.py                                                                                                                             5      0      0      0   100%
simple_history/template_utils.py                                                                                                                     85      0     22      0   100%
simple_history/templatetags/__init__.py                                                                                                               0      0      0      0   100%
simple_history/templatetags/getattributes.py                                                                                                          5      0      0      0   100%
simple_history/templatetags/simple_history_admin_list.py                                                                                              7      0      0      0   100%
simple_history/templatetags/simple_history_compat.py                                                                                                  4      0      0      0   100%
simple_history/utils.py                                                                                                                              78      0     28      0   100%
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                                              1446     25    440     21    98%
py38-dj42-mysql: OK ✔ in 3 minutes 50.47 seconds

Or do you mean if I have tested the performance on mysql database? In that case, no I haven't as I had the data for Postgresql only. Do you think the performance improvement would be different for mysql?

@bagababaoglu
Copy link
Contributor Author

@bagababaoglu did you compare the Subquery approach that's not vendor specific to this approach?

@tim-schilling I did now. Here are the results:

This is the current version with not exists subquery

Gather Merge  (cost=1022.88..940164.10 rows=1030544 width=1229) (actual time=114.145..4341.837 rows=982706 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Incremental Sort  (cost=22.85..820213.72 rows=515272 width=1229) (actual time=84.218..1966.828 rows=327569 loops=3)
"        Sort Key: w3syse_historicalprojectrequirement.history_date DESC, w3syse_historicalprojectrequirement.history_id DESC"
        Presorted Key: w3syse_historicalprojectrequirement.history_date
        Full-sort Groups: 9922  Sort Method: quicksort  Average Memory: 52kB  Peak Memory: 67kB
        Pre-sorted Groups: 21  Sort Method: quicksort  Average Memory: 1272kB  Peak Memory: 3549kB
        Worker 0:  Full-sort Groups: 9950  Sort Method: quicksort  Average Memory: 49kB  Peak Memory: 67kB
          Pre-sorted Groups: 34  Sort Method: quicksort  Average Memory: 891kB  Peak Memory: 1067kB
        Worker 1:  Full-sort Groups: 9941  Sort Method: quicksort  Average Memory: 54kB  Peak Memory: 67kB
"          Pre-sorted Groups: 24  Sort Methods: quicksort, external merge  Average Memory: 204kB  Peak Memory: 546kB  Average Disk: 1735kB  Peak Disk: 2776kB"
        ->  Nested Loop Anti Join  (cost=0.85..795435.12 rows=515272 width=1229) (actual time=83.921..1846.508 rows=327569 loops=3)
              ->  Parallel Index Scan Backward using w3syse_historicalprojectrequirement2_history_date_d89ef834 on w3syse_historicalprojectrequirement  (cost=0.43..252182.47 rows=772908 width=1229) (actual time=83.850..331.192 rows=618388 loops=3)
              ->  Index Scan using w3syse_historicalprojectrequirement2_id_35c6c746 on w3syse_historicalprojectrequirement u0  (cost=0.43..0.73 rows=1 width=24) (actual time=0.002..0.002 rows=0 loops=1855164)
                    Index Cond: (id = w3syse_historicalprojectrequirement.id)
                    Filter: (history_date > w3syse_historicalprojectrequirement.history_date)
                    Rows Removed by Filter: 1
Planning Time: 0.238 ms
JIT:
  Functions: 30
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 1.620 ms, Inlining 94.133 ms, Optimization 92.618 ms, Emission 64.764 ms, Total 253.134 ms"
Execution Time: 4362.702 ms

This is for the id__in filter and id subquery

Incremental Sort  (cost=635.22..23458369.87 rows=927490 width=1229) (actual time=63.291..6967.493 rows=982706 loops=1)
"  Sort Key: w3syse_historicalprojectrequirement.history_date DESC, w3syse_historicalprojectrequirement.history_id DESC"
  Presorted Key: w3syse_historicalprojectrequirement.history_date
  Full-sort Groups: 29803  Sort Method: quicksort  Average Memory: 51kB  Peak Memory: 67kB
"  Pre-sorted Groups: 77  Sort Methods: quicksort, external merge  Average Memory: 514kB  Peak Memory: 1404kB  Average Disk: 1803kB  Peak Disk: 3472kB"
  ->  Index Scan Backward using w3syse_historicalprojectrequirement2_history_date_d89ef834 on w3syse_historicalprojectrequirement  (cost=0.43..23408460.48 rows=927490 width=1229) (actual time=63.080..6638.325 rows=982706 loops=1)
        Filter: (SubPlan 1)
        Rows Removed by Filter: 872458
        SubPlan 1
          ->  Limit  (cost=12.47..12.47 rows=1 width=12) (actual time=0.003..0.003 rows=1 loops=1855164)
                ->  Sort  (cost=12.47..12.48 rows=2 width=12) (actual time=0.003..0.003 rows=1 loops=1855164)
"                      Sort Key: u0.history_date DESC, u0.history_id DESC"
                      Sort Method: quicksort  Memory: 25kB
                      ->  Index Scan using w3syse_historicalprojectrequirement2_id_35c6c746 on w3syse_historicalprojectrequirement u0  (cost=0.43..12.46 rows=2 width=12) (actual time=0.002..0.002 rows=2 loops=1855164)
                            Index Cond: (id = w3syse_historicalprojectrequirement.id)
Planning Time: 0.167 ms
JIT:
  Functions: 14
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 0.973 ms, Inlining 5.540 ms, Optimization 33.635 ms, Emission 23.815 ms, Total 63.963 ms"
Execution Time: 6986.603 ms

Although the difference is not that big, not exists subquery is still performing better. IMO, it is better to avoid in filter if it is possible.

@tim-schilling
Copy link
Contributor

Thank you! I'm in favor of this. @ddabble thoughts?

tim-schilling
tim-schilling previously approved these changes Jul 12, 2024
@ddabble
Copy link
Member

ddabble commented Jul 16, 2024

Ooh, that's a satisfying simplification! I'll check it out more closely sometime this week :)

Copy link
Member

@ddabble ddabble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice, great job! 😄

@bagababaoglu Do you have any feedback on the changes I made? 🙂 Did I misrepresent anything in the comments/descriptions?

@bagababaoglu
Copy link
Contributor Author

Looks nice, great job! 😄

@bagababaoglu Do you have any feedback on the changes I made? 🙂 Did I misrepresent anything in the comments/descriptions?

Thank you for your additions 🙏 They look fine to me.

@ddabble ddabble merged commit d8cf0b8 into jazzband:master Jul 22, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

as_of with Postgres generates a Sequential Scan with composite index enabled
3 participants