Add aggregate key dependency optimizer #1517

Merged: andyfengHKU merged 1 commit into master from agg-dependency on May 7, 2023

andyfengHKU commented on May 6, 2023

This PR adds AggregateKeyDependencyOptimizer, which analyzes the dependencies among the user-specified group-by keys and splits them into keys and dependentKeys. A dependentKey is functionally dependent on a key. Only keys are hashed during hash aggregation; dependentKeys are materialized directly into the f-table.

Consider the following example:

RETURN a.ID, a.age, COUNT(*)

As written, the query declares both a.ID and a.age as hash keys. However, because a.ID is the primary key, a.age is functionally dependent on a.ID, so the hash aggregate only needs to hash a.ID.
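
The split described above can be sketched as follows. This is a minimal Python illustration, not the PR's C++ implementation: the function name split_group_by_keys, the "variable.property" string encoding, and the primary_keys mapping are all assumptions made for the example.

```python
def split_group_by_keys(group_by, primary_keys):
    """Split group-by expressions into (keys, dependent_keys).

    group_by:     list of "variable.property" strings, e.g. ["a.ID", "a.age"]
    primary_keys: hypothetical catalog lookup, variable -> primary-key property
    """
    # Variables whose primary key appears among the group-by expressions.
    covered = set()
    for expr in group_by:
        var, prop = expr.split(".", 1)
        if primary_keys.get(var) == prop:
            covered.add(var)

    keys, dependent_keys = [], []
    for expr in group_by:
        var, prop = expr.split(".", 1)
        if var in covered and primary_keys[var] != prop:
            # Determined by the primary key, so it need not be hashed.
            dependent_keys.append(expr)
        else:
            keys.append(expr)
    return keys, dependent_keys


keys, dependent_keys = split_group_by_keys(["a.ID", "a.age"], {"a": "ID"})
print(keys, dependent_keys)
```

If the primary key of a variable is absent from the group-by list, nothing can be inferred in this sketch and every expression of that variable stays a hash key.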

This optimizer is especially important for Cypher because its grammar does not separate SELECT from GROUP BY, so users cannot group by a subset of the returned expressions. Cypher also allows grouping by a Node or Rel, which by definition means grouping by all of its properties. A more reasonable approach is to group only by the internal ID and treat the properties as dependentKeys.
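
To illustrate how the two kinds of keys are treated during aggregation, the sketch below counts rows per group while hashing only keys and storing dependentKeys once per group. It is a simplified Python model under stated assumptions: rows are dicts, a plain dict stands in for the aggregate hash table and f-table, a._id is a hypothetical name for a node's internal ID, and the rows are made up for the example.

```python
def hash_aggregate_count(rows, keys, dependent_keys):
    """COUNT(*) per group, hashing only `keys`."""
    table = {}  # hashed key values -> [dependent key values, count]
    for row in rows:
        hashed = tuple(row[k] for k in keys)
        entry = table.get(hashed)
        if entry is None:
            # First row of the group: copy the dependent values once.
            # Later rows with the same keys carry the same values by definition.
            table[hashed] = [tuple(row[d] for d in dependent_keys), 1]
        else:
            entry[1] += 1
    return [hashed + dep + (count,) for hashed, (dep, count) in table.items()]


# Made-up rows: one entry per matched pattern, grouped by node a.
rows = [
    {"a._id": 0, "a.firstName": "Ann", "a.lastName": "Lee"},
    {"a._id": 1, "a.firstName": "Bob", "a.lastName": "Ray"},
    {"a._id": 0, "a.firstName": "Ann", "a.lastName": "Lee"},
]
optimized = hash_aggregate_count(rows, ["a._id"], ["a.firstName", "a.lastName"])
baseline = hash_aggregate_count(rows, ["a._id", "a.firstName", "a.lastName"], [])
print(sorted(optimized) == sorted(baseline))
```

Both calls produce the same groups and counts; the optimized call simply hashes and compares one column instead of three.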

Performance benchmark

Dataset: LDBC100
Machine: M1 Pro, 16 GB memory, at most 4 threads
Query: MATCH (a:Person)-[:knows]->(b:Person) RETURN keys, COUNT(*) LIMIT 1; where keys is replaced by each entry in the Keys column below. (Aggregate is a sink operator, so LIMIT does not affect it; LIMIT 1 only reduces the printed output.)

| Keys | Optimizer ON | Optimizer OFF |
|---|---|---|
| ID | 207 ms | 198 ms |
| ID, firstName | 204 ms | 292 ms |
| ID, firstName, lastName | 214 ms | 356 ms |
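
The relative effect can be read off the table with a little arithmetic. The Python snippet below only recomputes ratios from the numbers reported above; the speedup helper and the dictionary layout are illustrative and not part of the PR.

```python
# (optimizer ON, optimizer OFF) runtimes in ms, copied from the table above.
timings_ms = {
    "ID": (207, 198),
    "ID, firstName": (204, 292),
    "ID, firstName, lastName": (214, 356),
}


def speedup(on_ms, off_ms):
    """Ratio OFF / ON; values above 1 mean the optimizer made the query faster."""
    return off_ms / on_ms


for key_list, (on_ms, off_ms) in timings_ms.items():
    print(f"{key_list}: {speedup(on_ms, off_ms):.2f}x")
```

With a single key there is nothing to remove, so the two runs differ by only a few milliseconds. With the optimizer on, the runtime stays roughly flat as dependent properties are added, while without it the runtime grows with each extra key.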

andyfengHKU requested review from acquamarin and semihsalihoglu-uw, and removed the request for acquamarin, on May 6, 2023 19:48

codecov bot commented May 6, 2023

Codecov Report

Patch coverage: 98.78% and project coverage change: +0.01 🎉

Comparison is base (bfc1b0d) 91.92% compared to head (1bd8063) 91.93%.

❗ Current head 1bd8063 differs from pull request most recent head 1aef377. Consider uploading reports for the commit 1aef377 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1517      +/-   ##
==========================================
+ Coverage   91.92%   91.93%   +0.01%     
==========================================
  Files         678      680       +2     
  Lines       24458    24528      +70     
==========================================
+ Hits        22482    22550      +68     
- Misses       1976     1978       +2     
| Impacted Files | Coverage Δ |
|---|---|
| src/include/processor/mapper/plan_mapper.h | 100.00% <ø> (ø) |
| src/planner/operator/logical_aggregate.cpp | 97.67% <94.11%> (-2.33%) ⬇️ |
| src/processor/mapper/map_aggregate.cpp | 96.49% <96.42%> (-1.98%) ⬇️ |
| ...c/include/optimizer/agg_key_dependency_optimizer.h | 100.00% <100.00%> (ø) |
| .../logical_plan/logical_operator/logical_aggregate.h | 94.73% <100.00%> (-5.27%) ⬇️ |
| ...r/logical_plan/logical_operator/logical_distinct.h | 94.44% <100.00%> (-5.56%) ⬇️ |
| ...rocessor/operator/aggregate/aggregate_hash_table.h | 77.77% <100.00%> (ø) |
| ...lude/processor/operator/aggregate/hash_aggregate.h | 100.00% <100.00%> (ø) |
| src/optimizer/agg_key_dependency_optimizer.cpp | 100.00% <100.00%> (ø) |
| src/optimizer/optimizer.cpp | 100.00% <100.00%> (ø) |

... and 5 more

... and 3 files with indirect coverage changes


andyfengHKU merged commit fc358a3 into master on May 7, 2023
6 of 7 checks passed
andyfengHKU deleted the agg-dependency branch on May 7, 2023 05:07