
abstract common base of SQL micro-benchmarks to reduce boilerplate and standardize parameters #17383

Merged: 3 commits, Oct 23, 2024


@clintropolis (Member) commented Oct 19, 2024

Changes:

  • adds SqlBenchmarkDatasets, which contains commonly used benchmark data generator schemas
  • adds SqlBaseBenchmark, which contains common segment generation methods for any benchmark using SqlBenchmarkDatasets
  • adds SqlBaseQueryBenchmark and SqlBasePlanBenchmark for benchmarks measuring query execution and query planning, respectively
  • migrates all existing SQL JMH benchmarks to extend SqlBaseQueryBenchmark, dramatically reducing the boilerplate needed to create benchmarks and allowing multiple datasources to be used within a single benchmark file
  • adjusts the data generator code to accept an ObjectMapper, so that the same mapper can be used for both benchmark queries and segment generation, avoiding the need to register the same modules with both mappers
  • adds SqlProjectionsBenchmark and SqlComplexMetricsColumnsBenchmark for measuring projections and complex metric compression, respectively
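
The structure described above (a shared base class that owns segment generation, with subclasses supplying only their dataset and queries) can be illustrated with a minimal, hypothetical Java sketch. None of the names below (AbstractSqlBenchmark, FakeSegment, CountBenchmark, GroupByBenchmark, the "example" dataset) are Druid's actual classes, and the in-memory cache is an assumption standing in for real segment generation. It only shows how moving setup into a base class lets two benchmarks share one generated dataset without repeating boilerplate.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BaseBenchmarkSketch {
  /** Hypothetical placeholder for a generated segment; real code would build actual segment data. */
  record FakeSegment(String dataset, String storageType) {}

  /** Hypothetical shared base class: owns segment generation and caching for all subclasses. */
  abstract static class AbstractSqlBenchmark {
    // Assumed in-memory cache so each dataset/storage pair is generated only once per JVM.
    private static final Map<String, FakeSegment> CACHE = new ConcurrentHashMap<>();
    // Counts how many times generation actually ran (for demonstration only).
    static int generations = 0;

    protected final String storageType;

    AbstractSqlBenchmark(String storageType) {
      this.storageType = storageType;
    }

    /** Subclasses declare only the dataset they need... */
    protected abstract String dataset();

    /** ...and the SQL queries they want to measure. */
    protected abstract List<String> queries();

    /** Common setup: look up or generate the segment for this benchmark's parameters. */
    final FakeSegment segment() {
      return CACHE.computeIfAbsent(dataset() + ":" + storageType, key -> {
        generations++;
        return new FakeSegment(dataset(), storageType);
      });
    }
  }

  /** Two hypothetical benchmarks that reuse the same dataset. */
  static final class CountBenchmark extends AbstractSqlBenchmark {
    CountBenchmark(String storageType) { super(storageType); }
    @Override protected String dataset() { return "example"; }
    @Override protected List<String> queries() { return List.of("SELECT COUNT(*) FROM example"); }
  }

  static final class GroupByBenchmark extends AbstractSqlBenchmark {
    GroupByBenchmark(String storageType) { super(storageType); }
    @Override protected String dataset() { return "example"; }
    @Override protected List<String> queries() {
      return List.of("SELECT dimA, COUNT(*) FROM example GROUP BY dimA");
    }
  }

  /** Runs setup for both benchmarks and reports how many segments were generated. */
  static int demo() {
    new CountBenchmark("MMAP").segment();
    new GroupByBenchmark("MMAP").segment();
    return AbstractSqlBenchmark.generations;
  }

  public static void main(String[] args) {
    System.out.println("segments generated: " + demo());
  }
}
```

Each subclass here is only a few lines, since everything dataset-related lives in the base class; both benchmarks share a single generated segment for the same dataset and storage type.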

Common options are:

  • schemaType - "explicit" or "auto", to test differences between columns created with explicit dimension schemas and columns created with AutoTypeColumnSchema, which is used by schema discovery (and which, for example, also builds indexes for numeric columns)
  • storageType - "MMAP", "INCREMENTAL", "FRAME_COLUMNAR", "FRAME_ROW" for testing various backing "segment" types
  • stringEncoding - "UTF8", "FRONT_CODED_DEFAULT_V1", "FRONT_CODED_16_V1" for testing different string encoding strategies (only applies to the "MMAP" storageType)
  • complexCompression - "none", "lz4" for testing different complex metric compression settings in IndexSpec (only applies to the "MMAP" storageType)
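
Because stringEncoding and the complex metric compression setting only apply to the "MMAP" storageType, sweeping every parameter value produces some redundant runs. The Java sketch below counts raw versus effective combinations. It assumes the values listed above are the full set for each parameter, and that JMH sweeps all of them when a parameter is not pinned with -p (JMH's default for @Param fields). The class name BenchmarkParamMatrix is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BenchmarkParamMatrix {
  // Parameter values as listed in the PR description (assumed to be the complete sets).
  static final String[] SCHEMA_TYPES = {"explicit", "auto"};
  static final String[] STORAGE_TYPES = {"MMAP", "INCREMENTAL", "FRAME_COLUMNAR", "FRAME_ROW"};
  static final String[] STRING_ENCODINGS = {"UTF8", "FRONT_CODED_DEFAULT_V1", "FRONT_CODED_16_V1"};
  static final String[] COMPRESSIONS = {"none", "lz4"};

  /** Every combination JMH would run if all parameter values were swept. */
  static List<String> rawCombinations() {
    List<String> out = new ArrayList<>();
    for (String schema : SCHEMA_TYPES) {
      for (String storage : STORAGE_TYPES) {
        for (String enc : STRING_ENCODINGS) {
          for (String comp : COMPRESSIONS) {
            out.add(schema + "/" + storage + "/" + enc + "/" + comp);
          }
        }
      }
    }
    return out;
  }

  /** Combinations after collapsing encoding/compression, which non-MMAP storage ignores. */
  static Set<String> effectiveCombinations() {
    Set<String> out = new LinkedHashSet<>();
    for (String schema : SCHEMA_TYPES) {
      for (String storage : STORAGE_TYPES) {
        boolean mmap = storage.equals("MMAP");
        for (String enc : STRING_ENCODINGS) {
          for (String comp : COMPRESSIONS) {
            // Replace ignored settings with "-" so equivalent runs collapse in the set.
            out.add(schema + "/" + storage + "/" + (mmap ? enc : "-") + "/" + (mmap ? comp : "-"));
          }
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println("raw: " + rawCombinations().size());
    System.out.println("effective: " + effectiveCombinations().size());
  }
}
```

Running main prints "raw: 48" and "effective: 18". So when benchmarking a non-MMAP storageType, pinning stringEncoding and the compression parameter with -p avoids repeating measurements that should be equivalent.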

Most query benchmarks also have a numbered query parameter; the exception is SqlGroupByBenchmark, which instead has a groupingDimension parameter.

Example:

DRUID_BENCHMARK_CACHE_DIR=./tmp java \
  --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED \
  --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED \
  --add-opens=java.base/java.nio=ALL-UNNAMED \
  --add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
  --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED \
  --add-opens=java.base/java.io=ALL-UNNAMED \
  --add-opens=java.base/java.lang=ALL-UNNAMED \
  --add-opens=java.base/java.util=ALL-UNNAMED \
  --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED \
  -server -jar benchmarks/target/benchmarks.jar \
  org.apache.druid.benchmark.query.SqlProjectionsBenchmark \
  -p stringEncoding=UTF8 -p schemaType=explicit -p storageType=MMAP -p complexCompression=lz4 -p query=0

@clintropolis (Member Author) commented:

The failures are related to code coverage for changes that are primarily used only in tests or benchmarks. I should move this generator code into its own extension someday, but for now I think we can ignore them.

@gianm (Contributor) left a comment


LGTM, ok to ignore the coverage failures since this is benchmarking code.

@clintropolis clintropolis merged commit 1157ecd into apache:master Oct 23, 2024
85 of 90 checks passed
@clintropolis clintropolis deleted the benchmark-improvements branch October 23, 2024 02:37