abstract common base of SQL micro-benchmarks to reduce boilerplate and standardize parameters #17383

clintropolis · 2024-10-19T01:47:20Z

changes:

* adds `SqlBenchmarkDatasets`, which contains commonly used benchmark data generator schemas
* adds `SqlBaseBenchmark`, which contains common benchmark segment generation methods for any benchmark using `SqlBenchmarkDatasets`
* adds `SqlBaseQueryBenchmark` and `SqlBasePlanBenchmark` for benchmarks measuring queries and planning respectively
* migrates all existing SQL jmh benchmarks to extend `SqlBaseQueryBenchmark`, dramatically reducing the boilerplate needed to create benchmarks and allowing the use of multiple datasources within a benchmark file
* adjusts the data generators to allow passing in an `ObjectMapper`, so that the same mapper can be used for both benchmark queries and segment generation, avoiding the need to register modules with both mappers for benchmarks
* adds `SqlProjectionsBenchmark` and `SqlComplexMetricsColumnsBenchmark` for measuring projections and complex metric compression respectively

Common options are:

* `schemaType` - "explicit" or "auto", to test differences between columns created with explicit dimension schemas and columns created with the `AutoTypeColumnSchema` used by schema discovery (under which numeric columns also get indexes)
* `storageType` - "MMAP", "INCREMENTAL", "FRAME_COLUMNAR", "FRAME_ROW", for testing various backing "segment" types
* `stringEncoding` - "UTF8", "FRONT_CODED_DEFAULT_V1", "FRONT_CODED_16_V1", for testing different string encoding strategies (only applies to the "MMAP" `storageType`)
* `complexMetricCompression` - "none", "lz4", for testing different complex metric compression in `IndexSpec` (only applies to the "MMAP" `storageType`)
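Since two of these options only apply to the "MMAP" storage type, a full cross product of all four would presumably repeat some measurements. The following is a minimal sketch, not part of this PR, that lists only the combinations that differ. The function name is hypothetical, pinning the ignored options to "UTF8" and "none" is an assumption, and the compression parameter is spelled `complexCompression` as in the example command further down rather than as in the list above.

```shell
#!/bin/sh
# Sketch: print one line of JMH -p arguments per distinct parameter combination.
# stringEncoding and complexCompression are varied only for MMAP storage.
list_param_combos() {
  for schema in explicit auto; do
    for storage in MMAP INCREMENTAL FRAME_COLUMNAR FRAME_ROW; do
      if [ "$storage" = MMAP ]; then
        encodings="UTF8 FRONT_CODED_DEFAULT_V1 FRONT_CODED_16_V1"
        compressions="none lz4"
      else
        # Assumption: both options are ignored for non-MMAP storage,
        # so a single arbitrary value is enough.
        encodings="UTF8"
        compressions="none"
      fi
      for encoding in $encodings; do
        for compression in $compressions; do
          echo "-p schemaType=$schema -p storageType=$storage -p stringEncoding=$encoding -p complexCompression=$compression"
        done
      done
    done
  done
}

list_param_combos
```

This prints 18 lines (2 schema types times 6 MMAP combinations plus 3 other storage types), compared with 48 for the full cross product.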

Most query benchmarks also have a numbered `query` parameter; the exception is `SqlGroupByBenchmark`, which instead has a `groupingDimension` parameter.

Example:

DRUID_BENCHMARK_CACHE_DIR=./tmp java --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED -server -jar benchmarks/target/benchmarks.jar org.apache.druid.benchmark.query.SqlProjectionsBenchmark -p stringEncoding=UTF8 -p schemaType=explicit -p storageType=MMAP -p complexCompression=lz4 -p query=0
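
The command above is long enough that a typo in a single flag is easy to miss. As a rough sketch (not part of this PR), the same invocation could be assembled piece by piece. The function name `build_benchmark_cmd` is hypothetical; the JVM flags, jar path, benchmark class, and parameter values are copied from the example, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build the benchmark command from a class name and name=value params.
build_benchmark_cmd() {
  benchmark_class=$1
  shift

  cmd="java"
  # Module exports/opens taken from the example command.
  for flag in \
    --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED \
    --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED \
    --add-opens=java.base/java.nio=ALL-UNNAMED \
    --add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
    --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED \
    --add-opens=java.base/java.io=ALL-UNNAMED \
    --add-opens=java.base/java.lang=ALL-UNNAMED \
    --add-opens=java.base/java.util=ALL-UNNAMED \
    --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED
  do
    cmd="$cmd $flag"
  done

  cmd="$cmd -server -jar benchmarks/target/benchmarks.jar $benchmark_class"

  # Each remaining argument becomes one JMH "-p name=value" option.
  for param in "$@"; do
    cmd="$cmd -p $param"
  done

  printf '%s\n' "$cmd"
}

# Dry run: print the command for the example parameters.
build_benchmark_cmd org.apache.druid.benchmark.query.SqlProjectionsBenchmark \
  stringEncoding=UTF8 schemaType=explicit storageType=MMAP complexCompression=lz4 query=0
```

To execute instead of print, one option is to `export DRUID_BENCHMARK_CACHE_DIR=./tmp` first and then pass the printed command to `eval`.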

benchmarks/src/test/java/org/apache/druid/benchmark/query/SqlBenchmark.java

...hmarks/src/test/java/org/apache/druid/benchmark/query/SqlComplexMetricsColumnsBenchmark.java

benchmarks/src/test/java/org/apache/druid/benchmark/query/SqlExpressionBenchmark.java

benchmarks/src/test/java/org/apache/druid/benchmark/query/SqlNestedDataBenchmark.java

benchmarks/src/test/java/org/apache/druid/benchmark/query/SqlPlanBenchmark.java

benchmarks/src/test/java/org/apache/druid/benchmark/query/SqlProjectionsBenchmark.java

benchmarks/src/test/java/org/apache/druid/benchmark/query/SqlWindowFunctionsBenchmark.java

clintropolis · 2024-10-22T16:18:14Z

The failures are related to code coverage for changes that are primarily used only in tests or benchmarks. I should move this generator code into its own extension someday, but for now I think we can ignore them.

gianm

LGTM, ok to ignore the coverage failures since this is benchmarking code.

github-actions bot added the Area - Segment Format and Ser/De label Oct 19, 2024

adjustment

04d7f4d

github-advanced-security bot found potential problems Oct 19, 2024

style

df8f5fb

gianm approved these changes Oct 22, 2024

clintropolis merged commit 1157ecd into apache:master Oct 23, 2024
85 of 90 checks passed

clintropolis deleted the benchmark-improvements branch October 23, 2024 02:37
