FE OOM error because com.starrocks.sql.optimizer.statistics.StatisticsCalculator.computeHMSTableScanNode does not perform predicate pushdown #49911
Labels
type/enhancement
FE OOM Description
When using StarRocks 3.3.0 to query a Hive table with a total of 500,000 partitions and 200 columns, the FE node experiences an OOM error, even if the query conditions limit the partitions.
The SQL looks like:
select * from big_hive_table where dt='2024-08-12' limit 10;
Here are the dump analysis details:
stackTrace:
Detailed Problem Description
This causes the FE node’s memory utilization to spike and can lead to OOM errors.
The schema information in org.apache.hadoop.hive.metastore.api.Partition is discarded when it is converted to com.starrocks.connector.hive.Partition; only the full path information is kept. Fetching the schema therefore consumes resources unnecessarily.
Why ScanOperatorPredicates predicates is empty
Through debugging, it can be observed that although the query has partition constraints, the predicates remain empty.
starrocks/fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/StatisticsCalculator.java
Lines 583 to 619 in e703a61
Since predicates is empty (no pruned partitions have been recorded yet), the line:
List<PartitionKey> partitionKeys = predicates.hasPrunedPartition() ? predicates.getSelectedPartitionKeys() : PartitionUtil.getPartitionKeys(table);
retrieves all partition keys (partitionKeys). Consequently, in the subsequent call:
com.starrocks.connector.hive.HiveStatisticsProvider.getEstimatedRowCount
it queries all partitions from the Hive metastore. I noticed that predicates is only assigned a value in one place,
starrocks/fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rule/transformation/ExternalScanPartitionPruneRule.java
Line 72 in e703a61
but ruleRewriteIterative(tree, rootTaskContext, RuleSetType.PUSH_DOWN_PREDICATE) runs after the code at
starrocks/fe/fe-core/src/main/java/com/starrocks/sql/optimizer/Optimizer.java
Line 549 in e703a61
A potential solution could be to swap the order of skewJoinOptimize(tree, rootTaskContext) and ruleRewriteIterative(tree, rootTaskContext, RuleSetType.PUSH_DOWN_PREDICATE).
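To make the ordering problem concrete, here is a minimal toy model of the fallback described above. It is not StarRocks code: the class `PruneOrderSketch`, its nested `Predicates` class, and the helper methods are hypothetical stand-ins, and the partition keys are plain strings. It only shows that a statistics step which runs before the prune step sees every partition, while the same step run afterwards sees only the selected ones.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every name here is a hypothetical stand-in, not the StarRocks API.
class PruneOrderSketch {
    // Stand-in for ScanOperatorPredicates: holds no selection until a prune step fills it.
    static class Predicates {
        private List<String> selectedPartitionKeys = null;

        boolean hasPrunedPartition() {
            return selectedPartitionKeys != null;
        }

        List<String> getSelectedPartitionKeys() {
            return selectedPartitionKeys;
        }

        void setSelectedPartitionKeys(List<String> keys) {
            selectedPartitionKeys = keys;
        }
    }

    // Stand-in for listing every partition key of a table.
    static List<String> allPartitionKeys(int count) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            keys.add("dt=" + i);
        }
        return keys;
    }

    // Mirrors the ternary quoted in the issue: fall back to all keys when nothing was pruned.
    static int partitionsTouchedByStats(Predicates predicates, List<String> allKeys) {
        List<String> keys = predicates.hasPrunedPartition()
                ? predicates.getSelectedPartitionKeys()
                : allKeys;
        return keys.size();
    }

    // Stand-in for the partition prune rule: keeps only the key that matches the filter.
    static void prune(Predicates predicates, List<String> allKeys, String wanted) {
        List<String> selected = new ArrayList<>();
        for (String key : allKeys) {
            if (key.equals(wanted)) {
                selected.add(key);
            }
        }
        predicates.setSelectedPartitionKeys(selected);
    }

    public static void main(String[] args) {
        List<String> allKeys = allPartitionKeys(1000);

        // Statistics computed before pruning: the fallback branch is taken.
        Predicates statsFirst = new Predicates();
        System.out.println("stats before prune: " + partitionsTouchedByStats(statsFirst, allKeys));

        // Statistics computed after pruning: only the selected partition is visited.
        Predicates pruneFirst = new Predicates();
        prune(pruneFirst, allKeys, "dt=42");
        System.out.println("stats after prune: " + partitionsTouchedByStats(pruneFirst, allKeys));
    }
}
```

In this sketch the only difference between the two runs is whether `prune` was called before the statistics step, which is the effect the proposed reordering is meant to achieve.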
org.apache.hadoop.hive.metastore.api.FieldSchema seems useless
starrocks/fe/fe-core/src/main/java/com/starrocks/connector/hive/HiveMetastoreApiConverter.java
Lines 352 to 366 in a34d0ed
Fetching org.apache.hadoop.hive.metastore.api.Partition from the metastore retrieves the schema for each partition. This is not utilized at all in StarRocks. Perhaps a new interface could be added to the metastore to only fetch the necessary information.
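As a rough illustration of the "only fetch the necessary information" idea, the sketch below uses hypothetical stand-in classes (`FullPartition`, `Column`, `SlimPartition`), not the real Hive or StarRocks types. It assumes, as described above, that only the partition location is needed after conversion, and shows how much per-partition column metadata is carried along and then dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical, not the Hive metastore or StarRocks API.
class SlimPartitionSketch {
    // Stand-in for a per-column schema entry returned with each partition.
    static class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Stand-in for a metastore partition that carries its full column list.
    static class FullPartition {
        final String location;
        final List<Column> columns;

        FullPartition(String location, List<Column> columns) {
            this.location = location;
            this.columns = columns;
        }
    }

    // Stand-in for the converted partition: keeps the location only.
    static class SlimPartition {
        final String location;

        SlimPartition(String location) {
            this.location = location;
        }
    }

    // Builds a fake metastore response with the given number of columns.
    static FullPartition fakeMetastorePartition(String location, int columnCount) {
        List<Column> columns = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            columns.add(new Column("c" + i, "string"));
        }
        return new FullPartition(location, columns);
    }

    // Conversion that discards the per-partition schema.
    static SlimPartition toSlim(FullPartition partition) {
        return new SlimPartition(partition.location);
    }

    // Counts the column objects held by a batch of full partitions.
    static int columnObjectsHeld(List<FullPartition> partitions) {
        int total = 0;
        for (FullPartition partition : partitions) {
            total += partition.columns.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<FullPartition> fetched = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            fetched.add(fakeMetastorePartition("hdfs://warehouse/t/dt=" + i, 200));
        }
        System.out.println("column objects fetched: " + columnObjectsHeld(fetched));
        System.out.println("location kept: " + toSlim(fetched.get(0)).location);
    }
}
```

With 200 columns per partition, the fetched column objects grow linearly with the partition count even though the conversion keeps a single string per partition, which is why a narrower metastore call could reduce memory pressure.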