Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backward incompatibility it #12

Open
wants to merge 77 commits into
base: master
Choose a base branch
from

Conversation

findingrish
Copy link
Owner

No description provided.

abhishekrb19 and others added 3 commits August 2, 2024 10:01
* Rename setExpectedSegment to setExpectedSegments in MSQTestBase.

* Add expected segments for max num segments test cases.
…tor (apache#16691)

Design:
The loading rate is computed as a moving average of at least the last 10 GiB of successful segment loads.
To account for multiple loading threads on a server, we use the concept of a batch to track load times.
A batch is a set of segments added by the coordinator to the load queue of a server in one go.

Computation:
batchDurationMillis = t(load queue becomes empty) - t(first load request in batch is sent to server)
batchBytes = total bytes successfully loaded in batch
avg loading rate in batch (kbps) = (8 * batchBytes) / batchDurationMillis
overall avg loading rate (kbps) = (8 * sumOverWindow(batchBytes)) / sumOverWindow(batchDurationMillis)

Changes:
- Add `LoadingRateTracker` which computes a moving average load rate based on
the last few GBs of successful segment loads.
- Emit metric `segment/loading/rateKbps` from the Coordinator. In the future, we may
also consider emitting this metric from the historicals themselves.
- Add `expectedLoadTimeMillis` to response of API `/druid/coordinator/v1/loadQueue?simple`
findingrish and others added 5 commits August 3, 2024 18:10
…ig (apache#16822)

This patch introduces an optional cluster configuration, druid.indexing.formats.stringMultiValueHandlingMode, allowing operators to override the default mode SORTED_SET for string dimensions. The possible values for the config are SORTED_SET, SORTED_ARRAY, or ARRAY (SORTED_SET is the default). Case insensitive values are allowed.
While this cluster property allows users to manage the multi-value handling mode for string dimension types, it's recommended to migrate to using real array types instead of MVDs.
 
This fixes a long-standing issue where compaction will honor the configured cluster wide property instead of rewriting it as the default SORTED_ARRAY always, even if the data was originally ingested with ARRAY or SORTED_SET.
sreemanamala and others added 13 commits August 5, 2024 11:27
…eries (apache#16835)

subquery/rows and subquery/bytes metrics have been added, which indicate the size of the results materialized on the heap.
…field readers (apache#16825)

This patch fixes queries like `SELECT COUNT(DISTINCT json_col) FROM foo`
This PR adds indexer-level task metrics-

"indexer/task/failed/count"
"indexer/task/success/count"

the current "worker/task/completed/count" metric shows all the tasks completed irrespective of success or failure status so these metrics would help us get more visibility into the status of the completed tasks
* enables to launch a fake broker based on test resources (druidtest uri)
* could record queries into new testfiles during usage
* instead of re-purpose Calcite's Hook migrates to use DruidHook which we can add further keys
* added a quidem-ut module which could be the place for tests which could iteract with modules/etc
…che#16598)

* Add columnMapping information in the Explain dialog

* use arrow char
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment