Comparison with https://github.com/apache/druid/pull/10920 #1

maytasm · 2022-12-07T01:19:25Z

Just wondering whether you have looked at or reviewed apache/druid#10920 before?

sebastianzontek · 2023-01-12T04:37:33Z

We're familiar with that project. However, we developed spark-druid-segment-reader a couple of years ago. Our main goal was to read Druid segments directly in ML pipelines on Apache Spark. It served that purpose well, and after a couple of requests we recently decided to open-source the codebase.

The key difference is that spark-druid-segment-reader does not rely on the Druid metadata store. The schema and the latest segment version are inferred directly from the segment files, so no running Druid instance is needed at all. This opens up a couple of new use cases:

- Data scientists can work in a completely separate environment with no Druid; they only need access to the binary segment files.
- Some customers (internal or external) may ask for their Druid data for ML or other purposes. With this tool they can read Druid segments without any Druid deployment (and that's our real use case).

To summarize, the tool you mentioned looks more complete and has more features (such as writing segments). We decided to share with the community something that has served our specific needs for a long time.