We're now familiar with that project; however, we developed spark-druid-segment-reader a couple of years ago. Our main goal was to read Druid segments directly in ML pipelines on Apache Spark. It served that purpose well, and after a couple of requests we recently decided to open-source the codebase.
The key difference is that spark-druid-segment-reader does not rely on the metadata storage: the schema and the latest version are inferred directly from the segment files. So it doesn't need any Druid instance at all, which opens up a couple of new use cases:
data scientists can work in a completely separate environment with no Druid; they only need access to the binary segment files
some customers (internal or external) may ask to be provided with their Druid data for ML or other purposes. With this tool they can read Druid segments with no Druid whatsoever (and that's our actual use case).
To summarize, the tool mentioned looks more complete and has more features (such as writing segments); however, we decided to share with the community something that has served our specific needs for a long time.
Just wondering, have you looked at or reviewed apache/druid#10920 before?