Comparison with https://github.com/apache/druid/pull/10920 #1

Open
maytasm opened this issue Dec 7, 2022 · 1 comment

Comments

@maytasm

maytasm commented Dec 7, 2022

Just wondering if you have looked at or reviewed apache/druid#10920 before?

@sebastianzontek

We're familiar with that project now, but we developed spark-druid-segment-reader a couple of years ago. Our main goal was to read Druid segments directly in ML pipelines on Apache Spark. It served that purpose, and after a few requests we recently decided to open source the codebase.

The key difference is that spark-druid-segment-reader does not rely on metadata storage: the schema and the latest version are inferred directly from the segment files. It therefore doesn't need a running Druid instance, which opens up a couple of new use cases:

  • data scientists can work in a completely separate environment with no Druid installed, needing access only to the binary segment files
  • some customers (internal or external) may ask for their Druid data for ML or other purposes. With this tool they can read Druid segments without any Druid deployment at all (and that's our real use case).
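
To illustrate the idea of picking the latest version from file paths alone, here is a minimal sketch in Python. It is not the code from spark-druid-segment-reader. The directory layout `<dataSource>/<interval>/<version>/<partition>/index.zip`, the function name `latest_segments`, and the rule that version strings can be compared as plain strings are all assumptions made for illustration.

```python
from collections import defaultdict


def latest_segments(paths):
    """Group segment paths by (data source, interval) and keep only the newest version.

    Assumed (hypothetical) layout of each path:
        .../<dataSource>/<interval>/<version>/<partition>/index.zip
    """
    by_key = defaultdict(lambda: defaultdict(list))
    for path in paths:
        parts = path.strip("/").split("/")
        # The four components before the file name identify the segment.
        data_source, interval, version, _partition = parts[-5:-1]
        by_key[(data_source, interval)][version].append(path)

    result = {}
    for key, versions in by_key.items():
        # Assumption: versions are timestamp strings in one consistent
        # format, so the lexicographically largest one is the newest.
        newest = max(versions)
        result[key] = sorted(versions[newest])
    return result
```

No metadata store is consulted here: everything needed to choose a version comes from the listing of files, which is the property described above.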

To summarize, the tool you mentioned looks more complete and has more features (like writing segments). Still, we decided to share with the community something that has worked for our specific needs for a long time.
