Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

appenderator plumber #2913

Merged
merged 1 commit into from
May 26, 2016
Merged

appenderator plumber #2913

merged 1 commit into from
May 26, 2016

Conversation

javasoze
Copy link
Contributor

@javasoze javasoze commented May 2, 2016

This is primarily from Gian's work here:
https://github.com/gianm/druid/tree/appenderator-skunkworks

  1. I moved the AppenderatorPlumber stuff to io.druid.segment.realtime.appenderator package from io.druid.segment.realtime.skunkworks package.

  2. Added a test for AppenderatorPlumber.

@fjy
Copy link
Contributor

fjy commented May 4, 2016

This is a very useful PR and required to pave the way for some larger scale changes so we should try to get this in.

@gianm
Copy link
Contributor

gianm commented May 4, 2016

Some context:

This is meant to be an adapter from an Appenderator (#2220) into the Plumber interface. Right now Appenderators are only used by the KafkaIndexTask. @javasoze has been working on a custom Appenderator as an extension and wanted to use it with realtime nodes too, which this would help with.

The motivation for the custom Appenderator is that @javasoze is wanting to create a new segment type based on Lucene indexes rather than the traditional Druid columnar format with smooshes and all that.

The hope is that with some changes in Druid we could support more than one kind of segment format, where "base" queries (like topN, timeseries, groupBy) work on all formats, but maybe additional extension queries only work on specific formats, or maybe certain features (like certain filters) would be implemented in more optimal ways on certain formats.

The first step towards experimenting with that is making it possible to customize segment creation via implementing a custom Appenderator, which this PR helps with by making that interface usable in realtime nodes. If that works out then we'd also want to make it usable in batch ingestion and in realtime tasks.

@fjy fjy added the Feature label May 4, 2016
@xvrl
Copy link
Member

xvrl commented May 5, 2016

@fjy @javasoze can we get some broader context on this? It would be worthwhile to submit a proposal to the mailing list for discussion: it sounds like there has been a lot of private discussion already that would be worthwhile sharing with the community.

@gianm
Copy link
Contributor

gianm commented May 5, 2016

@xvrl, definitely, although we have not yet figured out the details of a proposal so there is nothing concrete to share yet. Hoping to have something there soon-ish. The goal though is to make it possible to store a Druid dataSource using some engine other than Druid's traditional storage engine (like Lucene).

Probably in addition to this PR, a sketch of a fuller proposal is,

  • using Appenderator for batch ingestion too (there's already an Appenderators.offline builder method that should work for this)
  • somehow abstract over IndexIO so SegmentLoader can use different loading strategies for getting Segment objects, other than IndexIO.loadIndex to allow historicals to load segement with alternative storage engines
  • something in Segment that lets query engines get different interfaces other than QueryableIndex and StorageAdapter, maybe T as(Class<T> clazz) where as(StorageAdapter.class) would do the same thing as asStorageAdapter() but could also be used for interfaces that only exist in extensions. this would allow queries to use features that are only present in alternative storage engines. if those engines also implemented StorageAdapter, then all existing queries would work on them too.

I think these changes could be done in a pretty non-invasive way and then open up some cool possibilities for storage engine extensions.

Hopefully this PR can be reviewed before a fuller proposal is ready.

@xvrl
Copy link
Member

xvrl commented May 5, 2016

is there any benefit of getting this merged before we flesh out the proposal?

@gianm
Copy link
Contributor

gianm commented May 5, 2016

merged, I'm not sure (@javasoze?), but looked at – that always helps :)

@javasoze
Copy link
Contributor Author

javasoze commented May 5, 2016

Thanks @gianm for providing the context.

@xvrl I guess I had hoped the AppenderatorPlumber would be a general purpose extension point where people can do interesting things with Druid.

I used @gianm's branch and was able to plug-in full-text support through Lucene via this extension point and thought it would be interesting to explore further.

I am not sure if this PR should be treated as a simple extension point or part of a larger roadmap. Personally I am a fan of small incremental changes that make sense.

So I would like to get an opinion from the community to see if this extension point makes sense in general without making it a dependency of a larger discussion/roadmap.

Hope this makes sense.

Thanks!

@javasoze
Copy link
Contributor Author

Haven't heard anymore comments on this. @xvrl Any objections in merging this?

@fjy
Copy link
Contributor

fjy commented May 10, 2016

@javasoze we are trying to release 0.9.1 right now so all other PRs that are non-urgent will require a few more days before we can get to it

@fjy fjy added this to the 0.9.2 milestone May 10, 2016

final long startDelay = runExecStopwatch.elapsed(TimeUnit.MILLISECONDS);
metrics.incrementPersistBackPressureMillis(startDelay);
if (startDelay > WARN_DELAY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a minor comment, but in the event that there are a lot of persists occurring, we'll get spammed with these messages. But I think when so many persists are occurring, we'll have other problems

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javasoze yeah I mean the log.warn line below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply moved over code by @gianm so I am under the impression that this is needed. @gianm, comments?

@fjy
Copy link
Contributor

fjy commented May 16, 2016

👍

@javasoze
Copy link
Contributor Author

@fjy Do you mean the log.warn()?

@fjy
Copy link
Contributor

fjy commented May 26, 2016

@himanshug can you help review this?

@himanshug
Copy link
Contributor

@javasoze how did you use this PR to create lucene segment? i imaging you would have written an Appenderator impl based on lucene, but, i see that RealtimeIndexTask would still be using RealtimePlumberSchool even after this PR .

@javasoze
Copy link
Contributor Author

@himanshug Yes, I implemented a Lucene based Appenderator. For this PR, we don't expect everything to be migrated to AppenderatorPlumber, only an extension point. As for longer term plans for RealtimePlumberSchool, it is out of the scope of this PR.

@himanshug
Copy link
Contributor

👍

note to users: custom Appenderators can only be utilized via Realtime nodes currently.

@fjy fjy modified the milestones: 0.9.1, 0.9.2 May 26, 2016
@fjy fjy merged commit a004f1e into apache:master May 26, 2016
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants