Experimental kafka simple consumer based firehose #1609
Conversation
@@ -352,9 +390,13 @@ public void doRun()
{
  try {
    for (Pair<FireHydrant, Interval> pair : indexesToPersist) {
      metrics.incrementRowOutputCount(persistHydrant(pair.lhs, schema, pair.rhs));
      metrics.incrementRowOutputCount(
          persistHydrant(
outstanding review comment --
@gianm - Will we lose data if one hydrant is persisted with the metadata, then the plumber crashes? If I'm reading the code right, that would cause the next bootstrap to think that all the previously read data was persisted.
@himanshug - hmmm... that sounds correct, still thinking about what the right thing to do here would be...
Maybe create a marker file at the end at /persist_dir/datasource/, and use the commit metadata information only if the marker file is present?
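The marker-file idea above could look roughly like the following sketch. All names here (`commit.done`, `markerFor`, `isCommitValid`) are illustrative stand-ins, not actual Druid APIs: write an empty marker only after every hydrant has been persisted, and trust the stored commit metadata on bootstrap only if the marker exists.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the marker-file approach discussed above.
public class CommitMarker
{
  public static File markerFor(File persistDir, String dataSource)
  {
    return new File(new File(persistDir, dataSource), "commit.done");
  }

  // Called at the very end of a successful persist of all hydrants.
  public static void markCommitted(File persistDir, String dataSource) throws IOException
  {
    File marker = markerFor(persistDir, dataSource);
    marker.getParentFile().mkdirs();
    marker.createNewFile();
  }

  // On bootstrap: only trust the persisted commit metadata if the marker exists,
  // so a crash between persisting a hydrant and finishing the commit is detectable.
  public static boolean isCommitValid(File persistDir, String dataSource)
  {
    return markerFor(persistDir, dataSource).isFile();
  }
}
```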
@cheddar what do you think?
I think it makes sense to store the metadata outside the segments in a separate file. This is because the commit metadata isn't really associated with an individual segment-- it's associated with a set of segments that are persisted at the same time. So storing it in the segments is asking for problems.
Sort of like this:
{
  "metadata" : {"foo": "bar"},
  "segments": [
    {"id": "datasource_2000_2001_2000_1", "hydrant": 10},
    {"id": "datasource_2001_2002_2001_1", "hydrant": 12}
  ]
}
When a realtime node crashes and starts back up, it would delete any hydrants numbered higher than the ones in the commit file.
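The recovery rule described here could be sketched as follows, assuming a commit file shaped like the JSON above has been parsed into a map of segment id to last committed hydrant number. The class and method names are hypothetical, not the actual plumber code:

```java
import java.util.Map;

// Hypothetical sketch of the crash-recovery rule: delete any on-disk hydrant
// numbered higher than what the commit file recorded for its segment.
public class HydrantRecovery
{
  // committed maps segment id -> last committed hydrant number,
  // as read from the commit file sketched above.
  public static boolean shouldDelete(Map<String, Integer> committed, String segmentId, int hydrantNum)
  {
    Integer last = committed.get(segmentId);
    // A segment absent from the commit file was never committed at all.
    return last == null || hydrantNum > last;
  }
}
```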
As I look more at this code, I think I agree that commitMetadata really is associated with the whole datasource at this level, not with individual segments.
We could also just include the set of segments for the same chunk of metadata in more metadata on each of the segments.
I think there is value in storing it inside the segment as a form of lineage.
I also don't feel so strongly about it that I would necessarily be against a separate file. I don't think that has to be changed in this initial PR, however. It actually unravels and creeps out the scope quite a bit, because it also requires us to consider hand-off in terms of the full set of segments being handed off instead of individual segments (that is, if one segment of the set succeeds in handing off and the others fail, the realtime node would believe that it needs to re-ingest the data).
noted the discussion in comments for future.
👍 after merge conflicts / travis are resolved
@gianm resolved the merge conflict; the jdk8 build actually failed on an unrelated test and hopefully will pass this time.
bouncing for travis
Thanks @himanshug. Hmm, does @cheddar's +1 on #1482 count towards this one? Anyone else want to / available to take a look?
@cheddar's +1 should count
Ok, sgtm. Will merge in a bit unless there are further comments. /cc @drcrallen @nishantmonu51 @xvrl who had commented on the previous PR.
byte[] stringBytes = new byte[length];
in.get(stringBytes);
return new String(stringBytes, UTF8);
return new String(readBytes(in, length), UTF8);
Can StringUtils.fromUtf8 be used?
will change
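For context, the pattern under discussion is reading a length-prefixed UTF-8 string out of a ByteBuffer. A minimal self-contained sketch (class and method names are illustrative; in Druid the decode step can use `StringUtils.fromUtf8` instead of the explicit `new String(..., UTF_8)` shown here):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the readBytes/readString helpers referenced in the diff above.
public class SerdeHelper
{
  public static byte[] readBytes(ByteBuffer in, int length)
  {
    byte[] bytes = new byte[length];
    in.get(bytes);  // advances the buffer position by length
    return bytes;
  }

  public static String readString(ByteBuffer in, int length)
  {
    return new String(readBytes(in, length), StandardCharsets.UTF_8);
  }
}
```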
{
  return new File(
      persistDir.getAbsolutePath()
          .replace(schema.getDataSource(), "corrupted/" + schema.getDataSource())
Better to use Path to build path than to assume "/" is the proper delimiter.
will change that to use File.separator to remove the delimiter assumption
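A possible shape for that change, using `File.separator` rather than a hard-coded "/" (the `corruptedVersionOf` name and the exact replacement scheme are illustrative, not the actual patch):

```java
import java.io.File;

// Sketch: build the "corrupted" sibling path without assuming "/" is the
// path delimiter, by using File.separator in the replacement strings.
public class CorruptedPaths
{
  public static File corruptedVersionOf(File persistDir, String dataSource)
  {
    String path = persistDir.getAbsolutePath();
    // Replace the dataSource path element with corrupted/<dataSource>,
    // keeping the code portable across platforms.
    return new File(
        path.replace(
            dataSource + File.separator,
            "corrupted" + File.separator + dataSource + File.separator
        )
    );
  }
}
```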
Is it possible to test the corruption code paths?
@drcrallen addressed all your review comments in latest commit.
yeah, I was hoping to rebase the other PR off this one and then build on that.
</dependency>
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
(not blocking for this PR) we may need to consider better ways to handle the "different scala version require different artifact IDs" thing.
@drcrallen updated code to have messaging around metadata parsing failure in IndexIO
I'm 👍 on this. It is experimental and not extremely impacting.
I think we have all the votes we need for this PR. Any more blockers/concerns?
👍 Provided that it will be revisited to add better corruption testing, and address some of the firehose concerns that could be deferred until later.
lgtm, @himanshug do you want to squash the commits a little? Maybe into one for the original patch and one for your changes, or however you want to do it
firehoseV2 addition to Realtime[Manager|Plumber], essential segment metadata persist support, kafka-simple-consumer-firehose extension patch
@gianm rebased/squashed in 2 commits, 1 from original patch and another with the changes to address review comments
@gianm ok, finally the build has passed :)
Experimental kafka simple consumer based firehose
tracks and addresses the review comments on #1482
copied description from #1482
This feature introduces a simple consumer implementation for the realtime firehose. It keeps track of the current offset metadata by storing it in metadata.drd with the smooshed files, and is able to recover the previous offset position after a restart.
On restart, it takes the offset from the sequentially "last" valid persisted file, and renames all incomplete persist directories to a path for corrupted data. For example, if there are sub dirs /1, /2, /3, ..../8 under directory mydatasource/20150630T10:00:00-11:00:00/, and /7 doesn't contain meta.smoosh, RealtimePlumber will rename both /7 and /8 to corrupted/mydatasource/20150630T10:00:00-11:00:00/* and use the offset from /6 as the starting point.
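The restart scan described above could be sketched like this (names are illustrative, not the actual RealtimePlumber code): walk the numbered persist sub-directories in order, treat the last one that still contains meta.smoosh as the recovery point, and everything from the first invalid directory onward is what gets renamed to the corrupted path.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical sketch of the restart scan over numbered persist directories.
public class PersistDirScan
{
  // Returns the highest directory number, scanning in ascending order, whose
  // directory contains meta.smoosh; -1 if none is valid. Directories at and
  // after the first invalid one are the candidates for the corrupted/ rename.
  public static int lastValidPersist(File intervalDir)
  {
    String[] names = intervalDir.list();
    if (names == null) {
      return -1;
    }
    int[] nums = Arrays.stream(names).mapToInt(Integer::parseInt).sorted().toArray();
    int lastValid = -1;
    for (int n : nums) {
      if (!new File(new File(intervalDir, String.valueOf(n)), "meta.smoosh").isFile()) {
        break;  // this dir and everything after it would be moved to corrupted/
      }
      lastValid = n;
    }
    return lastValid;
  }
}
```

In the PR's example, /7 lacks meta.smoosh, so the scan stops there: /6 is the recovery point and /7 and /8 get moved aside.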
When indexing, it follows this logic:
Note that it will only advance when the current row has been successfully processed, which means saving the end offset of the current message.
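The advance rule can be sketched as follows. This is a simplified stand-in for the firehose internals (the `OffsetTracker` class and its methods are hypothetical): the committed offset only moves forward once the current row is processed successfully, so a crash re-reads at most the in-flight message.

```java
// Hypothetical sketch of the "advance only on success" offset rule.
public class OffsetTracker
{
  private long committedOffset;

  public OffsetTracker(long startOffset)
  {
    this.committedOffset = startOffset;
  }

  // Process one message; advance the committed offset only if processing succeeds.
  public boolean consume(long messageEndOffset, Runnable processRow)
  {
    try {
      processRow.run();
      committedOffset = messageEndOffset;  // save the end offset of the current message
      return true;
    }
    catch (RuntimeException e) {
      return false;  // offset unchanged; the message is re-read after a restart
    }
  }

  public long getCommittedOffset()
  {
    return committedOffset;
  }
}
```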
Google Group reference:
https://groups.google.com/forum/#!topic/druid-development/9HB9hCcqvuI
The goal of this PR is to
Even after this is merged, firehoseV2 is expected to be experimental and should not be the go-to firehose for realtime ingestion. That will likely come after more adjustments. Or, it's possible that this initial attempt informs things such that we actually go and change the interfaces or add a firehoseV3. As it stands, the PR does the useful thing we initially need it to do and is hopefully a good springboard for further evolution.