Fixes and tests related to the Indexer process. #10631

gianm · 2020-12-04T04:53:39Z

Three bugs fixed:

1) Indexers would not announce themselves as segment servers if they
   did not have storage locations defined. This used to work, but was
   broken in #9971 (Load broadcast datasources on broker and tasks).
   Fixed this by adding an "isSegmentServer" method to ServerType and
   updating SegmentLoadDropHandler to always announce if this method
   returns true.
2) Certain batch task types were written in a way that assumed "isReady"
   would be called before "run", which is not guaranteed. In particular,
   they relied on it in order to initialize "taskLockHelper". Fixed this
   by updating AbstractBatchIndexTask to ensure "isReady" is called
   before "run" for these tasks.
3) UnifiedIndexerAppenderatorsManager did not properly handle join
   datasources. Introduced DataSourceAnalysis in order to fix this.
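
For illustration only, the announce decision described in the first fix could look roughly like the sketch below. `SketchServerType`, `AnnounceSketch`, `shouldAnnounce`, and the choice of which types count as segment servers are hypothetical simplifications, not Druid's actual classes. Announcing whenever storage locations exist is likewise an assumption about the surrounding behavior.

```java
final class AnnounceSketch {
    // Announce if the server type is a segment server, even with no storage
    // locations configured. The "|| storageLocationCount > 0" branch is an
    // assumption about the behavior that already existed.
    static boolean shouldAnnounce(SketchServerType type, int storageLocationCount) {
        return type.isSegmentServer() || storageLocationCount > 0;
    }

    public static void main(String[] args) {
        // An Indexer with no storage locations still announces itself.
        System.out.println(shouldAnnounce(SketchServerType.INDEXER_EXECUTOR, 0));
        // A Broker with no storage locations does not.
        System.out.println(shouldAnnounce(SketchServerType.BROKER, 0));
    }
}

// Hypothetical stand-in for ServerType; the real enum has more members.
enum SketchServerType {
    HISTORICAL(true),
    INDEXER_EXECUTOR(true),
    BROKER(false);

    private final boolean segmentServer;

    SketchServerType(boolean segmentServer) {
        this.segmentServer = segmentServer;
    }

    boolean isSegmentServer() {
        return segmentServer;
    }
}
```

Putting the flag on the enum keeps the decision next to the server type instead of inferring it from whether storage locations happen to be configured.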

Test changes:

1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
   Indexer instead of a MiddleManager.
2) Introduce a "USE_INDEXER" environment variable that determines if
   docker-compose will start up an Indexer or a MiddleManager.
3) Duplicate all the jdk8 tests and run them in both MiddleManager and
   Indexer mode.
4) Various adjustments to encourage fail-fast errors in the Docker
   build scripts.
5) Various adjustments to speed up integration tests and reduce memory
   usage.
6) Add another Mac-specific approach to determining a machine's own IP.
   This was useful on my development machine.
7) Update segment-count check in ITCompactionTaskTest to eliminate a
   race condition (it was looking for 6 segments, which only exist
   together briefly, until the older 4 are marked unused).
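
To illustrate the race in the last item, here is a small simulation (not the test's real code). The names `SegmentCountPolling` and `eventuallyMatches`, the sampling interval, and the timeline 4, 6, 2, 2, 2 are all assumptions made up for the sketch; the final count of 2 is inferred from "6 segments" minus "the older 4".

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

final class SegmentCountPolling {
    // Samples every 'sampleEvery'-th observation of the segment count and
    // reports whether any sampled value satisfies the check.
    static boolean eventuallyMatches(List<Integer> timeline, int sampleEvery, IntPredicate check) {
        for (int i = 0; i < timeline.size(); i += sampleEvery) {
            if (check.test(timeline.get(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed timeline: 4 original segments, briefly 6, then 2 once the
        // older 4 are marked unused.
        List<Integer> timeline = Arrays.asList(4, 6, 2, 2, 2);
        // Waiting for the transient count can miss it between samples.
        System.out.println(eventuallyMatches(timeline, 2, count -> count == 6));
        // Waiting for the stable end state does not depend on timing.
        System.out.println(eventuallyMatches(timeline, 2, count -> count == 2));
    }
}
```

A check against the stable end state passes regardless of how the polls line up with the coordinator's runs, which is the property the updated test needs.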

Javadoc updates:

1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
   that make it clear when taskLockHelper will be initialized as a side
   effect. (Related to the second bug above.)
2) Task: Clarified that "isReady" is not guaranteed to be called before
   "run". It was already implied, but now it's explicit.
3) ZkCoordinator: Clarified deprecation message.
4) DataSegmentServerAnnouncer: Clarified deprecation message.

jihoonson · 2020-12-04T05:18:34Z

#10538 and #10258 will be fixed by this PR. Perhaps #9820 as well.

jihoonson · 2020-12-04T05:30:22Z

indexing-service/src/main/java/org/apache/druid/indexing/common/task/Task.java

+   * This method will not necessarily be executed before {@link #run(TaskToolbox)}, since this task readiness check
+   * may be done on a different machine from the one that actually runs the task.


Hmm, how can this happen? I thought Task.isReady() is always called before Task.run() because a task can be scheduled only when Task.isReady() returns true. Task.run() will be called after the task is scheduled in some indexer or middleManager.

It won't necessarily be called on that same Task object (it might be called on a different instance that represents the same actual task).

Oh, I see. I think it happens only in indexers because peon calls isReady() before it runs its task.
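
As a rough sketch of the point in this thread: if the readiness check ran on a different object than the one being executed, any state that "isReady" sets up as a side effect is missing when "run" starts, so "run" can guard against that itself. The classes below (`SketchBatchTask`, `SketchIndexTask`, `IsReadyDemo`) are hypothetical stand-ins with simplified signatures, not the real Task API.

```java
final class IsReadyDemo {
    public static void main(String[] args) {
        // The readiness check happens on one instance...
        SketchBatchTask scheduled = new SketchIndexTask();
        scheduled.isReady();
        // ...but a different instance representing the same task is run.
        SketchBatchTask executed = new SketchIndexTask();
        System.out.println(executed.run());
    }
}

abstract class SketchBatchTask {
    // Stand-in for taskLockHelper; null until isReady() runs on this instance.
    private Object lockHelper;

    boolean isReady() {
        lockHelper = new Object();
        return true;
    }

    final String run() {
        if (lockHelper == null) {
            // isReady() is called here mainly for its side effect. It is
            // expected to return true; if it does not, bail out.
            if (!isReady()) {
                throw new IllegalStateException("Cannot start; not ready");
            }
        }
        return runTask();
    }

    boolean hasLockHelper() {
        return lockHelper != null;
    }

    abstract String runTask();
}

final class SketchIndexTask extends SketchBatchTask {
    @Override
    String runTask() {
        if (!hasLockHelper()) {
            throw new IllegalStateException("lock helper missing");
        }
        return "ran with lock helper";
    }
}
```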

jihoonson · 2020-12-04T05:44:25Z

integration-tests/docker/environment-configs/coordinator

+druid_coordinator_period_indexingPeriod=PT180000S
+druid_coordinator_period=PT1S


Is this too aggressive? 😅

Ha ha 🙂

I think it's OK: there are going to be very few segments in the integration tests, so a coordinator run should be able to finish very quickly. This change was meant to speed up the tests.
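
For readers unfamiliar with the notation, the two values in the diff above are ISO-8601 durations. The sketch below converts them to plain units using the JDK's `java.time.Duration`, chosen here only for illustration; it is not meant to show how Druid itself parses these settings. `CoordinatorPeriods` and its helper methods are hypothetical names.

```java
import java.time.Duration;

final class CoordinatorPeriods {
    // Parses an ISO-8601 duration string and returns its length in seconds.
    static long seconds(String iso) {
        return Duration.parse(iso).getSeconds();
    }

    // Parses an ISO-8601 duration string and returns its length in whole hours.
    static long hours(String iso) {
        return Duration.parse(iso).toHours();
    }

    public static void main(String[] args) {
        // druid_coordinator_period_indexingPeriod as quoted in the diff
        System.out.println("PT180000S = " + hours("PT180000S") + " hours");
        // druid_coordinator_period as quoted in the diff
        System.out.println("PT1S = " + seconds("PT1S") + " second");
    }
}
```

So the diff as reviewed sets the coordinator period to one second and the indexing period to 50 hours.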

gianm · 2020-12-04T05:56:53Z

#10538 and #10258 will be fixed by this PR. Perhaps #9820 as well.

I agree, it should fix all of those. I commented in those issues.

jihoonson

+1 after CI

jihoonson · 2020-12-04T06:28:06Z

indexing-service/src/main/java/org/apache/druid/indexing/common/task/Task.java

+   * This method will not necessarily be executed before {@link #run(TaskToolbox)}, since this task readiness check
+   * may be done on a different machine from the one that actually runs the task.


Oh, I see. I think it happens only in indexers because peon calls isReady() before it runs its task.

gianm · 2020-12-08T17:00:26Z

One test left. Trying to figure out what the problem is with "Kafka index integration test with various formats". Strangely, the Indexer version works fine, but the MM version is busted.

gianm · 2020-12-08T20:01:01Z

One test left. Trying to figure out what the problem is with "Kafka index integration test with various formats". Strangely, the Indexer version works fine, but the MM version is busted.

I ran it locally and it passed. It might be flaky; I'll try running it again in CI.

gianm force-pushed the fix-indexer-stuff branch from 9e8698a to a549f5b Compare December 4, 2020 05:05

gianm force-pushed the fix-indexer-stuff branch from a549f5b to 7ae9e3c Compare December 4, 2020 05:15

jihoonson added Area - Streaming Ingestion Bug labels Dec 4, 2020

jihoonson reviewed Dec 4, 2020

jihoonson approved these changes Dec 4, 2020

gianm added 16 commits December 4, 2020 02:18

Merge branch 'master' into fix-indexer-stuff

fafa647

Fix stop_cluster script.

6dd9164

Fix sanity check in script.

b16ec71

Fix hashbang lines.

b47693c

Merge branch 'master' into fix-indexer-stuff

df43ebb

Test and doc adjustments.

df57231

Additional tests, and adjustments for tests.

c7e85c6

Merge branch 'master' into fix-indexer-stuff

bcda0cc

Split ITs back out.

12bdb4a

Revert change to druid_coordinator_period_indexingPeriod.

5154b69

Set Indexer capacity to match MM.

5804ab8

Merge branch 'master' into fix-indexer-stuff

9f64c5c

Bump up Historical memory.

c8a26c1

Bump down coordinator, overlord memory.

be5453b

Bump up Broker memory.

f21e20a

Merge branch 'master' into fix-indexer-stuff

b84ce2b

gianm merged commit 96a387d into apache:master Dec 9, 2020

gianm deleted the fix-indexer-stuff branch December 9, 2020 00:02

jihoonson added this to the 0.21.0 milestone Jan 4, 2021

jihoonson mentioned this pull request Jan 13, 2021

[Draft] 0.21.0 Release Notes #10752

Closed
