Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes and tests related to the Indexer process. #10631

Merged
merged 17 commits into from
Dec 9, 2020

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Dec 4, 2020

Three bugs fixed:

  1. Indexers would not announce themselves as segment servers if they
    did not have storage locations defined. This used to work, but was
    broken in Load broadcast datasources on broker and tasks #9971. Fixed this by adding an "isSegmentServer" method
    to ServerType and updating SegmentLoadDropHandler to always announce
    if this method returns true.

  2. Certain batch task types were written in a way that assumed "isReady"
    would be called before "run", which is not guaranteed. In particular,
    they relied on it in order to initialize "taskLockHelper". Fixed this
    by updating AbstractBatchIndexTask to ensure "isReady" is called
    before "run" for these tasks.

  3. UnifiedIndexerAppenderatorsManager did not properly handle join
    datasources. Introduced DataSourceAnalysis in order to fix this.

Test changes:

  1. Add a new "docker-compose.cli-indexer.yml" config that spins up an
    Indexer instead of a MiddleManager.

  2. Introduce a "USE_INDEXER" environment variable that determines if
    docker-compose will start up an Indexer or a MiddleManager.

  3. Duplicate all the jdk8 tests and run them in both MiddleManager and
    Indexer mode.

  4. Various adjustments to encourage fail-fast errors in the Docker
    build scripts.

  5. Various adjustments to speed up integration tests and reduce memory
    usage.

  6. Add another Mac-specific approach to determining a machine's own IP.
    This was useful on my development machine.

  7. Update segment-count check in ITCompactionTaskTest to eliminate a
    race condition (it was looking for 6 segments, which only exist
    together briefly, until the older 4 are marked unused).

Javadoc updates:

  1. AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
    that make it clear when taskLockHelper will be initialized as a side
    effect. (Related to the second bug above.)

  2. Task: Clarified that "isReady" is not guaranteed to be called before
    "run". It was already implied, but now it's explicit.

  3. ZkCoordinator: Clarified deprecation message.

  4. DataSegmentServerAnnouncer: Clarified deprecation message.

Three bugs fixed:

1) Indexers would not announce themselves as segment servers if they
   did not have storage locations defined. This used to work, but was
   broken in apache#9971. Fixed this by adding an "isSegmentServer" method
   to ServerType and updating SegmentLoadDropHandler to always announce
   if this method returns true.

2) Certain batch task types were written in a way that assumed "isReady"
   would be called before "run", which is not guaranteed. In particular,
   they relied on it in order to initialize "taskLockHelper". Fixed this
   by updating AbstractBatchIndexTask to ensure "isReady" is called
   before "run" for these tasks.

3) UnifiedIndexerAppenderatorsManager did not properly handle complex
   datasources. Introduced DataSourceAnalysis in order to fix this.

Test changes:

1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
   Indexer instead of a MiddleManager.

2) Introduce a "USE_INDEXER" environment variable that determines if
   docker-compose will start up an Indexer or a MiddleManager.

3) Duplicate all the jdk8 tests and run them in both MiddleManager and
   Indexer mode.

4) Various adjustments to encourage fail-fast errors in the Docker
   build scripts.

5) Various adjustments to speed up integration tests and reduce memory
   usage.

6) Add another Mac-specific approach to determining a machine's own IP.
   This was useful on my development machine.

7) Update segment-count check in ITCompactionTaskTest to eliminate a
   race condition (it was looking for 6 segments, which only exist
   together briefly, until the older 4 are marked unused).

Javadoc updates:

1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
   that make it clear when taskLockHelper will be initialized as a side
   effect. (Related to the second bug above.)

2) Task: Clarified that "isReady" is not guaranteed to be called before
   "run". It was already implied, but now it's explicit.

3) ZkCoordinator: Clarified deprecation message.

4) DataSegmentServerAnnouncer: Clarified deprecation message.
@jihoonson
Copy link
Contributor

#10538 and #10258 will be fixed by this PR. Perhaps #9820 as well.

Comment on lines +165 to +166
* This method will not necessarily be executed before {@link #run(TaskToolbox)}, since this task readiness check
* may be done on a different machine from the one that actually runs the task.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how can this happen? I thought Task.isReady() is always called before Task.run() because a task can be scheduled only when Task.isReady() returns true. Task.run() will be called after the task is scheduled in some indexer or middleManager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't necessarily be called on that same Task object (it might be called on a different instance that represents the same actual task).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I think it happens only in indexers because peon calls isReady() before it runs its task.

druid_coordinator_period_indexingPeriod=PT180000S
druid_coordinator_period=PT1S
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this too aggressive? 😅

Copy link
Contributor Author

@gianm gianm Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha ha 🙂

I think it's OK — there are going to be very few segments in the integration tests, so a coordinator run should be able to finish very quickly. This change was meant to speed up the tests.

@gianm
Copy link
Contributor Author

gianm commented Dec 4, 2020

#10538 and #10258 will be fixed by this PR. Perhaps #9820 as well.

I agree, it should fix all of those. I commented in those issues.

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 after CI

Comment on lines +165 to +166
* This method will not necessarily be executed before {@link #run(TaskToolbox)}, since this task readiness check
* may be done on a different machine from the one that actually runs the task.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I think it happens only in indexers because peon calls isReady() before it runs its task.

@gianm
Copy link
Contributor Author

gianm commented Dec 8, 2020

One test left. Trying to figure out what the problem is with "Kafka index integration test with various formats". Strangely, the Indexer version works fine, but the MM version is busted.

@gianm
Copy link
Contributor Author

gianm commented Dec 8, 2020

One test left. Trying to figure out what the problem is with "Kafka index integration test with various formats". Strangely, the Indexer version works fine, but the MM version is busted.

I ran it locally and it passed. It might be flaky; I'll try running it again in CI.

@gianm gianm merged commit 96a387d into apache:master Dec 9, 2020
@gianm gianm deleted the fix-indexer-stuff branch December 9, 2020 00:02
@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
* Fixes and tests related to the Indexer process.

Three bugs fixed:

1) Indexers would not announce themselves as segment servers if they
   did not have storage locations defined. This used to work, but was
   broken in apache#9971. Fixed this by adding an "isSegmentServer" method
   to ServerType and updating SegmentLoadDropHandler to always announce
   if this method returns true.

2) Certain batch task types were written in a way that assumed "isReady"
   would be called before "run", which is not guaranteed. In particular,
   they relied on it in order to initialize "taskLockHelper". Fixed this
   by updating AbstractBatchIndexTask to ensure "isReady" is called
   before "run" for these tasks.

3) UnifiedIndexerAppenderatorsManager did not properly handle complex
   datasources. Introduced DataSourceAnalysis in order to fix this.

Test changes:

1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
   Indexer instead of a MiddleManager.

2) Introduce a "USE_INDEXER" environment variable that determines if
   docker-compose will start up an Indexer or a MiddleManager.

3) Duplicate all the jdk8 tests and run them in both MiddleManager and
   Indexer mode.

4) Various adjustments to encourage fail-fast errors in the Docker
   build scripts.

5) Various adjustments to speed up integration tests and reduce memory
   usage.

6) Add another Mac-specific approach to determining a machine's own IP.
   This was useful on my development machine.

7) Update segment-count check in ITCompactionTaskTest to eliminate a
   race condition (it was looking for 6 segments, which only exist
   together briefly, until the older 4 are marked unused).

Javadoc updates:

1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
   that make it clear when taskLockHelper will be initialized as a side
   effect. (Related to the second bug above.)

2) Task: Clarified that "isReady" is not guaranteed to be called before
   "run". It was already implied, but now it's explicit.

3) ZkCoordinator: Clarified deprecation message.

4) DataSegmentServerAnnouncer: Clarified deprecation message.

* Fix stop_cluster script.

* Fix sanity check in script.

* Fix hashbang lines.

* Test and doc adjustments.

* Additional tests, and adjustments for tests.

* Split ITs back out.

* Revert change to druid_coordinator_period_indexingPeriod.

* Set Indexer capacity to match MM.

* Bump up Historical memory.

* Bump down coordinator, overlord memory.

* Bump up Broker memory.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants