-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock that can occur while merging group by results #15420
Merged
LakshSingla
merged 36 commits into
apache:master
from
LakshSingla:merge-buffer-deadlock
Apr 22, 2024
Merged
Changes from 5 commits
Commits
Show all changes
36 commits
Select commit
Hold shift + click to select a range
de216d5
init commit
LakshSingla 35bf551
test cases fix
LakshSingla 09689e9
test cases fix 2
LakshSingla e85f4b0
test cases fix 3
LakshSingla ce89fdf
test cases fix 4
LakshSingla a34bcde
caching clustered client removes the flag before sending the requests
LakshSingla a4f476d
fix test
LakshSingla fe32796
add comments
LakshSingla 6ecd555
Merge branch 'master' into merge-buffer-deadlock
LakshSingla b168ce9
init 2
LakshSingla 6cd8aa0
some tests fix
LakshSingla ac44edb
merge and fix build
LakshSingla d2886bc
tests
LakshSingla ceef4f1
Merge branch 'master' into merge-buffer-deadlock
LakshSingla 75aee0a
compilation
LakshSingla 951dce2
compilation benchmarks
LakshSingla 3c1b124
compilation benchmarks, responseContext changes revert
LakshSingla 8c72a92
DumpSegment test fix
LakshSingla bcfabf8
compilation and tests
LakshSingla 3c7fd81
codeql and test
LakshSingla e110662
histogram test
LakshSingla 14f5bab
distinctcount test
LakshSingla d956e73
groupByTimeseriesRunnerTest
LakshSingla bf57bcc
fix issue with unions
LakshSingla 2c3e675
unit test fix
LakshSingla dc2aa53
test fix
LakshSingla 430f2e5
fix it util to properly display error message
LakshSingla 76bc024
new runner
LakshSingla 349e8d9
Merge branch 'master' into merge-buffer-deadlock
LakshSingla d26f7fc
comments, test cases
LakshSingla b3850df
Merge branch 'master' into merge-buffer-deadlock
LakshSingla 8f255d7
checkstyle
LakshSingla 6980d77
query resource id
LakshSingla e2da72a
review comments
LakshSingla 9271043
comments, tests
LakshSingla b8eada8
review final
LakshSingla File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
121 changes: 121 additions & 0 deletions
121
processing/src/main/java/org/apache/druid/query/QueryResourceId.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.apache.druid.query; | ||
|
||
import com.google.common.base.Preconditions; | ||
|
||
import java.util.Objects; | ||
|
||
/** | ||
* Wrapper class on the queryResourceId string. The object must be addressable on an associative map, therefore it must implement | ||
* equals and hashCode. | ||
* <p> | ||
* Query's resource id is used to allocate the resources, and identify the resources allocated to a query in a global pool. | ||
* Queries USUALLY do not share any resources - each query is assigned its own thread, and buffer pool. However, some resources | ||
* are shared globally - the GroupBy query's merge buffers being a prime example of those (and the primary utiliser of the | ||
* query's resource id). Such resources MUST be allocated once to prevent deadlocks, and can be used throughout the query stack, till | ||
* the query holds those resources, or till its completion. A query holding a global resources must not request for more of the same | ||
* resource, or else it becomes a candidate for deadlocks. | ||
* <p> | ||
* Each query has a unique resource id, that is assigned to it when it enters the queryable server. This is distinct from | ||
* the existing queryId, subqueryId and sqlQueryId in the following ways: | ||
* 1. It is not assigned by the user, it is assigned internally for usage by the Druid server | ||
* 2. The query's resource id will be unique to the query in the system. The queryId can be non-unique amongst the queries | ||
* that are running in the system. Druid must ensure that the queryResourceId isn't unique. If the user (somehow) | ||
* assigns the queryResourceId to the query, it must be overwritten internally. | ||
* 3. During the query server <-> data server communication, the queryResourceId assigned to a particular query can (and will) | ||
* differ in the query servers and the data servers. This is particularly helpful in case of union queries, where a | ||
* single query in the broker can be treated as two separate queries and executed simultaneously in the historicals. | ||
* <p> | ||
* The queryId is assigned to the query, and populated in the query context at the time it hits the queryable server. In Druid, | ||
* there are three queryable servers (classes are not linkable from this method): | ||
* 1. {@link org.apache.druid.server.ClientQuerySegmentWalker} - For brokers | ||
* 2. {@link org.apache.druid.server.coordination.ServerManager} - For historicals | ||
* 3. {@link org.apache.druid.segment.realtime.appenderator.SinkQuerySegmentWalker} - For peons & indexer's tasks | ||
* <p> | ||
* These three classes are one of the first places the query reaches when it begins processing, therefore it is | ||
* guaranteed that if the resource id is allotted at only these places, no one will overwrite the resource id | ||
* during the execution. | ||
* <p> | ||
* Note: Historicals and Peons could have used the same query id allotted by the brokers, however they assign their own because: | ||
* 1. The user can directly choose to query the data server (while debugging etc.) | ||
* 2. UNIONs are treated as multiple separate queries when the broker sends them to the historicals. Therefore, we | ||
* require a unique id for each part of the union, and hence we need to reassign the resource id to the query's part, | ||
* or else they'll end up sharing the same resource id, as mentioned before | ||
* <p> | ||
* Notable places where QueryResourceId is used: | ||
* <p> | ||
* 1. {@link org.apache.druid.query.groupby.GroupByResourcesReservationPool} Primary user of the query resource id. | ||
* <p> | ||
* 2. {@link org.apache.druid.server.ClientQuerySegmentWalker} Allocates the query resource id on the brokers | ||
* <p> | ||
* 3. {@link org.apache.druid.server.coordination.ServerManager} Allocates the query resource id on the historicals | ||
* <p> | ||
* 4. {@link org.apache.druid.segment.realtime.appenderator.SinkQuerySegmentWalker} Allocates the query resource id on the peons | ||
* (MMs) and indexers | ||
* <p> | ||
* 5. {@link org.apache.druid.server.ResourceIdPopulatingQueryRunner} Populates the query resource id. ({@link org.apache.druid.server.ClientQuerySegmentWalker} | ||
* allocates the query resource id directly, since it also does a bunch of transforms to the query) | ||
* <p> | ||
* 6. {@link org.apache.druid.query.groupby.GroupByQueryQueryToolChest} Allocates, and associates one of the global resources, | ||
* merge buffers, with the query's resource id. It also cleans it up, once the query is completed. Apart from that, | ||
* it is also a consumer of the merge buffers it allocates. | ||
* <p> | ||
* 7. {@link org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunner} One of the consumer of the merge buffers, | ||
* allocated at the beginning of the query | ||
* | ||
* @see org.apache.druid.query.groupby.GroupByResourcesReservationPool | ||
*/ | ||
public class QueryResourceId | ||
{ | ||
private final String queryResourceId; | ||
|
||
public QueryResourceId(String queryResourceId) | ||
{ | ||
this.queryResourceId = Preconditions.checkNotNull(queryResourceId, "queryResourceId must be present"); | ||
} | ||
|
||
@Override | ||
public boolean equals(Object o) | ||
{ | ||
if (this == o) { | ||
return true; | ||
} | ||
if (o == null || getClass() != o.getClass()) { | ||
return false; | ||
} | ||
QueryResourceId that = (QueryResourceId) o; | ||
return Objects.equals(queryResourceId, that.queryResourceId); | ||
} | ||
|
||
@Override | ||
public int hashCode() | ||
{ | ||
return Objects.hash(queryResourceId); | ||
} | ||
|
||
@Override | ||
public String toString() | ||
{ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Simpler, and will make interpolations nicer, to just do |
||
return "QueryResourceId{" + | ||
"queryResourceId='" + queryResourceId + '\'' + | ||
'}'; | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
55 changes: 55 additions & 0 deletions
55
processing/src/test/java/org/apache/druid/query/QueryResourceIdTest.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.apache.druid.query; | ||
|
||
import nl.jqno.equalsverifier.EqualsVerifier; | ||
import org.junit.Assert; | ||
import org.junit.Test; | ||
|
||
import java.util.HashMap; | ||
import java.util.Map; | ||
|
||
public class QueryResourceIdTest | ||
{ | ||
|
||
@Test | ||
public void testConstructorWithNullString() | ||
{ | ||
Assert.assertThrows(NullPointerException.class, () -> new QueryResourceId(null)); | ||
} | ||
|
||
@Test | ||
public void testEqualsAndHashCode() | ||
{ | ||
EqualsVerifier.forClass(QueryResourceId.class) | ||
.withNonnullFields("queryResourceId") | ||
.usingGetClass() | ||
.verify(); | ||
} | ||
|
||
@Test | ||
public void testAddressableOnAssociativeMap() | ||
{ | ||
Map<QueryResourceId, Integer> map = new HashMap<>(); | ||
map.put(new QueryResourceId("abc"), 1); | ||
Assert.assertEquals(1, (int) map.get(new QueryResourceId("abc"))); | ||
|
||
} | ||
} |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc for what
willMergeRunner
means. I recognize most other stuff in here doesn't have javadocs, but, still.