Update from master-for-merge-to-release (apache#318)
* IMPLY-6556 remove offending settings.xml for intellij inspections

* GCS lookup support (apache#11026)

* GCS lookup support

* checkstyle fix

* review comments

* review comments

* remove unused import

* remove experimental from Kinesis with caveats (apache#10998)

* remove experimental from Kinesis with caveats

* add suggested known issue

* spelling fixes

* Bump aliyun SDK to 3.11.3 (apache#11044)

* Update reset-cluster.md (apache#10990)

fixed Error: Could not find or load main class org.apache.druid.cli.Main

* Make imply-view-manager non-experimental (apache#316)

* Make druid.indexer.task.ignoreTimestampSpecForDruidInputSource default to true, for backwards compat (apache#315)

* Add explicit EOF and use assert instead of exception (apache#11041)

* Add Calcite Avatica protobuf handler (apache#10543)

* bump to latest of same version node and npm versions, bump frontend-maven-plugin (apache#11057)

* request logs through kafka emitter (apache#11036)

* request logs through kafka emitter

* travis fixes

* review comments

* kafka emitter unit test

* new line

* travis checks

* checkstyle fix

* count request lost when request topic is null

* IMPLY-6556 map local repository instead .m2

* remove outdated info from faq (apache#11053)

* remove outdated info from faq

* Add an option for ingestion task to drop (mark unused) all existing segments that are contained by interval in the ingestionSpec (apache#11025)

* Auto-Compaction can run indefinitely when segmentGranularity is changed from coarser to finer.

* Add option to drop segments after ingestion

* fix checkstyle

* add tests

* add tests

* add tests

* fix test

* add tests

* fix checkstyle

* fix checkstyle

* add docs

* fix docs

* address comments

* address comments

* fix spelling

* Allow list for JDBC connection properties to address CVE-2021-26919 (apache#11047)

* Allow list for JDBC connection properties to address CVE-2021-26919

* fix tests for java 11

* Fix compile issue from dropExisting in ingest-service (apache#320)

Co-authored-by: Slava Mogilevsky <triggerwoods91@gmail.com>
Co-authored-by: Parag Jain <pjain1@apache.org>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: frank chen <frank.chen021@outlook.com>
Co-authored-by: Tushar Raj <43772524+tushar-1728@users.noreply.github.com>
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Lasse Krogh Mammen <lkm@bookboon.com>
Co-authored-by: Clint Wylie <cwylie@apache.org>
Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>
12 people authored Apr 2, 2021
1 parent a191871 commit 9603203
Showing 126 changed files with 4,832 additions and 333 deletions.
6 changes: 6 additions & 0 deletions codestyle/spotbugs-exclude.xml
@@ -38,6 +38,12 @@
</Or>
</And>
</Match>
<Match>
<And>
<Bug pattern="SE_BAD_FIELD_STORE" />
<Class name="org.apache.druid.server.AsyncQueryForwardingServlet" />
</And>
</Match>

<Bug pattern="AT_OPERATION_SEQUENCE_ON_CONCURRENT_ABSTRACTION"/>
<Bug pattern="BC_UNCONFIRMED_CAST"/>
2 changes: 2 additions & 0 deletions core/src/main/antlr4/org/apache/druid/math/expr/antlr/Expr.g4
@@ -15,6 +15,8 @@

grammar Expr;

start : expr EOF;

expr : NULL # null
| ('-'|'!') expr # unaryOpExpr
|<assoc=right> expr '^' expr # powOpExpr
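
The `EOF` anchor is what makes the new `start` rule strict: without it, ANTLR stops at the longest valid prefix and silently ignores trailing input. Below is a minimal sketch of the difference, assuming the conventional ANTLR-generated class names (`ExprLexer`, `ExprParser`) and rule method (`start()`) for this grammar; Druid's actual parser entry point may differ.

```java
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.apache.druid.math.expr.antlr.ExprLexer;
import org.apache.druid.math.expr.antlr.ExprParser;

public class EofAnchorSketch
{
  public static void main(String[] args)
  {
    // Assuming identifiers are valid tokens in this grammar, `expr` alone
    // would match "1 + 2" and leave the trailing token unconsumed. With
    // `start : expr EOF;` the leftover token is reported as a syntax error
    // instead of being silently dropped.
    ExprLexer lexer = new ExprLexer(CharStreams.fromString("1 + 2 oops"));
    ExprParser parser = new ExprParser(new CommonTokenStream(lexer));
    parser.start();
    System.out.println("syntax errors: " + parser.getNumberOfSyntaxErrors()); // > 0
  }
}
```
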
63 changes: 63 additions & 0 deletions core/src/main/java/org/apache/druid/utils/ConnectionUriUtils.java
@@ -0,0 +1,63 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.utils;

import com.google.common.base.Preconditions;

import java.util.Set;

public final class ConnectionUriUtils
{
// Note: MySQL JDBC connector 8 supports 7 protocols other than just `jdbc:mysql:`
// (https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-jdbc-url-format.html).
// We should consider either expanding recognized mysql protocols or restricting allowed protocols to
// just a basic one.
public static final String MYSQL_PREFIX = "jdbc:mysql:";
public static final String POSTGRES_PREFIX = "jdbc:postgresql:";

/**
* This method checks {@param actualProperties} against {@param allowedProperties} if they are not system properties.
* A property is regarded as a system property if its name starts with a prefix in {@param systemPropertyPrefixes}.
* See org.apache.druid.server.initialization.JDBCAccessSecurityConfig for more details.
*
* If a non-system property that is not allowed is found, this method throws an {@link IllegalArgumentException}.
*/
public static void throwIfPropertiesAreNotAllowed(
Set<String> actualProperties,
Set<String> systemPropertyPrefixes,
Set<String> allowedProperties
)
{
for (String property : actualProperties) {
if (systemPropertyPrefixes.stream().noneMatch(property::startsWith)) {
Preconditions.checkArgument(
allowedProperties.contains(property),
"The property [%s] is not in the allowed list %s",
property,
allowedProperties
);
}
}
}

private ConnectionUriUtils()
{
}
}
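
As a usage sketch (hypothetical property values; the real caller is Druid's JDBC security config wiring), this is how the allow list blocks the class of connector properties that CVE-2021-26919 abuses:

```java
import com.google.common.collect.ImmutableSet;
import org.apache.druid.utils.ConnectionUriUtils;

public class AllowListSketch
{
  public static void main(String[] args)
  {
    // Property names parsed out of a user-supplied connect URI (hypothetical).
    // "autoDeserialize" is the kind of MySQL connector property the allow
    // list is meant to block; "useSSL" is in the default allow list.
    ConnectionUriUtils.throwIfPropertiesAreNotAllowed(
        ImmutableSet.of("useSSL", "autoDeserialize"),              // actualProperties
        ImmutableSet.of("java.", "javax."),                        // systemPropertyPrefixes (hypothetical)
        ImmutableSet.of("useSSL", "requireSSL", "ssl", "sslmode")  // default allow list
    );
    // Throws IllegalArgumentException:
    // "The property [autoDeserialize] is not in the allowed list [useSSL, requireSSL, ssl, sslmode]"
  }
}
```
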
40 changes: 40 additions & 0 deletions core/src/main/java/org/apache/druid/utils/Throwables.java
@@ -0,0 +1,40 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.utils;

public final class Throwables
{
public static boolean isThrowable(Throwable t, Class<? extends Throwable> searchFor)
{
if (t.getClass().isAssignableFrom(searchFor)) {
return true;
} else {
if (t.getCause() != null) {
return isThrowable(t.getCause(), searchFor);
} else {
return false;
}
}
}

private Throwables()
{
}
}
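
A small usage sketch follows (hypothetical exception chain). Note the match direction: `t.getClass().isAssignableFrom(searchFor)` matches exact classes, and superclasses of `searchFor`, anywhere in the cause chain, not subclasses of `searchFor`.

```java
import org.apache.druid.utils.Throwables;

public class CauseChainSketch
{
  public static void main(String[] args)
  {
    // A failure wrapping its root cause two levels deep (hypothetical).
    Exception failure = new RuntimeException(
        new IllegalStateException(new ClassNotFoundException("com.example.Driver"))
    );

    // Recurses through getCause() until it finds a match or runs out of causes.
    System.out.println(Throwables.isThrowable(failure, ClassNotFoundException.class)); // true
    System.out.println(Throwables.isThrowable(failure, InterruptedException.class));   // false
  }
}
```
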
90 changes: 90 additions & 0 deletions core/src/test/java/org/apache/druid/utils/ConnectionUriUtilsTest.java
@@ -0,0 +1,90 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.utils;

import com.google.common.collect.ImmutableSet;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.runners.Enclosed;
import org.junit.rules.ExpectedException;
import org.junit.runner.RunWith;

@RunWith(Enclosed.class)
public class ConnectionUriUtilsTest
{
public static class ThrowIfURLHasNotAllowedPropertiesTest
{
@Rule
public ExpectedException expectedException = ExpectedException.none();

@Test
public void testEmptyActualProperties()
{
ConnectionUriUtils.throwIfPropertiesAreNotAllowed(
ImmutableSet.of(),
ImmutableSet.of("valid_key1", "valid_key2"),
ImmutableSet.of("system_key1", "system_key2")
);
}

@Test
public void testThrowForNonAllowedProperties()
{
expectedException.expect(IllegalArgumentException.class);
expectedException.expectMessage("The property [invalid_key] is not in the allowed list [valid_key1, valid_key2]");

ConnectionUriUtils.throwIfPropertiesAreNotAllowed(
ImmutableSet.of("valid_key1", "invalid_key"),
ImmutableSet.of("system_key1", "system_key2"),
ImmutableSet.of("valid_key1", "valid_key2")
);
}

@Test
public void testAllowedProperties()
{
ConnectionUriUtils.throwIfPropertiesAreNotAllowed(
ImmutableSet.of("valid_key2"),
ImmutableSet.of("system_key1", "system_key2"),
ImmutableSet.of("valid_key1", "valid_key2")
);
}

@Test
public void testAllowSystemProperties()
{
ConnectionUriUtils.throwIfPropertiesAreNotAllowed(
ImmutableSet.of("system_key1", "valid_key2"),
ImmutableSet.of("system_key1", "system_key2"),
ImmutableSet.of("valid_key1", "valid_key2")
);
}

@Test
public void testMatchSystemProperties()
{
ConnectionUriUtils.throwIfPropertiesAreNotAllowed(
ImmutableSet.of("system_key1.1", "system_key1.5", "system_key11.11", "valid_key2"),
ImmutableSet.of("system_key1", "system_key2"),
ImmutableSet.of("valid_key1", "valid_key2")
);
}
}
}
48 changes: 48 additions & 0 deletions core/src/test/java/org/apache/druid/utils/ThrowablesTest.java
@@ -0,0 +1,48 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.utils;

import org.junit.Assert;
import org.junit.Test;

public class ThrowablesTest
{
@Test
public void testIsThrowableItself()
{
Assert.assertTrue(Throwables.isThrowable(new NoClassDefFoundError(), NoClassDefFoundError.class));
}

@Test
public void testIsThrowableNestedThrowable()
{
Assert.assertTrue(
Throwables.isThrowable(new RuntimeException(new NoClassDefFoundError()), NoClassDefFoundError.class)
);
}

@Test
public void testIsThrowableNonTarget()
{
Assert.assertFalse(
Throwables.isThrowable(new RuntimeException(new ClassNotFoundException()), NoClassDefFoundError.class)
);
}
}
4 changes: 2 additions & 2 deletions distribution/pom.xml
@@ -246,6 +246,8 @@
<argument>-c</argument>
<argument>org.apache.druid.extensions:imply-druid-security</argument>
<argument>-c</argument>
<argument>io.imply:imply-view-manager</argument>
<argument>-c</argument>
<argument>io.imply:clarity-emitter</argument>
<argument>-c</argument>
<argument>io.imply:clarity-emitter-kafka</argument>
@@ -437,8 +439,6 @@
<argument>-l</argument>
<argument>${settings.localRepository}</argument>
<argument>--no-default-hadoop</argument>
<argument>-c</argument>
<argument>io.imply:imply-view-manager</argument>
</arguments>
</configuration>
</execution>
19 changes: 19 additions & 0 deletions docs/configuration/index.md
@@ -537,6 +537,25 @@ the [HTTP input source](../ingestion/native-batch.md#http-input-source) and the
|`druid.ingestion.http.allowedProtocols`|List of protocols|Allowed protocols for the HTTP input source and HTTP firehose.|["http", "https"]|


### Ingestion Security Configuration

#### JDBC Connections to External Databases

You can use the following properties to specify permissible JDBC options for:
- [SQL input source](../ingestion/native-batch.md#sql-input-source)
- [SQL firehose](../ingestion/native-batch.md#sqlfirehose)
- [globally cached JDBC lookups](../development/extensions-core/lookups-cached-global.md#jdbc-lookup)
- [JDBC Data Fetcher for per-lookup caching](../development/extensions-core/druid-lookups.md#data-fetcher-layer).

These properties do not apply to metadata storage connections.

|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.access.jdbc.enforceAllowedProperties`|Boolean|When true, Druid applies `druid.access.jdbc.allowedProperties` to JDBC connections starting with `jdbc:postgresql:` or `jdbc:mysql:`. When false, Druid allows any JDBC connection without property validation. This config is deprecated and will be removed in a future release.|false|
|`druid.access.jdbc.allowedProperties`|List of JDBC properties|Defines a list of allowed JDBC properties. Druid always enforces the list for all JDBC connections starting with `jdbc:postgresql:` or `jdbc:mysql:` if `druid.access.jdbc.enforceAllowedProperties` is set to true.<br/><br/>This option is tested against MySQL connector 5.1.48 and PostgreSQL connector 42.2.14. Other connector versions might not work.|["useSSL", "requireSSL", "ssl", "sslmode"]|
|`druid.access.jdbc.allowUnknownJdbcUrlFormat`|Boolean|When false, Druid only accepts JDBC connections starting with `jdbc:postgresql:` or `jdbc:mysql:`. When true, Druid allows JDBC connections to any kind of database, but only enforces `druid.access.jdbc.allowedProperties` for PostgreSQL and MySQL.|true|
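
Taken together, the three properties imply the following decision flow. The sketch below only illustrates the behavior described in this table, not Druid's actual implementation; `extractPropertyNames` is a hypothetical stand-in for connector-specific URI parsing.

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.druid.utils.ConnectionUriUtils;

public class JdbcSecurityFlowSketch
{
  static void validate(
      String uri,
      boolean enforceAllowedProperties,
      boolean allowUnknownJdbcUrlFormat,
      Set<String> systemPropertyPrefixes,
      Set<String> allowedProperties
  )
  {
    boolean knownFormat = uri.startsWith(ConnectionUriUtils.MYSQL_PREFIX)
                          || uri.startsWith(ConnectionUriUtils.POSTGRES_PREFIX);
    if (!knownFormat && !allowUnknownJdbcUrlFormat) {
      // allowUnknownJdbcUrlFormat=false: only MySQL/PostgreSQL URLs are accepted.
      throw new IllegalArgumentException("Unknown JDBC URL format: " + uri);
    }
    if (knownFormat && enforceAllowedProperties) {
      // Property validation applies only to the recognized URL formats.
      ConnectionUriUtils.throwIfPropertiesAreNotAllowed(
          extractPropertyNames(uri), systemPropertyPrefixes, allowedProperties);
    }
  }

  // Hypothetical helper: naive query-string parse; real parsing is connector-specific.
  static Set<String> extractPropertyNames(String uri)
  {
    Set<String> names = new HashSet<>();
    int q = uri.indexOf('?');
    if (q >= 0) {
      for (String pair : uri.substring(q + 1).split("&")) {
        names.add(pair.split("=", 2)[0]);
      }
    }
    return names;
  }
}
```
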


### Task Logging

If you are running the indexing service in remote mode, the task logs must be stored in S3, Azure Blob Store, Google Cloud Storage or HDFS.
1 change: 1 addition & 0 deletions docs/development/extensions-contrib/kafka-emitter.md
@@ -41,6 +41,7 @@ All the configuration parameters for the Kafka emitter are under `druid.emitter.
|`druid.emitter.kafka.bootstrap.servers`|Comma-separated list of Kafka brokers (`[hostname:port],[hostname:port]...`).|yes|none|
|`druid.emitter.kafka.metric.topic`|Kafka topic to which the emitter sends service metrics.|yes|none|
|`druid.emitter.kafka.alert.topic`|Kafka topic to which the emitter sends alerts.|yes|none|
|`druid.emitter.kafka.request.topic`|Kafka topic to which the emitter sends request logs. If left empty, request logs are not sent to Kafka.|no|none|
|`druid.emitter.kafka.producer.config`|JSON-formatted configuration of additional properties for the Kafka producer.|no|none|
|`druid.emitter.kafka.clusterName`|Optional name of your Druid cluster, useful for grouping in your monitoring environment.|no|none|

16 changes: 13 additions & 3 deletions docs/development/extensions-core/druid-lookups.md
@@ -72,7 +72,7 @@ Same for Loading cache, developer can implement a new type of loading cache by i

|Field|Type|Description|Required|default|
|-----|----|-----------|--------|-------|
|dataFetcher|JSON object|Specifies the lookup data fetcher type to use in order to fetch data|yes|null|
|dataFetcher|JSON object|Specifies the lookup data fetcher type for fetching data|yes|null|
|cacheFactory|JSON Object|Cache factory implementation|no |onHeapPolling|
|pollPeriod|Period|polling period |no |null (poll once)|

@@ -129,7 +129,7 @@ Guava cache configuration spec.
"type":"loadingLookup",
"dataFetcher":{ "type":"jdbcDataFetcher", "connectorConfig":"jdbc://mysql://localhost:3306/my_data_base", "table":"lookup_table_name", "keyColumn":"key_column_name", "valueColumn": "value_column_name"},
"loadingCacheSpec":{"type":"guava"},
"reverseLoadingCacheSpec":{"type":"guava", "maximumSize":500000, "expireAfterAccess":100000, "expireAfterAccess":10000}
"reverseLoadingCacheSpec":{"type":"guava", "maximumSize":500000, "expireAfterAccess":100000, "expireAfterWrite":10000}
}
```

@@ -150,6 +150,16 @@ Off heap cache is backed by [MapDB](http://www.mapdb.org/) implementation. MapDB
"type":"loadingLookup",
"dataFetcher":{ "type":"jdbcDataFetcher", "connectorConfig":"jdbc://mysql://localhost:3306/my_data_base", "table":"lookup_table_name", "keyColumn":"key_column_name", "valueColumn": "value_column_name"},
"loadingCacheSpec":{"type":"mapDb", "maxEntriesSize":100000},
"reverseLoadingCacheSpec":{"type":"mapDb", "maxStoreSize":5, "expireAfterAccess":100000, "expireAfterAccess":10000}
"reverseLoadingCacheSpec":{"type":"mapDb", "maxStoreSize":5, "expireAfterAccess":100000, "expireAfterWrite":10000}
}
```

### JDBC Data Fetcher

|Field|Type|Description|Required|default|
|-----|----|-----------|--------|-------|
|`connectorConfig`|JSON object|Specifies the database connection details. You can set `connectURI`, `user` and `password`. You can selectively allow JDBC properties in `connectURI`. See [JDBC connections security config](../../configuration/index.md#jdbc-connections-to-external-databases) for more details.|yes||
|`table`|string|The table name to read from.|yes||
|`keyColumn`|string|The column name that contains the lookup key.|yes||
|`valueColumn`|string|The column name that contains the lookup value.|yes||
|`streamingFetchSize`|int|Fetch size used in JDBC connections.|no|1000|
20 changes: 15 additions & 5 deletions docs/development/extensions-core/kinesis-ingestion.md
@@ -24,16 +24,15 @@ sidebar_label: "Amazon Kinesis"
-->


Similar to the [Kafka indexing service](./kafka-ingestion.md), the Kinesis indexing service enables the configuration of *supervisors* on the Overlord, which facilitate ingestion from
Kinesis by managing the creation and lifetime of Kinesis indexing tasks. These indexing tasks read events using Kinesis's own
Similar to the [Kafka indexing service](./kafka-ingestion.md), the Kinesis indexing service for Apache Druid enables the configuration of *supervisors* on the Overlord. These supervisors facilitate ingestion from Kinesis by managing the creation and lifetime of Kinesis indexing tasks. These indexing tasks read events using Kinesis's own
Shards and Sequence Number mechanism and are therefore able to provide guarantees of exactly-once ingestion.
The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures,
and ensure that the scalability and replication requirements are maintained.

The Kinesis indexing service is provided as the `druid-kinesis-indexing-service` core Apache Druid extension (see
[Including Extensions](../../development/extensions.md#loading-extensions)). Please note that this is
currently designated as an *experimental feature* and is subject to the usual
[experimental caveats](../experimental.md).
[Including Extensions](../../development/extensions.md#loading-extensions)).

> Before you deploy the Kinesis extension to production, read the [Kinesis known issues](#kinesis-known-issues).
## Submitting a Supervisor Spec

@@ -471,3 +470,14 @@ with an assignment of closed shards that have been fully read and to ensure a ba
This window with early task shutdowns and possible task failures will conclude when:
- All closed shards have been fully read and the Kinesis ingestion tasks have published the data from those shards, committing the "closed" state to metadata storage
- Any remaining tasks that had inactive shards in the assignment have been shutdown (these tasks would have been created before the closed shards were completely drained)

## Kinesis known issues

Before you deploy the Kinesis extension to production, consider the following known issues:

- Avoid running more than one Kinesis supervisor that reads from the same Kinesis stream. Kinesis enforces a per-shard read throughput limit, and multiple supervisors reading the same stream reduce the read throughput available to each supervisor's tasks. Additionally, multiple supervisors ingesting into the same Druid datasource can increase lock contention on that datasource.
- The only way to change the stream reset policy is to submit a new ingestion spec and set up a new supervisor.
- Timeouts when retrieving the earliest sequence number cause the supervisor to reset. The job eventually resumes on its own, but the reset can trigger alerts.
- The Kinesis supervisor does not make progress if the stream has empty shards. Make sure each shard contains at least one record.
- If ingestion tasks get stuck, the supervisor does not automatically recover. Monitor your ingestion tasks and investigate if ingestion falls behind.
- A Kinesis supervisor sometimes compares the checkpoint offset to the stream's retention window to determine whether it has fallen behind. These checks fetch the earliest sequence number from Kinesis, which can cause the `IteratorAgeMilliseconds` metric to become very high in AWS CloudWatch.