Add Spark connector reader support. #11823

Merged
Changes from 9 commits
28 commits
731a4ff
Add reader support.
JulianJaffePinterest Oct 21, 2021
cda3128
Add spark_druid_connector to .travis.yml.
JulianJaffePinterest Oct 27, 2021
24b9cbc
Add words to spelling dictionary.
JulianJaffePinterest Oct 27, 2021
087d39d
Add explicit dependencies and license notices.
JulianJaffePinterest Oct 27, 2021
7f7f251
Copy the fix for #11799.
JulianJaffePinterest Oct 27, 2021
6d28037
Include DynamicConfigProviderRegistry in the reader PR.
JulianJaffePinterest Oct 27, 2021
794a1fc
Rename package injectableValues.
JulianJaffePinterest Oct 27, 2021
459a54d
fix integration tests (#11638)
clintropolis Aug 30, 2021
25b8659
Add back overshadowing logic & add logging.
JulianJaffePinterest Oct 28, 2021
da238d1
Fix typos in spark.md.
JulianJaffePinterest Oct 28, 2021
9e07e51
Fix package class name.
JulianJaffePinterest Nov 15, 2021
6933b9f
Fix typo.
JulianJaffePinterest Nov 16, 2021
fc18e4f
Try harder to make sure we don't leak temp files.
JulianJaffePinterest Nov 16, 2021
d41a3f8
Support pushing down more filters.
JulianJaffePinterest Nov 16, 2021
76e2bd0
Refactor segment loading; address review comments.
JulianJaffePinterest Dec 1, 2021
dee3eb1
Update spell check dictionary.
JulianJaffePinterest Dec 7, 2021
d46aa0c
Temporarily print surefire reports for failing tests.
JulianJaffePinterest Dec 10, 2021
c9a44ec
Fix report directory and cat dump files as well.
JulianJaffePinterest Dec 10, 2021
ee503d7
Reorder tests.
JulianJaffePinterest Dec 10, 2021
536cdd1
Getting weirder with it.
JulianJaffePinterest Dec 10, 2021
342f861
Revert previous change, try reverting scala minor version bump.
JulianJaffePinterest Dec 10, 2021
03e7d87
Revert scala rollback; more investigatory logging.
JulianJaffePinterest Dec 13, 2021
ec46c02
Try unsplitting spark tests?
JulianJaffePinterest Dec 13, 2021
9a20abf
Revert back to last known-good commit.
JulianJaffePinterest Dec 13, 2021
2b97037
git checkout <commit> . doesn't revert new files.
JulianJaffePinterest Dec 13, 2021
cad6ebe
Revert recent reverts.
JulianJaffePinterest Dec 13, 2021
6b224dd
Possible test failure fix and more tests.
JulianJaffePinterest Dec 17, 2021
0abc33e
Pull out last change.
JulianJaffePinterest Dec 17, 2021
3 changes: 2 additions & 1 deletion .travis.yml
@@ -17,6 +17,7 @@ branches:
only:
- master
- /^\d+\.\d+\.\d+(-\S*)?$/ # release branches
+ - spark_druid_connector

language: java

@@ -94,7 +95,7 @@ jobs:
- sudo apt-get update && sudo apt-get install python3 python3-pip python3-setuptools -y
- ./check_test_suite.py && travis_terminate 0 || echo 'Continuing setup'
- pip3 install wheel # install wheel first explicitly
- - pip3 install pyyaml
+ - pip3 install pyyaml==5.4.1
script:
- >
${MVN} apache-rat:check -Prat --fail-at-end
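The `branches.only` list in the hunk above combines literal branch names with a release-branch pattern. As a rough illustration of which branch names that filter would accept, here is a minimal Python sketch. The pattern and the two literal names are copied from the hunk; the helper name `is_built` and the sample branch names are hypothetical, and Travis evaluates the pattern with its own regex engine, so edge cases may differ from Python's `re`.

```python
import re

# Pattern copied from the `branches.only` list in the .travis.yml hunk.
RELEASE_BRANCH = re.compile(r"^\d+\.\d+\.\d+(-\S*)?$")

# Literal branch names listed next to the pattern in the same hunk.
LITERAL_BRANCHES = {"master", "spark_druid_connector"}


def is_built(branch: str) -> bool:
    """Return True if the branch matches a literal name or the release pattern.

    Hypothetical helper for illustration only; not part of the PR.
    """
    return branch in LITERAL_BRANCHES or RELEASE_BRANCH.match(branch) is not None


if __name__ == "__main__":
    # Sample names are made up for the demonstration.
    for name in ["master", "spark_druid_connector", "0.22.0", "0.22.0-rc1", "feature/foo"]:
        print(name, is_built(name))
```

Under these assumptions, adding `spark_druid_connector` as a literal entry is what lets CI run on the connector branch, since its name does not match the release pattern.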
2 changes: 1 addition & 1 deletion distribution/docker/Dockerfile
@@ -18,7 +18,7 @@
#

ARG JDK_VERSION=8
- FROM maven:3-jdk-11-slim as builder
+ FROM maven:3.8.1-jdk-11-slim as builder
# Rebuild from source in this stage
# This can be unset if the tarball was already built outside of Docker
ARG BUILD_FROM_SOURCE="true"
287 changes: 287 additions & 0 deletions docs/operations/spark.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion integration-tests/docker/Dockerfile
@@ -13,7 +13,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

- ARG JDK_VERSION=8-slim
+ ARG JDK_VERSION=8-slim-buster
FROM openjdk:$JDK_VERSION as druidbase

# Bundle everything into one script so cleanup can reduce image size.
6 changes: 3 additions & 3 deletions integration-tests/script/docker_build_containers.sh
@@ -28,15 +28,15 @@ else
case "${DRUID_INTEGRATION_TEST_JVM_RUNTIME}" in
8)
echo "Build druid-cluster with Java 8"
- docker build -t druid/cluster --build-arg JDK_VERSION=8-slim --build-arg ZK_VERSION --build-arg KAFKA_VERSION --build-arg CONFLUENT_VERSION --build-arg MYSQL_VERSION --build-arg MARIA_VERSION --build-arg MYSQL_DRIVER_CLASSNAME --build-arg APACHE_ARCHIVE_MIRROR_HOST $SHARED_DIR/docker
+ docker build -t druid/cluster --build-arg JDK_VERSION=8-slim-buster --build-arg ZK_VERSION --build-arg KAFKA_VERSION --build-arg CONFLUENT_VERSION --build-arg MYSQL_VERSION --build-arg MARIA_VERSION --build-arg MYSQL_DRIVER_CLASSNAME --build-arg APACHE_ARCHIVE_MIRROR_HOST $SHARED_DIR/docker
;;
11)
echo "Build druid-cluster with Java 11"
- docker build -t druid/cluster --build-arg JDK_VERSION=11-slim --build-arg ZK_VERSION --build-arg KAFKA_VERSION --build-arg CONFLUENT_VERSION --build-arg MYSQL_VERSION --build-arg MARIA_VERSION --build-arg MYSQL_DRIVER_CLASSNAME --build-arg APACHE_ARCHIVE_MIRROR_HOST $SHARED_DIR/docker
+ docker build -t druid/cluster --build-arg JDK_VERSION=11-slim-buster --build-arg ZK_VERSION --build-arg KAFKA_VERSION --build-arg CONFLUENT_VERSION --build-arg MYSQL_VERSION --build-arg MARIA_VERSION --build-arg MYSQL_DRIVER_CLASSNAME --build-arg APACHE_ARCHIVE_MIRROR_HOST $SHARED_DIR/docker
;;
15)
echo "Build druid-cluster with Java 15"
- docker build -t druid/cluster --build-arg JDK_VERSION=15-slim --build-arg ZK_VERSION --build-arg KAFKA_VERSION --build-arg CONFLUENT_VERSION --build-arg MYSQL_VERSION --build-arg MARIA_VERSION --build-arg USE_MARIA --build-arg APACHE_ARCHIVE_MIRROR_HOST $SHARED_DIR/docker
+ docker build -t druid/cluster --build-arg JDK_VERSION=15-slim-buster --build-arg ZK_VERSION --build-arg KAFKA_VERSION --build-arg CONFLUENT_VERSION --build-arg MYSQL_VERSION --build-arg MARIA_VERSION --build-arg USE_MARIA --build-arg APACHE_ARCHIVE_MIRROR_HOST $SHARED_DIR/docker
;;
*)
echo "Invalid JVM Runtime given. Stopping"
75 changes: 75 additions & 0 deletions licenses.yaml
@@ -5735,3 +5735,78 @@ copyright: Berkeley Martinez
version: 4.0.3
license_file_path: licenses/bin/warning.MIT
# Web console modules end

---

name: Apache Spark
license_category: binary
module: spark
license_name: Apache License version 2.0
version: 2.4.7
libraries:
- org.apache.spark: spark-core_2.12
- org.apache.spark: spark-sql_2.12
notice: |
Apache Spark
Copyright 2014 and onwards The Apache Software Foundation.

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).


Export Control Notice
---------------------

This distribution includes cryptographic software. The country in which you currently reside may have
restrictions on the import, possession, use, and/or re-export to another country, of encryption software.
BEFORE using any encryption software, please check your country's laws, regulations and policies concerning
the import, possession, or use, and re-export of encryption software, to see if this is permitted. See
<http://www.wassenaar.org/> for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this
software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software
using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache
Software Foundation distribution makes it eligible for export under the License Exception ENC Technology
Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for
both object code and source code.

The following provides more details on the included cryptographic software:

This software uses Apache Commons Crypto (https://commons.apache.org/proper/commons-crypto/) to
support authentication, and encryption and decryption of data sent across the network between
services.

---

name: Scala Library
license_category: binary
module: spark
license_name: Apache License version 2.0
version: 2.12.12
libraries:
- org.scala-lang: scala-library
- org.scala-lang: scala-reflect
- org.scala-lang: scalap

---

# Not sure why check-license finds these as well (they're not in mvn dependency:tree for the spark module)
name: Paranamer
license_category: binary
module: spark
license_name: BSD-3-Clause License
version: 2.8
copyright: Paul Hammant & ThoughtWorks Inc
license_file_path: licenses/bin/paranamer.BSD3
libraries:
- com.thoughtworks.paranamer: paranamer

---

name: Jackson Paranamer
license_category: binary
module: spark
license_name: Apache License version 2.0
version: 2.10.5
libraries:
- com.fasterxml.jackson.module: jackson-module-paranamer
205 changes: 201 additions & 4 deletions spark/pom.xml
@@ -112,6 +112,17 @@
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-server</artifactId>
<version>${project.parent.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-processing</artifactId>
@@ -125,10 +136,6 @@
<groupId>io.netty</groupId>
<artifactId>*</artifactId>
</exclusion>
- <exclusion>
- <groupId>org.apache.maven</groupId>
- <artifactId>maven-artifact</artifactId>
- </exclusion>
<exclusion>
<groupId>org.mozilla</groupId>
<artifactId>rhino</artifactId>
@@ -140,7 +147,181 @@
</exclusions>
</dependency>

<!-- Extensions included since we won't be running in a Druid cluster and can't use injection -->
<dependency>
<groupId>org.apache.druid.extensions</groupId>
<artifactId>druid-datasketches</artifactId>
<version>${project.parent.version}</version>
</dependency>
<dependency>
<groupId>org.apache.datasketches</groupId>
<artifactId>datasketches-java</artifactId>
<version>${datasketches.version}</version>
</dependency>
<dependency>
<groupId>org.apache.druid.extensions</groupId>
<artifactId>druid-histogram</artifactId>
<version>${project.parent.version}</version>
</dependency>
<dependency>
<groupId>org.apache.druid.extensions</groupId>
<artifactId>druid-stats</artifactId>
<version>${project.parent.version}</version>
</dependency>
<dependency>
<groupId>org.apache.druid.extensions</groupId>
<artifactId>mysql-metadata-storage</artifactId>
<version>${project.parent.version}</version>
</dependency>
<dependency>
<groupId>org.apache.druid.extensions</groupId>
<artifactId>postgresql-metadata-storage</artifactId>
<version>${project.parent.version}</version>
</dependency>

<!--
Excluding transitive dependencies from deep storage extensions to keep dependency size manageable. Users
should provide the appropriate jars for their deep storage on their Spark clusters or depend on them directly
in their code.
-->
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-aws-common</artifactId>
<version>${project.parent.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-gcp-common</artifactId>
<version>${project.parent.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.druid.extensions</groupId>
<artifactId>druid-azure-extensions</artifactId>
<version>${project.parent.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.druid.extensions</groupId>
<artifactId>druid-google-extensions</artifactId>
<version>${project.parent.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.druid.extensions</groupId>
<artifactId>druid-hdfs-storage</artifactId>
<version>${project.parent.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.druid.extensions</groupId>
<artifactId>druid-s3-extensions</artifactId>
<version>${project.parent.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>

<!-- Deep storage direct APIs -->
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage</artifactId>
<version>8.6.0</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client</artifactId>
<version>${com.google.apis.client.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.http-client</groupId>
<artifactId>google-http-client-jackson2</artifactId>
<version>${com.google.apis.client.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-storage</artifactId>
<version>${com.google.apis.storage.version}</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-core</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-ec2</artifactId>
<scope>provided</scope>
</dependency>


<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-scala_2.12</artifactId>
<version>2.10.5</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
@@ -211,6 +392,10 @@
</exclusions>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.jdbi</groupId>
<artifactId>jdbi</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
@@ -301,6 +486,18 @@
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-processing</artifactId>
<version>${project.parent.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.derby</groupId>
<artifactId>derby</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_${scala.major.version}</artifactId>
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

org.apache.druid.spark.v2.DruidDataSourceV2
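
The 16-line file above looks like a Java service-provider configuration file: a license header in `#` comments followed by one fully qualified class name. That reading is an assumption, since the file's path is not shown in this view. Under the `java.util.ServiceLoader` file format, `#` starts a comment and blank lines are ignored, so the only provider registered would be `org.apache.druid.spark.v2.DruidDataSourceV2`. The Python sketch below mimics that parsing rule on an abbreviated copy of the file; `provider_classes` and `SERVICE_FILE` are hypothetical names used only for illustration.

```python
from typing import List

# Abbreviated copy of the file added above: license header comments,
# a blank line, then one provider class name.
SERVICE_FILE = """\
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.

org.apache.druid.spark.v2.DruidDataSourceV2
"""


def provider_classes(text: str) -> List[str]:
    """Return provider class names, ignoring '#' comments and blank lines.

    Hypothetical helper that mirrors the ServiceLoader file format.
    """
    providers = []
    for line in text.splitlines():
        # Everything after '#' is a comment; surrounding whitespace is ignored.
        name = line.split("#", 1)[0].strip()
        if name:
            providers.append(name)
    return providers


if __name__ == "__main__":
    print(provider_classes(SERVICE_FILE))
```

If the assumption holds, this registration is what would let the data source be discovered on the classpath without the caller instantiating the class directly.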