
Support converting Iceberg for CONVERT TO DELTA command in Delta Lake #1463

Closed

@jackierwzhang (Contributor) commented Oct 27, 2022

Description

This PR adds support for in-place conversion of Iceberg tables to Delta using the CONVERT TO DELTA command in Apache Spark. Specifically, it supports converting a Parquet-based Iceberg table inside a path/directory to the Delta Lake format.

Here's an example flow:

  1. Given a Spark environment,
  2. Follow the Iceberg setup here. Please use the Hadoop directory-based catalog so that the Iceberg table can be found at a path.
  3. Suppose we now have an Iceberg table sitting at s3://bucket/catalog/db/table.
  4. Run the following command:
     CONVERT TO DELTA iceberg.`s3://bucket/catalog/db/table`
  5. Now you have a Delta table at the same location!
  6. To bring this Delta table into any Spark catalog, simply run CREATE TABLE delta_table USING delta LOCATION 's3://bucket/catalog/db/table' (see the sketch after this list).
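
For reference, here is a minimal end-to-end sketch of the flow above in a Scala Spark session. It assumes the delta-iceberg-compat jars are already on the classpath; the catalog name iceberg, the warehouse path, and the table location mirror the example and are illustrative only.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("iceberg-to-delta")
  // Delta Lake SQL support
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  // Iceberg Hadoop directory-based catalog from step 2; used to create/read the source table
  .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.iceberg.type", "hadoop")
  .config("spark.sql.catalog.iceberg.warehouse", "s3://bucket/catalog")
  .getOrCreate()

// In-place conversion: writes a Delta log at the same location (step 4)
spark.sql("CONVERT TO DELTA iceberg.`s3://bucket/catalog/db/table`")

// Register the converted table in a Spark catalog (step 6)
spark.sql("CREATE TABLE delta_table USING delta LOCATION 's3://bucket/catalog/db/table'")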

See more detail in this ticket: #1462.

How was this patch tested?

New unit tests.

We have tested Iceberg versions from 0.13.1 to 1.0.0.

Does this PR introduce any user-facing changes?

It introduces a delta-iceberg-compat module that contains all the Iceberg + Spark dependencies. Please include this module during Spark startup so that the CONVERT TO DELTA command can work.
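
For example, here is a sketch of pulling the module in at session startup. The Maven coordinates below are guessed from the module name and may differ from the published artifact:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("convert-to-delta")
  // Hypothetical coordinates; substitute the real group/artifact/version once published
  .config("spark.jars.packages", "io.delta:delta-iceberg-compat_2.12:<version>")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()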

@scottsand-db (Collaborator) commented Oct 27, 2022

@jackierwzhang

Does this PR introduce any user-facing changes?

This section should also discuss the new Maven artifact your PR is proposing, and how to include it in a Spark session along with Delta Lake.

Also - have you tested this on a real cluster (e.g. EMR)? Your build.sbt project definition is a little unlike the others, e.g. yours includes .cross(CrossVersion.binary) yet commonSettings already includes crossScalaVersions := all_scala_versions, so I'd want to see this tested.

Also - we should add an integration test for this. This PR is already large enough so I think we can add this in a future PR.

@jackierwzhang (Contributor, Author) commented Oct 27, 2022

Also - have you tested this on a real cluster (e.g. EMR)?

Yes, I have tested this on a real cluster.

Your build.sbt project definition is a little unlike the others, e.g. yours includes .cross(CrossVersion.binary) yet commonSettings already includes crossScalaVersions := all_scala_versions, so I'd want to see this tested.

Yes. Basically, this allows the Scala build to cross-compile the different Iceberg versions against all_scala_versions as we defined. If I don't do this, the 2.13 compiler fails (which served as an inverse test, I suppose).
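
For context, a minimal sketch of the sbt pattern under discussion; the Iceberg artifact and version are placeholders, not the exact dependency in this PR:

// build.sbt (sketch): .cross(CrossVersion.binary) appends the Scala binary
// suffix (_2.12 / _2.13), so one declaration resolves correctly for every
// entry in crossScalaVersions.
libraryDependencies +=
  ("org.apache.iceberg" % "iceberg-spark-runtime-3.3" % "1.0.0").cross(CrossVersion.binary)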

we should add an integration test for this. This PR is already large enough so I think we can add this in a future PR.

Sure, but imo the unit test already runs against a complete Spark environment with a standard Iceberg setup.

build.sbt Outdated
lazy val deltaIcebergCompat = (project in file("delta-iceberg-compat"))
.dependsOn(core % "compile->compile;test->test;provided->provided")
.settings (
name := "delta-iceberg-compat",

A Contributor commented on this definition:

@zsxwing do you think this is a good name?
compat or compatibility?

and should we have "spark" in the name?

@scottsand-db (Collaborator) commented:

we should add an integration test for this. This PR is already large enough so I think we can add this in a future PR.

Sure, but imo the unit test already runs against a complete Spark environment with a standard Iceberg setup.

Yes, but this doesn't test the packaged JAR. We've caught many bugs (e.g. shading, dependencies, Scala version issues) with our integration tests before.

@vkorukanti (Collaborator) left a comment:

+1 on adding an integration test.

Also, docs on which jars need to be copied (end-to-end steps).

Otherwise LGTM.

@vkorukanti (Collaborator) commented:

Are there any limitations on which Iceberg types/features are and aren't supported?

@jackierwzhang (Contributor, Author) commented Oct 31, 2022

Are there any limitations on which Iceberg types/features are and aren't supported?

There are a few remaining things we don't yet support:

  1. Converting Hive-based Iceberg tables
  2. Converting non-Parquet Iceberg tables
  3. Converting special partition transformation rules (ref)
  4. Converting Iceberg tables with custom name mapping (ref)
  5. Converting Iceberg tables with case-sensitive column names (ref)

@vkorukanti (Collaborator) left a comment:

lgtm
