AWS Glue Catalog for Iceberg ingest extension #17392

Open
wants to merge 8 commits into
base: master

Shekharrajak

@Shekharrajak Shekharrajak commented Oct 22, 2024

Fixes #17352.

Description

Release note


Key changed/added classes in this PR
  • GlueIcebergCatalog

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

private Catalog setupGlueCatalog() {
    catalog = new GlueCatalog();
    catalogProperties.put(CatalogProperties.WAREHOUSE_LOCATION, warehousePath);
    catalog.initialize(CATALOG_NAME, catalogProperties);
Author

The catalog properties must include these key-value pairs:

    "type": "glue",
    "catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
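
As a minimal sketch of the requirement above, the following Java snippet builds a property map with the three listed keys and reports any that are missing. The class and method names (`GlueCatalogProps`, `glueDefaults`, `missingKeys`) are hypothetical and are not part of this PR or of Iceberg; only the keys and values come from the comment.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR: checks a catalog property map
// for the three keys listed in the review comment.
final class GlueCatalogProps {
    static final List<String> REQUIRED_KEYS = List.of("type", "catalog-impl", "io-impl");

    // Returns a map pre-filled with the values quoted in the comment.
    static Map<String, String> glueDefaults() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("type", "glue");
        props.put("catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog");
        props.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
        return props;
    }

    // Returns the required keys that are absent from the given map, in declaration order.
    static List<String> missingKeys(Map<String, String> props) {
        return REQUIRED_KEYS.stream()
                .filter(key -> !props.containsKey(key))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Missing keys: " + missingKeys(glueDefaults()));
    }
}
```

A check like this could run before `catalog.initialize(...)` so that a misconfigured spec fails early with a readable message.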

Author

The warehouse path must be an S3 URI of the form s3://bucket/path.
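
The constraint above could be checked with a small validation helper. This is a sketch under stated assumptions: `WarehousePathCheck` and `isS3Warehouse` are hypothetical names, and the check only verifies the `s3` scheme and a non-empty bucket component, not that the bucket exists or is reachable.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the PR: verifies that a warehouse path
// looks like s3://bucket/path.
final class WarehousePathCheck {
    static boolean isS3Warehouse(String path) {
        if (path == null) {
            return false;
        }
        try {
            URI uri = new URI(path);
            // getAuthority() holds the bucket name; it is null when no bucket is given.
            String bucket = uri.getAuthority();
            return "s3".equals(uri.getScheme()) && bucket != null && !bucket.isEmpty();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isS3Warehouse("s3://bucket/path"));
    }
}
```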

Author

AWS-related environment variables must be available wherever the Druid cluster is running.
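
The comment does not say which variables are needed. As a hedged illustration, the sketch below assumes the conventional names read by the AWS SDK (`AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`); deployments that rely on instance profiles or other credential providers may not need all of them. `AwsEnvCheck` and `missingVariables` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR: reports which of the assumed
// AWS environment variables are unset or empty.
final class AwsEnvCheck {
    // Assumption: these are the conventional AWS SDK variable names; the PR does not list them.
    static final List<String> EXPECTED =
            List.of("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY");

    // Takes the environment as a map so the check can be exercised without touching the real process env.
    static List<String> missingVariables(Map<String, String> env) {
        return EXPECTED.stream()
                .filter(name -> {
                    String value = env.get(name);
                    return value == null || value.isEmpty();
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Missing AWS variables: " + missingVariables(System.getenv()));
    }
}
```

Running a check like this on each Druid service host would show whether the process environment is the reason the Glue catalog cannot authenticate.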

Contributor

> AWS-related environment variables must be available wherever the Druid cluster is running.

Could we add more information related to this in the docs specific to the glue catalog?

Author

Yes, I will do that. I recently figured out that there is a simpler approach in the Iceberg API itself for choosing the catalog. I am spending some time checking whether it would make this considerably more modular and work for all available Iceberg catalogs on the fly.

@Shekharrajak
Author

While testing, I hit this error:

Invalid value for the field [inputSource]. Reason: [Please make sure to load all the necessary extensions and jars with type 'iceberg' on 'druid/broker' service. Could not resolve type id 'iceberg' as a subtype of `org.apache.druid.data.input.InputSource` known type ids = [combining, hdfs, http, inline, local, nil, sql] at [Source: (String)"{"type":"iceberg","tableName": "

Please let me know if anyone has faced a similar error message. It appears to be caused by IcebergInputSource from the Iceberg extension not being found as a subtype of InputSource.

@a2l007
Contributor

a2l007 commented Oct 23, 2024

@shekhar-rajak Thank you for working on this!
Please add the extension to the broker load list, which should fix the error described.

@Shekharrajak
Author

> Please add the extension to the broker load list, which should fix the error described.

Thanks! I found that there was already a druid.extensions.loadList entry in the common.runtime.properties file, and it was overriding the line below that I had added:

druid.extensions.loadList=["druid-iceberg-extensions"]

After adding the extension to the existing list, I am able to run it.

@Shekharrajak
Author

I realise the jars from druid-iceberg-extension/lib, which are needed at runtime, are not being copied into the lib folder. When I copied those jars, GlueCatalog was detected and I was able to load the Iceberg table.

3 participants