
[SPARK-49246][SQL] TableCatalog#loadTable should indicate if it's for writing #47772

Closed
wants to merge 6 commits

Conversation

cloud-fan
Contributor

@cloud-fan cloud-fan commented Aug 15, 2024

What changes were proposed in this pull request?

For custom catalogs that have access control, read and write permissions can be different. However, Spark currently always calls TableCatalog#loadTable to look up the table, no matter whether the lookup is for a read or a write.

This PR adds a variant of loadTable that indicates the required write privileges. All write commands will call this new method to look up tables instead. The new method has a default implementation that simply calls loadTable, so there is no breaking change.
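The backward-compatibility pattern described here can be sketched with a Java default method. The code below is illustrative only — the interface and type names are stand-ins, not Spark's actual classes:

```java
// Minimal stand-ins for the Spark types involved (illustrative, not Spark's API).
interface Identifier {}
interface Table {}

interface TableCatalog {
    Table loadTable(Identifier ident);

    // New variant used by write commands. Catalogs with access control can
    // override it to check write permissions before returning the table; the
    // default implementation delegates to loadTable, so existing catalog
    // implementations keep working without any code change.
    default Table loadTableForWrite(Identifier ident) {
        return loadTable(ident);
    }
}

public class LoadTableSketch {
    public static void main(String[] args) {
        Table t = new Table() {};
        // This catalog only implements the original loadTable...
        TableCatalog readOnlyCatalog = ident -> t;
        // ...yet the write-path lookup still works via the default method.
        System.out.println(readOnlyCatalog.loadTableForWrite(null) == t); // prints true
    }
}
```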

Why are the changes needed?

Allow more fine-grained access control for custom catalogs.

Does this PR introduce any user-facing change?

No

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Aug 15, 2024
@yaooqinn
Member

Speaking of table-level privileges, it's not simply read/write, actually:

| Privilege type | Description |
|---|---|
| INSERT | inserting rows |
| DELETE | deleting rows |
| SELECT | data retrieval |
| UPDATE | changing column values |
| ALTER | table, column, partition metadata, etc. |

@cloud-fan
Contributor Author

@yaooqinn good point! I'll change it to def loadTable(ident, operationType), where operationType is an enum.

@yaooqinn
Member

yaooqinn commented Aug 15, 2024

operationType should be a collection; these types are not always used in isolation, e.g. some operations might be an upsert, or both read and write against the same table
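A minimal illustration of this point, assuming a hypothetical TableWritePrivilege enum: an EnumSet lets a single lookup carry several privileges at once, e.g. a MERGE that both inserts and deletes rows.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical privilege enum, assumed from this discussion.
enum TableWritePrivilege { INSERT, DELETE, UPDATE }

public class PrivilegeSetSketch {
    public static void main(String[] args) {
        // A MERGE/upsert-style write needs more than one privilege at once,
        // which is why a collection fits better than a single enum value.
        Set<TableWritePrivilege> upsert =
            EnumSet.of(TableWritePrivilege.INSERT, TableWritePrivilege.DELETE);
        System.out.println(upsert.size()); // prints 2
    }
}
```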

@cloud-fan
Contributor Author

also cc @huaxingao

final override val nodePatterns: Seq[TreePattern] = Seq(UNRESOLVED_RELATION)
}

object UnresolvedRelation {
// An internal option of `UnresolvedRelation` to specify the required write privileges when
Contributor Author
We can add a new field to UnresolvedRelation but it may break third-party catalyst rules.
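A hypothetical sketch of the options-map approach mentioned here (the key name is made up for illustration): carrying the privileges in the node's existing options map avoids adding a constructor field, so third-party rules that destructure the node keep compiling.

```java
import java.util.HashMap;
import java.util.Map;

public class RelationOptionSketch {
    // Hypothetical internal option key, not Spark's actual constant.
    static final String WRITE_PRIVILEGES_KEY = "__required_write_privileges";

    public static void main(String[] args) {
        // At plan-construction time, stash the required privileges as a string.
        Map<String, String> options = new HashMap<>();
        options.put(WRITE_PRIVILEGES_KEY, "INSERT,DELETE");

        // Later, the resolution rule reads the option back and parses it.
        String[] privileges = options.get(WRITE_PRIVILEGES_KEY).split(",");
        System.out.println(privileges.length); // prints 2
    }
}
```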

*
* @since 4.0.0
*/
public enum TableWritePrivilege {
Contributor Author
I only include write privileges, as the full set of privileges includes ALTER, REFERENCE, etc., which is not what we need for loadTable.
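Based on this comment and the thread above, the enum plausibly looks like the following sketch (member list assumed from the discussion, not copied from the PR diff):

```java
// A sketch of the write-only privilege enum; read-side and DDL privileges
// such as SELECT, ALTER, and REFERENCE are deliberately out of scope,
// since loadTable only needs to know about writes.
public enum TableWritePrivilege {
  // Privilege to add rows, e.g. INSERT INTO or the insert action of MERGE.
  INSERT,
  // Privilege to remove rows, e.g. DELETE FROM, overwrites, or MERGE deletes.
  DELETE,
  // Privilege to change existing column values, e.g. UPDATE or MERGE updates.
  UPDATE
}
```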

@huaxingao
Contributor

@cloud-fan Thanks for pinging me! The new TableCatalog#loadTable API looks good to me, and it also looks good from Iceberg's perspective. Also cc @aokolnychyi @szehon-ho

@cloud-fan
Contributor Author

The last commit just resolves a trivial merge conflict. I'm merging this to master/3.5, thanks for reviewing!

@cloud-fan cloud-fan closed this in b6164e6 Aug 21, 2024
cloud-fan added a commit that referenced this pull request Aug 21, 2024
… writing

For custom catalogs that have access control, read and write permissions can be different. However, Spark currently always calls `TableCatalog#loadTable` to look up the table, no matter whether the lookup is for a read or a write.

This PR adds a variant of `loadTable`: `loadTableForWrite`, in `TableCatalog`. All the write commands will call this new method to look up tables instead. This new method has a default implementation that just calls `loadTable`, so there is no breaking change.

Allow more fine-grained access control for custom catalogs.

No

new tests

no

Closes #47772 from cloud-fan/write.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b6164e6)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Sep 9, 2024
…be changed by falling back to v1 command

### What changes were proposed in this pull request?

This is a followup of #47772. The behavior of SaveAsTable should not be changed by switching from the v1 to the v2 command. This is similar to #47995. For the case of `DelegatingCatalogExtension`, we need it to go to v1 commands to be consistent with previous behavior.

### Why are the changes needed?

Behavior regression.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48019 from amaliujia/regress_v2.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Sep 9, 2024
…be changed by falling back to v1 command

(cherry picked from commit 37b39b4)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
… writing

IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…be changed by falling back to v1 command
3 participants