
[SPARK-49246][SQL] TableCatalog#loadTable should indicate if it's for writing #47772

Closed
wants to merge 6 commits

Conversation

cloud-fan
Contributor

@cloud-fan cloud-fan commented Aug 15, 2024

What changes were proposed in this pull request?

For custom catalogs that have access control, read and write permissions can be different. However, Spark currently always calls TableCatalog#loadTable to look up the table, no matter whether the lookup is for a read or a write.

This PR adds a variant of loadTable that indicates the required write privileges. All write commands will call this new method to look up tables instead. The new method has a default implementation that simply calls loadTable, so there is no breaking change.
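The backward-compatibility pattern described here can be sketched with a Java default method. The code below is illustrative only — the interface and type names are stand-ins, not Spark's actual classes:

```java
// Minimal stand-ins for the Spark types involved (illustrative, not Spark's API).
interface Identifier {}
interface Table {}

interface TableCatalog {
    Table loadTable(Identifier ident);

    // New variant used by write commands. Catalogs with access control can
    // override it to check write permissions before returning the table; the
    // default implementation delegates to loadTable, so existing catalog
    // implementations keep working without any code change.
    default Table loadTableForWrite(Identifier ident) {
        return loadTable(ident);
    }
}

public class LoadTableSketch {
    public static void main(String[] args) {
        Table t = new Table() {};
        // This catalog only implements the original loadTable...
        TableCatalog readOnlyCatalog = ident -> t;
        // ...yet the write-path lookup still works via the default method.
        System.out.println(readOnlyCatalog.loadTableForWrite(null) == t); // prints true
    }
}
```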

Why are the changes needed?

Allow more fine-grained access control for custom catalogs.

Does this PR introduce any user-facing change?

No

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Aug 15, 2024
@yaooqinn
Member

Speaking of table-level privileges, it's not simply read/write, actually:

| Privilege type | Description |
|---|---|
| INSERT | inserting rows |
| DELETE | deleting rows |
| SELECT | data retrieval |
| UPDATE | changing column values |
| ALTER | table, column, partition metadata, etc. |

@cloud-fan
Contributor Author

@yaooqinn good point! I'll change it to def loadTable(ident, operationType), where operationType is an enum.

@yaooqinn
Member

yaooqinn commented Aug 15, 2024

operationType should be a collection; these types are not always used in isolation, e.g. some operations might be an upsert, or both read and write against the same table
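A minimal illustration of this point, assuming a hypothetical TableWritePrivilege enum: an EnumSet lets a single lookup carry several privileges at once, e.g. a MERGE that both inserts and deletes rows.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical privilege enum, assumed from this discussion.
enum TableWritePrivilege { INSERT, DELETE, UPDATE }

public class PrivilegeSetSketch {
    public static void main(String[] args) {
        // A MERGE/upsert-style write needs more than one privilege at once,
        // which is why a collection fits better than a single enum value.
        Set<TableWritePrivilege> upsert =
            EnumSet.of(TableWritePrivilege.INSERT, TableWritePrivilege.DELETE);
        System.out.println(upsert.size()); // prints 2
    }
}
```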

@cloud-fan
Contributor Author

also cc @huaxingao

final override val nodePatterns: Seq[TreePattern] = Seq(UNRESOLVED_RELATION)
}

object UnresolvedRelation {
// An internal option of `UnresolvedRelation` to specify the required write privileges when
Contributor Author
We can add a new field to UnresolvedRelation but it may break third-party catalyst rules.
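A hypothetical sketch of the options-map approach mentioned here (the key name is made up for illustration): carrying the privileges in the node's existing options map avoids adding a constructor field, so third-party rules that destructure the node keep compiling.

```java
import java.util.HashMap;
import java.util.Map;

public class RelationOptionSketch {
    // Hypothetical internal option key, not Spark's actual constant.
    static final String WRITE_PRIVILEGES_KEY = "__required_write_privileges";

    public static void main(String[] args) {
        // At plan-construction time, stash the required privileges as a string.
        Map<String, String> options = new HashMap<>();
        options.put(WRITE_PRIVILEGES_KEY, "INSERT,DELETE");

        // Later, the resolution rule reads the option back and parses it.
        String[] privileges = options.get(WRITE_PRIVILEGES_KEY).split(",");
        System.out.println(privileges.length); // prints 2
    }
}
```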

*
* @since 4.0.0
*/
public enum TableWritePrivilege {
Contributor Author
I only include write privileges, as the full set of privileges includes ALTER, REFERENCE, etc., which is not what we need for loadTable.
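Based on this comment and the thread above, the enum plausibly looks like the following sketch (member list assumed from the discussion, not copied from the PR diff):

```java
// A sketch of the write-only privilege enum; read-side and DDL privileges
// such as SELECT, ALTER, and REFERENCE are deliberately out of scope,
// since loadTable only needs to know about writes.
public enum TableWritePrivilege {
  // Privilege to add rows, e.g. INSERT INTO or the insert action of MERGE.
  INSERT,
  // Privilege to remove rows, e.g. DELETE FROM, overwrites, or MERGE deletes.
  DELETE,
  // Privilege to change existing column values, e.g. UPDATE or MERGE updates.
  UPDATE
}
```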

@huaxingao
Contributor

@cloud-fan Thanks for pinging me! The new TableCatalog#loadTable API looks good to me, and it also looks good from Iceberg's perspective. Also cc @aokolnychyi @szehon-ho

@cloud-fan
Contributor Author

The last commit just resolves a trivial merge conflict. I'm merging this to master/3.5, thanks for reviewing!

@cloud-fan cloud-fan closed this in b6164e6 Aug 21, 2024
cloud-fan added a commit that referenced this pull request Aug 21, 2024
… writing

For custom catalogs that have access control, read and write permissions can be different. However, Spark currently always calls `TableCatalog#loadTable` to look up the table, no matter whether the lookup is for a read or a write.

This PR adds a variant of `loadTable`: `loadTableForWrite`, in `TableCatalog`. All the write commands will call this new method to look up tables instead. This new method has a default implementation that just calls `loadTable`, so there is no breaking change.

Allow more fine-grained access control for custom catalogs.

No

new tests

no

Closes #47772 from cloud-fan/write.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b6164e6)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Sep 9, 2024
…be changed by falling back to v1 command

### What changes were proposed in this pull request?

This is a followup of #47772. The behavior of SaveAsTable should not be changed by switching from the v1 to the v2 command. This is similar to #47995. For the case of `DelegatingCatalogExtension`, we need it to go to v1 commands to be consistent with previous behavior.

### Why are the changes needed?

Behavior regression.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48019 from amaliujia/regress_v2.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Sep 9, 2024
…be changed by falling back to v1 command

(cherry picked from commit 37b39b4)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
… writing

IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…be changed by falling back to v1 command
3 participants