{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":182849188,"defaultBranch":"master","name":"delta","ownerLogin":"delta-io","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2019-04-22T18:56:51.000Z","ownerAvatar":"https://github.com/avatars/u/49767398?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1721174201.0","currentOid":""},"activityList":{"items":[{"before":"93ad94f0bf45bca4c7f674c5b015e0f4b8fc9daa","after":"f8d7d76a272a0bb5c86cfbbcb4f19fe904010ac2","ref":"refs/heads/master","pushedAt":"2024-08-06T18:59:38.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://github.com/avatars/u/1719945?s=80&v=4"},"commit":{"message":"[Spark] ALTER TABLE ALTER COLUMN SYNC IDENTITY SQL support (#3005)\n\n## Description\r\nThis PR is part of https://github.com/delta-io/delta/issues/1959\r\nIn this PR, we add SQL support for `ALTER TABLE ALTER COLUMN SYNC\r\nIDENTITY`.\r\n\r\nThis is used for GENERATED BY DEFAULT Identity Columns, where a user may\r\nwant to manually update the identity column high watermark.\r\n\r\n## How was this patch tested?\r\nThis PR adds a new test suite `IdentityColumnSyncSuite`.\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\nYes. We introduce the SQL syntax `ALTER TABLE (ALTER| CHANGE) COLUMN?\r\n SYNC IDENTITY` into Delta. This will update the high watermark\r\nstored in the metadata for that specific identity column.\r\n**Example Usage**\r\n```\r\nALTER TABLE ALTER COLUMN id SYNC IDENTITY\r\nALTER TABLE CHANGE COLUMN id SYNC IDENTITY\r\nALTER TABLE ALTER id SYNC IDENTITY\r\nALTER TABLE CHANGE id SYNC IDENTITY\r\n```\r\n\r\n---------\r\n\r\nCo-authored-by: zhipeng.mao \r\nCo-authored-by: Thang Long Vu <107926660+longvu-db@users.noreply.github.com>","shortMessageHtmlLink":"[Spark] ALTER TABLE ALTER COLUMN SYNC IDENTITY SQL support (#3005)"}},{"before":"c1f42375eaf99c61dc4614fc7352c3af3cd7e874","after":"93ad94f0bf45bca4c7f674c5b015e0f4b8fc9daa","ref":"refs/heads/master","pushedAt":"2024-08-06T18:29:58.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://github.com/avatars/u/1719945?s=80&v=4"},"commit":{"message":"[UNIFORM] Remove timestamp partition patch as it is not effective currently (#3486)\n\n## Description\r\nDelta-Iceberg uniform currently does not support partitioning on\r\ntimestamps. 
## [UNIFORM] Remove timestamp partition patch as it is not effective currently (#3486)
*master · 2024-08-06 · pushed by vkorukanti*

Delta-Iceberg UniForm currently does not support partitioning on timestamps. A patch intended to address that has existed since the beginning of the project, but it is not effective: Iceberg internally relies on long timestamp values since the epoch, while the patch converts to `java.sql.Timestamp`, so the conversion currently fails with

```
IllegalArgumentException: Wrong class, expected java.lang.Long, but was java.sql.Timestamp, for object
```

Since this patch is essentially ineffective, it makes the most sense to remove it. Removing it is also important because the patch does not apply cleanly on anything after Iceberg 1.2, so it blocks upgrading Iceberg versions. Note: work is also underway to add this support, so the gap will be closed soon.

**Testing:** existing CI.

**User-facing changes:** technically, a user with a timestamp-partitioned table will now encounter a different error. Before this change the error manifested as:

```
IllegalArgumentException: Wrong class, expected java.lang.Long, but was java.sql.Timestamp, for object:
```

After this change the error is:

```
"Unsupported type for fromPartitionString: Timestamp"
```

Considering the feature is unsupported, the new error is a bit clearer.
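To illustrate the representation mismatch the PR describes (a sketch, not Delta or Iceberg code): Iceberg stores a timestamp partition value as microseconds since the epoch, so handing it a `java.sql.Timestamp` object fails the expected-class check.

```scala
import java.sql.Timestamp
import java.time.Instant
import java.time.temporal.ChronoUnit

// What Iceberg's partition data expects for a timestamp: epoch microseconds (a Long).
def toEpochMicros(ts: Timestamp): Long =
  ChronoUnit.MICROS.between(Instant.EPOCH, ts.toInstant)

val ts = Timestamp.from(Instant.parse("2024-08-06T00:00:00Z"))
val micros: Long = toEpochMicros(ts) // what Iceberg wants
val wrong: AnyRef = ts               // what the removed patch produced
// Passing `wrong` where a java.lang.Long is expected triggers:
// IllegalArgumentException: Wrong class, expected java.lang.Long, but was java.sql.Timestamp
```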
## [Kernel] Configure code formatter for Java Code (#3466)
*master · 2024-08-06 · pushed by scottsand-db*

This configures a code formatter using *google-java-format* for the Java code in the Kernel (similar to how it was done in https://github.com/unitycatalog/unitycatalog/commit/54b76d88255dd2baa3f11e515159cfd34cb295e2). Code can be checked by running `build/sbt javafmtCheckAll` and formatted automatically with `build/sbt javafmtAll`. Once this PR is in, follow-up PRs should properly format the existing Kernel code; after that, the new code style can be enforced during compilation.

## [3.2][Spark] Pin the `pip` version to `24.0` to get around the version format requirement (#3482)
*branch-3.2 · 2024-08-06 · pushed by vkorukanti*

Cherry-pick of #3302. The version format requirement is enforced by `pip` from `24.1`. Recent `delta-spark` CI jobs (https://github.com/delta-io/delta/actions/runs/9628486756/job/26556785657) are failing with the following error:

```
ERROR: Invalid requirement: 'delta-spark==3.3.0-SNAPSHOT': Expected end or semicolon (after version specifier)
 delta-spark==3.3.0-SNAPSHOT
 ~~~~~~~^
```

Earlier runs (https://github.com/delta-io/delta/actions/runs/9526169441/job/26261227425) had the following warning:

```
DEPRECATION: delta-spark 3.3.0-SNAPSHOT has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of delta-spark or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063
```

Pinning the `pip` version to `24.0` lets the jobs pass. We still need a long-term solution for the version string of the generated PyPI package, but that is complicated because the `delta-spark` PyPI package also depends on the Delta jars having the same version as the PyPI package name.

## [Spark] Add commit version and logical records in Delta DML metrics (#3458)
*master · 2024-08-05 · pushed by vkorukanti*

Extends the `delta.dml.{merge, update, delete}.stats` metrics with the following fields (see the sketch after this list):
- `commitVersion`: the commit version of the DML operation. This allows associating DML metrics with commit metrics and distinguishing DML operations that did not commit.
- `numLogicalRecordsAdded` and `numLogicalRecordsRemoved`: the number of logical records in the AddFile and RemoveFile actions to be committed. These metrics can be compared to the row-level metrics emitted by the DML operations.

Finally, this commit adds the `isWriteCommand` field to DELETE metrics to distinguish DELETE operations performed in the context of WRITE commands that selectively overwrite data.

**Testing:** log-only changes; existing tests.
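A purely illustrative sketch of what the extended metrics record might carry. The field names come from the PR description; the surrounding structure and values are assumptions, not Delta's actual event schema:

```scala
// Hypothetical container for illustration only; not Delta's actual logging schema.
// Field names commitVersion / numLogicalRecordsAdded / numLogicalRecordsRemoved /
// isWriteCommand come from the PR description.
case class DmlStats(
    operation: String,              // "merge" | "update" | "delete"
    commitVersion: Option[Long],    // None when the DML never committed
    numLogicalRecordsAdded: Long,   // logical records across committed AddFiles
    numLogicalRecordsRemoved: Long, // logical records across committed RemoveFiles
    isWriteCommand: Boolean)        // DELETE issued by a selective-overwrite WRITE

val example = DmlStats("delete", Some(42L), 0L, 1000L, isWriteCommand = false)
```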
## [Spark] Pin the `pip` version to `24.0` to get around the version format requirement (#3483)
*branch-3.1 · 2024-08-05 · pushed by vkorukanti*

Cherry-pick of #3302 to branch-3.1; same change and description as the branch-3.2 cherry-pick (#3482) above.

## [Spark] Pin the `pip` version to `24.0` to get around the version format requirement (#3484)
*branch-3.0 · 2024-08-05 · pushed by vkorukanti*

Cherry-pick of #3302 to branch-3.0; same change and description as #3482 above.

## [Spark][Backport 3.2] Fix the semantic of `shouldRewriteToBeIcebergCompatible` in REORG UPGRADE UNIFORM (#3474)
*branch-3.2 · 2024-08-05 · pushed by tdas*

We currently use the helper function `shouldRewriteToBeIcebergCompatible` to select, based on the tags in each `AddFile`, the subset of parquet files that need to be rewritten when running `REORG UPGRADE UNIFORM`. However, `DeltaUpgradeUniformOperation.icebergCompatVersion` was accidentally shadowed, which made `shouldRewriteToBeIcebergCompatible` always return `false` whenever `AddFile.tags` is not `null`. That is not the expected semantics of this function. This PR fixes the problem and adds unit tests to ensure correctness.

**Testing:** unit tests in `UniFormE2ESuite.scala`.

**User-facing changes:** no.
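The bug class here is easy to reproduce. An illustrative sketch (names mirror the PR, but this is not Delta's actual code, and the tag key is made up) of how shadowing a field turns a predicate into a constant:

```scala
// Illustration only; names mirror the PR but the code is not Delta's.
object Shadowing {
  case class Operation(icebergCompatVersion: Option[Int])

  def shouldRewrite(op: Operation, tags: Map[String, String]): Boolean = {
    // Bug: this local declaration shadows op.icebergCompatVersion,
    // so the match below always sees None and returns false.
    val icebergCompatVersion: Option[Int] = None
    icebergCompatVersion match {
      case Some(v) => !tags.get("ICEBERG_COMPAT_VERSION").contains(v.toString)
      case None    => false // every tagged file is skipped, nothing gets rewritten
    }
  }

  def main(args: Array[String]): Unit = {
    val op = Operation(icebergCompatVersion = Some(2))
    // Prints false even though the file's tag does not match version 2.
    println(shouldRewrite(op, Map("ICEBERG_COMPAT_VERSION" -> "1")))
  }
}
```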
## [Spark] Uses java-based coordinated commits classes in Delta Spark (#3470)
*master · 2024-08-05 · pushed by tdas*

This PR adopts the coordinated commits interfaces from the storage module in Delta Spark. It removes the existing Scala classes and adds the necessary conversion code from Java to Scala (and in the opposite direction) where needed.

**Testing:** adds unit tests for the critical code pieces (action serialization/deserialization and LogStore conversion). For the remainder, existing tests are sufficient.

**User-facing changes:** no.

## [Spark] Add an integration test for DynamoDB Commit Coordinator (#3158)
*master · 2024-08-05 · pushed by tdas*

Adds an integration test for the DynamoDB Commit Coordinator covering the following scenarios:
1. Automated DynamoDB table creation
2. Concurrent reads and writes
3. Table upgrade and downgrade

The first half of the test is heavily borrowed from `dynamodb_logstore.py`.

**Testing:** the test runs successfully against real DynamoDB and S3. Set the following environment variables (after setting the credentials in `~/.aws/credentials`):

```
export S3_BUCKET=
export AWS_PROFILE=
export RUN_ID=
export AWS_DEFAULT_REGION=
```

Then run the test:

```
./run-integration-tests.py --use-local --run-dynamodb-commit-coordinator-integration-tests \
    --dbb-conf io.delta.storage.credentials.provider=com.amazonaws.auth.profile.ProfileCredentialsProvider \
    spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.profile.ProfileCredentialsProvider \
    --dbb-packages org.apache.hadoop:hadoop-aws:3.4.0,com.amazonaws:aws-java-sdk-bundle:1.12.262
```
## [3.2 Cherry Pick] [#3423] Fix unnecessary DynamoDB GET calls during LogStore::listFrom VACUUM calls (#3463)
*branch-3.2 · 2024-08-05 · pushed by scottsand-db*

Cherry-pick of 03bdf8476c3e4f76d9a2d26592b7fd638736f57a to branch-3.2. Resolves #3423.

This PR updates the logic in `BaseExternalLogStore::listFrom` so that it does not make a request to fetch the latest entry from the external store (which is used to perform recovery operations) when a non-`_delta_log` file is being listed.

This matters for VACUUM operations, which may issue hundreds or thousands of list calls against the table directory and the nested partition directories of parquet files. Those directories are not the `_delta_log`, so checking the external store during these list calls is (1) useless and unwanted, since we are not listing the `_delta_log` and this is clearly not the time to attempt a fixup, and (2) expensive. With this change, future VACUUM operations no longer perform unnecessary calls to the external store (e.g. DynamoDB).

**Testing:** unit tests, plus an integration test that actually runs VACUUM and compares the number of external store calls under the old and new logic. That test was run 50 times and passed every time (therefore, not flaky).

**User-facing changes:** no.
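A minimal sketch of the guard described above (assumed structure; `BaseExternalLogStore`'s real code differs, and the helper below is hypothetical):

```scala
import org.apache.hadoop.fs.Path

// Sketch only: illustrates the guard, not the actual BaseExternalLogStore code.
def isDeltaLogPath(resolvedPath: Path): Boolean =
  resolvedPath.getParent != null && resolvedPath.getParent.getName == "_delta_log"

def listFrom(resolvedPath: Path): Unit = {
  if (isDeltaLogPath(resolvedPath)) {
    // Only _delta_log listings consult the external store (e.g. DynamoDB)
    // to recover from incomplete writes.
    // getLatestExternalEntryAndFixup(resolvedPath)  // hypothetical helper
  }
  // ...then list the file system as usual. VACUUM's listings of data and
  // partition directories now skip the external-store GET entirely.
}
```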
## [3.1 Cherry Pick] [#3423] Fix unnecessary DynamoDB GET calls during LogStore::listFrom VACUUM calls (#3462)
*branch-3.1 · 2024-08-05 · pushed by scottsand-db*

Cherry-pick of 03bdf8476c3e4f76d9a2d26592b7fd638736f57a to branch-3.1; same change and description as #3463 above.
## [3.0 Cherry Pick] [#3423] Fix unnecessary DynamoDB GET calls during LogStore::listFrom VACUUM calls (#3461)
*branch-3.0 · 2024-08-05 · pushed by scottsand-db*

Cherry-pick of 03bdf8476c3e4f76d9a2d26592b7fd638736f57a to branch-3.0; same change and description as #3463 above.
## [Spark] Fix DeltaConnectPlannerSuite by copying the moved createDummySessionHolder (#3465)
*master · 2024-08-05 · pushed by vkorukanti*

Fixes `DeltaConnectPlannerSuite` by replacing `SessionHolder.forTesting` with a copy of `createDummySessionHolder`, as this method was moved in the Spark master branch: https://github.com/apache/spark/commit/acb2fecb8c174fa4e2f23c843a904161151c8dfa

**Testing:** fixes the test.

## [Spark] populate Delta clone override table properties to catalog (#3469)
*master · 2024-08-05 · pushed by vkorukanti*

Populates clone override table properties to the catalog, which was missed in the current implementation.

**Testing:** unit tests.

## [Spark] Add DELTA_TESTING=1 environment variable when running Python tests (#3444)
*master · 2024-08-05 · pushed by vkorukanti*

In Python tests we want to exercise features that exist only for testing, but `DELTA_TESTING=1` was missing for Python tests. This PR adds it to the environment when running them.

## [Spark] Support clone and restore for Identity Columns (#3459)
*master · 2024-08-05 · pushed by vkorukanti*

Part of https://github.com/delta-io/delta/issues/1959. Adds support for cloning and restoring tables with identity columns.

**Testing:** clone- and restore-related test cases.
## [Spark] Add Row Tracking Backfill Conflict Checker RemoveFile rule (#3467)
*master · 2024-08-02 · pushed by vkorukanti*

`RowTrackingBackfill` (or backfill for short) is a special operation that materializes and recommits all existing files in a table, using one or several commits, to ensure that every AddFile has a base row ID and a default row commit version.

This PR adds a new rule to `ConflictChecker` that resolves concurrent conflicts involving backfill. We check that `RowTrackingBackfill` does not resurrect files that were removed concurrently, and that an AddFile and its corresponding RemoveFile have the same base row ID and default RCV. We also add logic to skip certain concurrency checks when backfill is involved.

This opens up many interesting conflict-resolution cases to test, so more UTs covering conflict checking/resolution between backfill and other operations under different scenarios will be added in the next PRs.

**Testing:** added conflict-resolution UTs.

## [Kernel][Expression] - Performance Optimization for LIKE expression evaluation (#3185)
*master · 2024-08-02 · pushed by vkorukanti*

Resolves https://github.com/delta-io/delta/issues/3129. This is a performance optimization for LIKE expression evaluation.

**Testing:** existing tests validated.

Signed-off-by: Krishnan Paranji Ravi
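The usual shape of such an optimization (a generic sketch; the Kernel's actual change is in #3185 and may differ): translate the LIKE pattern into a regex once, outside the per-row loop, instead of re-deriving it on every evaluation.

```scala
import java.util.regex.Pattern

// Generic sketch of precompiling a LIKE pattern; not the Kernel's actual code.
def likeToRegex(pattern: String, escape: Char = '\\'): Pattern = {
  val sb = new StringBuilder
  var i = 0
  while (i < pattern.length) {
    pattern.charAt(i) match {
      case `escape` if i + 1 < pattern.length =>
        sb.append(Pattern.quote(pattern.charAt(i + 1).toString)); i += 1
      case '%' => sb.append(".*") // LIKE '%' matches any sequence
      case '_' => sb.append(".")  // LIKE '_' matches any single character
      case c   => sb.append(Pattern.quote(c.toString))
    }
    i += 1
  }
  Pattern.compile(sb.toString)
}

// Compile once, reuse across the whole column batch.
val p = likeToRegex("a%_c")
Seq("abc", "axyc", "ac").foreach(s => println(s"$s -> ${p.matcher(s).matches()}"))
```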
## [Kernel] Add support for nested schema fields (#3445)
*master · 2024-08-02 · pushed by vkorukanti*

This handles nested schemas and resolves https://github.com/delta-io/delta/issues/3427.

## Use correct partition/batch size for Delta Uniform iceberg conversion (#3453)
*master · 2024-08-02 · pushed by tdas*

The existing conversion logic used `toLocalIterator`, which spawns many Spark jobs to collect AddFiles to the driver based on the default Spark partition size. The default size is usually a poor fit, so the conversion and commit to Iceberg became a bottleneck. This PR uses `repartition` to size the partitions properly and avoid the bottleneck.

**Testing:** manually tested on a table with 5M files; performance improved from tens of minutes to 5 minutes.
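In sketch form (assumed batch-size math; not the PR's exact code), the fix sizes the partition count from the file count before iterating on the driver:

```scala
import org.apache.spark.sql.{DataFrame, Row}
import scala.jdk.CollectionConverters._

// Sketch only: the actual conversion code in #3453 differs.
def collectInBatches(addFiles: DataFrame, batchSize: Int = 100000): Iterator[Row] = {
  val total = addFiles.count()
  // One partition per batch, so toLocalIterator fetches sensibly sized chunks
  // instead of however many partitions the upstream plan happened to produce.
  val numPartitions = math.max(1, math.ceil(total.toDouble / batchSize).toInt)
  addFiles.repartition(numPartitions).toLocalIterator().asScala
}
```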
## [Spark] Add RowTrackingBackfillCommand (#3449)
*master · 2024-08-02 · pushed by tdas*

Adds RowTrackingBackfillCommand (design doc: https://docs.google.com/document/d/1ji3zIWURSz_qugpRHjIV_2BUZPVKxYMiEFaDORt_ULA/edit#heading=h.8al9qhd83yov), the ability to assign row IDs to table rows after table creation.

**Testing:** added UTs.

**User-facing changes:** no.

## [Spark] Execute MERGE using Dataframe API in Scala (#3456)
*master · 2024-08-01 · pushed by tdas*

Due to Spark's unfortunate behavior of resolving plan nodes it doesn't know, the `DeltaMergeInto` plan created when using the MERGE Scala API must be resolved manually to ensure Spark doesn't interfere with its analysis. Previously this completely bypassed Spark's analysis, because we then manually executed the MERGE command, which had negative effects: for example, the execution was not visible in `QueryExecutionListener`. This change addresses the issue by executing the manually resolved plan through the Dataframe API, so the command goes through the regular code path. Resolves https://github.com/delta-io/delta/issues/1521.

**Testing:** covered by existing tests.
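For reference, the Scala MERGE API whose execution path this changes (standard Delta Lake API; the `spark` session and table paths are illustrative):

```scala
import io.delta.tables.DeltaTable

// Standard Delta Lake Scala MERGE API; paths are illustrative.
val target = DeltaTable.forPath(spark, "/tmp/delta/events")
val updates = spark.read.format("delta").load("/tmp/delta/updates")

target.as("t")
  .merge(updates.as("s"), "t.id = s.id")
  .whenMatched()
  .updateAll()
  .whenNotMatched()
  .insertAll()
  .execute() // after #3456, this runs through the regular Dataframe execution path
```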
## [Spark] Add annotation for merge materialize source stage of merge (#3452)
*master · 2024-08-01 · pushed by tdas*

Originally, MERGE source materialization was lazy and was triggered the first time the source was used, so it could not be cleanly separated as a stage. Since it was changed to be eager, we can now annotate it, which should make it easier to find in the Spark UI.

**Testing:** the merge materialize source stage is now annotated (screenshot: https://github.com/user-attachments/assets/5a468cda-ffae-40d4-9054-dcfca681c470). Unit tests validate that the new stage is present in MERGE commit metrics.

**User-facing changes:** no.

Co-authored-by: Julek Sompolski

## [Spark] Fix the inconsistencies in min/max Delta Log stats for special characters (#3430)
*master · 2024-08-01 · pushed by vkorukanti*

When truncating maxValue strings longer than 32 characters for statistics, it is crucial to ensure that the final truncated string is lexicographically greater than or equal to the input string in UTF-8 encoded bytes.

Previously, we used the Unicode replacement character as the tieBreaker, comparing it directly against one byte of the next character at a time. This approach was insufficient because the tieBreaker could incorrectly win against the non-first bytes of other characters (e.g., � < 🌼, but � > the second byte of 🌼). We now compare one UTF-8 character (i.e., up to 2 Scala UTF-16 characters, depending on surrogates) at a time to address this issue. We also now use U+10FFFD, i.e. the character with the highest Unicode code point, as the tie-breaker.

**Testing:** UTs.
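A sketch of the upper-bound truncation idea (illustrative, not Delta's actual implementation in #3430): cut at a code-point boundary so surrogate pairs are never split, then append the tie-breaker so the prefix compares greater-or-equal in UTF-8 byte order.

```scala
// Illustrative sketch of an upper-bound truncation for max stats;
// not the actual Delta implementation in #3430.
val TieBreaker = new String(Character.toChars(0x10FFFD)) // near-max code point

def truncateMax(value: String, maxChars: Int = 32): String = {
  if (value.codePointCount(0, value.length) <= maxChars) return value
  // Cut at a code point boundary so surrogate pairs are never split.
  val cut = value.offsetByCodePoints(0, maxChars)
  // Appending U+10FFFD makes the prefix an upper bound for the original string
  // in UTF-8 byte order (ignoring the noncharacters above U+10FFFD, which do
  // not appear in interchanged text).
  value.substring(0, cut) + TieBreaker
}

println(truncateMax("a" * 31 + "🌼🌼")) // keeps 31 a's + 🌼, then appends U+10FFFD
```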
## [Kernel] Add exception principles for Kernel to solidify the rules for our exception framework (#3408)
*master · 2024-07-30 · pushed by allisonport-db*

## [Docs] Clean up the docs in master for future 3.X releases (#3440)
*master · 2024-07-30 · pushed by allisonport-db*

Removes the docs changes for some Delta 4.0+ specific features, since the next release will be 3.X. Also adds a banner to the landing page that announces and points to the Delta 4.0 Preview release, and fixes the version numbers in the quickstart page to be `3.2.0` instead of `3.1.0`.

**Testing:** local build.

**User-facing changes:** no.

## [Spark] Add variant integration test to master (#3439)
*master · 2024-07-30 · pushed by allisonport-db*

Variant type support was added for the Delta 4.0 preview release on Spark 4.0 Preview. An integration test was added to the release branch in https://github.com/delta-io/delta/pull/3220; this PR adds the integration test to master with some updated infrastructure:
- Doesn't run the test when the Spark version is too low
- Updates `examples/scala/build.sbt` to work for 4.0.0+

**Testing:** ran the Scala integration tests using both `3.2.0` and `4.0.0rc1`.

**User-facing changes:** no.

Co-authored-by: richardc-db
## [Kernel] Add get method to FieldMetadata for easier extraction of value in the correct type (#3435)
*master · 2024-07-30 · pushed by vkorukanti*

Adds a `get()` method to `FieldMetadata` for easier extraction of the value in the correct type; fixes https://github.com/delta-io/delta/issues/3419.
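The general pattern such a typed `get()` enables, as a hypothetical sketch: the Kernel's actual `FieldMetadata` signature may well differ, so this is an assumption-laden illustration, not the real API.

```scala
import scala.reflect.ClassTag

// Hypothetical illustration of a typed metadata getter;
// not the actual io.delta.kernel FieldMetadata API.
final class MetadataBag(entries: Map[String, Any]) {
  // Returns the value for `key` as T, or None if absent or of another type.
  def get[T](key: String)(implicit ct: ClassTag[T]): Option[T] =
    entries.get(key).collect { case ct(v) => v }
}

val meta = new MetadataBag(Map("comment" -> "user id", "maxLength" -> 32))
val comment: Option[String] = meta.get[String]("comment") // Some("user id")
val maxLen: Option[Int]     = meta.get[Int]("maxLength")  // Some(32)
val missing: Option[Long]   = meta.get[Long]("maxLength") // None (wrong type)
```

## [Spark] Add BackfillBatchIterator for Row Tracking Backfill (#3441)
*master · 2024-07-30 · pushed by vkorukanti*

Adds `BackfillBatchIterator`, which contains the core logic for selecting files for the WIP Row Tracking Backfill operation and creates iterators that batch files together. The Row Tracking Backfill operation makes multiple commits, each adding a `baseRowId` to a batch of files. More details are in the design doc: https://docs.google.com/document/d/1ji3zIWURSz_qugpRHjIV_2BUZPVKxYMiEFaDORt_ULA/edit#heading=h.8al9qhd83yov

**Testing:** added a new test suite.

A minimal sketch of the batching idea (illustrative only; `BackfillBatchIterator`'s real selection logic lives in #3441, and the types here are made up):

```scala
// Illustrative batching of files into per-commit groups;
// not the actual BackfillBatchIterator implementation.
final case class AddFileRef(path: String, baseRowId: Option[Long])

def backfillBatches(
    files: Iterator[AddFileRef],
    maxFilesPerCommit: Int): Iterator[Seq[AddFileRef]] =
  files
    .filter(_.baseRowId.isEmpty) // only files still missing a base row ID
    .grouped(maxFilesPerCommit)  // one group per backfill commit

val files = Iterator(
  AddFileRef("f1", None), AddFileRef("f2", Some(10L)), AddFileRef("f3", None))
backfillBatches(files, maxFilesPerCommit = 2).foreach(b => println(b.map(_.path)))
```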