
HADOOP-19221. S3A: Unable to recover from failure of multipart block upload attempt #6938

Open · wants to merge 5 commits into trunk

Conversation

@steveloughran (Contributor) commented Jul 10, 2024

HADOOP-19221

Adds a custom set of content providers in UploadContentProviders which:

  • restart on failures
  • do not copy buffers/byte buffers into new private byte arrays, so as to avoid exacerbating memory problems.

org.apache.hadoop.fs.store.ByteBufferInputStream has been pulled out of org.apache.hadoop.fs.store.DataBlocks to assist; a sketch of the restartable-provider idea follows the list below.

  • The S3A filesystem collects statistics on HTTP 400+ error codes received by the SDK, implemented through the logging auditor. Note: 404s are not collected, as they are so common during normal operation.
  • Improved handling of interrupt exceptions raised while waiting for block uploads to complete when Spark wants to abort a speculative task.
  • A fault injection test is used to recreate the failure (this could only be done in CommitOperations) and verify that the fix is good.
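
As a rough illustration of the "restart without copying" idea above, here is a minimal sketch built on the AWS SDK v2 ContentStreamProvider interface. It is not the patch's UploadContentProviders code; the class and field names are illustrative.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

import software.amazon.awssdk.http.ContentStreamProvider;

/**
 * Sketch only: hands out a fresh stream over the same ByteBuffer on every
 * call, so a failed part upload can be restarted without the data ever
 * being copied into a private byte array.
 */
public final class RestartableByteBufferProvider implements ContentStreamProvider {

  private final ByteBuffer buffer;     // shared block buffer, never copied
  private final int initialPosition;   // where this part's data starts

  public RestartableByteBufferProvider(ByteBuffer buffer, int initialPosition) {
    this.buffer = buffer;
    this.initialPosition = initialPosition;
  }

  @Override
  public InputStream newStream() {
    // duplicate() shares the backing data but has an independent
    // position/limit, so every retry re-reads the part from its start.
    final ByteBuffer view = buffer.duplicate();
    view.position(initialPosition);
    return new InputStream() {
      @Override
      public int read() {
        return view.hasRemaining() ? (view.get() & 0xFF) : -1;
      }
    };
  }
}
```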

Description of PR

How was this patch tested?

Fault injection through an AWS SDK extension point which changes the status from 200 to 400 after the targeted operation completes; this puts the SDK into retry/recovery mode.
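
For context, the SDK v2 exposes ExecutionInterceptor as that extension point. A hedged sketch of how such a fault injector might look (not the actual test class in this PR; the names and the simple counter logic are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

import software.amazon.awssdk.core.interceptor.Context;
import software.amazon.awssdk.core.interceptor.ExecutionAttributes;
import software.amazon.awssdk.core.interceptor.ExecutionInterceptor;
import software.amazon.awssdk.http.SdkHttpResponse;

/**
 * Sketch: rewrite the HTTP status of the next N otherwise-successful
 * responses so the SDK enters its retry/recovery path.
 */
public class FaultInjectingInterceptor implements ExecutionInterceptor {

  private final int injectedStatus;            // e.g. 400 or 500
  private final AtomicInteger remainingFaults; // how many responses to break

  public FaultInjectingInterceptor(int injectedStatus, int faults) {
    this.injectedStatus = injectedStatus;
    this.remainingFaults = new AtomicInteger(faults);
  }

  @Override
  public SdkHttpResponse modifyHttpResponse(Context.ModifyHttpResponse context,
      ExecutionAttributes executionAttributes) {
    SdkHttpResponse response = context.httpResponse();
    if (response.isSuccessful() && remainingFaults.getAndDecrement() > 0) {
      // swap the real status for the injected error code
      return response.toBuilder()
          .statusCode(injectedStatus)
          .build();
    }
    return response;
  }
}
```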

Some new unit tests

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran force-pushed the s3/HADOOP-19221-multipart-put-failures branch from bf5d5ec to a93afe6 on July 12, 2024 18:30
@steveloughran marked this pull request as draft July 12, 2024 18:39
@steveloughran force-pushed the s3/HADOOP-19221-multipart-put-failures branch from a93afe6 to 9b942c7 on July 12, 2024 18:47
@steveloughran (Contributor, Author):

I believe I have a way to test this by injecting 500 exceptions with a custom execution interceptor added to the audit chain

@steveloughran force-pushed the s3/HADOOP-19221-multipart-put-failures branch from 76deb75 to 65fd797 on July 18, 2024 18:37
@steveloughran (Contributor, Author):

tested s3 london with -Dparallel-tests -DtestsThreadCount=12 -Dscale

Prefetch failures, timing related (a thread count of 12 is too big...)

[ERROR] Failures: 
[ERROR]   ITestS3APrefetchingInputStream.testReadLargeFileFully:136 [Maxiumum named action_executor_acquired.max] 
Expecting:
 <0L>
to be greater than:
 <0L> 
[ERROR] Errors: 
[ERROR]   ITestS3APrefetchingLruEviction.testSeeksWithLruEviction:162 » TestTimedOut tes...

@apache deleted 7 comments from hadoop-yetus (Jul 22, 2024)
@steveloughran marked this pull request as ready for review July 22, 2024 17:37
@steveloughran force-pushed the s3/HADOOP-19221-multipart-put-failures branch 2 times, most recently from 2790d56 to b95ee7c on July 23, 2024 16:09
@steveloughran (Contributor, Author):

s3 london with: -Dparallel-tests -DtestsThreadCount=8 -Dscale

This is ready to be reviewed. @mukund-thakur, @HarshitGupta11 and @shameersss1 could you all look at this?

@steveloughran force-pushed the s3/HADOOP-19221-multipart-put-failures branch from b187942 to 1fb04e9 on July 24, 2024 14:58
*/
public final class ByteBufferInputStream extends InputStream {
private static final Logger LOG =
LoggerFactory.getLogger(DataBlocks.class);
Contributor:
Shouldn't this be ByteBufferInputStream.class?

} catch (ExecutionException ee) {
//there is no way of recovering so abort
//cancel all partUploads
Contributor:
Aren't we cancelling all the uploads here?

@steveloughran (Contributor, Author) commented Jul 29, 2024:

looking at this. may need some more review to be confident we are doing abort here properly

Contributor (Author):
...done a lot more work on aborting.

*/
public class AWSStatus500Exception extends AWSServiceIOException {
public AWSStatus500Exception(String operation,
AwsServiceException cause) {
super(operation, cause);
}

@Override
public boolean retryable() {
Contributor:
Will this make all 500s retryable? I mean, if S3 throws an exception like 500 S3 Internal Server Error, do we need to retry from the S3A client as well?

Contributor (Author):
See the latest change... I've made it an option.

// there is specific handling for some 5XX codes (501, 503);
// this is for everything else
policyMap.put(AWSStatus500Exception.class, fail);
policyMap.put(AWSStatus500Exception.class, retryAwsClientExceptions);
Contributor:
Do we need to selectively retry the 500 exception? Say, only when the cause is "Your socket connection......."

Contributor (Author):
See the full comment below. Along with that, I really don't like looking in error strings: way too brittle for production code. Even in tests I like to share the text across production and test classes as constants.

(yes, I know about org.apache.hadoop.fs.s3a.impl.ErrorTranslation... doesn't mean I like it)

@shameersss1 (Contributor) left a comment:
The changes look mostly good to me. I have left some minor comments and questions.

@mukund-thakur (Contributor) left a comment:
Production code looks good overall. Have to look at the tests.

@Override
protected ByteBufferInputStream createNewStream() {
// set the buffer up from reading from the beginning
blockBuffer.limit(initialPosition);
Contributor:
Wondering why setting the limit is important?

Contributor (Author):
we want to start reading from the initial position every time the stream is opened.

Retrying _should_ make it go away.

The 500 error is considered retryable by the AWS SDK, which will have already
tried it `fs.s3a.attempts.maximum` times before reaching the S3A client -which
Contributor:
nit: retried?

@steveloughran (Contributor, Author):

@shameersss1
I really don't know what is best to do here.

We have massively cut back on the number of retries which take place in the V2 SDK compared to V1; we have even discussed in the past turning it off completely and handling it all ourselves. However, that would break things the transfer manager does in separate threads.

The thing is, I do not know how often we see 500 errors against AWS S3 stores (rather than third-party ones with unrecoverable issues), and now that we have seen them I don't know what the right policy should be. The only documentation on what to do seems more focused on 503s, and doesn't provide any hints about why a 500 could happen or what to do other than "keep trying, maybe it'll go away": https://repost.aws/knowledge-center/http-5xx-errors-s3 . I do suspect it is very rare, otherwise the AWS team might have noticed their lack of resilience here, and we would have found it during our own testing. Any 500 error at any point other than multipart uploads probably gets recovered from nicely, so there could have been a background noise of these which we have never noticed before. S3A FS stats will now track these, which may be informative.

I don't want to introduce another configuration switch if possible, because that adds more to documentation, testing, maintenance, et cetera. One thing I was considering: should we treat this exactly the same as a throttling exception, which has its own configuration settings for retries?

Anyway, if you could talk to your colleagues and make some suggestions based on real knowledge of what can happen, that would be really nice. Note that we are treating 500 as idempotent, the way we do with all the other failures, even though from a distributed-computing purist's perspective it is not in fact true.

Not looked at the other comments yet; will do later. Based on a code walk-through with Mukund, Harshit and Saikat, I've realised we should make absolutely sure that the stream providing a subset of a file fails immediately if the read() goes past the allocated space. With tests, obviously.

@steveloughran (Contributor, Author):

@shameersss1 here is what I propose:

add a boolean config option fs.s3a.retry.all.http.errors

retry on all "maybe unrecoverable" http errors; default is false.

I did think about a comma-separated list ("500, 4xx, 510") but decided it was overcomplex.
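
If that proposal is adopted, enabling it from Java would look something like the sketch below; the option name is the one proposed above, so check the merged documentation for the final name and default.

```java
import org.apache.hadoop.conf.Configuration;

public class RetryAllHttpErrorsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Proposed option: retry on all "maybe unrecoverable" HTTP errors.
    // The default (as proposed) is false.
    conf.setBoolean("fs.s3a.retry.all.http.errors", true);
    System.out.println(conf.getBoolean("fs.s3a.retry.all.http.errors", false));
  }
}
```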

@steveloughran (Contributor, Author):

  • updated PR has the new field; do need to document it though.
  • pulled out fault injector class for reuse

Based on @shameersss1's comments I've reviewed S3ABlockOutputStream aborting:

  • use our own FutureIO to wait for results; this unwraps exceptions
    for us
  • On InterruptedIOException, upload is aborted but no attempt made to cancel
    the requests (things are being interrupted, after all).
  • Atomic bool stopFutureUploads is used to signal to future uploads that they
    should stop uploading but still clean up data.
  • when awaiting the future IO operation is interrupted, no attempt
    is made to cancel/interrupt the uploads, but that flag is still set.

Now unsure about what is the best policy to avoid ever leaking buffers
if an upload is cancelled.

  1. Should we ever use future.cancel()? Or just set stopFutureUploads, knowing the uploads are skipped?
  2. would we want the upload stream to somehow trigger a failure which gets through the SDK (i.e. no retries?) and then exits?

We could do this now that we have our own content provider: raise a non-recoverable AwsClientException...
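
To make the discussion above concrete, here is a minimal sketch of the stop-flag pattern using plain java.util.concurrent rather than the PR's FutureIO helpers; the class and method names are illustrative, not the actual S3ABlockOutputStream code.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch of the upload-cancellation pattern discussed above. */
public class UploadCancellationSketch {

  /** Signals queued uploads to skip their PUT but still clean up buffers. */
  private final AtomicBoolean stopFutureUploads = new AtomicBoolean(false);

  /** Each queued block upload checks this before starting its PUT. */
  boolean shouldUpload() {
    return !stopFutureUploads.get();
  }

  /** Wait for all part uploads, cancelling the rest if interrupted. */
  void waitForUploads(List<Future<?>> partUploads) throws ExecutionException {
    try {
      for (Future<?> upload : partUploads) {
        upload.get();
      }
    } catch (InterruptedException e) {
      // Stop future uploads; whether to also cancel() the futures is the
      // open question raised above.
      stopFutureUploads.set(true);
      partUploads.forEach(f -> f.cancel(true));
      Thread.currentThread().interrupt();
    } catch (CancellationException e) {
      stopFutureUploads.set(true);
    }
  }
}
```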

@steveloughran force-pushed the s3/HADOOP-19221-multipart-put-failures branch from 0d47447 to b793661 on August 2, 2024 18:33
@apache deleted 11 comments from hadoop-yetus (Aug 12, 2024)
…upload attempt

Adds custom set of content providers in UploadContentProviders which
* restart on failures
* do not copy buffers/byte buffers into new private byte arrays,
  so avoid exacerbating memory problems.

org.apache.hadoop.fs.store.ByteBufferInputStream has been pulled out
of org.apache.hadoop.fs.store.DataBlocks to assist.

ITestUploadRecovery triggers the failure mode through fault injection

new ContentProviders used as appropriate in S3ABlockOutputStream
and CommitOperations

IOStatistics
* new IOStatistics for select http error codes
* s3a auditor updates filesystem IOStatistics when these happen

Consider AWSStatus500Exception recoverable

* AWSStatus500Exception is now optionally recoverable
* section in troubleshooting on it
* and one on 503

The core work here is to ensure that
1. cancellation of PUT/POST of data is handled "correctly" in close()
2. blocks are always closed, so as to ensure files are deleted.

S3ABlockOutputStream has major changes here

Key changes
* FutureIO javadocs cover CancellationException
  and new method cancelAllFuturesAndAwaitCompletion()
  to trigger a cancel then await (briefly) for threads to
  be cancelled.
* UploadContentProviders and BlockUploadData take a Supplier<Boolean>
  predicate to probe for a stream still being open; this is wired
  up so that if a block is closed, newStream() raises IllegalStateException
  (a minimal sketch of this probe follows after this list).
* ProgressListenerEvent enum expanded to track many more events
* S3ABlockOutputStream reports as appropriate
* CountingProgressListener pulled from AbstractSTestS3AHugeFiles;
  expanded to track new progress events
* New test suite ITestS3ABlockOutputStreamInterruption to abort streams
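
As referenced in the list above, here is a minimal sketch of the isOpen probe wired into newStream(); the names are illustrative, not the patch's actual classes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.function.Supplier;

/** Sketch of the "is the block still open?" guard described above. */
public class GuardedContentProvider {

  private final byte[] data;
  private final Supplier<Boolean> isOpen;  // e.g. () -> !block.isClosed()

  public GuardedContentProvider(byte[] data, Supplier<Boolean> isOpen) {
    this.data = data;
    this.isOpen = isOpen;
  }

  /** Fail fast if the owning block has already been closed. */
  public InputStream newStream() {
    if (!isOpen.get()) {
      throw new IllegalStateException("Block is closed: cannot restart upload");
    }
    return new ByteArrayInputStream(data);
  }
}
```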
…ream abort

Make sure that when abort is rejected by S3, this is caught and increments the
statistics rather than being escalated to a failure.

Change-Id: I66e65219086cb2d90d2a65dcb4000703f66cf15d
Final bit of work on what was already a significant enough change:
move all the s3client upload commands from S3AFS into S3AStore,
as part of the incremental migration and lining up for a move to AsyncClient
everywhere.

* WriteOperationHelper still takes an S3AFS instance, as most of the
  methods it calls are still implemented there.
* Some changes to the mock testing as required

+ review of the retry and translation attributes of the operations, pulling
  up through interface and verification that all looks good.
+ review handling of CancellationExceptions

Change-Id: Icb3a8286b6e2a423469f534b244dd90e383fd662
Change-Id: Ie4955bc0ef887dbd8686f5ae6681334055e22429
@steveloughran force-pushed the s3/HADOOP-19221-multipart-put-failures branch from 8be4c96 to 0e1207a on August 12, 2024 15:20
@steveloughran (Contributor, Author):

@shameersss1 can you review this again?

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 20 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 14s Maven dependency ordering for branch
+1 💚 mvninstall 32m 34s trunk passed
+1 💚 compile 17m 30s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 compile 16m 15s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 checkstyle 4m 26s trunk passed
+1 💚 mvnsite 2m 42s trunk passed
+1 💚 javadoc 1m 57s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 1m 46s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 3m 54s trunk passed
+1 💚 shadedclient 34m 34s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 32s Maven dependency ordering for patch
+1 💚 mvninstall 1m 31s the patch passed
+1 💚 compile 17m 6s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javac 17m 6s the patch passed
+1 💚 compile 16m 3s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 javac 16m 3s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 21s root: The patch generated 0 new + 25 unchanged - 1 fixed = 25 total (was 26)
+1 💚 mvnsite 2m 38s the patch passed
+1 💚 javadoc 1m 51s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
-1 ❌ javadoc 0m 48s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05 with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 spotbugs 4m 15s the patch passed
+1 💚 shadedclient 34m 58s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 48s hadoop-common in the patch passed.
+1 💚 unit 2m 57s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 58s The patch does not generate ASF License warnings.
244m 56s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6938/19/artifact/out/Dockerfile
GITHUB PR #6938
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 3ccb1076968e 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0e1207a
Default Java Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6938/19/testReport/
Max. process+thread count 1263 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6938/19/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Change-Id: I956c3eeded349291cc13feaf6eac68a1a368949a
@hadoop-yetus:

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 58s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 20 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 7s Maven dependency ordering for branch
+1 💚 mvninstall 32m 5s trunk passed
+1 💚 compile 17m 24s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 compile 15m 51s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 checkstyle 4m 22s trunk passed
+1 💚 mvnsite 2m 42s trunk passed
+1 💚 javadoc 1m 57s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 1m 43s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 3m 57s trunk passed
+1 💚 shadedclient 34m 34s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for patch
+1 💚 mvninstall 1m 27s the patch passed
+1 💚 compile 17m 0s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javac 17m 0s the patch passed
+1 💚 compile 16m 16s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 javac 16m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 20s root: The patch generated 0 new + 25 unchanged - 1 fixed = 25 total (was 26)
+1 💚 mvnsite 2m 40s the patch passed
+1 💚 javadoc 1m 54s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 1m 46s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 4m 18s the patch passed
+1 💚 shadedclient 34m 59s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 49s hadoop-common in the patch passed.
+1 💚 unit 2m 56s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 5s The patch does not generate ASF License warnings.
244m 12s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6938/20/artifact/out/Dockerfile
GITHUB PR #6938
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 842c8f4fd9b4 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a559590
Default Java Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6938/20/testReport/
Max. process+thread count 1263 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6938/20/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

*/
@InterfaceAudience.Public
@InterfaceStability.Unstable
public final class FutureIO {

private static final Logger LOG =
LoggerFactory.getLogger(TaskPool.class);
Contributor:
Shouldn't this be FutureIO.class?

@shameersss1 (Contributor) left a comment:
Thanks Steve for putting this together.
The changes look good to me. +1
