
Misleading reactor.netty.http.client.PrematureCloseException: Connection prematurely closed BEFORE response #2825

Closed
himanshunp opened this issue Jun 8, 2023 · 2 comments

himanshunp commented Jun 8, 2023

When we use Spring Boot's WebClient with the Netty connection provider (or the HttpClient directly) and send an HTTP POST with a very large JSON payload, it fails with reactor.netty.http.client.PrematureCloseException: Connection prematurely closed BEFORE response.
This is misleading: the real HTTP 400 BAD_REQUEST is never propagated. I went down the rabbit hole of this guide: https://projectreactor.io/docs/netty/release/reference/index.html#faq.connection-closed and still could not figure it out until I kept increasing the load on the REST API.

I was able to reproduce this locally against a plain Spring Boot app, without any load balancer or proxy in between.
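
For context, a client configured roughly like this is enough to hit it; this is only a sketch of the setup (pool name, size, and base URL are illustrative), the actual configuration is in the reproducer linked in the steps below:

  import org.springframework.http.client.reactive.ReactorClientHttpConnector;
  import org.springframework.web.reactive.function.client.WebClient;
  import reactor.netty.http.client.HttpClient;
  import reactor.netty.resources.ConnectionProvider;

  // Illustrative setup only: Spring WebClient backed by a Reactor Netty connection pool.
  public WebClient buildClientWithConnectionPool() {
    ConnectionProvider pool = ConnectionProvider.create("svc-pool", 50);
    return WebClient.builder()
        .baseUrl("http://localhost:8080") // local Spring Boot app, no LB/proxy in between
        .clientConnector(new ReactorClientHttpConnector(HttpClient.create(pool)))
        .build();
  }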

Expected Behavior

The client should propagate the actual HTTP 400 BAD_REQUEST error instead of masking it with PrematureCloseException.

Actual Behavior

The call fails with reactor.netty.http.client.PrematureCloseException: Connection prematurely closed BEFORE response.

Steps to Reproduce

Run the unit test:
https://github.com/himanshunp/spring-web-client/blob/master/src/test/java/com/parmar/himanshu/spring/webclient/ControllerTest.java#L37
The test takes three parameters; the last one uses a very large JSON payload and reproduces the error above. There should be a better way to capture the HTTP 400 and the response body.
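
For illustration, this is roughly how I would expect to be able to capture the 400 and its body (the error mapping below is hypothetical; with the very large payload the chain never gets this far and fails with PrematureCloseException instead):

  import java.util.UUID;
  import reactor.core.publisher.Mono;

  // Hypothetical handling: map a 400 into an exception carrying the response body.
  public Mono<Integer> callExpectingBadRequestToSurface(String payload) {
    SampleRequest req = new SampleRequest(UUID.randomUUID().toString(), payload);
    return svcWebClientWithConnectionPool
        .post()
        .uri("/payload-size")
        .bodyValue(req)
        .retrieve()
        .onStatus(status -> status.value() == 400,
            response -> response.bodyToMono(String.class)
                .map(body -> new IllegalStateException("400 Bad Request: " + body)))
        .bodyToMono(Integer.class);
  }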

Possible Solution

I was able to track this down by using an X-Request-ID header. The documentation should add this as a debugging step.
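
For example (the header name is just what I used; nothing in Reactor Netty requires it), a correlation id attached on the client side can be matched against the server logs even when the client only sees the PrematureCloseException; a minimal sketch:

  import java.util.UUID;

  // Sketch: tag each request so the failing one can be located in the server logs.
  public Integer callWithCorrelationId(SampleRequest req) {
    String requestId = UUID.randomUUID().toString();
    return svcWebClientWithConnectionPool
        .post()
        .uri("/payload-size")
        .header("X-Request-ID", requestId)
        .bodyValue(req)
        .retrieve()
        .bodyToMono(Integer.class)
        .block();
  }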

Your Environment

  • Reactor version(s) used: 3.5.6
  • Other relevant libraries versions (eg. netty, ...): reactor-netty-core: 1.1.7
  • JVM version (java -version): 20
  • OS and version (eg. uname -a): macOS Monterey 12.6.6 Darwin himanshus-mbp.lan 21.6.0 Darwin Kernel Version 21.6.0: Mon Apr 24 21:10:53 PDT 2023; root:xnu-8020.240.18.701.5~1/RELEASE_X86_64 x86_64
@himanshunp himanshunp added status/need-triage A new issue that still need to be evaluated as a whole type/bug A general bug labels Jun 8, 2023
@violetagg violetagg removed the status/need-triage A new issue that still need to be evaluated as a whole label Jun 9, 2023
@violetagg violetagg self-assigned this Jun 9, 2023
@pderop pderop assigned pderop and unassigned violetagg Jul 13, 2023
@pderop pderop added this to the 1.0.35 milestone Jul 27, 2023

pderop commented Jul 27, 2023

Hi @himanshunp,

Thank you for reporting this and for the nice reproducer project.

PR #2864 may alleviate the issue. I have targeted it at the 1.0.x branch.
Please expect some time before it can be reviewed, validated, and merged into the main 1.1.x branch.

In the meantime, as a workaround, I think you can use Flux instead of Mono when sending the body in your reproducer; that might avoid the issue.
For example, in the SvnClient.callWithWebclientWithConnectionPool method, you could make this change:

  public Integer callWithWebclientWithConnectionPool(String payload) {
    log.info("calling webclient with connection pool");
    SampleRequest req = new SampleRequest(UUID.randomUUID().toString(), payload);
    return svcWebClientWithConnectionPool
        .post()
        .uri("/payload-size")
        //.body(Mono.just(req), SampleRequest.class)
        .body(Flux.just(req), SampleRequest.class)
        .retrieve()
        .bodyToMono(Integer.class)
        .block();
  }

That might fix the problem, because when using body(Mono) there was a missing read() that the PR corrects.
However, note that in some situations, even if we do the read, if the early 400 response arrives in multiple TCP frames (for example, the first with the headers, the second with the body, and the third with the last zero-length chunk), then, since the connection is reset (TCP/RST), the client might miss the last zero-length chunk and you might still get a premature close exception during the response (in such a situation, even curl is sometimes unable to retrieve the 400 Bad Request).
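
As a side note, wire logging on the underlying HttpClient (the approach described in the connection-closed FAQ) is one way to observe the early 400 and the order of the close on the client side; a minimal sketch, assuming a Reactor Netty based connector:

  import io.netty.handler.logging.LogLevel;
  import reactor.netty.http.client.HttpClient;
  import reactor.netty.transport.logging.AdvancedByteBufFormat;

  // Sketch only: log decoded request/response frames to see what the client
  // actually receives before the connection is closed.
  HttpClient debugClient = HttpClient.create()
      .wiretap("reactor.netty.http.client.HttpClient",
          LogLevel.DEBUG, AdvancedByteBufFormat.TEXTUAL);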

thanks.

@himanshunp (Author)

Thank you so much @pderop for such detailed information. Really appreciate it. Can't wait for the new release to come out.

@violetagg violetagg modified the milestones: 1.0.35, 1.0.36 Aug 9, 2023
@violetagg violetagg modified the milestones: 1.0.36, 1.0.37 Sep 4, 2023
@pderop pderop closed this as completed in 50b24b3 Sep 15, 2023
pderop added a commit that referenced this issue Sep 22, 2023
Fixed flaky test in HttpClientWithTomcatTest.testIssue2825.

To reproduce the issue, we need Tomcat to close the connection while the client is still writing. This flakiness occurs because Tomcat closes the connection without reading all remaining data. Depending on the unread data’s size, it may result in TCP sending a TCP/RST instead of a FIN. When the client receives TCP/RST, some or all unread data may be dropped.

So the socket send buffer size in HttpClient has been reduced, which eliminated the flakiness of the test and most of the TCP/RST occurrences. Additionally, returning a 400 Bad Request without chunked encoding reduces the chance of losing data, as it sends only one TCP segment (compared to two segments with chunked encoding). These workarounds seem to fix the instability of the test, and if the patch is disabled, the PrematureCloseException reliably reoccurs in the test. I also removed the retries; the tests now run in around 1.5 to 2 seconds.
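
For reference, the send buffer reduction mentioned above is done through a channel option on the Reactor Netty HttpClient; the value below is illustrative, not the one used in the test:

  import io.netty.channel.ChannelOption;
  import reactor.netty.http.client.HttpClient;

  // Illustrative value: a small SO_SNDBUF means the client notices the server's
  // early response/close before the whole request body has been flushed.
  HttpClient client = HttpClient.create()
      .option(ChannelOption.SO_SNDBUF, 4096);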

The test for the case where HttpClient sends the request using Flux has been removed because it seems unstable; it may be a different problem, which should be addressed in a separate issue.

Related to #2864 #2825