Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/1.1 header parser received no bytes with k8s and wave #3983

Closed
rulixxx opened this issue May 29, 2023 · 6 comments
Closed

HTTP/1.1 header parser received no bytes with k8s and wave #3983

rulixxx opened this issue May 29, 2023 · 6 comments

Comments

@rulixxx
Copy link

rulixxx commented May 29, 2023

Bug report

On two of my pipelines running on EKS/wave/fusion I keep getting a HTTP/1.1 header parser received no bytes error that kills the pipeline at different stages of the workflows (non-deterministic). Workflows were working OK until Friday last week.

Expected behavior and actual behavior

Pipeline can run processes and retry on errors.

Error is killing entire workflow with a HTTP/1.1 header parser received no bytes error.

Steps to reproduce the problem

Its a bit difficult to reproduce as pipeline can run OK for about 1-3 hrs but then fail with the HTTP/1.1 header parser received no bytes error. Its affecting other workflows so suspect is wave or fusion that might causing problems.

Environment

  • Nextflow version: 22.10.7
  • Java version: 17.0.5
  • Operating system: AWS/EKS/wave/fusion

abbreviated log

~> TaskHandler[id: 547; name: PARALLEL:ADD_RB (ELIB034); status: RUNNING; exit: -; error: -; workDir: s3://fl86-pipeline-research/user/ralcantara/work/M-HBV-BULK-NANOWES-23-02/NanoSeq/4d/242e4ddde463e3f4f95ab756b7860b]
~> TaskHandler[id: 558; name: PARALLEL:NEAT (ELIB034); status: RUNNING; exit: -; error: -; workDir: s3://fl86-pipeline-research/user/ralcantara/work/M-HBV-BULK-NANOWES-23-02/NanoSeq/45/694fa977dacc9ce57c089df10ea337]
.. remaining tasks omitted.
May-29 21:20:47.165 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 558; name: PARALLEL:NEAT (ELIB034); status: COMPLETED; exit: 0; error: -; workDir: s3://fl86-pipeline-research/user/ralcantara/work/M-HBV-BULK-NANOWES-23-02/NanoSeq/45/694fa977dacc9ce57c089df10ea337]
May-29 21:20:47.225 [Actor Thread 112] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=[name:ELIB034] size=13
May-29 21:22:27.156 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 541; name: PARALLEL:ADD_RB (ELIB034); status: COMPLETED; exit: 0; error: -; workDir: s3://fl86-pipeline-research/user/ralcantara/work/M-HBV-BULK-NANOWES-23-02/NanoSeq/76/6cab5c6ea88ea14f127e6a84d4d591]
May-29 21:22:27.219 [Actor Thread 106] DEBUG nextflow.extension.GroupTupleOp - GroupTuple dynamic size: key=[name:ELIB034] size=13
May-29 21:22:27.234 [Actor Thread 106] DEBUG io.seqera.wave.plugin.WaveClient - Wave request: https://wave.seqera.io/container-token; attempt=1 - request: SubmitContainerTokenRequest(towerAccessToken:eyJ0aWQiOiA3NDQ0fS5mNmYyZjIyOGNkOGFiYWZhMmYwMWU4NjBjNTlkZWI1MmFkZDU4MDNi, towerRefreshToken:null, towerWorkspaceId:263979779054039, towerEndpoint:https://api.tower.nf, containerImage:quay.io/rulixxx/nanoseq:3.3.2, containerFile:null, containerConfig:ContainerConfig(entrypoint:[/opt/fusion/entrypoint.sh], cmd:null, env:null, workingDir:null, layers:[ContainerLayer[location=https://fusionfs.seqera.io/releases/pkg/0/6/5/fusionfs-amd64.tar.gz; tarDigest=sha256:75eb1a83c6458cb578f74ead02bc4550bd8b751c71629f6d6f05d5fc9b615267; gzipDigest=sha256:61a09b63b9e79eb11a0a102f7a13f47ed5c18bb5c45f2873d8cd391d67adc887; gzipSize=6192047]]), condaFile:null, containerPlatform:null, buildRepository:null, cacheRepository:null, timestamp:2023-05-29T21:22:27.234658728Z, fingerprint:62c7c73e12525e02401d902f4cf9fde7)
May-29 21:22:27.249 [Actor Thread 106] ERROR nextflow.processor.TaskProcessor - Error executing process > 'PARALLEL:NEAT (ELIB034)'

Caused by:
  HTTP/1.1 header parser received no bytes

java.util.concurrent.ExecutionException: java.io.IOException: HTTP/1.1 header parser received no bytes
	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:547)
	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:113)
	at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:240)
	at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2317)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2283)
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159)
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
	at com.google.common.cache.LocalCache.get(LocalCache.java:3966)
	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
	at io.seqera.wave.plugin.WaveClient.fetchContainerImage(WaveClient.groovy:368)
	at io.seqera.wave.plugin.resolver.WaveContainerResolver.waveContainer(WaveContainerResolver.groovy:109)
	at io.seqera.wave.plugin.resolver.WaveContainerResolver.resolveImage(WaveContainerResolver.groovy:71)
	at jdk.internal.reflect.GeneratedMethodAccessor321.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSite.invoke(PogoMetaMethodSite.java:170)
	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:73)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
	at nextflow.processor.TaskRun.memoizedMethodPriv$getContainerInfo0(TaskRun.groovy:600)
	at jdk.internal.reflect.GeneratedMethodAccessor319.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:362)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:176)
	at nextflow.processor.TaskRun$_closure3.doCall(TaskRun.groovy)
	at nextflow.processor.TaskRun$_closure3.doCall(TaskRun.groovy)
	at jdk.internal.reflect.GeneratedMethodAccessor318.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.lambda$call$0(Memoize.java:137)
	at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:137)
	at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:113)
	at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.call(Memoize.java:136)
	at groovy.lang.Closure.call(Closure.java:406)
	at java_util_concurrent_Callable$call.call(Unknown Source)
	at nextflow.processor.TaskRun.getContainerInfo0(TaskRun.groovy)
	at nextflow.processor.TaskRun.getContainer(TaskRun.groovy:608)
	at nextflow.processor.TaskRun.isContainerEnabled(TaskRun.groovy:641)
	at nextflow.processor.TaskRun$isContainerEnabled$10.call(Unknown Source)
	at nextflow.processor.TaskProcessor.createTaskHashKey(TaskProcessor.groovy:2013)
	at jdk.internal.reflect.GeneratedMethodAccessor317.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:193)
	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:61)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
	at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:599)
	at nextflow.processor.InvokeTaskAdapter.call(InvokeTaskAdapter.groovy:53)
	at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
	at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
	at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
	at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:586)
	at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:123)
	at io.seqera.wave.plugin.WaveClient.sendRequest0(WaveClient.groovy:190)
	at io.seqera.wave.plugin.WaveClient.sendRequest(WaveClient.groovy:165)
	at io.seqera.wave.plugin.WaveClient.sendRequest(WaveClient.groovy:150)
	at io.seqera.wave.plugin.WaveClient$_fetchContainerImage_closure4.doCall(WaveClient.groovy:368)
	at io.seqera.wave.plugin.WaveClient$_fetchContainerImage_closure4.call(WaveClient.groovy)
	at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4868)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3533)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2282)
	... 59 common frames omitted
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
	at java.net.http/jdk.internal.net.http.common.Utils.wrapWithExtraDetail(Utils.java:348)
	at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:675)
	at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.checkForErrors(Http1AsyncReceiver.java:302)
	at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:268)
	at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
	at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
	at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
	... 3 common frames omitted
Caused by: java.net.SocketException: Connection reset
	at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394)
	at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426)
	at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1170)
	at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:833)
	at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:181)
	at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
	at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:303)
	at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:256)
	at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:774)
	at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:957)
	at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:253)
	at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:979)
	at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:934)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
@rulixxx
Copy link
Author

rulixxx commented May 29, 2023

Full log
nextflow.log

@pditommaso
Copy link
Member

A retry policy has been added to better handle this 👉 1daebee

@pditommaso pditommaso added this to the 23.10.0 milestone May 30, 2023
@pditommaso
Copy link
Member

It looks like the the cause of this error is a bug in the Http client on Java 17. It should be solved by Java 17.0.7

@rulixxx
Copy link
Author

rulixxx commented Jun 11, 2023

Thanks for looking into this Paolo. I started trying the 23.04.02 release as soon as it came out and thus far it's been looking good...

@pditommaso
Copy link
Member

pditommaso commented Sep 1, 2023

More evidence suggests the cause of this issue is the excessive timeout of the connection pool used by the Java HTTP client that by default is 1200 seconds (20 mins!). Changing it to 10 seconds via the following setting it looks like to address the problem

-Djdk.httpclient.keepalive.timeout=10

@pditommaso
Copy link
Member

Solved by fa961e7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants