Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cancellation to initial socket connection #1428

Merged
merged 1 commit into from
Dec 14, 2023

Conversation

lukebakken
Copy link
Contributor

@lukebakken lukebakken commented Nov 22, 2023

Part of the fix to #1420

@lukebakken lukebakken added this to the 7.0.0 milestone Nov 22, 2023
@lukebakken lukebakken self-assigned this Nov 22, 2023
Base automatically changed from rabbitmq-dotnet-client-1425 to main November 22, 2023 18:03
@lukebakken lukebakken force-pushed the rabbitmq-dotnet-client-1420 branch 2 times, most recently from 8c68cf8 to 7e14222 Compare November 23, 2023 14:47
@lukebakken
Copy link
Contributor Author

lukebakken commented Dec 7, 2023

Thanks @danielmarbach!

cc @paulomorgado @bording (in case y'all have an idea)

I'm busy trying to figure out the following exception (from some of the recent workflow runs). The exception normally would indicate that RabbitMQ closed the connection abruptly, but the RabbitMQ logs state the following:

2023-12-06 22:12:01.703000+00:00 [warning] <0.54213.0> closing AMQP connection <0.54213.0> (127.0.0.1:56730 -> 127.0.0.1:5672 - I.TestInitialConnection.TestBasicConnectionRecoveryWithHostnameList:2023-12-06T22:11:27.4855406Z:16, vhost: '/', user: 'guest'):
2023-12-06 22:12:01.703000+00:00 [warning] <0.54213.0> client unexpectedly closed TCP connection

This error never happens on my local workstation, and only happens with .NET Framework 4.7.2.

This error (socket error 10054) has cropped up over the years with this client library and RabbitMQ with no clear reason why.

Exception Stacktrace
Failed Test.Integration.TestInitialConnection.TestBasicConnectionRecoveryWithHostnameList [22 s]
  Error Message:
   RabbitMQ.Client.Exceptions.BrokerUnreachableException : None of the specified endpoints were reachable
---- System.IO.IOException : Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.
-------- System.Net.Sockets.SocketException : An existing connection was forcibly closed by the remote host
  Stack Trace:
     at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\api\ConnectionFactory.cs:line 692
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEnumerable`1 hostnames, String clientProvidedName) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\api\ConnectionFactory.cs:line 537
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEnumerable`1 hostnames) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\api\ConnectionFactory.cs:line 490
   at Test.IntegrationFixtureBase.CreateAutorecoveringConnection(IEnumerable`1 hostnames, TimeSpan requestedConnectionTimeout, TimeSpan networkRecoveryInterval) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\Test\Common\IntegrationFixtureBase.cs:line 214
   at Test.IntegrationFixtureBase.CreateAutorecoveringConnection(IList`1 hostnames) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\Test\Common\IntegrationFixtureBase.cs:line 202
   at Test.Integration.TestInitialConnection.TestBasicConnectionRecoveryWithHostnameList() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\Test\Integration\TestInitialConnection.cs:line 49
----- Inner Stack Trace -----
   at System.Net.Sockets.NetworkStream.BeginWrite(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)
   at System.IO.Stream.<>c.<BeginEndWriteAsync>b__53_0(Stream stream, ReadWriteParameters args, AsyncCallback callback, Object state)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncTrim[TInstance,TArgs](TInstance thisRef, TArgs args, Func`5 beginMethod, Func`3 endMethod)
   at System.IO.Stream.BeginEndWriteAsync(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Stream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
   at System.IO.Pipelines.StreamExtensions.WriteAsync(Stream stream, ReadOnlyMemory`1 buffer, CancellationToken cancellationToken)
   at System.IO.Pipelines.StreamPipeWriter.<FlushAsyncInternal>d__39.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
   at RabbitMQ.Client.Impl.SocketFrameHandler.<SendProtocolHeaderAsync>d__37.MoveNext() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\SocketFrameHandler.cs:line 293
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
   at RabbitMQ.Client.Framing.Impl.Connection.<StartAndTuneAsync>d__4.MoveNext() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\Connection.Commands.cs:line 90
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable.ConfiguredValueTaskAwaiter.GetResult()
   at RabbitMQ.Client.Framing.Impl.Connection.<OpenAsync>d__89.MoveNext() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\Connection.cs:line 233
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at RabbitMQ.Client.Framing.Impl.Connection.<OpenAsync>d__89.MoveNext() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\Connection.cs:line 252
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at RabbitMQ.Client.TaskExtensions.EnsureCompleted[T](ValueTask`1 task) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\TaskExtensions.cs:line 158
   at RabbitMQ.Client.Framing.Impl.Connection.Open() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\Connection.cs:line 217
   at RabbitMQ.Client.Framing.Impl.AutorecoveringConnection.Open() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\AutorecoveringConnection.cs:line 72
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\api\ConnectionFactory.cs:line 680
----- Inner Stack Trace -----
   at System.Net.Sockets.Socket.BeginSend(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, AsyncCallback callback, Object state)
   at System.Net.Sockets.NetworkStream.BeginWrite(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)

@paulomorgado
Copy link
Contributor

Sorry. Networking is not really one of my expertises.

This might be related to some networking equipment/software between the client and the server.

I don't know if there are some ETW events you can collect to investigate.

You may find interesting tools here: https://www.youtube.com/watch?v=wf2M0GNQpWI

@lukebakken
Copy link
Contributor Author

@paulomorgado thanks for taking a look.

This might be related to some networking equipment/software between the client and the server

Yep, normally I would tell a RabbitMQ user "your firewall or load balancer is acting up" but in the case of GitHub Actions and Windows, RabbitMQ and the tests are running on the same machine, communicating via the loopback adapter.

@bording
Copy link
Collaborator

bording commented Dec 7, 2023

I'm pretty sure I have observed the same problem occasionally, but I also don't have any insights to offer as to the cause.

@paulomorgado
Copy link
Contributor

Is this something that can be forced to happen?

@lukebakken
Copy link
Contributor Author

Is this something that can be forced to happen?

I sure wish it were!

@lukebakken lukebakken force-pushed the rabbitmq-dotnet-client-1420 branch 2 times, most recently from 81b4128 to f62dcda Compare December 11, 2023 15:45
@lukebakken lukebakken marked this pull request as ready for review December 11, 2023 16:50
Part of the fix to #1420

Use a linked token source to combine external cancellation and timeout cancelltion for initial RMQ connection

Fix API since we have to use the same for all target frameworks.

Fix the order of operations when establishing a connection.

Add missing ConfigureAwait

Add some more missing ConfigureAwait(false) statements.

A timeout of 1 second can result in OperationCanceledException

Validate hostname in SocketFrameHandler to restore expectation that an exception will be thrown. Ugh.

Increase connection timeout for CI

Add helpful message to connection failure.

Fix tests in a very ugly way. Much TODO

Make endpoint resolver SelectOne async.

Pass on CancellationToken

Pass cancellation token into more connection establishment methods.

Fix expected exception for net472

Use Async method otherwise GetAwaiter/GetResult locks up

Change expected exception type

Increase connection timeout for CI due to slow runners

Try out idea to fix odd CI error in GHA

Bump version

Upgrade dependencies.

Modify Windows GHA setup to explicitly create firewall rules for Erlang programs.

Tweak setting up firewalls on Windows

Use Get-Random to help ensure unique fw rule names
@lukebakken
Copy link
Contributor Author

lukebakken commented Dec 11, 2023

@stebet @danielmarbach thanks again for taking a look. I realize this isn't a complete use of cancellation tokens, but is a good start. I'd like to merge this soon and produce another alpha release. Community users are using and testing out version 7 alpha releases already.

@lukebakken
Copy link
Contributor Author

Oh, and magically it seems like that socket error is less likely to happen 🤷‍♂️

@lukebakken
Copy link
Contributor Author

No comments in a couple days, merging and producing a new 7.0 alpha!

@lukebakken lukebakken merged commit 3794b4c into main Dec 14, 2023
11 checks passed
@lukebakken lukebakken deleted the rabbitmq-dotnet-client-1420 branch December 14, 2023 13:59
@danielmarbach
Copy link
Collaborator

I'm happy to give feedback but sometimes I just can't do that in a matter of days and especially not before holiday season

@lukebakken
Copy link
Contributor Author

I'm happy to give feedback but sometimes I just can't do that in a matter of days and especially not before holiday season

No worries @danielmarbach. All your feedback is appreciated. I just want to keep the ball rolling on version 7 with new alpha releases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants