
Segmentation fault in async_client only with SSL support #856

Closed
ropieur opened this issue Sep 9, 2021 · 33 comments

ropieur commented Sep 9, 2021

Hi,
My application crashes with a segmentation fault when I use async_client with SSL support.
We managed to reproduce the issue in our environment by performing a sequence of steps (which I can't detail here), and the issue does not occur without SSL. Possibly this bug is in ASIO rather than in the MQTT library.
During my investigations I came across a very similar issue that I wanted to share: xtreemfs/xtreemfs#281. My understanding is that the issue is related to the lifetime of the Boost.Asio socket, so I cannot conclude whether it is really an ASIO bug or a misuse by the MQTT library.

redboltz (Owner) commented Sep 9, 2021

Could you post a Minimal, Reproducible Example?
https://stackoverflow.com/help/minimal-reproducible-example


ropieur commented Sep 9, 2021

Hi redboltz,

Thank you for your quick reaction.
I was expecting such a request ;-) I'll need to discuss internally whether I can get the budget ...

In the meantime, here is the stack trace from the crash; you can see that it is quite similar to the one reported in xtreemfs/xtreemfs#281. It occurs after a disconnection from the broker.

#0  0x00007f649001cee0 in ?? ()
#1  0x00007f64cfb79a4a in BIO_write () from /home/cbo/debug-dmm-on-diasimul/20210909T111110/usr/local/dmm/lib/libcrypto.so.1.1
#2  0x0000000000e9d34f in boost::asio::ssl::detail::engine::put_input(boost::asio::const_buffer const&) [clone .isra.124] ()
#3  0x0000000000f0ea39 in boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, boost::asio::ssl::detail::read_op<boost::asio::mutable_buffer>, boost::asio::detail::read_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::executor_binder<std::function<void (boost::system::error_code, unsigned long)>, boost::asio::io_context::strand> > >::operator()(boost::system::error_code, unsigned long, int) ()
#4  0x0000000000ea0cba in void boost::asio::io_context::strand::dispatch<boost::asio::detail::binder2<boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, boost::asio::ssl::detail::read_op<boost::asio::mutable_buffer>, boost::asio::detail::read_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::executor_binder<std::function<void (boost::system::error_code, unsigned long)>, boost::asio::io_context::strand> > >, boost::system::error_code, unsigned long>, std::allocator<void> >(boost::asio::detail::binder2<boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, boost::asio::ssl::detail::read_op<boost::asio::mutable_buffer>, boost::asio::detail::read_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::executor_binder<std::function<void (boost::system::error_code, unsigned long)>, boost::asio::io_context::strand> > >, boost::system::error_code, unsigned long>&&, std::allocator<void> const&) const [clone .isra.6872] ()
#5  0x0000000000f1048d in boost::asio::detail::completion_handler<boost::asio::detail::binder2<boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, boost::asio::ssl::detail::read_op<boost::asio::mutable_buffer>, boost::asio::detail::read_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::executor_binder<std::function<void (boost::system::error_code, unsigned long)>, boost::asio::io_context::strand> > >, boost::system::error_code, unsigned long>, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0u> >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) ()
#6  0x0000000000eae349 in boost::asio::detail::strand_service::do_dispatch(boost::asio::detail::strand_service::strand_impl*&, boost::asio::detail::scheduler_operation*) ()
#7  0x0000000000ea0dbc in void boost::asio::io_context::strand::dispatch<boost::asio::detail::binder2<boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, boost::asio::ssl::detail::read_op<boost::asio::mutable_buffer>, boost::asio::detail::read_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::executor_binder<std::function<void (boost::system::error_code, unsigned long)>, boost::asio::io_context::strand> > >, boost::system::error_code, unsigned long>, std::allocator<void> >(boost::asio::detail::binder2<boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, boost::asio::ssl::detail::read_op<boost::asio::mutable_buffer>, boost::asio::detail::read_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::executor_binder<std::function<void (boost::system::error_code, unsigned long)>, boost::asio::io_context::strand> > >, boost::system::error_code, unsigned long>&&, std::allocator<void> const&) const [clone .isra.6872] ()
#8  0x0000000000f1020f in boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffer, boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, boost::asio::ssl::detail::read_op<boost::asio::mutable_buffer>, boost::asio::detail::read_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::executor_binder<std::function<void (boost::system::error_code, unsigned long)>, boost::asio::io_context::strand> > >, boost::asio::any_io_executor>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) ()
#9  0x0000000000c13ab5 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
#10 0x0000000000ea31bf in Mobility::DMM::Mqtt::Publisher::run() ()
#11 0x00000000013dd1eb in thread_proxy ()
#12 0x00007f64cf6a1f94 in start_thread (arg=0x7f6499ffb700) at pthread_create.c:309
#13 0x00007f64ce9393cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111


ropieur commented Sep 9, 2021

Here is the interesting part of the xtreemfs ticket:

It was caused by the fact that newer boost (?) versions try to write to the socket, even after the SSL stream was shutdown. However, the socket was already deleted when the connection was closed. Usually, programs using boost::asio use a shared_ptr for the socket and are therefore not affected by such obstacles. However, we once agreed to avoid using shared_ptr and therefore do not delete the socket object before the connection is closed. Evidently, this isn't enough.

Maybe it can give you some hints.


redboltz commented Sep 9, 2021

I just looked at the issue and its solution. Unfortunately, I didn't get any useful information from it.
I use the TLS and async_client combination in many applications, but no errors have been reported. That combination is also part of the CI tests.

I will start some analysis once you post a Minimal, Reproducible Example.


redboltz commented Sep 9, 2021

I also recommend running valgrind and/or AddressSanitizer on your application. You would get some information from them.


redboltz commented Sep 9, 2021

I noticed that #78 might be causing the issue.
I have an idea to improve the #78 fix, related to the async_shutdown call mentioned there.
When I send a PR for that, could you check it on your reproducible project?


ropieur commented Sep 9, 2021

@redboltz, yes, sure, I'll give it a try.


redboltz commented Sep 9, 2021

@ropieur
Could you tell me your environment information?
OS version, OpenSSL version, Boost version, and mqtt_cpp version (if it is a released version; otherwise the commit hash).


redboltz commented Sep 9, 2021

Additional questions:

You said that you are using async_client. Please tell me about the disconnecting situation.

  1. Which factory function is used to create the async_client?
  2. In the problem situation, is the disconnect initiated by the broker or by the client?
     If by the client, which API are you using?
  3. Is your application multi-threaded?

redboltz added commits that referenced this issue (Sep 10–11, 2021): Implemented clean shutdown.

ropieur commented Sep 11, 2021

Hi redboltz,

  1. mqtt::make_tls_async_client
  2. The disconnect is initiated by the broker.
  3. The client resides inside an active object (a single thread running an io_context). All calls from other threads are posted to the io_context; there is NO direct call to the client from another thread. A minimal sketch of this pattern is shown below.
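As a rough illustration, here is a minimal sketch of that active-object pattern (the class name matches my later snippets, but the body here is illustrative only, not our actual code):

    #include <boost/asio.hpp>
    #include <thread>
    #include <utility>

    // Active object: one io_context running on one dedicated thread; every call
    // coming from another thread is posted onto it, so the MQTT client is only
    // ever touched from that single thread.
    class mqtt_active_object {
    public:
        mqtt_active_object()
            : work_guard_(boost::asio::make_work_guard(ioc_))
            , thread_([this] { ioc_.run(); })
        {}

        ~mqtt_active_object() {
            work_guard_.reset();
            ioc_.stop();
            thread_.join();
        }

        // May be called from any thread; the work itself runs on the io_context thread.
        template <typename F>
        void post(F&& f) {
            boost::asio::post(ioc_, std::forward<F>(f));
        }

    private:
        boost::asio::io_context ioc_;
        boost::asio::executor_work_guard<boost::asio::io_context::executor_type> work_guard_;
        std::thread thread_;
    };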

redboltz commented:

Thank you.
Please answer the question #856 (comment) too.

The current master contains some fixes to the shutdown process.
https://github.com/redboltz/mqtt_cpp/tree/11d26fae1a6c87466a1eab02a494c89e18f84a1d

It still doesn't cleanly shut down the TLS layer, but on the TCP layer it now performs not only socket close() but also socket shutdown(). It might solve your problem.

redboltz added a commit that referenced this issue Sep 12, 2021: Implemented clean shutdown.

ropieur commented Sep 12, 2021

OS: Ubuntu 20.04
Boost: 1.76 and 1.77
mqtt_cpp: 8.0.0 and 11.1.0
OpenSSL: 1.1.1


redboltz commented Sep 13, 2021

Thank you.

| layer | operation | implemented mqtt_cpp version |
|---|---|---|
| WebSocket | send close packet, then read until closed or error | #860 |
| TLS | (async_)shutdown | #860 |
| TCP | shutdown (no async version exists) | #860, 11d26fa |
| TCP | close (no async version exists) | #860, 11d26fa, 11.1.0, 8.0.0 |

Due to #78, versions 11.1.0 and 8.0.0 only close the TCP socket. That has worked well so far, but I guess it could cause the problem you reported.
Because of the missing TCP shutdown, #848 was found.
11d26fa (the current master HEAD) adds the TCP shutdown. It might solve your problem.

Finally, I implemented graceful shutdown for all layers in #860. It is not merged yet. While implementing #860, I ran into a lot of blocking problems. I will test #860 on my proprietary project's production code before merging it.

Could you test the following two versions?

  1. 11d26fa
  2. #860 ("Fixed #856")

redboltz commented:

The PR #860 works fine on my project. I'm going to merge it.
But I haven't experienced your problem on my project, so although I think #860 should be merged, I'm not sure it will solve your problem.
@ropieur, is your problem solved?


ropieur commented Sep 17, 2021

@redboltz, we are working on it. I delivered a package internally built from the branch fix_856. We need to resolve issues on our platform and then we will retest the disconnection from the broker. It may take multiple days before we get there. Anyway, I'll keep you posted with the result in this ticket.

redboltz commented:

@ropieur thank you! I will wait for your feedback :)


ropieur commented Sep 17, 2021

We managed to reproduce the crash with the branch fix_856, and I get the exact same backtrace from the core dump.
FYI, it is quite difficult to reproduce: the tester needs to perform a sequence of manipulations multiple times, because the issue does not necessarily show up immediately.


ropieur commented Sep 17, 2021

In the error handler, my app typically performs the following operations:

this->tls_async_client->set_error_handler([this](const auto &error) { // triggered error = connection reset by peer
    this->tls_async_client->async_force_disconnect();
    this->steady_timer.expires_after(2s);
    this->steady_timer.async_wait([this](const auto &error) {
        if (error) return; // timer canceled
        this->tls_async_client->async_connect();
    });
});

Question:
May I invoke async_force_disconnect from the error handler? (Originally, the code invoked force_disconnect)


redboltz commented Sep 17, 2021

set_error_handler's callback is called from on_error() as follows:

bool handle_close_or_error(error_code ec) {
    if (connected_) {
        if (!ec) return false;
        MQTT_LOG("mqtt_impl", trace)
            << MQTT_ADD_VALUE(address, this)
            << "handle_close_or_error call shutdown";
        shutdown(socket());
    }
    connect_requested_ = false;
    clean_sub_unsub_inflight();
    if (disconnect_requested_) {
        on_close();
        disconnect_requested_ = false;
    }
    else {
        if (!ec) ec = boost::system::errc::make_error_code(boost::system::errc::not_connected);
        on_error(ec);

At line 5051, shutdown() is called internally, so you don't need to call async_force_disconnect().

So I recommend the following code:

this->tls_async_client->set_error_handler([this](const auto &error) { // triggered error = connection reset by peer

    // remove async_force_disconnect()

    this->steady_timer.expires_after(2s);
    this->steady_timer.async_wait([this](const auto &error) { // capture a weak_ptr to this to avoid firing after cancellation
        if (error) return; // timer canceled
        this->tls_async_client->async_connect();
    });
});

The timer can fire with a success error code even after it has been cancelled (destroyed):
https://stackoverflow.com/questions/43045192/how-to-avoid-firing-already-destroyed-boostasiodeadline-timer
It happens very rarely.
So you need to make sure of the objects' lifetimes in the handler.

The typical approach is for the async_wait handler to capture a weak_ptr to this (when this is owned by a shared_ptr).
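For example, a minimal sketch of that pattern (assuming the owning class is held by a shared_ptr; the class and member names are placeholders, not mqtt_cpp API):

    #include <boost/asio.hpp>
    #include <chrono>
    #include <memory>

    using namespace std::chrono_literals;

    // Placeholder class, only to illustrate the weak_ptr capture in async_wait.
    struct reconnector : std::enable_shared_from_this<reconnector> {
        explicit reconnector(boost::asio::io_context& ioc) : timer(ioc) {}

        void schedule_reconnect() {
            timer.expires_after(2s);
            // Capture a weak_ptr so the handler is a no-op if the owner has
            // already been destroyed by the time the timer fires.
            timer.async_wait(
                [wp = std::weak_ptr<reconnector>(shared_from_this())]
                (boost::system::error_code const& ec) {
                    auto self = wp.lock();
                    if (!self) return; // owner already gone
                    if (ec) return;    // timer cancelled
                    self->do_connect();
                }
            );
        }

        void do_connect() { /* e.g. tls_async_client->async_connect() */ }

        boost::asio::steady_timer timer;
    };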
For comparison, this is how mqtt_cpp's own async_force_disconnect() keeps the endpoint alive:

void async_force_disconnect(
    async_handler_t func = {}
) {
    MQTT_LOG("mqtt_api", info)
        << MQTT_ADD_VALUE(address, this)
        << "async_force_disconnect";
    socket_->post(
        [this, self = this->shared_from_this(), func = force_move(func)] {
            shutdown(socket());
            if (func) func(boost::system::errc::make_error_code(boost::system::errc::success));
        }
    );
}

FYI:

socket_->post() is:

MQTT_ALWAYS_INLINE void post(std::function<void()> handler) override final {
    as::post(
        strand_,
        force_move(handler)
    );
}

So it is protected by the strand, and it can be called after the socket has been closed.

Calling force_disconnect() in the timer handler is dangerous in a multi-threaded environment.

void force_disconnect() {
    MQTT_LOG("mqtt_api", info)
        << MQTT_ADD_VALUE(address, this)
        << "force_disconnect";
    shutdown(socket());
}

Because the timer handler's code can run on any thread, it is not protected by the strand, so users previously needed to post through the strand themselves. Now, async_force_disconnect() does that for you.
If you want the async_client to disconnect forcibly, then call async_disconnect().
Users don't need to call async_force_disconnect() in the close and/or error handler.

shutdown() has a guard to avoid being called twice:

void shutdown(MQTT_NS::socket& s) {
    MQTT_LOG("mqtt_impl", trace)
        << MQTT_ADD_VALUE(address, this)
        << "shutdown";
    if (shutdown_requested_) {
        MQTT_LOG("mqtt_impl", trace)
            << MQTT_ADD_VALUE(address, this)
            << "already shutdowned";
        return;
    }
    shutdown_requested_ = true;

redboltz commented:

FYI, it is quite difficult to reproduce it, because the tester needs to perform a sequence of manipulations multiple times to reproduce it (because the issue does not necessarily reproduce immediately).

This is a TLS client and server example:
https://github.com/redboltz/mqtt_cpp/blob/master/example/tls_both.cpp

This is an async client example:
https://github.com/redboltz/mqtt_cpp/blob/master/example/no_tls_async_client.cpp

This is a simple connect/subscribe/publish/disconnect example, but you can create a small reproducer as follows:

  1. Async client: async_connect()
  2. Server: force_disconnect()
  3. Async client: set a 2s timer and async_wait()
  4. On timer fired: async_connect(), go to step 2

It might reproduce the crash in your environment.
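As a rough sketch, the client side of those steps could look like this (based on the no_tls_async_client.cpp example above; host, port, and client id are placeholders, and the TLS case would use make_tls_async_client plus the TLS setup from tls_both.cpp instead):

    #include <mqtt_client_cpp.hpp>
    #include <boost/asio.hpp>
    #include <chrono>

    int main() {
        using namespace std::chrono_literals;
        boost::asio::io_context ioc;
        boost::asio::steady_timer timer(ioc);

        // Step 1: create the async client and connect (placeholder host/port).
        auto c = MQTT_NS::make_async_client(ioc, "localhost", "1883");
        c->set_client_id("repro");
        c->set_clean_session(true);

        // Steps 3/4: when the broker force-disconnects (step 2), wait 2s,
        // then reconnect and let the broker disconnect again.
        auto reconnect = [&] {
            timer.expires_after(2s);
            timer.async_wait([&](boost::system::error_code ec) {
                if (ec) return;     // timer cancelled
                c->async_connect(); // back to step 2
            });
        };
        c->set_close_handler(reconnect);
        c->set_error_handler([&](MQTT_NS::error_code) { reconnect(); });

        c->async_connect();
        ioc.run();
    }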

redboltz commented:

We managed to reproduce the crash with the branch fix_856 and I get the exact same backtrace from the coredump.

I suspect that the memory corruption happens somewhere in your application.
The backtrace doesn't contain any mqtt_cpp code.

#856 (comment)

Did valgrind and/or address sanitizer report nothing?

redboltz added a commit that referenced this issue Sep 20, 2021
redboltz reopened this Sep 20, 2021

ropieur commented Sep 21, 2021

Hi @redboltz,

The timer can fire with a success error code even after it has been cancelled (destroyed):
https://stackoverflow.com/questions/43045192/how-to-avoid-firing-already-destroyed-boostasiodeadline-timer
It happens very rarely.
So you need to make sure of the objects' lifetimes in the handler.

I'm aware of the various traps with asio, like the lifetime of the objects passed to the handlers, and the fact that cancelling socket and timer operations does not necessarily lead to an operation_aborted error in their handlers. I will assume for now that the corruption resides in my application and continue investigating on my side. My apologies if I took up too much of your time.
Thank you for your responsiveness and support, really appreciated!

redboltz commented:

OK, if you find the root cause of the issue on your application side, then please close this issue.

NOTE: #860 has been merged. It was worth merging for the clean (graceful) shutdown, even if it is not the cause of the issue.


ropieur commented Sep 23, 2021

Hi @redboltz,
Before invoking async_connect, my app starts a timer (handling the connection timeout) that will drop the current connection attempt and initiate a new one:

    void mqtt_active_object::connect()
    {
        this->timer.expires_after(5s);
        this->timer.async_wait([this](const auto &error) {
            if (error) { // canceled
                return;
            }
            if (!this->tls_async_client->connected()) { // skip if client connected meanwhile
                // this->tls_async_client->force_disconnect(); // with mqtt_cpp v8.0
                this->tls_async_client->async_force_disconnect(); // with mqtt_cpp v11.1 or later
                this->connect();
            }
        });
        this->tls_async_client->async_connect();
    }

Based on my observations, my feeling is that the sequence of [async_]force_disconnect followed by async_connect could potentially lead to the bug. With async_force_disconnect, the crash now appears almost immediately.
Could you tell me whether this usage is formally correct?

Note: timer and async_client both run on the same io_context which itself runs in a single thread (active object)

redboltz commented:

It is very difficult to understand. My first impression is that this is tricky code.

It seems to be a different situation from

Disconnect is initiated by the broker

#856 (comment)

Anyway, if this->tls_async_client->connected() returns false, then this->tls_async_client->async_force_disconnect() and this->tls_async_client->async_connect() run concurrently. Even if they share the same io_context and run on one thread, that is bad, because mqtt_cpp may run multiple async operations inside those APIs; this depends on the implementation and could change.
I'm not sure why async_force_disconnect() is required, but you can know when async_force_disconnect() has finished: simply pass a completion handler, async_force_disconnect(your_completion_handler), and call async_connect() in that handler.
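For example, a minimal sketch of that suggestion, dropped into the timer handler from the snippet above (assuming this stays alive for the duration of the operation):

    // Chain the reconnect onto the completion of async_force_disconnect()
    // so the two async operations never overlap (sketch only).
    this->tls_async_client->async_force_disconnect(
        [this](mqtt::error_code /*ec*/) {
            this->tls_async_client->async_connect();
        }
    );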

redboltz commented:

I noticed one more thing.
For async_connect(), you should add a completion handler.
See
https://github.com/redboltz/mqtt_cpp/blob/master/example/no_tls_async_client.cpp

All async APIs take a completion handler whose type is async_handler_t.
It should be added so that you know when the async operation completes.
That can eliminate query functions like connected().
It is the async coding style.

redboltz commented:

I noticed that after #860 was merged, the async_force_disconnect() handler is called before all shutdown processing has completed.
That could be wrong; I will take a look at it.
But it seems that it is not related to your issue.

redboltz commented:

I created #868, but I guess it is not directly related to this issue.

I considered what your code wants to do, and I assume you want to handle the case where the broker doesn't respond.
I updated your code as follows:

    void mqtt_active_object::connect()
    {
        // 1. call async_connect()
        this->tls_async_client->async_connect(
            [this](mqtt::error_code ec) {
                if (ec) {
                    // 4. The ec is set by async_connect() process, or
                    //    by the result of async_force_disconnect() call
                    //    Reconnect 3s later
                    this->retry_timer.expires_after(3s);
                    this->retry_timer.async_wait(
                        [this](const auto &error) {
                            if (error) { // canceled
                                return;
                            }
                            // 5. retry (goto step1)
                            this->connect();
                        }
                    );
                }
                else {
                    // 3. If async_connect() is succesfully finished,
                    //    cancel connect_timer.
                    this->connect_timer.cancel();
                }
            }
        );
        // 2. set connect_timer to retry if broker won't response
        this->connect_timer.expires_after(5s);
        this->connect_timer.async_wait(
            [this](const auto &error) {
                if (error) { // canceled
                    return;
                }
                // 3. if connect_timer is fired, call async_force_disconnect().
                //    Callback handler is not used.
                //    During or after async_force_disconnect processing,
                //    1. async_connect() callback handler is called with failure
                this->tls_async_client->async_force_disconnect();
            }
        );
    }

The important point is to avoid multiple async_connect() calls; that could cause UB.
So the retry code should be in the async_connect() handler.
If something goes wrong, the async_connect() handler is called with an error code.
But that could take a long time (e.g. the broker is frozen and the OS uses a long TCP timeout).
NOTE: I couldn't reproduce the scenario.

To avoid that, call async_force_disconnect() from the client side.
But you shouldn't call async_connect() for the retry in the async_force_disconnect() handler:
calling async_force_disconnect() indirectly causes the async_connect() handler to be invoked.
So the long no-response case is handled.
Finally, the async_connect() handler is called with an error code and then retries.
I recommend setting a retry timer.


ropieur commented Sep 24, 2021

Hi @redboltz,
Thank you for the code excerpt. I'm now starting to get a better understanding of how to use the async client interface.
I implemented your proposed solution, but I've hit another issue.

async_connect succeeded (no error), but the connack_handler is never invoked (I suppose because the broker didn't ack). This leaves my client stuck, because the connect_timer has been canceled and there will be no retry connection attempt.
I decided to move the connect_timer.cancel inside the connack_handler instead, in order to prolong the timeout. The connect_timer is then triggered, which invokes async_force_disconnect. Unfortunately, this results neither in the invocation of the connect handler with an error (because it was already invoked without error, I guess), nor in the invocation of the error_handler (which is more surprising to me).
Do you have a link to documentation describing in which circumstances the different handlers are invoked (handlers passed to async_* calls and handlers registered via set_error_handler, set_connack_handler, ...)? With such documentation, I could design a robust solution for the different error scenarios.

Thank you


redboltz commented Sep 24, 2021

This is my observation.

  1. TCP connection is successfully established.
  2. TLS handshaking is successfully finished.
  3. mqtt_cpp client sends CONNECT packet (Need checking)
  4. broker doesn't send CONNACK packet

First, you need to check step 3.

You can set a breakpoint or debug output at the following lines:

switch (version_) {

socket_->async_write(

The first line sends the CONNECT packet.
The second line is the last phase of sending any packet.
Both lines should be executed.

If they are executed, the problem is on the broker side.
But even if the broker has a problem, the client should behave better.

So you can move connect_timer.cancel(); to the connack handler as follows:

    void mqtt_active_object::connect()
    {
        // 1. call async_connect()
        this->tls_async_client->async_connect(
            [this](mqtt::error_code ec) {
                if (ec) {
                    // 4. The ec is set by async_connect() process, or
                    //    by the result of async_force_disconnect() call
                    //    Reconnect 3s later
                    this->retry_timer.expires_after(3s);
                    this->retry_timer.async_wait(
                        [this](const auto &error) {
                            if (error) { // canceled
                                return;
                            }
                            // 5. retry (goto step1)
                            this->connect();
                        }
                    );
                }
            }
        );

        this->tls_async_client->set_connack_handler( // if you use v5 then set_v5_connack_handler
            [this]
            (bool sp, MQTT_NS::connect_return_code connack_return_code) {
                // 3. All underlying connections (e.g. TCP, TLS) connected, and then MQTT connected
                this->connect_timer.cancel();
                return true; // connack handlers return bool; true keeps the session going
            }
        );

        // 2. set connect_timer to retry if broker won't response
        this->connect_timer.expires_after(5s);
        this->connect_timer.async_wait(
            [this](const auto &error) {
                if (error) { // canceled
                    return;
                }
                // 3. if connect_timer is fired, call async_force_disconnect().
                //    Callback handler is not used.
                //    During or after async_force_disconnect processing,
                //    1. async_connect() callback handler is called with failure
                this->tls_async_client->async_force_disconnect();
            }
        );
    }

Do you have a link to documentation describing in which circumstances the different handlers are invoked (handlers passed to async_* calls and handlers registered via set_error_handler, set_connack_handler, ...)? With such documentation, I could design a robust solution for the different error scenarios.

See https://github.com/redboltz/mqtt_cpp/wiki/Sync-vs-Async#connect-error-handling-1

In this case, steps 1 to 4 finish successfully, but step 5 isn't reached and no error occurs.
So the connect_timer should cover steps 1 to 5.

If my updated code solves your issue, I will add documentation to the wiki. It could be one of the best practices for the connection sequence.

redboltz commented:

I created #869 (merged). The test broker now supports a no-CONNACK-response emulation mode, and I added a test using that mode.
The PR also fixed broker-side code (an invalid connack call).
The broker-side fix is not related to this issue, because you are not using the mqtt_cpp example broker.
The PR doesn't contain any client-side modification.

If async_connect()'s callback is called with a success error code but the broker doesn't send a CONNACK packet, nothing happens.
So the client side needs to set a timer. When the timer fires, call async_force_disconnect().
That performs the shutdown process on the client side. As a result of that process, the error handler (set by set_error_handler()) will be called.

So you can write the reconnect code in the error handler. I guess it has already been written, to reconnect after the MQTT connection is established.

I also merged #870.
https://github.com/redboltz/mqtt_cpp/blob/master/example/no_tls_async_client_reconnect.cpp demonstrates how to implement practical reconnecting logic.


ropieur commented Sep 27, 2021

Hi @redboltz,
I finally fixed it fully, based on your suggestions/modifications. To be clear, I didn't modify other parts of my app to get the fix, so the problem was certainly due to a misuse of the async client. It is difficult for me to tell what exactly led to the crash, but it might have been the consecutive calls to async_connect. Anyway, I'm happy that it is now completely fixed on my side.
In order to deploy my fix, I would need an official tag (v12? v11.2?) of mqtt_cpp with your latest modifications. Would that be possible?
Thank you

redboltz commented:

Now v12.0.0 has been released.
If your problem is solved, please close the issue.

ropieur closed this as completed Sep 27, 2021