-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The callbacks of subscribers for records are executed in different threads #679
Comments
Currently, all the processing of each callback is synchronized at the writer level (to protect some stl collections and splitting logic) and the storage level (to protect some stl collections and db access due to async writes). The writer level lock means that the all the operations that can be executed in parallel are limited to the body of lambda function in What is important though is to minimize the synchronized time in It would be good to profile afterwards, all the operations before a message is written to the cache - and see where to optimize next. The spinning itself might benefit from multi-threading and it is worth checking. |
Thanks for your reply.
Yeah. should_split_bagfile() will call disk I/O operation to get size of database for each callback. Currently this point should be considered for improvement. I will further consider it. |
@Barry-Xu-2018 when considering changes in threading model, pay attention to all stl containers (such as map of topics in the SequentialWriter) and all unsynchronized non-atomics in general. There are various race conditions to avoid. Benchmarking packages are good to test for the race conditions once you go through code analysis and feel pretty confident it should be ok. |
btw, do you have already plan to support MultiThreadedExecutor in milestone at this moment? |
Not at this moment (we finished our project). |
okay, thanks! |
FYI. I modify code to use MultiThreadedExecutor without considering split bag.
According to test result, current cache mechanism doesn't take pronounced effect while recording much more instances (topics). If use MultiThreadedExecutor, it has a little effect. About Modification for MultiThreadedExecutor, refer to Barry-Xu-2018@c1bb25b Next, I will change code to support split bag. Of course, #647 is important. Disk I/O will lead to hold the lock for long time. Other threads have to wait the lock. |
Now support split mode and test was finished. While split mode is enabled, many codes have to be process under lock. So this is not good for multiple threads. Test condition:
Test result:
Based on the above result, there is a little improvement for performance. And cache almost doesn't affect the result. Next, I check the effect for a few instances. 10 instance (topics), message size is 100 KB. Other conditions are the same as the above test. Test result:
Current implementation is Barry-Xu-2018@18fdeb2. Now, the maximum number of thread is set to 20 and cannot be changed by the user. About the user interface to change this, it needs to be discussed. |
thanks for the information. i see that multi-thread does not improve a lot with current cache mechanism. i am not sure why this cache mechanism cannot be in the plugin part, does it have to be in generic part? (if anyone knows this, please enlighten me 😄 ) if there is not a requirement about that, why we would move buffer/cache mechanism into sqlite3 implementation, so that we can keep the generic part simple and plugin can be responsible for performance. i think that would be more room for plugin implementation can manage and differentiate for the use cases. on the other hand, this could be kinda surgery, so probably it would take a while... |
Yes. While using multi-thread, the current cache mechanism cannot take the effect. For the transport layer, it should only take charge of calling writing interface which is implemented in storage plugin. Different storage plugins use a suitable cache mechanism according to database feature. Current implementation also provide to disable cache mechanism by setting cache size as 0. |
|
Description
According to current codes, the current implementation likes below (If not right, please help to correct me)
Currently, these callbacks are executed in one thread in turn.
rosbag2/rosbag2_transport/src/rosbag2_transport/recorder.cpp
Lines 212 to 215 in b64cf74
If there are many topics to be recorded, maybe one thread processing affects the performance.
Could we change to use
MultiThreadedExecutor
?Of course, this way may lead to the bottleneck at one cache buffer.
The text was updated successfully, but these errors were encountered: