Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[14681] Fixing thread sanitizer issue 4169 <master> #2717

Merged
merged 23 commits into from
Jun 13, 2022

Conversation

MiguelBarro
Copy link
Contributor

@MiguelBarro MiguelBarro commented Jun 2, 2022

Description

Review and testing against intermediate branch #2704

This pull request addresses the following family of deadlocks A => B => C => A where:

  • A RecursiveTimedMutex& Endpoint::getMutex() Endpoint is EDP DataWriter in Participant 2.
  • B Participant 1. std::recursive_mutex RTPSParticipantImpl::getParticipantMutex()
  • C Participant 2. std::recursive_mutex RTPSParticipantImpl::getParticipantMutex()

It involves a test case where two participants are destroyed from the Main Thread.
The execution sequence that leads to deadlocks:

  • A=>B Flow controller thread for EDP DataWriter in Participant 2.
    EDP DataWriter sending messages through its flow controller, that is, flow controller's asynchronous thread
    (FlowControllerImpl::run()). The asynchronous thread takes the DataWriter mutex (A).
    Later it delivers the sample via intraprocess to EDPDataReader in participant 2 which takes the participant
    mutex (B) to traverse endpoint collections and do matching.

  • B=>C Main Thread.
    On test destruction comes participant 1 removal.
    Participant 1 calls RTPSParticipantImpl::disable() which takes mutex B to traverse it's endpoint list for
    removal. Here the EDP demise notifications go via intraprocess and ends up in EDP::unpairWriterProxy()
    call that requires C (participant 2 mutex) for unmatching.

  • C=>A Main Thread
    On test destruction comes participant 2 removal.
    Participant 2 calls RTPSParticipantImpl::disable() which takes mutex C to traverse it's endpoint list for
    removal. In order to send DATA[U]R message it requires mutex (A) to allocate a new CacheChange (single mutex for
    history and endpoint).

In order to solve the deadlock, the participant's endpoint collections will no longer be protected by the general
participant mutex but by a devoted one (this was partially done before but the refactor extends it to all collections).

    mutable shared_mutex endpoints_list_mutex;

    std::vector<RTPSWriter*> m_allWriterList;
    std::vector<RTPSReader*> m_allReaderList;
    std::vector<RTPSWriter*> m_userWriterList;
    std::vector<RTPSReader*> m_userReaderList;

Other changes added in this pull request:

  • In order to prevent deadlocks with the temp_data_lock mutex (devoted to protect EDP temporary proxies access) a new proxy pool is created in EDP. The proxies are then synchronized via ownership instead of mutual exclusion.
  • A thread local storage variable is added to the RTPSParticipantImpl class in order to prevent reentrancy in the endpoints_list_mutex (this situation takes place very often in intraprocess calls).

Contributor Checklist

  • Commit messages follow the project guidelines.
  • The code follows the style guidelines of this project.
  • NA Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added.
  • Any new/modified methods have been properly documented using Doxygen.
  • Fast DDS test suite has been run locally.
  • Changes are ABI compatible.
  • Changes are API compatible.
  • NA Documentation builds and tests pass locally.
  • NA New feature has been added to the versions.md file (if applicable).
  • NA New feature has been documented/Current behavior is correctly described in the documentation.

Reviewer Checklist

  • Check contributor checklist is correct.
  • Check CI results: changes do not issue any warning.
  • Check CI results: failing tests are unrelated with the changes.

@MiguelBarro MiguelBarro changed the title Test/issue/4169 [14681] Fixing thread sanitizer issue 4169 <master> Jun 3, 2022
@MiguelBarro MiguelBarro force-pushed the test/issue/4169 branch 4 times, most recently from 0fd6978 to 2a28411 Compare June 7, 2022 06:00
Miguel Barro and others added 16 commits June 7, 2022 08:13
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
…proxy mutex.

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Miguel Barro added 2 commits June 7, 2022 10:47
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
@MiguelBarro MiguelBarro self-assigned this Jun 7, 2022
Miguel Barro added 4 commits June 7, 2022 16:37
… fixed

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
…ataExceedsSizeLimit blackbox

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
@jsan-rt
Copy link
Contributor

jsan-rt commented Jun 13, 2022

@richiprosima please test mac

@MiguelCompany MiguelCompany merged commit 271929b into master Jun 13, 2022
@MiguelCompany MiguelCompany deleted the test/issue/4169 branch June 13, 2022 07:36
@MiguelCompany
Copy link
Member

@Mergifyio backport 2.6.x

@mergify
Copy link
Contributor

mergify bot commented Jun 22, 2022

backport 2.6.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jun 22, 2022
* Refs #14285: Trigger workflow

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs #14354: Trigger workflows

Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs #14069: Trigger workflow

Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs #14069: Workflow trigger

Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Fixing issue 4169.

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Fixing compiler errors & warnings.

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Fixing testing Mocks.

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Addressing reviewer comments.

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Debugging fixes.

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Linter pass

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Adding a ProxyPool class to avoid EDP ABBAs on temporary proxy mutex.

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Prevent reentrancy issues on shared_mutexes using tls

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Mock updates to match new RTPSParticipantImpl behavior.

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Linter pass..

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Fixing gcc/clang warnings

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Fixing clang warnings

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Unifying all temporary proxies in PDP owned pools

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Linter pass

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. RTPSStatisticsTests.statistics_rpts_listener_gap_callback fixed

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. RTPSWriterTests CMakelists.txt fix for windows.

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Fixing clang -Wunused-lambda-capture issue

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Fixing proxies copy to fix PubSubBasic.ReceivedPartitionDataExceedsSizeLimit blackbox

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

* Refs 14681. Addressing reviewer's comments

Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>

Co-authored-by: Javier Santiago <javiersantiago@eprosima.com>
(cherry picked from commit 271929b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants