Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to use RTLD_NODELETE when loading a library #102

Merged
merged 6 commits into from
Aug 31, 2022

Conversation

azeey
Copy link
Contributor

@azeey azeey commented Aug 15, 2022

🎉 New feature

Summary

Some of our segfaults in gz-sim runs and tests (flaky tests) is caused by the access of memory (data or code) that resides in a plugin/shared library after the library has been unloaded and deleted. With RTLD_NODELETE, dlclose will unload the shared library and run it's fini function, but doesn't actually delete it. Thus, access to code (eg. destructors) after the library has been "dlclosed" will not cause a segfault.

Test it

TODO

Checklist

  • Signed all commits for DCO
  • Added tests
  • Added example and/or tutorial
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

Signed-off-by: Addisu Z. Taddese <addisu@openrobotics.org>
@github-actions github-actions bot added the 🌱 garden Ignition Garden label Aug 15, 2022
@codecov
Copy link

codecov bot commented Aug 15, 2022

Codecov Report

Merging #102 (591243e) into gz-plugin2 (daa9a65) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 591243e differs from pull request most recent head 4f988f0. Consider uploading reports for the commit 4f988f0 to get more accurate results

@@             Coverage Diff             @@
##           gz-plugin2     #102   +/-   ##
===========================================
  Coverage       98.25%   98.26%           
===========================================
  Files              23       23           
  Lines             745      748    +3     
===========================================
+ Hits              732      735    +3     
  Misses             13       13           
Impacted Files Coverage Δ
loader/src/Loader.cc 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@chapulina chapulina added the bug Something isn't working label Aug 15, 2022
@chapulina
Copy link
Contributor

This could target ign-plugin1. It's backwards-compatible and also a bug to be fixed in earlier versions.

Copy link
Contributor

@ahcorde ahcorde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any way to add some test for this new method?

@@ -431,7 +437,8 @@ namespace gz

// NOTE: We open using RTLD_LOCAL instead of RTLD_GLOBAL to prevent the
// symbols of different libraries from writing over each other.
void *dlHandle = dlopen(_full_path.c_str(), RTLD_LAZY | RTLD_LOCAL);
int dlopenMode = RTLD_LAZY | RTLD_LOCAL | (_noDelete ? RTLD_NODELETE : 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might need an ifdef for RTLD_NODELETE on windows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 644f613

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, the behavior of dlopen is different between Linux and macOS. On Linux, if you dlopen (with RTLD_NODELETE) , dlclose, and dlopen again, since the plugin is not deleted, dlopen will reuse the existing plugin. On macOS, however, even though the plugin is not deleted, dlopen will treat the second dlopen as though a new library is being loaded and end up loading it into a new memory location. Thus, the type of test I was using won't work. Since we're running out of time, I've decided to enable the test only on Linux. If there are other ways to test this, I'm open to ideas.

Should we add __APPLE__ to the ifdef?

Signed-off-by: Addisu Z. Taddese <addisu@openrobotics.org>
Signed-off-by: Addisu Z. Taddese <addisu@openrobotics.org>
@azeey
Copy link
Contributor Author

azeey commented Aug 23, 2022

is there any way to add some test for this new method?

Added a test in 739ae12.

Signed-off-by: Addisu Z. Taddese <addisu@openrobotics.org>
@azeey
Copy link
Contributor Author

azeey commented Aug 26, 2022

@ahcorde Apparently, the behavior of dlopen is different between Linux and macOS. On Linux, if you dlopen (with RTLD_NODELETE) , dlclose, and dlopen again, since the plugin is not deleted, dlopen will reuse the existing plugin. On macOS, however, even though the plugin is not deleted, dlopen will treat the second dlopen as though a new library is being loaded and end up loading it into a new memory location. Thus, the type of test I was using won't work. Since we're running out of time, I've decided to enable the test only on Linux. If there are other ways to test this, I'm open to ideas.

@azeey azeey requested a review from ahcorde August 26, 2022 21:50
Copy link
Contributor

@jennuine jennuine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, left a minor comment

@@ -431,7 +437,8 @@ namespace gz

// NOTE: We open using RTLD_LOCAL instead of RTLD_GLOBAL to prevent the
// symbols of different libraries from writing over each other.
void *dlHandle = dlopen(_full_path.c_str(), RTLD_LAZY | RTLD_LOCAL);
int dlopenMode = RTLD_LAZY | RTLD_LOCAL | (_noDelete ? RTLD_NODELETE : 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, the behavior of dlopen is different between Linux and macOS. On Linux, if you dlopen (with RTLD_NODELETE) , dlclose, and dlopen again, since the plugin is not deleted, dlopen will reuse the existing plugin. On macOS, however, even though the plugin is not deleted, dlopen will treat the second dlopen as though a new library is being loaded and end up loading it into a new memory location. Thus, the type of test I was using won't work. Since we're running out of time, I've decided to enable the test only on Linux. If there are other ways to test this, I'm open to ideas.

Should we add __APPLE__ to the ifdef?

@azeey
Copy link
Contributor Author

azeey commented Aug 30, 2022

@jennuine , RTLD_NODELETE still works on macOS, it just that dlopen behaves differently when trying to load a plugin that's already in memory. I think the main thing issue with the segfaults we were getting was that some class destructors were being deleted when a plugin got unloaded, and being accessed by the system afterward. Using RTLD_NODELETE should prevent the segfault on macOS as well, although I haven't tested this.

@azeey
Copy link
Contributor Author

azeey commented Aug 30, 2022

@ahcorde, do you have time to give this another look?

Copy link
Contributor

@ahcorde ahcorde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor style comments

test/integration/plugin_unload.hh Outdated Show resolved Hide resolved
test/integration/plugin_unload.hh Show resolved Hide resolved
loader/include/gz/plugin/Loader.hh Outdated Show resolved Hide resolved
Signed-off-by: Louise Poubel <louise@openrobotics.org>
@chapulina chapulina enabled auto-merge (squash) August 31, 2022 16:18
@chapulina chapulina merged commit c189b78 into gazebosim:gz-plugin2 Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working 🌱 garden Ignition Garden
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants