Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapply netlink retries #591

Merged
merged 2 commits into from
Jan 19, 2024
Merged

Conversation

hefroy
Copy link
Contributor

@hefroy hefroy commented Dec 18, 2023

Reapplying commit a268e2d from @phiwer with #283 which was removed by commit 826ebb8 from @DiogoPedrozza.

Additionally:

  1. limit the number of retries to three, since otherwise a persistent error would cause infinite retries and high CPU load.
  2. Improve the logging, to only output the error message after the retries are exhausted and fix the error message output, since strerror takes a positive errno.

implementation/endpoints/include/netlink_connector.hpp Outdated Show resolved Hide resolved
implementation/endpoints/src/netlink_connector.cpp Outdated Show resolved Hide resolved
implementation/endpoints/src/netlink_connector.cpp Outdated Show resolved Hide resolved
implementation/endpoints/src/netlink_connector.cpp Outdated Show resolved Hide resolved
implementation/endpoints/src/netlink_connector.cpp Outdated Show resolved Hide resolved
implementation/endpoints/src/netlink_connector.cpp Outdated Show resolved Hide resolved
implementation/endpoints/src/netlink_connector.cpp Outdated Show resolved Hide resolved
The stack sends three requests to the kernel using
the netlink socket. If the kernel is e.g. busy at this point,
netlink will respond with a EBUSY error value.

The current code does not handle these errors gracefully, but instead
silently ignores them. This can lead to vsomeip stack being stalled,
and not starting certain services, e.g. service discovery.

This patch helps fix these issues by sending a retry of the messages
that fail.
Copy link
Collaborator

@anaritarodrigues anaritarodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please apply the following patch
0001-creted-method.zip

Only log error after retries exhausted and fix error message output,
as strerror takes positive errno.
@hefroy
Copy link
Contributor Author

hefroy commented Jan 19, 2024

Please apply the following patch 0001-creted-method.zip

Applied ✔️

Copy link
Collaborator

@DiogoPedrozza DiogoPedrozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@anaritarodrigues anaritarodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the input.

@DiogoPedrozza DiogoPedrozza merged commit 339a2ca into COVESA:master Jan 19, 2024
2 checks passed
@DiogoPedrozza
Copy link
Collaborator

@phiwer and @hefroy thank you for your support!

milancpp pushed a commit to ApexAI/vsomeip that referenced this pull request Apr 26, 2024
* Retry failed netlink operations

The stack sends three requests to the kernel using
the netlink socket. If the kernel is e.g. busy at this point,
netlink will respond with a EBUSY error value.

The current code does not handle these errors gracefully, but instead
silently ignores them. This can lead to vsomeip stack being stalled,
and not starting certain services, e.g. service discovery.

This patch helps fix these issues by sending a retry of the messages
that fail.

* Limit netlink retries to 3, improve error logging

Only log error after retries exhausted and fix error message output,
as strerror takes positive errno.

---------

Co-authored-by: Philip Werner <phiwer@gmail.com>
milancpp added a commit to ApexAI/vsomeip that referenced this pull request Apr 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants