Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVS_GenerateEventTelemetry has a race condition #2523

Open
dallen-osr opened this issue Feb 29, 2024 · 2 comments
Open

EVS_GenerateEventTelemetry has a race condition #2523

dallen-osr opened this issue Feb 29, 2024 · 2 comments

Comments

@dallen-osr
Copy link

Describe the bug
In the function EVS_GenerateEventTelemetry, CFE_EVS_Global.EVS_TlmPkt.Payload.MessageSendCounter is read, and then modified, in a manner which allows a race condition to increment twice and roll-over, when the desired effect is saturation.

To Reproduce
Steps to reproduce the behavior:

  1. Go to modules/evs/fsw/src/cfe_evs_utils.c, line 520 in function EVS_GenerateEventTelemetry
  2. Observe that the variable is read to compare against CFE_EVS_MAX_EVENT_SEND_COUNT.
  3. Observe that after the comparison, a write is made, but that a competing thread may have invalidated the prior condition.
  4. Run cFS in a multithreaded environment under helgrind repeatedly to manifest the behavior.

Expected behavior
Some kind of atomic compare and swap, or mutex protection.

Code snips

    /* Increment message send counters (prevent rollover) */
    if (CFE_EVS_Global.EVS_TlmPkt.Payload.MessageSendCounter < CFE_EVS_MAX_EVENT_SEND_COUNT)
    {
        CFE_EVS_Global.EVS_TlmPkt.Payload.MessageSendCounter++;
    }

System observed on:

  • Hardware
  • OS: Fedora 39, amd64, kernel version 6.7.4
  • Versions: CFE draco-rc5, main (as of filing this issue on 29 February, 2024)

Additional context
No additional context.

Reporter Info
Dominick Allen, Odyssey Space Research contractor to NASA JSC.

@dmknutsen
Copy link
Contributor

@dallen-osr Good catch and thanks for pointing this out! Was this caught through manual inspection or did you use a tool to catch the error? If a tool was used...where other errors detected?

@dallen-osr
Copy link
Author

dallen-osr commented Apr 17, 2024

I caught it with a tool - I don't remember if it was tsan (thread sanitizer) or one of valgrind's tools, but I was combing through with both of them. They are incompatible with each other, so you have to use one or the other at a time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants