Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Auditbeat] Cherry-pick #10942 to 7.x: User dataset: Numerous fixes to error handling #11191

Closed
wants to merge 1 commit into from

Conversation

cwurm
Copy link
Contributor

@cwurm cwurm commented Mar 11, 2019

Cherry-pick of PR #10942 to 7.x branch. Original message:

When testing the user dataset on Fedora 29, I noticed it's not working. Digging into it, there were a few issues, all of which this PR fixes:

  1. When calling C functions via cgo, we should not rely on the error return value to determine if it failed. In most cases, there is no guarantee that errno is going to be zero on success (see here for some detail). Instead, we should check the return value first, and if that indicates that there might have been an error (usually when it's NULL/nil) we should check the error return.
  2. getpwent(3) explicitly states If one wants to check errno after the call, it should be set to zero before the call. errno cannot be accessed directly in Go code, so we introduce a helper function setErrno for that. getspent(3) does not have the same warning, but given that at least glibc uses the same code path for both, it's reasonable to assume the same applies. So we set errno to 0 before calling both functions.
  3. errno is thread-local, and the function families setpwent/endpwent/getpwent and setspent/endspent/getspent are not thread-safe, so we introduce runtime.LockOSThread()/UnlockOSThread() to make sure all C functions are run on the same OS thread.
  4. Usually, when iterating through users getpwent() should return NULL and errno should be zero when all entries have been returned. However, there is a bug in systemd that causes it to set errno to ENOENT even when there is no error (nss-systemd's getpwent sets errno to ENOENT systemd/systemd#9585). This bug affects at least Fedora 29. Following this change ENOENT is treated as if there is no error. It's not supposed to be a valid error value for getpwent() anyway, so should not happen anytime outside this bug.
  5. The previous systemd bug caused the user dataset to return no users, even though all users had been read successfully. This PR changes to gathering errors in multierrors and returning them alongside whatever data could be read. This is in line with wanting to make the System module more resilient to non-fatal failures during data collection (better to return some things alongside an error than no things at all).
  6. The system test will now fail when there are WARN/ERROR messages in the log. This change can probably also be made for the other datasets.

Writing up this PR description I'm realizing this combines quite a few issues. I could split it into multiple PRs if anybody wants, though most would have only a few lines of changed code. Because some indentation changed it's best to check Github's No Whitespace button.

…0942)

Fixes a number of issues in the `user` dataset discovered while testing on Fedora 29:

* When calling C functions via cgo, don't rely on errno as an indicator of success - check the return value first.
* Set errno to 0 before calling getpwent(3) or getspent(3).
* Make C function calls thread-safe.
* Address a systemd bug where it can return ENOENT even when there is no error.
* Don't fail on every error.
* Fail system test on WARN/ERROR messages in the log.

(cherry picked from commit 8ce5440)
@cwurm cwurm requested a review from a team as a code owner March 11, 2019 17:45
@cwurm cwurm changed the title Cherry-pick #10942 to 7.x: [Auditbeat] User dataset: Numerous fixes to error handling [Auditbeat] Cherry-pick #10942 to 7.x: User dataset: Numerous fixes to error handling Mar 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/secops

@adriansr
Copy link
Contributor

Approved, but last news is we're doing weekly bulk backports to 7.x

@cwurm
Copy link
Contributor Author

cwurm commented Apr 5, 2019

Backported with #11633, closing.

@cwurm cwurm closed this Apr 5, 2019
@cwurm cwurm deleted the backport_10942_7.x branch April 5, 2019 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants