Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: consider removing negative (non-canonical) __NR_xyz defines #27

Closed
amluto opened this issue Feb 12, 2016 · 14 comments
Closed

RFE: consider removing negative (non-canonical) __NR_xyz defines #27

amluto opened this issue Feb 12, 2016 · 14 comments

Comments

@amluto
Copy link
Contributor

amluto commented Feb 12, 2016

IMO __NR_xyz for syscall xyz should either match the actual kernel-assigned number or should not be defined. This code, for example, can cause confusion:

#define __PNR_userfaultfd   -10200
#ifndef __NR_userfaultfd
#define __NR_userfaultfd    __PNR_userfaultfd
#endif /* __NR_userfaultfd */

libseccomp is not the sole user of __NR_xyz defines.

Using LIBSECCOMP_xyz or sticking with __PNR_syz would be fine.

@amluto
Copy link
Contributor Author

amluto commented Feb 12, 2016

@pcmoore
Copy link
Member

pcmoore commented Feb 12, 2016

@amluto Out of curiosity, in the sandstorm issue you mentioned the following:

At some point we should drop libseccomp. It's problematic for a few reasons.

Other than the issue you mention here, can you explain the others? We would like to make libseccomp useful for other projects, and if it isn't it would be good to know why.

@pcmoore pcmoore changed the title Consider removing negative (non-canonical) __NR_xyz defines RFE: consider removing negative (non-canonical) __NR_xyz defines Feb 12, 2016
@amluto
Copy link
Contributor Author

amluto commented Feb 16, 2016

IIRC the main issue I had was that filtering on anything other than syscall nr is very inflexible. For example, I couldn't express:

Block init_module.
Allow socket, but only for the following address families

unless I inverted it and disallowed families above the largest family number I wanted and also disallowed families in between the ones that are okay.

@kentonv
Copy link

kentonv commented Mar 19, 2016

@dwrensha is reporting that if he builds a binary on a system whose headers are old and don't define the __NR_blah constant for a syscall, and so instead gets libseccomp's pseudo-number, the result is that the seccomp filter does not filter the syscall. Does this seem correct? If so this seems like a serious security problem.

@dwrensha
Copy link

Here is an example program:

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <seccomp.h>
#include <string.h>
#include <fcntl.h>

int main() {
  scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
  if (ctx == NULL) return 1;

  int result = seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EACCES), SCMP_SYS(userfaultfd), 0);
  if (result < 0) return 1;

  result = seccomp_load(ctx);
  if (result < 0) return 1;

  long result2 = syscall(323, O_NONBLOCK);  // 323 = userfaultfd() on x86_64
  if (result2 >= 0) {
    printf("successfully called userfaultfd()\n");  // should never happen!
  } else {
    printf("errno %d: %s\n", errno, strerror(errno));
  }
  return 0;
}

I statically compiled this program on the following box:

$ uname -a
Linux beryllium 4.2.0-34-generic #39-Ubuntu SMP Thu Mar 10 22:13:01 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

and it gave me the following output:

$ ./a.out 
errno 38: Function not implemented

There doesn't seem to be any Seccomp filtering happening, but, whatever, the syscall isn't implemented anyway.

Then I copied that binary over to the following box:

$ uname -a
Linux radon 4.4.4-301.fc23.x86_64 #1 SMP Fri Mar 4 17:42:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

and it gave me this result:

$ ./a.out 
successfully called userfaultfd()

Which goes completely against my expectation. Shouldn't Seccomp block this call?

@dwrensha
Copy link

Note that if I compile the program on radon, the box with the newer kernel, then I get

$ ./a.out 
errno 13: Permission denied

as expected.

@kentonv
Copy link

kentonv commented Mar 20, 2016

Just to be really clear on the problem here: This implies that binaries built on machines with older kernel headers will fail to actually filter syscalls that the code very clearly is trying to filter. This could be a problem if:

  • The machine's kernel is newer than its headers (common e.g. on Debian Stable w/backports, or when people build their own kernels).
  • The machine's kernel is upgraded later.
  • The binary is copied to a different machine with a newer kernel.

For example, there was recently a CVE in bpf(), which is a very new syscall. Sandstorm dodged the CVE because we block this call, and because we build our releases on Debian Unstable, which has very up-to-date headers. But, we are planning to switch to building releases on Debian Stable instead. Had we done so before noticing this problem, we would have shipped a release which fails to filter this syscall and would have been vulnerable to the CVE.

@pcmoore
Copy link
Member

pcmoore commented Mar 21, 2016

As a FYI, I am considering changing some of how we handle non-native ABIs in libseccomp, which is the main reason why libseccomp needs to provide its own __NR_xxx defines; hopefully we'll have something in the next release.

@pcmoore
Copy link
Member

pcmoore commented Aug 14, 2019

Related: systemd/systemd#13319

@pcmoore
Copy link
Member

pcmoore commented Aug 14, 2019

Since this is starting to become more of an issue, here is one of the possible solutions I was thinking should solve this problem.

We currently need __NR_xyx defines for all syscalls that libseccomp supports, regardless of support on the native ABI, because we support creating non-native ABI filters, e.g. 32-bit x86 filters on a 64-bit x86_64 host. However, we've long taken the stance that libseccomp callers really shouldn't use the __NR_xyz defines directly, they should use SCMP_SYS(...) or preferably seccomp_arch_resolve_name(...). With that in mind, I think we can take the following approach:

  • Create a secondary public header file ("seccomp-syscalls.h"?) that generated at build time using the internal libseccomp syscall tables, and is included directly from the main "seccomp.h" (no change in includes from a caller's perspective). In this header file will be a number of cpp defines:

    • __PNR_xyz: similar to their current definitions (definition for syscalls that aren't present in every supported ABI)
    • __SNR_xyz: create a definition for every syscall and either assign it a value of __NR_xyz or __PNR_xyz depending on the host system's ABI
  • Change the SCMP_SYS(...) macro to the following:

/**
 * Convert a syscall name into the associated syscall number
 * @param x the syscall name
 */
#define SCMP_SYS(x)             (__SNR_##x)

I believe this should eliminate the need for libseccomp to create it's own __NR_xyz definitions while preserving the SCMP_SYS(...) macro both for use in the libseccomp API and as an initializer. Of course callers which had been directly using __NR_xyz macros in their code will now fail if they are using syscalls that are not universally supported, but that has always been a risk.

@drakenclimber
Copy link
Member

I need to think about it some more, but this sounds like it meets everyone's needs without breaking backward compatibility. Nice!

@pcmoore pcmoore self-assigned this Oct 1, 2019
@pcmoore
Copy link
Member

pcmoore commented Oct 1, 2019

All tests look good on my system, but I'm waiting on the Travis run to come back clean before submitting the PR. In the meantime, my proposed fix is at the link below:

@pcmoore
Copy link
Member

pcmoore commented Oct 1, 2019

See PR #172.

@pcmoore
Copy link
Member

pcmoore commented Oct 8, 2019

This should be resolved in PR #172 and #174, closing this out.

@pcmoore pcmoore closed this as completed Oct 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants