Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VNet: Make sure daemon client has adequate permissions for passed TELEPORT_HOME #44751

Merged
merged 10 commits into from
Jul 30, 2024

Conversation

ravicious
Copy link
Member

@ravicious ravicious commented Jul 29, 2024

This is the last PR we need before we're able to ship the launch daemon version of VNet.

When starting VNet, the unprivileged process sends a couple of arguments to the privileged launch daemon, among them a path to TELEPORT_HOME of the user that wants to start VNet.

The daemon then lists profiles based on files from that dir, creates Teleport clients and for each profile it fetches the list of leaf clusters and a VNet config resource for each cluster.

This potentially allows a user to make the daemon read data belonging to any other user. To prevent this, this PR makes it so that before reading files from TELEPORT_HOME, the daemon sets its egid and euid to that of the daemon client. This way the daemon won't read data that the user doesn't have access to.

egid and euid of the daemon client is returned by the kernel based on an audit token included with each XPC message. I don't know if there's an official docs on this fact, but there's a couple of blog posts from security researchers on this topic (example 1, example 2). I used Obj-C APIs for that which I assume use C APIs underneath.

vnet-euid.mov

In the demo video above, I first spawn a launch daemon as usual with my regular TELEPORT_HOME. Then I spawn it again but with TELEPORT_HOME set to a dir I don't have access to. Most of the time the execution won't even get to that point, because tsh will attempt to read its config.yaml from TELEPORT_HOME. The dir I used in the demo has permissions set to 777, unlike something like /var/root with 750.


I don't have much experience with syscalls that change (e)uids and guids, I know about them mostly from caveats in blogposts and manpages. I based my implementation on Temporarily dropping root privileges in Go and a couple of other articles I read about setuid.

Edoardo has already brought up on Slack the problem of not spawning a child process and setting the euid from there. For now, I think it should be safe, as the VNet admin process itself is quite simple and beyond the loop where it configures the DNS, it doesn't do any extra async work. As such, setting a process-wide euid and egid does not conflict with any other code that's executed at the same time.

If we wanted to reexec, we'd have to stop caching the VNet configs (lib/vnet/clusterconfigcache.go) or reimplement the cache so that it works across processes.

@ravicious ravicious added no-changelog Indicates that a PR does not require a changelog entry backport/branch/v16 labels Jul 29, 2024
@github-actions github-actions bot added size/md tsh tsh - Teleport's command line tool for logging into nodes running Teleport. labels Jul 29, 2024
@ravicious ravicious removed the request for review from capnspacehook July 29, 2024 15:28
lib/vnet/daemon/common.go Show resolved Hide resolved
Comment on lines +83 to +88
// egid of the user starting VNet. Unsafe for production use, as the egid comes from an unstrusted
// source.
egid int
// euid of the user starting VNet. Unsafe for production use, as the euid comes from an unstrusted
// source.
euid int
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike, say, sudo, osascript -e "do shell script \"…\" with administrator privileges" does not seem to offer SUDO_UID or any other way to identify the user who spawned osascript in the first place.

osascript -e "do shell script \"id -ru\" with administrator privileges" returns 0.

However, after releasing the launch daemon we don't plan to use osascript in production (since we compile tsh with VNETDAEMON=yes). It's there only to make development easier. As such, I pass egid and euid through argv.

@@ -77,6 +78,7 @@ func Start(ctx context.Context, workFn func(context.Context, Config) error) erro
"ipv6_prefix", config.IPv6Prefix,
"dns_addr", config.DNSAddr,
"home_path", config.HomePath,
"cred", fmt.Sprintf("%#v", config.ClientCred),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I had to look up the %#v format string, I think here %+v is probably slightly better because its prints field names without the type names

Suggested change
"cred", fmt.Sprintf("%#v", config.ClientCred),
"cred", fmt.Sprintf("%+v", config.ClientCred),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be cleaner to implement String on the ClientCred type (and then remember to wrap it in logutils.StringerAttr) or maybe even just implement LogValue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogValue is pretty cool, TIL. It ends up being logged like this. Admittedly, slog.GroupValue seems to look better with JSON formatters but I'll take this.

2024-07-30T16:58:15+02:00 INFO [VNET:DAEM] Received VNet config socket_path:/var/folders/1x/xhqmbxsd3_v9zdv78ybr14zr0000gn/T/vnetffa74891-73de-43c8-9df1-23d80ca0873c.sock ipv6_prefix:fde3:5ebb:fece:: dns_addr:fde3:5ebb:fece::2 home_path:/Users/fav/.tsh creds_valid:true egid:20 euid:501 trace_id:8a25ef3f4bcc9bf975b59e63996dbe41 span_id:ff3cdafcd7d941c7 daemon/service_darwin.go:75

Comment on lines 165 to 171

log.InfoContext(ctx, "Temporarily dropping root privileges.", "egid", c.daemonClientCred.Egid, "euid", c.daemonClientCred.Euid)
defer func() {
if err == nil {
log.InfoContext(ctx, "Restored root privileges.", "egid", rootEgid, "euid", rootEuid)
}
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend moving this to the top of the function, before the Setegid/Seteuid calls/defers. Then we'll actually print "Temporarily dropping root privileges." before the attempt, and "Restored root privileges." after successfully doing that

lib/vnet/daemon/service_darwin.m Outdated Show resolved Hide resolved
Copy link
Contributor

@espadolini espadolini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to be concerned about the size of struct VnetConfigResult changing and two processes potentially disagreeing about it (an older service running and a freshly upgraded client connecting to it) or is it fine as long as the new fields are at the end of the struct?

}
defer func() {
syscallErr := trace.Wrap(syscall.Setegid(rootEgid), "reverting egid")
err = trace.NewAggregate(err, syscallErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trace.aggregate composes really poorly with itself, I'd find some other way to keep track of the multierror before returning if we really care about that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved by removing trace.NewAggregate altogether in favor of using panic.

return trace.Wrap(err, "setting egid")
}
defer func() {
syscallErr := trace.Wrap(syscall.Setegid(rootEgid), "reverting egid")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to do anything but panic if the OS prevents us from restoring the egid/euid that we had before? The whole process is basically unusable at that point right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, true. I've never had a legitimate reason to use panic in any program that I've written, so I was hesitant to use it. But an error is technically recoverable, while here we just shouldn't continue execution at all.


// doWithDroppedRootPrivileges drops the privileges of the current process to those of the client
// process that called the VNet daemon.
func (c *osConfigurator) doWithDroppedRootPrivileges(ctx context.Context, fn func() error) (err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing prevents this from being called in parallel. Should we have a package-level mutex (or maybe just a package level atomic bool) that prevents more than one goroutine from running while we do these privilege shenanigans?

Copy link
Member Author

@ravicious ravicious Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add an atomic bool and make sure that this method returns an error if compare and swap fails. If someone runs into that error during development, they'll have to consciously think about what the behavior of this function should be, instead of simply waiting for a mutex (which could be a valid option, but I don't think we should make this decision now).

lib/vnet/daemon/service_darwin.h Show resolved Hide resolved
lib/vnet/daemon/service_darwin.m Outdated Show resolved Hide resolved

// doWithDroppedRootPrivileges drops the privileges of the current process to those of the client
// process that called the VNet daemon.
func (c *osConfigurator) doWithDroppedRootPrivileges(ctx context.Context, fn func() error) (err error) {
Copy link
Member Author

@ravicious ravicious Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add an atomic bool and make sure that this method returns an error if compare and swap fails. If someone runs into that error during development, they'll have to consciously think about what the behavior of this function should be, instead of simply waiting for a mutex (which could be a valid option, but I don't think we should make this decision now).

return trace.Wrap(err, "setting egid")
}
defer func() {
syscallErr := trace.Wrap(syscall.Setegid(rootEgid), "reverting egid")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, true. I've never had a legitimate reason to use panic in any program that I've written, so I was hesitant to use it. But an error is technically recoverable, while here we just shouldn't continue execution at all.

@ravicious ravicious enabled auto-merge July 30, 2024 15:16
@ravicious ravicious added this pull request to the merge queue Jul 30, 2024
Merged via the queue into master with commit bcebb07 Jul 30, 2024
40 checks passed
@ravicious ravicious deleted the r7s/vnet-euid branch July 30, 2024 15:38
@public-teleport-github-review-bot

@ravicious See the table below for backport results.

Branch Result
branch/v16 Create PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/branch/v16 no-changelog Indicates that a PR does not require a changelog entry size/md tsh tsh - Teleport's command line tool for logging into nodes running Teleport.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants