Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement shadow stacks #455

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

Freax13
Copy link
Contributor

@Freax13 Freax13 commented Sep 12, 2024

This PR implements shadow stacks.

Shadow stacks are enabled/disabled at compile time. AFAIK all processors supporting either AMD SEV-SNP or Intel TDX support shadow stacks, so no runtime checks are implemented. This also avoids the pitfall where the hypervisor lies about shadow stack availability to maliciously lower the security of the SVSM.

KVM intercepts accesses to the MSRs used for shadow stacks, so some host-side modifications are required to make this work.

I implemented a #CP exception handler to display diagnostic information when a shadow stack-related issue occurs. We might want to disable this exception handler for release builds and cause a triple fault instead. #CP exceptions are likely a sign of something fishy going on and we might want to terminate the guest just to be sure.

Currently, shadow stacks are disabled by default for a couple of reasons:

  1. The KVM patches mentioned above are needed to make this work.
  2. We haven't implemented proper user syscalls (using syscall & sysret) yet and I don't want to hinder any efforts in that direction by forcing a proper syscall implementation to support shadow stacks out of the box. We should probably get syscalls without shadow stacks working first and once that's merged, we can add shadow stack support later and eventually enable shadow stacks by default.
  3. I haven't tested shadow-stacks together with enable-gdb yet, but I suspect this probably needs some additional work.

Copy link
Member

@joergroedel joergroedel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is impressive, thanks for implementing this feature!

There are some open questions, one around the task-switch handling. I left a comment there. The other question is, whether there is a way to enable/disable this feature at boot-time, instead of compile time.

Finally, I found that it uses the old PageRef interface. Since the new one is merged now, this can be re-based on top of latest HEAD.

Comment on lines 507 to 514
/// 1. Switch to the shadow stack at the fixed address using `rstorssp`.
/// 2. Transfer the shadow stack restore token from the shadow stack at the
/// fixed address to the previous shadow stack by executing `saveprevssp`.
/// 3. Switch the page tables. This doesn't lead to problems with the shadow
/// stack because is mapped into both page tables.
/// 4. Switch to the new shadow stack using `rstorssp`.
/// 5. Transfer the shadow stack restore token from the new shadow stack back
/// to the shadow stacks at the fixed address by executing `saveprevssp`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another issue around the task-switch code which needs fixing, and that might help for shadow stacks as well.

The problem currently is that the task-switch assembly is not safe against exceptions (like #HV and #NMI) that can occur at any time. The unsafe part come from the fact that the code can not switch page-tables and stack pointers atomically. The solution for that problem is either using IST stacks for these exceptions, or using a per-cpu task-switch stack.

The preferred solution is a per-cpu task switch stack, which would then also need a shadow stack. I think this would solve this problem as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't consider that IRQ may come in during the context switch. I implemented a CPU-local dedicated stack and shadow stack as you suggested.

The initialization and pt_flags are a bit special for shadow stack
pages, so this warrants a new `VirtualMapping` implementations.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
This shadow stack is used when not using a task's shadow stack.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
The interrupt shadow stack table (ISST) is very similar to the
interrupt stack table (IST) except that it contains shadow stack
addresses instead of normal stack addresses.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
Each task needs to a normal shadow stack and shadow stack used for
exception handling.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
Some exception handlers will need to update the shadow stack, so they
need to know the shadow stack pointer at the time of the exception.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
Whenever we update the return address on the shadow stack, we'll also
need to update the return address on the shadow stack.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
We need to guard against IRQs coming in after switching to the new page tables
and before switching to the new stack.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
Each task has separate shadow stacks, so we need to switch them when
switching tasks.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
This enables shadow stacks for the BSP.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
This enables shadow stacks on the secondary APs.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
This exception handler will be executed when the CPU detects a mismatch
between the return address on the stack and the return address on the
shadow stack.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
Trusted CPUID values are hard to come by, so let's just try to enable
CET in CR4 and handle failure gracefully.

Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
@Freax13
Copy link
Contributor Author

Freax13 commented Sep 18, 2024

There are some open questions, one around the task-switch handling. I left a comment there. The other question is, whether there is a way to enable/disable this feature at boot-time, instead of compile time.

I added a patch that tries to detect CET support at runtime by enabling CET in CR4 and catching any faults that might occur if not supported. This still doesn't technically detect shadow stack support because AFAICT CPUs could theoretically only support CET lBT (indirect branch tracking) and not CET SHSTK (shadow stacks), but in practice no such CPUs exist (and AFAICT TDX only allows turning all of CET on or off, not individual sub-parts).

Finally, I found that it uses the old PageRef interface. Since the new one is merged now, this can be re-based on top of latest HEAD.

Done.

@msft-jlange
Copy link
Contributor

It doesn't look like this change is compatible with the existing #HV handling code. For example, there is code in asm_entry_hv which specifically checks to see whether a recursive #HV has been delivered while executing an IRET sequence, and if so, it overwrites the old IRET data with the newly pushed IRET data. The shadow stack equivalent of this code must be written, but it is also inherently unsafe, because it requires popping values from the shadow stack and using the WRSS instruction to write them to a previous location on the shadow stack. Regardless of what approach we ultimately decide to take here, it is critical to support Restricted Injection correctly, and we can't allow shadow stack support to break it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants