simd intrinsics: add simd_shuffle_generic and other missing intrinsics #119213

Merged 4 commits on Feb 11, 2024
54 changes: 51 additions & 3 deletions library/core/src/intrinsics/simd.rs
@@ -190,14 +190,27 @@ extern "platform-intrinsic" {
///
/// `T` must be a vector.
///
/// `U` must be a const array of `i32`s.
/// `U` must be a **const** array of `i32`s. This means it must either refer to a named
/// const or be given as an inline const expression (`const { ... }`).
///
/// `V` must be a vector with the same element type as `T` and the same length as `U`.
///
/// Concatenates `x` and `y`, then returns a new vector such that each element is selected from
/// the concatenation by the matching index in `idx`.
/// Returns a new vector such that element `i` is selected from `xy[idx[i]]`, where `xy`
/// is the concatenation of `x` and `y`. It is a compile-time error if `idx[i]` is out-of-bounds
/// of `xy`.
pub fn simd_shuffle<T, U, V>(x: T, y: T, idx: U) -> V;

/// Shuffle two vectors by const indices.
///
/// `T` must be a vector.
///
/// `U` must be a vector with the same element type as `T` and the same length as `IDX`.
Member

ah, good catch. ^^;

///
/// Returns a new vector such that element `i` is selected from `xy[IDX[i]]`, where `xy`
/// is the concatenation of `x` and `y`. It is a compile-time error if `IDX[i]` is out-of-bounds
/// of `xy`.
workingjubilee marked this conversation as resolved.
pub fn simd_shuffle_generic<T, U, const IDX: &'static [u32]>(x: T, y: T) -> U;
Member

I haven't seen this intrinsic before 🤔 this is what we need in std::simd

Member Author

@RalfJung RalfJung Dec 22, 2023

It's an experiment to see how far we can get with the current const-generic support. I don't think std::simd can use it yet, that would need generic_const_exprs which is still a highly experimental feature.

Member

it would be nice if we could stabilize a small enough subset of GCE that this sort of thing becomes feasible.

Member Author

From talking with @lcnr that seems pretty far out currently. And in particular reference types in const generics have pretty thorny unsolved theoretical questions as well.
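
To make the shuffle semantics documented above concrete, here is a minimal scalar sketch in stable Rust. It is not the intrinsic: `shuffle_model`, its array-based signature, and the `i32` element type are hypothetical stand-ins chosen for illustration, and the indices are passed as an ordinary argument rather than as a `const IDX` generic. Because the function is a `const fn`, evaluating it in a `const` item turns an out-of-bounds index into a compile-time failure, which loosely mirrors the behaviour the doc comment describes.

```rust
/// Scalar reference model of the documented shuffle semantics.
/// Hypothetical helper for illustration only; not the real intrinsic.
/// `N` is the length of each input vector, `M` the length of the output.
const fn shuffle_model<const N: usize, const M: usize>(
    x: [i32; N],
    y: [i32; N],
    idx: [u32; M],
) -> [i32; M] {
    let mut out = [0i32; M];
    let mut i = 0;
    while i < M {
        let j = idx[i] as usize;
        // `xy` is the concatenation of `x` and `y`: indices 0..N select from `x`,
        // indices N..2N select from `y`. Anything larger panics on the `y` access,
        // which becomes a compile error when evaluated in a const context.
        out[i] = if j < N { x[j] } else { y[j - N] };
        i += 1;
    }
    out
}

// Evaluated at compile time: interleaves the low halves of `x` and `y`.
const SHUFFLED: [i32; 4] =
    shuffle_model([10, 11, 12, 13], [20, 21, 22, 23], [0, 4, 1, 5]);
```

In this sketch the output length follows the index array rather than the inputs, matching the requirement that the result vector has the same length as the index list.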


/// Read a vector of pointers.
///
/// `T` must be a vector.
@@ -232,6 +245,9 @@ extern "platform-intrinsic" {
/// corresponding value in `val` to the pointer.
/// Otherwise if the corresponding value in `mask` is `0`, do nothing.
///
/// The stores happen in left-to-right order.
/// (This is relevant in case two of the stores overlap.)
Comment on lines +248 to +249
Member

( probably not relevant, but fun facts to know and tell: if the machine instruction gets executed, this guarantee also affects the state of the CPU during exception handling. )

Member Author

Wait, exception handlers can observe these instructions as non-atomic? Wow.

Member

Yeah, on Intel CPUs at least, for most variants of these instructions, the operands (esp. the mask operand) get updated so that continuing with the same operands will complete the operation, completing a single store for each index.

Member Author

@RalfJung RalfJung Feb 10, 2024

What do you mean "get updated"? Does this change the contents of some other register, where the mask operand is stored? I hope it restored the original value when the instruction is done?

Member

> What do you mean "get updated"? Does this change the contents of some other register, where the mask operand is stored?

Yes, the mask register.

> I hope it restored the original value when the instruction is done?

uhhh probably! 😁

Member

> I hope it restored the original value when the instruction is done?
>
> uhhh probably! 😁

nope, it's defined to clear the mask register: https://www.felixcloutier.com/x86/vscatterdps:vscatterdpd:vscatterqps:vscatterqpd

Member Author

Okay... well as long as LLVM takes that into account it's all good.
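
The left-to-right ordering guarantee discussed in this thread can be illustrated with a small scalar model. This is only a sketch under stated assumptions: `scatter_model` and `scatter_demo` are hypothetical names, the model stores through slice indices instead of raw pointers so that it stays in safe Rust, and it treats any non-zero mask lane as enabled.

```rust
/// Scalar reference model of a masked scatter.
/// Hypothetical helper for illustration only; not the real intrinsic.
/// Lane `i` stores `val[i]` to `mem[idx[i]]` when `mask[i]` is non-zero.
fn scatter_model<const N: usize>(
    mem: &mut [i32],
    val: [i32; N],
    idx: [usize; N],
    mask: [i32; N],
) {
    // Lanes are processed left to right, so when two enabled lanes target
    // the same location, the value from the higher-numbered lane remains.
    for i in 0..N {
        if mask[i] != 0 {
            mem[idx[i]] = val[i];
        }
    }
}

/// Lanes 1 and 2 both target slot 2; lane 3 is masked off.
fn scatter_demo() -> [i32; 4] {
    let mut mem = [0i32; 4];
    scatter_model(&mut mem, [1, 2, 3, 4], [0, 2, 2, 3], [-1, -1, -1, 0]);
    mem
}
```

In `scatter_demo`, slot 2 ends up holding the value from lane 2 rather than lane 1, which is the observable consequence of the documented store order when destinations overlap.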

///
/// # Safety
/// Unmasked values in `T` must be writeable as if by `<ptr>::write` (e.g. aligned to the element
/// type).
@@ -468,4 +484,36 @@ extern "platform-intrinsic" {
///
/// `T` must be a vector of integers.
pub fn simd_cttz<T>(x: T) -> T;

/// Round up each element to the next highest integer-valued float.
///
/// `T` must be a vector of floats.
pub fn simd_ceil<T>(x: T) -> T;

/// Round down each element to the next lowest integer-valued float.
///
/// `T` must be a vector of floats.
pub fn simd_floor<T>(x: T) -> T;

/// Round each element to the closest integer-valued float.
/// Ties are resolved by rounding away from 0.
///
/// `T` must be a vector of floats.
pub fn simd_round<T>(x: T) -> T;

/// Return the integer part of each element as an integer-valued float.
/// In other words, non-integer values are truncated towards zero.
///
/// `T` must be a vector of floats.
pub fn simd_trunc<T>(x: T) -> T;

/// Takes the square root of each element.
///
/// `T` must be a vector of floats.
pub fn simd_fsqrt<T>(x: T) -> T;

/// Computes `(x*y) + z` for each element, but without any intermediate rounding.
///
/// `T` must be a vector of floats.
pub fn simd_fma<T>(x: T, y: T, z: T) -> T;
}
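
As a rough guide to what the newly documented float intrinsics compute per element, the sketch below models them lane by lane with the scalar `f32` methods from the standard library. The `*_model` function names and the use of plain arrays in place of SIMD vector types are assumptions made for illustration; the real intrinsics operate on vector types and are not called here.

```rust
// Lane-wise scalar models (hypothetical helpers, not the real intrinsics).

/// Rounds each lane up to the nearest integer-valued float.
fn ceil_model<const N: usize>(x: [f32; N]) -> [f32; N] {
    x.map(f32::ceil)
}

/// Rounds each lane down to the nearest integer-valued float.
fn floor_model<const N: usize>(x: [f32; N]) -> [f32; N] {
    x.map(f32::floor)
}

/// Rounds each lane to the closest integer-valued float, ties away from 0.
fn round_model<const N: usize>(x: [f32; N]) -> [f32; N] {
    x.map(f32::round)
}

/// Truncates each lane towards zero.
fn trunc_model<const N: usize>(x: [f32; N]) -> [f32; N] {
    x.map(f32::trunc)
}

/// Takes the square root of each lane.
fn sqrt_model<const N: usize>(x: [f32; N]) -> [f32; N] {
    x.map(f32::sqrt)
}

/// Computes `x * y + z` per lane with a single rounding step,
/// using the standard library's fused `mul_add`.
fn fma_model<const N: usize>(x: [f32; N], y: [f32; N], z: [f32; N]) -> [f32; N] {
    std::array::from_fn(|i| x[i].mul_add(y[i], z[i]))
}
```

The "without any intermediate rounding" wording matters for inputs such as `x = 1 + 2^-12`: the exact square is `1 + 2^-11 + 2^-24`, so subtracting `1 + 2^-11` with a fused operation leaves `2^-24`, whereas rounding the product to `f32` first would be expected to discard that last term.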