Arm64: Casting a specific double value to uint32 caused an unexpected divide by zero to occur #95290

Closed
MCPtz opened this issue Nov 27, 2023 · 7 comments
Labels: area-System.Numerics, question

Comments

@MCPtz

MCPtz commented Nov 27, 2023

Description

On .NET 6 on an arm64 processor, casting a large negative double to uint32 produced zero.
The same cast on x86, and a cast via int32 on arm64, produced a nonzero value.
The zero later resulted in divide-by-zero errors.

The flag --runtime linux-arm64 doesn't affect the problem; it occurs with and without it.

Reproduction Steps

In a unit test on the arm64 environment we used, this C# code reproduces the bug.
It is unclear whether this also occurs on other processors, operating systems, or .NET runtime environments.

ushort Factor = 256;
//FYI: Negative direction of -1 millimeter on this particular motor goes to home position. 
double InitialStep = -1.0000000000000001E-05;//We translate this to units the motor controller understands
double MicroStep = InitialStep / Factor; //-3.9062500000000003E-08
//Offending line of code:
//x86: 4134117753 //expected value and what we get on x86 processors
//arm64: 0 //unexpected zero only on this arm64 env
//2 * Math.PI / MicroStep: double at -160849543.8637974
//On arm64:
//  (uint) -160849543.8637974 == 0
//  (uint) (int) -160849543.8637974 == 4134117753 //a workaround is to cast to int32 first
uint FullRotationMicroSteps => (uint) (2 * Math.PI / MicroStep);

//Will fail on the arm64 processor we used, with FullRotationMicroSteps equal to zero
//Will pass on x86
Assert.AreEqual(4134117753, FullRotationMicroSteps);

//Later steps in our code would throw a divide by zero exception as it was used on the right side of a % modulus operator

Workaround: cast to int32 first, then to uint32:

uint FullRotationMicroSteps => (uint) (int) (2 * Math.PI / MicroStep);

Expected behavior

This double cast into uint32 should result in this number:

//  (uint) -160849543.8637974 == 4134117753

Actual behavior

//  (uint) -160849543.8637974 == 0
//  (uint) (int) -160849543.8637974 == 4134117753

Regression?

N/A

Known Workarounds

Cast to int first:

//  (uint) (int) -160849543.8637974 == 4134117753

Configuration

This is on an 80-core Neoverse-N1 running Rocky Linux 9.2 with the .NET 6 SDK.

https://www.arm.com/products/silicon-ip-cpu/neoverse/neoverse-n1

 dotnet --info
.NET SDK (reflecting any global.json):
 Version:   6.0.416
 Commit:    db3bc4a6c6

Runtime Environment:
 OS Name:     rocky
 OS Version:  9.2
 OS Platform: Linux
 RID:         rocky.9-arm64
 Base Path:   /usr/share/dotnet/sdk/6.0.416/

global.json file:
  Not found

Host:
  Version:      6.0.24
  Architecture: arm64
  Commit:       e7b8488daf

.NET SDKs installed:
  6.0.416 [/usr/share/dotnet/sdk]

80 total processors. Here's the first one:

cat /proc/cpuinfo | head -20
processor       : 0
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1
uname -a
Linux computer.name.com 5.14.0-284.11.1.el9_2.aarch64 #1 SMP PREEMPT_DYNAMIC Tue May 9 13:04:57 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.2 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.2"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.2 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.2"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.2"
lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 80
  On-line CPU(s) list:  0-79
Vendor ID:              ARM
  Model name:           Neoverse-N1
    Model:              1
    Thread(s) per core: 1
    Core(s) per socket: 80
    Socket(s):          1
    Stepping:           r3p1
    Frequency boost:    disabled
    CPU max MHz:        3000.0000
    CPU min MHz:        1000.0000
    BogoMIPS:           50.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
Caches (sum of all):
  L1d:                  5 MiB (80 instances)
  L1i:                  5 MiB (80 instances)
  L2:                   80 MiB (80 instances)
NUMA:
  NUMA node(s):         1
  NUMA node0 CPU(s):    0-79
Vulnerabilities:
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Mitigation; CSV2, BHB
  Srbds:                Not affected
  Tsx async abort:      Not affected

On the same OS, we also ran an Ubuntu 22.04 container in podman, and the same error occurs.
To reproduce in podman, start the container with the command below. From the bash prompt, run dotnet test against the DLL that contains the unit test, filtering for that test.

podman run --privileged --cpus=8 --rm -v /localhome/username/test/:/test --workdir /test/dotnet -i -t test:dbzero /bin/bash

OS release:

cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Other information

No response

@ghost

ghost commented Nov 27, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.


@ghost added the untriaged label Nov 27, 2023
@MichalPetryka
Contributor

This is by design: ARM64 saturates when converting floating-point numbers to integers, and it's planned to mimic this behaviour on x64 (and other platforms) in .NET 9 too.

@MCPtz
Author

MCPtz commented Nov 27, 2023

Ah good! It's consistent with C++ code too.

On the same arm64 server, create this testint.cpp:

#include <stdio.h>
#include <cstdint>

int main() {
  double value = -160849543.8637974;
  printf("uint32_t(%.2f) = %u\n", value, uint32_t(value));
  printf("int32_t(%.2f) = %d\n", value, int32_t(value));
  printf("double cast uint32_t(int32_t(%.2f)) = %u\n", value, uint32_t(int32_t(value)));
}

Compile and run:

g++ testint.cpp -o testint
./testint
uint32_t(-160849543.86) = 0
int32_t(-160849543.86) = -160849543
double cast uint32_t(int32_t(-160849543.86)) = 4134117753
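
The 4134117753 in the last line of output is plain modular arithmetic: the double is truncated to -160849543, and converting that int32_t to uint32_t reduces it modulo 2^32 (4294967296 - 160849543 = 4134117753). Here is a minimal sketch that checks this at compile time. The helper name wrap_via_int32 is made up for illustration, and it only makes sense when the truncated value fits in int32_t. Converting an out-of-range double directly with uint32_t(value) is undefined behavior in C++, so the 0 printed on the first line is not guaranteed on other platforms or compilers.

```cpp
#include <cstdint>

// Hypothetical helper (not from the issue): truncate toward zero into
// int32_t, then convert to uint32_t, which is defined as reduction modulo 2^32.
// Precondition: the truncated value must fit in int32_t.
constexpr std::uint32_t wrap_via_int32(double value) {
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(value));
}

// -160849543 + 2^32 == 4134117753
static_assert(4294967296LL - 160849543LL == 4134117753LL, "modular arithmetic");
static_assert(wrap_via_int32(-160849543.8637974) == 4134117753u, "wraps via int32");
```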

@MCPtz
Author

MCPtz commented Nov 27, 2023

Looks like something we can close as not an issue, at least in .NET 6.

@tannergooding
Member

> Ah good! It's consistent with C++ code too.

Most languages do not define a behavior for overflow in floating-point to integral conversions. This is mainly because IEEE 754 does not define a behavior. Accordingly, different compilers (MSVC vs Clang vs GCC vs ...) and different languages (C# vs C++ vs Rust vs ...) may do different things.

In many cases these languages have one behavior for constant folding and another for when the value is unknown at compile time. When the value is unknown, they often defer to the underlying platform. Thus you can get different constant-folding results between Arm64 and x64 depending on which platform your compiler is running on. You can likewise get different results for ToUInt32(-1), given uint32_t ToUInt32(float value) { return static_cast<uint32_t>(value); }, depending on whether ToUInt32 is inlined.

Newer languages have started prescribing a particular behavior to avoid these cross platform differences. For example, WASM requires that the behavior saturate (but leaves the handling for NaN as undefined). Rust also requires saturation and requires NaN to convert to 0.

.NET also plans to have deterministic behavior long term (tracked by #61885), and we hope to complete this work in .NET 9. We will match what Rust does (which is also what Arm64 does and what the new x64 instructions being exposed in AVX10 will do) and saturate, with NaN converting to 0. This likewise matches, conceptually, the IEEE 754 rules that exist for operations in general, which is that results are computed as if to infinite precision and unbounded range and then rounded, using the target rounding mode, to the nearest representable result. This means that out-of-range values would round to Max and Min for the given type, respectively.
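
As a rough illustration of the "saturate, with NaN converting to 0" rule described above, here is a sketch written in C++ to match the test program earlier in the thread. The function name saturating_to_uint32 is hypothetical, and truncation toward zero for in-range values is an assumption based on how ordinary casts behave. It is not the runtime's actual implementation.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper illustrating "saturate, NaN converts to 0".
std::uint32_t saturating_to_uint32(double value) {
    if (std::isnan(value)) {
        return 0;  // NaN converts to 0
    }
    if (value <= 0.0) {
        return 0;  // clamp to Min; values in (-1, 0) would truncate to 0 anyway
    }
    if (value >= 4294967295.0) {
        return std::numeric_limits<std::uint32_t>::max();  // clamp to Max
    }
    return static_cast<std::uint32_t>(value);  // in range: truncate toward zero
}
```

Under this rule the value from the issue, -160849543.8637974, maps to 0, which matches the result reported on arm64.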

Until that work is complete, you will have to manually handle these edge cases if that's important to you.
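
One possible way to handle the edge cases manually is to reject out-of-range inputs before converting. The sketch below is in C++ to match the earlier test program, and the name try_to_uint32 is made up for illustration. The same range checks should carry over to C#, though that is an assumption and not something confirmed in this thread.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: reports failure instead of silently producing a
// platform-dependent result. Returns true and writes `out` only when the
// value, truncated toward zero, is representable as uint32_t.
bool try_to_uint32(double value, std::uint32_t& out) {
    if (std::isnan(value) || value <= -1.0 || value >= 4294967296.0) {
        return false;
    }
    out = static_cast<std::uint32_t>(value);
    return true;
}
```

A caller could check the return value before using the result as a divisor, which would surface the out-of-range input at the conversion site and not as a later divide-by-zero.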

@tannergooding added the question and needs-author-action labels and removed the untriaged label Nov 28, 2023
@ghost

ghost commented Nov 28, 2023

This issue has been marked needs-author-action and may be missing some important information.

@MCPtz
Author

MCPtz commented Nov 28, 2023

Excellent explanation. Thanks for the effort.

@MCPtz closed this as completed Nov 28, 2023
@github-actions bot locked and limited conversation to collaborators Dec 29, 2023
@tannergooding removed the needs-author-action label Jun 24, 2024