Scalar/Packed conversions for floating point to integer (#97529)

* merging with main Initial changes for scalar conversion double -> ulong
* Basic working version of double -> ulong saturation
* Moving the code in a do-while with proper checks to make sure we are adding the fixup node in all cases
* adjusting comments
* Merging with main Saturating NaN to 0 and also adding Dbl2Ulng implementation in MathHelpers. Adding vector conversion support for double / float -> ulong conversion
* removing conflicts from gentree.h flags merging with main double to uint conversion
* float to uint conversion verified. removing commented code
* merging with main. Making changes to simdashwintrinsic.cpp and listxarch.h float -> uint packed conversion
* progress on double to long morphing
* another attempt at double to long conversion
* Merge with main Merge with main adding a new helper function for float to uint scalar conversion for SSE2.
* adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.
* partial changes for float to int conversion using double to int for avx512. vfixup not working. next step is to fix the vfixup instruction and get it working
* adding float to int working scalar conversion case. Working on vector case here on.
* partial work on float to int packed conversion
* partial version of float to int conversion
* working version of float to int scalar/packed for avx512
* complete conversions code for floating point to integral conversions for scalar/packed for SSE / avx512
* Merging with main. fixing out of range test case and adding conversion changes to simdashwintrinsic
* fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR level
* adding JIT_Dbl2Int for target_x86 and other architectures.
* Supporting x86 for saturating conversions as well
* fixing errors in packed conversion
* accommodate unsigned in IR
* adding evex support for cvttss2si
* Merge with main defining nativeaot helpers for x86
* Catch divide by zero exception
* Handle overflow cases
* Fix tests to check saturating behavior
* Correct mapping of instructions
* Convert float -> ulong / long as float -> double -> ulong / long
* Merging with main Initial changes for scalar conversion double -> ulong
* Merging with main adjusting comments
* removing conflicts from gentree.h flags merging with main double to uint conversion
* merging with main. Making changes to simdashwintrinsic.cpp and listxarch.h float -> uint packed conversion
* adding a new helper function for float to uint scalar conversion for SSE2.
* Merging with main adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.
* partial changes for float to int conversion using double to int for avx512. vfixup not working. next step is to fix the vfixup instruction and get it working
* partial version of float to int conversion
* working version of float to int scalar/packed for avx512
* Merging with main. fixing out of range test case and adding conversion changes to simdashwintrinsic
* Changing the way helper functions are handled in morph fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR level
* adding JIT_Dbl2Int for target_x86 and other architectures.
* Supporting x86 for saturating conversions as well
* fixing errors in packed conversion
* Correct mapping of instructions
* delete extra files
* Merging main review changes
* Merge with main and adding new helpers in nativeaot Rebasing with main
* changing type of cast node as signed when making cast nodes
* Avoiding removing extra element from the stack
* Fix formatting, Change comp->IsaSupportedDebugOnly to IsBaselineVector512SupportedDebugOnly
* Reverting some changes to maintain uniformity in code
* Handling cases where AVX512 is not supported in simdashwintrinsic.cpp
* fixing exit conditions for ConvertVectorT_ToDouble
* Check for AVX512 support for TARGET_XARCH
* Avoid avx512 path for x86
* Enable AVX512F codepath for conversions in x86 arch. Move x86 to using c++ helpers
* Add SSE41 path for scalar conversions and 128 bit float to int packed conversions
* Adding SSE41 path for floating point to UINT scalar conversions
* Add AVX path for ConvertToInt32
* Adding comments and cleaning the code
* Fix errors in double to ulong
* Addressing review comments
* Fix tests
* Reverse val < 0 check in dbltoUint and dbltoUlng helpers
* Add overflow conversions for x86/x64, remove FastDbl2Lng and inline it
* Apply suggestions from code review Co-authored-by: Jan Kotas <jkotas@microsoft.com>
* Correct Dbl2UlngOvf
* Apply suggestions from code review
* Apply suggestions from code review
* Update src/coreclr/vm/jithelpers.cpp
* Disable failing mono tests
* Working version of saturating logic moved to lowering for x86/x64
* Making changes for pre SSE41
* Apply suggestions from code review Co-authored-by: Jan Kotas <jkotas@microsoft.com>
* Removing dead code
* Fix formatting
* Address review comments, add proper docstrings

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
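
The commit message refers to saturating conversions: NaN becomes 0, and the tests were changed to check saturating behavior. As a rough illustration of what that means for a scalar double, here is a minimal C++ sketch. The function names are hypothetical and the bodies are an assumption about the intended semantics, not the code this commit adds for JIT_Dbl2ULng or JIT_Dbl2Int.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only; not the runtime's implementations.

// Saturating double -> uint64_t: NaN and negative inputs give 0, inputs at or
// above 2^64 give UINT64_MAX, everything else truncates toward zero.
uint64_t SaturatingDoubleToUInt64(double val)
{
    const double twoPow64 = 18446744073709551616.0; // 2^64, exactly representable

    if (!(val > 0.0)) // also taken for NaN, since comparisons with NaN are false
    {
        return 0;
    }
    if (val >= twoPow64)
    {
        return std::numeric_limits<uint64_t>::max();
    }
    return static_cast<uint64_t>(val); // in range here, so the cast is well defined
}

// Saturating double -> int32_t: NaN gives 0, out-of-range inputs clamp to
// INT32_MIN / INT32_MAX, everything else truncates toward zero.
int32_t SaturatingDoubleToInt32(double val)
{
    if (std::isnan(val))
    {
        return 0;
    }
    if (val <= -2147483648.0)
    {
        return std::numeric_limits<int32_t>::min();
    }
    if (val >= 2147483648.0)
    {
        return std::numeric_limits<int32_t>::max();
    }
    return static_cast<int32_t>(val);
}

// A few spot checks of the assumed semantics.
void CheckSaturationExamples()
{
    assert(SaturatingDoubleToUInt64(std::nan("")) == 0);
    assert(SaturatingDoubleToUInt64(-1.0) == 0);
    assert(SaturatingDoubleToUInt64(1e30) == std::numeric_limits<uint64_t>::max());
    assert(SaturatingDoubleToInt32(1e10) == std::numeric_limits<int32_t>::max());
    assert(SaturatingDoubleToInt32(-7.9) == -7);
}
```

The range checks come before the cast because converting an out-of-range floating-point value to an integer type is undefined behavior in C++. That is presumably why helpers of this kind clamp first.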
dotnet · Apr 5, 2024 · 1a7904e
1 parent beac274
commit 1a7904e

Showing 30 changed files with 987 additions and 597 deletions.
diff --git a/src/coreclr/inc/jithelpers.h b/src/coreclr/inc/jithelpers.h
@@ -55,11 +55,11 @@
     JITHELPER(CORINFO_HELP_ULMOD,               JIT_ULMod,          CORINFO_HELP_SIG_16_STACK)
     JITHELPER(CORINFO_HELP_LNG2DBL,             JIT_Lng2Dbl,        CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_ULNG2DBL,            JIT_ULng2Dbl,       CORINFO_HELP_SIG_8_STACK)
-    DYNAMICJITHELPER(CORINFO_HELP_DBL2INT,      JIT_Dbl2Lng,        CORINFO_HELP_SIG_8_STACK)
+    JITHELPER(CORINFO_HELP_DBL2INT,             JIT_Dbl2Int,        CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2INT_OVF,         JIT_Dbl2IntOvf,     CORINFO_HELP_SIG_8_STACK)
-    DYNAMICJITHELPER(CORINFO_HELP_DBL2LNG,      JIT_Dbl2Lng,        CORINFO_HELP_SIG_8_STACK)
+    JITHELPER(CORINFO_HELP_DBL2LNG,             JIT_Dbl2Lng,        CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2LNG_OVF,         JIT_Dbl2LngOvf,     CORINFO_HELP_SIG_8_STACK)
-    DYNAMICJITHELPER(CORINFO_HELP_DBL2UINT,     JIT_Dbl2Lng,        CORINFO_HELP_SIG_8_STACK)
+    JITHELPER(CORINFO_HELP_DBL2UINT,            JIT_Dbl2UInt,       CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2UINT_OVF,        JIT_Dbl2UIntOvf,    CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2ULNG,            JIT_Dbl2ULng,       CORINFO_HELP_SIG_8_STACK)
     JITHELPER(CORINFO_HELP_DBL2ULNG_OVF,        JIT_Dbl2ULngOvf,    CORINFO_HELP_SIG_8_STACK)

diff --git a/src/coreclr/jit/codegenxarch.cpp b/src/coreclr/jit/codegenxarch.cpp
@@ -7602,21 +7602,24 @@ void CodeGen::genFloatToIntCast(GenTree* treeNode)
     noway_assert((dstSize == EA_ATTR(genTypeSize(TYP_INT))) || (dstSize == EA_ATTR(genTypeSize(TYP_LONG))));
 
     // We shouldn't be seeing uint64 here as it should have been converted
-    // into a helper call by either front-end or lowering phase.
-    assert(!varTypeIsUnsigned(dstType) || (dstSize != EA_ATTR(genTypeSize(TYP_LONG))));
+    // into a helper call by either front-end or lowering phase, unless we have AVX512F
+    // accelerated conversions.
+    assert(!varTypeIsUnsigned(dstType) || (dstSize != EA_ATTR(genTypeSize(TYP_LONG))) ||
+           compiler->compIsaSupportedDebugOnly(InstructionSet_AVX512F));
 
     // If the dstType is TYP_UINT, we have 32-bits to encode the
     // float number. Any of 33rd or above bits can be the sign bit.
     // To achieve it we pretend as if we are converting it to a long.
-    if (varTypeIsUnsigned(dstType) && (dstSize == EA_ATTR(genTypeSize(TYP_INT))))
+    if (varTypeIsUnsigned(dstType) && (dstSize == EA_ATTR(genTypeSize(TYP_INT))) &&
+        !compiler->compOpportunisticallyDependsOn(InstructionSet_AVX512F))
     {
         dstType = TYP_LONG;
     }
 
     // Note that we need to specify dstType here so that it will determine
     // the size of destination integer register and also the rex.w prefix.
     genConsumeOperands(treeNode->AsOp());
-    instruction ins = ins_FloatConv(TYP_INT, srcType, emitTypeSize(srcType));
+    instruction ins = ins_FloatConv(dstType, srcType, emitTypeSize(srcType));
     GetEmitter()->emitInsBinary(ins, emitTypeSize(dstType), treeNode, op1);
     genProduceReg(treeNode);
 }
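
The comment in the hunk above describes the path without AVX512F: a TYP_UINT destination is treated as TYP_LONG, so a signed 64-bit truncating conversion is used and only the low 32 bits are kept. A minimal C++ sketch of that idea follows. The function name is hypothetical, and the precondition is an assumption made so that the sketch stays well defined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of converting to uint32_t through a signed 64-bit
// conversion. Precondition (assumed): 0 <= val < 2^32, so the value fits in
// the low 32 bits of the 64-bit result.
uint32_t DoubleToUInt32ViaInt64(double val)
{
    assert(val >= 0.0 && val < 4294967296.0);

    // Models a truncating conversion with a 64-bit signed destination.
    const int64_t wide = static_cast<int64_t>(val);

    // Keep the low 32 bits; values above INT32_MAX survive because the
    // intermediate result had room for them.
    return static_cast<uint32_t>(wide);
}
```

For example, `DoubleToUInt32ViaInt64(3000000000.0)` returns 3000000000, which a direct signed 32-bit conversion could not represent. With AVX512F the hunk skips this widening, presumably because the unsigned conversion instructions listed elsewhere in the diff (for example `INS_vcvttsd2usi32`) can be used directly.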

diff --git a/src/coreclr/jit/compiler.h b/src/coreclr/jit/compiler.h
@@ -3204,6 +3204,14 @@ class Compiler
                                  CorInfoType simdBaseJitType,
                                  unsigned    simdSize);
 
+#if defined(TARGET_XARCH)
+    GenTree* gtNewSimdCvtNode(var_types              type,
+                              GenTree*               op1,
+                              CorInfoType            simdTargetBaseJitType,
+                              CorInfoType            simdSourceBaseJitType,
+                              unsigned               simdSize);
+#endif //TARGET_XARCH
+
     GenTree* gtNewSimdCreateBroadcastNode(
         var_types type, GenTree* op1, CorInfoType simdBaseJitType, unsigned simdSize);
 

diff --git a/src/coreclr/jit/emit.h b/src/coreclr/jit/emit.h
@@ -4012,7 +4012,8 @@ emitAttr emitter::emitGetBaseMemOpSize(instrDesc* id) const
         case INS_comiss:
         case INS_cvtss2sd:
         case INS_cvtss2si:
-        case INS_cvttss2si:
+        case INS_cvttss2si32:
+        case INS_cvttss2si64:
         case INS_divss:
         case INS_extractps:
         case INS_insertps:
@@ -4055,7 +4056,8 @@ emitAttr emitter::emitGetBaseMemOpSize(instrDesc* id) const
         case INS_comisd:
         case INS_cvtsd2si:
         case INS_cvtsd2ss:
-        case INS_cvttsd2si:
+        case INS_cvttsd2si32:
+        case INS_cvttsd2si64:
         case INS_divsd:
         case INS_maxsd:
         case INS_minsd:

diff --git a/src/coreclr/jit/emitxarch.cpp b/src/coreclr/jit/emitxarch.cpp
@@ -1522,9 +1522,11 @@ bool emitter::TakesRexWPrefix(const instrDesc* id) const
         switch (ins)
         {
             case INS_cvtss2si:
-            case INS_cvttss2si:
+            case INS_cvttss2si32:
+            case INS_cvttss2si64:
             case INS_cvtsd2si:
-            case INS_cvttsd2si:
+            case INS_cvttsd2si32:
+            case INS_cvttsd2si64:
             case INS_movd:
             case INS_movnti:
             case INS_andn:
@@ -1544,7 +1546,6 @@ bool emitter::TakesRexWPrefix(const instrDesc* id) const
 #endif // TARGET_AMD64
             case INS_vcvtsd2usi:
             case INS_vcvtss2usi:
-            case INS_vcvttsd2usi:
             {
                 if (attr == EA_8BYTE)
                 {
@@ -2723,8 +2724,10 @@ bool emitter::emitInsCanOnlyWriteSSE2OrAVXReg(instrDesc* id)
         case INS_blsmsk:
         case INS_blsr:
         case INS_bzhi:
-        case INS_cvttsd2si:
-        case INS_cvttss2si:
+        case INS_cvttsd2si32:
+        case INS_cvttsd2si64:
+        case INS_cvttss2si32:
+        case INS_cvttss2si64:
         case INS_cvtsd2si:
         case INS_cvtss2si:
         case INS_extractps:
@@ -2748,7 +2751,8 @@ bool emitter::emitInsCanOnlyWriteSSE2OrAVXReg(instrDesc* id)
 #endif
         case INS_vcvtsd2usi:
         case INS_vcvtss2usi:
-        case INS_vcvttsd2usi:
+        case INS_vcvttsd2usi32:
+        case INS_vcvttsd2usi64:
         case INS_vcvttss2usi32:
         case INS_vcvttss2usi64:
         {
@@ -11605,22 +11609,20 @@ void emitter::emitDispIns(
                     break;
                 }
 
-                case INS_cvttsd2si:
+                case INS_cvttsd2si32:
+                case INS_cvttsd2si64:
                 case INS_cvtss2si:
                 case INS_cvtsd2si:
-                case INS_cvttss2si:
+                case INS_cvttss2si32:
+                case INS_cvttss2si64:
                 case INS_vcvtsd2usi:
                 case INS_vcvtss2usi:
-                case INS_vcvttsd2usi:
-                {
-                    printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_16BYTE));
-                    break;
-                }
-
+                case INS_vcvttsd2usi32:
+                case INS_vcvttsd2usi64:
                 case INS_vcvttss2usi32:
                 case INS_vcvttss2usi64:
                 {
-                    printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_4BYTE));
+                    printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_16BYTE));
                     break;
                 }
 
@@ -19048,7 +19050,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
             break;
         }
 
-        case INS_cvttsd2si:
+        case INS_cvttsd2si32:
+        case INS_cvttsd2si64:
         case INS_cvtsd2si:
         case INS_cvtsi2sd32:
         case INS_cvtsi2ss32:
@@ -19057,7 +19060,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
         case INS_vcvtsd2usi:
         case INS_vcvtusi2ss32:
         case INS_vcvtusi2ss64:
-        case INS_vcvttsd2usi:
+        case INS_vcvttsd2usi32:
+        case INS_vcvttsd2usi64:
         case INS_vcvttss2usi32:
             result.insThroughput = PERFSCORE_THROUGHPUT_1C;
             result.insLatency += PERFSCORE_LATENCY_7C;
@@ -19069,7 +19073,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
             result.insLatency += PERFSCORE_LATENCY_5C;
             break;
 
-        case INS_cvttss2si:
+        case INS_cvttss2si32:
+        case INS_cvttss2si64:
         case INS_cvtss2si:
         case INS_vcvtss2usi:
             result.insThroughput = PERFSCORE_THROUGHPUT_1C;