Sync branch with main #2

BaLiKfromUA · 2024-08-20T22:02:43Z

No description provided.

…78112) https://cplusplus.github.io/CWG/issues/2627.html It is no longer a narrowing conversion when converting a bit-field to a type smaller than the field's declared type if the bit-field has a width small enough to fit in the target type. This includes integral promotions (`long long i : 8` promoted to `int` is no longer narrowing, allowing `c.i <=> c.i`) and list-initialization (`int n{ c.i };`) Also applies back to C++11 as this is a defect report.

…lvm#102573) Disables `vector.matrix_multiply` for scalable vectors. As per the docs:

> This is the counterpart of llvm.matrix.multiply in MLIR

I'm not aware of any use of matrix-multiply intrinsics in the context of scalable vectors, hence disabling.

Syntacore SCR5 is an entry-level Linux-capable 32/64-bit RISC-V processor core. Overview: https://syntacore.com/products/scr5 Scheduling model will be added in a subsequent PR. Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com> Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com>

We should handle allocator attributes not only on function declarations, but also on the call-site. That way we can e.g. also optimize cases where the allocator function is a virtual function call. This was already supported in some of the MemoryBuiltins helpers, but not all of them. This adds support for allocsize, alloc-family and allockind("free").

…perations. (llvm#102105) The code-generator is currently not able to handle scalable vectors of <vscale x 1 x eltty>. The usual "fix" for this until it is supported is to mark the costs of loads/stores with an invalid cost, preventing the vectorizer from vectorizing at those factors. But on rare occasions loops do not contain load/stores, only reductions. So whilst this is still unsupported return an invalid cost to avoid selecting vscale x 1 VFs. The cost of a reduction is not currently used by the vectorizer so this adds the cost to the add/mul/and/or/xor or min/max that should feed the reduction. It includes reduction costs too, for completeness. This change will be removed when code-generation for these types is sufficiently reliable. Fixes llvm#99760

Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to `llvm.func` operations.

- `spir_kernel`/`spir_func` calling conventions used for kernels/functions.
- `workgroup` attributions encoded as additional `llvm.ptr<3>` arguments.
- No attribute used to annotate kernels.
- `reqd_work_group_size` attribute used to encode `gpu.known_block_size`.
- `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution sizes. This will be attached to the pointer argument that workgroup attributions lower to.

**Note**: A notable missing feature that will be addressed in a follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace MemRef arguments with bare pointers to the MemRef element types instead of the current MemRef descriptor approach.

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>

This PR adds conversion patterns for MemRef to the `convert-to-spirv` pass, introduced in llvm#95942. Conversions from MemRef memory space to SPIR-V storage class were also included, and would run before the final dialect conversion phase.

**Future Plans**

- Add tests for ops other than `memref.load` and `memref.store`

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>

…ack for `insertelement` (llvm#82130) Prior to this patch, SelectionDAG generated aligned moves onto the stack for AVX registers when the function was marked as a no-realign-stack function. This led to misalignment between the stack and the instruction generated. This patch fixes the issue. There was a similar issue reported for `extractelement`, which was fixed in a6614ec. Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>

Make it possible to do things like the following, regardless of whether the offload target is nvptx or amdgpu:

```
$ clang -O1 -g -fopenmp --offload-arch=native test.c \
    -Xoffload-linker -mllvm=-pass-remarks=inline \
    -Xoffload-linker -mllvm=-force-remove-attribute=g.internalized:noinline \
    -Xoffload-linker --lto-newpm-passes='forceattrs,default<O1>' \
    -Xoffload-linker --lto-debug-pass-manager \
    -foffload-lto
```

To accomplish that:

- In clang-linker-wrapper, do not forward options via `-Wl` if they might have literal commas. Use `-Xlinker` instead.
- In clang-nvlink-wrapper, accept `--lto-debug-pass-manager` and `--lto-newpm-passes`.
- In clang-nvlink-wrapper, drop `-passes` because it's inconsistent with the interface of `lld`, which is used instead of clang-nvlink-wrapper when the target is amdgpu. Without this patch, `-passes` is passed to `nvlink`, producing an error anyway.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
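To illustrate why literal commas are a problem for `-Wl`: compiler drivers split the text after `-Wl,` at every comma and hand each piece to the linker as a separate argument, whereas `-Xlinker` passes its single argument through unchanged. The following is only a toy model of that splitting; `expandWl` and `expandXlinker` are hypothetical helper names, not functions from clang-linker-wrapper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model (assumption): expand `-Wl,<a>,<b>,...` into separate linker
// arguments by splitting at every comma after the prefix.
std::vector<std::string> expandWl(const std::string &arg) {
  const std::string prefix = "-Wl,";
  std::vector<std::string> out;
  if (arg.compare(0, prefix.size(), prefix) != 0)
    return out;  // not a -Wl option, nothing to expand
  std::size_t start = prefix.size();
  while (true) {
    std::size_t comma = arg.find(',', start);
    if (comma == std::string::npos) {
      out.push_back(arg.substr(start));
      break;
    }
    out.push_back(arg.substr(start, comma - start));
    start = comma + 1;
  }
  return out;
}

// Toy model (assumption): `-Xlinker <value>` forwards the value verbatim.
std::vector<std::string> expandXlinker(const std::string &value) {
  return {value};
}
```

With this model, `expandWl("-Wl,--lto-newpm-passes=forceattrs,default<O1>")` yields two arguments, so the pass pipeline string is cut in half, while `expandXlinker` keeps it in one piece.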

llvm#102123) …Type This is needed to ensure we find a type if its definition is in a CU that wasn't indexed. This can happen if the definition is in some precompiled code (e.g. the c++ standard library) which was built with different flags than the rest of the binary.

…ADS is not defined When LLVM_ENABLE_THREADS is not defined, llvm::get_threadid returns 0 which makes this test case fail. This is a pretty niche setting, Linaro uses it to stop lld crashing our 32 bit containers. So the test will get plenty of runs elsewhere. In lldb's code it's not getting the current thread ID anyway, it's using a value it got from ptrace. So even if that copy of lldb was built with LLVM_ENABLE_THREADS off, it should still be able to debug threads.

On PlayStation, allow users to supply -static to the linker, via the driver. An initial step. Later changes will have the PS5 driver supply additional options to the linker, if and when -static is passed. SIE tracker: TOOLCHAIN-16704

We only need to see that 1 frame of the stack is in user code. No need to carry on looking. Doing so actually caused a test failure on Armv8 Ubuntu Jammy where a libc function does not have a display name. I'm sure I'm going to get stung by this elsewhere, but for this test, breaking early sidesteps the problem.

The Mul factor was zero-extended here, resulting in incorrect results for integers larger than 64-bit. As we currently only multiply by 1 or -1, just split this into two cases -- there's no need for a full multiplication here. Fixes llvm#102597.
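A scaled-down analogue of this bug, using 8-bit and 16-bit integers instead of wide integers: zero-extending a factor of -1 turns it into 255, so the product is wrong, while handling the two possible factors as separate cases avoids the multiplication entirely. The function names are hypothetical and this is not the LLVM code, only a sketch of the idea.

```cpp
#include <cassert>
#include <cstdint>

// Buggy analogue: the narrow factor is zero-extended before multiplying,
// so -1 (bit pattern 0xFF) is treated as 255.
int16_t scaleZeroExtended(int16_t value, int8_t factor) {
  uint16_t wide = static_cast<uint8_t>(factor);  // zero-extension
  uint16_t product = static_cast<uint16_t>(static_cast<uint16_t>(value) * wide);
  return static_cast<int16_t>(product);
}

// Fixed analogue: the factor is only ever 1 or -1, so split into two cases
// instead of performing a full multiplication.
int16_t scaleSplit(int16_t value, int8_t factor) {
  assert(factor == 1 || factor == -1);
  return factor == 1 ? value : static_cast<int16_t>(-value);
}
```

For example, `scaleZeroExtended(5, -1)` computes 5 * 255 rather than -5, whereas `scaleSplit(5, -1)` negates the value directly.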

Implement FEAT_SME_B16B16 to enable ZA-targeting non-widening SME BFloat16 instructions. Remove the now redundant FEAT_B16B16, which has been replaced by FEAT_SVE_B16B16 and FEAT_SME_B16B16 (this commit); see llvm#101480 for the details and reasoning of this change to LLVM.

FEAT_SME_B16B16 is documented under the latest Armv9.4 feature documentation: https://developer.arm.com/documentation/109697/0100/Feature-descriptions/The-Armv9-4-architecture-extensio

- Changes to Clang AArch64 frontend
  - Change target guard of SME2 ZA-targeting non-widening BFloat16 intrinsics to 'sme-b16b16'
- Changes to LLVM AArch64 backend
  - llvm/lib/Target/AArch64/AArch64Features.td
    - Create FeatureSMEB16B16, which implies FeatureSME2 and FeatureSVEB16B16
    - Remove FeatureB16B16
    - Fix description of FeatureSVEB16B16
  - llvm/lib/Target/AArch64/AArch64InstrInfo.td
    - Create HasSMEB16B16 predicate
  - llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
    - Change predication of SME2 ZA-targeting non-widening BFloat16 instructions to new HasSMEB16B16
  - llvm/lib/Target/AArch64/AArch64.td
    - Add HasSMEB16B16 to SME2Unsupported (FEAT_SME_B16B16 implies FEAT_SME2)
  - llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
    - Remove flag 'b16b16' mapping to removed FeatureB16B16
    - Add flag 'sme-b16b16' mapping to new FeatureSMEB16B16
- Changes to LLVM unit tests
  - llvm/unittests/TargetParser/TargetParserTest.cpp
    - Add new sme-b16b16 flag to existing target parser tests
    - Add tests for the sme-b16b16 dependencies:
      - 'sme-b16b16' should enable 'sme2', 'sve-b16b16'.
    - Remove 'b16b16' from bf16 dependency test
- Added MC tests
  - llvm/test/MC/AArch64/SME2p1
    - To ensure that ZA-targeting multi-vector non-widening BFloat16 instructions are enabled by +sme-b16b16, and that this feature is removed by +nosme-b16b16.
- Modified tests
  - All CodeGen, Semantic, and MC tests that are affected by the removal of 'b16b16' have been modified to supply and/or expect 'sme-b16b16' where appropriate.

Include chain of ops feeding inductions in cost precomputation for inductions, not just the induction increment. In VPlan, those instructions will be cleaned up, as both phi and increment are generated by VPWidenIntOrFpInductionRecipe independently. Fixes llvm#101337.

This PR fixes the emission of OpLifetimeStart/OpLifetimeStop instructions so that they are valid. According to https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpLifetimeStart: "Size must be 0 if Pointer is a pointer to a non-void type or the Addresses [capability](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Capability) is not declared." The `Size` argument is taken from the corresponding intrinsic's argument, so when Size is not zero we must ensure that Pointer has the required type by inserting a bitcast if needed.

…01732) This PR contains changes in virtual register processing aimed at improving the correctness of emitted MIR between passes from the perspective of MachineVerifier. This potentially helps to detect previously missed flaws in code emission and harden the test suite. As a measure of the correctness and usefulness of this PR, we may use a mode with expensive checks turned on, in which MachineVerifier reports problems in the test suite.

To satisfy MachineVerifier's requirements for MIR correctness, not only is a rework of how virtual registers' types and classes are used required, but also corrections to the pre-legalizer and instruction selection logic. Namely, the following changes are introduced:

* scalar virtual registers have proper bit width,
* detect register class by SPIR-V type,
* add a superclass for id virtual register classes,
* fix Tablegen rules used for instruction selection,
* fixes for minor existing issues (missed flag for proper representation of a null constant for OpenCL vs. HLSL, wrong usage of integer virtual registers as a synonym of any non-type virtual register).

* rename CXXIndeterminateSpliceExpr in the readme too
  Signed-off-by: delimbetov <1starfall1@gmail.com>
* make TryAnnotateOptionalCXXScopeToken work
  Signed-off-by: delimbetov <1starfall1@gmail.com>
* make splice work in requires clause
  Signed-off-by: delimbetov <1starfall1@gmail.com>
* add tests for splice in requires expr
  Signed-off-by: delimbetov <1starfall1@gmail.com>
* add typename and newline at the end of the file
  Signed-off-by: delimbetov <1starfall1@gmail.com>
* add comments
  Signed-off-by: delimbetov <1starfall1@gmail.com>

---------

Signed-off-by: delimbetov <1starfall1@gmail.com>

Some work remains: In particular, if this is going to "work" (i.e., supported by P2996), we need to think carefully about reachability, TU-local entities, etc. There probably need to be some constraints around use of imported reflections, and possibly some 'is_reachable' metafunction. Not entirely sure - need to experiment further. Closes issue bloomberg#4.

…ions (bloomberg#89)

* basic impl
  Signed-off-by: delimbetov <1starfall1@gmail.com>
* add test for the new storage duration funcs
  Signed-off-by: delimbetov <1starfall1@gmail.com>
* code style
  Signed-off-by: delimbetov <1starfall1@gmail.com>
* run libcxx generators to pass CI
  Signed-off-by: delimbetov <1starfall1@gmail.com>
* fix indentation
  Signed-off-by: delimbetov <1starfall1@gmail.com>

---------

Signed-off-by: delimbetov <1starfall1@gmail.com>

…lvm#104148) `hasOperands` does not always execute matchers in the order they are written. This can cause issues in code using bindings when one operand matcher relies on a binding set by the other. With this change, the first matcher present in the code is always executed first and any bindings it sets are available to the second matcher.

Simple example with current version (1 match) and new version (2 matches):

```bash
> cat tmp.cpp
int a = 13;
int b = ((int) a) - a;
int c = a - ((int) a);

> clang-query tmp.cpp
clang-query> set traversal IgnoreUnlessSpelledInSource
clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d"))))))

Match #1:

tmp.cpp:1:1: note: "d" binds here
int a = 13;
^~~~~~~~~~
tmp.cpp:2:9: note: "root" binds here
int b = ((int)a) - a;
        ^~~~~~~~~~~~
1 match.

> ./build/bin/clang-query tmp.cpp
clang-query> set traversal IgnoreUnlessSpelledInSource
clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d"))))))

Match #1:

tmp.cpp:1:1: note: "d" binds here
    1 | int a = 13;
      | ^~~~~~~~~~
tmp.cpp:2:9: note: "root" binds here
    2 | int b = ((int)a) - a;
      |         ^~~~~~~~~~~~

Match #2:

tmp.cpp:1:1: note: "d" binds here
    1 | int a = 13;
      | ^~~~~~~~~~
tmp.cpp:3:9: note: "root" binds here
    3 | int c = a - ((int)a);
      |         ^~~~~~~~~~~~
2 matches.
```

If this should be documented or regression tested anywhere please let me know where.
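The ordering problem can be sketched with a toy model in plain C++, independent of the real AST matcher library. Everything here (`Operand`, `BinaryOp`, `castBindingD`, `refEqualsD`, `hasOperandsOld`, `hasOperandsNew`) is a hypothetical stand-in; the assumption is that the old behaviour ran the second-written matcher first when trying the swapped operand order, and the new behaviour always runs the first-written matcher first.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy operand: either a cast of a variable or a plain reference to it.
struct Operand { bool isCast; std::string var; };
struct BinaryOp { Operand lhs; Operand rhs; };
using Bindings = std::map<std::string, std::string>;
using Matcher = std::function<bool(const Operand &, Bindings &)>;

// First-written matcher: matches a cast and binds its variable as "d".
bool castBindingD(const Operand &op, Bindings &b) {
  if (!op.isCast) return false;
  b["d"] = op.var;
  return true;
}

// Second-written matcher: matches a plain reference equal to bound "d".
bool refEqualsD(const Operand &op, Bindings &b) {
  auto it = b.find("d");
  return !op.isCast && it != b.end() && it->second == op.var;
}

// Assumed old ordering: the swapped attempt runs m2 before m1,
// so m2 cannot see the binding that m1 would set.
bool hasOperandsOld(const BinaryOp &e, const Matcher &m1, const Matcher &m2) {
  Bindings direct;
  if (m1(e.lhs, direct) && m2(e.rhs, direct)) return true;
  Bindings swapped;
  return m2(e.lhs, swapped) && m1(e.rhs, swapped);
}

// Assumed new ordering: m1 always runs first, on either side.
bool hasOperandsNew(const BinaryOp &e, const Matcher &m1, const Matcher &m2) {
  Bindings direct;
  if (m1(e.lhs, direct) && m2(e.rhs, direct)) return true;
  Bindings swapped;
  return m1(e.rhs, swapped) && m2(e.lhs, swapped);
}
```

Modelling `((int) a) - a` as `{{true, "a"}, {false, "a"}}` and `a - ((int) a)` as `{{false, "a"}, {true, "a"}}`, the old variant accepts only the first expression while the new variant accepts both, mirroring the 1 match versus 2 matches in the clang-query output.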

…104523) Compilers and language runtimes often use helper functions that are fundamentally uninteresting when debugging anything but the compiler/runtime itself. This patch introduces a user-extensible mechanism that allows for these frames to be hidden from backtraces and automatically skipped over when navigating the stack with `up` and `down`.

This does not affect the numbering of frames, so `f <N>` will still provide access to the hidden frames. The `bt` output will also print a hint that frames have been hidden.

My primary motivation for this feature is to hide thunks in the Swift programming language, but I'm including an example recognizer for `std::function::operator()` that I wished for myself many times while debugging LLDB.

rdar://126629381

Example output. (Yes, my proof-of-concept recognizer could hide even more frames if we had a method that returned the function name without the return type or I used something that isn't based off regex, but it's really only meant as an example).

before:

```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb)
```

after:

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```
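The behaviour described above (hidden frames are skipped by `bt` and `up`, but keep their numbers and stay reachable by index) can be sketched with a small stand-alone model. This is not LLDB code: `Frame`, `looksLikeStdFunctionPlumbing`, `makeStack`, `visibleFrames`, and `up` are hypothetical names, and the substring test is an assumed stand-in for the regex-based example recognizer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Frame { std::string function; bool hidden; };

// Assumed recognizer: hide libc++ std::function implementation frames.
bool looksLikeStdFunctionPlumbing(const std::string &name) {
  return name.find("std::__1::__function::") != std::string::npos;
}

// Build a stack and mark each frame using the recognizer above.
std::vector<Frame> makeStack(const std::vector<std::string> &names) {
  std::vector<Frame> stack;
  for (const std::string &n : names)
    stack.push_back(Frame{n, looksLikeStdFunctionPlumbing(n)});
  return stack;
}

// Frame numbers a filtered backtrace would print; numbering is unchanged.
std::vector<std::size_t> visibleFrames(const std::vector<Frame> &stack) {
  std::vector<std::size_t> shown;
  for (std::size_t i = 0; i < stack.size(); ++i)
    if (!stack[i].hidden) shown.push_back(i);
  return shown;
}

// `up`: move to the next higher-numbered visible frame, or stay put.
std::size_t up(const std::vector<Frame> &stack, std::size_t current) {
  for (std::size_t i = current + 1; i < stack.size(); ++i)
    if (!stack[i].hidden) return i;
  return current;
}
```

Fed with abbreviated names of the nine frames from the backtrace, `visibleFrames` returns 0, 1, 2, 6, 7, 8 and `up` from frame 2 lands on frame 6, while `stack[4]` is still directly addressable, analogous to `f 4`.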

hazzlim and others added 30 commits August 9, 2024 13:25

[IRBuilder] Generate nuw GEPs for struct member accesses (llvm#99538)

94473f4

Generate nuw GEPs for struct member accesses, as inbounds + non-negative implies nuw. Regression tests are updated using update scripts where possible, and by find + replace where not.

Revert "[mlir][ArmSME] Pattern to swap shape_cast(tranpose) with tran…

fc4485b

…spose(shape_cast) (llvm#100731)" (llvm#102457) This reverts commit 88accd9. This change can be dropped in favor of just llvm#102017.

[NFC] Use references to avoid copying (llvm#99863)

3e806c8

Modifying `auto` to `auto&` to avoid unnecessary copying
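As a minimal illustration of what this kind of change saves, the sketch below counts copy-constructor calls when a range-based `for` loop binds elements with `auto` versus `auto &`. The `Tracked` type, the global counter, and the helper names are hypothetical and are not taken from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

int g_copies = 0;  // number of Tracked copy constructions so far

// Hypothetical element type that records every copy.
struct Tracked {
  std::string payload;
  explicit Tracked(std::string p) : payload(std::move(p)) {}
  Tracked(const Tracked &other) : payload(other.payload) { ++g_copies; }
  Tracked(Tracked &&) noexcept = default;
};

std::vector<Tracked> makeItems() {
  std::vector<Tracked> items;
  items.reserve(3);  // avoid reallocation while filling
  items.emplace_back("alpha");
  items.emplace_back("beta");
  items.emplace_back("gamma");
  return items;
}

// `auto item` copies each element into the loop variable.
int copiesWhenIteratingByValue(const std::vector<Tracked> &items) {
  g_copies = 0;
  std::size_t total = 0;
  for (auto item : items) total += item.payload.size();
  (void)total;
  return g_copies;
}

// `auto &item` binds a reference, so no element is copied.
int copiesWhenIteratingByReference(const std::vector<Tracked> &items) {
  g_copies = 0;
  std::size_t total = 0;
  for (auto &item : items) total += item.payload.size();
  (void)total;
  return g_copies;
}
```

For a three-element vector, the by-value loop performs one copy per element and the by-reference loop performs none.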

[mlir][vector] Add tests for scalable vectors in one-shot-bufferize.m…

24be4d5

…lir (llvm#102361)

[flang][OpenMP] Handle multiple ranges in num_teams clause (llvm#10…

3064646

…2535) Commit cee594c added support to clang for multiple expressions in `num_teams` clause. Add follow-up changes to flang.

[InstCombine] Remove unnecessary RUN line from test (NFC)

0795ab4

As all the necessary information is encoded using attributes nowadays, this test doesn't actually depend on the triple anymore.

Revert "Enable logf128 constant folding for hosts with 128bit floats (l…

a15de17

…lvm#96287)" This reverts commit ccb2b01. Causes buildbot failures, e.g. on ppc64le builders.

LSV/test/AArch64: add missing lit.local.cfg; fix build (llvm#102607)

fff78a5

Follow up on 199d6f2 (LSV: document hang reported in llvm#37865) to fix the build when omitting the AArch64 target. Add the missing lit.local.cfg.

Unnamed bitfields are not nonstatic data members.

8ce6449

[MemoryBuiltins] Simplify getCalledFunction() helper (NFC)

5bc1f9e

If nobuiltin is set, directly return nullptr instead of using a separate out parameter and having all callers check this.
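The shape of this simplification can be sketched with stand-in types. `Callee`, `Call`, `getCalleeWithFlag`, and `getCallee` are hypothetical; the real helper operates on LLVM IR values, and its exact signature is not shown in this commit message.

```cpp
#include <cassert>

// Hypothetical stand-ins for an IR function and a call site.
struct Callee { const char *name; };
struct Call { const Callee *callee; bool noBuiltin; };

// Before: the callee is returned together with an out parameter that
// every caller has to remember to check.
const Callee *getCalleeWithFlag(const Call &call, bool &isNoBuiltin) {
  isNoBuiltin = call.noBuiltin;
  return call.callee;
}

// After: a nobuiltin call yields nullptr, so the ordinary null check
// at each call site covers this case as well.
const Callee *getCallee(const Call &call) {
  if (call.noBuiltin) return nullptr;
  return call.callee;
}
```

Callers of the second form need only `if (const Callee *c = getCallee(call))`, which removes the separate flag check.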

AMDGPU/NewPM: Port SIFixSGPRCopies to new pass manager (llvm#102614)

cf54cae

This allows some tests relying on -stop-after=amdgpu-isel to move to checking -stop-after=finalize-isel instead, which will more reliably pass the verifier.

[llvm-readobj][COFF] Dump hybrid objects for ARM64X files. (llvm#102245)

1d77dd5

Fix a unit test input file (llvm#102567)

4c5ef66

I forgot to update the version info in the SDKSettings file when I updated it to the real version relevant to the test.

[libc][math][c23] Add totalorderl function. (llvm#102564)

ff1cc5b

[AMDGPU][AsmParser][NFCI] All NamedIntOperands to be of the i32 type. (…

335bc3c

…llvm#102616) There's no need for them to have different types. Part of <llvm#62629>.

[ARM] Regenerate big-endian-vmov.ll. NFC

dad1cb9

[Clang][OMPX] Add the code generation for multi-dim num_teams (llvm…

ee8100b

…#101407) This patch adds the code generation support for multi-dim `num_teams` clause when it is used with `target teams ompx_bare` construct.

[bazel] Port for d45de80

3bd63d4

[bazel] Add missing dep for the SPIRVToLLVM target

5c0eb1a

[gn] Give two scripts argparse.RawDescriptionHelpFormatter

f4d5b14

Without this, the doc string is put in a single line. These scripts have multi-line docstrings, so this makes their --help output look much nicer. Otherwise, no behavior change.

[X86] Convert truncsat clamping patterns to use SDPatternMatch. NFC.

669d844

Inspired by llvm#99418 (which hopefully we can replace this code with at some point)

bjope and others added 27 commits August 12, 2024 13:28

Clean up pointer casts etc after opaque pointers transition. NFC (llv…

145aff6

…m#102631)

TargetMachine: Move trivial setter/getter to header

7fe486a

The others are already inline here.

[AMDGPU][NFCI] Mark AGPRs and VGPRs with different flags in HWEncodin…

c7107ca

…g. (llvm#102650) Simplifies checks for AGPRs and VGPRs and makes them more explicit and less fragile.

[AMDGPU][AsmParser] Eliminate validateExeczVcczOperands(). (llvm#102600)

7727853

Mention the names of unavailable registers in error messages so that the diagnostics for execz/vccz are no less rich than they were. Clean up unnecessary name qualifications while there. Part of <llvm#62629>.

[Clang][OpenMP] Fix the wrong transform of num_teams claused introd…

aa86e5b

…uced in llvm#99732 (llvm#102716)

[IndVars] Add test for llvm#102597 (NFC)

c876761

[SLP][NFC]Use local getShuffleCost function across the code, NFC.

34514ce

Fix an obscure crash with substitution.

c8a4568

Merge branch 'main' into p2996

78a4192

Add 'is_access_specified' metafunction.

1973df5

TBD whether to keep this, but adding it so it can be played around with.

Split 'is_alias' into 'is_type_alias' and 'is_namespace_alias'.

ecd638b

Add <experimental/meta> to std module.

3d897ec

s/meta type/consteval-only type.

b31a899

Closes issue bloomberg#87.

Mandates instead of Constant When for reflect_*().

43e19fb

s/is_special_member/is_special_member_function

8d34e90

BaLiKfromUA closed this Aug 20, 2024
