Sync branch with main #2

Generate nuw GEPs for struct member accesses, as inbounds + non-negative implies nuw. Regression tests are updated using update scripts where possible, and by find + replace where not.

…spose(shape_cast) (llvm#100731)" (llvm#102457) This reverts commit 88accd9. This change can be dropped in favor of just llvm#102017.

Modifying `auto` to `auto&` to avoid unnecessary copying
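
The effect of that change can be illustrated with a minimal sketch. The `Tracked` type and the helper functions below are hypothetical and only count copy-constructions; they are not taken from the patched code.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often it is copy-constructed.
struct Tracked {
  static inline int copies = 0;
  std::string name;

  explicit Tracked(std::string n) : name(std::move(n)) {}
  Tracked(const Tracked &other) : name(other.name) { ++copies; }
};

// Builds three elements in place; reserve() avoids reallocation copies.
std::vector<Tracked> makeItems() {
  std::vector<Tracked> items;
  items.reserve(3);
  items.emplace_back("alpha");
  items.emplace_back("beta");
  items.emplace_back("gamma");
  return items;
}

// `auto item` copies every element into the loop variable.
int copiesWithAuto(const std::vector<Tracked> &items) {
  Tracked::copies = 0;
  std::size_t total = 0;
  for (auto item : items)
    total += item.name.size();
  (void)total;
  return Tracked::copies;
}

// `auto &item` binds a reference to each element, so nothing is copied.
int copiesWithAutoRef(const std::vector<Tracked> &items) {
  Tracked::copies = 0;
  std::size_t total = 0;
  for (auto &item : items)
    total += item.name.size();
  (void)total;
  return Tracked::copies;
}
```

With three elements, the by-value loop performs three copies and the by-reference loop performs none.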

…78112) https://cplusplus.github.io/CWG/issues/2627.html It is no longer a narrowing conversion when converting a bit-field to a type smaller than the field's declared type if the bit-field has a width small enough to fit in the target type. This includes integral promotions (`long long i : 8` promoted to `int` is no longer narrowing, allowing `c.i <=> c.i`) and list-initialization (`int n{ c.i };`). This also applies back to C++11, as it is a defect report.

…lvm#102573) Disables `vector.matrix_multiply` for scalable vectors. As per the docs: > This is the counterpart of llvm.matrix.multiply in MLIR I'm not aware of any use of matrix-multiply intrinsics in the context of scalable vectors, hence disabling.

…lir (llvm#102361)

…2535) Commit cee594c added support to clang for multiple expressions in `num_teams` clause. Add follow-up changes to flang.

As all the necessary information is encoded using attributes nowadays, this test doesn't actually depend on the triple anymore.

Syntacore SCR5 is an entry-level Linux-capable 32/64-bit RISC-V processor core. Overview: https://syntacore.com/products/scr5 Scheduling model will be added in a subsequent PR. Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com> Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com>

…lvm#96287)" This reverts commit ccb2b01. Causes buildbot failures, e.g. on ppc64le builders.

Follow up on 199d6f2 (LSV: document hang reported in llvm#37865) to fix the build when omitting the AArch64 target. Add the missing lit.local.cfg.

We should handle allocator attributes not only on function declarations, but also on the call-site. That way we can e.g. also optimize cases where the allocator function is a virtual function call. This was already supported in some of the MemoryBuiltins helpers, but not all of them. This adds support for allocsize, alloc-family and allockind("free").

…perations. (llvm#102105) The code-generator is currently not able to handle scalable vectors of <vscale x 1 x eltty>. The usual "fix" for this until it is supported is to mark the costs of loads/stores with an invalid cost, preventing the vectorizer from vectorizing at those factors. But on rare occasions loops do not contain load/stores, only reductions. So whilst this is still unsupported return an invalid cost to avoid selecting vscale x 1 VFs. The cost of a reduction is not currently used by the vectorizer so this adds the cost to the add/mul/and/or/xor or min/max that should feed the reduction. It includes reduction costs too, for completeness. This change will be removed when code-generation for these types is sufficiently reliable. Fixes llvm#99760

If nobuiltin is set, directly return nullptr instead of using a separate out parameter and having all callers check this.

This allows moving some tests relying on -stop-after=amdgpu-isel to move to checking -stop-after=finalize-isel instead, which will more reliably pass the verifier.

I forgot to update the version info in the SDKSettings file when I updated it to the real version relevant to the test.

Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to `llvm.func` operations.

- `spir_kernel`/`spir_func` calling conventions used for kernels/functions.
- `workgroup` attributions encoded as additional `llvm.ptr<3>` arguments.
- No attribute used to annotate kernels.
- `reqd_work_group_size` attribute used to encode `gpu.known_block_size`.
- `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution sizes. This will be attached to the pointer argument workgroup attributions lower to.

**Note**: A notable missing feature that will be addressed in a follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace MemRef arguments with bare pointers to the MemRef element types instead of the current MemRef descriptor approach.

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>

This PR adds conversion patterns for MemRef to the `convert-to-spirv` pass, introduced in llvm#95942. Conversions from MemRef memory space to SPIR-V storage class were also included, and would run before the final dialect conversion phase. **Future Plans** - Add tests for ops other than `memref.load` and `memref.store` --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>

…llvm#102616) There's no need for them to have different types. Part of <llvm#62629>.

…#101407) This patch adds the code generation support for multi-dim `num_teams` clause when it is used with `target teams ompx_bare` construct.

…ack for `insertelement` (llvm#82130) Prior to this patch, SelectionDAG generated aligned moves onto the stack for AVX registers when the function was marked as a no-realign-stack function. This led to misalignment between the stack and the instruction generated. This patch fixes the issue. There was a similar issue reported for `extractelement`, which was fixed in a6614ec. Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>

Make it possible to do things like the following, regardless of whether the offload target is nvptx or amdgpu:

```
$ clang -O1 -g -fopenmp --offload-arch=native test.c \
    -Xoffload-linker -mllvm=-pass-remarks=inline \
    -Xoffload-linker -mllvm=-force-remove-attribute=g.internalized:noinline \
    -Xoffload-linker --lto-newpm-passes='forceattrs,default<O1>' \
    -Xoffload-linker --lto-debug-pass-manager \
    -foffload-lto
```

To accomplish that:

- In clang-linker-wrapper, do not forward options via `-Wl` if they might have literal commas. Use `-Xlinker` instead.
- In clang-nvlink-wrapper, accept `--lto-debug-pass-manager` and `--lto-newpm-passes`.
- In clang-nvlink-wrapper, drop `-passes` because it's inconsistent with the interface of `lld`, which is used instead of clang-nvlink-wrapper when the target is amdgpu. Without this patch, `-passes` is passed to `nvlink`, producing an error anyway.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>

Without this, the doc string is put in a single line. These scripts have multi-line docstrings, so this makes their --help output look much nicer. Otherwise, no behavior change.

Inspired by llvm#99418 (which hopefully we can replace this code with at some point)

…opeForCallOperatorInstantiationRAII (llvm#100766) This PR addresses issues related to the handling of `init capture` with parameter packs in Clang's `LambdaScopeForCallOperatorInstantiationRAII`. Previously, `addInstantiatedCapturesToScope` would add `init capture` containing packs to the scope using the type of the `init capture` to determine the expanded pack size. However, this approach resulted in a pack size of 0 because `getType()->containsUnexpandedParameterPack()` returns `false`. After extensive testing, it appears that the correct pack size can only be inferred from `getInit`. But `getInit` may reference parameters and `init capture` from an outer lambda, as shown in the following example: ```cpp auto L = [](auto... z) { return [... w = z](auto... y) { // ... }; }; ``` To address this, `addInstantiatedCapturesToScope` in `LambdaScopeForCallOperatorInstantiationRAII` should be called last. Additionally, `addInstantiatedCapturesToScope` has been modified to only add `init capture` to the scope. The previous implementation incorrectly called `MakeInstantiatedLocalArgPack` for other non-init captures containing packs, resulting in a pack size of 0. ### Impact This patch affects scenarios where `LambdaScopeForCallOperatorInstantiationRAII` is passed with `ShouldAddDeclsFromParentScope = false`, preventing the correct addition of the current lambda's `init capture` to the scope. There are two main scenarios for `ShouldAddDeclsFromParentScope = false`: 1. **Constraints**: Sometimes constraints are instantiated in place rather than delayed. In this case, `LambdaScopeForCallOperatorInstantiationRAII` does not need to add `init capture` to the scope. 2. **`noexcept` Expressions**: The expressions inside `noexcept` have already been transformed, and the packs referenced within have been expanded. 
Only `RebuildLambdaInfo` needs to add the expanded captures to the scope, without requiring `addInstantiatedCapturesToScope` from `LambdaScopeForCallOperatorInstantiationRAII`. ### Considerations An alternative approach could involve adding a data structure within the lambda to record the expanded size of the `init capture` pack. However, this would increase the lambda's size and require extensive modifications. This PR is a prerequisite for implementing llvm#61426

Tracking a set containing every block and operation visited can become very expensive and is unnecessary. Co-authored-by: Will Dietz <w@wdtz.org>

llvm#101978) Default attributes are assigned to all functions according to the command line parameters. Some functions might have their own attributes, and we need to set or remove attributes accordingly. Tests are updated to cover these scenarios too.

The job of ParseRegularReg() should remain parsing the register as it was specified, not trying to translate it to anything else. It's up to operand predicates to decide what is and is not an acceptable register for an operand, including considering its expected register class, and for the rest of the AsmParser infrastructure to handle it accordingly from there on.

This transform op makes it possible to query attributes associated to IR by means of the DLTI dialect. The op takes both a `key` and a target `op` to perform the query at. Facility functions automatically find the closest ancestor op which defines the appropriate DLTI interface or has an attribute implementing a DLTI interface. By default the lookup uses the data layout interfaces of DLTI. If the optional `device` parameter is provided, the lookup happens with respect to the interfaces for TargetSystemSpec and TargetDeviceSpec. This op uses new free-standing functions in the `dlti` namespace to not only look up specifications via the `DataLayoutSpecOpInterface` and on `ModuleOp`s but also on any ancestor op that has an appropriate DLTI attribute.

…m#102553) With opaque pointers, nothing directly uses the value type, so we can mutate it if we want. This avoids doing a complicated RAUW dance.

…lvm#102584) Move tests that exercise DropUnitDimFromElementwiseOps and DropUnitDimsFromTransposeOp to a dedicated file. While these patterns are collected under populateFlattenVectorTransferPatterns (and are tested via -test-vector-transfer-flatten-patterns), they can actually be tested without the xfer Ops, and hence the split. Note, this is mostly just moving tests from one file to another. The only real change is the removal of the following check-lines: ```mlir // CHECK-128B-NOT: memref.collapse_shape ``` These were added specifically to check the "flattening" logic (which introduces `memref.collapse_shape`). However, these tests were never meant to test that logic (in fact, that's the reason I am moving them to a different file) and hence are being removed as copy&paste errors. I also removed the following TODO: ```mlir /// TODO: Potential duplication with tests from: /// * "vector-dropleadunitdim-transforms.mlir" /// * "vector-transfer-drop-unit-dims-patterns.mlir" ``` I've checked what patterns are triggered in those test files and neither DropUnitDimFromElementwiseOps nor DropUnitDimsFromTransposeOp does.

…101561)" This reverts commit 8f21ff9. Crashed CI builds

This generalizes MSan's Arm NEON vst support, to include the lane-specific variants. This also updates the test from llvm#100645.

Co-authored-by: OverMighty <its.overmighty@gmail.com>

…m#102086) Currently `AMDGPUAttributorPass` is registered in default optimizer pipeline. This will allow the pass to run in default pipeline as well as at thinLTO post link stage. However, it will not run in full LTO post link stage. This patch moves it to full LTO.

This ports a fix from memprof (llvm#98510), which has a shadow mapping that is similar to ASan (8 bytes of shadow memory per 64 bytes of app memory). This patch changes the allocator to dynamically choose a base address, as suggested by Vitaly for memprof. This simplifies ASan's #ifdef's and avoids potential conflict in the event that ASan were to switch to a dynamic shadow offset in the future [1]. [1] Since shadow memory is mapped before the allocator is mapped: - dynamic shadow and fixed allocator (old memprof): could fail if "unlucky" (e.g., https://lab.llvm.org/buildbot/#/builders/66/builds/1361/steps/17/logs/stdio) - dynamic shadow and dynamic allocator (HWASan; current memprof): always works - fixed shadow and fixed allocator (current ASan): always works, if constants are carefully chosen - fixed shadow and dynamic allocator (ASan with this patch): always works
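
The footnote's reasoning comes down to whether two address ranges can collide. A minimal sketch, assuming the well-known ASan mapping `Shadow = (Mem >> 3) + Offset` (one shadow byte per 8 application bytes); the `Range` type and the `overlaps` helper are hypothetical and not part of the sanitizer runtime:

```cpp
#include <cstdint>

// One shadow byte describes 2^3 = 8 application bytes.
constexpr unsigned kShadowScale = 3;

// Maps an application address to its shadow address for a given offset.
// The offset may be a compile-time constant (fixed shadow) or chosen at
// startup (dynamic shadow); the arithmetic is the same in both cases.
constexpr std::uint64_t memToShadow(std::uint64_t addr,
                                    std::uint64_t shadowOffset) {
  return (addr >> kShadowScale) + shadowOffset;
}

// Hypothetical half-open address range [begin, end).
struct Range {
  std::uint64_t begin;
  std::uint64_t end;
};

// True if the two ranges share at least one address. A fixed allocator
// range can overlap a dynamically placed shadow range; an allocator base
// chosen after the shadow is mapped can always avoid it.
constexpr bool overlaps(Range a, Range b) {
  return a.begin < b.end && b.begin < a.end;
}
```

For example, with the x86-64 Linux offset `0x7fff8000`, the application address `0x1000` maps to the shadow address `0x7fff8200`.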

When working on very busy systems, check-offload frequently fails many tests with this diagnostic: ``` clang: error: cannot determine amdgcn architecture: /tmp/llvm/build/bin/amdgpu-arch: Child timed out: ; consider passing it via '-march' ``` This patch accepts the environment variable `CLANG_TOOLCHAIN_PROGRAM_TIMEOUT` to set the timeout. It also increases the timeout from 10 to 60 seconds.
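
Reading such an override could look roughly like the sketch below. Only the variable name `CLANG_TOOLCHAIN_PROGRAM_TIMEOUT` and the 60-second default come from the description above; the function name and the handling of malformed or negative values are assumptions, not clang's actual implementation.

```cpp
#include <cstdlib>
#include <limits>

// Hypothetical helper: returns the timeout in seconds, taken from the
// environment when set to a valid non-negative integer, else the default.
unsigned toolchainProgramTimeout(unsigned defaultSeconds = 60) {
  const char *raw = std::getenv("CLANG_TOOLCHAIN_PROGRAM_TIMEOUT");
  if (raw == nullptr || *raw == '\0')
    return defaultSeconds;

  char *end = nullptr;
  const long value = std::strtol(raw, &end, 10);

  // Assumption: trailing garbage, negative values, and out-of-range values
  // fall back to the default instead of being reported as errors.
  if (*end != '\0' || value < 0 ||
      static_cast<unsigned long long>(value) >
          std::numeric_limits<unsigned>::max())
    return defaultSeconds;

  return static_cast<unsigned>(value);
}
```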

1. Add MipsPat to optimize (andi (srl (truncate i64 $1), x), y) to (andi (truncate (dsrl i64 $1, x)), y). 2. Add MipsPat to optimize (ext (truncate i64 $1), x, y) to (truncate (dext i64 $1, x, y)). The assembly result is the same as gcc. Fixes llvm#42826

Initially, the LRU list stored all mapped entries with no distinction between the committed (non-madvise()'d) entries and decommitted (madvise()'d) entries. Now these two types of entries are separated into two lists, allowing future cache logic to branch depending on whether entries are committed or decommitted. Furthermore, the retrieval algorithm will prioritize committed entries over decommitted entries. Specifically, committed entries that satisfy the MaxUnusedCachePages requirement are retrieved before optimal-fit, decommitted entries. This commit addresses the compiler errors raised [here](llvm#100818 (comment)).

llvm-objdump -S issues unnecessary warnings for RISC-V relocatable files containing .debug_loclists or .debug_rnglists sections with ULEB128 relocations. This occurred because `DWARFObjInMemory` verifies support for all relocation types, triggering warnings for unsupported ones. ``` % llvm-objdump -S a.o ... 0000000000000000 <foo>: warning: failed to compute relocation: R_RISCV_SUB_ULEB128, Invalid data was encountered while parsing the file warning: failed to compute relocation: R_RISCV_SET_ULEB128, Invalid data was encountered while parsing the file ... ``` This change fixes llvm#101544 by declaring support for the two ULEB128 relocation types, silencing the spurious warnings. --- In DWARF v5 builds, DW_LLE_offset_pair/DW_RLE_offset_pair might be generated in .debug_loclists/.debug_rnglists with ULEB128 relocations. They are only read by llvm-dwarfdump to dump section content and verbose DW_AT_location/DW_AT_ranges output for relocatable files. The DebugInfoDWARF user (e.g. DWARFDebugRnglists.cpp) calls `Data.getULEB128` without checking the ULEB128 relocations, as the unrelocated value holds meaning (refer to the assembler implementation https://reviews.llvm.org/D157657). This differs from `.quad .Lfoo`, which requires relocation reading (e.g. https://reviews.llvm.org/D74404). Pull Request: llvm#101607

Add custom lowering for `BR_JT` DAG nodes to the `brx.idx` PTX instruction ([PTX ISA 9.7.13.4. Control Flow Instructions: brx.idx](https://docs.nvidia.com/cuda/parallel-thread-execution/#control-flow-instructions-brx-idx)). Depending on the heuristics in DAG selection, `switch` statements may now be lowered using `brx.idx`. Note: this fixes the previous issue in llvm#102400 by adding the isBarrier attribute to BRX_END.

…lvm#102496) Instead of expanding in RISCVExpandPseudoInsts, expand during MachineInstr to MCInst lowering. We weren't doing anything in expansion other than copying operands.

This has received no development work in a while and is slowly bit rotting as new extensions are added. At the moment, I don't think this is viable without adding a new invariant that 32 bit values are always in sign extended form like Mips64 does. We are very dependent on computeKnownBits and ComputeNumSignBits in SelectionDAG to remove sign extends created for ABI reasons. If we can't propagate sign bit information through 64-bit values in SelectionDAG, we can't effectively clean up those extends.

…vm#102416) We already ended up with -fptrauth-returns, the feature macro, the lang opt, and the actual backend lowering. The only part left is threading it all through PointerAuthOptions, to drive the addition of the "ptrauth-returns" attribute to generated functions. While there, do minor cleanup on ptrauth-function-attributes.c. This also adds ptrauth_key_return_address to ptrauth.h.

Only return nullptr when we don't have an available QualType.

…n. NFC

…age (llvm#102086)" This reverts commit 2fe61a5.

Follow on to llvm#101232, as suggested in the comments, narrow the scope of the preserved analyses.

This provides -fptrauth-auth-traps, which at the frontend level only controls the addition of the "ptrauth-auth-traps" function attribute. The attribute in turn controls various aspects of backend codegen, by providing the guarantee that every "auth" operation generated will trap on failure. This can either be delegated to the hardware (if AArch64 FPAC is known to be available), in which case this attribute doesn't change codegen. Otherwise, if FPAC isn't available, this asks the backend to emit additional instructions to check and trap on auth failure.

…llvm#102665) This updates some code to consistently use cpp::numeric_limits, the src/__support polyfill for std::numeric_limits, rather than the C <limits.h> macros. This is in keeping with the general C++-oriented style in libc code, and also sidesteps issues about the new C23 *_WIDTH macros that the compiler-provided header does not define outside C23 mode. Bug: https://issues.fuchsia.dev/358196552

…102539) This class is technically not usable in its current state. When you use it in a simple C++ project, your compiler will complain about an incomplete definition of SaveCoreOptions. Normally this isn't a problem, other classes in the SBAPI do this. The difference is that SBSaveCoreOptions has a default destructor in the header, so the compiler will attempt to generate the code for the destructor with an incomplete definition of the impl type. All methods for every class, including constructors and destructors, must have a separate implementation not in a header.
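
The underlying C++ rule can be shown in a single-file sketch. `SaveOptions` and `Impl` are hypothetical stand-ins for the SB class and its private implementation type; the comments mark which part would live in the public header and which in the `.cpp` file.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- What would live in the public header ----
class SaveOptions {
public:
  SaveOptions();
  // Declared only. Writing `~SaveOptions() = default;` here would make every
  // includer instantiate std::unique_ptr<Impl>'s deleter while Impl is still
  // incomplete, which is the error described above.
  ~SaveOptions();

  void setPath(std::string path);
  const std::string &path() const;

private:
  struct Impl; // incomplete in the header
  std::unique_ptr<Impl> impl_;
};

// ---- What would live in the implementation file ----
struct SaveOptions::Impl {
  std::string path;
};

SaveOptions::SaveOptions() : impl_(std::make_unique<Impl>()) {}

// Defaulted here, where Impl is complete, so the deleter compiles.
SaveOptions::~SaveOptions() = default;

void SaveOptions::setPath(std::string path) { impl_->path = std::move(path); }

const std::string &SaveOptions::path() const { return impl_->path; }
```

Client code that only sees the header part can then construct and destroy `SaveOptions` without ever needing the definition of `Impl`.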

Make sure that the usage of `cppType` and `cppClassName` of type and attribute definitions/constraints is consistent in TableGen. - `cppClassName`: The C++ class name of the type or attribute. - `cppType`: The fully qualified C++ class name: C++ namespace and C++ class name. Basically, we should always use the fully qualified C++ class name for parameter types, return types or template arguments. Also some minor cleanups. Fixes llvm#57279.

llvm#92331 tried to make `ObjCARCContractPass` run by default, but it caused a regression on O0 builds and was reverted. This patch tries to bring that back by: 1. reverting the [revert](llvm@1579e9c). 2. calling `createObjCARCContractPass` only on optimized builds. Tests are updated to reflect the changes. Specifically, all `O0` tests should not include `ObjCARCContractPass`. Signed-off-by: Peter Rong <PeterRong@meta.com>

) When a type/attribute is defined in TableGen, a type constraint can be used for parameters, but the type constraint verification was missing. Example:

```
def TestTypeVerification : Test_Type<"TestTypeVerification"> {
  let parameters = (ins AnyTypeOf<[I16, I32]>:$param);
  // ...
}
```

No verification code was generated to ensure that `$param` is I16 or I32. When type constraints are present, a new method will be generated for types and attributes: `verifyInvariantsImpl`. (The naming is similar to op verifiers.) The user-provided verifier is called `verify` (no change). There is now a new entry point to type/attribute verification: `verifyInvariants`. This function calls both `verifyInvariantsImpl` and `verify`. If neither of those two verifications is present, the `verifyInvariants` function is not generated. When a type/attribute is not defined in TableGen, but a verifier is needed, users can implement the `verifyInvariants` function. (This function was previously called `verify`.)

Note for LLVM integration: If you have an attribute/type that is not defined in TableGen (i.e., just C++), you have to rename the verification function from `verify` to `verifyInvariants`. (Most attributes/types have no verification, in which case there is nothing to do.) Depends on llvm#102657.

The previous change replaced INT_WIDTH with cpp::numeric_limits<int>::digits, but these don't have the same value. While INT_WIDTH == UINT_WIDTH, not so for ::digits, so use cpp::numeric_limits<unsigned int>::digits et al instead for the intended effects. Bug: https://issues.fuchsia.dev/358196552
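
The difference is easy to check with the standard library. This sketch uses `std::numeric_limits` in place of the libc-internal `cpp::numeric_limits` polyfill, on the assumption that both report `digits` the same way; `widthOf` is a hypothetical helper, not a libc function.

```cpp
#include <climits>
#include <limits>
#include <type_traits>

// For a signed type, `digits` counts value bits only and excludes the sign
// bit, so it is one less than the width reported by the C23 *_WIDTH macros.
static_assert(std::numeric_limits<int>::digits + 1 ==
              std::numeric_limits<unsigned int>::digits);

// Hypothetical helper: the full bit width of an integer type, obtained by
// asking for the digits of its unsigned counterpart.
template <typename T> constexpr int widthOf() {
  return std::numeric_limits<std::make_unsigned_t<T>>::digits;
}

// On mainstream targets without padding bits, this matches sizeof * CHAR_BIT.
static_assert(widthOf<int>() == static_cast<int>(sizeof(int) * CHAR_BIT));
static_assert(widthOf<long long>() ==
              static_cast<int>(sizeof(long long) * CHAR_BIT));
```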

@vporpo

Heavily based on work by @vporpo.

…xt (llvm#102662) In device context managed memory is not available so it makes no sense to allocate the descriptor using it. Fall back to fir.alloca as it is handled well in device code. cuf.free is just dropped.

The previous change missed the second spot doing the same thing. Bug: https://issues.fuchsia.dev/358196552

- Added a default parsing implementation to `PassOptions` to allow `Option`/`ListOption` to wrap PassOption objects. This is helpful when creating meta-pipelines (pass pipelines composed of pass pipelines). - Updated `ListOption` printing to enable round-tripping the output of `dump-pass-pipeline` back into `mlir-opt` for more complex structures.

Intrinsics are defined with a bfloat type as of commit 250f2bb, not i16 and i32 storage types. As such declarations are no longer needed once the correct types are used.

…() (llvm#102406) This patch introduces Tracker::emplaceIfTracking(), a wrapper of Tracker::track() that will conditionally create the change object if tracking is enabled. This patch also removes the `Parent` member field of `IRChangeBase`.
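
The idea behind such a wrapper can be sketched as follows. This is a heavily simplified, hypothetical model: `ChangeBase`, `SetValue`, and this `Tracker` are illustrative and do not mirror the real SandboxIR classes or signatures.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class for a revertible change.
struct ChangeBase {
  virtual ~ChangeBase() = default;
  virtual void revert() = 0;
};

class Tracker {
public:
  void setTracking(bool enabled) { tracking_ = enabled; }
  std::size_t numChanges() const { return changes_.size(); }

  // Unconditionally records an already-constructed change object.
  void track(std::unique_ptr<ChangeBase> change) {
    changes_.push_back(std::move(change));
  }

  // Constructs the change object only when tracking is enabled, so callers
  // pay nothing for the allocation otherwise. Returns whether it was created.
  template <typename ChangeT, typename... ArgTs>
  bool emplaceIfTracking(ArgTs &&...args) {
    if (!tracking_)
      return false;
    track(std::make_unique<ChangeT>(std::forward<ArgTs>(args)...));
    return true;
  }

  // Reverts the recorded changes in reverse order and forgets them.
  void revertAll() {
    for (auto it = changes_.rbegin(); it != changes_.rend(); ++it)
      (*it)->revert();
    changes_.clear();
  }

private:
  bool tracking_ = false;
  std::vector<std::unique_ptr<ChangeBase>> changes_;
};

// Hypothetical change: remembers an int's old value so it can be restored.
struct SetValue : ChangeBase {
  int &slot;
  int oldValue;
  explicit SetValue(int &s) : slot(s), oldValue(s) {}
  void revert() override { slot = oldValue; }
};
```

A caller would write `tracker.emplaceIfTracking<SetValue>(x)` just before modifying `x`, instead of checking the tracking state and calling `track()` by hand.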

…lvm#101765) ASTContext::getIntWidth returns 1 if isBooleanType(), and falls back on getTypeSize in the default case, which itself just returns the Width from getTypeInfo's returned struct, so can be used in all cases here, not just for _BitInt types.

Certain tests within the compiler-rt subproject encountered "command not found" errors when using lit's internal shell, particularly when trying to use the `DIR` environment variable. When checking with the command `LIT_USE_INTERNAL_SHELL=1 ninja check-compiler-rt`, I encountered the following error: ``` ******************** Testing: FAIL: SanitizerCommon-ubsan-i386-Linux :: sanitizer_coverage_trace_pc_guard-init.cpp (146 of 9570) ******************** TEST 'SanitizerCommon-ubsan-i386-Linux :: sanitizer_coverage_trace_pc_guard-init.cpp' FAILED ******************** Exit Code: 127 Command Output (stdout): -- # RUN: at line 5 DIR=/usr/local/google/home/harinidonthula/llvm-project/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/ubsan-i386-Linux/Output/sanitizer_coverage_trace_pc_guard-init.cpp.tmp_workdir # executed command: DIR=/usr/local/google/home/harinidonthula/llvm-project/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/ubsan-i386-Linux/Output/sanitizer_coverage_trace_pc_guard-init.cpp.tmp_workdir # .---command stderr------------ # | 'DIR=/usr/local/google/home/harinidonthula/llvm-project/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/ubsan-i386-Linux/Output/sanitizer_coverage_trace_pc_guard-init.cpp.tmp_workdir': command not found # `----------------------------- # error: command failed with exit status: 127 ``` In this patch, I resolved these issues by removing the use of the `DIR` environment variable. Instead, the tests now directly utilize `%t_workdir` for managing temporary directories. Additionally, I simplified the tests by embedding the clang command arguments directly into the test scripts, which avoids complications with environment variable expansion under lit's internal shell. This fix ensures that the tests run smoothly with lit's internal shell and prevents the "command not found" errors, improving the reliability of the test suite when executed in this environment. 
fixes: llvm#102395 [link to RFC](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)

) In TargetLowering::expandFixedPointMul when expanding fixed point multiplication, and when using a widened MUL as strategy for the lowering, there was a bug resulting in assertion failures like this: Assertion `VT.isVector() == N1.getValueType().isVector() && "SIGN_EXTEND result type type should be vector iff the operand " "type is vector!"' failed. Problem was that we did not consider that VT could be a vector type when setting up the WideVT. This patch should fix that bug.

The previous patch removing the fenv requirement for str to float had an error that got missed due to a lack of tests. This patch fixes the issue and adds tests, as well as updating the existing tests.

Sin/cos/tan fuzzers were having issues with ONE_TWENTY_EIGHT_OVER_PI, so the LIBC_TARGET_CPU_HAS_FMA ifdef statement got moved from the sin/cos/tan .cpp files to the range_reduction_double_common.cpp file.

This patch replaces some of the remaining uses of Tracker::track() to Tracker::emplaceIfTracking().

While manual compiles can specify full file paths and build automation tools use full, unique paths in practice, it's not clear whether it's a general good practice to enforce full paths (fail a build if relative paths are used). `NumDefs == 1` condition [1] should hold true for many internal-linkage vtables as long as full paths are indeed used to salvage the marginal performance when local-linkage vtables are imported due to indirect reference. llvm#100448 (comment) has more details. [1] https://github.com/llvm/llvm-project/pull/100448/files#diff-e7cb370fee46f0f773f2b5429dfab36b75126d3909ae98ee87ff3d0e3f75c6e9R215

This patch introduces the SingleLLVMInstructionImpl class, which implements a couple of functions shared across all Instructions that map to a single LLVM Instruction. This avoids code duplication.

…tributor pass (llvm#101760)

… `AMDGPU.h`

…2653) Does not yet add it to the pass pipeline. Somehow it causes 2 tests to assert in SelectionDAG, in functions without any control flow.

…lvm#102654)

… casts

nsan will port msan_allocator.cpp and msan_thread.cpp. Clean up the two files first.

Without this patch, the constructor arguments come from SmallVectorImpl, not ArrayRef. This patch switches them to ArrayRef so that we can construct SmallVector with a single argument. Note that LLVM Programmer’s Manual prefers ArrayRef to SmallVectorImpl for flexibility.

…2713)

After llvm#102654

…navailable e.g. on Mach-O

…ber (llvm#102723)

It looks like we've accidentally disabled clang-tidy in the CI. This re-enables it and fixes the issues accumulated while it was disabled.

) Always go through toAPValue() first and pretty-print that. In the future, I think we could get rid of the individual toDiagnosticString() implementations. This way we also get the correct printing for floats.

DominatorTree, LoopInfo, and ScalarEvolution are function-level analyses that expect to be called only on instructions and basic blocks of the function they were originally created for. When Polly outlined a parallel loop body into a separate function, it reused the same analyses, which seemed to work until the new checks to be added in llvm#101198. This patch creates new analyses for the subfunctions. GenDT, GenLI, and GenSE now refer to the analyses of the current region of code. Outside of an outlined function, they refer to the same analysis as used for the SCoP, but are substituted within an outlined function. In addition to the cross-function queries of DT/LI/SE, we must not create SCEVs that refer to a mix of expressions for old and generated values. Currently, SCEVs themselves do not "remember" which ScalarEvolution analysis they were created for, but mixing them is just as unexpected as using DT/LI across function boundaries. Hence `SCEVLoopAddRecRewriter` was combined into `ScopExpander`. `SCEVLoopAddRecRewriter` only replaced induction variables but left SCEVUnknowns to reference the old function. `SCEVParameterRewriter` would have done so but its job was effectively superseded by `ScopExpander`, and now also `SCEVLoopAddRecRewriter`. Some issues persist but are marked with a FIXME in the code. Changing them would possibly cause this patch to be not NFC anymore.

…tests. (llvm#102736) Use llvm-lib to generate input library instead of a binary blob.

…m#102637) This prevents some unnecessary conversions to/from int64_t and IntegerAttr.

Zero-initializing all of them accidentally left the last member active. Only initialize the first one.

…ions (llvm#102715)

…02749) Including unions, where this is more important.

…102753) Pointer::activate() propagates up anyway, so that is handled. But we need to call activate() in any case since the parent might not be a union, but the activate() is still needed. Always call it and hope that the InUnion flag takes care of the potential performance problems.

- Fix include guards for headers under utils/TableGen to match their paths.

Update the list of opcodes handled by the constant_fold_binop combine to match the ones that are folded in CSEMIRBuilder::buildInstr.

A dominance query of a block that is in a different function is ill-defined, so assert that getNode() is only called for blocks that are in the same function. There are two cases, where this behavior did occur. LoopFuse didn't explicitly do this, but didn't invalidate the SCEV block dispositions, leaving dangling pointers to free'ed basic blocks behind, causing use-after-free. We do, however, want to be able to dereference basic blocks inside the dominator tree, so that we can refer to them by a number stored inside the basic block.

This patch fixes: clang/lib/Serialization/ASTReader.cpp:11484:13: error: unused variable '_' [-Werror,-Wunused-variable]

The use of _ requires either: - (void)_ and curly braces, or - [[maybe_unused]]. For simple repetitions like these, we can use traditional for loops for readable warning-free code.
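
The three options can be compared side by side. The function names are hypothetical, and a plain `std::vector` stands in for whatever range the original loops iterated over.

```cpp
#include <vector>

// Option 1: silence the warning with (void)_, which requires braces.
int countWithVoidCast(const std::vector<int> &items) {
  int count = 0;
  for (const auto &_ : items) {
    (void)_;
    ++count;
  }
  return count;
}

// Option 2: mark the loop variable [[maybe_unused]] (C++17).
int countWithMaybeUnused(const std::vector<int> &items) {
  int count = 0;
  for ([[maybe_unused]] const auto &_ : items)
    ++count;
  return count;
}

// Option 3: a traditional counted loop, which has no unused variable at all.
int countWithIndexLoop(unsigned repetitions) {
  int count = 0;
  for (unsigned i = 0; i != repetitions; ++i)
    ++count;
  return count;
}
```

All three run the loop body the same number of times; the third form is the one this change prefers for simple repetitions.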

They don't have a body and we need to implement them ourselves. Use the Memcpy op to do that.

-fsized-deallocation was recently made the default for C++17 onwards (llvm#90373). While here, remove unneeded -faligned-allocation.

… functions. NFC

This script helps the release managers merge backport PRs. It does the following things:

* Validates the PR: checks approval, target branch, and many other things.
* Rebases the PR.
* Checks out the PR locally.
* Pushes the PR to the release branch.
* Deletes the local branch.

I have found the script very helpful for merging the PRs.

I tried to add a limit to number of blocks visited in the paths() function but even with a very high limit the transformation coverage was being reduced. After looking at the code it seemed that the function was trying to create paths of the form `SwitchBB...DeterminatorBB...SwitchPredecessor`. This is inefficient because a lot of nodes in those paths (nodes before DeterminatorBB) would be irrelevant to the optimization. We only care about paths of the form `DeterminatorBB_Pred DeterminatorBB...SwitchBB`. This weeds out a lot of visited nodes. In this patch I have added a hard limit to the number of nodes visited and changed the algorithm for path calculation. Primarily I am traversing the use-def chain for the PHI nodes that define the state. If we have a hole in the use-def chain (no immediate predecessors) then I call the paths() function. I also had to change the select instruction unfolding code to insert redundant one input PHIs to allow the use of the use-def chain in calculating the paths. The test suite coverage with this patch (including a limit on nodes visited) is as follows: Geomean diff: dfa-jump-threading.NumTransforms: +13.4% dfa-jump-threading.NumCloned: +34.1% dfa-jump-threading.NumPaths: -80.7% Compile time effect vs baseline (pass enabled by default) is mostly positive: https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed

…llvm#102681) Before this PR, when using the latest MSVC `Microsoft (R) C/C++ Optimizing Compiler Version 19.40.33813 for x64`, one of the Clang tests used to fail: `CodeGenObjC/gnustep2-direct-method.m`, see full failure log: [here](llvm#100517 (comment)). This PR temporarily shuffles around the code to make the MSVC inliner/optimizer happy and avoid the bug. MSVC bug report: https://developercommunity.visualstudio.com/t/Bad-code-generation-when-building-LLVM-w/10719589?port=1025&fsid=e572244a-cde7-4d75-a73d-9b8cd94204dd

By default, clang-format packs binary operations, but it may be desirable to have compound operations be on individual lines instead of being packed. This PR adds the option `BreakBinaryOperations` to break up large compound binary operations to be on one line each. This applies to all logical and arithmetic/bitwise binary operations. Maybe partially addresses llvm#79487? Closes llvm#58014 Closes llvm#57280
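
As a rough illustration of the intended layout, the sketch below shows a compound condition written one operand per line, which is approximately what one would expect with `BreakBinaryOperations: OnePerLine` in `.clang-format`. The `OnePerLine` value is an assumption taken from the clang-format style documentation rather than from this commit message, the function and parameter names are hypothetical, and the exact output depends on the rest of the style (for example `ColumnLimit`).

```cpp
#include <cassert>

// Hypothetical predicate. The return expression is laid out with each
// operand of the compound `&&` chain on its own line instead of packed.
bool shouldProcess(bool isReady, bool hasInput, bool isCancelled, int retries,
                   int maxRetries) {
  return isReady &&
         hasInput &&
         !isCancelled &&
         retries < maxRetries;
}
```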

With the --force (or -f) option, git-clang-format wipes out input files excluded by a .clang-format-ignore file if they have unstaged changes. This patch adds a hidden clang-format option --list-ignored that lists such excluded files for git-clang-format to filter out. Fixes llvm#102459.

…llvm#102755) Three `llvm-exegesis` tests
```
LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/DefinitionFillsCompletely
LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/MultipleDefinitions
LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/OneDefinition
```
`FAIL` on Linux/sparc64 like
```
llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp:68: Failure
Expected equality of these values:
  SharedMemoryMapping[I]
    Which is: '\0'
  ExpectedValue[I]
    Which is: '\xAA' (170)
```
It seems like this test only works on little-endian hosts: three sub-tests are already disabled on powerpc and s390x (both big-endian), and the fourth is additionally guarded against big-endian hosts (making the other guards unnecessary). However, since it has not been analyzed whether this is really an endianness issue, this patch disables the whole test on powerpc and s390x as before, adding sparc to the mix. Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.

Reverts llvm#101198 Breaks multiple bots: https://lab.llvm.org/buildbot/#/builders/72/builds/2103 https://lab.llvm.org/buildbot/#/builders/164/builds/1909 https://lab.llvm.org/buildbot/#/builders/66/builds/2706

…lvm#102785) Reverts llvm#102735 Breaks https://lab.llvm.org/buildbot/#/builders/52/builds/1496

Pass `ProbeNode` parameter of `trackInlineesOptimizedAway` as const reference. Reviewers: wlei-llvm, WenleiHe Reviewed By: WenleiHe Pull Request: llvm#102787

Expand autos in select places in preparation to llvm#102789. Reviewers: dcci, maksfb, WenleiHe, rafaelauler, ayermolo, wlei-llvm Reviewed By: WenleiHe, wlei-llvm Pull Request: llvm#102788

…C) (llvm#102779)

…vm#102800) `getDeclPtr()` will not just return what we want, but in this case a pointer to the `vu` local variable.

Arnaud is no longer active.

…02807) This corrects a release note introduced in llvm#98745

Move VPWidenLoadRecipe::execute to VPlanRecipes.cpp in line with other ::execute implementations that don't depend on anything defined in LoopVectorization.cpp

This test has hardly any test coverage, and no IR tests. Add a few more tests involving calls, and add some IR checks. This pass needs a lot of work to improve the test coverage, and to actually use the cost model instead of making up its own accounting scheme.

…02645) This was much more difficult than I anticipated. The pass is not in a good state, with poor test coverage. The legacy PM does seem to be relying on maintaining the map state between different SCCs, which seems bad. The pass is going out of its way to avoid putting the attributes it introduces onto non-callee functions. If it just added them, we could use them directly instead of relying on the map, I would think. The NewPM path uses a ModulePass; I'm not sure if we should be using CGSCC here but there seems to be some missing infrastructure to support backend defined ones.

This automatically adds the `clang:as-a-library` label on PRs for the C and Python bindings and the libclang library --------- Co-authored-by: Vlad Serebrennikov <serebrennikov.vladislav@gmail.com>

…2759) My fix for my original fix of issue llvm#92896 in 666d224 modified the function signature for fmt::sprintf to more accurately match the real implementation in libfmt but failed to do the same for absl::StrFormat. The latter fix applied equally well to absl::StrFormat so it's important that its test verifies that the bug is fixed too.

Move collecting profitable VFs to ::getBestVF, in preparation for retiring selectVectorizationFactor.

Update AVX level for llvm#48188 to be closer to the one used in the reproducer.

Regenerate check lines for test to avoid unrelated changes in llvm#99808.

…ure (llvm#97670) See https://discourse.llvm.org/t/rfc-globalisel-instructionselect-allow-arbitrary-instruction-erasure

The Combiner doesn't install the Observer into the MachineFunction. This probably went unnoticed, because MachineFunction::getObserver() is currently only used in constrainOperandRegClass(), but this might cause issues down the line. Pull Request: llvm#102156

Remove a hack from GISelWorkList caused by the Combiner removing instructions from an unfinalized GISelWorkList during the DCE phase. This is in preparation for larger changes to the WorkListMaintainer. Pull Request: llvm#102158

…ands for USUBSAT. (llvm#102781) It doesn't matter which extend we use to promote the operands. Use whatever is the most efficient. The custom handler for RISC-V was using SIGN_EXTEND when the Zbb extension is enabled so we no longer need that.
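
The claim that the choice of extension is irrelevant can be checked exhaustively on a small model. The sketch below is an illustration, not the legalizer code: it promotes 8-bit operands to 32 bits with either zero- or sign-extension (the same extension for both operands), performs the saturating subtract in 32 bits, and compares the truncated result with the 8-bit reference. It assumes the upper bits of the promoted result are ignored. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtract on the promoted (32-bit) type.
uint32_t usubsat32(uint32_t A, uint32_t B) { return A > B ? A - B : 0; }

// Reference semantics on the narrow (8-bit) type.
uint8_t usubsat8(uint8_t A, uint8_t B) {
  return A > B ? static_cast<uint8_t>(A - B) : static_cast<uint8_t>(0);
}

// Sign-extend an 8-bit value to 32 bits.
uint32_t sext8(uint8_t V) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(V)));
}

// Returns true if zero-extension and sign-extension both yield the correct
// low 8 bits for every pair of 8-bit inputs.
bool extensionChoiceIsIrrelevant() {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      uint8_t NarrowA = static_cast<uint8_t>(A);
      uint8_t NarrowB = static_cast<uint8_t>(B);
      uint8_t Expected = usubsat8(NarrowA, NarrowB);
      uint32_t ViaZext = usubsat32(NarrowA, NarrowB);
      uint32_t ViaSext = usubsat32(sext8(NarrowA), sext8(NarrowB));
      if (static_cast<uint8_t>(ViaZext) != Expected ||
          static_cast<uint8_t>(ViaSext) != Expected)
        return false;
    }
  }
  return true;
}
```

Sign-extension works here because it preserves the unsigned ordering of the narrow values, so the comparison selects the same branch, and the low bits of the difference are unaffected by the high bits.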

On thread creation, asan/hwasan/msan/tsan unpoison the thread stack and static TLS blocks in case the blocks reuse previously freed memory that is possibly poisoned. glibc nptl/allocatestack.c allocates thread stack using a hidden, non-interceptable function. nsan is similar: the shadow types for the thread stack and static TLS blocks should be set to unknown, otherwise if the static TLS blocks reuse previous shadow memory, and `*p += x` instead of `*p = x` is used for the first assignment, the mismatching user and shadow memory could lead to false positives. NsanThread is also needed by the next patch to use the sanitizer allocator. Pull Request: llvm#102718

This patch bumps the CI container LLVM version to 18.1.8. This should've been bumped a while ago, but I just noticed that it was out of date. This also allows us to drop a patch that we manually had to add as it is by default included in v18.

…lvm#102625) This change fixes a crash when getOwner()->getParent() is a nullptr

Update createEdgeMask to create masks where the terminator in Src is a switch. We need to handle 2 separate cases: 1. Dst is not the default destination. Dst is reached if any of the cases with destination == Dst are taken. Join the conditions for each case where destination == Dst using a logical OR. 2. Dst is the default destination. Dst is reached if none of the cases with destination != Dst are taken. Join the conditions for each case where the destination is != Dst using a logical OR and negate it. Edge masks are created for every destination of cases and/or default when requesting a mask where the source is a switch. Fixes llvm#48188. PR: llvm#99808
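
The two cases above can be sketched on a scalar model. This is not the VPlan code: `SwitchCase`, `edgeMask` and `exampleCases` are hypothetical names, blocks are identified by plain integers, and the "mask" is a single `bool` for one value of the switch condition rather than a vector of lanes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one switch case: the matched value and the block
// (identified by an integer) that the case jumps to.
struct SwitchCase {
  int Value;
  int Dest;
};

// Returns true if control flows from the switch to Dst for condition Cond.
bool edgeMask(int Cond, const std::vector<SwitchCase> &Cases, int DefaultDest,
              int Dst) {
  bool AnyTaken = false;
  if (Dst != DefaultDest) {
    // Case 1: OR together the conditions of all cases that jump to Dst.
    for (const SwitchCase &C : Cases)
      if (C.Dest == Dst)
        AnyTaken = AnyTaken || (Cond == C.Value);
    return AnyTaken;
  }
  // Case 2: Dst is the default destination. OR together the conditions of
  // all cases that jump elsewhere, then negate.
  for (const SwitchCase &C : Cases)
    if (C.Dest != Dst)
      AnyTaken = AnyTaken || (Cond == C.Value);
  return !AnyTaken;
}

// Example switch: values 1 and 2 go to block 10, 3 goes to block 20, and
// 4 explicitly targets block 30, which is also used as the default below.
std::vector<SwitchCase> exampleCases() {
  return {{1, 10}, {2, 10}, {3, 20}, {4, 30}};
}
```

Note that a case that explicitly targets the default block (value 4 above) needs no special handling in case 2, because only cases with a different destination contribute to the negated OR.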

nsan will port msan_allocator.cpp.

The pointer will immediately be dereferenced.

After f0df4fb, isPredicatedInst needs to handle SwitchInst as well. Handle it the same as BranchInst. This fixes a crash in the newly added test and improves the results for one of the existing tests in predicate-switch.ll Should fix https://lab.llvm.org/buildbot/#/builders/113/builds/2099.

…s behavior (llvm#102671) Followup to llvm#102138 and llvm#102396, restore more old behavior to fix ppc64-aix bot.

…102751) - Eliminate top-level "using namespace" from some headers.

This is invalid in C++, and clang recently started warning on it as of llvm#101853

Reverts llvm#102825

Intermittently on my mac I was getting the same nullptr crash in dlsym. We need to make sure rtsan gets initialized on mac between when the binary starts running, and the first intercepted function is called. Until that point we should use the DlsymAllocator.

This fixes:
```
[6831/7617] Building CXX object tools\lldb\source\Target\CMakeFiles\lldbTarget.dir\ThreadPlanSingleThreadTimeout.cpp.obj
C:\src\git\llvm-project\lldb\source\Target\ThreadPlanSingleThreadTimeout.cpp(66) : warning C4715: 'lldb_private::ThreadPlanSingleThreadTimeout::StateToString': not all control paths return a value
```

This fixes several of those when building with MSVC on Windows:
```
[3625/7617] Building CXX object projects\openmp\runtime\src\CMakeFiles\omp.dir\kmp_affinity.cpp.obj
C:\src\git\llvm-project\openmp\runtime\src\kmp_affinity.cpp(2637): warning C4062: enumerator 'KMP_HW_UNKNOWN' in switch of enum 'kmp_hw_t' is not handled
C:\src\git\llvm-project\openmp\runtime\src\kmp.h(628): note: see declaration of 'kmp_hw_t'
```

This fixes a few of these warnings, when building with Clang ToT on Windows:
```
[622/7618] Building CXX object projects\compiler-rt\lib\sanitizer_common\CMakeFiles\RTSanitizerCommonSymbolizer.x86_64.dir\sanitizer_symbolizer_win.cpp.obj
C:\src\git\llvm-project\compiler-rt\lib\sanitizer_common\sanitizer_symbolizer_win.cpp(74,3): warning: cast from 'FARPROC' (aka 'long long (*)()') to 'decltype(::StackWalk64) *' (aka 'int (*)(unsigned long, void *, void *, _tagSTACKFRAME64 *, void *, int (*)(void *, unsigned long long, void *, unsigned long, unsigned long *), void *(*)(void *, unsigned long long), unsigned long long (*)(void *, unsigned long long), unsigned long long (*)(void *, void *, _tagADDRESS64 *))') converts to incompatible function type [-Wcast-function-type-mismatch]
```
This is similar to llvm#97905

This fixes the following warning, when building with Clang ToT on Windows:
```
[6668/7618] Building CXX object tools\lldb\source\Plugins\Process\Windows\Common\CMakeFiles\lldbPluginProcessWindowsCommon.dir\TargetThreadWindows.cpp.obj
C:\src\git\llvm-project\lldb\source\Plugins\Process\Windows\Common\TargetThreadWindows.cpp(182,22): warning: cast from 'FARPROC' (aka 'long long (*)()') to 'GetThreadDescriptionFunctionPtr' (aka 'long (*)(void *, wchar_t **)') converts to incompatible function type [-Wcast-function-type-mismatch]
```
This is similar to: llvm#97905

This fixes the following:
```
[6603/7618] Building CXX object tools\lldb\source\Plugins\ObjectFile\PECOFF\CMakeFiles\lldbPluginObjectFilePECOFF.dir\WindowsMiniDump.cpp.obj
C:\src\git\llvm-project\lldb\source\Plugins\ObjectFile\PECOFF\WindowsMiniDump.cpp(29,25): warning: object backing the pointer will be destroyed at the end of the full-expression [-Wdangling-gsl]
   29 |   const auto &outfile = core_options.GetOutputFile().value();
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
```

…. NFCI (llvm#102796)

This commit fixes what appears to be invalid C++ -- a lambda capturing a variable before it is declared. The code compiles with GCC and Clang but not MSVC.

…sics (llvm#101125) Adding the following linked to their docs: - [amd_vrs16_acosf](https://github.com/amd/aocl-libm-ose/blob/9c0b67293ba01e509a6308247d82a8f1adfbbc67/scripts/libalm.def#L221) - [amd_vrd2_cosh](https://github.com/amd/aocl-libm-ose/blob/9c0b67293ba01e509a6308247d82a8f1adfbbc67/scripts/libalm.def#L124) - [amd_vrs16_tanhf](https://github.com/amd/aocl-libm-ose/blob/9c0b67293ba01e509a6308247d82a8f1adfbbc67/scripts/libalm.def#L224)

…o make it more clear The previous implementation of wasDeclEmitted could be confusing, because it was unclear why we need to filter out declarations that are not from modules. Adjust the implementation to avoid the confusion.

…y symbols behavior (llvm#102671)" This reverts commit 32973b0. This fix doesn't fix the build failure as expected and breaks a few other configurations too.

This PR changes the sanitizer passes to be idempotent. When any sanitizer pass is run after it has already been run before, double instrumentation is seen in the resulting IR. This happens because there is no check in the pass, to verify if IR has been instrumented before. This PR checks if "nosanitize_*" module flag is already present and if true, return early without running the pass again.
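
The early-return check can be sketched with a toy model. Nothing below is LLVM API: `ToyModule` and `runToySanitizerPass` are hypothetical stand-ins, and the sketch assumes that the pass itself records the flag after instrumenting, which the description above implies but does not state.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a module: a set of module flags plus a counter
// that records how many times instrumentation was actually applied.
struct ToyModule {
  std::set<std::string> Flags;
  int InstrumentationCount = 0;
};

// Returns true if the module was changed. A second run on the same module
// sees the flag and returns early, so the instrumentation is not doubled.
bool runToySanitizerPass(ToyModule &M, const std::string &Flag) {
  if (M.Flags.count(Flag) != 0)
    return false; // Already instrumented: nothing to do.
  ++M.InstrumentationCount; // Stand-in for the real instrumentation work.
  M.Flags.insert(Flag);     // Assumption: the pass records that it ran.
  return true;
}
```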

…vm#102449) llvm#102326 enables verification of type parameters that are type constraints. The element type verification for `VectorType` (and maybe other builtin types in the future) can now be auto-generated. Also remove redundant error checking in the vector type parser: element type and dimensions are already checked by the verifier (which is called from `getChecked`). Depends on llvm#102326.

Assert that the given pointer is in a union if it's not active and use a range-based for loop to find the active field.

Fix the auto-cast of `linalg.batch_reduce_matmul` from `cast_to_T(A * cast_to_T(B)) + C` to `cast_to_T(A) * cast_to_T(B) + C`

In preparation for having a similar function for destructors.

…mitted to make it more clear" This reverts commit 4399f2a. This fails with Modules/aarch64-sme-keywords.cppm

) Reland llvm#100723, fixing the ARM issue at the cost of a small loss of optimization in `test/CodeGen/AMDGPU/fshr.ll` Solves llvm#100383

llvm#102605) In C++23 anything can be constexpr, including a dtor of a class whose members and bases don't have constexpr dtors. Avoid early triggering of vtable instantiation in this case. Fixes llvm#102293

The only use in `opt.cpp` was removed in d291f1f.

`DataLayout` isn't exactly cheap to copy (448 bytes on a 64-bit host). Move `operator=` to cpp file to improve compilation time. Also move `operator==` closer to `operator=` and add a couple of FIXMEs.

The condition was backwards - it was rejecting when the condition was met. Fixes llvm#102719

VPVectorPointerRecipe only uses the first part of the pointer operand, so mark it accordingly. Follow-up suggested as part of llvm#99808.

…vm#102099) The tosa::transpose::verify() should make sure that the permutation numbers are within the size of the input array. Otherwise it will cause a cryptic array out-of-bounds assertion later. Fixes llvm#99513.
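
A minimal sketch of the kind of range check described, independent of MLIR. `permsInRange` is a hypothetical helper and not the verifier's actual code; it checks only that every index lies in `[0, rank)`, which is the property the commit message names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if every permutation index refers to an existing dimension
// of an input with the given rank, i.e. lies in the range [0, Rank).
bool permsInRange(const std::vector<int32_t> &Perms, std::size_t Rank) {
  for (int32_t P : Perms)
    if (P < 0 || static_cast<std::size_t>(P) >= Rank)
      return false;
  return true;
}
```

Rejecting an out-of-range index at verification time lets the error be reported against the op itself, instead of surfacing later as an out-of-bounds access on the shape array.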

fixes llvm#102231 by inserting missing checks.


…vm#102596) transformConstExprCastCall() implements a number of highly dubious transforms attempting to make a call function type line up with the function type of the called function. Historically, the main value this had was to avoid function type mismatches due to pointer type differences, which is no longer relevant with opaque pointers. This patch is a step towards reducing the scope of the transform, by applying it only to definitions, not declarations. For declarations, the declared signature might not match the actual function signature, e.g. `void @fn()` is sometimes used as a placeholder for functions with unknown signature. The implementation already bailed out in some cases for declarations, but I think it would be safer to disable the transform entirely. For the test cases, I've updated some of them to use definitions instead, so that the test coverage is preserved.

This contains the fpmr register which was added in Armv9.5-a. This register mainly contains controls for fp8 formats. It was added to the Linux Kernel in torvalds/linux@4035c22.

This commit removes `invalidateRegionsImpl()`, moving its body to `invalidateRegions(ValueList Values, ...)`, because it was a completely useless layer of indirection. Moreover I'm fixing some strange indentation within this function body and renaming two variables to the proper `UpperCamelCase` format.

Make test more robust w.r.t. future changes.

(DWARFv5) split units have an extra `dwo_id` field in the header. Type units have `type_signature` and `type_offset`.
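
For orientation, the sketch below computes the unit header size for each DWARF v5 unit type in the 32-bit DWARF format. The field sizes and `DW_UT_*` values follow the DWARF v5 specification as I understand it; the function name is hypothetical, and the 64-bit DWARF format (12-byte `unit_length`, 8-byte offsets) is deliberately not modelled.

```cpp
#include <cassert>
#include <cstdint>

// DWARF v5 unit type encodings.
enum UnitType : uint8_t {
  DW_UT_compile = 0x01,
  DW_UT_type = 0x02,
  DW_UT_partial = 0x03,
  DW_UT_skeleton = 0x04,
  DW_UT_split_compile = 0x05,
  DW_UT_split_type = 0x06,
};

// Size in bytes of a DWARF v5 unit header, 32-bit DWARF format only.
unsigned unitHeaderSize32(UnitType T) {
  // Common prefix: unit_length (4), version (2), unit_type (1),
  // address_size (1), debug_abbrev_offset (4).
  unsigned Size = 4 + 2 + 1 + 1 + 4;
  switch (T) {
  case DW_UT_skeleton:
  case DW_UT_split_compile:
    Size += 8; // dwo_id
    break;
  case DW_UT_type:
  case DW_UT_split_type:
    Size += 8 + 4; // type_signature, type_offset
    break;
  case DW_UT_compile:
  case DW_UT_partial:
    break; // No extra fields.
  }
  return Size;
}
```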

This removes redundant ORs of matching masks. Follow-up to f0df4fb to reduce the number of redundant ORs for masks.

…lvm#99468)

…lvm#102580) At pointer subtraction only pointers are allowed that point into an array (or one after the end); this fact was checked by the checker. This check is now removed because it is a special case of array indexing error that is handled by different checkers (like ArrayBoundsV2).

) I ran into this when LTO completely emptied two compile units, so they ended up with the same hash (see llvm#100375). Although, ideally, the compiler would try to ensure we don't end up with a hash collision even in this case, guaranteeing their absence is practically impossible. This patch ensures this situation does not bring down lldb.

This patch replaces `ConstructQueue::iterator` arguments with `ConstructQueue::const_iterator` where it's used as a pointer to an element inside of a `const ConstructQueue &` passed along with it. Since these functions don't intend to modify the list or any elements in it, keeping constness consistent between both makes it simpler to work with.

…lvm#102477) ... within the classes `StoreManager` and `ProgramState` and describe the connection between the two methods.

…ic (llvm#101017) Currently the histogram intrinsic (llvm.experimental.vector.histogram.add) only allows i32 and i64 types for the memory locations to be updated, matching the restrictions of the histcnt instruction. This patch adds support for the legalisation of smaller types (i8 and i16) via promotion.

…ls for DeserializationListener (llvm#102855) Close llvm#102684 The root cause of the issue is, it is possible that the predefined decl is not registered at the beginning of writing a module file but got created during the process of writing from reading. This is incorrect. The predefined decls should always be predefined decls. Another deep thought about the issue is, we shouldn't read any new things after we start to write the module file. But this is another deeper question.

…D::BLENDV nodes when upper elements are not demanded. Prep work for llvm#83402

Delete the attribute and annotate any atomicrmw instructions in the function with new metadata.

…lvm#102652) This transform op makes it possible to query attributes associated to IR by means of the DLTI dialect. The op takes both a `key` and a target `op` to perform the query at. Facility functions automatically find the closest ancestor op which defines the appropriate DLTI interface or has an attribute implementing a DLTI interface. By default the lookup uses the data layout interfaces of DLTI. If the optional `device` parameter is provided, the lookup happens with respect to the interfaces for TargetSystemSpec and TargetDeviceSpec. This op uses new free-standing functions in the `dlti` namespace to not only look up specifications via the `DataLayoutSpecOpInterface` and on `ModuleOp`s but also on any ancestor op that has an appropriate DLTI attribute.

R600 has a separate CodeGenPassBuilder anyway.

As these are flags they can be set or not depending on what the system libraries did prior to loading the program.

…#102806)

…on (llvm#102812) Keep respecting the old cl::opt for now.

…_to_sint fold (necessary to fix llvm#83402 on AVX512 targets). Prep work for llvm#83402


This is a fixed copy of llvm#98145 (necessary after it got reverted). @sogartar @yaochengji
This PR adds the following to llvm#98145:
- `UpdateHaloOp` accepts a `memref` (instead of a tensor) and not returning a result to clarify its inplace-semantics
- `UpdateHaloOp` accepts `split_axis` to allow multiple mesh-axes per tensor/memref-axis (similar to `mesh.sharding`)
- The implementation of `Shardinginterface` for tensor operation (`tensor.empty` for now) moved from the tensor library to the mesh interface library. `spmdize` uses features from `mesh` dialect. @rengolin agreed that `tensor` should not depend on `mesh` so this functionality cannot live in a `tensor`s lib. The unfulfilled dependency caused the issues leading to reverting llvm#98145. Such cases are generally possible and might lead to re-considering the current structure (like for tosa ops).
- rebased onto latest main
--------------------------
Replacing `#mesh.sharding` attribute with operation `mesh.sharding`
- extended semantics now allow providing optional `halo_sizes` and `sharded_dims_sizes`
- internally a sharding is represented as a non-IR class `mesh::MeshSharding`
What previously was
```mlir
%sharded0 = mesh.shard %arg0 <@Mesh0, [[0]]> : tensor<4x8xf32>
%sharded1 = mesh.shard %arg1 <@Mesh0, [[0]]> annotate_for_users : tensor<16x8xf32>
```
is now
```mlir
%sharding = mesh.sharding @Mesh0, [[0]] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding : tensor<4x8xf32>
%1 = mesh.shard %arg1 to %sharding annotate_for_users : tensor<16x8xf32>
```
and allows additional annotations to control the shard sizes:
```mlir
mesh.mesh @Mesh0 (shape = 4)
%sharding0 = mesh.sharding @Mesh0, [[0]] halo_sizes = [1, 2] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding0 : tensor<4x8xf32>
%sharding1 = mesh.sharding @Mesh0, [[0]] sharded_dims_sizes = [3, 5, 5, 3] : !mesh.sharding
%1 = mesh.shard %arg1 to %sharding1 annotate_for_users : tensor<16x8xf32>
```
- `mesh.shard` op accepts additional optional attribute `force`, useful for halo updates
- Some initial spmdization support for the new semantics
- Support for `tensor.empty` reacting on `sharded_dims_sizes` and `halo_sizes` in the sharding
- New collective operation `mesh.update_halo` as a spmdized target for shardings with `halo_sizes`
---------
Co-authored-by: frank.schlimbach <fschlimb@smtp.igk.intel.com>
Co-authored-by: Jie Fu <jiefu@tencent.com>

LegacyPointerTypes is not used any longer and can be removed from the LLVM context. Also remove a copy-pasted code comment in TypedPointerType that doesn't make sense (since there is no special case for address space zero in the TypedPointerType::get implementation).

With opaque pointers we can just get the pointer type for the resolver function by using PointerType::get, making the GlobalIFunc::getResolverFunctionType function obsolete.

…m#102631)

The others are already inline here.

…g. (llvm#102650) Simplifies checks for AGPRs and VGPRs and makes them more explicit and less fragile.

Mention the names of unavailable registers in error messages so that the diagnostics for execz/vccz are no less rich than they were. Clean up unnecessary name qualifications while there. Part of <llvm#62629>.

llvm#102123) …Type This is needed to ensure we find a type if its definition is in a CU that wasn't indexed. This can happen if the definition is in some precompiled code (e.g. the c++ standard library) which was built with different flags than the rest of the binary.

…ADS is not defined When LLVM_ENABLE_THREADS is not defined, llvm::get_threadid returns 0 which makes this test case fail. This is a pretty niche setting, Linaro uses it to stop lld crashing our 32 bit containers. So the test will get plenty of runs elsewhere. In lldb's code it's not getting the current thread ID anyway, it's using a value it got from ptrace. So even if that copy of lldb was built with LLVM_ENABLE_THREADS off, it should still be able to debug threads.

…uced in llvm#99732 (llvm#102716)

On PlayStation, allow users to supply -static to the linker, via the driver. An initial step. Later changes will have the PS5 driver supply additional options to the linker, if and when -static is passed. SIE tracker: TOOLCHAIN-16704

We only need to see that 1 frame of the stack is in user code. No need to carry on looking. Doing so actually caused a test failure on Armv8 Ubuntu Jammy where a libc function does not have a display name. I'm sure I'm going to get stung by this elsewhere, but for this test, breaking early sidesteps the problem.

The Mul factor was zero-extended here, resulting in incorrect results for integers larger than 64-bit. As we currently only multiply by 1 or -1, just split this into two cases -- there's no need for a full multiplication here. Fixes llvm#102597.
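
The effect can be reproduced on a scaled-down model in which an 8-bit factor plays the role of the 64-bit one and a 16-bit value plays the role of the wider integer. This is only an illustration of why zero-extending a factor of -1 gives the wrong product and why splitting into "negate" and "keep" avoids the multiplication. Both function names are hypothetical and this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Buggy variant: the narrow factor is zero-extended before multiplying,
// so a factor of -1 (0xFF) becomes 255 in the wider type.
int16_t scaleWithZeroExtendedFactor(int16_t X, int8_t Mul) {
  int16_t WideMul = static_cast<int16_t>(static_cast<uint8_t>(Mul));
  return static_cast<int16_t>(X * WideMul);
}

// Fixed variant: the factor is only ever 1 or -1, so handle the two cases
// directly and avoid extending the factor at all.
int16_t scaleByPlusOrMinusOne(int16_t X, bool Negate) {
  return Negate ? static_cast<int16_t>(-X) : X;
}
```

For example, `scaleWithZeroExtendedFactor(5, -1)` yields 1275 rather than -5, while `scaleByPlusOrMinusOne(5, true)` yields -5. When the value is no wider than the factor, the wrong high bits are truncated away, which is consistent with the bug only showing up for integers wider than the factor.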

Implement FEAT_SME_B16B16 to enable ZA-targeting non-widening SME BFloat16 instructions. Remove the now redundant FEAT_B16B16 which has been replaced by FEAT_SVE_B16B16 and FEAT_SME_B16B16 (this commit), see llvm#101480 for the details and reasoning of this change to LLVM. FEAT_SME_B16B16 is documented under the latest Armv9.4 feature documentation: https://developer.arm.com/documentation/109697/0100/Feature-descriptions/The-Armv9-4-architecture-extensio
- Changes to Clang AArch64 frontend
  - Change target guard of SME2 ZA-targeting non-widening BFloat16 intrinsics to 'sme-b16b16'
- Changes to LLVM AArch64 backend
  - llvm/lib/Target/AArch64/AArch64Features.td
    - Create FeatureSMEB16B16, which implies FeatureSME2 and FeatureSVEB16B16
    - Remove FeatureB16B16
    - Fix description of FeatureSVEB16B16
  - llvm/lib/Target/AArch64/AArch64InstrInfo.td
    - Create HasSMEB16B16 predicate
  - llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
    - Change predication of SME2 ZA-targeting non-widening BFloat16 instructions to new HasSMEB16B16
  - llvm/lib/Target/AArch64/AArch64.td
    - Add HasSMEB16B16 to SME2Unsupported (FEAT_SME_B16B16 implies FEAT_SME2)
  - llvm/lib/AArch64/AsmParser/AArch64AsmParser.cpp
    - Remove flag 'b16b16' mapping to removed FeatureB16B16
    - Add flag 'sme-b16b16' mapping to new FeatureSMEB16B16
- Changes to LLVM unit tests
  - llvm/unittests/TargetParser/TargetParserTest.cpp
    - Add new sme-b16b16 flag to existing target parser tests
    - Add tests for the sme-b16b16 dependencies:
      - 'sme-b16b16' should enable 'sme2', 'sve-b16b16'.
    - Remove 'b16b16' from bf16 dependency test
- Added MC tests
  - llvm/test/MC/AArch64/SME2p1
    - To ensure that ZA-targeting multi-vector non-widening BFloat16 instructions are enabled by +sme-b16b16, and that this feature is removed by +nosme-b16b16.
- Modified tests
  - All CodeGen, Semantic, and MC tests that are affected by the removal of 'b16b16' have been modified to supply and/or expect 'sme-b16b16' where appropriate.

Include chain of ops feeding inductions in cost precomputation for inductions, not just the induction increment. In VPlan, those instructions will be cleaned up, as both phi and increment are generated by VPWidenIntOrFpInductionRecipe independently. Fixes llvm#101337.

This PR fixes emission of valid OpLifetimeStart/OpLifetimeStop instructions. According to https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpLifetimeStart: "Size must be 0 if Pointer is a pointer to a non-void type or the Addresses [capability](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Capability) is not declared." The `Size` argument is set from the corresponding intrinsic's argument, so if Size is not zero we must ensure that Pointer has the required type by inserting a bitcast if needed.

…01732) This PR contains changes in virtual register processing aimed at improving the correctness of emitted MIR between passes from the perspective of MachineVerifier. This potentially helps to detect previously missed flaws in code emission and harden the test suite. As a measure of the correctness and usefulness of this PR we may use a mode with expensive checks enabled, in which MachineVerifier reports problems in the test suite. To satisfy MachineVerifier's requirements for MIR correctness, not only is a rework of the usage of virtual registers' types and classes required, but also corrections to the pre-legalizer and instruction selection logic. Namely, the following changes are introduced:
* scalar virtual registers have proper bit width,
* detect register class by SPIR-V type,
* add a superclass for id virtual register classes,
* fix Tablegen rules used for instruction selection,
* fixes for minor existing issues (missed flag for proper representation of a null constant for OpenCL vs. HLSL, wrong usage of integer virtual registers as a synonym of any non-type virtual register).

* rename CXXIndeterminateSpliceExpr in the readme too Signed-off-by: delimbetov <1starfall1@gmail.com> * make TryAnnotateOptionalCXXScopeToken work Signed-off-by: delimbetov <1starfall1@gmail.com> * make splice work in requires clause Signed-off-by: delimbetov <1starfall1@gmail.com> * add tests for splice in requires expr Signed-off-by: delimbetov <1starfall1@gmail.com> * add typename and newline at the end of the file Signed-off-by: delimbetov <1starfall1@gmail.com> * add comments Signed-off-by: delimbetov <1starfall1@gmail.com> --------- Signed-off-by: delimbetov <1starfall1@gmail.com>

Some work remains: In particular, if this is going to "work" (i.e., supported by P2996), we need to think carefully about reachability, TU-local entities, etc. There probably need to be some constraints around use of imported reflections, and possibly some 'is_reachable' metafunction. Not entirely sure - need to experiment further. Closes issue bloomberg#4.

TBD whether to keep this, but adding it so it can be played around with.

…ions (bloomberg#89) * basic impl Signed-off-by: delimbetov <1starfall1@gmail.com> * add test for the new storage duration funcs Signed-off-by: delimbetov <1starfall1@gmail.com> * code style Signed-off-by: delimbetov <1starfall1@gmail.com> * run libcxx generators to pass CI Signed-off-by: delimbetov <1starfall1@gmail.com> * fix indentation Signed-off-by: delimbetov <1starfall1@gmail.com> --------- Signed-off-by: delimbetov <1starfall1@gmail.com>

Closes issue bloomberg#87.

Commits on Aug 9, 2024

Commits on Aug 10, 2024

Commits on Aug 11, 2024

Commits on Aug 12, 2024

Commits on Aug 15, 2024

Commits on Aug 19, 2024

Commits on Aug 20, 2024