path: root/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
Age / Commit message / Author
2025-11-15Cleanups in AArch64 (#168025)Eric Christopher
Forward declare a couple of classes for simplicity, remove some unused headers, clean up a comment. Tested with check-all.
2025-11-10Remove unused standard headers: <string>, <optional>, <numeric>, <tuple> (#167232)serge-sans-paille
2025-10-28[AArch64][SME] Propagate desired ZA states in the MachineSMEABIPass (#149510)Benjamin Maxwell
This patch adds a step to the MachineSMEABIPass that propagates desired ZA states. This aims to pick better ZA states for edge bundles, as when many (or all) blocks in a bundle do not have a preferred ZA state, the ZA state assigned to a bundle can be less than ideal. An important case is nested loops, where only the inner loop has a preferred ZA state. Here we'd like to propagate the ZA state from the inner loop to the outer loops (to avoid saves/restores in any loop).
2025-09-17[AArch64] Enable GlobalMerge on externals (#158592)David Green
GlobalMerge has been enabled for minsize for a while; this patch enables it more generally. In my testing it did not affect performance very much, especially with the linker relaxations we already perform, but it should help reduce code size a little.
2025-09-11[llvm] Move data layout string computation to TargetParser (#157612)Reid Kleckner
Clang and other frontends generally need the LLVM data layout string in order to generate LLVM IR modules for LLVM. MLIR clients often need it as well, since MLIR users often lower to LLVM IR.
Before this change, the LLVM datalayout string was computed in the LLVM${TGT}CodeGen library in the relevant TargetMachine subclass. However, none of the logic for computing the data layout string requires any details of code generation. Clients who want to avoid duplicating this information were forced to link in LLVMCodeGen and all registered targets, leading to bloated binaries. This happened in PR #145899, which measurably increased binary size for some of our users.
By moving this information to the TargetParser library, we can delete the duplicate datalayout strings in Clang, and retain the ability to generate IR for unregistered targets.
This is intended to be a very mechanical LLVM-only change, but there is an immediately obvious follow-up to clang, which will be prepared separately.
The vast majority of data layouts are computable with two inputs: the triple and the "ABI name". There is only one exception, NVPTX, which has a cl::opt to enable short device pointers. I invented a "shortptr" ABI name to pass this option through the target independent interface. Everything else fits. Mips is a bit awkward because it uses a special MipsABIInfo abstraction, which includes members with codegen-like concepts like ABI physical registers that can't live in TargetParser. I think the string logic of looking for "n32" "n64" etc is reasonable to duplicate. We have plenty of other minor duplication to preserve layering.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
2025-09-06[AArch64] Don't run loop-idiom-vectorize pass in the O0 pipeline (#156802)Fangrui Song
As noted in #156787
2025-08-19[AArch64][SME] Implement the SME ABI (ZA state management) in Machine IR (#149062)Benjamin Maxwell
## Short Summary
This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI for ZA state (e.g., lazy saves and agnostic ZA functions). This is currently not enabled by default (but aims to be by LLVM 22). The goal is for this new pass to place ZA saves/restores more optimally and to work with exception handling.
## Long Description
This patch reimplements management of ZA state for functions with private and shared ZA state. Agnostic ZA functions will be handled in a later patch. For now, this is under the flag `-aarch64-new-sme-abi`; however, we intend for this to replace the current SelectionDAG implementation once complete.
The approach taken here is to mark instructions as needing ZA to be in a specific state ("ACTIVE" or "LOCAL_SAVED"). Machine instructions implicitly defining or using ZA registers (such as $zt0 or $zab0) require the "ACTIVE" state. Function calls may need the "LOCAL_SAVED" or "ACTIVE" state depending on the callee (having shared or private ZA).
We already add ZA register uses/definitions to machine instructions, so no extra work is needed to mark these. Calls need to be marked by glueing AArch64ISD::INOUT_ZA_USE or AArch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START.
These markers are then used by the MachineSMEABIPass to find instructions where there is a transition between required ZA states. These are the points where we need to insert code to set up or restore a ZA save (or initialize ZA).
To handle control flow between blocks (which may have different ZA state requirements), we bundle the incoming and outgoing edges of blocks. Bundles are formed by assigning each block an incoming and outgoing bundle (initially, all blocks have their own two bundles). Bundles are then combined by joining the outgoing bundle of a block with the incoming bundle of all successors.
These bundles are then assigned a ZA state based on the blocks that participate in the bundle. Blocks whose incoming edges are in a bundle "vote" for a ZA state that matches the state required at the first instruction in the block, and likewise, blocks whose outgoing edges are in a bundle vote for the ZA state that matches the last instruction in the block. The ZA state with the most votes is used, which aims to minimize the number of state transitions.
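The bundling and voting scheme described in this commit message can be sketched on a toy CFG. This is only a minimal illustration and not the code in MachineSMEABIPass: the `Block` struct, the extra `Any` ("no preference") state, the union-find used to join bundles, and the tie-break toward `Active` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical, simplified state set. Any means "no preferred state".
enum class ZAState { Any, Active, LocalSaved };

// Toy basic block: required state at its first/last instruction and successors.
struct Block {
  ZAState First;
  ZAState Last;
  std::vector<std::size_t> Succs;
};

// Union-find over 2*N nodes: node 2*B is block B's incoming bundle,
// node 2*B+1 is its outgoing bundle.
class Bundles {
  std::vector<std::size_t> Parent;

public:
  explicit Bundles(std::size_t NumBlocks) : Parent(2 * NumBlocks) {
    std::iota(Parent.begin(), Parent.end(), std::size_t{0});
  }
  std::size_t find(std::size_t X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving
      X = Parent[X];
    }
    return X;
  }
  void join(std::size_t A, std::size_t B) { Parent[find(A)] = find(B); }
};

// Returns the chosen state for each of the 2*N incoming/outgoing nodes.
std::vector<ZAState> assignBundleStates(const std::vector<Block> &Blocks) {
  const std::size_t N = Blocks.size();
  Bundles EB(N);
  // Join each block's outgoing bundle with the incoming bundle of every successor.
  for (std::size_t B = 0; B < N; ++B)
    for (std::size_t S : Blocks[B].Succs) {
      assert(S < N && "successor index out of range");
      EB.join(2 * B + 1, 2 * S);
    }

  // Per bundle root: {votes for Active, votes for LocalSaved}.
  std::vector<std::pair<unsigned, unsigned>> Votes(2 * N);
  auto Vote = [&](std::size_t Node, ZAState S) {
    std::pair<unsigned, unsigned> &V = Votes[EB.find(Node)];
    if (S == ZAState::Active)
      ++V.first;
    else if (S == ZAState::LocalSaved)
      ++V.second;
  };
  for (std::size_t B = 0; B < N; ++B) {
    Vote(2 * B, Blocks[B].First);    // incoming edges vote with the first instruction
    Vote(2 * B + 1, Blocks[B].Last); // outgoing edges vote with the last instruction
  }

  std::vector<ZAState> Result(2 * N, ZAState::Any);
  for (std::size_t Node = 0; Node < 2 * N; ++Node) {
    const std::pair<unsigned, unsigned> &V = Votes[EB.find(Node)];
    if (V.first == 0 && V.second == 0)
      continue; // nobody in this bundle has a preference
    // Ties go to Active: an arbitrary choice for this sketch.
    Result[Node] = V.first >= V.second ? ZAState::Active : ZAState::LocalSaved;
  }
  return Result;
}
```

In this sketch a bundle whose members are all `Any` stays `Any`, which is the situation the later "propagate desired ZA states" change (#149510) sets out to improve.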
2025-06-30[MachineOutliner] Remove LOHs from outlined candidates (#143617)Ellis Hoag
Remove Linker Optimization Hints (LOHs) from outlining candidates instead of simply preventing outlining if LOH labels are found in the candidate. This will improve the effectiveness of the machine outliner when LOHs are enabled (which is the default). In https://discourse.llvm.org/t/loh-conflicting-with-machineoutliner/83279/1 it was observed that the machine outliner is much more effective when LOHs are disabled. Rather than completely disabling LOHs, this PR aims to keep them in most places and remove them from outlined functions, where they could be illegal. Note that we are conservatively removing all LOHs from outlined functions for simplicity, but I believe we could retain LOHs that are in the intersection of all candidates. It should be ok to remove these LOHs since these blocks are being outlined anyway, which will harm performance much more than the gain from keeping the LOHs.
2025-06-25[AArch64][SME] Use reportFatalUsageError rather than assert (NFC) (#145491)Benjamin Maxwell
Fixes #144351
2025-06-17[llvm] annotate interfaces in llvm/Target for DLL export (#143615)Andrew Rogers
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Target` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
A subset of these changes was generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed by formatting with `git clang-format`.
The bulk of this change is manual additions of `LLVM_ABI` to `LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target. Adding `LLVM_ABI` to the function implementation is required here because they do not `#include "llvm/Support/TargetSelect.h"`, which contains the declarations for these functions and was already updated with `LLVM_ABI` in a previous patch. I considered patching these files with `#include "llvm/Support/TargetSelect.h"` instead, but since TargetSelect.h is a large file with a bunch of preprocessor x-macro stuff in it I was concerned it would unnecessarily impact compile times.
In addition, a number of unit tests under llvm/unittests/Target required additional dependencies to make them build correctly against the LLVM DLL on Windows using MSVC.
## Validation
Local builds and tests to validate cross-platform compatibility. This included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-03[MISched] Add templates for creating custom schedulers (#141935)Pengcheng Wang
We rename `createGenericSchedLive` and `createGenericSchedPostRA` to `createSchedLive` and `createSchedPostRA`, and add a template parameter `Strategy` which is the generic implementation by default. This can simplify some code for targets that have custom scheduler strategy.
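The idea of a factory with a defaulted `Strategy` template parameter can be illustrated with a small stand-alone sketch. Everything here (`ToyStrategy`, `ToyGenericStrategy`, `ToyTargetStrategy`, `ToyScheduler`, `createToySched`) is hypothetical and only mirrors the shape of the change; the real `createSchedLive` and `createSchedPostRA` have their own parameters and types, which this example does not reproduce.

```cpp
#include <memory>
#include <string>

// Stand-in for a scheduling strategy interface (hypothetical).
struct ToyStrategy {
  virtual ~ToyStrategy() = default;
  virtual std::string name() const = 0;
};

// Plays the role of the generic, default strategy.
struct ToyGenericStrategy : ToyStrategy {
  std::string name() const override { return "generic"; }
};

// Plays the role of a target-specific strategy.
struct ToyTargetStrategy : ToyStrategy {
  std::string name() const override { return "my-target"; }
};

// Stand-in for the scheduler object that owns a strategy.
struct ToyScheduler {
  std::unique_ptr<ToyStrategy> Strategy;
};

// One factory serves both cases: callers that pass no template argument get
// the generic strategy, and targets can plug in their own.
template <typename Strategy = ToyGenericStrategy>
std::unique_ptr<ToyScheduler> createToySched() {
  auto Sched = std::make_unique<ToyScheduler>();
  Sched->Strategy = std::make_unique<Strategy>();
  return Sched;
}
```

With this shape, `createToySched()` yields the generic strategy and `createToySched<ToyTargetStrategy>()` yields the custom one, so a target no longer needs its own copy of the factory boilerplate.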
2025-05-06Register assembly printer passes (#138348)Matthias Braun
Register assembly printer passes in the pass registry. This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests. Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow targets to use the `INITIALIZE_PASS` macros and register the pass in the pass registry. This currently has a default parameter so it won't break any targets that have not been updated.
2025-04-26[TTI] Simplify implementation (NFCI) (#136674)Sergei Barannikov
Replace "concept based polymorphism" with simpler PImpl idiom. This pursues two goals:
* Enforce static type checking. Previously, target implementations hid base class methods and type checking was impossible. Now that they override the methods, the compiler will complain on mismatched signatures.
* Make the code easier to navigate. Previously, if you asked your favorite LSP server to show a method (e.g. `getInstructionCost()`), it would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`, `TTIImplBase`, and target overrides. Now it is two fewer :)
There are three commits to hopefully simplify the review.
The first commit removes `TTI::Model`. This is done by deriving `TargetTransformInfoImplBase` from `TTI::Concept`. This is possible because they implement the same set of interfaces with identical signatures.
The first commit makes `TargetTransformInfoImplBase` polymorphic, which means all derived classes should `override` its methods. This is done in the second commit to make the first one smaller. It appeared infeasible to extract this into a separate PR because the first commit, landed separately, would result in tons of `-Woverloaded-virtual` warnings (and break `-Werror` builds).
The third commit eliminates `TTI::Concept` by merging it with the only derived class `TargetTransformInfoImplBase`. This commit could be extracted into a separate PR, but it touches the same lines in `TargetTransformInfoImpl.h` (removes `override` added by the second commit and adds `virtual`), so I thought it may make sense to land these two commits together.
Pull Request: https://github.com/llvm/llvm-project/pull/136674
2025-04-07[NFC][LLVM][AArch64] Cleanup pass initialization for AArch64 (#134315)Rahul Joshi
- Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767
2025-03-20Initialize aarch64-cond-br-tuning pass (#132087)Shubham Sandeep Rastogi
The call to the initializeAArch64CondBrTuningPass function is missing in the AArch64TargetMachine LLVMInitializeAArch64Target function. This means that the pass is not in the pass registry and options such as -run-pass=aarch64-cond-br-tuning and -stop-after=aarch64-cond-br-tuning cannot be used. This patch fixes that issue.
2025-03-06[win] NFC: Rename `EHCatchret` to `EHCont` to allow for EH Continuation targets that aren't `catchret` instructions (#129953)Daniel Paoliello
This change splits out the renaming and comment updates from #129612 as a non-functional change.
2025-02-26Revert "Reland "[AArch64][NPM] Chalk out the CodeGenPassBuilder for NPM (#128…" (#128819)Akshat Oke
Reverts llvm/llvm-project#128662 Still a link error.
2025-02-26Reland "[AArch64][NPM] Chalk out the CodeGenPassBuilder for NPM (#128… (#128662)Akshat Oke
…471)" Reland https://github.com/llvm/llvm-project/pull/128471 The Passes library was not linked in earlier.
2025-02-24Revert "[AArch64][NPM] Chalk out the CodeGenPassBuilder for NPM (#128471)"Kazu Hirata
This reverts commit d85685eb863641dce62a9f858ebcd6bab56c605b. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/128471
2025-02-25[AArch64][NPM] Chalk out the CodeGenPassBuilder for NPM (#128471)Akshat Oke
This allows for testing AArch64 passes with the new pass manager.
2025-02-05[CodeGen] Move MISched target hooks into TargetMachine (#125700)Christudasan Devadasan
The createSIMachineScheduler & createPostMachineScheduler target hooks are currently placed in the PassConfig interface. This patch moves them out to TargetMachine so that both the legacy and the new pass manager can use them.
2024-12-17[MTE] Apply alignment / size in AsmPrinter rather than IR (#111918)Florian Mayer
This makes sure no optimizations are applied that assume the bigger alignment or size, which could be incorrect if we link together with non-instrumented code.
2024-11-18[CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326)Akshat Oke
With this, all machine SSA optimization passes are available in the new codegen pipeline.
2024-11-14Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234)Matin Raayai
Following discussions in #110443, and earlier discussions in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html, https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine` interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under `TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that relate to IR/CodeGen/MC constructs, whereas before (at least on paper) it was supposed to have only IR/MC constructs. Any Target that doesn't want to use the independent code generator simply does not implement them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming aims to make the purpose of `LLVMTargetMachine` clearer. Its interface was moved under the CodeGen library, to further emphasize its usage in Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its projects. With these changes, `CodeGenCommonTMImpl` is simply a set of shared function implementations of `TargetMachine`, and CodeGen users don't need to static cast to `LLVMTargetMachine` every time they need a CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library linking.
cc @arsenm @aeubanks
2024-11-14Add ifunc support for Windows on AArch64. (#111962)Daniel Kiss
On Windows there is no platform support for ifunc but we could lower them to global function pointers. This also enables FMV for Windows with Clang and Compiler-rt. Depends on #111961
2024-11-11[AArch64] Remove unused includes (NFC) (#115685)Kazu Hirata
Identified with misc-include-cleaner.
2024-11-07[Backend] Add clearSubtargetMap API for TargetMachine. (#112383)weiwei chen
Add `clearSubtargetInfo` API to TargetMachine and each backend to make it possible to release memory used in each backend's `SubtargetInfo` map if needed. Keep this API `protected` so that it is used with caution.
2024-10-16[CodeGen][NewPM] Port EarlyIfConversion pass to NPM. (#108508)Christudasan Devadasan
2024-10-15[clang][aarch64] Add support for the MSVC qualifiers __ptr32, __ptr64, __sptr, __uptr for AArch64 (#111879)Daniel Paoliello
MSVC has a set of qualifiers to allow using 32-bit signed/unsigned pointers when building 64-bit targets. This is useful for WoW code (i.e., the part of Windows that handles running 32-bit applications on a 64-bit OS). Currently this is supported on x64 using the 270, 271 and 272 address spaces, but does not work for AArch64 at all. This change adds the same 270, 271 and 272 address spaces to AArch64 and adjusts the data layout string accordingly. Clang will generate the correct address space casts, but these will currently be ignored until the AArch64 backend is updated to handle them. Partially fixes #62536 This is a resurrected version of <https://reviews.llvm.org/D158857> (originally created by @a_vorobev) - I've cleaned it up a little, fixed the rest of the tests and added to auto-upgrade for the data layout.
2024-10-09[LLVM][AArch64] Enable SVEIntrinsicOpts at all optimisation levels.Paul Walker
2024-10-09Revert "[LLVM][AArch64] Enable SVEIntrinsicOpts at all optimisation levels."Paul Walker
This reverts commit 886d98e149843f3890ef4dd556a5dee45ff97fe9.
2024-10-09[LLVM][AArch64] Enable SVEIntrinsicOpts at all optimisation levels.Paul Walker
2024-08-30Fix cl::desc typos in aarch64-enable-dead-defs and arm-implicit-it. (#106712)rjmansfield
2024-08-21[AArch64] Add SME peephole optimizer pass (#104612)Sander de Smalen
This pass removes back-to-back smstart/smstop instructions to reduce the number of streaming mode changes in a function. The implementation as proposed doesn't aim to solve all problems yet and suggests a number of cases that can be optimized in the future.
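The core idea of cancelling back-to-back streaming-mode toggles can be sketched on a toy instruction stream. This is a deliberately simplified illustration and not the AArch64 pass itself: the `Op` enum and `removeBackToBackToggles` are hypothetical, and the sketch assumes two directly adjacent opposite toggles are always redundant, ignoring the register and mode-state conditions a real implementation has to check.

```cpp
#include <vector>

// Toy instruction kinds (hypothetical): the two mode toggles and "anything else".
enum class Op { SMStart, SMStop, Other };

// True if A directly followed by B leaves the streaming mode unchanged.
static bool cancels(Op A, Op B) {
  return (A == Op::SMStart && B == Op::SMStop) ||
         (A == Op::SMStop && B == Op::SMStart);
}

// Drops adjacent opposite toggles. Using the output as a stack means that
// pairs which become adjacent after an earlier removal are cancelled too.
std::vector<Op> removeBackToBackToggles(const std::vector<Op> &In) {
  std::vector<Op> Out;
  for (Op I : In) {
    if (!Out.empty() && cancels(Out.back(), I))
      Out.pop_back(); // the previous toggle and this one undo each other
    else
      Out.push_back(I);
  }
  return Out;
}
```

For a stream such as `Other, SMStart, Other, SMStop, SMStart, Other, SMStop`, the middle `SMStop, SMStart` pair is dropped, which merges two streaming regions into one.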
2024-07-05[AArch64] When hardening against SLS, only create called thunks (#97472)Anatoly Trosinenko
In preparation for implementing hardening of BLRA* instructions, restrict thunk function generation to only the thunks being actually called from any function. As described in the existing comments, emitting all possible thunks for BLRAA and BLRAB instructions would mean adding about 1800 functions in total, most of which are likely not to be called. This commit merges AArch64SLSHardening class into SLSBLRThunkInserter, so thunks can be created as needed while rewriting a machine function. The usages of TII, TRI and ST fields of AArch64SLSHardening class are replaced with requesting them in-place, as ThunkInserter assumes multiple "entry points" in contrast to the only runOnMachineFunction method of AArch64SLSHardening. The runOnMachineFunction method essentially replaces pre-existing insertThunks implementation as there is no more need to insert all possible thunks unconditionally. Instead, thunks are created on first use from inside of insertThunks method.
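The "create thunks on first use" approach amounts to a get-or-create lookup performed while rewriting each call site. The following is a minimal sketch under stated assumptions: `ToyThunkTable`, the register-number key, and the `__toy_thunk_x` naming are all invented for illustration and do not reflect the names or data structures used by SLSBLRThunkInserter.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical table of thunks keyed by the register used in the indirect call.
class ToyThunkTable {
  std::map<unsigned, std::string> Thunks; // register number -> thunk symbol

public:
  // Returns the thunk for Reg, creating it only the first time it is needed,
  // so thunks that are never called are never emitted.
  const std::string &getOrCreate(unsigned Reg) {
    auto It = Thunks.find(Reg);
    if (It == Thunks.end())
      It = Thunks.emplace(Reg, "__toy_thunk_x" + std::to_string(Reg)).first;
    return It->second;
  }

  std::size_t size() const { return Thunks.size(); }
};
```

Compared with emitting every possible thunk up front, the table only ever holds entries for registers that some call site actually used.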
2024-06-24Reapply [IR] Lazily initialize the class to pass name mapping (NFC) (#96321) (#96462)Nikita Popov
On MSVC the `this` uses inside `decltype` require a lambda capture. On clang they result in an unused capture warning instead. Add the capture and suppress the warning with `(void)this`.
-----
Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.
2024-06-24Revert "[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)"Nikita Popov
My attempt to fix the Windows build made things worse, revert entirely for now. This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c. This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5. This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.
2024-06-24[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)Nikita Popov
Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.
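The lazy approach described here, registering callbacks and running them on the first lookup, can be sketched as follows. `ToyPassNameMap`, `addInitializer`, and `lookup` are hypothetical names chosen for this example and are not the interface the patch adds; the sketch only shows the deferral pattern.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class-name -> pass-name mapping that is filled in lazily.
class ToyPassNameMap {
  using NameMap = std::map<std::string, std::string>;
  NameMap ClassToPass;
  std::vector<std::function<void(NameMap &)>> Pending; // deferred initializers

public:
  // Registration is cheap: the callback is stored, not run.
  void addInitializer(std::function<void(NameMap &)> Fn) {
    Pending.push_back(std::move(Fn));
  }

  // The first lookup runs all pending callbacks; later lookups find none left.
  std::string lookup(const std::string &ClassName) {
    for (auto &Fn : Pending)
      Fn(ClassToPass);
    Pending.clear();
    auto It = ClassToPass.find(ClassName);
    return It == ClassToPass.end() ? std::string() : It->second;
  }
};
```

If `lookup` is never called, none of the callbacks run, which is how this pattern avoids the up-front cost when the mapping goes unused.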
2024-06-20[AArch64] Consider runtime mode when deciding to use SVE for fixed-length vectors. (#96081)Sander de Smalen
This also fixes the case where an SVE div is incorrectly assumed to be available in non-streaming mode with SME.
2024-06-07[AArch64][LoopIdiom] Generalize AArch64LoopIdiomTransform into LoopIdiomVectorize (#94081)Min-Yih Hsu
To facilitate sharing LoopIdiomTransform between AArch64 and RISC-V, this first patch moves AArch64LoopIdiomTransform from lib/Target/AArch64 to lib/Transforms/Vectorize and renames it to LoopIdiomVectorize. The following patch (#94082) will teach LoopIdiomVectorize how to generate VP intrinsics (in addition to the current masked vector style) in favor of RVV.
2024-06-04Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149)paperchalice
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
2024-06-02Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146)paperchalice
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to de37c06f01772e02465ccc9f538894c76d89a7a1 It still breaks EXPENSIVE_CHECKS build. Sorry.
2024-06-02[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567)paperchalice
Port selection dag isel to new pass manager. Only `AMDGPU` and `X86` support the new pass version. In the new pass manager, `-verify-machineinstrs` belongs to the verify instrumentation and is enabled by default.
2024-05-22[AArch64] NFC: Rename -force-streaming-compatible-sve to -force-streaming-compatible (#92774)Sander de Smalen
The behaviour of the flag should be equivalent to __arm_streaming_compatible. At the moment, the name suggests that '-force-streaming-compatible-sve' on its own (i.e. without specifying `+sve`) enables the compiler to use the streaming-compatible subset of SVE instructions, but the semantics merely are that the function can be called with either PSTATE.SM=0 or PSTATE.SM=1.
2024-05-05[clang backend] In AArch64's DataLayout, specify a minimum function alignment of 4. (#90702)Doug Wyatt
This addresses an issue where the explicit alignment of 2 (for C++ ABI reasons) was being propagated to the back end and causing under-aligned functions (in special sections). This is an alternate approach suggested by @efriedma-quic in PR #90415. Fixes #90358
2024-04-15[AArch64][SME] Create new pass to remove COALESCER_BARRIER early. (#85386)Sander de Smalen
The purpose of the COALESCER_BARRIER pseudo node is to prevent the register coalescer from coalescing certain COPY instructions around smstart/smstop instructions, so that we spill only the (required) FPR register rather than the encompassing ZPR register. The pseudos are removed in the AArch64ExpandPseudo pass. However, because the node itself is a _use_ of a register, this occasionally leads to redundant spills/fills, because the register allocator thinks the virtual register is actually used before an smstart/smstop instruction, causing it to be filled, at which point it requires immediate spilling again to ensure it stays live over the smstart/smstop instruction. We can avoid that by removing the pseudo nodes right after coalescing, but before register allocation.
2024-03-25[aarch64] Unguard GEPOpt from O3Nathan Lanza
This chunk of code currently runs only if the optimization mode is O3 AND the EnableGEPOpt flag is set. Given that this is the only use case for the EnableGEPOpt flag, the guarding against O3 is kinda pointless. If the user wants to enable it, then the flag should be sufficient. Reviewers: TNorthover, aeubanks Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/86588
2024-03-21[NewPM][AArch64] Add AArch64PassRegistry.def (#85215)paperchalice
PR #83567 ports `SelectionDAGISel` to the new pass manager, so each backend should provide `<Target>DagToDagISel()` in new pass manager style. Each target should also provide `<Target>PassRegistry.def` to register backend passes in `registerPassBuilderCallbacks` and reduce duplicate code. This PR adds `AArch64PassRegistry.def` to the AArch64 backend, along with boilerplate code in `registerPassBuilderCallbacks`.
2024-03-07[AArch64] Move SLS later in pass pipeline (#84210)ostannard
Currently, the SLS hardening pass is run before the machine outliner, which means that the outliner creates new functions and calls which do not have the SLS hardening applied. The fix for this is to move the SLS passes to after the outliner, as has recently been done for the return address signing pass. This also avoids a bug where the SLS outliner emits code with instructions after a return, which the outliner doesn't correctly handle.
2024-02-25[CodeGen] Port AtomicExpand to new Pass Manager (#71220)Rishabh Bali
Port the `atomicexpand` pass to the new Pass Manager. Fixes #64559