path: root/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
Age Commit message Author
2025-11-18[AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (#168058)Pierre van Houtryve
Also breaks the long inheritance chains by making both `SIGfx10CacheControl` and `SIGfx12CacheControl` inherit from `SICacheControl` directly. With this patch, we now just have 3 `SICacheControl` implementations that each do their own thing, and there is no more code hidden 3 superclasses above (which made this code harder to read and maintain than it needed to be).
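The flattened hierarchy described above can be sketched as follows. This is only an illustration under assumptions: the class names, the `Generation` enum, the `describe()` method, and the returned labels are hypothetical and are not the LLVM `SICacheControl` interface. The point is that each implementation derives from the base directly, so no behavior is inherited from an intermediate superclass.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical generation groups; not the real subtarget query.
enum class Generation { GFX6To9, GFX10To11, GFX12Plus };

// Hypothetical base interface standing in for the cache-control base class.
class CacheControl {
public:
  virtual ~CacheControl() = default;
  // Illustrative label only; a real implementation would modify instructions.
  virtual std::string describe() const = 0;
  static std::unique_ptr<CacheControl> create(Generation Gen);
};

// Each implementation inherits from the base directly.
class Gfx6Control final : public CacheControl {
public:
  std::string describe() const override { return "gfx6-9"; }
};

class Gfx10Control final : public CacheControl {
public:
  std::string describe() const override { return "gfx10-11"; }
};

class Gfx12Control final : public CacheControl {
public:
  std::string describe() const override { return "gfx12+"; }
};

// Factory: exactly one of the three implementations is picked per target.
std::unique_ptr<CacheControl> CacheControl::create(Generation Gen) {
  switch (Gen) {
  case Generation::GFX6To9:
    return std::make_unique<Gfx6Control>();
  case Generation::GFX10To11:
    return std::make_unique<Gfx10Control>();
  case Generation::GFX12Plus:
    return std::make_unique<Gfx12Control>();
  }
  assert(false && "unknown generation");
  return nullptr;
}
```

With this shape, reading one class is enough to know what a generation does, which is the maintainability argument the commit message makes.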
2025-11-17[AMDGPU][SIMemoryLegalizer] Combine all GFX6-9 CacheControl Classes (#168052)Pierre van Houtryve
Merge the following classes into `SIGfx6CacheControl`:
- SIGfx7CacheControl
- SIGfx90ACacheControl
- SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same.

This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions. There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.
2025-11-14[AMDGPU] Make use of getFunction and getMF. NFC. (#167872)Jay Foad
2025-10-27[AMDGPU] Add target feature for waits before system scope stores. NFC. (#164993)Jay Foad
2025-10-25[Target] Add "override" where appropriate (NFC) (#165083)Kazu Hirata
Note that "override" makes "virtual" redundant. Identified with modernize-use-override.
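A minimal standalone illustration of the note above, using made-up types (`Base`, `Derived`, and `costThroughBase` are not from the LLVM sources): a function marked `override` is necessarily virtual, so repeating `virtual` on it adds nothing.

```cpp
struct Base {
  virtual ~Base() = default;
  virtual int cost() const { return 1; }
};

struct Derived : Base {
  // `override` already implies the function is virtual, so spelling it
  // `virtual int cost() const override` would be redundant. The keyword also
  // makes the compiler reject this declaration if it stops matching Base.
  int cost() const override { return 2; }
};

// Dispatches dynamically through the base-class reference.
int costThroughBase(const Base &B) { return B.cost(); }
```

Calling `costThroughBase` with a `Derived` object selects `Derived::cost`, exactly as it would with the longer `virtual ... override` spelling.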
2025-10-21[AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (#161638)Pierre van Houtryve
They were previously optimized to not emit any waitcnt, which is technically correct because there is no reordering of operations at workgroup scope in CU mode for GFX10+.

This breaks transitivity however, for example if we have the following sequence of events in one thread:
- some stores
- store atomic release syncscope("workgroup")
- barrier

then another thread follows with
- barrier
- load atomic acquire
- store atomic release syncscope("agent")

It does not work because, while the other thread sees the stores, it cannot release them at the wider scope. Our release fences aren't strong enough to "wait" on stores from other waves.

We also cannot strengthen our release fences any further to allow for releasing other waves' stores because only GFX12 can do that with `global_wb`. GFX10-11 do not have the writeback instruction. It'd also add yet another level of complexity to code sequences, with both acquire/release having CU-mode only alternatives. Lastly, acq/rel are always used together. The price for synchronization has to be paid either at the acq, or the rel. Strengthening the releases would just make the memory model more complex but wouldn't help performance.

So the choice here is to streamline the code sequences by making CU and WGP mode emit almost identical (vL0 inv is not needed in CU mode) code for release (or stronger) atomic ordering.

This also removes the `vm_vsrc(0)` wait before barriers. Now that the release fence in CU mode is strong enough, it is no longer needed.

Supersedes #160501
Solves SC1-6454
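The transitivity requirement in this message has a direct analogue in portable C++, sketched here under stated assumptions. Standard C++ has no sync scopes, so the "workgroup" and "agent" scopes appear only in comments, the barrier is left out, and all names (`runTransitiveRelease`, `NarrowFlag`, `WideFlag`) are invented for the illustration. It shows the guarantee the code sequences have to preserve, not how the legalizer implements it.

```cpp
#include <atomic>
#include <thread>

// Three threads: a producer releases at a "narrow" scope, a forwarder
// acquires that and releases at a "wider" scope, and a consumer acquires the
// wider release. The consumer must still observe the producer's plain store.
int runTransitiveRelease() {
  int Data = 0;                        // plain, non-atomic payload
  std::atomic<bool> NarrowFlag{false}; // stands in for the workgroup release
  std::atomic<bool> WideFlag{false};   // stands in for the agent release
  int Observed = -1;

  std::thread Producer([&] {
    Data = 42;                                         // "some stores"
    NarrowFlag.store(true, std::memory_order_release); // narrow release
  });

  std::thread Forwarder([&] {
    while (!NarrowFlag.load(std::memory_order_acquire))
      std::this_thread::yield();
    // This release has to publish Data too, although another thread wrote it.
    WideFlag.store(true, std::memory_order_release);
  });

  std::thread Consumer([&] {
    while (!WideFlag.load(std::memory_order_acquire))
      std::this_thread::yield();
    Observed = Data; // ordered after the producer's store via both flags
  });

  Producer.join();
  Forwarder.join();
  Consumer.join();
  return Observed;
}
```

In the C++ memory model the two release/acquire pairs chain into a single happens-before relation, so the consumer reads 42. The commit message describes the hardware-level counterpart: the forwarder's wider release must also cover stores it merely observed.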
2025-10-20[AMDGPU] Enable volatile and non-temporal for loads to LDS (#153244)Krzysztof Drewniak
The primary purpose of this commit is to enable marking loads to LDS (global.load.lds, buffer.*.load.lds) volatile (using bit 31 of the aux as with normal buffer loads) and to ensure that their !nontemporal annotations translate to appropriate settings of the cache control bits.

However, in the process of implementing this feature, we also fixed
- Incorrect handling of buffer loads to LDS in GlobalISel
- Updating the handling of volatile on buffers in SIMemoryLegalizer: previously, the mapping of address spaces would cause volatile on buffer loads to be silently dropped on at least gfx10.

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-10-20[AMDGPU][SIMemoryLegalizer][GFX12] Correctly insert sample/bvhcnt (#161637)Pierre van Houtryve
The check used was not strong enough to prevent the insertion of sample/bvhcnt when they were not needed. I assume SIInsertWaitCnts was trimming those away anyway, but this was a bug nonetheless. We were inserting SAMPLE/BVHcnt waits in places where we only needed to wait on the previous atomic operation. Neither of these counters has any atomics associated with it.
2025-09-24[AMDGPU] Update comments in memory legalizer. NFC (#160453)Stanislav Mekhanoshin
2025-09-24[AMDGPU] SIMemoryLegalizer: Factor out check if memory operations can affect the global AS (#160129)Fabian Ritter
Mostly NFC, and adds an assertion for gfx12 to ensure that no atomic scratch instructions are present in the case of GloballyAddressableScratch. This should always hold because of #154710.
2025-09-23[AMDGPU] Insert waitcnt for non-global fence release in GFX12 (#159282)Fabian Ritter
A fence release could be followed by a barrier, so it should wait for the relevant memory accesses to complete, even if it is mmra-limited to LDS. So far, that would be skipped for non-global fence releases. Fixes SWDEV-554932.
2025-09-12[NFC][AMDGPU][SIMemoryLegalizer] remove effectively empty function (#156806)Sameer Sahasrabuddhe
The removed function SIGfx90ACacheControl::enableLoadCacheBypass() does not actually do anything except one assert and one unreachable.
2025-09-10[AMDGPU][gfx1250] Support "cluster" syncscope (#157641)Pierre van Houtryve
Defaults to "agent" for targets that do not support it.
- Add documentation
- Register it in MachineModuleInfo
- Add MemoryLegalizer support
2025-09-10[AMDGPU][gfx1250] Remove SCOPE_SE for scratch stores (#157640)Pierre van Houtryve
2025-09-10Revert "[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)" (#157639)Pierre van Houtryve
This reverts commit be17791f2624f22b3ed24a2539406164a379125d. This is not necessary for gfx1250 anymore.
2025-09-10[AMDGPU][gfx1250] Implement SIMemoryLegalizer (#154726)Pierre van Houtryve
Implements the base of the MemoryLegalizer for a roughly correct GFX1250 memory model. Documentation will come later, and some remaining changes still have to be added, but this is the backbone of the model.
2025-09-04[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418)Pierre van Houtryve
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
2025-09-02[AMDGPU] Reenable BackOffBarrier on GFX11/12 (#155370)Pierre van Houtryve
Re-enable it by adding a wait on vm_vsrc before every barrier "start" instruction in GFX10/11/12 CU mode. This is a less strong wait than what we do without BackOffBarrier, thus this shouldn't introduce any new guarantees that can be abused, instead it relaxes the guarantees we have now to the bare minimum needed to support the behavior users want (fence release + barrier works). There is an exact memory model in the works which will be documented separately.
2025-07-30[AMDGPU] introduce S_WAITCNT_LDS_DIRECT in the memory legalizer (#150887)Sameer Sahasrabuddhe
The new instruction represents the unknown number of waitcnts needed at a release operation to ensure that prior direct loads to LDS (formerly called LDS DMA) are completed. The instruction is replaced in SIInsertWaitcnts with a suitable value for vmcnt().

Co-authored-by: Austin Kerbow <austin.kerbow@amd.com>
2025-07-29[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)Pierre van Houtryve
Determines whether we can use `SCOPE_CU` stores (on by default), or whether all stores must be done at `SCOPE_SE` minimum.
2025-07-28[AMDGPU][gfx12] Clean-up implementation of waits before SCOPE_SYS stores (#150587)Pierre van Houtryve
We can do it all in finalizeStore if we ensure it always sees the stores. For that, I needed to fix a hidden bug where finalizeStore wouldn't see all stores because sometimes the iterator got out-of-sync and didn't point to the store anymore. This also removes the waits before volatile LDS stores, which never needed them; that was a bug until now.
2025-07-28[AMDGPU][gfx1250] Use SCOPE_SE for stores that may hit scratch (#150586)Pierre van Houtryve
2025-07-24[NFC][AMDGPU] Refactor handling of `amdgpu-synchronize-as` MD on fences (#148630)Pierre van Houtryve
Directly plug it into the MMO instead, which is much cleaner.
2025-07-24[NFC][AMDGPU] Rename "amdgpu-as" to "amdgpu-synchronize-as" (#148627)Pierre van Houtryve
"amdgpu-as" is way too vague and doesn't give enough context. We may want to support it on normal atomics too, to control the synchronized (ordered) AS. If we do that, the name has to be less vague.
2025-06-20[AMDGPU] Don't insert wait instructions that are not supported by gfx1250 (#145084)Stanislav Mekhanoshin
No tests yet, but it will allow further tests not to be polluted with these waits.
2025-05-28Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)Justin Bogner
This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.
2025-03-12[AMDGPU][NPM] Port SIMemoryLegalizer to NPM (#130060)Akshat Oke
2025-02-19[AMDGPU] Remove FeatureForceStoreSC0SC1 (#126878)Fabian Ritter
This was only used for gfx940 and gfx941, which have since been removed. For SWDEV-512631
2025-02-19[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763)Fabian Ritter
gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631
2024-10-07[AMDGPU] Only emit SCOPE_SYS global_wb (#110636)Pierre van Houtryve
global_wb with scopes lower than SCOPE_SYS is unnecessary for correctness. I was initially optimistic they would be very cheap no-ops but they can actually be quite expensive so let's avoid them.
2024-09-09[AMDGPU] Document & Finalize GFX12 Memory Model (#98599)Pierre van Houtryve
Documents the memory model implemented as of #98591, with some fixes/optimizations to the implementation.
2024-07-22AMDGPU: Query MachineModuleInfo from PM instead of MachineFunction (#99679)Matt Arsenault
2024-07-16[AMDGPU] Fix and add namespace closing comments. NFC.Jay Foad
2024-07-16[AMDGPU] Implement GFX12 Memory Model (#98591)Pierre van Houtryve
- Emit GLOBAL_WB instructions
- Reflect syncscope on instructions' `scope:` operand

Fixes SWDEV-468508
Fixes SWDEV-470735
Fixes SWDEV-468392
Fixes SWDEV-469622
2024-05-27[AMDGPU] Add amdgpu-as MMRA for fences (#78572)Pierre van Houtryve
Using MMRAs, allow `builtin_amdgcn_fence` to emit fences that only target one or more address spaces, instead of fencing all address spaces at once. This is done through an `amdgpu-as` MMRA. Currently focused on OpenCL fences, but can very easily support more AS names and codegen on more than just fences.
2024-03-06[AMDGPU] Handle amdgpu.last.use metadata (#83816)Mirko Brkušanin
Convert !amdgpu.last.use metadata into MachineMemOperand for last use and handle it in SIMemoryLegalizer similar to nontemporal and volatile.
2024-03-04[AMDGPU] Fix setting nontemporal in memory legalizer (#83815)Mirko Brkušanin
Iterator MI can advance in insertWait() but we need original instruction to set temporal hint. Just move it before handling volatile.
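The ordering problem described above can be reproduced in a few lines. This is a hedged sketch, not the legalizer code: instructions are modeled as strings in a `std::list`, and `insertWaitAfter` and `legalize` are hypothetical helpers that only mimic the relevant behavior of a helper advancing the caller's iterator.

```cpp
#include <iterator>
#include <list>
#include <string>

using InstList = std::list<std::string>;

// Hypothetical helper: inserts a wait after the current instruction and
// leaves the iterator on the new wait, no longer on the original instruction.
void insertWaitAfter(InstList &Insts, InstList::iterator &It) {
  It = Insts.insert(std::next(It), "wait");
}

// Applies the non-temporal hint BEFORE any step that may move the iterator,
// so the hint always lands on the memory instruction itself.
void legalize(InstList &Insts, InstList::iterator It, bool IsVolatile,
              bool IsNonTemporal) {
  if (IsNonTemporal)
    *It += " nt"; // It still designates the original instruction here
  if (IsVolatile)
    insertWaitAfter(Insts, It); // It now points at the inserted wait
}
```

Swapping the two `if` statements would attach the hint to the inserted wait for an access that is both volatile and non-temporal, which is the kind of bug the commit fixes by moving the hint handling first.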
2024-02-28AMDGPU/GFX12: Insert waitcnts before stores with scope_sys (#82996)Petar Avramovic
Insert waitcnts for loads and atomics before stores with system scope. Scope is a field in the instruction encoding and corresponds to the desired coherence level in the cache hierarchy. Intrinsic stores can set the scope in the cache policy operand. If the volatile keyword is used on generic stores, the memory legalizer will set the scope to system. Generic stores, by default, get the lowest scope level. Waitcnts are not required if it is guaranteed that memory is cached. For example, Vulkan shaders can guarantee this. TODO: implement a flag for frontends to give us a hint not to insert waits. Expecting the Vulkan flag to be implemented as a vulkan:private MMRA.
2024-02-13[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450)Pierre van Houtryve
Fixes SWDEV-443292
2024-01-18[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)Jay Foad
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.
2024-01-08[AMDGPU] Add new cache flushing instructions for GFX12 (#76944)Mirko Brkušanin
Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>
2023-12-15[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830)Pierre van Houtryve
Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
2023-05-12AMDGPU: Force sc0 and sc1 on stores for gfx940 and gfx941Konstantin Zhuravlyov
Differential Revision: https://reviews.llvm.org/D149986
2023-03-08[AMDGPU] Skip buffer_wbl2 before atomic fence acquireStanislav Mekhanoshin
Memory models for gfx90a and gfx940 do not require buffer_wbl2 before the fence for acquire ordering, but we do insert the full release. Fixes: SWDEV-386785 Differential Revision: https://reviews.llvm.org/D145524
2023-02-07[NFC][TargetParser] Remove llvm/Support/TargetParser.hArchibald Elliott
2022-12-17std::optional::value => operator*/operator->Fangrui Song
value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.
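A small example of the access style this change moves to, using invented names (`Scope`, `nameLength`, `nameOrDefault`). `operator*` and `operator->` perform no check and have no exception path, so the caller tests the optional first; `value()` would instead throw `std::bad_optional_access` on an empty optional.

```cpp
#include <cstddef>
#include <optional>
#include <string>

struct Scope {
  std::string Name;
};

// Returns the length of the scope name, or 0 when no scope is present.
std::size_t nameLength(const std::optional<Scope> &S) {
  if (!S)
    return 0;
  return S->Name.size(); // operator->: unchecked, no exception machinery
}

// Returns the scope name, or "none" when the optional is empty.
std::string nameOrDefault(const std::optional<Scope> &S) {
  return S ? (*S).Name : std::string("none"); // operator*: unchecked access
}
```

Both helpers check for a value before dereferencing, which is required because dereferencing an empty `std::optional` is undefined behavior.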
2022-12-14[AMDGPU] Stop using make_pair and make_tuple. NFC.Jay Foad
C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828
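For illustration, here is the same value built with the helper function and with C++17 class template argument deduction. The function names and the values are arbitrary and not taken from the patch.

```cpp
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Pre-C++17 style: the helper function deduces the element types.
std::pair<int, std::string> viaHelper() {
  return std::make_pair(1, std::string("vmcnt"));
}

// C++17 style: the constructor deduces the element types itself.
auto viaConstructor() { return std::pair(1, std::string("vmcnt")); }

// The same works for tuples; this deduces std::tuple<int, char, double>.
auto counters() { return std::tuple(0, 'x', 2.5); }

static_assert(std::is_same_v<decltype(viaConstructor()),
                             std::pair<int, std::string>>);
static_assert(std::is_same_v<decltype(counters()),
                             std::tuple<int, char, double>>);
```

The two spellings produce identical types here, so the change is purely stylistic, consistent with the NFC tag on the commit.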
2022-12-13[CodeGen] llvm::Optional => std::optionalFangrui Song
2022-12-08[llvm] Use std::nullopt instead of None in comments (NFC)Kazu Hirata
This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02[Target] Use std::nullopt instead of None (NFC)Kazu Hirata
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716