path: root/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
Age Commit message Author
2025-09-15[AMDGPU] Add the support for `.cluster_dims` code object metadata (#158721)Shilei Tian
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-07-31[AMDGPU] Remove an unnecessary cast (NFC) (#151440)Kazu Hirata
getZExtValue() already returns uint64_t.
2025-03-10AMDGPU: Move enqueued block handling into clang (#128519)Matt Arsenault
The previous implementation wasn't maintaining a faithful IR representation of how this really works. The value returned by createEnqueuedBlockKernel wasn't actually used as a function, and hacked up later to be a pointer to the runtime handle global variable. In reality, the enqueued block is a struct where the first field is a pointer to the kernel descriptor, not the kernel itself. We were also relying on passing around a reference to a global using a string attribute containing its name. It's better to base this on a proper IR symbol reference during final emission. This now avoids using a function attribute on kernels and avoids using the additional "runtime-handle" attribute to populate the final metadata. Instead, associate the runtime handle reference to the kernel with the !associated global metadata. We can then get a final, correctly mangled name at the end. I couldn't figure out how to get rename-with-external-symbol behavior using a combination of comdats and aliases, so leaves an IR pass to externalize the runtime handles for codegen. If anything breaks, it's most likely this, so leave avoiding this for a later step. Use a special section name to enable this behavior. This also means it's possible to declare enqueuable kernels in source without going through the dedicated block syntax or other dedicated compiler support. We could move towards initializing the runtime handle in the compiler/linker. I have a working patch where the linker sets up the first field of the handle, avoiding the need to export the block kernel symbol for the runtime. We would need new relocations to get the private and group sizes, but that would avoid the runtime's special case handling that requires the device_enqueue_symbol metadata field. https://reviews.llvm.org/D141700
2024-11-05AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751)Matt Arsenault
0 does not make sense as a value for this attribute, much less as the default. Also skip emitting each individual field when that field is the default, rather than when any element was the default. Also fix the name of the test since it didn't exactly match the real attribute name.
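The per-field rule described in this commit can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name `maxNumWorkgroupsFields`, the constant `kNoLimit`, and the key strings are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Sentinel for "no limit given" in one dimension (uint32_max, per the commit).
constexpr uint32_t kNoLimit = std::numeric_limits<uint32_t>::max();

// Hypothetical helper: returns the (key, value) pairs worth emitting.
// Each dimension is checked on its own, so one default entry does not
// suppress the other two.
std::vector<std::pair<std::string, uint32_t>>
maxNumWorkgroupsFields(const std::array<uint32_t, 3> &MaxNumWG) {
  // Key names are assumptions chosen for illustration.
  static const char *const Keys[3] = {".max_num_workgroups_x",
                                      ".max_num_workgroups_y",
                                      ".max_num_workgroups_z"};
  std::vector<std::pair<std::string, uint32_t>> Fields;
  for (std::size_t I = 0; I < MaxNumWG.size(); ++I) {
    // 0 is not a meaningful limit; this sketch treats it as a caller error.
    assert(MaxNumWG[I] != 0 && "0 is not a valid workgroup limit");
    if (MaxNumWG[I] != kNoLimit)
      Fields.emplace_back(Keys[I], MaxNumWG[I]);
  }
  return Fields;
}
```

With the input `{16, kNoLimit, 4}` the helper yields only the x and z entries, which is the behavior the commit message describes for a partially specified attribute.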
2024-10-06[AMDGPU] Support preloading hidden kernel arguments (#98861)Austin Kerbow
Adds hidden kernel arguments to the function signature and marks them inreg if they should be preloaded into user SGPRs. The normal kernarg preloading logic then takes over with some additional checks for the correct implicitarg_ptr alignment. Special care is needed so that metadata for the hidden arguments is not added twice when generating the code object.
2024-10-03[AMDGPU] Qualify auto. NFC. (#110878)Jay Foad
Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
2024-07-16[AMDGPU] Concatenate nested namespaces. NFC.Jay Foad
2024-06-28[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)Nikita Popov
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.
2024-06-26[AMDGPU] MCExpr-ify AMDGPU HSAMetadata (#94788)Janek van Oirschot
Enables MCExpr for HSAMetadata, particularly, HSAMetadata's msgpack format.
2024-05-09MCExpr-ify SIProgramInfo (#88257)Janek van Oirschot
Convert members in SIProgramInfo affected by variables provided by AMDGPUResourceUsageAnalysis into MCExprs.
2024-03-12[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)Jun Wang
A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-02-05[AMDGPU] Introduce Code Object V6 (#76954)Pierre van Houtryve
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs changes are part of the follow-up patch #76955
2024-01-21[AMDGPU] Add an asm directive to track code_object_version (#76267)Emma Pilkington
Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.
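The precedence order described here (IR flag or asm directive first, then the CL flag, then the compiler default) can be sketched as a small resolver. The function name and parameters are hypothetical and are not the merged COV-checking functions from AMDGPUBaseInfo.h.

```cpp
#include <cassert>
#include <optional>

// Hypothetical resolver for the code object version (COV).
// FromIROrDirective: value from the IR module flag or the
//                    .amdhsa_code_object_version directive, if present.
// FromCommandLine:   value from --amdhsa-code-object-version, if given.
// CompilerDefault:   fallback when neither source provides a value.
unsigned resolveCodeObjectVersion(std::optional<unsigned> FromIROrDirective,
                                  std::optional<unsigned> FromCommandLine,
                                  unsigned CompilerDefault) {
  assert(CompilerDefault != 0 && "sketch assumes a real default version");
  if (FromIROrDirective)
    return *FromIROrDirective; // Strongest source after this commit.
  if (FromCommandLine)
    return *FromCommandLine;   // Weakened: only used when the IR is silent.
  return CompilerDefault;
}
```

Before this commit the first two checks would effectively have been in the opposite order, since the CL flag took precedence over the IR flag.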
2024-01-04[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273)Chaitanya
"hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout. Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a function uses dynamic LDS. hidden argument will be added in below cases: - LDS global is used in the kernel. - Kernel calls a function which uses LDS global. - LDS pointer is passed as argument to kernel itself.
2023-11-07Reland: [AMDGPU] Remove Code Object V3 (#67118)Pierre van Houtryve
V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed.
- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-10-18Revert "[AMDGPU] Remove Code Object V3 (#67118)"pvanhout
This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.
2023-10-16[AMDGPU] Remove Code Object V3 (#67118)Pierre van Houtryve
V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed.
- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-09-21[AMDGPU] Remove Code Object V2 (#65715)Pierre van Houtryve
Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM.
- [clang] Remove support for the `-mcode-object-version=2` option.
- [lld] Remove/refactor tests that were still using COV2
- [llvm] Update AMDGPUUsage.rst
  - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while.
- [AMDGPU] Remove COV2 emission capabilities.
- [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2
- [AMDGPU] Update all tests that were still using COV2
  - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).
2023-09-12[AMDGPU] Add utilities to track number of user SGPRs. NFC.Austin Kerbow
Factor out and unify some common code that calculates and tracks the number of user SGPRs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159439
2023-08-23[AMDGPU] Emit .actual_access metadataChangpeng Fang
Summary: Emit .actual_access metadata for the deduced argument access qualifier, and .access for kernel_arg_access_qual. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D157451
2023-08-10[llvm] Drop some bitcasts and references related to typed pointersBjorn Pettersson
Differential Revision: https://reviews.llvm.org/D157551
2023-06-09[AMDGPU] Port no-hsa-graphic-shaders.ll to code object V4pvanhout
Split from D146023 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152432
2023-04-20AMDGPU: Really invert handling of enqueued block detectionMatt Arsenault
Remove the broken call graph analysis in the block enqueue lowering pass. The previous iteration was reverted due to a runtime bug when the completion action was unconditionally enabled.
2023-02-10AMDGPU: Use module flag to get code object version at IR level follow-upChangpeng Fang
Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision https://reviews.llvm.org/D143293
2023-02-02AMDGPU: Use module flag to get code object version at IR levelChangpeng Fang
Summary: This patch introduces a mechanism to check the code object version from the module flag. This avoids checking from the command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need another approach (a directive, for example) to check the code object version. That will be in a separate patch later. For the LIT test updates, we directly add the module flag if there is only a single code object version associated with all checks in one file. In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313
2023-01-12Partially reapply "AMDGPU: Invert handling of enqueued block detection"Matt Arsenault
This mostly reverts commit 270e96f435596449002fc89962595497481c8770. Keep the attributor related changes around, but functionally restore the old behavior as a workaround. Device enqueue goes back to not working at -O0 with this version.
2023-01-07Revert "AMDGPU: Invert handling of enqueued block detection"Matt Arsenault
This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e. The runtime is having trouble with this at -O0 when the inputs are always enabled.
2023-01-06AMDGPU: Invert handling of enqueued block detectionMatt Arsenault
Invert the sense of the attribute and let the attributor figure this out like everything else. If needed we can have the not-OpenCL languages set amdgpu-no-default-queue and amdgpu-no-completion-action up front so they never have to pay the cost. There are also so many of these now, the offset use API should probably consider all of them at once. Maybe they should merge into one attribute with used fields. Having separate functions for each field in AMDGPUBaseInfo is also not the greatest API (might as well fix this when the patch to get the object version from the module lands).
2023-01-05[AMDGPU] Add .uniform_work_group_size metadata to v5Vang Thao
An AMDGPU kernel with function attribute "uniform-work-group-size"="true" requires a uniform work group size (i.e. each dimension of the global size is a multiple of the corresponding dimension of the work group size). hipExtModuleLaunchKernel allows launching a HIP kernel with a non-uniform workgroup size, which makes it necessary for the runtime to check and enforce a uniform workgroup size if the kernel requires it. To let the runtime enforce that, this metadata is needed to indicate that the kernel requires a uniform workgroup size. Reviewed By: kzhuravl, arsenm Differential Revision: https://reviews.llvm.org/D141012
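The uniformity condition stated in this message can be written as a short check. This is only a sketch of the arithmetic; the function name `isUniformLaunch` is hypothetical and does not correspond to any runtime API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical check: a launch is uniform when, in every dimension, the
// global size is an exact multiple of the work group size.
bool isUniformLaunch(const std::array<uint64_t, 3> &GlobalSize,
                     const std::array<uint64_t, 3> &WorkGroupSize) {
  for (std::size_t I = 0; I < 3; ++I) {
    assert(WorkGroupSize[I] != 0 && "work group size must be nonzero");
    if (GlobalSize[I] % WorkGroupSize[I] != 0)
      return false; // A partial work group would be needed in dimension I.
  }
  return true;
}
```

For example, a global size of 1024x1x1 with a work group size of 256x1x1 is uniform, while 1000x1x1 with 256x1x1 is not, because 1000 is not a multiple of 256.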
2022-12-14[AMDGPU] Stop using make_pair and make_tuple. NFC.Jay Foad
C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828
2022-12-13[AMDGPU] Add `.workgroup_processor_mode` to v5 MDPierre van Houtryve
Adds Workgroup Processor Mode (WGP) to the HSA Metadata for Code Object v5/GFX10+. The field is already present as an asm directive and in the compute program resource register but is also needed in the MD. Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D139931
2022-12-13[CodeGen] llvm::Optional => std::optionalFangrui Song
2022-12-02[Target] Use std::nullopt instead of None (NFC)Kazu Hirata
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-10-11[AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor and the metadata map specific to code object v5 and laterAbinav Puthan Purayil
Unfortunately, we have a broken handling of this in the runtime of ROCm 5.3. The runtime is expected to handle this correctly when v5 becomes the default. Differential Revision: https://reviews.llvm.org/D134714
2022-09-06[AMDGPU/Metadata] Rename HSAMD::MetadataStreamer classesraghavmedicherla
Renamed all HSAMD::MetadataStreamer classes to improve readability of the code. Differential Revision: https://reviews.llvm.org/D133156
2022-08-28[Target] Qualify auto in range-based for loops (NFC)Kazu Hirata
2022-08-23AMDGPU/MetaData: Restrict address space key to only be emitted for "global_buffer" and "dynamic_shared_pointer"Raghav
This matches .address_space docs at https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v3 Differential Revision: https://reviews.llvm.org/D132145
2022-07-18[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata mapAbinav Puthan Purayil
This change introduces the dynamic stack boolean field to code-object-v3 and above under the code properties of the kernel descriptor and under the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to the is_dynamic_callstack field of amd_kernel_code_t. Differential Revision: https://reviews.llvm.org/D128344
2022-04-12AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionallyChangpeng Fang
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is present by default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function. Reviewers: arsenm, sameerds, b-sumner and foad Differential Revision: https://reviews.llvm.org/D123548
2022-04-11AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5Changpeng Fang
Summary: In emitting metadata for implicit kernel arguments, we need to be in sync with the actual loads to align the implicit kernel argument segment to 8 byte boundary. In this work, we simply force this alignment through the first implicit argument. In addition, we don't emit metadata for any implicit kernel argument if none of them is actually used. Reviewers: arsenm, b-sumner Differential Revision: https://reviews.llvm.org/D123346
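The 8-byte alignment of the implicit kernel argument segment amounts to rounding the end of the explicit arguments up to the next multiple of 8. The sketch below shows that arithmetic only; `alignUp` and `implicitArgSegmentStart` are hypothetical names, and the patch itself applies the alignment through the first implicit argument rather than through a helper like this.

```cpp
#include <cassert>
#include <cstdint>

// Round Offset up to the next multiple of Align (Align must be a power of 2).
uint64_t alignUp(uint64_t Offset, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Offset + Align - 1) & ~(Align - 1);
}

// Hypothetical helper: offset at which the implicit kernel argument segment
// begins, given the size in bytes of the explicit kernel arguments.
uint64_t implicitArgSegmentStart(uint64_t ExplicitArgBytes) {
  return alignUp(ExplicitArgBytes, 8);
}
```

With 20 bytes of explicit arguments the implicit segment would start at offset 24 in this model; with 8 bytes it starts at 8, since that offset is already aligned.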
2022-04-05[AMDGPU][OpenCL] Remove "printf and hostcall" diagnosticScott Linder
The diagnostic is unreliable, and triggers even for dead uses of hostcall that may exist when linking the device-libs at lower optimization levels. Eliminate the diagnostic, and directly document the limitation for OpenCL before code object V5. Make some NFC changes to clarify the related code in the MetadataStreamer. Add a clang test to tie OCL sources containing printf to the backend IR tests for this situation. Reviewed By: sameerds, arsenm, yaxunl Differential Revision: https://reviews.llvm.org/D121951
2022-03-28[AMDGPU][NFC]: Remove unnecessary MFI functionsChangpeng Fang
Summary: hasHostcallPtr() and hasHeapPtr() are only used in metadata emission. However, we can use the corresponding function attributes directly instead of introducing the functions. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D122600
2022-03-09AMDGPU: Set up User SGPRs for queue_ptr only when necessaryChangpeng Fang
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In the current implementation, user SGPRs are set up unnecessarily in some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if the target supports getDoorbellID, queue_ptr is also not necessary. Further, code object version 5 introduces a new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document (https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table), llvm.debugtrap does not need queue_ptr, so we remove queue_ptr support for llvm.debugtrap in the backend. Reviewers: sameerds, arsenm Fixes: SWDEV-307189 Differential Revision: https://reviews.llvm.org/D119762
2022-02-25[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernargChangpeng Fang
Summary: Emit metadata for hidden_heap_v1 kernarg Reviewers: sameerds, b-sumner Fixes: SWDEV-307188 Differential Revision: https://reviews.llvm.org/D119027
2022-02-16[AMDGPU] Add agpr_count to metadata and AsmParserJacob Lambert
gfx90a allows the number of ACC registers (AGPRs) to be set independently of the number of VGPRs. For both HSA and PAL metadata, we now include an "agpr_count" key to report the number of AGPRs set for supported devices (gfx90a, gfx908, as determined by hasMAIInsts()). This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL. The AsmParser also now recognizes ".kernel.agpr_count" for supported devices. Differential Revision: https://reviews.llvm.org/D116140
2022-02-11[AMDGPU] replace hostcall module flag with function attributeSameer Sahasrabuddhe
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implicitarg_ptr does not result in a load of any byte in the hostcall pointer argument. Reviewed By: jdoerfert, arsenm, kpyzhov Differential Revision: https://reviews.llvm.org/D119216
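The two rules in this message (emit the metadata unless the attribute is present, and infer the attribute only under the stated conditions) can be modeled as below. The types and function names are hypothetical stand-ins; the real logic lives in the AMDGPU attributor and the metadata streamer and operates on LLVM IR, not on plain sets and structs.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical summary of what analysis learned about a function.
struct FunctionFacts {
  bool IsSanitized;        // Function is being sanitized.
  bool LoadsHostcallBytes; // implicitarg_ptr use loads a hostcall-pointer byte.
};

// The attribute may be inferred only when neither condition holds.
bool canInferNoHostcallPtr(const FunctionFacts &F) {
  return !F.IsSanitized && !F.LoadsHostcallBytes;
}

// Metadata for the hostcall buffer argument is emitted by default, i.e.
// whenever "amdgpu-no-hostcall-ptr" is absent from the kernel's attributes.
bool emitsHostcallBufferArg(const std::set<std::string> &KernelAttrs) {
  return KernelAttrs.count("amdgpu-no-hostcall-ptr") == 0;
}

// Usage example: a sanitized kernel keeps the hostcall argument metadata.
void hostcallExample() {
  FunctionFacts Sanitized{true, false};
  std::set<std::string> Attrs;
  if (canInferNoHostcallPtr(Sanitized))
    Attrs.insert("amdgpu-no-hostcall-ptr");
  assert(emitsHostcallBufferArg(Attrs));
}
```

The default errs on the side of emitting the metadata, so a kernel that was never analyzed is treated as one that uses the hostcall buffer.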
2022-01-31AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel argsChangpeng Fang
Summary: Add code object v5 support (default is still v4). Generate metadata for implicit kernel args for the new ABI. Set the metadata version to be 1.2. Reviewers: t-tye, b-sumner, arsenm, and bcahoon Fixes: SWDEV-307188, SWDEV-307189 Differential Revision: https://reviews.llvm.org/D118272
2022-01-26[AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointersNikita Popov
AMDGPUHSAMetadataStreamer currently assumes that pointer arguments without align attribute have ABI alignment of the pointee type. This is incompatible with opaque pointers, but also plain incorrect: Pointer arguments without explicit alignment have alignment 1. It is the responsibility of the frontend to add correct align annotations. Differential Revision: https://reviews.llvm.org/D118229
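The corrected rule is that a missing align annotation means alignment 1, never the pointee type's ABI alignment. A minimal sketch of that rule follows; the function name `pointerArgAlign` is hypothetical and stands in for whatever the streamer does with the parameter's align attribute.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: alignment to report for a pointer kernel argument.
// ExplicitAlign models the argument's align attribute, if the frontend set one.
uint64_t pointerArgAlign(std::optional<uint64_t> ExplicitAlign) {
  if (!ExplicitAlign)
    return 1; // No annotation: only byte alignment may be assumed.
  assert(*ExplicitAlign != 0 &&
         (*ExplicitAlign & (*ExplicitAlign - 1)) == 0 &&
         "alignment must be a power of two");
  return *ExplicitAlign;
}
```

Nothing about the pointee type enters the computation, which is also why the rule continues to work with opaque pointers.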
2022-01-25[NFC] Remove uses of PointerType::getElementType()Nikita Popov
Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.
2021-12-04AMDGPU: Optimize out implicit kernarg argument allocation if unusedMatt Arsenault
We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be unused. Start using it to avoid allocating the implicit arguments if unneeded.