Diffstat (limited to 'llvm/docs/AMDGPUUsage.rst')
| -rw-r--r-- | llvm/docs/AMDGPUUsage.rst | 70 |
1 files changed, 51 insertions, 19 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst index b6d61a62f50f..37563203f2f8 100644 --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -517,19 +517,19 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following **GCN GFX12 (RDNA 4)** [AMD-GCN-GFX12-RDNA4]_ ----------------------------------------------------------------------------------------------------------------------- - ``gfx1200`` ``amdgcn`` dGPU - cumode - Architected *TBA* - - wavefrontsize64 flat - scratch .. TODO:: + ``gfx1200`` ``amdgcn`` dGPU - cumode - Architected - Radeon RX 9060 + - wavefrontsize64 flat - Radeon RX 9060 XT + scratch - Packed - work-item Add product - IDs names. + work-item + IDs - ``gfx1201`` ``amdgcn`` dGPU - cumode - Architected *TBA* - - wavefrontsize64 flat - scratch .. TODO:: + ``gfx1201`` ``amdgcn`` dGPU - cumode - Architected - Radeon RX 9070 + - wavefrontsize64 flat - Radeon RX 9070 XT + scratch - Radeon RX 9070 GRE - Packed - work-item Add product - IDs names. + work-item + IDs ``gfx1250`` ``amdgcn`` APU - Architected *TBA* flat @@ -537,6 +537,8 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following - Packed work-item Add product IDs names. + - Workgroup + Clusters =========== =============== ============ ===== ================= =============== =============== ====================== @@ -768,9 +770,6 @@ For example: performant than code generated for XNACK replay disabled. - cu-stores TODO On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used. - If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater. - =============== ============================ ================================================== .. _amdgpu-target-id: @@ -1098,6 +1097,22 @@ is conservatively correct for OpenCL. - ``wavefront`` and executed by a thread in the same wavefront. 
+ ``cluster`` Synchronizes with, and participates in modification + and seq_cst total orderings with, other operations + (except image operations) for all address spaces + (except private, or generic that accesses private) + provided the other operation's sync scope is: + + - ``system``, ``agent`` or ``cluster`` and + executed by a thread on the same cluster. + - ``workgroup`` and executed by a thread in the + same work-group. + - ``wavefront`` and executed by a thread in the + same wavefront. + + On targets that do not support workgroup cluster + launch mode, this behaves like ``agent`` scope instead. + ``workgroup`` Synchronizes with, and participates in modification and seq_cst total orderings with, other operations (except image operations) for all address spaces @@ -1131,6 +1146,9 @@ is conservatively correct for OpenCL. ``agent-one-as`` Same as ``agent`` but only synchronizes with other operations within the same address space. + ``cluster-one-as`` Same as ``cluster`` but only synchronizes with other + operations within the same address space. + ``workgroup-one-as`` Same as ``workgroup`` but only synchronizes with other operations within the same address space. @@ -1437,7 +1455,6 @@ The AMDGPU backend implements the following LLVM IR intrinsics. Returns a pair for the swapped registers. The first element of the return corresponds to the swapped element of the first argument. - llvm.amdgcn.permlane32.swap Provide direct access to `v_permlane32_swap_b32` instruction on supported targets. Swaps the values across lanes of first 2 operands. Rows 2 and 3 of the first operand are swapped with rows 0 and 1 of the second operand (one row is 16 lanes). @@ -1458,6 +1475,25 @@ The AMDGPU backend implements the following LLVM IR intrinsics. - `v_mov_b32 <dest> <old>` - `v_mov_b32 <dest> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>` + :ref:`llvm.prefetch <int_prefetch>` Implemented on gfx1250, ignored on earlier targets. 
+ First argument is flat, global, or constant address space pointer. + Any other address space is not supported. + On gfx125x generates flat_prefetch_b8 or global_prefetch_b8 and brings data to GL2. + Second argument is rw and currently ignored. Can be 0 or 1. + Third argument is locality, 0-3. Translates to memory scope: + + * 0 - SCOPE_SYS + * 1 - SCOPE_DEV + * 2 - SCOPE_SE + * 3 - SCOPE_SE + + Note that SCOPE_CU is not generated and not safe on an invalid address. + Fourth argument is cache type: + + * 0 - Instruction cache, currently ignored and no code is generated. + * 1 - Data cache. + + Instruction cache prefetches are unsafe on invalid address. ============================================== ========================================================== .. TODO:: @@ -5114,9 +5150,7 @@ The fields used by CP for code objects before V3 also match those specified in and must be 0, >454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT _SIZE - 455 1 bit USES_CU_STORES GFX12.5: Whether the ``cu-stores`` target attribute is enabled. - If 0, then all stores are ``SCOPE_SE`` or higher. - 457:456 2 bits Reserved, must be 0. + 457:455 3 bits Reserved, must be 0. 458 1 bit ENABLE_WAVEFRONT_SIZE32 GFX6-GFX9 Reserved, must be 0. GFX10-GFX11 @@ -18254,8 +18288,6 @@ terminated by an ``.end_amdhsa_kernel`` directive. GFX942) ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX12 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - ``.amdhsa_uses_cu_stores`` 0 GFX12.5 Controls USES_CU_STORES in - :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_wavefront_size32`` Target GFX10-GFX12 Controls ENABLE_WAVEFRONT_SIZE32 in Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. Specific |
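The ``llvm.prefetch`` hunk above describes how the intrinsic's arguments are interpreted on gfx1250. The sketch below restates that table as a small Python model, for illustration only. The function name ``prefetch_scope`` and the dictionary ``LOCALITY_TO_SCOPE`` are hypothetical and are not part of LLVM; only the locality-to-scope mapping and the cache-type behavior are taken from the patch text.

```python
# Hypothetical model of the llvm.prefetch argument handling described in the
# patch. Names here are illustrative assumptions, not LLVM APIs.

# Third argument (locality, 0-3) translated to a memory scope, per the patch.
LOCALITY_TO_SCOPE = {
    0: "SCOPE_SYS",
    1: "SCOPE_DEV",
    2: "SCOPE_SE",
    3: "SCOPE_SE",  # SCOPE_CU is never generated, so 3 also maps to SCOPE_SE
}


def prefetch_scope(rw, locality, cache_type):
    """Return the scope a prefetch would use, or None if no code is generated.

    rw:         0 or 1; accepted but currently ignored.
    locality:   0-3; selects the memory scope.
    cache_type: 0 = instruction cache (no code generated), 1 = data cache.
    """
    if rw not in (0, 1):
        raise ValueError("rw must be 0 or 1")
    if locality not in LOCALITY_TO_SCOPE:
        raise ValueError("locality must be in the range 0-3")
    if cache_type not in (0, 1):
        raise ValueError("cache type must be 0 or 1")

    # Instruction cache prefetches are currently ignored.
    if cache_type == 0:
        return None

    return LOCALITY_TO_SCOPE[locality]


if __name__ == "__main__":
    for locality in range(4):
        print(locality, prefetch_scope(0, locality, 1))
```

The model deliberately leaves out the choice between ``flat_prefetch_b8`` and ``global_prefetch_b8``, since the patch does not spell out which address space selects which instruction.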
