path: root/llvm/docs/AMDGPUUsage.rst
Diffstat (limited to 'llvm/docs/AMDGPUUsage.rst')
-rw-r--r-- llvm/docs/AMDGPUUsage.rst 70
1 files changed, 51 insertions, 19 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index b6d61a62f50f..37563203f2f8 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -517,19 +517,19 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
**GCN GFX12 (RDNA 4)** [AMD-GCN-GFX12-RDNA4]_
-----------------------------------------------------------------------------------------------------------------------
- ``gfx1200`` ``amdgcn`` dGPU - cumode - Architected *TBA*
- - wavefrontsize64 flat
- scratch .. TODO::
+ ``gfx1200`` ``amdgcn`` dGPU - cumode - Architected - Radeon RX 9060
+ - wavefrontsize64 flat - Radeon RX 9060 XT
+ scratch
- Packed
- work-item Add product
- IDs names.
+ work-item
+ IDs
- ``gfx1201`` ``amdgcn`` dGPU - cumode - Architected *TBA*
- - wavefrontsize64 flat
- scratch .. TODO::
+ ``gfx1201`` ``amdgcn`` dGPU - cumode - Architected - Radeon RX 9070
+ - wavefrontsize64 flat - Radeon RX 9070 XT
+ scratch - Radeon RX 9070 GRE
- Packed
- work-item Add product
- IDs names.
+ work-item
+ IDs
``gfx1250`` ``amdgcn`` APU - Architected *TBA*
flat
@@ -537,6 +537,8 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
- Packed
work-item Add product
IDs names.
+ - Workgroup
+ Clusters
=========== =============== ============ ===== ================= =============== =============== ======================
@@ -768,9 +770,6 @@ For example:
performant than code generated for XNACK replay
disabled.
- cu-stores TODO On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used.
- If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater.
-
=============== ============================ ==================================================
.. _amdgpu-target-id:
@@ -1098,6 +1097,22 @@ is conservatively correct for OpenCL.
- ``wavefront`` and executed by a thread in the
same wavefront.
+ ``cluster`` Synchronizes with, and participates in modification
+ and seq_cst total orderings with, other operations
+ (except image operations) for all address spaces
+ (except private, or generic that accesses private)
+ provided the other operation's sync scope is:
+
+ - ``system``, ``agent`` or ``cluster`` and
+ executed by a thread in the same cluster.
+ - ``workgroup`` and executed by a thread in the
+ same work-group.
+ - ``wavefront`` and executed by a thread in the
+ same wavefront.
+
+ On targets that do not support workgroup cluster
+ launch mode, this behaves like ``agent`` scope instead.
+
``workgroup`` Synchronizes with, and participates in modification
and seq_cst total orderings with, other operations
(except image operations) for all address spaces
@@ -1131,6 +1146,9 @@ is conservatively correct for OpenCL.
``agent-one-as`` Same as ``agent`` but only synchronizes with other
operations within the same address space.
+ ``cluster-one-as`` Same as ``cluster`` but only synchronizes with other
+ operations within the same address space.
+
``workgroup-one-as`` Same as ``workgroup`` but only synchronizes with
other operations within the same address space.
@@ -1437,7 +1455,6 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
Returns a pair for the swapped registers. The first element of the return corresponds
to the swapped element of the first argument.
-
llvm.amdgcn.permlane32.swap Provide direct access to `v_permlane32_swap_b32` instruction on supported targets.
Swaps the values across lanes of first 2 operands. Rows 2 and 3 of the first operand are
swapped with rows 0 and 1 of the second operand (one row is 16 lanes).
@@ -1458,6 +1475,25 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
- `v_mov_b32 <dest> <old>`
- `v_mov_b32 <dest> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>`
+ :ref:`llvm.prefetch <int_prefetch>` Implemented on gfx1250, ignored on earlier targets.
+ The first argument is a flat, global, or constant address space pointer.
+ Any other address space is not supported.
+ On gfx125x, generates flat_prefetch_b8 or global_prefetch_b8 and brings data into GL2.
+ The second argument is rw and is currently ignored. It can be 0 or 1.
+ The third argument is locality, 0-3. It translates to a memory scope:
+
+ * 0 - SCOPE_SYS
+ * 1 - SCOPE_DEV
+ * 2 - SCOPE_SE
+ * 3 - SCOPE_SE
+
+ Note that SCOPE_CU is not generated and is not safe on an invalid address.
+ The fourth argument is the cache type:
+
+ * 0 - Instruction cache, currently ignored and no code is generated.
+ * 1 - Data cache.
+
+ Instruction cache prefetches are unsafe on an invalid address.
============================================== ==========================================================
.. TODO::
@@ -5114,9 +5150,7 @@ The fields used by CP for code objects before V3 also match those specified in
and must be 0,
>454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT
_SIZE
- 455 1 bit USES_CU_STORES GFX12.5: Whether the ``cu-stores`` target attribute is enabled.
- If 0, then all stores are ``SCOPE_SE`` or higher.
- 457:456 2 bits Reserved, must be 0.
+ 457:455 3 bits Reserved, must be 0.
458 1 bit ENABLE_WAVEFRONT_SIZE32 GFX6-GFX9
Reserved, must be 0.
GFX10-GFX11
@@ -18254,8 +18288,6 @@ terminated by an ``.end_amdhsa_kernel`` directive.
GFX942)
``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX12 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_uses_cu_stores`` 0 GFX12.5 Controls USES_CU_STORES in
- :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
``.amdhsa_wavefront_size32`` Target GFX10-GFX12 Controls ENABLE_WAVEFRONT_SIZE32 in
Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
Specific
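
Review note: the ``llvm.prefetch`` hunk above describes how the third (locality) and fourth (cache type) arguments select a memory scope. A minimal sketch of that mapping as a standalone C++ helper may make the table easier to check. The function name ``prefetchScope`` and the use of an empty string to mean "no prefetch is generated" are illustrative assumptions. They are not part of LLVM or of this patch.

```cpp
#include <string>

// Hypothetical helper (not an LLVM API): models the llvm.prefetch
// argument table documented in this patch.
//   locality:  third argument of llvm.prefetch, 0-3.
//   cacheType: fourth argument, 0 = instruction cache, 1 = data cache.
// Returns the memory scope name, or "" when no code would be generated
// (the empty string is an assumption used here to signal "ignored").
std::string prefetchScope(unsigned locality, unsigned cacheType) {
  if (cacheType != 1)
    return "";  // instruction cache prefetch: currently ignored
  switch (locality) {
  case 0:
    return "SCOPE_SYS";
  case 1:
    return "SCOPE_DEV";
  case 2:
  case 3:
    return "SCOPE_SE";  // SCOPE_CU is never generated
  default:
    return "";  // outside the documented 0-3 range
  }
}
```

Under these assumptions, ``prefetchScope(3, 1)`` and ``prefetchScope(2, 1)`` both yield ``SCOPE_SE``, which mirrors the documented behavior that the two highest locality values share a scope.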