llvm-project.git/offload/plugins-nextgen/amdgpu/src/rtl.cpp, branch users/mingmingl-llvm/samplefdo-profile-format

[Offload] Add `OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION]` (#155823)

2025-08-29T08:39:18+00:00

This is the total number of work items that the device supports (the
equivalent work group properties are for only a single work group).

[Offload] Add PRODUCT_NAME device info (#155632)

2025-08-28T14:16:17+00:00

On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.

[Offload] Full AMD support for olMemFill (#154958)

2025-08-26T10:49:12+00:00

[Offload] Implement olMemFill (#154102)

2025-08-22T13:31:16+00:00

Implement olMemFill to support filling device memory with arbitrary
length patterns. AMDGPU support will be added in a follow-up PR.
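
As an illustration of what a pattern fill means, here is a minimal host-side sketch. `fillWithPattern` is a hypothetical helper, not the Offload API or the plugin implementation, and the requirement that the fill size be a multiple of the pattern size is an assumption made for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical host-side helper (not part of Offload): repeats a pattern of
// PatternSize bytes across a destination buffer of Size bytes.
// Assumption for this sketch: Size must be a non-zero multiple of PatternSize;
// otherwise nothing is written and false is returned.
bool fillWithPattern(void *Dst, const void *Pattern, std::size_t PatternSize,
                     std::size_t Size) {
  if (PatternSize == 0 || Size == 0 || Size % PatternSize != 0)
    return false;
  auto *Out = static_cast<unsigned char *>(Dst);
  // Copy the pattern into each consecutive PatternSize-byte slot.
  for (std::size_t Off = 0; Off < Size; Off += PatternSize)
    std::memcpy(Out + Off, Pattern, PatternSize);
  return true;
}
```

Filling a 6-byte buffer with the 3-byte pattern `{1, 2, 3}` would leave the buffer holding `1, 2, 3, 1, 2, 3`.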

[Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194)

2025-08-22T12:40:31+00:00

A simple info query for events that returns whether the event is
complete or not.

[Offload] Add olCalculateOptimalOccupancy (#142950)

2025-08-19T14:16:47+00:00

This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is
currently only implemented on CUDA; AMDGPU and Host return unsupported.

---------

Co-authored-by: Callum Fare

[Offload] Define additional device info properties (#152533)

2025-08-19T12:02:01+00:00

Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be
bit-shifted instead of incremented. Generate the per-type enums using
`foreach` to reduce code duplication.
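
The difference between incremented and bit-shifted enumerator values can be sketched as follows. The enum and enumerator names are hypothetical and chosen only for illustration; they are not the enums generated by Offload.

```cpp
#include <cstdint>

// Hypothetical enums for illustration only (not generated Offload code).

// Incremented values: each enumerator is one more than the previous one,
// so values cannot be combined into a mask.
enum ExampleSequential : uint32_t { SEQ_A = 0, SEQ_B = 1, SEQ_C = 2, SEQ_D = 3 };

// Bit-shifted values: each enumerator occupies its own bit,
// so several can be OR-ed together into one mask.
enum ExampleBitfield : uint32_t {
  BIT_A = 1u << 0,
  BIT_B = 1u << 1,
  BIT_C = 1u << 2,
  BIT_D = 1u << 3
};

// Returns true if Flag is set in Mask.
constexpr bool hasFlag(uint32_t Mask, ExampleBitfield Flag) {
  return (Mask & Flag) != 0;
}
```

A property such as a floating-point config could then report several capabilities in one value, for example `BIT_A | BIT_C`, and a caller would test each with `hasFlag`.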

Use macros in unit test definitions to reduce code duplication.

[Offload] Introduce dataFence plugin interface. (#153793)

2025-08-15T18:49:35+00:00

The purpose of this fence is to ensure that any `dataSubmit`s inserted
into a queue before a `dataFence` finish before any `dataSubmit`s
inserted after it begin.

This is a no-op for most queues, since they are in-order, and by design
any operations inserted into them occur in order.

But the interface is supposed to be functional for out-of-order queues.

The addition of the interface means that any operations that rely on
such ordering (like ATTACH map-type support in #149036) can invoke it,
without worrying about whether the underlying queue is in-order or
out-of-order.

Once a plugin supports out-of-order queues, the plugin can implement
this function, without requiring any change at the libomptarget level.
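
The ordering guarantee can be sketched with a toy model. Everything below (`OpKind`, `assignEpochs`, `isLegalOrder`) is hypothetical and is not the plugin interface; it only models the rule that submits before a fence must run before submits after it, while submits between two fences may be reordered freely.

```cpp
#include <cstddef>
#include <vector>

// Toy model for illustration only (not the plugin interface).
enum class OpKind { Submit, Fence };

// Gives each Submit an epoch number. Every Fence starts a new epoch,
// so submits on opposite sides of a fence get different epochs.
std::vector<unsigned> assignEpochs(const std::vector<OpKind> &Queue) {
  std::vector<unsigned> Epochs;
  unsigned Current = 0;
  for (OpKind Op : Queue) {
    if (Op == OpKind::Fence)
      ++Current;
    else
      Epochs.push_back(Current);
  }
  return Epochs;
}

// Checks a sequential execution order of the submits (indices into Epochs).
// Assumes Order is a permutation of the submit indices. The order is legal
// if epochs never decrease, i.e. no submit crosses a fence.
bool isLegalOrder(const std::vector<unsigned> &Epochs,
                  const std::vector<std::size_t> &Order) {
  if (Order.size() != Epochs.size())
    return false;
  unsigned Last = 0;
  for (std::size_t Idx : Order) {
    if (Idx >= Epochs.size() || Epochs[Idx] < Last)
      return false;
    Last = Epochs[Idx];
  }
  return true;
}
```

For a queue of submit, submit, fence, submit, the first two submits may swap places, but the third cannot run ahead of either of them. On an in-order queue every execution order is already legal, which is why the fence can be a no-op there.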

---------

Co-authored-by: Alex Duran

[Offload] `olLaunchHostFunction` (#152482)

2025-08-15T08:39:48+00:00

Add an `olLaunchHostFunction` method that allows enqueueing host work
to the stream.

[Offload] Make olLaunchKernel test thread safe (#149497)

2025-08-08T09:57:04+00:00

This sprinkles a few mutexes around the plugin interface so that the
olLaunchKernel CTS test now passes when run on multiple threads.

Part of this also involved changing the interface for device synchronise
so that it can optionally not free the underlying queue (freeing it
introduced a race condition in liboffload).