llvm-project.git/offload/liboffload/src/OffloadImpl.cpp, branch users/mingmingl-llvm/samplefdo-profile-format

[Offload] Add `OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION]` (#155823)

2025-08-29T08:39:18+00:00

This is the total number of work items that the device supports (the
equivalent work group properties are for only a single work group).

[Offload] Improve `olDestroyQueue` logic (#153041)

2025-08-29T08:39:00+00:00

Previously, `olDestroyQueue` would not actually destroy the queue,
instead leaving it for the device to clean up when it was destroyed.
Now, the queue is either released immediately if it is complete or put
into a list of "pending" queues if it is not. Whenever we create a new
queue, we check this list to see whether any pending queues have since
completed. If so, we release their resources and reuse them instead of
pulling from the pool.

This prevents long-running programs that create and drop many queues
without syncing them from leaking memory.
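
The destroy/reuse flow described above can be sketched roughly as follows. This is a minimal illustration, not the liboffload implementation: `SketchQueue`, `SketchDevice`, and all member names are hypothetical, and a plain `bool` stands in for the backend's completion query.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a backend queue; not the liboffload type.
struct SketchQueue {
  bool Complete = true; // Whether all enqueued work has finished.
  int Id = 0;
};

// Hypothetical bookkeeping for queues that were destroyed while busy.
class SketchDevice {
  std::vector<std::unique_ptr<SketchQueue>> Pending;
  int NextId = 0;

public:
  // Release the queue right away if it is idle; otherwise park it.
  // Returns true if the queue was released immediately.
  bool destroyQueue(std::unique_ptr<SketchQueue> Q) {
    assert(Q && "queue must not be null");
    if (Q->Complete)
      return true; // Q is freed when it goes out of scope here.
    Pending.push_back(std::move(Q));
    return false;
  }

  // Prefer recycling a parked queue that has since completed.
  std::unique_ptr<SketchQueue> createQueue() {
    for (auto It = Pending.begin(); It != Pending.end(); ++It) {
      if ((*It)->Complete) {
        std::unique_ptr<SketchQueue> Reused = std::move(*It);
        Pending.erase(It);
        return Reused;
      }
    }
    // Nothing reusable: fall back to a fresh queue (the "pool" here).
    auto Q = std::make_unique<SketchQueue>();
    Q->Id = NextId++;
    return Q;
  }

  std::size_t pendingCount() const { return Pending.size(); }
};
```

Scanning the pending list only on queue creation keeps the destroy path cheap, at the cost of holding a finished queue until the next creation.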

[Offload] Add PRODUCT_NAME device info (#155632)

2025-08-28T14:16:17+00:00

On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.

[Offload] Implement olMemFill (#154102)

2025-08-22T13:31:16+00:00

Implement olMemFill to support filling device memory with
arbitrary-length patterns. AMDGPU support will be added in a follow-up PR.
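
As a rough host-side illustration of what an arbitrary-length pattern fill means, the pattern is repeated back to back across the destination. `sketchFill` is a hypothetical helper, not the `olMemFill` signature, and the requirement that the fill size be a multiple of the pattern size is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side helper: repeat Pattern across Dst.
// Assumption: Size is a non-zero multiple of PatternSize.
inline void sketchFill(void *Dst, const void *Pattern,
                       std::size_t PatternSize, std::size_t Size) {
  assert(PatternSize > 0 && Size % PatternSize == 0);
  auto *Out = static_cast<unsigned char *>(Dst);
  // Copy one whole pattern per iteration.
  for (std::size_t Off = 0; Off < Size; Off += PatternSize)
    std::memcpy(Out + Off, Pattern, PatternSize);
}
```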

[Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194)

2025-08-22T12:40:31+00:00

A simple info query for events that returns whether the event is
complete or not.

[Offload] Fix `OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE` on AMD (#154521)

2025-08-21T08:37:58+00:00

This wasn't handled with the normal info API, so it needs special handling.

[Offload] Guard olMemAlloc/Free with a mutex (#153786)

2025-08-20T12:23:57+00:00

Both of these functions update an `AllocInfoMap` structure in the
context, but they did not take any locks, causing random failures in
threaded code. Now they use a mutex.
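
The locking pattern can be sketched as below. This is an illustration under stated assumptions, not the code in `OffloadImpl.cpp`: `SketchContext`, `allocate`, `release`, and the mutex name are hypothetical, and `std::malloc`/`std::free` stand in for the device allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Hypothetical context holding the allocation map and its mutex.
struct SketchContext {
  std::mutex AllocInfoMapMutex;
  std::unordered_map<void *, std::size_t> AllocInfoMap;

  void *allocate(std::size_t Size) {
    assert(Size > 0);
    void *Ptr = std::malloc(Size); // Stand-in for a device allocation.
    if (!Ptr)
      return nullptr;
    // Hold the lock only while touching the shared map.
    std::lock_guard<std::mutex> Lock(AllocInfoMapMutex);
    AllocInfoMap.emplace(Ptr, Size);
    return Ptr;
  }

  // Returns false if Ptr was not allocated through this context.
  bool release(void *Ptr) {
    {
      std::lock_guard<std::mutex> Lock(AllocInfoMapMutex);
      if (AllocInfoMap.erase(Ptr) == 0)
        return false;
    }
    std::free(Ptr);
    return true;
  }

  std::size_t liveAllocations() {
    std::lock_guard<std::mutex> Lock(AllocInfoMapMutex);
    return AllocInfoMap.size();
  }
};
```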

[Offload] Add olCalculateOptimalOccupancy (#142950)

2025-08-19T14:16:47+00:00

This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is
currently only implemented on CUDA; AMDGPU and Host return unsupported.

---------

Co-authored-by: Callum Fare

[Offload] Define additional device info properties (#152533)

2025-08-19T12:02:01+00:00

Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be
bit-shifted instead of incremented. Generate the per-type enums using
`foreach` to reduce code duplication.
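
For illustration, a bitfield-style enum gives each enumerator its own bit so that values can be combined and tested with bitwise operators. The names below are hypothetical and are not the generated Offload enumerators.

```cpp
#include <cstdint>

// Hypothetical flag names; the generated Offload enum may differ.
enum SketchFpConfig : std::uint32_t {
  SKETCH_FP_DENORM = 1u << 0,
  SKETCH_FP_INF_NAN = 1u << 1,
  SKETCH_FP_ROUND_TO_NEAREST = 1u << 2,
  SKETCH_FP_FMA = 1u << 3,
};

// Test whether a combined configuration value contains a given flag.
constexpr bool hasFlag(std::uint32_t Config, SketchFpConfig Flag) {
  return (Config & Flag) != 0;
}
```

With incremented values (0, 1, 2, 3) two capabilities could not be OR-ed together unambiguously; with shifted values each combination maps to a distinct integer.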

Use macros in unit test definitions to reduce code duplication.

[Offload] `olLaunchHostFunction` (#152482)

2025-08-15T08:39:48+00:00

Add an `olLaunchHostFunction` method that allows enqueueing host work
to the stream.