llvm-project.git

Age	Commit message	Author
2025-11-13	[Offload] Add device info for shared memory (#167817)	Kevin Sala Penades

2025-11-04	[Offload] Add device UID (#164391)	Robert Imschweiler
	Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
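The combination rule described above can be sketched as follows. This is only an illustration: `makeDeviceUid` is a hypothetical helper, and the `<plugin>-<id>` layout is an assumption, not the format the patch actually emits.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: builds a system-wide device UID from the plugin name
// and the vendor-reported UID, falling back to the internal device ID.
// The "<plugin>-<id>" layout is an assumption made for this sketch.
std::string makeDeviceUid(const std::string &PluginName,
                          const std::optional<std::string> &VendorUid,
                          int InternalDeviceId) {
  if (VendorUid && !VendorUid->empty())
    return PluginName + "-" + *VendorUid;
  return PluginName + "-" + std::to_string(InternalDeviceId);
}
```

Prefixing with the plugin name keeps two vendors that happen to report the same raw UID from colliding.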
2025-10-14	Revert "[Offload] Lazily initialize platforms in the Offloading API" (#163272)	Joseph Huber
	Summary: This causes issues with CUDA's teardown order when the init is separated from the total init scope.
2025-10-14	[Offload] Lazily initialize platforms in the Offloading API (#163272)	Joseph Huber
	Summary: The Offloading library wraps around the underlying plugins. The problem is that we currently initialize all plugins we find, even if they are not needed for the program. This is very expensive for trivial uses, as fully heterogeneous usage is quite rare. In practice this means that you will always pay a 200 ms penalty for having CUDA installed. This patch changes the behavior to provide accessors into the plugins and devices that allow them to be initialized lazily. We use a once_flag, which should take a fast-path check while still blocking on concurrent use. Making full use of this will require a way to filter platforms more specifically. I'm thinking of what this would look like as an API: either an extra iterate function that takes a callback on the platform, or a helper to find all the devices that can run a given image. Maybe both? Fixes: https://github.com/llvm/llvm-project/issues/159636
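A minimal sketch of the `once_flag` pattern mentioned above, using only the standard library. `LazyPlatform` and its members are hypothetical stand-ins and do not mirror the real liboffload types.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a plugin whose initialization is expensive.
class LazyPlatform {
public:
  explicit LazyPlatform(std::string Name) : Name(std::move(Name)) {}

  // Initializes on first use. Later calls take the fast path, and concurrent
  // first calls block until the one running initializer has finished.
  const std::string &get() {
    std::call_once(InitFlag, [this] {
      ++InitCount; // the expensive plugin setup would happen here
      Handle = "initialized:" + Name;
    });
    return Handle;
  }

  int initCount() const { return InitCount; }

private:
  std::string Name;
  std::string Handle;
  std::once_flag InitFlag;
  int InitCount = 0;
};
```

A program that never calls `get()` on a given platform never pays for its initialization.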
2025-10-06	[Offload] Fix isValidBinary segfault on host platform	Joseph Huber
	Summary: Need to verify this actually has a device. We really need to rework this to point to a real implementation, or streamline it to handle this automatically.
2025-09-29	[Offload][NFC] use unique ptrs for platforms (#160888)	Piotr Balcer
	Currently, devices store a raw pointer back to their owning Platform. Platforms are stored directly inside of a vector. Modifying this vector risks invalidating all the platform pointers stored in devices. This patch allocates platforms individually, and changes devices to store a reference to their platform instead of a pointer. This is safe, because platforms are guaranteed to outlive the devices they contain.
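The ownership change can be illustrated with a small sketch. `Platform`, `Device` and `PlatformList` here are simplified, hypothetical types rather than the real liboffload structures.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Platform {
  std::string Name;
};

// A device keeps a reference to its owning platform (hypothetical layout).
struct Device {
  Platform &Owner;
};

// Each platform gets its own heap allocation, so growing the vector moves
// only the unique_ptrs and never the Platform objects themselves.
struct PlatformList {
  std::vector<std::unique_ptr<Platform>> Platforms;

  Platform &add(std::string Name) {
    Platforms.push_back(std::make_unique<Platform>(Platform{std::move(Name)}));
    return *Platforms.back();
  }
};
```

With a plain `std::vector<Platform>`, a later `push_back` could reallocate the storage and leave every device holding a dangling pointer.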
2025-09-24	[Offload] Add olGetMemInfo with platform-less API (#159581)	Ross Brunton

2025-09-23	[Offload] Don't add the unsupported host plugin to the list (#159642)	Joseph Huber
	Summary: The host plugin is basically OpenMP specific and doesn't work very well. Previously we were skipping over it in the list instead of just not adding it at all.
2025-09-23	[Offload] Re-allocate overlapping memory (#159567)	Ross Brunton
	If olMemAlloc happens to allocate memory that was already allocated elsewhere (possibly by another device on another platform), it is now thrown away and a new allocation generated. A new `AllocBases` vector is now available, which is an ordered list of allocation start addresses.
2025-09-19	[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658)	Joseph Huber
	Summary: This exposes the 'isDeviceCompatible' routine for checking if a binary can be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to put on which device. I don't know if this is a good name; I was thinking of something like `olIsCompatible`. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does, where we only initialize devices if we need them. That support will be needed if we want this to be more generic.
2025-09-16	[Offload] Copy loaded images into managed storage (#158748)	Joseph Huber
	Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage of this was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes. We have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory. We now don't need to duplicate this behavior externally at the Offload API level. Also we actually free these if the user unloads them. Upside, less likely to crash and burn. Downside, more latency when loading an image.
2025-08-29	[Offload] Add `OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION]` (#155823)	Ross Brunton
	This is the total number of work items that the device supports (the equivalent work group properties are for only a single work group).
2025-08-29	[Offload] Improve `olDestroyQueue` logic (#153041)	Ross Brunton
	Previously, `olDestroyQueue` would not actually destroy the queue, instead leaving it for the device to clean up when it was destroyed. Now, the queue is either released immediately if it is complete or put into a list of "pending" queues if it is not. Whenever we create a new queue, we check this list to see if any are now completed. If there are any we release their resources and use them instead of pulling from the pool. This prevents long running programs that create and drop many queues without syncing them from leaking memory all over the place.
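The destroy-then-recycle policy described above might look roughly like this. `Queue` and `QueueManager` are hypothetical, and completion is modelled as a plain flag rather than a real device query.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device queue.
struct Queue {
  bool Complete = true; // true once all enqueued work has finished
};

class QueueManager {
public:
  Queue *create() {
    // Prefer recycling a destroyed queue whose work has since completed.
    auto It = std::find_if(Pending.begin(), Pending.end(),
                           [](Queue *Q) { return Q->Complete; });
    if (It != Pending.end()) {
      Queue *Q = *It;
      Pending.erase(It);
      return Q;
    }
    // Otherwise pull from the pool of released queues, if any.
    if (!Pool.empty()) {
      Queue *Q = Pool.back();
      Pool.pop_back();
      return Q;
    }
    Storage.push_back(std::make_unique<Queue>());
    return Storage.back().get();
  }

  void destroy(Queue *Q) {
    if (Q->Complete)
      Pool.push_back(Q); // finished: release right away
    else
      Pending.push_back(Q); // still busy: revisit on the next create()
  }

  std::size_t pendingCount() const { return Pending.size(); }
  std::size_t allocatedCount() const { return Storage.size(); }

private:
  std::vector<std::unique_ptr<Queue>> Storage; // owns every queue ever made
  std::vector<Queue *> Pool;                   // released, ready for reuse
  std::vector<Queue *> Pending;                // destroyed but not yet complete
};
```

Because pending queues are revisited on every `create()`, a program that creates and drops many unsynced queues reuses them once they finish instead of allocating without bound.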
2025-08-28	[Offload] Add PRODUCT_NAME device info (#155632)	Ross Brunton
	On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-22	[Offload] Implement olMemFill (#154102)	Callum Fare
	Implement olMemFill to support filling device memory with arbitrary length patterns. AMDGPU support will be added in a follow-up PR.
2025-08-22	[Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194)	Ross Brunton
	A simple info query for events that returns whether the event is complete or not.
2025-08-21	[Offload] Fix `OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE` on AMD (#154521)	Ross Brunton
	This wasn't handled with the normal info API, so needs special handling.
2025-08-20	[Offload] Guard olMemAlloc/Free with a mutex (#153786)	Ross Brunton
	Both these functions update an `AllocInfoMap` structure in the context, however they did not use any locks, causing random failures in threaded code. Now they use a mutex.
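As a rough illustration of the fix, the sketch below guards a shared allocation map with a mutex. `AllocTracker` is a hypothetical class; only the locking pattern is meant to carry over.

```cpp
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical allocation tracker; every access to the map holds the mutex.
class AllocTracker {
public:
  void recordAlloc(void *Ptr, std::size_t Size) {
    std::lock_guard<std::mutex> Lock(Mutex);
    Allocs[Ptr] = Size;
  }

  // Returns false if the pointer was not being tracked.
  bool recordFree(void *Ptr) {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Allocs.erase(Ptr) == 1;
  }

  std::size_t liveCount() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Allocs.size();
  }

private:
  mutable std::mutex Mutex;
  std::map<void *, std::size_t> Allocs;
};
```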
2025-08-19	[Offload] Add olCalculateOptimalOccupancy (#142950)	Ross Brunton
	This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on CUDA; AMDGPU and Host return unsupported. Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19	[Offload] Define additional device info properties (#152533)	Rafal Bielski
	Add the following properties in Offload device info:
	* VENDOR_ID
	* NUM_COMPUTE_UNITS
	* [SINGLE|DOUBLE|HALF]_FP_CONFIG
	* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
	* MAX_CLOCK_FREQUENCY
	* MEMORY_CLOCK_RATE
	* ADDRESS_BITS
	* MAX_MEM_ALLOC_SIZE
	* GLOBAL_MEM_SIZE
	Add a bitfield option to enumerators, allowing the values to be bit-shifted instead of incremented. Generate the per-type enums using `foreach` to reduce code duplication. Use macros in unit test definitions to reduce code duplication.
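To show what bit-shifted enumerators allow, here is a small sketch. The enumerator names are hypothetical and are not the generated `FP_CONFIG` values; only the numbering scheme is the point.

```cpp
#include <cstdint>

// Hypothetical names; each enumerator is shifted rather than incremented.
enum FpCapability : std::uint32_t {
  FP_CAP_DENORM = 1u << 0,
  FP_CAP_INF_NAN = 1u << 1,
  FP_CAP_ROUND_TO_NEAREST = 1u << 2,
  FP_CAP_FMA = 1u << 3,
};

// Because each enumerator owns one bit, capabilities combine with OR and
// are tested with AND.
constexpr bool hasCapability(std::uint32_t Config, FpCapability Cap) {
  return (Config & Cap) != 0;
}
```

With incremented values (0, 1, 2, 3) the same OR would be ambiguous, since `1 | 2` would equal the fourth enumerator.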
2025-08-15	[Offload] `olLaunchHostFunction` (#152482)	Ross Brunton
	Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.
2025-08-13	[Offload] Store globals in the program's global list rather than the kernel list (#153441)	Ross Brunton

2025-08-08	[Offload] Make olLaunchKernel test thread safe (#149497)	Ross Brunton
	This sprinkles a few mutexes around the plugin interface so that the olLaunchKernel CTS test now passes when run on multiple threads. Part of this also involved changing the interface for device synchronise so that it can optionally not free the underlying queue (which introduced a race condition in liboffload).
2025-08-08	[Offload] OL_QUEUE_INFO_EMPTY (#152473)	Ross Brunton
	Add a queue query that (if possible) reports whether the queue is empty
2025-08-07	[Offload] Don't create events for empty queues (#152304)	Ross Brunton
	Add a device function to check if a device queue is empty. If liboffload tries to create an event for an empty queue, we create an "empty" event that is already complete. This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run quickly for empty queues.
2025-08-04	[Offload] Rework `MAX_WORK_GROUP_SIZE` (#151926)	Ross Brunton
	`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work groups the device can allocate, rather than the maximum per dimension. `MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old behaviour.
2025-07-25	[Offload] Refactor device information queries to use new tagging (#147318)	Ross Brunton
	Instead of using strings to look up device information (which is brittle and slow), use the new tags that the plugins specify when building the nodes.
2025-07-24	[Offload] Replace "EventOut" parameters with `olCreateEvent` (#150217)	Ross Brunton
	Rather than having every "enqueue"-type function have an output pointer specifically for an output event, just provide an `olCreateEvent` entrypoint which pushes an event to the queue. For example, replace:
	```cpp
	olMemcpy(Queue, ..., EventOut);
	```
	with
	```cpp
	olMemcpy(Queue, ...);
	olCreateEvent(Queue, EventOut);
	```
2025-07-23	[Offload] Add olWaitEvents (#150036)	Ross Brunton
	This function causes a queue to wait until all the provided events have completed before running any future scheduled work.
2025-07-23	[Offload] Rename olWaitEvent/Queue to olSyncEvent/Queue (#150023)	Ross Brunton
	This more closely matches the nomenclature used by CUDA, AMDGPU and the plugin interface.
2025-07-16	[Offload] Cache symbols in program (#148209)	Ross Brunton
	When creating a new symbol, check whether it already exists. If it does, return that pointer rather than building a new symbol structure.
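A sketch of the caching idea, assuming symbols are keyed by name alone. `Program` and `Symbol` are simplified, hypothetical types, and the real key may include more than the name.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct Symbol {
  std::string Name;
};

class Program {
public:
  // Returns the cached symbol if one exists, otherwise builds and caches it.
  Symbol *getSymbol(const std::string &Name) {
    auto It = Symbols.find(Name);
    if (It != Symbols.end())
      return It->second.get();
    auto Inserted =
        Symbols.emplace(Name, std::make_unique<Symbol>(Symbol{Name}));
    return Inserted.first->second.get();
  }

  std::size_t symbolCount() const { return Symbols.size(); }

private:
  std::map<std::string, std::unique_ptr<Symbol>> Symbols;
};
```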
2025-07-14	[Offload] Check plugins aren't already deinitialized when tearing down (#148642)	Callum Fare
	This is a hotfix for #148615 - it fixes the issue for me locally. I think a broader issue is that in the test environment we're calling olShutDown from a global destructor in the test binaries. We should do something more controlled, either calling olInit/olShutDown in every test, or move those to a GTest global environment. I didn't do that originally because it looked like it needed changes to LLVM's GTest wrapper.
2025-07-11	[Offload] Add global variable address/size queries (#147972)	Ross Brunton
	Add two new symbol info types for getting the bounds of a global variable, as well as a number of tests for reading from and writing to it.
2025-07-11	[Offload] Add `olGetSymbolInfo[Size]` (#147962)	Ross Brunton
	This mirrors the similar functions for other handles. The only implemented info at the moment is the symbol's kind.
2025-07-11	[Offload] Replace `GetKernel` with `GetSymbol` with global support (#148221)	Ross Brunton
	`olGetKernel` has been replaced by `olGetSymbol` which accepts a `Kind` parameter. As well as loading information about kernels, it can now also load information about global variables.
2025-07-10	[Offload] Change `ol_kernel_handle_t` -> `ol_symbol_handle_t` (#147943)	Ross Brunton
	In the future, we want `ol_symbol_handle_t` to represent both kernels and global variables. The first step in this process is a rename and promotion to a "typed handle".
2025-07-09	[Offload] Implement olGetQueueInfo, olGetEventInfo (#142947)	Callum Fare
	Add info queries for queues and events. `olGetQueueInfo` only supports getting the associated device. We were already tracking this, so we can implement it for free. We will likely add other queries to it in the future (whether the queue is empty, what flags it was created with, etc.). `olGetEventInfo` only supports getting the associated queue. This is another thing we were already storing in the handle. We'll be able to add other queries in the future (the event type, status, etc.).
2025-07-02	[Offload] Add `MAX_WORK_GROUP_SIZE` device info query (#143718)	Ross Brunton
	This adds a new device info query for the maximum workgroup/block size for each dimension.
2025-06-30	[Offload] Refactor device/platform info queries (#146345)	Ross Brunton
	This makes several small changes to how the platform and device info queries are handled:
	* ReturnHelper has been replaced with InfoWriter which is more explicit in how it is invoked.
	* InfoWriter consumes `llvm::Expected` rather than values directly, and will early exit if it returns an error.
	* As a result of the above, `GetInfoString` now correctly returns errors rather than empty strings.
	* The host device now has its own dedicated "getInfo" function rather than being checked in multiple places.
2025-06-30	[Offload] Implement `olShutDown` (#144055)	Ross Brunton
	`olShutDown` was not properly calling deinit on the platforms, resulting in random segfaults on AMD devices. As part of this, `olInit` and `olShutDown` now alloc and free the offload context rather than it being static. This allows `olShutDown` to be called within a destructor of a static object (like the tests do) without having to worry about destructor ordering.
2025-06-27	[Offload] Store device info tree in device handle (#145913)	Ross Brunton
	Rather than creating a new device info tree for each call to `olGetDeviceInfo`, we instead do it on device initialisation. As well as improving performance, this fixes a few lifetime issues with returned strings. This does unfortunately mean that device information is immutable, but hopefully that shouldn't be a problem for any queries we want to implement. This also meant allowing offload initialization to fail, which it can now do.
2025-06-25	[Offload] Add an `unloadBinary` interface to PluginInterface (#143873)	Ross Brunton
	This allows removal of a specific Image from a Device, rather than requiring all image data to outlive the device they were created for. This is required for `ol_program_handle_t`s, which now specify the lifetime of the buffer used to create the program.
2025-06-24	[Offload] Properly report errors when jit compiling (#145498)	Ross Brunton
	Previously, if a binary failed to load due to failures when jit compiling, the function would return success with nullptr. Now it returns a new plugin error, `COMPILE_FAILURE`.
2025-06-20	[Offload] Add type information to device info nodes (#144535)	Ross Brunton
	Rather than being "stringly typed", store values as a std::variant that can hold various types. This means that liboffload doesn't have to do any string parsing for integer/bool device info keys.
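A minimal sketch of a typed info node built on `std::variant`. The set of alternatives and the `InfoNode` layout are assumptions for illustration, not the plugin interface's actual definition.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Assumed set of value types; the real node may hold others.
using InfoValue =
    std::variant<std::monostate, bool, std::uint64_t, std::string>;

struct InfoNode {
  std::string Key;
  InfoValue Value;
};

// Reads an integer entry directly; no string parsing is involved.
// Returns Default when the node holds a different type.
std::uint64_t asUInt(const InfoNode &Node, std::uint64_t Default = 0) {
  if (const auto *V = std::get_if<std::uint64_t>(&Node.Value))
    return *V;
  return Default;
}
```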
2025-06-20	[Offload] Check for initialization (#144370)	Ross Brunton
	All entry points (except olInit) now check that offload has been initialized. If not, a new `OL_ERRC_UNINITIALIZED` error is returned.
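The guard can be pictured with a small sketch. `Runtime`, `Status` and `getDeviceCount` are hypothetical; the real entry points are free functions that return an error such as `OL_ERRC_UNINITIALIZED`.

```cpp
enum class Status { Success, Uninitialized };

// Hypothetical runtime object standing in for the library's global state.
class Runtime {
public:
  Status init() {
    Initialized = true;
    return Status::Success;
  }

  Status shutDown() {
    if (!Initialized)
      return Status::Uninitialized;
    Initialized = false;
    return Status::Success;
  }

  // Every entry point other than init() starts with the same check and
  // leaves its output untouched on failure.
  Status getDeviceCount(int &Out) const {
    if (!Initialized)
      return Status::Uninitialized;
    Out = 1; // placeholder value for the sketch
    return Status::Success;
  }

private:
  bool Initialized = false;
};
```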
2025-06-19	[Offload] Move (most) global state to an `OffloadContext` struct (#144494)	Ross Brunton
	Rather than having a number of static local variables, we now use a single `OffloadContext` struct to store global state. This is initialised by `olInit`, but is never deleted (de-initialization of Offload isn't yet implemented). The error reporting mechanism has not been moved to the struct, since that's going to cause issues with teardown (error messages must outlive liboffload).
2025-06-13	[Offload] Replace device info queue with a tree (#144050)	Ross Brunton
	Previously, device info was returned as a queue with each element having a "Level" field indicating its nesting level. This replaces this queue with a more traditional tree-like structure. This should not result in a change to the output of `llvm-offload-device-info`.
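To make the two shapes concrete, the sketch below converts a flat, level-annotated list into a tree. `FlatEntry`, `TreeNode` and `buildTree` are hypothetical names, and the conversion is illustrative rather than taken from the patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Old shape: a flat sequence where Level encodes nesting depth.
struct FlatEntry {
  std::string Key;
  std::size_t Level;
};

// New shape: each node owns its children.
struct TreeNode {
  std::string Key;
  std::vector<TreeNode> Children;
};

// Rebuilds the tree from a pre-order flat list. Assumes levels start at 0
// and never increase by more than one between consecutive entries.
std::vector<TreeNode> buildTree(const std::vector<FlatEntry> &Flat) {
  std::vector<TreeNode> Roots;
  for (const FlatEntry &E : Flat) {
    std::vector<TreeNode> *Siblings = &Roots;
    // Walk down through the most recently added node at each level.
    for (std::size_t Depth = 0; Depth < E.Level; ++Depth)
      Siblings = &Siblings->back().Children;
    Siblings->push_back(TreeNode{E.Key, {}});
  }
  return Roots;
}
```

Printing the tree in pre-order reproduces the original flat sequence, which is consistent with the tool's output staying unchanged.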
2025-06-12	[Offload] Add `ol_dimensions_t` and convert ranges from size_t -> uint32_t (#143901)	Ross Brunton
	This is a three element x, y, z size_t vector that can be used any place where a 3D vector is required. This ensures that all vectors across liboffload are the same and don't require any resizing/reordering dances.
2025-06-06	[Offload] Make olMemcpy src parameter const (#143161)	Callum Fare

2025-06-03	[Offload] Don't check in generated files (#141982)	Callum Fare
	Previously we decided to check in files that we generate with tablegen. The justification at the time was that it helped reviewers unfamiliar with `offload-tblgen` see the actual changes to the headers in PRs. After trying it for a while, it's ended up causing some headaches and is also not how tablegen is used elsewhere in LLVM. This changes our use of tablegen to be more conventional. Where possible, files are still clang-formatted, but this is no longer a hard requirement. Because `OffloadErrcodes.inc` is shared with libomptarget it now gets generated in a more appropriate place.