path: root/offload/plugins-nextgen/common/include
Age  Commit message  Author
2025-11-13  [Offload] Add device info for shared memory (#167817)  Kevin Sala Penades
2025-11-06  [OpenMP] Fix tests relying on the heap size variable  Joseph Huber
Summary: I made that an unimplemented error, but forgot that it was used for this environment variable.
2025-11-06  [Offload] Remove handling for device memory pool (#163629)  Joseph Huber
Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so just use that. Simplifies code, can be added back if we start providing alternate forms but I don't think there's a single use-case that would justify it yet.
2025-11-04  [Offload] Add device UID (#164391)  Robert Imschweiler
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
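The combination rule described in this message can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name `makeDeviceUID` and the `<plugin>-<uid>` layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not the real plugin API): build a system-wide unique
// device UID by prefixing the vendor-provided UID with the plugin name.
// The "<plugin>-<uid>" layout is an assumption made for this sketch.
std::string makeDeviceUID(const std::string &PluginName,
                          const std::optional<std::string> &VendorUID,
                          int32_t InternalDeviceId) {
  // Fall back to the offload-internal device ID when the vendor library does
  // not report a UID for this device.
  const std::string Suffix =
      VendorUID ? *VendorUID : std::to_string(InternalDeviceId);
  return PluginName + "-" + Suffix;
}

// Two plugins that report the same vendor UID still end up with distinct UIDs.
inline void exampleDistinctAcrossPlugins() {
  const std::optional<std::string> Same = std::string("0001");
  assert(makeDeviceUID("amdgpu", Same, 0) != makeDeviceUID("cuda", Same, 0));
}
```

Prefixing with the plugin name is what keeps two vendors' identifiers from colliding, even when both libraries happen to report the same string.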
2025-10-22  [OpenMP] Adds omp_target_is_accessible routine (#138294)  Nicole Aschenbrenner
Adds omp_target_is_accessible routine. Refactors common code from omp_target_is_present to work for both routines. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-10-09  [OFFLOAD] Remove unused init_device_info plugin interface (#162650)  Alex Duran
This was used for the old interop code. It's dead code after #143491
2025-09-26  [Offload] Use Error for allocating/deallocating in plugins (#160811)  Kevin Sala Penades
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-23  [Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372)  Alexey Sachkov
2025-09-20  [Offload] Remove non-blocking allocation type (#159851)  Joseph Huber
Summary: This was originally added as a hack to work around CUDA's limitation on allocation. The `libc` implementation now isn't even used for CUDA so this code is never hit. Even in this case, this code never truly worked. A true solution would be to use CUDA's virtual memory API instead to allocate 2MiB slabs independently from the normal memory management done in the stream.
2025-09-19  [Offload] Implement 'olIsValidBinary' in offload and clean up (#159658)  Joseph Huber
Summary: This exposes the 'isDeviceCompatible' routine for checking if a binary *can* be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to put to what device. I don't know if this is a good name, I was thinking of something like `olIsCompatible`. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does where we can conditionally only initialize devices if we need them. That support is going to be needed if we want this to be more generic.
2025-09-16  [Offload] Copy loaded images into managed storage (#158748)  Joseph Huber
Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage of this was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes. We have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory. We now don't need to duplicate this behavior externally at the Offload API level. Also we actually free these if the user unloads them. Upside, less likely to crash and burn. Downside, more latency when loading an image.
2025-09-08  [OpenMP] Move `__omp_rtl_data_environment` handling to OpenMP (#157182)  Joseph Huber
Summary: This operation is done every time we load a binary. This behavior should be moved into OpenMP since it concerns an OpenMP-specific data struct. This is a little messy, because ideally we should only be using public APIs, but more can be extracted later.
2025-09-01  [OpenMP][Offload] Mark `SPMD_NO_LOOP` as a valid exec mode (#155990)  Ross Brunton
This was added in #154105, but was not added to the plugin interface's list of valid modes.
2025-08-28  [OpenMP][Offload] Add SPMD-No-Loop mode to OpenMP offload runtime (#154105)  Dominik Adamski
Kernels which are marked as SPMD-No-Loop should be launched with a sufficient number of teams and threads to cover the loop iteration space. No-Loop mode is described in the RFC: https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517/
2025-08-22  [Offload] Implement olMemFill (#154102)  Callum Fare
Implement olMemFill to support filling device memory with arbitrary length patterns. AMDGPU support will be added in a follow-up PR.
2025-08-22  [Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194)  Ross Brunton
A simple info query for events that returns whether the event is complete or not.
2025-08-19  [Offload] Add olCalculateOptimalOccupancy (#142950)  Ross Brunton
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on CUDA; AMDGPU and Host return unsupported. --------- Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-15  [Offload] Introduce dataFence plugin interface. (#153793)  Abhinav Gaba
The purpose of this fence is to ensure that any `dataSubmit`s inserted into a queue before a `dataFence` finish before any `dataSubmit`s inserted after it begin. This is a no-op for most queues, since they are in-order, and by design any operations inserted into them occur in order. But the interface is supposed to be functional for out-of-order queues. The addition of the interface means that any operations that rely on such ordering (like ATTACH map-type support in #149036) can invoke it, without worrying about whether the underlying queue is in-order or out-of-order. Once a plugin supports out-of-order queues, the plugin can implement this function, without requiring any change at the libomptarget level. --------- Co-authored-by: Alex Duran <alejandro.duran@intel.com>
2025-08-15  [Offload] `olLaunchHostFunction` (#152482)  Ross Brunton
Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.
2025-08-08  [Offload] Make olLaunchKernel test thread safe (#149497)  Ross Brunton
This sprinkles a few mutexes around the plugin interface so that the olLaunchKernel CTS test now passes when run on multiple threads. Part of this also involved changing the interface for device synchronise so that it can optionally not free the underlying queue (which introduced a race condition in liboffload).
2025-08-07  [Offload] Don't create events for empty queues (#152304)  Ross Brunton
Add a device function to check if a device queue is empty. If liboffload tries to create an event for an empty queue, we create an "empty" event that is already complete. This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run quickly for empty queues.
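A minimal sketch of the idea, assuming a toy queue that runs its work on the calling thread. The `Queue`, `Event`, and `createEvent` names are hypothetical stand-ins, not the liboffload or plugin types.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical stand-in for a device queue: pending work is a list of
// callables that run when someone synchronizes on the queue.
struct Queue {
  std::deque<std::function<void()>> Pending;
  bool empty() const { return Pending.empty(); }
};

// Hypothetical event. A null queue pointer marks an "empty" event that was
// created on an empty queue and is therefore already complete.
struct Event {
  Queue *Q = nullptr;

  // In this toy model an event is complete once its queue has been drained.
  bool isComplete() const { return Q == nullptr || Q->empty(); }

  // Drain the queue; returns immediately for an "empty" event.
  void sync() {
    if (!Q)
      return;
    while (!Q->Pending.empty()) {
      Q->Pending.front()();
      Q->Pending.pop_front();
    }
  }
};

// Only attach the event to the queue when there is outstanding work.
Event createEvent(Queue &Q) {
  if (Q.empty())
    return Event{nullptr};
  return Event{&Q};
}
```

The fast path is the null check: creating, syncing, and waiting on an "empty" event never touches the queue at all.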
2025-08-06  [AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882)  hidekisaito
Enables AMD data center class GPUs to use memory manager memory pooling up to 3GB allocation by default, up from the "1 << 13" threshold that all plugin-nextgen devices use.
2025-08-06  [OFFLOAD][OPENMP] 6.0 compatible interop interface (#143491)  Alex Duran
The following patch introduces a new interop interface implementation with the following characteristics:
* It supports the new 6.0 prefer_type specification.
* It supports both explicit objects (from interop constructs) and implicit objects (from variant calls).
* It implements a per-thread reuse mechanism for implicit objects to reduce overheads.
* It provides a plugin interface that allows selecting the supported interop types, and managing all the backend related interop operations (init, sync, ...).
* It enables cooperation with the OpenMP runtime to allow progress on OpenMP synchronizations.
* It cleans up some vendor/fr_id mismatches from the current query routines.
* It supports extension to define interop callbacks for library cleanup.
2025-07-25  [Offload] Erase entries from JIT cache when program is destroyed (#148847)  Ross Brunton
When `unloadBinary` is called, any entries in the JITEngine's cache for that binary will be cleared. This fixes a nasty issue with liboffload program handles. If two handles happen to have had the same address (after one was free'd, for example), the cache would be hit and return the wrong program.
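The stale-hit problem can be reproduced with a toy cache keyed by address. This is a sketch under assumptions: `JITCache`, `getOrCompile`, and `erase` are hypothetical names, and the string "compilation" stands in for the real JIT work.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical cache keyed by the address of the input binary, which is the
// kind of key that goes stale when an address is reused after a free.
class JITCache {
  std::map<const void *, std::string> Entries;

public:
  // Return the cached result for Key, "compiling" Source on a miss.
  const std::string &getOrCompile(const void *Key, const std::string &Source) {
    auto It = Entries.find(Key);
    if (It == Entries.end())
      It = Entries.emplace(Key, "compiled(" + Source + ")").first;
    return It->second;
  }

  // Called when the binary is unloaded, so a later binary that happens to
  // live at the same address cannot hit the old entry.
  void erase(const void *Key) { Entries.erase(Key); }
};

// Demonstrates both the bug (stale hit) and the fix (erase on unload).
inline void exampleStaleEntry() {
  JITCache Cache;
  int Slot = 0; // Stands in for a reused allocation address.
  assert(Cache.getOrCompile(&Slot, "A") == "compiled(A)");
  // Without erasing, a new program at the same address gets the old result.
  assert(Cache.getOrCompile(&Slot, "B") == "compiled(A)");
  Cache.erase(&Slot);
  assert(Cache.getOrCompile(&Slot, "B") == "compiled(B)");
}
```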
2025-07-18  [Offload] Allow "tagging" device info entries with offload keys (#147317)  Ross Brunton
When generating the device info tree, nodes can be marked with an offload Device Info value. The nodes can also look up children based on this value.
2025-07-10  [Offload] Allow querying the size of globals (#147698)  Ross Brunton
The `GlobalTy` helper has been extended to make both the Size and Ptr be optional. Now `getGlobalMetadataFromDevice`/`Image` is able to write the size of the global to the struct, instead of just verifying it.
2025-07-08  [Offload] Provide proper memory management for Images on host device (#146066)  Ross Brunton
The `unloadBinaryImpl` method on the host plugin is now implemented properly (rather than just being a stub). When an image is unloaded, it is deallocated and the library associated with it is closed.
2025-07-02  [Offload] Store kernel name in GenericKernelTy (#142799)  Ross Brunton
GenericKernelTy has a pointer to the name that was used to create it. However, the name passed in as an argument may not outlive the kernel. Instead, GenericKernelTy now contains a std::string, and copies the name into there.
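The lifetime issue can be illustrated with a small stand-in type. `KernelWithOwnedName` and `makeKernel` are hypothetical names for this sketch and do not mirror the real `GenericKernelTy` interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical kernel type that owns a copy of its name, so the name stays
// valid regardless of what happens to the buffer passed to the constructor.
struct KernelWithOwnedName {
  std::string Name;

  explicit KernelWithOwnedName(const char *N) : Name(N) {}

  const char *getName() const { return Name.c_str(); }
};

// The temporary string dies when this function returns. A kernel that only
// stored the raw pointer would be left dangling; the copy is unaffected.
KernelWithOwnedName makeKernel() {
  std::string Temp = "my_kernel";
  return KernelWithOwnedName(Temp.c_str());
}

inline void exampleNameOutlivesArgument() {
  KernelWithOwnedName K = makeKernel();
  assert(std::string(K.getName()) == "my_kernel");
}
```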
2025-06-25  [Offload] Add an `unloadBinary` interface to PluginInterface (#143873)  Ross Brunton
This allows removal of a specific Image from a Device, rather than requiring all image data to outlive the device they were created for. This is required for `ol_program_handle_t`s, which now specify the lifetime of the buffer used to create the program.
2025-06-20  [Offload] Add type information to device info nodes (#144535)  Ross Brunton
Rather than being "stringly typed", store values as a std::variant that can hold various types. This means that liboffload doesn't have to do any string parsing for integer/bool device info keys.
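A minimal sketch of a typed info node, assuming a simplified node layout. `InfoNode` and `toString` are hypothetical names, and the set of alternatives in the variant is an assumption rather than the list used by the real `InfoTreeNode`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical info node: the value keeps its original type instead of being
// flattened to a string, so consumers can read integers and bools directly.
struct InfoNode {
  std::string Key;
  std::variant<std::monostate, uint64_t, bool, std::string> Value;
  std::vector<InfoNode> Children;
};

// Render a value for printing. Only display code needs this conversion;
// typed consumers use std::get or std::get_if and never parse strings.
std::string toString(const InfoNode &N) {
  if (const auto *U = std::get_if<uint64_t>(&N.Value))
    return std::to_string(*U);
  if (const auto *B = std::get_if<bool>(&N.Value))
    return *B ? "Yes" : "No";
  if (const auto *S = std::get_if<std::string>(&N.Value))
    return *S;
  return ""; // std::monostate: a grouping node without a value.
}

inline void exampleTypedAccess() {
  InfoNode Threads{"Max Threads", uint64_t{1024}, {}};
  assert(std::get<uint64_t>(Threads.Value) == 1024);
  assert(toString(Threads) == "1024");
}
```

String values are best wrapped in `std::string(...)` when constructing a node: depending on the language standard in use, a bare string literal can select the `bool` alternative of such a variant.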
2025-06-13  [Offload] Replace device info queue with a tree (#144050)  Ross Brunton
Previously, device info was returned as a queue with each element having a "Level" field indicating its nesting level. This replaces this queue with a more traditional tree-like structure. This should not result in a change to the output of `llvm-offload-device-info`.
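The relationship between the two representations can be sketched by converting a flat, level-annotated list into a tree. `FlatEntry`, `TreeNode`, and `buildTree` are hypothetical names for this illustration and are not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Old shape (hypothetical): a flat list where each entry records its depth.
struct FlatEntry {
  std::string Key;
  unsigned Level;
};

// New shape (hypothetical): nesting is expressed by ownership of children.
struct TreeNode {
  std::string Key;
  std::vector<TreeNode> Children;
};

// Convert the flat list into a tree. Stack[i] points at the most recent node
// at depth i, with the synthetic root at depth 0 of the stack.
TreeNode buildTree(const std::vector<FlatEntry> &Entries) {
  TreeNode Root{"root", {}};
  std::vector<TreeNode *> Stack{&Root};
  for (const FlatEntry &E : Entries) {
    // Pop back to the parent of this entry. A level that skips ahead is
    // attached to the deepest node seen so far.
    Stack.resize(std::min<std::size_t>(E.Level + 1, Stack.size()));
    TreeNode &Parent = *Stack.back();
    Parent.Children.push_back(TreeNode{E.Key, {}});
    // Safe to keep this pointer: only nodes already popped from the stack
    // can be moved by later push_back calls on an ancestor.
    Stack.push_back(&Parent.Children.back());
  }
  return Root;
}
```

Printing such a tree with one indentation step per depth reproduces the layout that the level field used to encode, which is consistent with the message's note that the tool output should not change.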
2025-06-10  [PGO][Offload] Fix offload coverage mapping (#143490)  Ethan Luis McDonough
This pull request fixes coverage mapping on GPU targets.
- It adds an address space cast to the coverage mapping generation pass.
- It reads the profiled function names from the ELF directly. Reading it from public globals was causing issues in cases where multiple device-code object files are linked together.
2025-06-03  [Offload] Don't check in generated files (#141982)  Callum Fare
Previously we decided to check in files that we generate with tablegen. The justification at the time was that it helped reviewers unfamiliar with `offload-tblgen` see the actual changes to the headers in PRs. After trying it for a while, it's ended up causing some headaches and is also not how tablegen is used elsewhere in LLVM. This changes our use of tablegen to be more conventional. Where possible, files are still clang-formatted, but this is no longer a hard requirement. Because `OffloadErrcodes.inc` is shared with libomptarget it now gets generated in a more appropriate place.
2025-05-20  [Offload] Use new error code handling mechanism and lower-case messages (#139275)  Ross Brunton
[Offload] Use new error code handling mechanism
This removes the old ErrorCode-less error method and requires every user to provide a concrete error code. All calls have been updated. In addition, for consistency with error messages elsewhere in LLVM, all messages have been made to start lower case.
2025-05-19  [Offload] Add Error Codes to PluginInterface (#138258)  Ross Brunton
A new ErrorCode enumeration is present in PluginInterface which can be used when returning an llvm::Error from offload and PluginInterface functions. This enum must be kept in sync with liboffload's ol_errc_t enum, so both are automatically generated from liboffload's enum definition. Some error codes have also been shuffled around to allow for future work. Note that this patch only adds the machinery; actual error codes will be added in a future patch. ~~Depends on #137339, please ignore first commit of this MR.~~ This has been merged.
2025-05-13  [Offload] Remove unused field IsBareKernel. (#139815)  Dhruva Chakrabarti
2025-04-23  [Offload] Fix handling of 'bare' mode when environment missing (#136794)  Joseph Huber
Summary: We treated the missing kernel environment as a unique mode, but it was kind of this random bool that was doing the same thing and it explicitly expects the kernel environment to be zero. It broke after the previous change since it used to default to SPMD and didn't handle zero in any of the other cases despite being used. This fixes that and queries for it without needing to consume an error.
2025-03-19  [PGO][Offload] Allow PGO flags to be used on GPU targets (#94268)  Ethan Luis McDonough
This pull request is the third part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes:
- Allows PGO flags to be supplied to GPU targets
- Pulls version global from device
- Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden
2025-02-11  [PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365)  Ethan Luis McDonough
This pull request is the second part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes:
- Introduces `__llvm_write_custom_profile` to PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files.
- Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so that it can write the collected data to a profraw file.
- Adds `PGODump` debug flag and only displays dump when the aforementioned flag is set
2025-02-11  [Offload] Properly guard modifications to the RPC device array (#126790)  Joseph Huber
Summary: If the user deallocates an RPC device this can sometimes fail if the RPC server is still running. This will happen if the modification happens while the server is still checking it. This patch adds a mutex to guard modifications to it.
2025-02-06  [Offload] Make only a single thread handle the RPC server thread (#126067)  Joseph Huber
Summary: This patch just changes the interface to make starting the thread multiple times permissible since it will only be done the first time. Note that this does not refcount it or anything, so it's on the user to make sure that they don't shut down the thread before everyone is done using it. That is the case today because the shutDown portion is run by a single thread in the destructor phase. Another question is if we should make this thread truly global state, because currently it will be private to each plugin instance, so if you have an AMD and NVIDIA image there will be two, similarly if you have those inside of a shared library.
2025-02-02  [offload] `gnu::format` with variadic template functions is Clang-only (#124406)  Michał Górny
Use `gnu::format` attribute only when compiling with Clang, as using it against variadic template functions is a Clang extension and is not supported by GCC. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77958 Fixes #119069
2025-01-31  [Offload][NFC] Fix typos discovered by codespell (#125119)  Christian Clauss
https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`
2025-01-27  [Offload][NFC] Make sure the thread is not running already  Joseph Huber
2025-01-24  [Offload] Move RPC server handling to a dedicated thread (#112988)  Joseph Huber
Summary: Handling the RPC server requires running through a list of jobs that the device has requested to be done. Currently this is handled by the thread that does the waiting for the kernel to finish. However, this is not sound on NVIDIA architectures and only works for async launches in the OpenMP model that uses helper threads. However, we also don't want to have this thread doing work unnecessarily. For this reason we track the execution of kernels and cause the thread to sleep via a condition variable (usually backed by some kind of futex or other intelligent sleeping mechanism) so that the thread will be idle while no kernels are running.
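The sleeping scheme described above can be sketched with the standard threading primitives. This is an illustration under assumptions: `ServerThread`, `kernelStarted`, `kernelFinished`, and the poll counter are hypothetical, and the counter increment stands in for actually servicing RPC requests.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical server thread that only polls while kernels are in flight and
// sleeps on a condition variable otherwise.
class ServerThread {
  std::mutex Mutex;
  std::condition_variable CV;
  unsigned RunningKernels = 0;
  bool Stop = false;
  std::atomic<unsigned> Polls{0};
  std::thread Worker; // Declared last so the state above is ready first.

  void run() {
    std::unique_lock<std::mutex> Lock(Mutex);
    while (true) {
      // Sleep until there is a kernel to service or we are shutting down.
      CV.wait(Lock, [this] { return Stop || RunningKernels > 0; });
      if (Stop)
        return;
      Lock.unlock();
      ++Polls; // Stand-in for handling the device's pending RPC requests.
      std::this_thread::yield();
      Lock.lock();
    }
  }

public:
  ServerThread() : Worker([this] { run(); }) {}

  ~ServerThread() {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      Stop = true;
    }
    CV.notify_all();
    Worker.join();
  }

  // Called on kernel launch: wakes the worker if it was asleep.
  void kernelStarted() {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      ++RunningKernels;
    }
    CV.notify_all();
  }

  // Called on kernel completion: the worker goes idle once this hits zero.
  void kernelFinished() {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (RunningKernels > 0)
      --RunningKernels;
  }

  unsigned polls() const { return Polls.load(); }
};
```

Because the wait predicate is checked under the mutex, a launch that races with the worker going to sleep cannot be missed: either the worker sees the nonzero count before sleeping, or the notification wakes it.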
2025-01-23  [Offload] Make MemoryManager threshold ENV var size_t type. (#124063)  hidekisaito
2024-12-06  [Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042)  Shilei Tian
2024-12-05  Reland #118503: [Offload] Introduce offload-tblgen and initial new API implementation (#118614)  Callum Fare
Reland #118503. Added a fix for builds with `-DBUILD_SHARED_LIBS=ON` (see last commit). Otherwise the changes are identical.

---

### New API

Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests
```

### Open questions and future work

* Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.
2024-12-03  Revert "Reland of #108413: [Offload] Introduce offload-tblgen and initial new API implementation" (#118541)  Jan Patrick Lehr
Reverts llvm/llvm-project#118503. Broke bot https://lab.llvm.org/staging/#/builders/131/builds/9701/steps/5/logs/stdio
2024-12-03  Reland of #108413: [Offload] Introduce offload-tblgen and initial new API implementation (#118503)  Callum Fare
This is another attempt to reland the changes from #108413. The previous two attempts introduced regressions and were reverted. This PR has been more thoroughly tested with various configurations so shouldn't cause any problems this time. If anyone is aware of any likely remaining problems then please let me know. The changes are identical other than the fixes contained in the last 5 commits.

---

### New API

Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests
```

### Open questions and future work

* Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.