path: root/offload
Age | Commit message | Author
2025-11-20[OFFLOAD] Add support for more fine grained debug messages control (#165416)Alex Duran
This PR introduces new debug macros that allow finer control over which debug messages to output and introduces a C++ stream style for debug messages. Changing existing messages (except a few that I changed for testing) will come in subsequent PRs. I also think that we should make debug enabling OpenMP agnostic but, for now, I prioritized maintaining the current libomptarget behavior, and we might need more changes further down the line as we decouple libomptarget.
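As a rough illustration of stream-style debug output with per-category filtering, the idea might be sketched as below. This is not the macro set added by the PR: `SKETCH_DEBUG`, `enabledDebugTypes` and `isDebugTypeEnabled` are hypothetical names invented for this example, and the category strings are arbitrary.

```cpp
#include <cassert>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical filter (an assumption for this sketch): the set of debug
// categories whose messages should be printed.
inline std::set<std::string> &enabledDebugTypes() {
  static std::set<std::string> Types;
  return Types;
}

inline bool isDebugTypeEnabled(const std::string &Type) {
  assert(!Type.empty() && "a debug category needs a name");
  return enabledDebugTypes().count(Type) != 0;
}

// Hypothetical macro: the stream expression is evaluated only when its
// category is enabled, so disabled messages cost one set lookup.
#define SKETCH_DEBUG(OS, TYPE, STREAM_EXPR)                                    \
  do {                                                                         \
    if (isDebugTypeEnabled(TYPE))                                              \
      (OS) << "[" << (TYPE) << "] " << STREAM_EXPR << "\n";                    \
  } while (false)
```

A call such as `SKETCH_DEBUG(OS, "rtl", "mapped " << 4 << " bytes");` writes to `OS` only after `"rtl"` has been inserted into `enabledDebugTypes()`.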
2025-11-19[Offload] Make the RPC thread sleep briefly when idle (#168596)Joseph Huber
Summary: We start this thread if the RPC client symbol is detected in the loaded binary. We should make this sleep if there's no work to avoid the thread running at high priority when the (scarcely used) RPC call is actually required. So, right now after 25 microseconds we will assume the server is inactive and begin sleeping. This resets once we do find work. AMD supports a more intelligent way to do this. HSA signals can wake a sleeping thread from the kernel, and signals can be sent from the GPU side. This would be nice to have and I'm planning on working with it in the future to make this infrastructure more usable with existing AMD workloads.
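The idle policy described above (poll eagerly, start sleeping after a short period without work, reset when work shows up) could be sketched as follows. This is a minimal sketch under assumptions, not the libomptarget implementation: the `IdleBackoff` class, `pollOnce`, and the 250 microsecond sleep are all made up for illustration; only the 25 microsecond threshold comes from the commit message.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: tracks how long a polling server has gone without
// work and reports when it should start sleeping.
class IdleBackoff {
public:
  using Clock = std::chrono::steady_clock;

  explicit IdleBackoff(std::chrono::microseconds Threshold,
                       Clock::time_point Now = Clock::now())
      : Threshold(Threshold), LastWork(Now) {
    assert(Threshold.count() >= 0 && "threshold must not be negative");
  }

  // Call whenever the server handled a request; resets the idle timer.
  void noteWork(Clock::time_point Now = Clock::now()) { LastWork = Now; }

  // True once no work has been seen for at least the threshold.
  bool shouldSleep(Clock::time_point Now = Clock::now()) const {
    return Now - LastWork >= Threshold;
  }

private:
  std::chrono::microseconds Threshold;
  Clock::time_point LastWork;
};

// One iteration of a polling loop. `HadWork` is the result of polling the
// (hypothetical) server once; the sleep duration is an arbitrary choice.
inline void pollOnce(IdleBackoff &Backoff, bool HadWork) {
  if (HadWork)
    Backoff.noteWork();
  else if (Backoff.shouldSleep())
    std::this_thread::sleep_for(std::chrono::microseconds(250));
}
```

Passing the time point explicitly keeps the policy testable without real waiting; a signal-based wakeup, as mentioned for HSA, would replace the fixed sleep.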
2025-11-19[Runtimes] Default build must use its own output dirs (#168266)Michael Kruse
Post-commit fix of #164794 reported at https://github.com/llvm/llvm-project/pull/164794#issuecomment-3536253493
`LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` are used by `AddLLVM.cmake` as output directories. Unless we are in a bootstrapping build, they must not point to directories found by `find_package(LLVM)`, which may be read-only. MLIR, for instance, sets these variables to its own build output directory, and so should the runtimes.
2025-11-18Revert "[OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid()" (#168547)Robert Imschweiler
Reverts llvm/llvm-project#164392 due to Fortran issues
2025-11-18[OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid() (#164392)Robert Imschweiler
Use the implementation in libomptarget. If libomptarget is not available, always return the UID / device number of the host / the initial device.
2025-11-14[OpenMP][Flang] Emit default declare mappers implicitly for derived types (#140562)Akash Banerjee
This patch adds support to emit default declare mappers for implicit mapping of derived types when not supplied by the user. This especially helps tackle mapping of allocatables of derived types.
2025-11-13[Offload] Add device info for shared memory (#167817)Kevin Sala Penades
2025-11-13[offload] defer "---> olInit" trace message (#167893)Łukasz Plewa
Tracing requires liboffload to be initialized, so calling isTracingEnabled() before olInit always returns false. This caused the first trace log to look like:
```
-> OL_SUCCESS
```
instead of:
```
---> olInit()
-> OL_SUCCESS
```
This patch moves the pre-call trace print for olInit so it is emitted only after initialization. It would be possible to add extra logic to detect whether liboffload is already initialized and only postpone the first pre-call print, but this would add unnecessary complexity, especially since this is tablegen code. The difference would matter only in the unlikely case of a crash during a second olInit call.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-11-10[PGO][Offload] Fix missing names bug in GPU PGO (#166444)Ethan Luis McDonough
After #163011 was merged, the tests in [`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1) broke because the offload plugins were no longer able to find `__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm` visible to the host on GPU targets and reverses the changes made in f7e9968a5ba99521e6e51161f789f0cc1745193f.
2025-11-08[Offload] Remove unused KernelArgsTy instantiation (#167197)Kevin Sala Penades
2025-11-06[OpenMP] Fix tests relying on the heap size variableJoseph Huber
Summary: I made that an unimplemented error, but forgot that it was used for this environment variable.
2025-11-06[Offload] Remove handling for device memory pool (#163629)Joseph Huber
Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so just use that. Simplifies code, can be added back if we start providing alternate forms but I don't think there's a single use-case that would justify it yet.
2025-11-04[Offload] Add device UID (#164391)Robert Imschweiler
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
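The naming rule described above (vendor UID qualified by the plugin name, with a fallback to the internal device ID) might look roughly like this. The function name `makeDeviceUid`, the `-` separator, and the example plugin names are assumptions for illustration; the commit message does not specify the actual format.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch: build a UID that stays unique across plugins by
// prefixing the vendor-provided UID with the plugin name.
inline std::string makeDeviceUid(const std::string &PluginName,
                                 const std::optional<std::string> &VendorUid,
                                 int InternalDeviceId) {
  assert(!PluginName.empty() && "plugin name is needed to disambiguate");
  // Fall back to the offload-internal device ID if the vendor has no UID.
  const std::string Suffix =
      VendorUid ? *VendorUid : std::to_string(InternalDeviceId);
  return PluginName + "-" + Suffix;
}
```

With this scheme, two vendors that happen to report the same raw UID still produce distinct strings, because the plugin prefix differs.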
2025-10-31[MLIR][OpenMP] Fix and simplify bounds offset calculation for 1-D GEP offsets (#165486)agozillon
Currently this is being calculated incorrectly and will result in incorrect index offsets in more complicated array slices. This PR tries to address it by refactoring and changing the calculation to be more correct.
2025-10-24[OFFLOAD] Remove weak from __kmpc_* calls and gather them in one header (#164613)Alex Duran
Follow-up from #162652 --------- Co-authored-by: Michael Klemm <michael.klemm@amd.com>
2025-10-22[OpenMP] Adds omp_target_is_accessible routine (#138294)Nicole Aschenbrenner
Adds omp_target_is_accessible routine. Refactors common code from omp_target_is_present to work for both routines. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-10-22[NFC][Offload][OMPT] Improve readability of liboffload OMPT tests (#163181)Kaloyan Ignatov
- ompt_target_data_op_t, ompt_scope_endpoint_t and ompt_target_t are now printed as strings instead of just numbers to ease debugging
- some missing clang-format clauses have been added
2025-10-21[NFC][OpenMP] Update a test that was failing on aarch64. (#164456)Abhinav Gaba
The failure was reported here: https://github.com/llvm/llvm-project/pull/164039#issuecomment-3425429556 The test was checking for the "bad" behavior so as to keep track of it, but there seem to be some issues with the pointer arithmetic specific to aarch64. The update for now is to not check for the "bad" behavior fully. We may need to debug further if similar issues are encountered eventually once the codegen has been fixed.
2025-10-21[Offload] Use `amd_signal_async_handler` for host function calls (#154131)Ross Brunton
2025-10-20[NFC][OpenMP] Add small class-member use_device_ptr/addr unit tests. (#164039)Abhinav Gaba
Two of the tests are currently asserting, and two are emitting unexpected results. The asserting tests will be fixed using the ATTACH-style codegen from #153683. The other two involve `use_device_addr` on byrefs, and need more follow-up codegen changes, that have been noted in a FIXME comment.
2025-10-17[OFFLOAD] Interop fixes for Windows (#162652)Alex Duran
On Windows, for a reason I don't fully understand, boolean bits in the structures get extra padding (even when asking for packed structures), which messes up the offsets between the compiler and the runtime. Also, "weak" works differently on Windows than on Linux (i.e., the "local" routine has preference), which causes a crash as we don't really have an alternate implementation of __kmpc_omp_wait_deps. Given this, it doesn't make sense to mark it as "weak" for Linux either.
2025-10-16[Offload] XFAIL pgo tests until resolved (#163722)Jan Patrick Lehr
While people look into it, xfail the tests.
2025-10-15[OpenMP] Disable a few more tests to get the bot green (#163614)Joseph Huber
2025-10-15[OpenMP] Add test to print interop identifiers (#161434)Jan Patrick Lehr
The test covers some of the identifier symbols in the interop runtime. This test, for now, is to guard against complete breakage, which was the result of the other `interop.c` test not being enabled on AMD and thus, not caught by our buildbots.
2025-10-14Revert "[Offload] Lazily initialize platforms in the Offloading API" (#163272)Joseph Huber
Summary: This causes issues with CUDA's teardown order when the init is separated from the total init scope.
2025-10-14[Offload] Lazily initialize platforms in the Offloading API (#163272)Joseph Huber
Summary: The Offloading library wraps around the underlying plugins. The problem is that we currently initialize all plugins we find, even if they are not needed for the program. This is very expensive for trivial uses, as fully heterogeneous usage is quite rare. In practice this means that you will always pay a 200 ms penalty for having CUDA installed. This patch changes the behavior to provide accessors into the plugins and devices that allow them to be initialized lazily. We use a once_flag; this should properly take a fast-path check while still blocking on concurrent use. Making full use of this will require a way to filter platforms more specifically. I'm thinking of what this would look like as an API. I'm thinking that we either have an extra iterate function that takes a callback on the platform, or we just provide a helper to find all the devices that can run a given image. Maybe both? Fixes: https://github.com/llvm/llvm-project/issues/159636
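The lazy-initialization pattern mentioned above, an accessor guarded by a `std::once_flag`, can be sketched in isolation as below. `LazyPlatform` and its counter are hypothetical stand-ins written for this example and do not mirror the actual liboffload classes; the counter merely stands in for the expensive plugin setup.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a plugin whose initialization is expensive.
class LazyPlatform {
public:
  explicit LazyPlatform(std::string Name) : Name(std::move(Name)) {}

  // Accessor that initializes on first use. Later calls take the cheap
  // already-initialized path inside std::call_once.
  const std::string &name() {
    ensureInitialized();
    assert(InitCount.load() == 1 && "initialized exactly once");
    return Name;
  }

  // How many times the expensive initialization actually ran.
  int initCount() const { return InitCount.load(); }

private:
  void ensureInitialized() {
    // std::call_once runs the callable once; concurrent callers block
    // until that first invocation has finished.
    std::call_once(InitFlag, [this] { ++InitCount; });
  }

  std::string Name;
  std::once_flag InitFlag;
  std::atomic<int> InitCount{0};
};
```

A platform that is constructed but never queried pays nothing here, which is the point of deferring initialization to the accessor.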
2025-10-12[Offload] Silence warning via maybe unused (NFC) (#163076)Jan Patrick Lehr
2025-10-09[Flang][OpenMP] Defer descriptor mapping for assumed dummy argument types (#154349)agozillon
This PR adds deferral of descriptor maps until they are necessary for assumed dummy argument types. The intent is to avoid a problem where a user can inadvertently map a temporary local descriptor to device without their knowledge and proceed to never unmap it. This temporary local descriptor remains lodged in OpenMP device memory and the next time another variable or descriptor residing in the same stack address is mapped we incur a runtime OpenMP map error as we try to remap the same address. This fix was discussed with the OpenMP committee and applies to OpenMP 5.2 and below; future versions of OpenMP can avoid this issue via the attach semantics added to the specification.
2025-10-09[OFFLOAD] Remove unused init_device_info plugin interface (#162650)Alex Duran
This was used for the old interop code. It's dead code after #143491
2025-10-06[Offload] Fix isValidBinary segfault on host platformJoseph Huber
Summary: Need to verify this actually has a device. We really need to rework this to point to a real implementation, or streamline it to handle this automatically.
2025-10-06[Offload] Remove check on kernel argument sizes (#162121)Joseph Huber
Summary: This check is unnecessarily restrictive and currently incorrectly fires for any size less than eight bytes. Just remove it, we do sanity checks elsewhere and at some point need to trust the ABI.
2025-10-02[OFFLOAD] Restore interop functionality (#161429)Alex Duran
This implements two pieces to restore the interop functionality (that I broke) when the 6.0 interfaces were added:
* A set of wrappers that support the old interfaces on top of the new ones
* The same level of interop support for the CUDA and AMD plugins
2025-10-02[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160766)Akash Banerjee
This PR adds support for nested derived types and their mappers to the MapInfoFinalization pass.
- Generalize MapInfoFinalization to add child maps for arbitrarily nested allocatables when a derived object is mapped via declare mapper.
- Traverse HLFIR designates rooted at the target block arg and build full coordinate_of chains; append members with correct membersIndex.
This fixes #156461.
2025-09-29[OpenMP] Mark problematic tests as XFAIL / UNSUPPORTED (#161267)Joseph Huber
Summary: Several of these tests have been failing for literal years. Ideally we make efforts to fix this, but keeping these broken has had serious consequences on our testing infrastructure where failures are the norm so almost all test failures are disregarded. I made a tracking issue for the ones that have been disabled. https://github.com/llvm/llvm-project/issues/161265
2025-09-29[Offload] Fix incorrect size used in llvm-offload-device-info toolJoseph Huber
Summary: This was not using the size previously queried and would fail when the implementation actually verified it.
2025-09-29[OpenMP][Offload] Support `PRIVATE | ATTACH` maps for corresponding-pointer-initialization. (#160760)Abhinav Gaba
`PRIVATE | ATTACH` maps can be used to represent firstprivate pointers that should be initialized with the pointee's device address, if its lookup succeeds, or retain the original host pointee's address otherwise. With this, for a test like the following:
```f90
integer, pointer :: p(:)
!$omp target map(p(1))
... print*, p(1)
!$omp end target
```
The codegen can look like:
```llvm
; maps for p:
; &p(1),       &p(1), sizeof(p(1)),       TO|FROM              //(1)
; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), ATTACH               //(2)
; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), PRIVATE|ATTACH|PARAM //(3)
call... @__omp_outlined...(ptr %ref_ptr_of_p)
```
* `(1)` maps the pointee `p(1)`.
* `(2)` attaches it to the (previously) mapped `ref_ptr(p)`, if present. It can be controlled via OpenMP 6.1's `attach(auto/always/never)` map-type modifiers.
* `(3)` privatizes and initializes the local `ref_ptr(p)`, which gets passed in as the kernel argument `%ref_ptr_of_p`. Can be skipped if p is not referenced directly within the region.

While similar mapping can be used for C/C++, it's more important/useful for Fortran as we can avoid creating another argument for passing the descriptor, and use that to initialize the private copy in the body of the kernel.
2025-09-29[OpenMP] Fix 'libc' configuration when building OpenMPJoseph Huber
Summary: Forgot to port this option's old handling from offload. It's now way easier since they're built in the same CMake project. Also delete the leftover directory that's not used anymore, don't know how that was still there.
2025-09-29[OpenMP][Flang] Fix no-loop test (#161162)Dominik Adamski
Fortran no-loop test is supported only for GPU.
2025-09-29[Offload][NFC] use unique ptrs for platforms (#160888)Piotr Balcer
Currently, devices store a raw pointer back to their owning Platform. Platforms are stored directly inside a vector. Modifying this vector risks invalidating all the platform pointers stored in devices. This patch allocates platforms individually, and changes devices to store a reference to their platform instead of a pointer. This is safe, because platforms are guaranteed to outlive the devices they contain.
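The ownership change described above can be illustrated with a small standalone sketch. The `Platform`, `Device` and `Registry` types here are simplified, hypothetical stand-ins rather than the real liboffload classes; the point is only that a `std::vector<std::unique_ptr<T>>` can grow without moving the `T` objects themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Platform {
  std::string Name;
};

// Devices hold a reference to their owning platform.
struct Device {
  Platform &Owner;
};

// Hypothetical registry: each platform is heap-allocated on its own, so
// growing the vector relocates only the unique_ptrs, never the Platforms.
class Registry {
public:
  Platform &addPlatform(std::string Name) {
    assert(!Name.empty() && "platforms need a name in this sketch");
    Platforms.push_back(std::make_unique<Platform>(Platform{std::move(Name)}));
    return *Platforms.back();
  }

  std::size_t size() const { return Platforms.size(); }

private:
  std::vector<std::unique_ptr<Platform>> Platforms;
};
```

Had `Platforms` been a `std::vector<Platform>`, any reallocation triggered by `push_back` would leave `Device::Owner` dangling.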
2025-09-26[Offload] Use Error for allocating/deallocating in plugins (#160811)Kevin Sala Penades
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-26[Flang][OpenMP] Enable no-loop kernels (#155818)Dominik Adamski
Enable the generation of no-loop kernels for Fortran OpenMP code. target teams distribute parallel do pragmas can be promoted to no-loop kernels if the user adds the -fopenmp-assume-teams-oversubscription and -fopenmp-assume-threads-oversubscription flags. If the OpenMP kernel contains reduction or num_teams clauses, it is not promoted to no-loop mode. The global OpenMP device RTL oversubscription flags no longer force no-loop code generation for Fortran.
2025-09-25Revert "[Flang][OpenMP] Implicitly map nested allocatable components in derived types" (#160759)Akash Banerjee
Reverts llvm/llvm-project#160116
2025-09-24[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160116)Akash Banerjee
This PR adds support for nested derived types and their mappers to the MapInfoFinalization pass.
- Generalize MapInfoFinalization to add child maps for arbitrarily nested allocatables when a derived object is mapped via declare mapper.
- Traverse HLFIR designates rooted at the target block arg and build full coordinate_of chains; append members with correct membersIndex.
This fixes #156461.
2025-09-24[Offload] Add olGetMemInfo with platform-less API (#159581)Ross Brunton
2025-09-24[Offload] Print Image location rather than casting it (#160309)Ross Brunton
This squishes a warning where the runtime tries to bind a StringRef to a `%p`.
2025-09-23[Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372)Alexey Sachkov
2025-09-23[Offload] Don't add the unsupported host plugin to the list (#159642)Joseph Huber
Summary: The host plugin is basically OpenMP specific and doesn't work very well. Previously we were skipping over it in the list instead of just not adding it at all.
2025-09-23[Offload] Re-allocate overlapping memory (#159567)Ross Brunton
If olMemAlloc happens to allocate memory that was already allocated elsewhere (possibly by another device on another platform), it is now thrown away and a new allocation generated. A new `AllocBases` vector is now available, which is an ordered list of allocation start addresses.
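One way to detect such overlaps with an address-ordered list is sketched below. This is an assumption-laden illustration, not the data structure from the patch: `AllocTracker` stores sizes next to the bases (the commit only mentions start addresses), and all names are invented for the example. A caller would discard and retry an allocation for which `insert` returns false.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical tracker: keeps non-overlapping ranges sorted by base address
// so an overlap check only needs to look at two neighbours.
class AllocTracker {
public:
  struct Range {
    std::uintptr_t Base;
    std::size_t Size;
  };

  // True if [Base, Base + Size) intersects any recorded range.
  bool overlaps(std::uintptr_t Base, std::size_t Size) const {
    auto Next = firstAtOrAfter(Base);
    // The next range may start before the new one ends.
    if (Next != Ranges.end() && Next->Base < Base + Size)
      return true;
    // The previous range may extend past the new base.
    if (Next != Ranges.begin()) {
      auto Prev = std::prev(Next);
      if (Prev->Base + Prev->Size > Base)
        return true;
    }
    return false;
  }

  // Records the range and returns true, or returns false on overlap.
  bool insert(std::uintptr_t Base, std::size_t Size) {
    assert(Size > 0 && "empty allocations are not tracked");
    if (overlaps(Base, Size))
      return false;
    Ranges.insert(firstAtOrAfter(Base), Range{Base, Size});
    return true;
  }

private:
  // Binary search for the first range whose base is >= Base.
  std::vector<Range>::const_iterator
  firstAtOrAfter(std::uintptr_t Base) const {
    return std::lower_bound(
        Ranges.begin(), Ranges.end(), Base,
        [](const Range &R, std::uintptr_t B) { return R.Base < B; });
  }

  std::vector<Range> Ranges;
};
```

Because recorded ranges never overlap each other, checking the immediate predecessor and successor is sufficient; no other range can reach into the new one.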
2025-09-22[Remarks] Restructure bitstream remarks to be fully standalone (#156715)Tobias Stadler
Currently there are two serialization modes for bitstream Remarks: standalone and separate. The separate mode splits remark metadata (e.g. the string table) from actual remark data. The metadata is written into the object file by the AsmPrinter, while the remark data is stored in a separate remarks file. This means we can't use bitstream remarks with tools like opt that don't generate an object file. Also, it is confusing to post-process bitstream remarks files, because only the standalone files can be read by llvm-remarkutil. We always need to use dsymutil to convert the separate files to standalone files, which only works for MachO. It is not possible for clang/opt to directly emit bitstream remark files in standalone mode, because the string table can only be serialized after all remarks were emitted. Therefore, this change completely removes the separate serialization mode. Instead, the remark string table is now always written to the end of the remarks file. This requires us to tell the serializer when to finalize remark serialization. This automatically happens when the serializer goes out of scope. However, often the remark file goes out of scope before the serializer is destroyed. To diagnose this, I have added an assert to alert users that they need to explicitly call finalizeLLVMOptimizationRemarks. This change paves the way for further improvements to the remark infrastructure, including more tooling (e.g. #159784), size optimizations for bitstream remarks, and more. Pull Request: https://github.com/llvm/llvm-project/pull/156715
2025-09-20[Offload] Remove non-blocking allocation type (#159851)Joseph Huber
Summary: This was originally added in as a hack to work around CUDA's limitation on allocation. The `libc` implementation now isn't even used for CUDA so this code is never hit. Even in this case, this code never truly worked. A true solution would be to use CUDA's virtual memory API instead to allocate 2MiB slabs independently from the normal memory management done in the stream.