summaryrefslogtreecommitdiff
path: root/offload/src
AgeCommit message (Collapse)Author
2025-02-10[Offload][NFC] Rename `src/` -> `libomptarget/` (#126573)Joseph Huber
Summary: The name `src` is confusing when combined with the plugins and the newly added `liboffload`.
2025-02-06[Offload] Unify offloading entries into a single section (#125731)Joseph Huber
Summary: This patch unifies the existing offloading entires into a single section called `llvm_offload_entires`. This lets us use a more unified offloading infrastructure so that all targets share the same handling. The effect is that people in the runtimes now need to check if the kind is what they expect, but the expectation is that you can combine multiple potential providers into a compile job. Doesn't fully work yet because of other runtime issues, but some day. Mostly this helps the future of liboffload where we want to handle different languages than OpenMP.
2025-02-03[OpenMP] Guard OpenMP specific entry handlingJoseph Huber
2025-01-31[Offload][NFC] Fix typos discovered by codespell (#125119)Christian Clauss
https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`
2025-01-28[Offload] Rework offloading entry type to be more generic (#124018)Joseph Huber
Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibilitry.
2025-01-21[Offload][NFC] Factor out and rename the `__tgt_offload_entry` struct (#123785)Joseph Huber
Summary: This patch is an NFC renaming to make using the offloading entry type more portable between other targets. Right now this is just moving its definition to LLVM so others can use it. Future work will rework the struct layout.
2024-12-06[Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042)Shilei Tian
2024-09-23[NFC][offload][OMPT] Cleanup of OMPT internals (#109005)Michael Halkenhäuser
Removed `OmptCallbacks.cpp` since relevant contents were duplicated. Because of the static linking there should be no change in functionality.
2024-09-05[Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer ↵Johannes Doerfert
(#100280) We had three `utils::` namespaces, all with different "meaning" (host, device, hsa_utils). We should, when we can, keep "include/Shared" accessible from host and device, thus RefCountTy has been moved to a separate header. `hsa_utils` was introduced to make `utils::` less overloaded. And common functionality was de-duplicated, e.g., `utils::advance` and `utils::advanceVoidPtr` -> `utils:advancePtr`. Type punning now checks for the size of the result to make sure it matches the source type. No functional change was intended.
2024-08-13[Offload] Ensure to load images when the device is used (#103002)Johannes Doerfert
When we use the device, e.g., with an API that interacts with it, we need to ensure the image is loaded and the constructors are executed. Two tests are included to verify we 1) load images and run constructors when needed, and 2) we do so lazily only if the device is actually used. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-08-12[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)Johannes Doerfert
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be lowered to the LLVM/Offload API. On the Clang side, this is simply done by using the OpenMP offload toolchain and emitting calls to `llvm*` functions to orchestrate the kernel launch rather than `cuda*` functions. These `llvm*` functions are implemented on top of the existing LLVM/Offload API. As we are about to redefine the Offload API, this wil help us in the design process as a second offload language. We do not support any CUDA APIs yet, however, we could: https://www.osti.gov/servlets/purl/1892137 For proper host execution we need to resurrect/rebase https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf (which was designed for debugging). ``` ❯❯❯ cat test.cu extern "C" { void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum); void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum); } __global__ void square(int *A) { *A = 42; } int main(int argc, char **argv) { int DevNo = 0; int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo)); *Ptr = 7; printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr); square<<<1, 1>>>(Ptr); printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr); llvm_omp_target_free_shared(Ptr, DevNo); } ❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native ❯❯❯ llvm-objdump --offloading test123 test123: file format elf64-x86-64 OFFLOADING IMAGE [0]: kind elf arch gfx90a triple amdgcn-amd-amdhsa producer openmp ❯❯❯ LIBOMPTARGET_INFO=16 ./test123 Ptr 0x155448ac8000, *Ptr 7 Ptr 0x155448ac8000, *Ptr 42 ```
2024-08-01[Offload][OpenMP] Prettify error messages by "demangling" the kernel name ↵Johannes Doerfert
(#101400) The kernel names for OpenMP are manually mangled and not ideal when we report something to the user. We demangle them now, providing the function and line number of the target region, together with the actual kernel name.
2024-07-30[Offload] Implement double free (and other allocation error) reporting (#100261)Johannes Doerfert
As a first step towards a GPU sanitizer we now can track allocations and deallocations in order to report double frees, and other problems during deallocation.
2024-07-24[Offload] Repair and rename `llvm-omp-device-info` (to `-offload-`) (#100309)Johannes Doerfert
The `llvm-omp-device-info` tool is very handy, but broke due to the lazy evaluation of devices. This repairs the functionality and adds a test. The tool is also renamed into `llvm-offload-device-info` as `-omp-` is going away.
2024-07-17[Offload][OMPT] Add callbacks for (dis)associate_ptr (#99046)Jan Patrick Lehr
This adds the OMPT callbacks for the API functions disassociate_ptr and associate_ptr.
2024-06-06[Libomptarget] Rework device initialization and image registration (#93844)Joseph Huber
Summary: Currently, we register images into a linear table according to the logical OpenMP device identifier. We then initialize all of these images as one block. This logic requires that images are compatible with *all* devices instead of just the one that it can run on. This prevents us from running on systems with heterogeneous devices (i.e. image 1 runs on device 0 image 0 runs on device 1). This patch reworks the logic by instead making the compatibility check a per-device query. We then scan every device to see if it's compatible and do it as they come.
2024-05-28[Offload][Fix] Fix lazy initialization with multiple imagesusers/daltenty/testJoseph Huber
Summary: There was a bug here where we would initialize the plugin multiple times when there were multiple images. Fix it by putting the `is_initliaized` check later.
2024-05-23[Offload] Only initialize a plugin if it is needed (#92765)Joseph Huber
Summary: Initializing the plugins requires initializing the runtime like CUDA or HSA. This has a considerable overhead on most platforms, so we should only actually initialize a plugin if it is needed by any image that is loaded.
2024-05-17[Offload][NFC] Remove 'libomptarget' message helpers (#92581)Joseph Huber
Summary: This isn't `libomptarget` anymore, and these messages were always unnecessary because no other project uses these prefixed messages. The effect of this is that no longer will the logs have `LIBOMPTARGET --` in front of everything. We have a message stating when we start building the offload project so it'll still be trivial to find.
2024-05-17[Offload][NFC] Remove all trailing whitespace from offload/ (#92578)Joseph Huber
Summary: This patch cleans up the training whitespace in a bunch of tests and CMake files. Most just in preparation for other cleanups.
2024-05-17[Offload][NFC] Remove header license in CMake files (#92544)Joseph Huber
Summary: No other project has these in the CMake itself, and they're wildly inconsistent even within the project. These don't really add anything so I think they should be removed.
2024-05-16[Libomptarget] Pass '-Werror=global-constructors' to the libomptarget build ↵Joseph Huber
(#88531) Summary: A runtime library should not have global constructors. Everything is now expected to go through the init methods. This patch ensures that global constructors will not accidentally be introduced.
2024-05-16[Libomptarget] Remove requires information from plugin (#80345)Joseph Huber
Summary: Currently this is only used for the zero-copy handling. However, this can easily be moved into `libomptarget` so that we do not need to bother setting the requires flags in the plugin. The advantage here is that we no longer need to do this for every device redundently. Additionally, these requires flags are specifically OpenMP related, so they should live in `libomptarget`.
2024-05-15[Offload][NFC] Fix warning messages in runtimeJoseph Huber
Summary: These are lots of random warnings due to inconsistent initialization or signedness.
2024-05-13[Offload][NFC] Remove unused compiler definition from CMakeJoseph Huber
2024-05-09[Reland][Libomptarget] Statically link all plugin runtimes (#87009)Joseph Huber
This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of life runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868
2024-05-09Revert "[Libomptarget] Statically link all plugin runtimes (#87009)"Joseph Huber
Caused failures on build-bots, reverting to investigate. This reverts commit 80f9e814ec896fdc57ee84afad8ac4cb1f8e4627.
2024-05-09[Libomptarget] Statically link all plugin runtimes (#87009)Joseph Huber
This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of life runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868
2024-05-07[Offload] Fix dataDelete op for TARGET_ALLOC_HOST memory type (#91134)Jhonatan Cléto
Summary: The `GenericDeviceTy::dataDelete` method doesn't verify the `TargetAllocTy` of the of the device pointer. Because of this, it can use the `MemoryManager` to free the ptr. However, the `TARGET_ALLOC_HOST` and `TARGET_ALLOC_SHARED` types are not allocated using the `MemoryManager` in the `GenericDeviceTy::dataAlloc` method. Since the `MemoryManager` uses the `DeviceAllocatorTy::free` operation without specifying the type of the ptr, some plugins may use incorrect operations to free ptrs of certain types. In particular, this bug causes the CUDA plugin to use the `cuMemFree` operation on ptrs of type `TARGET_ALLOC_HOST`, resulting in an unchecked error, as shown in the output snippet of the test `offload/test/api/omp_host_pinned_memory_alloc.c`: ``` omptarget --> Notifying about an unmapping: HstPtr=0x00007c6114200000 omptarget --> Call to llvm_omp_target_free_host for device 0 and address 0x00007c6114200000 omptarget --> Call to omp_get_num_devices returning 1 omptarget --> Call to omp_get_initial_device returning 1 PluginInterface --> MemoryManagerTy::free: target memory 0x00007c6114200000. PluginInterface --> Cannot find its node. Delete it on device directly. TARGET CUDA RTL --> Failure to free memory: Error in cuMemFree[Host]: invalid argument omptarget --> omp_target_free deallocated device ptr ``` This patch fixes this by adding the check of the device pointer type before calling the appropriate operation for each type.
2024-04-29[Libomptarget] Rework interface for enabling plugins (#86875)Joseph Huber
Summary: Previously we would build all of the plugins by default and then only load some using the `LIBOMPTARGET_PLUGINS_TO_LOAD` variable. This patch renamed this to `LIBOMPTARGET_PLUGINS_TO_BUILD` and changes whether or not it will include the plugin in CMake. Additionally this patch creates a new `Targets.def` file that allows us to enumerate all of the enabled plugins. This is somewhat different from the old method, and it's done this way for future use that will need to be shared. This follows the same method that LLVM uses for its targets, however it does require adding an extra include path. Depends on https://github.com/llvm/llvm-project/pull/86868
2024-04-26[Libomptarget] Rename `libomptarget.rtl.x86_64` to `libomptarget.rtl.host` ↵Joseph Huber
(#86868) Summary: All of these are functionally the same code, just compiled for separate architectures. We currently do not expose a way to execute these on separate architectures as the host plugin works using `dlopen` into the same process, and therefore cannot possibly be an incompatible architecture. (This could work with a remote plugin, but this is not supported yet). This patch simply renames all of these to the same thing so we no longer need to check around for its varying definitions.
2024-04-22[Offload] Move `/openmp/libomptarget` to `/offload` (#75125)Johannes Doerfert
In a nutshell, this moves our libomptarget code to populate the offload subproject. With this commit, users need to enable the new LLVM/Offload subproject as a runtime in their cmake configuration. No further changes are expected for downstream code. Tests and other components still depend on OpenMP and have also not been renamed. The results below are for a build in which OpenMP and Offload are enabled runtimes. In addition to the pure `git mv`, we needed to adjust some CMake files. Nothing is intended to change semantics. ``` ninja check-offload ``` Works with the X86 and AMDGPU offload tests ``` ninja check-openmp ``` Still works but doesn't build offload tests anymore. ``` ls install/lib ``` Shows all expected libraries, incl. - `libomptarget.devicertl.a` - `libomptarget-nvptx-sm_90.bc` - `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git` - `libomptarget.so` -> `libomptarget.so.18git` Fixes: https://github.com/llvm/llvm-project/issues/75124 --------- Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>