path: root/offload/plugins-nextgen/common/src/PluginInterface.cpp
Age | Commit message | Author
2024-06-14 | [OpenMP] Add Environment Variable to disable Reuse of Blocks for High Loop Trip Counts (#89239) | Tim Gymnich
Sometimes it might be beneficial to spawn more thread blocks instead of reusing existing ones for multiple loop iterations.

**Alternatives considered:** Make `DefaultNumBlocks` settable via an environment variable.

Co-authored-by: Joseph Huber <huberjn@outlook.com>
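The trade-off this commit describes can be sketched as below. This is a minimal illustration under stated assumptions, not the runtime's actual logic: `readBoolEnv`, `pickNumBlocks`, and the variable name `EXAMPLE_REUSE_BLOCKS` are hypothetical, and the real plugin applies further limits (for example hardware maxima) that are omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: read a boolean environment variable, falling back to
// Default when the variable is unset.
bool readBoolEnv(const char *Name, bool Default) {
  const char *Value = std::getenv(Name);
  if (!Value)
    return Default;
  std::string S(Value);
  return !(S == "0" || S == "false" || S == "FALSE" || S == "off");
}

// Hypothetical block-count choice. With reuse enabled, the grid is capped at
// DefaultNumBlocks and each block covers several chunks of iterations. With
// reuse disabled, enough blocks are spawned to cover the whole trip count.
uint64_t pickNumBlocks(uint64_t TripCount, uint32_t ThreadsPerBlock,
                       uint64_t DefaultNumBlocks, bool ReuseBlocks) {
  assert(ThreadsPerBlock > 0 && "need at least one thread per block");
  // Ceiling division: blocks needed so that every iteration gets a thread.
  uint64_t Needed = (TripCount + ThreadsPerBlock - 1) / ThreadsPerBlock;
  Needed = std::max<uint64_t>(Needed, 1);
  return ReuseBlocks ? std::min(Needed, DefaultNumBlocks) : Needed;
}
```

In this sketch, a loop with 1,000,000 iterations and 256 threads per block is capped at a default of 1024 blocks when reuse is enabled, and gets 3907 blocks when a caller passes `readBoolEnv("EXAMPLE_REUSE_BLOCKS", true)` and the variable is set to `0`.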
2024-06-13 | [Offload] Use flat array for cuLaunchKernel (#95116) | Johannes Doerfert
We already used a flat array of kernel launch parameters for the AMD GPU launch, but now we also use this scheme for the NVIDIA GPU launch. The only remaining/required use of the indirection is the host plugin (due to ffi). This allows us to simplify the use for non-OpenMP kernel launch.
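To illustrate what a "flat array" of launch parameters means, here is a small sketch that packs arguments into one contiguous, aligned byte buffer instead of keeping one pointer per argument. The `FlatArgs` class is hypothetical and not taken from the plugin. For context, the CUDA driver API can accept such a buffer through the `extra` parameter of `cuLaunchKernel` (using `CU_LAUNCH_PARAM_BUFFER_POINTER`, `CU_LAUNCH_PARAM_BUFFER_SIZE`, and `CU_LAUNCH_PARAM_END`); whether the commit uses exactly that path is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical flat argument buffer: every argument is copied by value into
// one contiguous allocation, aligned to its natural alignment.
class FlatArgs {
public:
  // Appends Value and returns the byte offset at which it was stored.
  template <typename T> std::size_t push(const T &Value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "arguments are copied bytewise");
    // Round the current size up to the next multiple of alignof(T).
    std::size_t Offset =
        (Buffer.size() + alignof(T) - 1) / alignof(T) * alignof(T);
    Buffer.resize(Offset + sizeof(T)); // padding bytes are zero-filled
    std::memcpy(Buffer.data() + Offset, &Value, sizeof(T));
    return Offset;
  }

  // Reads back the argument stored at Offset.
  template <typename T> T read(std::size_t Offset) const {
    assert(Offset + sizeof(T) <= Buffer.size() && "read out of bounds");
    T Value;
    std::memcpy(&Value, Buffer.data() + Offset, sizeof(T));
    return Value;
  }

  const void *data() const { return Buffer.data(); }
  std::size_t size() const { return Buffer.size(); }

private:
  std::vector<unsigned char> Buffer;
};
```

A launcher would then hand `data()` and `size()` to the device runtime in a single call, with no per-argument indirection.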
2024-06-06 | [Libomptarget] Rework device initialization and image registration (#93844) | Joseph Huber
Summary: Currently, we register images into a linear table according to the logical OpenMP device identifier. We then initialize all of these images as one block. This logic requires that images are compatible with *all* devices instead of just the one that they can run on. This prevents us from running on systems with heterogeneous devices (e.g. image 1 runs on device 0 and image 0 runs on device 1). This patch reworks the logic by instead making the compatibility check a per-device query. We then scan every device to see if it's compatible and do it as they come.
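The per-device compatibility query can be sketched as follows. This is an illustration only: `Image`, `Device`, `isCompatible`, and `compatibleImages` are hypothetical names, and compatibility is reduced to a plain architecture-string match.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical image and device descriptions; real images carry far more
// metadata than an architecture string.
struct Image {
  std::string Arch;
};

struct Device {
  std::string Arch;
  // Per-device query: can this particular device run the image?
  bool isCompatible(const Image &I) const { return I.Arch == Arch; }
};

// For every device, collect the indices of the images it can run. No image
// has to be compatible with all devices.
std::vector<std::vector<std::size_t>>
compatibleImages(const std::vector<Device> &Devices,
                 const std::vector<Image> &Images) {
  std::vector<std::vector<std::size_t>> Result(Devices.size());
  for (std::size_t D = 0; D < Devices.size(); ++D)
    for (std::size_t I = 0; I < Images.size(); ++I)
      if (Devices[D].isCompatible(Images[I]))
        Result[D].push_back(I);
  assert(Result.size() == Devices.size());
  return Result;
}
```

With devices `{sm_90, gfx90a}` and images `{gfx90a, sm_90}`, the sketch pairs image 1 with device 0 and image 0 with device 1, which is the heterogeneous case the commit message mentions.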
2024-05-23 | [Offload] Only initialize a plugin if it is needed (#92765) | Joseph Huber
Summary: Initializing the plugins requires initializing the runtime like CUDA or HSA. This has a considerable overhead on most platforms, so we should only actually initialize a plugin if it is needed by any image that is loaded.
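A minimal sketch of on-demand initialization, assuming each plugin can be matched to an image by a single architecture string. `LazyPlugin` and `initNeededPlugins` are hypothetical names, and the counter stands in for the costly runtime start-up (CUDA, HSA).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plugin wrapper that defers its expensive start-up.
struct LazyPlugin {
  std::string Arch;
  bool Initialized = false;
  int InitCalls = 0; // stands in for the expensive runtime initialization

  explicit LazyPlugin(std::string A) : Arch(std::move(A)) {}

  void ensureInitialized() {
    if (Initialized)
      return;
    ++InitCalls; // a real plugin would start the vendor runtime here
    Initialized = true;
  }
};

// Initialize only the plugins that some loaded image actually needs.
// Returns how many plugins are initialized afterwards.
int initNeededPlugins(std::vector<LazyPlugin> &Plugins,
                      const std::vector<std::string> &ImageArchs) {
  for (LazyPlugin &P : Plugins)
    for (const std::string &Arch : ImageArchs)
      if (P.Arch == Arch)
        P.ensureInitialized();
  int Count = 0;
  for (const LazyPlugin &P : Plugins)
    Count += P.Initialized ? 1 : 0;
  assert(Count <= static_cast<int>(Plugins.size()));
  return Count;
}
```

If only `amdgpu` images are loaded, the `nvptx` plugin in this sketch never pays its start-up cost, and repeated images for the same plugin trigger a single initialization.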
2024-05-16 | [Libomptarget] Rework Record & Replay to be a plugin member (#88928) (#89097) | Joseph Huber
Summary: Previously, the R&R support was global state initialized by a global constructor. This is bad because it prevents us from adequately constraining the lifetime of the library. Additionally, we want to minimize the amount of global state floating around. This patch moves the R&R support into a plugin member like everything else. This means there will be multiple copies of the R&R implementation floating around, but this was already the case given the fact that we currently handle everything with dynamic libraries.
2024-05-16 | [Libomptarget] Remove requires information from plugin (#80345) | Joseph Huber
Summary: Currently this is only used for the zero-copy handling. However, this can easily be moved into `libomptarget` so that we do not need to bother setting the requires flags in the plugin. The advantage here is that we no longer need to do this for every device redundantly. Additionally, these requires flags are specifically OpenMP related, so they should live in `libomptarget`.
2024-05-15 | [Offload][NFC] Fix warning messages in runtime | Joseph Huber
Summary: These are lots of random warnings due to inconsistent initialization or signedness.
2024-05-14 | [Offload] Remove old references to `isCtor` (#91766) | Joseph Huber
Summary: These have long since been removed; support for ctors / dtors now happens through special kernels the backend creates.
2024-05-09 | [Reland][Libomptarget] Statically link all plugin runtimes (#87009) | Joseph Huber
This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of live runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin + libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization are now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated.

Depends on
https://github.com/llvm/llvm-project/pull/86971
https://github.com/llvm/llvm-project/pull/86875
https://github.com/llvm/llvm-project/pull/86868
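The `createPlugin_<name>` pattern can be sketched as a per-plugin factory function that returns an object behind a shared C++ interface. Everything below is hypothetical: `PluginBase`, `ExamplePlugin`, `createPlugin_example`, and `totalDevices` are illustration names, and the actual `GenericPluginTy` interface and factory signature may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the shared C++ plugin interface.
class PluginBase {
public:
  virtual ~PluginBase() = default;
  virtual std::string name() const = 0;
  virtual int numDevices() const = 0;
};

// Hypothetical concrete plugin, linked statically into the same library.
class ExamplePlugin final : public PluginBase {
public:
  std::string name() const override { return "example"; }
  int numDevices() const override { return 2; }
};

// One factory per plugin, following the createPlugin_<name> naming scheme.
std::unique_ptr<PluginBase> createPlugin_example() {
  return std::unique_ptr<PluginBase>(new ExamplePlugin());
}

// The host library owns the plugin objects directly, so their lifetimes are
// ordinary C++ object lifetimes rather than those of loaded shared libraries.
int totalDevices(const std::vector<std::unique_ptr<PluginBase>> &Plugins) {
  int Total = 0;
  for (const std::unique_ptr<PluginBase> &P : Plugins) {
    assert(P && "factory must return a plugin");
    Total += P->numDevices();
  }
  return Total;
}
```

Because the caller holds the returned object, destruction order is explicit, which is one way to read the "improved lifetimes" point in the list above.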
2024-05-09 | Revert "[Libomptarget] Statically link all plugin runtimes (#87009)" | Joseph Huber
Caused failures on build-bots, reverting to investigate. This reverts commit 80f9e814ec896fdc57ee84afad8ac4cb1f8e4627.
2024-05-09 | [Libomptarget] Statically link all plugin runtimes (#87009) | Joseph Huber
This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of live runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin + libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization are now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated.

Depends on
https://github.com/llvm/llvm-project/pull/86971
https://github.com/llvm/llvm-project/pull/86875
https://github.com/llvm/llvm-project/pull/86868
2024-05-07 | [Offload] Fix dataDelete op for TARGET_ALLOC_HOST memory type (#91134) | Jhonatan Cléto
Summary: The `GenericDeviceTy::dataDelete` method doesn't verify the `TargetAllocTy` of the device pointer. Because of this, it can use the `MemoryManager` to free the ptr. However, the `TARGET_ALLOC_HOST` and `TARGET_ALLOC_SHARED` types are not allocated using the `MemoryManager` in the `GenericDeviceTy::dataAlloc` method. Since the `MemoryManager` uses the `DeviceAllocatorTy::free` operation without specifying the type of the ptr, some plugins may use incorrect operations to free ptrs of certain types. In particular, this bug causes the CUDA plugin to use the `cuMemFree` operation on ptrs of type `TARGET_ALLOC_HOST`, resulting in an unchecked error, as shown in the output snippet of the test `offload/test/api/omp_host_pinned_memory_alloc.c`:

```
omptarget --> Notifying about an unmapping: HstPtr=0x00007c6114200000
omptarget --> Call to llvm_omp_target_free_host for device 0 and address 0x00007c6114200000
omptarget --> Call to omp_get_num_devices returning 1
omptarget --> Call to omp_get_initial_device returning 1
PluginInterface --> MemoryManagerTy::free: target memory 0x00007c6114200000.
PluginInterface --> Cannot find its node. Delete it on device directly.
TARGET CUDA RTL --> Failure to free memory: Error in cuMemFree[Host]: invalid argument
omptarget --> omp_target_free deallocated device ptr
```

This patch fixes this by adding the check of the device pointer type before calling the appropriate operation for each type.
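The shape of the fix, checking the allocation kind before picking a free operation, can be sketched as below. This is not the actual patch: `AllocKind` and `SketchDevice` are hypothetical stand-ins for `TargetAllocTy` and `GenericDeviceTy`, and the strings only record which path would be taken.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the allocation kinds named in the commit message.
enum class AllocKind { Device, Host, Shared };

struct SketchDevice {
  bool HasMemoryManager = true;
  std::string LastFree; // records which free path was chosen

  void dataDelete(void *Ptr, AllocKind Kind) {
    assert(Ptr && "cannot free a null pointer in this sketch");
    switch (Kind) {
    case AllocKind::Device:
      // Only device allocations may have come from the memory manager.
      LastFree = HasMemoryManager ? "memory-manager" : "device";
      break;
    case AllocKind::Host:
      // Host-pinned memory bypasses the memory manager on allocation, so it
      // must bypass it on deallocation too.
      LastFree = "host";
      break;
    case AllocKind::Shared:
      LastFree = "shared";
      break;
    }
  }
};
```

In the failing log above, a host-pinned pointer took the memory-manager path and ended up in `cuMemFree`; in this sketch the same pointer is routed to the host path instead.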
2024-04-22 | [Offload] Move `/openmp/libomptarget` to `/offload` (#75125) | Johannes Doerfert
In a nutshell, this moves our libomptarget code to populate the offload subproject.

With this commit, users need to enable the new LLVM/Offload subproject as a runtime in their cmake configuration. No further changes are expected for downstream code.

Tests and other components still depend on OpenMP and have also not been renamed. The results below are for a build in which OpenMP and Offload are enabled runtimes. In addition to the pure `git mv`, we needed to adjust some CMake files. Nothing is intended to change semantics.

```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests

```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.

```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`

Fixes: https://github.com/llvm/llvm-project/issues/75124

Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>