path: root/offload/DeviceRTL/include
Age | Commit message | Author
2025-09-08[OpenMP] Change build of OpenMP device runtime to be a separate runtime (#136729)Joseph Huber
Summary: Currently we build the OpenMP device runtime as part of the `offload/` project. This is problematic because it has several restrictions when compared to the normal offloading runtime. It can only be built with an up-to-date clang and we need to set the target appropriately. Currently we hack around this by creating the compiler invocation manually, but this patch moves it into a separate runtimes build. This follows the same build we use for libc, libc++, compiler-rt, and flang-rt. This also moves it from `offload/` into `openmp/` because it is still the `openmp/` runtime and I feel it is more appropriate. We do want a generic `offload/` library at some point, but it would be trivial to then add that as a separate library now that we have the infrastructure that makes adding these new libraries trivial. This most importantly will require that users update their build configs, mostly adding the following lines at a minimum. I was debating whether or not I should 'auto-upgrade' this, but I just went with a warning.
```
-DLLVM_RUNTIME_TARGETS='default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda' \
-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=openmp \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=openmp \
```
This also changed where the `.bc` version of the library lives, but it's still created.
2025-08-28[OpenMP][clang] 6.0: num_threads strict (part 2: device runtime) (#146404)Robert Imschweiler
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary device runtime changes.
2025-08-05[OpenMP] Fix weak linkage on malloc declarationJoseph Huber
Summary: This being weak forces the external reference to be weak. Either we define it weak or not by pulling it from `libc`. Doing it here causes it to not be extracted properly.
2025-05-05[OpenMP] Add pre sm_70 load hack back in (#138589)Joseph Huber
Summary: Different ordering modes aren't supported for an atomic load, so we just do an add of zero as the same thing. It's less efficient, but it works. Fixes https://github.com/llvm/llvm-project/issues/138560
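The workaround in this entry, an atomic add of zero standing in for an atomic load, can be sketched in portable host C++. This is only an illustration under stated assumptions: it uses `std::atomic` rather than the device runtime's own helpers, and the names `loadViaFetchAdd`, `loadDirect`, and `exampleLoad` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: emulate an atomic load with a read-modify-write.
// Adding zero leaves the stored value unchanged and returns the old value,
// so the caller observes the same result a load would give.
inline uint32_t loadViaFetchAdd(std::atomic<uint32_t> &Value,
                                std::memory_order Order) {
  return Value.fetch_add(0u, Order);
}

// The direct form that the workaround stands in for.
inline uint32_t loadDirect(const std::atomic<uint32_t> &Value) {
  return Value.load(std::memory_order_acquire);
}

// Small usage example: both forms read back the stored value.
inline void exampleLoad() {
  std::atomic<uint32_t> Counter{42};
  assert(loadViaFetchAdd(Counter, std::memory_order_seq_cst) == 42u);
  assert(loadDirect(Counter) == 42u);
}
```

The read-modify-write form needs write access to the location and is typically more expensive than a plain load, which matches the commit's remark that it is less efficient but works.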
2025-03-18[OpenMP] Use 'gpuintrin.h' definitions for simple block identifiers (#131631)Joseph Huber
Summary: This patch ports the runtime to use `gpuintrin.h` instead of calling the builtins for most things. The `lanemask_gt` stuff was left for now with a fallback. AMD version for Ron https://gist.github.com/jhuber6/42014d635b9a8158727640876bf47226.
2025-02-09[OpenMP] Replace use of target address space with <gpuintrin.h> local (#126119)Joseph Huber
Summary: This definition is more portable since it defines the correct value for the target. I got rid of the helper mostly because I think it's easy enough to use now that it's a type and being explicit about what's `undef` or `poison` is good.
2025-02-05[OpenMP] Port the OpenMP device runtime to direct C++ compilation (#123673)Joseph Huber
Summary: This removes the use of OpenMP offloading to build the device runtime. The main benefit here is that we no longer need to rely on offloading semantics to build a device only runtime. Things like variants are now no longer needed and can just be simple if-defs. In the future, I will remove most of the special handling here and fold it into calls to the `<gpuintrin.h>` functions instead. Additionally I will rework the compilation to make this a separate runtime. The current plan is to have this, but make including OpenMP and offloading either automatically add it, or print a warning if it's missing. This will allow us to use a normal CMake workflow and delete all the weird 'let's pull the clang binary out of the build' business.
```
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
```
After that, linking the OpenMP device runtime will be `-Xoffload-linker -lomp`. I.e. no more fat binary business. Only look at the most recent commit since this includes the two dependencies (fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).
2025-01-31[Offload][NFC] Fix typos discovered by codespell (#125119)Christian Clauss
https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`
2025-01-21[OpenMP] Remove usage of pointer-to-member in lookup (#123671)Joseph Huber
Summary: This is buggy and is currently being tracked in https://github.com/llvm/llvm-project/issues/123241. For now, replace it with a macro so that we can use address spaces directly.
2025-01-20[OpenMP] Make each atomic helper take an atomic scope argument (#122786)Joseph Huber
Summary: Right now we just default to device for each type, and mix an ad-hoc scope with the one used by the compiler's builtins. Unify this and make each version take the scope optionally. For @ronlieb, this will remove the need for `add_system` in the fork as well as the extra `cas` with system scope, just pass `system`.
2025-01-20[OpenMP] Adjust 'printf' handling in the OpenMP runtime (#123670)Joseph Huber
Summary: We used to avoid a lot of this stuff because we didn't properly handle variadics in device code. That's been solved for now, so we can just make an internal printf handler that forwards to the external `vprintf` function. This is either provided by NVIDIA's SDK or by the GPU libc implementation. The main reason for doing this is because it prevents the stupid AMDGPU printf pass from mangling our beautiful printfs!
2025-01-20[OpenMP] Fix misspelled attribute and warningJoseph Huber
Summary: This is spelled `ompx_aligned_barrier` when used directly, but wasn't included in the list of known assumptions. Fix that so now the test works.
2025-01-20[OpenMP] Remove 'omp assumes' scopes now that we have no inline ASM (#123611)Joseph Huber
Summary: We used this globally scoped `ext_no_call_asm` as a sort of hack around the compiler that allowed the attributor to optimize out inline assembly calls to PTX instructions. Quite some time ago I got rid of every inline assembly call and replaced it with a builtin, so this can just be deleted. Furthermore, I use the `[[omp::assume]]` attribute directly for the aligned barrier usage. This prints an unknown assumption warning (even though it isn't) so I'm just silencing that for now until I fix it later. --------- Co-authored-by: Michael Kruse <github@meinersbur.de>
2025-01-16[OpenMP] Remove hack around missing atomic load (#122781)Joseph Huber
Summary: We used to do a fetch add of zero to approximate a load. This is because the NVPTX backend didn't handle this properly. It's not an issue anymore so simply use the proper atomic builtin.
2025-01-10[OpenMP] Fix missing type getter for SFINAE helperJoseph Huber
Summary: This didn't get the type, which made using this always return false.
2025-01-09[OpenMP] Use __builtin_bit_cast instead of UB type punning (#122325)Joseph Huber
Summary: Use a normal bitcast, remove from the shared utils since it's not available in GCC 7.4
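A minimal sketch of the idea in this entry, replacing pointer-based type punning with `__builtin_bit_cast`. It assumes a compiler that provides the builtin (recent Clang or GCC) and IEEE-754 binary32 `float`; the names `bitCast` and `exampleBitCast` are hypothetical and are not the runtime's actual helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reinterpret the object representation of Value as To.
// Unlike *(To *)&Value, this does not violate strict aliasing rules.
template <typename To, typename From>
inline To bitCast(const From &Value) {
  // Reject mismatched sizes at compile time.
  static_assert(sizeof(To) == sizeof(From),
                "bit cast requires source and result of equal size");
  return __builtin_bit_cast(To, Value);
}

// Small usage example, assuming IEEE-754 binary32 floats:
// 1.0f has the bit pattern 0x3F800000.
inline void exampleBitCast() {
  assert(bitCast<uint32_t>(1.0f) == 0x3F800000u);
  assert(bitCast<float>(0x3F800000u) == 1.0f);
}
```

The `static_assert` turns a size mismatch into a compile error instead of a silent truncation.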
2025-01-09[OpenMP] Update atomic helpers to just use headers (#122185)Joseph Huber
Summary: Previously we had some indirection here, this patch updates these utilities to just be normal template functions. We use SFINAE to manage the special case handling for floats. Also this strips address spaces so it can be used more generally.
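The SFINAE dispatch mentioned in this entry might look roughly like the following. This is a host-side sketch built on `std::atomic`, not the device runtime's code: the name `atomicAdd` and the choice of a compare-exchange loop for floating-point types are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Integral types: forward to the native fetch_add.
// Note the trailing ::type on enable_if; without it the overload would not
// name a usable return type.
template <typename T>
inline typename std::enable_if<std::is_integral<T>::value, T>::type
atomicAdd(std::atomic<T> &Addr, T Val) {
  return Addr.fetch_add(Val, std::memory_order_relaxed);
}

// Floating-point types: special-cased with a compare-exchange loop.
// On failure, compare_exchange_weak reloads Old, so the sum is recomputed.
template <typename T>
inline typename std::enable_if<std::is_floating_point<T>::value, T>::type
atomicAdd(std::atomic<T> &Addr, T Val) {
  T Old = Addr.load(std::memory_order_relaxed);
  while (!Addr.compare_exchange_weak(Old, Old + Val,
                                     std::memory_order_relaxed)) {
  }
  return Old;
}

// Small usage example: the same call syntax selects either overload.
inline void exampleAtomicAdd() {
  std::atomic<int> I{1};
  std::atomic<float> F{1.5f};
  assert(atomicAdd(I, 2) == 1 && I.load() == 3);
  assert(atomicAdd(F, 2.0f) == 1.5f && F.load() == 3.5f);
}
```

Both overloads return the previous value, so callers do not need to know which path was taken.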
2024-12-12[OpenMP] Replace AMDGPU fences with generic scoped fences (#119619)Joseph Huber
Summary: This is simpler and more common. I would've replaced the CUDA uses and made this the same but currently it doesn't codegen these fences fully and just emits a full system wide barrier as a fallback.
2024-09-13[OpenMP] Fix redefining `stdint.h` types (#108607)Joseph Huber
Summary: We can include `stdint.h` just fine as long as we don't allow it to find system headers, passing `-nostdlibinc` and `-nogpuinc` suppresses these extra paths so we will just use the clang resource headers for `stdint.h` and `stddef.h`.
2024-09-05[Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer (#100280)Johannes Doerfert
We had three `utils::` namespaces, all with different "meaning" (host, device, hsa_utils). We should, when we can, keep "include/Shared" accessible from host and device, thus RefCountTy has been moved to a separate header. `hsa_utils` was introduced to make `utils::` less overloaded. And common functionality was de-duplicated, e.g., `utils::advance` and `utils::advanceVoidPtr` -> `utils::advancePtr`. Type punning now checks for the size of the result to make sure it matches the source type. No functional change was intended.
2024-08-22[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (#102691)Ethan Luis McDonough
This pull request is a revised version of #76587. This pull request fixes some build issues that were present in the previous version of this change.
> This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.
2024-08-14[OpenMP] Implement 'omp_alloc' on the device (#102526)Joseph Huber
Summary: The 'omp_alloc' function should be callable from a target region. This patch implements it by simply calling `malloc` for every non-default trait value allocator. All the special access modifiers are unimplemented and return null. The null allocator returns null as the spec states it should not be usable from the target.
2024-07-26Reapply "[OpenMP][libc] Remove special handling for OpenMP printf (#98940)"Joseph Huber
This reverts commit fea5914c926e2f013a8b5e27eaa74c7047fb2c71.
2024-07-26Revert "[OpenMP][libc] Remove special handling for OpenMP printf (#98940)"Joseph Huber
This reverts commit 069e8bcd82c4420239f95c7e6a09e1f756317cfc. Summary: Some tests failing, revert this for now.
2024-07-26[OpenMP][libc] Remove special handling for OpenMP printf (#98940)Joseph Huber
Summary: Currently there are several layers to handle `printf`. Since we now have varargs and an implementation of `printf` this can be heavily simplified. 1. The frontend renames `printf` into `omp_vprintf` and gives it an argument buffer. Removing 1. triggered some code in the AMDGPU backend meant for HIP / OpenCL, so I added an exception to it. 2. Forward this to CUDA vprintf or ignore it. We no longer need special handling for it since we have varargs. So now we just forward this to CUDA vprintf if we have libc, otherwise just leave `printf` as an external function and expect that `libc` will be linked in.
2024-07-01[OpenMP][offload] Fix dynamic schedule tracking (#97065)Gheorghe-Teodor Bercea
This patch fixes the dynamic schedule tracking.
2024-06-28Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587)"Ethan Luis McDonough
This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.
2024-06-28[PGO][OpenMP] Instrumentation for GPU devices (#76587)Ethan Luis McDonough
This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes:
- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps info)

These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.
2024-06-03Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)" (#94139)Shilei Tian
2024-05-26Revert "Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)""Shilei Tian
This reverts commit 7b4865582299294455bc816358fd88a9c6e5e0be.
2024-05-26Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)"Shilei Tian
This reverts commit 9b31cc71d66064dfaf2afabf4a835211321bb4a0.
2024-05-24Revert "[OpenMP][OMPX] Add shfl_down_sync (#93311)"Joseph Huber
This reverts commit 098c6dfa8157681699a71fce9e3d94515e66311f. This reverts commit 8c718a3a91df4ab68dc3f1ca3887ea730c9aed84. This reverts commit 4fb02de9d490d0773441aa30124bb4d1272230d3.
2024-05-24[OpenMP][OMPX] Add shfl_down_sync (#93311)Shilei Tian
2024-05-24[OpenMP][OMPX] Add ballot_sync (#91297)Shilei Tian
This patch adds the support for `ballot_sync` in ompx.
2024-04-22[Offload] Move `/openmp/libomptarget` to `/offload` (#75125)Johannes Doerfert
In a nutshell, this moves our libomptarget code to populate the offload subproject. With this commit, users need to enable the new LLVM/Offload subproject as a runtime in their cmake configuration. No further changes are expected for downstream code. Tests and other components still depend on OpenMP and have also not been renamed. The results below are for a build in which OpenMP and Offload are enabled runtimes. In addition to the pure `git mv`, we needed to adjust some CMake files. Nothing is intended to change semantics.
```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests.
```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.
```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`

Fixes: https://github.com/llvm/llvm-project/issues/75124 --------- Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>