llvm-project.git/openmp/libomptarget/DeviceRTL/src/Parallelism.cpp, branch main

[Offload] Move `/openmp/libomptarget` to `/offload` (#75125)

2024-04-22T16:51:33+00:00

In a nutshell, this moves our libomptarget code to populate the offload
subproject.

With this commit, users need to enable the new LLVM/Offload subproject
as a runtime in their cmake configuration.
No further changes are expected for downstream code.

Tests and other components still depend on OpenMP and have also not been
renamed. The results below are for a build in which OpenMP and Offload
are enabled runtimes. In addition to the pure `git mv`, we needed to
adjust some CMake files. Nothing is intended to change semantics.

```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests.

```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.

```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`

Fixes: https://github.com/llvm/llvm-project/issues/75124

---------

Co-authored-by: Saiyedul Islam

[libomptarget][NFC] Outline parallel SPMD function (#78642)

2024-01-29T16:41:35+00:00

This patch outlines the SPMD code path into a separate function that can
be called directly.

[Libomptarget][NFC] Format in-line comments consistently (#77530)

2024-01-10T16:10:08+00:00

Summary:
The LLVM style uses /*Foo=*/ when indicating the name of a constant. See
https://llvm.org/docs/CodingStandards.html#comment-formatting. This is
useful for consistency, as well as because `clang-format` understands
this syntax and formats it more cleanly. Do a bulk update of this
syntax.
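
A minimal sketch of the comment style described above. The helper `clampThreads` and its parameters are hypothetical and exist only to show the `/*Name=*/` form at a call site; they are not taken from the DeviceRTL sources.

```cpp
#include <cstdint>

// Hypothetical helper, used only to illustrate the comment style.
inline std::uint32_t clampThreads(std::uint32_t Requested, std::uint32_t Limit,
                                  bool AllowZero) {
  if (Requested == 0 && !AllowZero)
    return 1;
  return Requested < Limit ? Requested : Limit;
}

inline std::uint32_t exampleCall() {
  // Each literal is annotated with the parameter name followed by '=',
  // with no spaces inside the comment.
  return clampThreads(/*Requested=*/0, /*Limit=*/128, /*AllowZero=*/false);
}
```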

Attributes (#69358)

2023-10-18T16:52:43+00:00

- [Libomptarget] Make the references to 'malloc' and 'free' weak.
- [Libomptarget][NFC] Use C++ style attributes instead
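
The attribute spelling change can be sketched as follows. This is an illustration under assumptions: the commit makes the references to `malloc` and `free` weak, whereas the sketch uses a weak definition of a hypothetical symbol, `hypotheticalDefaultTeamSize`, so that it links on its own.

```cpp
// Old GNU spelling, shown for comparison only:
//   __attribute__((weak)) int hypotheticalDefaultTeamSize();

// C++11 attribute spelling. A strong definition in another translation unit
// would take precedence over this weak one at link time.
[[gnu::weak]] int hypotheticalDefaultTeamSize() { return 32; }

// Hypothetical caller; it picks up whichever definition the linker selected.
int queryTeamSize() { return hypotheticalDefaultTeamSize(); }
```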

[OpenMP] Force the parallel abstraction to be inlined

2023-08-23T18:48:18+00:00

This is good for performance and compile time and the indirection (+
switch statements) is nothing that needs to be preserved.
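
A rough sketch of why forced inlining helps with a switch-based abstraction. The names `Mode`, `teamSize`, and `spmdTeamSize`, and the arithmetic in the generic case, are hypothetical; the point is only that once the helper is inlined into a caller that passes a constant, the optimizer can fold the switch away.

```cpp
// Hypothetical execution modes, for illustration only.
enum class Mode { SPMD, Generic };

// Forcing inlining exposes the constant Mode argument at each call site.
[[gnu::always_inline]] inline int teamSize(Mode M, int BlockSize,
                                           int WarpSize) {
  switch (M) {
  case Mode::SPMD:
    return BlockSize;
  case Mode::Generic:
    // Assumed layout: one warp is reserved and does not join the team.
    return BlockSize - WarpSize;
  }
  return 0;
}

// After inlining, only the SPMD case remains in this caller.
int spmdTeamSize(int BlockSize) {
  return teamSize(Mode::SPMD, BlockSize, /*WarpSize=*/32);
}
```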

[Libomptarget] Remove debug RAII from libomptarget

2023-08-03T14:37:47+00:00

This feature was supposed to allow you to trace execution inside of
Libomptarget. However, this never really worked properly. The printing
was always reorganized, only worked for a single thread, and pretty much
only told you a handful of things about a runtime library that's an
implementation detail to all users. Despite this, it contributed about
40% of the total filesize of the deviceRTL. This patch simply removes
this functionality, which I think was past due.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D157001

[OpenMP][NFC] Reorganize the ompx::mapping layer in the GPU runtime

2023-07-31T20:44:51+00:00

This change makes the naming more consistent, I hope.

[OpenMP][NFCI] Avoid storing non-constant values in ICV

2023-07-18T23:50:50+00:00

If we store a constant in an ICV it is easier for the optimizer to
propagate it. Since we often use the full block for the thread limit and
the parallel team size, we can instead replace that dynamic value with a
constant that otherwise cannot occur, here 0.
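
The sentinel idea can be sketched like this, assuming a hypothetical state struct and accessors (`HypotheticalICVs`, `setTeamSize`, `getTeamSize`) rather than the actual DeviceRTL state. A request for the full block is stored as the constant 0 and translated back on read.

```cpp
#include <cstdint>

// Hypothetical ICV storage; 0 means "use the full block".
struct HypotheticalICVs {
  std::uint32_t ParallelTeamSize = 0;
};

// Requested is assumed to be at least 1, so 0 cannot occur as a real size.
inline void setTeamSize(HypotheticalICVs &S, std::uint32_t Requested,
                        std::uint32_t BlockSize) {
  // Store the constant sentinel instead of the dynamic block size.
  S.ParallelTeamSize = Requested >= BlockSize ? 0 : Requested;
}

inline std::uint32_t getTeamSize(const HypotheticalICVs &S,
                                 std::uint32_t BlockSize) {
  // Translate the sentinel back to the dynamic value only when reading.
  return S.ParallelTeamSize == 0 ? BlockSize : S.ParallelTeamSize;
}
```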

[OpenMP][NFCI] Split assertion message from assertion expression

2023-07-18T23:50:50+00:00

We ended up with `llvm.assume(icmp ne ptr as(4) null, as(4) @str)`
because the string in address space 4 was not known to be non-null.
There is no need to create these assumes.
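
One way to picture the split, as a sketch under assumptions: the macros `HYP_ASSERT` and `HYP_ASSUME` and the function `divideThreads` are hypothetical and not the DeviceRTL macros. With the message passed as its own argument, only the condition reaches the assume, so no assumption about the string pointer is generated.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical assume wrapper; only Clang's builtin is used here.
#if defined(__clang__)
#define HYP_ASSUME(Expr) __builtin_assume(Expr)
#else
#define HYP_ASSUME(Expr) ((void)0)
#endif

#ifdef HYP_DEBUG
// Debug builds check the condition and print the separate message.
#define HYP_ASSERT(Expr, Msg)                                                  \
  do {                                                                         \
    if (!(Expr)) {                                                             \
      std::fprintf(stderr, "Assertion failed: %s\n", Msg);                     \
      std::abort();                                                            \
    }                                                                          \
  } while (0)
#else
// Release builds assume only the condition; the message is dropped.
#define HYP_ASSERT(Expr, Msg) HYP_ASSUME(Expr)
#endif

inline int divideThreads(int Threads, int Teams) {
  HYP_ASSERT(Teams > 0, "Teams must be positive");
  return Threads / Teams;
}
```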

[OpenMP] Ensure memory fences are created with barriers for AMDGPUs

2023-04-17T22:27:17+00:00

It turns out that the __builtin_amdgcn_s_barrier() alone does not emit
a fence. We somehow got away with this and assumed it would work as it
(hopefully) is correct on the NVIDIA path where we just emit a
__syncthreads. After talking to @arsenm we now (mostly) align with the
OpenCL barrier implementation [1] and emit explicit fences for AMDGPUs.

It seems this was the underlying cause for #59759, but I am not 100%
certain. There is a chance this simply hides the problem.

Fixes: https://github.com/llvm/llvm-project/issues/59759

[1] https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/07b347366eb2c6ebc3414af323c623cbbbafc854/opencl/src/workgroup/wgbarrier.cl#L21
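
The fence-around-barrier ordering described in this commit can be sketched as below. The function name `hypotheticalTeamBarrier` is an assumption, as is the choice of a release fence before and an acquire fence after the barrier; the actual runtime code may select orderings differently. The host branch is only a placeholder so the sketch compiles outside a device build.

```cpp
#include <atomic>

// Hypothetical helper name; not the DeviceRTL function.
inline void hypotheticalTeamBarrier() {
#if defined(__AMDGCN__)
  // Publish this thread's prior writes to the workgroup before waiting.
  __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup");
  // Execution barrier only; by itself it does not emit a fence.
  __builtin_amdgcn_s_barrier();
  // Make other threads' writes visible after the barrier.
  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "workgroup");
#else
  // Host placeholder: fences only, no cross-thread rendezvous happens here.
  std::atomic_thread_fence(std::memory_order_release);
  std::atomic_thread_fence(std::memory_order_acquire);
#endif
}
```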