llvm-project.git/offload/DeviceRTL/src/State.cpp, branch release/20

[openmp][nfc] Use builtin align in the devicertl (#131918)

2025-03-18T21:31:49+00:00

Noticed while extracting the smartstack as a test case

Revert "[openmp][nfc] Refactor shared/lds smartstack for spirv (#131905)"

2025-03-18T20:43:05+00:00

This reverts commit c02b935a9be888bbdf9f8cb0bf980bd411ae5893.
Failed a check-offload test under CI

[openmp][nfc] Refactor shared/lds smartstack for spirv (#131905)

2025-03-18T20:33:24+00:00

Spirv doesn't have implicit conversions between address spaces (at least
at present, we might need to change that) and address space qualified
*this pointers are not handled well by clang. This commit changes the
single instance of the smartstack to be explicitly a singleton, for
fractionally simpler IR generation (no this pointer) and to sidestep the
work in progress spirv64-- openmp target not being able to compile the
original version.

[OpenMP] Replace use of target address space with local (#126119)

2025-02-09T16:25:25+00:00

Summary:
This definition is more portable since it defines the correct value for
the target. I got rid of the helper mostly because I think it's easy
enough to use now that it's a type and being explicit about what's
`undef` or `poison` is good.

[OpenMP] Port the OpenMP device runtime to direct C++ compilation (#123673)

2025-02-05T14:18:52+00:00

Summary:
This removes the use of OpenMP offloading to build the device runtime.
The main benefit here is that we no longer need to rely on offloading
semantics to build a device only runtime. Things like variants are now
no longer needed and can just be simple if-defs. In the future, I will
remove most of the special handling here and fold it into calls to the
`` functions instead. Additionally I will rework the
compilation to make this a separate runtime.

The current plan is to have this, but make including OpenMP and
offloading either automatically add it, or print a warning if it's
missing. This will allow us to use a normal CMake workflow and delete
all the weird 'lets pull the clang binary out of the build' business.
```
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
```

After that, linking the OpenMP device runtime will be `-Xoffload-linker
-lomp`. I.e. no more fat binary business.

Only look at the most recent commit since this includes the two
dependencies
(fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).

[OpenMP] Adjust 'printf' handling in the OpenMP runtime (#123670)

2025-01-21T03:56:46+00:00

Summary:
We used to avoid a lot of this stuff because we didn't properly handle
variadics in device code. That's been solved for now, so we can just
make an internal printf handler that forwards to the external `vprintf`
function. This is either provided by NVIDIA's SDK or by the GPU libc
implementation.

The main reason for doing this is because it prevents the stupid AMDGPU
printf pass from mangling our beautiful printfs!

[Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer (#100280)

2024-09-05T20:36:26+00:00

We had three `utils::` namespaces, all with different "meaning" (host,
device, hsa_utils). We should, when we can, keep "include/Shared"
accessible from host and device, thus RefCountTy has been moved to a
separate header. `hsa_utils` was introduced to make `utils::` less
overloaded. And common functionality was de-duplicated, e.g.,
`utils::advance` and `utils::advanceVoidPtr` -> `utils:advancePtr`. Type
punning now checks for the size of the result to make sure it matches
the source type.

No functional change was intended.

[OpenMP] Implement 'omp_alloc' on the device (#102526)

2024-08-14T18:38:55+00:00

Summary:
The 'omp_alloc' function should be callable from a target region. This
patch implemets it by simply calling `malloc` for every non-default
trait value allocator. All the special access modifiers are
unimplemented and return null. The null allocator returns null as the
spec states it should not be usable from the target.

[Offload] Move `/openmp/libomptarget` to `/offload` (#75125)

2024-04-22T16:51:33+00:00

In a nutshell, this moves our libomptarget code to populate the offload
subproject.

With this commit, users need to enable the new LLVM/Offload subproject
as a runtime in their cmake configuration.
No further changes are expected for downstream code.

Tests and other components still depend on OpenMP and have also not been
renamed. The results below are for a build in which OpenMP and Offload
are enabled runtimes. In addition to the pure `git mv`, we needed to
adjust some CMake files. Nothing is intended to change semantics.

```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests

```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.

```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`

Fixes: https://github.com/llvm/llvm-project/issues/75124

---------

Co-authored-by: Saiyedul Islam