llvm-project.git/offload, branch users/matthias-springer/subset2

[OpenMP] Fix num_iters in __kmpc_*_loop DeviceRTL functions (#133435)

2025-04-01T09:29:08+00:00

This patch removes the addition of 1 to the number of iterations when
calling the following DeviceRTL functions:
- `__kmpc_distribute_for_static_loop*`
- `__kmpc_distribute_static_loop*`
- `__kmpc_for_static_loop*`

Calls to these functions are currently only produced by the OMPIRBuilder
from flang, which already passes the correct number of iterations to
these functions. Adding 1 to the received `num_iters` value in the runtime
can make worksharing produce incorrect results. This impacts flang OpenMP
offloading of `do`, `distribute` and `distribute parallel do`
constructs.

Requiring the application to pass `tripcount - 1` as the argument would
be unintuitive as well, so rather than updating flang I think it makes
more sense to update the runtime.
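
The off-by-one described above can be illustrated with a minimal sketch. This
is a hypothetical model, not the DeviceRTL implementation: the function
`run_static_loop`, its round-robin schedule, and the `legacy_add_one` flag are
all assumptions made for illustration. It only shows what happens when a
caller already passes the full trip count and the callee adds 1 again.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a static worksharing loop; NOT the DeviceRTL code.
// Worker `tid` runs iterations tid, tid + num_threads, ... below the bound.
// `legacy_add_one` mimics the old behavior of bumping the received count.
std::vector<std::uint64_t> run_static_loop(std::uint64_t num_iters,
                                           unsigned num_threads,
                                           bool legacy_add_one) {
  assert(num_threads > 0 && "need at least one worker");
  const std::uint64_t bound = legacy_add_one ? num_iters + 1 : num_iters;
  std::vector<std::uint64_t> executed;
  for (unsigned tid = 0; tid < num_threads; ++tid)
    for (std::uint64_t iv = tid; iv < bound; iv += num_threads)
      executed.push_back(iv); // stands in for calling the loop body with iv
  return executed;
}
```

In this model, a trip count of 8 with 4 workers yields 8 body invocations
when the count is used as received, and 9 when 1 is added, the extra one
being the out-of-range index 8.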

Reland "Symbolize line zero as if no source info is available (#124846)" (#133798)

2025-03-31T23:13:46+00:00

This relands commits 23aca2f88dd5d2447e69496c89c3ed42a56f9c31 and
1b15a89a23c631a8e2d096dad4afe456970572c0.
https://github.com/llvm/llvm-project/pull/128619 makes the symbolizer
always use debug info when available, so we can reland this change.

[OFFLOAD] Stricter enforcement of user offload disable (#133470)

2025-03-28T22:28:14+00:00

If the user specifies that offload is disabled (e.g.,
OMP_TARGET_OFFLOAD=disable), disable the library almost completely. This
reduces resources spent to a minimum and ensures all APIs behave as if
the only available device is the host device.

Currently, some of the APIs behave as if there were devices available for
offload even under OMP_TARGET_OFFLOAD=disable.

---------

Co-authored-by: Joseph Huber

[PGO][Offload] Disable PGO on NVPTX (#133522)

2025-03-28T21:32:32+00:00

[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)

2025-03-28T12:35:16+00:00

Summary:
When we were first porting to COV5, this led to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.

[offload] Remove bad assert in StaticLoopChunker::Distribute (#132705)

2025-03-28T09:53:00+00:00

When building with asserts enabled, this can actually cause strange
miscompilations because an incorrect llvm.assume is generated at the
point of the assertion.

[Offload] Guard HSA implicit arguments if they aren't created (#133073)

2025-03-26T13:54:33+00:00

Summary:
We conditionally allocate the implicit arguments, so they possibly are
null. The flang compiler seems to hit this case, even though it
shouldn't when it's supposed to conform to the HSA code object. For now,
guard this to fix the regression and cover a case in the future where
someone rolls a fully custom implementation.

Fixes: https://github.com/llvm/llvm-project/issues/132982
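
The kind of guard described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the plugin's code: the
`ImplicitArgs` struct, its two fields, and `fill_implicit_args` are
hypothetical names, and the real HSA implicit-argument block has a different,
larger layout.

```cpp
#include <cstdint>

// Hypothetical layout; the real HSA implicit-argument block has more fields.
struct ImplicitArgs {
  std::uint32_t block_count_x = 0;
  std::uint32_t group_size_x = 0;
};

// Fills the implicit arguments only when they were actually allocated.
// Returns true if the fields were written, false if the block was absent.
bool fill_implicit_args(ImplicitArgs *args, std::uint32_t blocks,
                        std::uint32_t threads) {
  if (!args)
    return false; // conditionally allocated, so it may legitimately be null
  args->block_count_x = blocks;
  args->group_size_x = threads;
  return true;
}
```

Returning a flag instead of asserting keeps the launch path usable for
kernels that never had the block allocated.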

[Offload] Remove handling for COV4 binaries from offload/ (#131033)

2025-03-24T23:58:20+00:00

Summary:
We moved from COV4 to COV5 a long time ago, and COV4 support blocks
simplifying some front-end code, so we should be able to go ahead with this.

[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268)

2025-03-20T00:01:38+00:00

This pull request is the third part of an ongoing effort to extend PGO
instrumentation to GPU device code and depends on
https://github.com/llvm/llvm-project/pull/93365. This PR makes the
following changes:

- Allows PGO flags to be supplied to GPU targets
- Pulls version global from device
- Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to
allow the PGO version to be overridden

[OpenMP] Replace utilities with 'gpuintrin.h' definitions (#131644)

2025-03-19T15:47:21+00:00

Summary:
Port more instructions. AMD version is at
https://gist.github.com/jhuber6/235d7ee95f747c75f9a3cfd8eedac6aa