llvm-project.git/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp, branch users/MaskRay/spr/main.elf-orphan-placement-remove-hasinputsections-condition

[LoopUnroll] Consider convergence control tokens when unrolling (#91715)

2024-06-06T07:43:46+00:00

- There is no restriction on a loop with controlled convergent
operations when
  the relevant tokens are defined and used within the loop.

- When a token defined outside a loop is used inside (also called a loop
convergence heart), unrolling is allowed only in the absence of
remainder or
  runtime checks.

- When a token defined inside a loop is used outside, such a loop is
said to be
"extended". This loop can only be unrolled by also duplicating the
extended part
  lying outside the loop. Such unrolling is disabled for now.

- Clean up loop hearts: When unrolling a loop with a heart, duplicating
the
heart will introduce multiple static uses of a convergence control token
in a
cycle that does not contain its definition. This violates the static
rules for
tokens, and needs to be cleaned up into a single occurrence of the
intrinsic.

- Spell out the initializer for UnrollLoopOptions to improve
readability.


Original implementation [D85605] by Nicolai Haehnle
.

[OpenMP][LLVM] Update alloca IP after `PrivCB` in `OMPIRBUIlder` (#93920)

2024-06-05T03:13:47+00:00

Fixes a crash uncovered by
[pr77666.f90](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr77666.f90)
in the test suite when delayed privatization is enabled by default.

In particular, whenever `PrivCB` (the callback responsible for
generating privatizaiton logic for an OMP variable) generates a
multi-block privatization region, the insertion point diverges: the BB
component of the IP can become a different BB from the parent block of
the instruction iterator component of the IP. This PR updates the IP to
make sure that the BB is the parent block of the instruction iterator.

[flang][MLIR][OpenMP] make reduction by-ref toggled per variable (#92244)

2024-05-16T14:27:59+00:00

Fixes #88935

Toggling reduction by-ref broke when multiple reduction clauses were
used. Decisions made for the by-ref status for later clauses could then
invalidate decisions for earlier clauses. For example,

```
reduction(+:scalar,scalar2) reduction(+:array)
```

The first clause would choose by value reduction and generate by-value
reduction regions, but then after this the second clause would force
by-ref to support the array argument. But by the time the second clause
is processed, the first clause has already had the wrong kind of
reduction regions generated.

This is solved by toggling whether a variable should be reduced by
reference per variable. In the above example, this allows only `array`
to be reduced by ref.

[mlir][OpenMP] - Honor dependencies in code-generation of the if clause in `omp.task` correctly (#90891)

2024-05-13T13:54:23+00:00

This patch fixes the code generation of the if clause, specifically when the
condition evaluates to false and when the task directive has the depend
clause on it. When the if clause of a task construct evaluates to false,
then the task is an undeferred task. This undeferred task still has to
honor dependencies. Previously, the OpenMPIRbuilder didn't honor
dependencies. This patch fixes that.

Fixes https://github.com/llvm/llvm-project/issues/90869

[flang][OMPIRBuilder] Keep debug location in sync with insert point. (#89953)

2024-05-10T09:12:24+00:00

A customer reported an issue which I have reduced to the test in the PR.
If built with debug info enabled, the build fails with the following
error in the verifier.

!dbg attachment points at wrong subprogram for function

The problem happened because some of the functions in OMPIRBuilder.cpp
updated the insertion point with the passed in location but did not
change the current debug location. This caused a stale debug location to
be attached to the instruction.

I have solved it by replacing restoreIP with updateToLocation which
updates both the insertion point and debug location. The
updateToLocation is used in many places already, so this PR brings
functions that I have changed in line with rest of the file.

Slight issue is that I am not checking the return type of
updateToLocation as there is no good value I could return in that case.
But if we have a condition where updateToLocation will return false,
these functions will fail in any case.

I have added a test that checks that build does not fail. I was not sure
what is the correct location for the test should be. Happy to move it to
more appropriate location.

[MLIR][OpenMP] Extend omp.private materialization support: `dealloc` (#90841)

2024-05-03T06:59:01+00:00

Extends current support for delayed privatization during translation to
LLVM IR. This adds support for materlizaing the `dealloc` region in
`omp.private` ops when this region contains clean-up/deallocation logic
that needs to be executed at the end of the parallel region.

This changes the `OMPIRBuilder` slightly to execute the finalization
callback **after** the privatization callback. This allows us to collect
information about privatized variables on the MLIR and LLVM sides so
that we can properly emit deallocation logic.

[OpenMP] Remove 'minncta' attributes from NVPTX kernels (#88398)

2024-04-15T20:28:05+00:00

Summary:
Currently we treat this attribute as a minimum number for the amount of
blocks scheduled on the kernel. However, the doucmentation states that
this applies to CTA's mapped onto a *single* SM. Currently we just set
it to the total number of blocks, which will almost always result in a
warning that the value is out of range and will be ignored. We don't
have a good way to automatically know how many CTAs can be put on a
single SM nor if we should do this, so we should probably leave this up
to users manually adding it.


https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives-minnctapersm

[OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels (#87695)

2024-04-05T12:38:01+00:00

Summary:
This new attribute was introduced recently. We already do this for NVPTX
kernels so we should apply this for AMDGPU as well. This patch simply
applies this metadata in cases where a lower bound is known

Reapply "[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737)"

2024-03-19T15:49:10+00:00

Fixes a build error caused by an unupdated getAsInstruction callsite in clang.

This reverts commit ab851f7fe946e7eed700ef9d82082eb721860189.

Revert "[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737)"

2024-03-19T14:41:27+00:00

Reverted due to buildbot failures:
https://lab.llvm.org/buildbot/#/builders/139/builds/61717/

This reverts commit 7ef433f62c199c414bffdcac1c8ee3159b29c5f5.