Post-commit fix of #164794 reported at
https://github.com/llvm/llvm-project/pull/164794#issuecomment-3536253493
`LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` are used by
`AddLLVM.cmake` as output directories. Unless we are in a bootstrapping
build, they must not point to directories found by `find_package(LLVM)`,
which may be read-only. MLIR, for instance, sets these variables to its
own build output directory, and so should the runtimes.
|
|
omp_get_device_from_uid()" (#168547)
Reverts llvm/llvm-project#164392 due to fortran issues
|
|
(#164392)
Use the implementation in libomptarget. If libomptarget is not
available, always return the UID / device number of the host / the
initial device.
|
|
(#140562)
This patch adds support for emitting default declare mappers for the
implicit mapping of derived types when none is supplied by the user.
This especially helps with mapping allocatables of derived types.
|
|
After #163011 was merged, the tests in
[`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1)
broke because the offload plugins were no longer able to find
`__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm`
visible to the host on GPU targets and reverts the changes made in
f7e9968a5ba99521e6e51161f789f0cc1745193f.
|
|
Summary:
I made that an unimplemented error, but forgot that it was used for this
environment variable.
|
|
Summary:
This was a lot of code that was only used for upstream LLVM builds of
AMDGPU offloading. We now have a generic and fast `malloc` in `libc`, so
just use that. This simplifies the code; it can be added back if we start
providing alternate forms, but I don't think there's a single use case
that would justify it yet.
|
|
offsets (#165486)
Currently this is calculated incorrectly, which results in incorrect
index offsets in more complicated array slices. This PR tries to address
this by refactoring the calculation and making it more
correct.
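The following is a rough illustration of what an "index offset" into a multi-dimensional slice involves. It assumes Fortran's column-major layout with arbitrary lower bounds. The helper name `colmajor_offset` is hypothetical, and this is not the code changed by the PR:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: linear element offset of index idx[] in a
 * column-major array where dimension d has lower bound lb[d] and
 * extent ext[d]. Dimension 0 varies fastest, as in Fortran. */
static size_t colmajor_offset(size_t rank, const long idx[], const long lb[],
                              const long ext[]) {
  size_t offset = 0;
  size_t stride = 1; /* elements between consecutive indices in dim d */
  for (size_t d = 0; d < rank; ++d) {
    assert(idx[d] >= lb[d] && idx[d] < lb[d] + ext[d]);
    offset += (size_t)(idx[d] - lb[d]) * stride;
    stride *= (size_t)ext[d]; /* next dimension skips a whole column/plane */
  }
  return offset;
}
```

For example, the start of the slice `a(3:7, 2:4)` of an array declared `a(1:10, 1:5)` is element `(3, 2)`, at offset `(3-1) + (2-1)*10 = 12`.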
|
|
Adds the omp_target_is_accessible routine.
Refactors common code out of omp_target_is_present so that it works for
both routines.
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
|
|
- ompt_target_data_op_t, ompt_scope_endpoint_t and ompt_target_t are now
printed as strings instead of just numbers to ease debugging
- some missing clang-format clauses have been added
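The following minimal sketch shows the "print as a string instead of a number" idea for one of these enums. It uses a local stand-in enum whose values are assumed to mirror `ompt_scope_endpoint_t` in `omp-tools.h`. The `demo_` names and helper functions are hypothetical; a real tool would include `omp-tools.h` instead:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in assumed to mirror ompt_scope_endpoint_t (hypothetical). */
typedef enum {
  demo_scope_begin = 1,
  demo_scope_end = 2,
  demo_scope_beginend = 3
} demo_scope_endpoint_t;

/* Map an endpoint value to its spec name; NULL if unrecognized. */
static const char *scope_endpoint_name(int endpoint) {
  switch (endpoint) {
  case demo_scope_begin:    return "ompt_scope_begin";
  case demo_scope_end:      return "ompt_scope_end";
  case demo_scope_beginend: return "ompt_scope_beginend";
  default:                  return NULL;
  }
}

/* Format for debug output: the name if known, else the raw number. */
static void format_scope_endpoint(char *buf, size_t len, int endpoint) {
  assert(buf != NULL && len > 0);
  const char *name = scope_endpoint_name(endpoint);
  if (name)
    snprintf(buf, len, "%s", name);
  else
    snprintf(buf, len, "unknown (%d)", endpoint);
}
```

Falling back to the raw number keeps unexpected values visible in the log instead of hiding them.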
|
|
The failure was reported here:
https://github.com/llvm/llvm-project/pull/164039#issuecomment-3425429556
The test was checking for the "bad" behavior so as to keep track of it,
but there seem to be some issues with pointer arithmetic specific to
aarch64. For now, the test no longer fully checks for the "bad" behavior.
We may need to debug further if similar issues show up once the codegen
has been fixed.
|
|
Two of the tests are currently asserting, and two are emitting
unexpected results.
The asserting tests will be fixed using the ATTACH-style codegen from
#153683.
The other two involve `use_device_addr` on byrefs and need further
codegen changes, which have been noted in a FIXME comment.
|
|
While people look into it, xfail the tests.
|
|
The test covers some of the identifier symbols in the interop runtime.
For now, this test guards against complete breakage, which previously
went uncaught by our buildbots because the other `interop.c` test was
not enabled on AMD.
|
|
(#154349)
This PR adds deferral of descriptor maps until they are necessary for
assumed dummy argument types. The intent is to avoid a problem where a
user can inadvertently map a temporary local descriptor to device
without their knowledge and proceed to never unmap it. This temporary
local descriptor remains lodged in OpenMP device memory and the next
time another variable or descriptor residing in the same stack address
is mapped we incur a runtime OpenMP map error as we try to remap the
same address.
This fix was discussed with the OpenMP committee and applies to OpenMP
5.2 and below; future versions of OpenMP can avoid this issue via the
attach semantics added to the specification.
|
|
types (#160766)
This PR adds support for nested derived types and their mappers to the
MapInfoFinalization pass.
- Generalize MapInfoFinalization to add child maps for arbitrarily
nested allocatables when a derived object is mapped via declare mapper.
- Traverse HLFIR designates rooted at the target block arg and build
full coordinate_of chains; append members with correct membersIndex.
This fixes #156461.
|
|
Summary:
Several of these tests have been failing for literal years. Ideally we
would make efforts to fix them, but keeping them broken has had serious
consequences for our testing infrastructure: failures are the norm, so
almost all test failures are disregarded. I made a tracking issue for
the ones that have been disabled.
https://github.com/llvm/llvm-project/issues/161265
|
|
Fortran no-loop test is supported only for GPU.
|
|
Enable the generation of no-loop kernels for Fortran OpenMP code.
`target teams distribute parallel do` constructs can be promoted to
no-loop kernels if the user adds the
`-fopenmp-assume-teams-oversubscription` and
`-fopenmp-assume-threads-oversubscription` flags.
If the OpenMP kernel contains `reduction` or `num_teams` clauses, it is
not promoted to no-loop mode.
The global OpenMP device RTL oversubscription flags no longer force
no-loop code generation for Fortran.
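The following is a conceptual host-side simulation of the difference between a generic loop kernel and a no-loop kernel. It is not the actual codegen. It assumes, roughly, that the oversubscription flags permit launching enough teams and threads that every iteration gets its own thread. All function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Generic form: each thread strides over the iteration space, so any
 * positive number of threads covers all n iterations. */
static void body_with_loop(size_t tid, size_t nthreads, size_t n, int *a) {
  for (size_t i = tid; i < n; i += nthreads)
    a[i] += 1;
}

/* No-loop form: each thread runs at most one iteration. This is only
 * correct if at least n threads are launched. */
static void body_no_loop(size_t tid, size_t n, int *a) {
  if (tid < n)
    a[tid] += 1;
}

/* Simulated launch: run every "thread" id sequentially on the host. */
static void launch(size_t nthreads, size_t n, int *a, int no_loop) {
  assert(nthreads > 0);
  for (size_t tid = 0; tid < nthreads; ++tid) {
    if (no_loop)
      body_no_loop(tid, n, a);
    else
      body_with_loop(tid, nthreads, n, a);
  }
}
```

With fewer threads than iterations, the no-loop form silently skips work. That is why the promotion is gated on the user opting in, and why kernels with clauses such as `num_teams` are excluded.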
|
|
derived types" (#160759)
Reverts llvm/llvm-project#160116
|
|
types (#160116)
This PR adds support for nested derived types and their mappers to the
MapInfoFinalization pass.
- Generalize MapInfoFinalization to add child maps for arbitrarily
nested allocatables when a derived object is mapped via declare mapper.
- Traverse HLFIR designates rooted at the target block arg and build
full coordinate_of chains; append members with correct membersIndex.
This fixes #156461.
|
|
With declare mapper, the parent base entry was emitted as `TARGET_PARAM`
only. The mapper received a map-type without `to/from`, causing
components to degrade to `alloc`-only (no copies), breaking allocatable
payload mapping. This PR preserves the map-type bits from the parent.
This fixes #156466.
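The following bit-level sketch illustrates the bug and the fix. The flag values are assumed to mirror libomptarget's `OMP_TGT_MAPTYPE_TO` (0x1), `OMP_TGT_MAPTYPE_FROM` (0x2) and `OMP_TGT_MAPTYPE_TARGET_PARAM` (0x20). The names are hypothetical, and `component_type` is a simplified version of how a mapper narrows a member's map-type to the one it was invoked with:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror libomptarget's map-type flag values (hypothetical names). */
enum {
  MAPTYPE_TO = 0x001,
  MAPTYPE_FROM = 0x002,
  MAPTYPE_TARGET_PARAM = 0x020
};

/* Before: the base entry passed to the mapper carried only TARGET_PARAM. */
static uint64_t mapper_type_before(uint64_t parent) {
  (void)parent; /* the parent's to/from bits were dropped */
  return MAPTYPE_TARGET_PARAM;
}

/* After: keep the parent's TO/FROM bits alongside TARGET_PARAM. */
static uint64_t mapper_type_after(uint64_t parent) {
  return MAPTYPE_TARGET_PARAM | (parent & (MAPTYPE_TO | MAPTYPE_FROM));
}

/* Simplified mapper rule: a member keeps only the TO/FROM bits that the
 * mapper was invoked with; with none left it degrades to alloc-only. */
static uint64_t component_type(uint64_t incoming, uint64_t member) {
  const uint64_t motion = MAPTYPE_TO | MAPTYPE_FROM;
  return (member & ~motion) | (member & incoming & motion);
}

/* Data is copied only if TO or FROM survives. */
static int copies_data(uint64_t maptype) {
  return (maptype & (MAPTYPE_TO | MAPTYPE_FROM)) != 0;
}
```

With a `tofrom` parent and a `tofrom` member, the "before" path yields an alloc-only component. The "after" path keeps `tofrom`.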
|
|
Adds end-to-end tests for `do concurrent` offloading to the device.
PR stack:
- https://github.com/llvm/llvm-project/pull/155754
- https://github.com/llvm/llvm-project/pull/155987
- https://github.com/llvm/llvm-project/pull/155992
- https://github.com/llvm/llvm-project/pull/155993 ◀️
- https://github.com/llvm/llvm-project/pull/157638
- https://github.com/llvm/llvm-project/pull/156610
- https://github.com/llvm/llvm-project/pull/156837
|
|
The refactoring led to an additional data transfer. This changes the
assumed transfers in the check-strings to work with that changed
behavior.
|
|
Per the logic in top-level CMakeLists, `libomptarget` is placed into
`LLVM_LIBRARY_OUTPUT_INTDIR` when this variable is set. Adjust the test
logic to include this directory in `-L` and `-Wl,-rpath` arguments as
well, in order to fix finding tests when building via the `runtimes`
top-level directory.
Signed-off-by: Michał Górny <mgorny@gentoo.org>
|
|
Currently, there are a number of issues with mapping characters with a
specified LEN (effectively strings). They're represented as a char type
in FIR with a len parameter, and are later expanded into an array of
characters when we translate to the LLVM dialect. However, we don't
generate bounds for these at lowering. The fix in this PR is to generate
bounds from the LEN parameter and attach them to the map on lowering
from FIR to the LLVM dialect when we encounter this
type.
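The following is a sketch of the idea only, not the MLIR code. A `CHARACTER(LEN=n)` value expanded to an array of `n` characters needs bounds covering indices `0 .. n-1`. The `char_bounds` struct and helpers are hypothetical stand-ins for a map bounds entry, and one byte per default-kind character is assumed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a map bounds entry, in elements. */
typedef struct {
  size_t lower;  /* zero-based first index */
  size_t upper;  /* zero-based last index */
  size_t extent; /* number of elements */
} char_bounds;

/* Bounds for a character value of length len (assumes len > 0). */
static char_bounds bounds_from_len(size_t len) {
  assert(len > 0);
  char_bounds b = {0, len - 1, len};
  return b;
}

/* Bytes to transfer for those bounds (one byte per character assumed). */
static size_t mapped_bytes(char_bounds b) {
  return (b.upper - b.lower + 1) * sizeof(char);
}
```

Without such bounds, the map has no length to derive the transfer size from, which is the gap the PR closes.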
|
|
This PR adds several new tests for mapping of chained structures, i.e.
those resembling:
#pragma omp target map(tofrom: a->b->c)
These are currently XFAILed, although the first two tests actually work
with unified memory -- I'm not sure if it's possible to easily improve
the condition on the XFAILs in question to make them more accurate.
These cases are all fixed by the WIP PR
https://github.com/llvm/llvm-project/pull/153683.
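The following minimal C sketch shows the kind of chained-structure construct these tests exercise. It uses hypothetical types `struct A` and `struct B`. Built without OpenMP support, the pragma is ignored and the code runs on the host. Whether the device mapping is correct is exactly what the XFAILed tests track, so this is not a claim that it works on every target today:

```c
#include <assert.h>

/* Hypothetical types: c is an int reached through two pointer hops. */
struct B { int c; };
struct A { struct B *b; };

/* Map a->b->c and update it in a target region. On a device, a and a->b
 * must be attached correctly for the pointer walk to be valid there. */
static int bump(struct A *a) {
#pragma omp target map(tofrom: a->b->c)
  a->b->c += 1;
  return a->b->c;
}
```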
|
|
Reverts llvm/llvm-project#156675 due to regressions in standalone build
and test errors without all plugins enabled (#157345).
|
|
Summary:
Make the main `check-offload` test suite depend on the unit tests.
This matches what the other projects do: pass the directory to
`llvm-lit` to run only the lit tests, and use `check-offload-unit` to
run only the unit tests.
|
|
(Re)enables a couple of tests that were disabled on AMDGPU for some
reason. Pass for me locally.
|
|
These were not updated after #154736.
|
|
These tests currently fail when libc is not configured to be built as
they require printf to be available in target regions.
|
|
Most tests are either compfailing or runfailing.
They should start passing once we start using ATTACH map-type based
codegen. (#153683)
Even after they start passing, there are a few places where the EXPECTED
and actual CHECKs are different, due to two main issues:
* use_device_ptr translation on `&p[0]` is not succeeding in looking-up
a previously mapped `&p[1]`
* privatization of byref use_device_addr operands is not happening
correctly.
The above should be fixed as separate standalone changes.
|
|
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier, and adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free operations on the AUTOMAP enabled variables.
Automap Ref: OpenMP 6.0 section 7.9.7.
|
|
A few tests were only mapping a pointee, like: `map(pp[0][0])`, on an
`int** pp`, but expecting the pointers, like `pp`, `pp[0]` to also be
mapped, which is incorrect.
This change fixes six such tests.
|
|
(#153048)"
This reverts commit 4e6d510eb3ec5b5e5ea234756ea1f0b283feee4a.
|
|
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier, and adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free operations on the AUTOMAP enabled variables.
Automap Ref: OpenMP 6.0 section 7.9.7.
|
|
This patch handles the strided update in the `#pragma omp target update
from(data[a:b:c])` directive, where `c` is the stride, which makes the
update of the `data` array non-contiguous when the offloaded execution
returns control to the host from the device via the `from` clause.
Issue: Clang CodeGen, where the info for a particular `MapType` (to,
from, etc.) is generated, was failing to detect the strided access.
Because of this, the `MapType` bits passed to the runtime were
incorrect, which led to incorrect (contiguous) execution in the
libomptarget runtime code.
Added a minimal testcase that verifies the working of the patch.
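In OpenMP array-section syntax `[lower-bound : length : stride]`, `data[a:b:c]` covers `b` elements starting at index `a`, spaced `c` apart. The following sketch enumerates those indices to show why a stride greater than 1 makes the `from` update non-contiguous. The helper names are hypothetical, not libomptarget functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write the element indices of data[lb:len:stride]
 * into out (at most cap of them) and return the section length. */
static size_t section_indices(size_t lb, size_t len, size_t stride,
                              size_t *out, size_t cap) {
  assert(stride > 0);
  for (size_t k = 0; k < len && k < cap; ++k)
    out[k] = lb + k * stride; /* k-th element of the section */
  return len;
}

/* Only a unit stride (or a single element) forms one contiguous block. */
static int section_is_contiguous(size_t len, size_t stride) {
  return stride == 1 || len <= 1;
}
```

For `data[2:4:3]` the section is elements 2, 5, 8 and 11. A correct update copies back only those elements, not the contiguous range 2 through 11.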
|
|
(#151989)"
This reverts commit 5a5e8ba0c388d57aecb359ed67919cda429fc7b1.
|
|
Add a new `AutomapToTargetData` pass. This gathers the declare target
enter variables which have the `AUTOMAP` modifier, and adds
`omp.declare_target_enter/exit` mapping directives for `fir.allocmem`
and `fir.freemem` operations on the `AUTOMAP` enabled variables.
Automap Ref: OpenMP 6.0 section 7.9.7.
|
|
in omp_target_alloc (#151882)
Enables AMD data center class GPUs to use memory manager memory pooling
for allocations of up to 3GB by default, up from the "1 << 13" threshold that
all plugin-nextgen devices use.
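The following shows the arithmetic behind the two thresholds: `1 << 13` bytes is 8 KiB, while 3GB (assumed here to mean 3 GiB) is roughly 393,000 times larger. The macro names, the `uses_pool` helper and the inclusive comparison are illustrative assumptions, not the plugin's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the two size thresholds from the message. */
#define DEFAULT_POOL_THRESHOLD ((uint64_t)1 << 13) /* 8192 bytes */
#define AMD_DC_POOL_THRESHOLD ((uint64_t)3 << 30)  /* 3 GiB, assumed */

/* Assumed rule: sizes up to the threshold are served from the pool;
 * larger requests bypass it. */
static int uses_pool(uint64_t size, uint64_t threshold) {
  return size <= threshold;
}
```

Under the default threshold, a 1 MiB `omp_target_alloc` bypasses the pool. Under the larger threshold, it is pooled.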
|
|
Made a typo in 963259ef6be4871e5252ff3ac9df737af5d2b4cb because I cannot
run tests and also did not review it. This should fix it...
|
|
This patch removes all the instances of %T from offload/ (only one test
contained this construction). %T has been deprecated for ~7 years and is
not recommended as it does not use a unique directory per test. Switch
to using %t to ensure we use a unique dir per test and so that we can
eventually remove %T.
I did not actually test this. A couple feeble attempts at
building/running the offload tests just leaves me with a ton of test
failures. Given how small this is I'm reasonably sure it works though.
|
|
MapInfoFinalization (#150311)
Currently, we return early whenever we've already generated an
allocation for intermediate descriptor variables (required in certain
cases when we can't directly access the base address of a passed-in
descriptor function argument due to HLFIR/FIR restrictions). This
unfortunately skips over the presence check and load/store required to
set the intermediate descriptor allocation's values/data. This is fine in
most cases, but if a function happens to have a series of branches with
separate target regions capturing the same input argument, we'd emit the
present/load/store only into the first branch, the one containing the
first target. The second (or any subsequent) branches would not have the
present/load/store, so the mapped values in those branches would be
empty, leading to a memory access violation on the device.
The fix for the moment is to emit a present/load/store at the relevant
location of every target utilising the input argument. This will likely
also fix possible issues with the input argument being manipulated in
between target regions (primarily resizing; the data should remain the
same as we're just copying an address around, in theory at least).
There are possible optimizations/simplifications to emit fewer
loads/stores, such as raising the load/store out of the branches when we
can, but I'm inclined to leave this sort of optimization to lower-level
passes such as an LLVM pass (which very possibly already covers it).
|
|
Enables 9 more tests
|
|
Fixes the incorrect lit command line introduced in 536ba87726d8dea862d964678dbb761ca32e21fb
|