| Age | Commit message (Collapse) | Author |
|
This PR introduces new debug macros that allow a more fined control of
which debug message to output and introduce C++ stream style for debug
messages.
Changing existing messages (except a few that I changed for testing)
will come in subsequent PRs.
I also think that we should make debug enabling OpenMP agnostic but, for
now, I prioritized maintaing the current libomptarget behavior for now,
and we might need more changes further down the line as we we decouple
libomptarget.
|
|
omp_get_device_from_uid()" (#168547)
Reverts llvm/llvm-project#164392 due to fortran issues
|
|
(#164392)
Use the implementation in libomptarget. If libomptarget is not
available, always return the UID / device number of the host / the
initial device.
|
|
(#164613)
Follow-up from #162652
---------
Co-authored-by: Michael Klemm <michael.klemm@amd.com>
|
|
Adds omp_target_is_accessible routine.
Refactors common code from omp_target_is_present to work for both
routines.
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
|
|
On Windows, for a reason I don't fully understand boolean bits get extra
padding (even when asking for packed structures) in the structures that
messes the offsets between the compiler and the runtime.
Also, "weak" works differently on Windows than Linux (i.e., the "local"
routine has preference) which causes it to crash as we don't really have
an alternate implementation of __kmpc_omp_wait_deps. Given this, it
doesn't make sense to mark it as "weak" for Linux either.
|
|
|
|
This implements two pieces to restore the interop functionality (that I
broke) when the 6.0 interfaces were added:
* A set of wrappers that support the old interfaces on top of the new
ones
* The same level of interop support for the CUDA amd AMD plugins
|
|
corresponding-pointer-initialization. (#160760)
`PRIVATE | ATTACH` maps can be used to represent firstprivate pointers
that should be initialized by doing doing the pointee's device address,
if its lookup succeeds, or retain the original host pointee's address
otherwise.
With this, for a test like the following:
```f90
integer, pointer :: p(:)
!$omp target map(p(1))
... print*, p(1)
!$omp end target
```
The codegen can look like:
```llvm
; maps for p:
; &p(1), &p(1), sizeof(p(1)), TO|FROM //(1)
; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), ATTACH //(2)
; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), PRIVATE|ATTACH|PARAM //(3)
call... @__omp_outlined...(ptr %ref_ptr_of_p)
```
* `(1)` maps the pointee `p(1)`.
* `(2)` attaches it to the (previously) mapped `ref_ptr(p)`, if present.
It can be controlled via OpenMP 6.1's `attach(auto/always/never)`
map-type modifiers.
* `(3)` privatizes and initializes the local `ref_ptr(p)`, which gets
passed
in as the kernel argument `%ref_ptr_of_p`. Can be skipped if p is not
referenced directly within the region.
While similar mapping can be used for C/C++, it's more important/useful
for Fortran as we can avoid creating another argument for passing the
descriptor, and use that to initialize the private copy in the body of
the kernel.
|
|
Summary:
This exposes the 'isDeviceCompatible' routine for checking if a binary
*can* be loaded. This is useful if people don't want to consume errors
everywhere when figuring out which image to put to what device.
I don't know if this is a good name, I was thining like `olIsCompatible`
or whatever. Let me know what you think.
Long term I'd like to be able to do something similar to what OpenMP
does where we can conditionally only initialize devices if we need them.
That's going to be support needed if we want this to be more
generic.
|
|
(#158370)
This change adds support for saving full contents of attached Fortran
descriptors, and not just their pointee address, in the shadow-pointer
table.
With this, we now support:
* comparing full contents of descriptors to check whether a previous
shadow-pointer entry is stale;
* restoring the full contents of descriptors
And with that, we can now use ATTACH map-types (added in #149036) for
mapping Fortran pointer/allocatable arrays, and array-sections on them.
e.g.:
```f90
integer, allocatable :: x(:)
!$omp target enter data map(to: x(:))
```
as:
```
void* addr_of_pointee = allocated(x) ? &x(1) : nullptr;
int64_t sizeof_pointee = allocated(x) ? sizeof(x(:)) : 0
addr_of_pointee, addr_of_pointee, sizeof_pointee, TO
addr_of_descriptor, addr_of_pointee, size_of_descriptor, ATTACH
```
|
|
Summary:
This operation is done every time we load a binary, this behavior should
be moved into OpenMP since it concerns an OpenMP specific data struct.
This is a little messy, because ideally we should only be using public
APIs, but more can be extracted later.
|
|
Also adds two debug dumps regarding pointer-attachment.
|
|
This patch introduces libomptarget support for the ATTACH map-type,
which can be used to implement OpenMP conditional compliant pointer
attachment, based on whether the pointer/pointee is newly mapped on a
given construct.
For example, for the following:
```c
int *p;
#pragma omp target enter data map(p[1:10])
```
The following maps can be emitted by clang:
```
(A)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
&p, &p[1], sizeof(p), ATTACH
```
Without this map-type, these two possible maps could be emitted by
clang:
```
(B)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
(C)
&p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
````
(B) does not perform any pointer attachment, while (C) also maps the
pointer p, which are both incorrect.
In terms of implementation, maps with the ATTACH map-type are handled
after all other maps have been processed, as it requires knowledge of
which new allocations happened as part of the construct. As per OpenMP
5.0, an attachment should happen only when either the pointer or the
pointee was newly mapped while handling the construct.
Maps with ATTACH map-type-bit do not increase/decrease the ref-count.
With OpenMP 6.1, `attach(always/never)` can be used to force/prevent
attachment. For `attach(always)`, the compiler will insert the ALWAYS
map-type, which would let libomptarget bypass the check about one of the
pointer/pointee being new. With `attach(never)`, the ATTACH map will not
be emitted at all.
The size argument of the ATTACH map-type can specify values greater than
`sizeof(void*)` which can be used to support pointer attachment on
Fortran descriptors. Note that this also requires shadow-pointer
tracking to also support them. That has not been implemented in this
patch.
This was worked upon in coordination with Ravi Narayanaswamy, who has
since retired. Happy retirement, Ravi!
---------
Co-authored-by: Alex Duran <alejandro.duran@intel.com>
|
|
This sprinkles a few mutexes around the plugin interface so that the
olLaunchKernel CTS test now passes when ran on multiple threads.
Part of this also involved changing the interface for device synchronise
so that it can optionally not free the underlying queue (which
introduced a race condition in liboffload).
|
|
The following patch introduces a new interop interface implementation
with the following characteristics:
* It supports the new 6.0 prefer_type specification
* It supports both explicit objects (from interop constructs) and
implicit objects (from variant calls).
* Implements a per-thread reuse mechanism for implicit objects to reduce
overheads.
* It provides a plugin interface that allows selecting the supported
interop types, and managing all the backend related interop operations
(init, sync, ...).
* It enables cooperation with the OpenMP runtime to allow progress on
OpenMP synchronizations.
* It cleanups some vendor/fr_id mismatchs from the current query
routines.
* It supports extension to define interop callbacks for library cleanup.
|
|
OpenMP allows duplicate mappings, i.e. in OpenMP 6.0, 7.9.6 "map
Clause":
Two list items of the map clauses on the same construct must not share
original storage unless one of the following is true: they are the same
list item [or other omitted reasons]"
Duplicate mappings can arise as a result of user-defined mapper
processing (which I think is a separate bug, and is not addressed here),
but also in straightforward cases such as:
#pragma omp target map(tofrom: s.mem[0:10]) map(tofrom: s.mem[0:10])
Both these cases cause crashes at runtime at present, due to an
unfortunate interaction between reference counting behaviour and shadow
pointer handling for blocks. This is what happens:
1. The member "s.mem" is copied to the target
2. A shadow pointer is created, modifying the pointer on the target
3. The member "s.mem" is copied to the target again
4. The previous shadow pointer metadata is still present, so the runtime doesn't modify the target pointer a second time.
The fix is to disable step 3 if we've already done step 2 for a given
block that has the "is new" flag set.
|
|
Summary:
Currently the Auto Zero-Copy is enabled by checking every initialized
device to ensure that no dGPU is attached to an APU. However, an APU is
designed to comprise a homogeneous set of GPUs, therefore, it should be
sufficient to check any device for configuring Auto Zero-Copy. In this
PR, it checks the first initialized device in the list.
The changes in this PR are to clearly reflect the design and logic of
enabling the feature for further improving the readibility.
|
|
Summary:
This is a weird dependency on libomp just for testing if version scripts
work. We shouldn't need to do this because LLVM already checks for
this. I believe this should be available as well in standalone when we
call `addLLVM` but I did not test that directly.
|
|
(#139275)
[Offload] Use new error code handling mechanism
This removes the old ErrorCode-less error method and requires
every user to provide a concrete error code. All calls have been
updated.
In addition, for consistency with error messages elsewhere in LLVM, all
messages have been made to start lower case.
|
|
(#139147)
Summary:
Right now we generally assume that we have one image per device. The
binary descriptor represents a single 'compilation'. This means that
each image is going to contain the same code built for different
architectures when used through the OpenMP interface. This is
problematic when we have cases where the same code will then be loaded
multiple times (like wiht sm_80, sm_89 or the generic GFX ISAs). This
patch is the quick and dirty slution, we just prevent this from
happening at all. This means we use the first one we find, which might
not be overly optimal, but it should be better than the alternative.
Note that this does not affect shared library loads as it is per binary
descriptor, not per device.
|
|
If user specifies offload is disabled (e.g.,
OMP_TARGET_OFFLOAD=disable), disable library almost completely. This
reduces resources spent to a minimum and ensures all APIs behave as if
the only available device is the host device.
Currently some of the APIs behave as if there were devices avaible for
offload even when under OMP_TARGET_OFFLOAD=disable.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
|
|
Also add some clarifying comments.
|
|
Summary:
The name `src` is confusing when combined with the plugins and the newly
added `liboffload`.
|