<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/offload/DeviceRTL/include, branch users/nico/python-2</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[OpenMP] Add pre sm_70 load hack back in (#138589)</title>
<updated>2025-05-05T21:33:41+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-05-05T21:33:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=dfcb8cb2a92c9f72ddde5ea08dadf2f640197d32'/>
<id>dfcb8cb2a92c9f72ddde5ea08dadf2f640197d32</id>
<content type='text'>
Summary:
Different ordering modes aren't supported for an atomic load, so we just
do an add of zero as the same thing. It's less efficient, but it works.

Fixes https://github.com/llvm/llvm-project/issues/138560</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
Different ordering modes aren't supported for an atomic load, so we just
do an add of zero as the same thing. It's less efficient, but it works.

Fixes https://github.com/llvm/llvm-project/issues/138560</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Use 'gpuintrin.h' definitions for simple block identifiers (#131631)</title>
<updated>2025-03-18T20:38:46+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-03-18T20:38:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=206f78dfec71b86d4f58ef6bf23cc12151f5b54d'/>
<id>206f78dfec71b86d4f58ef6bf23cc12151f5b54d</id>
<content type='text'>
Summary:
This patch ports the runtime to use `gpuintrin.h` instead of calling the
builtins for most things. The `lanemask_gt` stuff was left for now with
a fallback.

AMD version for Ron
https://gist.github.com/jhuber6/42014d635b9a8158727640876bf47226.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
This patch ports the runtime to use `gpuintrin.h` instead of calling the
builtins for most things. The `lanemask_gt` stuff was left for now with
a fallback.

AMD version for Ron
https://gist.github.com/jhuber6/42014d635b9a8158727640876bf47226.</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Replace use of target address space with &lt;gpuintrin.h&gt; local (#126119)</title>
<updated>2025-02-09T16:25:25+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-02-09T16:25:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=ed9107f2d71804f6bedff6cd05b1f1a4750eb112'/>
<id>ed9107f2d71804f6bedff6cd05b1f1a4750eb112</id>
<content type='text'>
Summary:
This definition is more portable since it defines the correct value for
the target. I got rid of the helper mostly because I think it's easy
enough to use now that it's a type and being explicit about what's
`undef` or `poison` is good.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
This definition is more portable since it defines the correct value for
the target. I got rid of the helper mostly because I think it's easy
enough to use now that it's a type and being explicit about what's
`undef` or `poison` is good.</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Port the OpenMP device runtime to direct C++ compilation (#123673)</title>
<updated>2025-02-05T14:18:52+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-02-05T14:18:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=bb7ab2557c485e004e619570cca7e2204b98a71b'/>
<id>bb7ab2557c485e004e619570cca7e2204b98a71b</id>
<content type='text'>
Summary:
This removes the use of OpenMP offloading to build the device runtime.
The main benefit here is that we no longer need to rely on offloading
semantics to build a device only runtime. Things like variants are now
no longer needed and can just be simple if-defs. In the future, I will
remove most of the special handling here and fold it into calls to the
`&lt;gpuintrin.h&gt;` functions instead. Additionally I will rework the
compilation to make this a separate runtime.

The current plan is to have this, but make including OpenMP and
offloading either automatically add it, or print a warning if it's
missing. This will allow us to use a normal CMake workflow and delete
all the weird 'lets pull the clang binary out of the build' business.
```
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
```

After that, linking the OpenMP device runtime will be `-Xoffload-linker
-lomp`. I.e. no more fat binary business.

Only look at the most recent commit since this includes the two
dependencies
(fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
This removes the use of OpenMP offloading to build the device runtime.
The main benefit here is that we no longer need to rely on offloading
semantics to build a device only runtime. Things like variants are now
no longer needed and can just be simple if-defs. In the future, I will
remove most of the special handling here and fold it into calls to the
`&lt;gpuintrin.h&gt;` functions instead. Additionally I will rework the
compilation to make this a separate runtime.

The current plan is to have this, but make including OpenMP and
offloading either automatically add it, or print a warning if it's
missing. This will allow us to use a normal CMake workflow and delete
all the weird 'lets pull the clang binary out of the build' business.
```
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
```

After that, linking the OpenMP device runtime will be `-Xoffload-linker
-lomp`. I.e. no more fat binary business.

Only look at the most recent commit since this includes the two
dependencies
(fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).</pre>
</div>
</content>
</entry>
<entry>
<title>[Offload][NFC] Fix typos discovered by codespell (#125119)</title>
<updated>2025-01-31T15:35:29+00:00</updated>
<author>
<name>Christian Clauss</name>
<email>cclauss@me.com</email>
</author>
<published>2025-01-31T15:35:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1f56bb3137827d66093b66aa3a6447fdaba61783'/>
<id>1f56bb3137827d66093b66aa3a6447fdaba61783</id>
<content type='text'>
https://github.com/codespell-project/codespell

% `codespell
--ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes
--write-changes`</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
https://github.com/codespell-project/codespell

% `codespell
--ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes
--write-changes`</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Remove usage of pointer-to-member in lookup (#123671)</title>
<updated>2025-01-21T13:50:40+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-21T13:50:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f233a54ae80b5fe7604aa20007d050cefdd5f663'/>
<id>f233a54ae80b5fe7604aa20007d050cefdd5f663</id>
<content type='text'>
Summary:
This is buggy and is currently being tracked in
https://github.com/llvm/llvm-project/issues/123241. For now, replace it
with a macro so that we can use address spaces directly.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
This is buggy and is currently being tracked in
https://github.com/llvm/llvm-project/issues/123241. For now, replace it
with a macro so that we can use address spaces directly.</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Make each atomic helper take an atomic scope argument (#122786)</title>
<updated>2025-01-21T03:58:27+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-21T03:58:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=3274bf6b4282a0dafd4b5a2efa09824e5ca417d0'/>
<id>3274bf6b4282a0dafd4b5a2efa09824e5ca417d0</id>
<content type='text'>
Summary:
Right now we just default to device for each type, and mix an ad-hoc
scope with the one used by the compiler's builtins. Unify this can make
each version take the scope optionally.

For @ronlieb, this will remove the need for `add_system` in the fork as
well as the extra `cas` with system scope, just pass `system`.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
Right now we just default to device for each type, and mix an ad-hoc
scope with the one used by the compiler's builtins. Unify this can make
each version take the scope optionally.

For @ronlieb, this will remove the need for `add_system` in the fork as
well as the extra `cas` with system scope, just pass `system`.</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Adjust 'printf' handling in the OpenMP runtime (#123670)</title>
<updated>2025-01-21T03:56:46+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-21T03:56:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=2d9f40694324a72c2b7a3d6a9cfcc7ce8069afc1'/>
<id>2d9f40694324a72c2b7a3d6a9cfcc7ce8069afc1</id>
<content type='text'>
Summary:
We used to avoid a lot of this stuff because we didn't properly handle
variadics in device code. That's been solved for now, so we can just
make an internal printf handler that forwards to the external `vprintf`
function. This is either provided by NVIDIA's SDK or by the GPU libc
implementation.

The main reason for doing this is because it prevents the stupid AMDGPU
printf pass from mangling our beautiful printfs!</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
We used to avoid a lot of this stuff because we didn't properly handle
variadics in device code. That's been solved for now, so we can just
make an internal printf handler that forwards to the external `vprintf`
function. This is either provided by NVIDIA's SDK or by the GPU libc
implementation.

The main reason for doing this is because it prevents the stupid AMDGPU
printf pass from mangling our beautiful printfs!</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Fix mispelled attribute and warning</title>
<updated>2025-01-20T14:40:19+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-20T14:39:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=723a3e746ab7f130d448343e6a7b61e146954b60'/>
<id>723a3e746ab7f130d448343e6a7b61e146954b60</id>
<content type='text'>
Summary:
This is spelled `ompx_aligned_barrier` when used directly, but wasn't
included in the list of known assumptions. Fix that so now th test
works.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
This is spelled `ompx_aligned_barrier` when used directly, but wasn't
included in the list of known assumptions. Fix that so now th test
works.
</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Remove 'omp assumes' scopes now that we have no inline ASM (#123611)</title>
<updated>2025-01-20T14:11:06+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-20T14:11:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=58af82b4623c1871a78a53ef86f64d4891dcc2da'/>
<id>58af82b4623c1871a78a53ef86f64d4891dcc2da</id>
<content type='text'>
Summary:
We used this globally scoped `ext_no_call_asm` as a sort of hack around
the compiler that allowed the attributor to optimize out inline assembly
calls to PTX instructions. Quite some time ago I got rid of every inline
assembly call and replaced it with a builitin, so this can just be
deleted.

Furthermore, I use the `[[omp::assume]]` attribute directly for the
aligned barrier usage. This prints an unknown assumption warning (even
though it isn't) so I'm just silencing that for now until I fix it
later.

---------

Co-authored-by: Michael Kruse &lt;github@meinersbur.de&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
We used this globally scoped `ext_no_call_asm` as a sort of hack around
the compiler that allowed the attributor to optimize out inline assembly
calls to PTX instructions. Quite some time ago I got rid of every inline
assembly call and replaced it with a builitin, so this can just be
deleted.

Furthermore, I use the `[[omp::assume]]` attribute directly for the
aligned barrier usage. This prints an unknown assumption warning (even
though it isn't) so I'm just silencing that for now until I fix it
later.

---------

Co-authored-by: Michael Kruse &lt;github@meinersbur.de&gt;</pre>
</div>
</content>
</entry>
</feed>
