<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/offload/DeviceRTL/include/Synchronization.h, branch users/nico/python-2</title>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[OpenMP] Add pre sm_70 load hack back in (#138589)</title>
<updated>2025-05-05T21:33:41+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-05-05T21:33:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=dfcb8cb2a92c9f72ddde5ea08dadf2f640197d32'/>
<id>dfcb8cb2a92c9f72ddde5ea08dadf2f640197d32</id>
<content type='text'>
Summary:
Different ordering modes aren't supported for an atomic load, so we just
do an add of zero to the same effect. It's less efficient, but it works.

Fixes https://github.com/llvm/llvm-project/issues/138560</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
Different ordering modes aren't supported for an atomic load, so we just
do an add of zero to the same effect. It's less efficient, but it works.

Fixes https://github.com/llvm/llvm-project/issues/138560</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Port the OpenMP device runtime to direct C++ compilation (#123673)</title>
<updated>2025-02-05T14:18:52+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-02-05T14:18:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=bb7ab2557c485e004e619570cca7e2204b98a71b'/>
<id>bb7ab2557c485e004e619570cca7e2204b98a71b</id>
<content type='text'>
Summary:
This removes the use of OpenMP offloading to build the device runtime.
The main benefit here is that we no longer need to rely on offloading
semantics to build a device only runtime. Things like variants are now
no longer needed and can just be simple if-defs. In the future, I will
remove most of the special handling here and fold it into calls to the
`&lt;gpuintrin.h&gt;` functions instead. Additionally I will rework the
compilation to make this a separate runtime.

The current plan is to have this, but make including OpenMP and
offloading either automatically add it, or print a warning if it's
missing. This will allow us to use a normal CMake workflow and delete
all the weird 'let's pull the clang binary out of the build' business.
```
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
```

After that, linking the OpenMP device runtime will be `-Xoffload-linker
-lomp`. I.e. no more fat binary business.

Only look at the most recent commit since this includes the two
dependencies (fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
This removes the use of OpenMP offloading to build the device runtime.
The main benefit here is that we no longer need to rely on offloading
semantics to build a device only runtime. Things like variants are now
no longer needed and can just be simple if-defs. In the future, I will
remove most of the special handling here and fold it into calls to the
`&lt;gpuintrin.h&gt;` functions instead. Additionally I will rework the
compilation to make this a separate runtime.

The current plan is to have this, but make including OpenMP and
offloading either automatically add it, or print a warning if it's
missing. This will allow us to use a normal CMake workflow and delete
all the weird 'let's pull the clang binary out of the build' business.
```
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
```

After that, linking the OpenMP device runtime will be `-Xoffload-linker
-lomp`. I.e. no more fat binary business.

Only look at the most recent commit since this includes the two
dependencies (fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).</pre>
</div>
</content>
</entry>
<entry>
<title>[Offload][NFC] Fix typos discovered by codespell (#125119)</title>
<updated>2025-01-31T15:35:29+00:00</updated>
<author>
<name>Christian Clauss</name>
<email>cclauss@me.com</email>
</author>
<published>2025-01-31T15:35:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1f56bb3137827d66093b66aa3a6447fdaba61783'/>
<id>1f56bb3137827d66093b66aa3a6447fdaba61783</id>
<content type='text'>
https://github.com/codespell-project/codespell

% `codespell
--ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes
--write-changes`</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
https://github.com/codespell-project/codespell

% `codespell
--ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes
--write-changes`</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Make each atomic helper take an atomic scope argument (#122786)</title>
<updated>2025-01-21T03:58:27+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-21T03:58:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=3274bf6b4282a0dafd4b5a2efa09824e5ca417d0'/>
<id>3274bf6b4282a0dafd4b5a2efa09824e5ca417d0</id>
<content type='text'>
Summary:
Right now we just default to device for each type, and mix an ad-hoc
scope with the one used by the compiler's builtins. Unify this and make
each version take the scope optionally.

For @ronlieb, this will remove the need for `add_system` in the fork as
well as the extra `cas` with system scope; just pass `system`.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
Right now we just default to device for each type, and mix an ad-hoc
scope with the one used by the compiler's builtins. Unify this and make
each version take the scope optionally.

For @ronlieb, this will remove the need for `add_system` in the fork as
well as the extra `cas` with system scope; just pass `system`.</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Fix misspelled attribute and warning</title>
<updated>2025-01-20T14:40:19+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-20T14:39:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=723a3e746ab7f130d448343e6a7b61e146954b60'/>
<id>723a3e746ab7f130d448343e6a7b61e146954b60</id>
<content type='text'>
Summary:
This is spelled `ompx_aligned_barrier` when used directly, but wasn't
included in the list of known assumptions. Fix that so now the test
works.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
This is spelled `ompx_aligned_barrier` when used directly, but wasn't
included in the list of known assumptions. Fix that so now the test
works.
</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Remove 'omp assumes' scopes now that we have no inline ASM (#123611)</title>
<updated>2025-01-20T14:11:06+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-20T14:11:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=58af82b4623c1871a78a53ef86f64d4891dcc2da'/>
<id>58af82b4623c1871a78a53ef86f64d4891dcc2da</id>
<content type='text'>
Summary:
We used this globally scoped `ext_no_call_asm` as a sort of hack around
the compiler that allowed the attributor to optimize out inline assembly
calls to PTX instructions. Quite some time ago I got rid of every inline
assembly call and replaced it with a builtin, so this can just be
deleted.

Furthermore, I use the `[[omp::assume]]` attribute directly for the
aligned barrier usage. This prints an unknown assumption warning (even
though it isn't) so I'm just silencing that for now until I fix it
later.

---------

Co-authored-by: Michael Kruse &lt;github@meinersbur.de&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
We used this globally scoped `ext_no_call_asm` as a sort of hack around
the compiler that allowed the attributor to optimize out inline assembly
calls to PTX instructions. Quite some time ago I got rid of every inline
assembly call and replaced it with a builtin, so this can just be
deleted.

Furthermore, I use the `[[omp::assume]]` attribute directly for the
aligned barrier usage. This prints an unknown assumption warning (even
though it isn't) so I'm just silencing that for now until I fix it
later.

---------

Co-authored-by: Michael Kruse &lt;github@meinersbur.de&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Remove hack around missing atomic load (#122781)</title>
<updated>2025-01-16T21:17:15+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-16T21:17:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1c00d0d7768f959d80393012e93a53c3bad3c138'/>
<id>1c00d0d7768f959d80393012e93a53c3bad3c138</id>
<content type='text'>
Summary:
We used to do a fetch add of zero to approximate a load. This is because
the NVPTX backend didn't handle this properly. It's not an issue anymore
so simply use the proper atomic builtin.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
We used to do a fetch add of zero to approximate a load. This is because
the NVPTX backend didn't handle this properly. It's not an issue anymore
so simply use the proper atomic builtin.</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Use __builtin_bit_cast instead of UB type punning (#122325)</title>
<updated>2025-01-09T19:59:21+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-09T19:59:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f53cb84df6b80458cb4d5ab7398a590356a3a952'/>
<id>f53cb84df6b80458cb4d5ab7398a590356a3a952</id>
<content type='text'>
Summary:
Use a normal bitcast, remove from the shared utils since it's not
available in GCC 7.4</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
Use a normal bitcast, remove from the shared utils since it's not
available in GCC 7.4</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Update atomic helpers to just use headers (#122185)</title>
<updated>2025-01-09T19:57:39+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2025-01-09T19:57:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b57c0bac81fe4f5c85c6554ca574cf2d5648e0c5'/>
<id>b57c0bac81fe4f5c85c6554ca574cf2d5648e0c5</id>
<content type='text'>
Summary:
Previously we had some indirection here; this patch updates these
utilities to just be normal template functions. We use SFINAE to manage
the special case handling for floats. Also this strips address spaces so
it can be used more generally.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
Previously we had some indirection here; this patch updates these
utilities to just be normal template functions. We use SFINAE to manage
the special case handling for floats. Also this strips address spaces so
it can be used more generally.</pre>
</div>
</content>
</entry>
<entry>
<title>[OpenMP] Replace AMDGPU fences with generic scoped fences (#119619)</title>
<updated>2024-12-12T13:54:51+00:00</updated>
<author>
<name>Joseph Huber</name>
<email>huberjn@outlook.com</email>
</author>
<published>2024-12-12T13:54:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f4ee5a673f6e593e85306cdf65493b53e62f936e'/>
<id>f4ee5a673f6e593e85306cdf65493b53e62f936e</id>
<content type='text'>
Summary:
This is simpler and more common. I would've replaced the CUDA uses and
made this the same but currently it doesn't codegen these fences fully
and just emits a full system-wide barrier as a fallback.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
This is simpler and more common. I would've replaced the CUDA uses and
made this the same but currently it doesn't codegen these fences fully
and just emits a full system-wide barrier as a fallback.</pre>
</div>
</content>
</entry>
</feed>
