llvm-project.git/libc/src/__support/GPU/allocator.cpp, branch users/nico/python-2

[libc] Fix internal alignment in allocator (#146738)

2025-07-02T17:29:01+00:00

Summary:
The allocator interface is supposed to provide 16-byte alignment (to keep
it consistent with the CPU allocator; we could probably drop this to 8
if desired). This was not enforced, however, because the number of bytes
used for the bitfield sometimes left the memory after it only 8-byte
aligned instead of 16. Explicitly round the number of bitfield bytes up
to a multiple of 16, even if the padding is unused.
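A minimal sketch of the padding rule, assuming a bitfield stored as
32-bit words with one bit per chunk and the chunk area starting directly
after a 16-byte-aligned bitfield. `align_up` and `bitfield_bytes` are
hypothetical names, not the allocator's actual helpers:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round `value` up to the next multiple of `align`,
// which must be a power of two.
constexpr uint64_t align_up(uint64_t value, uint64_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Hypothetical layout: one bit per chunk, packed into 32-bit words. The
// chunk area follows the bitfield, so the bitfield's byte size is padded
// to a multiple of 16 even when the extra bytes hold no bits.
constexpr uint64_t bitfield_bytes(uint64_t num_chunks) {
  uint64_t words = (num_chunks + 31) / 32;         // 32 chunks per word
  return align_up(words * sizeof(uint32_t), 16);   // pad to 16 bytes
}
```

For 64 chunks the bitfield needs only 8 bytes, which on its own would
leave the following chunks 8-byte aligned; padding it to 16 keeps them
16-byte aligned.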

[libc] Efficiently implement `aligned_alloc` for AMDGPU (#146585)

2025-07-02T14:25:57+00:00

Summary:
This patch uses the actual allocator interface to implement
`aligned_alloc`. We do this by rounding up the amount allocated so the
returned pointer can be shifted forward to the requested alignment.
Because of how the chunk index is calculated, any offset within an
allocation still maps to the same chunk, so we can adjust the pointer
internally and freeing it works exactly the same.
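A sketch of the idea under simplifying assumptions: a single static slab
of fixed-size chunks, with the caller choosing the chunk (the real
allocator finds a free one itself). `aligned_alloc_sketch`,
`chunk_index`, and the slab layout are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical slab: fixed-size chunks carved out of one buffer.
constexpr size_t CHUNK_SIZE = 256;
constexpr size_t NUM_CHUNKS = 8;
alignas(16) static unsigned char g_slab[CHUNK_SIZE * NUM_CHUNKS];

// Index calculation: any address inside a chunk maps to the same index,
// so an adjusted pointer frees the same chunk as the original one.
size_t chunk_index(const void *ptr) {
  const unsigned char *p = static_cast<const unsigned char *>(ptr);
  return static_cast<size_t>(p - g_slab) / CHUNK_SIZE;
}

// Round the request up by (alignment - 1) bytes, then slide the pointer
// forward inside the chunk to the next aligned address.
void *aligned_alloc_sketch(size_t alignment, size_t size, size_t chunk) {
  size_t needed = size + alignment - 1;  // worst-case padding included
  if (chunk >= NUM_CHUNKS || needed > CHUNK_SIZE)
    return nullptr;
  unsigned char *base = g_slab + chunk * CHUNK_SIZE;
  size_t misalign = reinterpret_cast<uintptr_t>(base) % alignment;
  size_t pad = misalign ? alignment - misalign : 0;
  return base + pad;
}
```

Because the padding is always smaller than `alignment` and the rounded
request fits in the chunk, the adjusted pointer never leaves its chunk.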

[libc] Use is aligned builtin instead of ptrtoint (#146402)

2025-07-02T12:03:11+00:00

Summary:
This avoids a ptrtoint by using the Clang builtin `__builtin_is_aligned`
instead. This is Clang-specific, but only Clang can compile GPU code
anyway, so I do not bother with a fallback.
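A small sketch of an alignment check built on the builtin. The commit
itself has no fallback; the `#else` branch below exists only so this
sketch also compiles with non-Clang compilers. The `is_aligned` template
and the `SKETCH_HAVE_IS_ALIGNED` macro are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__has_builtin)
#if __has_builtin(__builtin_is_aligned)
#define SKETCH_HAVE_IS_ALIGNED 1
#endif
#endif

template <size_t Align>
bool is_aligned(const void *ptr) {
  static_assert(Align != 0 && (Align & (Align - 1)) == 0,
                "alignment must be a power of two");
#ifdef SKETCH_HAVE_IS_ALIGNED
  // Clang: no pointer-to-integer conversion appears in the source.
  return __builtin_is_aligned(ptr, Align);
#else
  // Fallback only for building this sketch with other compilers.
  return reinterpret_cast<uintptr_t>(ptr) % Align == 0;
#endif
}
```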

[libc] Efficiently implement 'realloc' for AMDGPU devices (#145960)

2025-06-30T13:39:40+00:00

Summary:
Now that we have `malloc`, we can implement `realloc` efficiently. This
uses the known chunk sizes to avoid unnecessary allocations. We just
return nullptr for NVPTX. I'd remove it from the entrypoint list, but
then the libc++ code would stop working. When someone writes the NVPTX
support, this will be trivial.
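A hedged sketch of how known chunk sizes let `realloc` skip work. It
assumes power-of-two size classes and fakes the chunk-size lookup with a
small header in front of each allocation; the real allocator recovers the
size from its slab instead. `sketch_malloc`, `sketch_free`,
`chunk_size_of`, and `sketch_realloc` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Stand-in for the slab metadata: records the chunk size of an allocation.
struct Header { size_t chunk_size; };

// Assumed size classes: powers of two, at least 16 bytes.
size_t round_to_chunk(size_t size) {
  size_t chunk = 16;
  while (chunk < size) chunk *= 2;
  return chunk;
}

void *sketch_malloc(size_t size) {
  size_t chunk = round_to_chunk(size);
  Header *h = static_cast<Header *>(std::malloc(sizeof(Header) + chunk));
  if (!h) return nullptr;
  h->chunk_size = chunk;
  return h + 1;
}

void sketch_free(void *ptr) {
  if (ptr) std::free(static_cast<Header *>(ptr) - 1);
}

size_t chunk_size_of(void *ptr) {
  return (static_cast<Header *>(ptr) - 1)->chunk_size;
}

// Only allocate and copy when the new size no longer fits the chunk.
void *sketch_realloc(void *ptr, size_t size) {
  if (!ptr) return sketch_malloc(size);
  if (size == 0) { sketch_free(ptr); return nullptr; }
  size_t old_chunk = chunk_size_of(ptr);
  if (size <= old_chunk) return ptr;  // still fits: nothing to do
  void *fresh = sketch_malloc(size);
  if (!fresh) return nullptr;         // original pointer stays valid
  std::memcpy(fresh, ptr, old_chunk);
  sketch_free(ptr);
  return fresh;
}
```

Growing a 20-byte allocation to 30 bytes returns the same pointer, since
both fit a 32-byte chunk; only growth past the chunk size copies.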

[libc] Add and use 'cpp::launder' to guard placement new (#146123)

2025-06-27T19:34:33+00:00

Summary:
In the GPU allocator we reinterpret cast from a void pointer. We know
that an actual object was constructed there according to the C++ object
model, but to make it fully standards compliant we need to 'launder' it
to forward that information to the compiler. Add this function and call
it as appropriate.
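The pattern in standard C++ terms: construct with placement new, then
launder the pointer recovered from `void *`. This sketch uses
`std::launder` from `<new>` (C++17) in place of libc's internal
`cpp::launder`; `SlabHeader`, `make_slab`, and `as_slab` are
hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical header placed at the start of raw slab memory.
struct SlabHeader {
  uint32_t chunk_size;
  uint32_t global_index;
};

// Construct the object in raw storage with placement new.
void *make_slab(void *raw, uint32_t chunk_size, uint32_t index) {
  new (raw) SlabHeader{chunk_size, index};
  return raw;
}

// Later code only has the void pointer; launder tells the compiler a
// SlabHeader object really lives at this address.
SlabHeader *as_slab(void *raw) {
  return std::launder(reinterpret_cast<SlabHeader *>(raw));
}
```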

[libc] Perform bitfield zero initialization wave-parallel (#143607)

2025-06-11T23:22:05+00:00

Summary:
We need to set the bitfield memory to zero because the system does not
guarantee zeroed-out memory. Even if fresh pages are zero, the system
allows re-use, so we would need a `kfd`-level API to skip this step.

Because we can't, this patch updates the logic to perform the zero
initialization wave-parallel. This reduces the time it takes to allocate
a fresh slab by up to a tenth.

This has the unfortunate side effect that the control flow is more
convoluted and we waste some extra registers, but it's worth it to
reduce the slab allocation latency.
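A CPU-side sketch of the wave-parallel split, assuming a 64-lane
wavefront (AMDGPU wavefronts are 32 or 64 lanes). Each lane zeroes every
`WAVE_SIZE`-th word starting at its lane ID; here the lanes run one after
another, whereas on the GPU they run together and must synchronize
before the slab is published. Function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t WAVE_SIZE = 64;  // assumed wavefront width

// One lane's share: neighbouring lanes write neighbouring words, so each
// loop iteration maps to a single wide store across the wave.
void zero_bitfield_lane(uint32_t *bitfield, size_t num_words,
                        size_t lane_id) {
  for (size_t i = lane_id; i < num_words; i += WAVE_SIZE)
    bitfield[i] = 0;
}

// Stand-in for the whole wave: run every lane's share in turn.
void zero_bitfield_wave(uint32_t *bitfield, size_t num_words) {
  for (size_t lane = 0; lane < WAVE_SIZE; ++lane)
    zero_bitfield_lane(bitfield, num_words, lane);
}
```

The extra control flow and registers mentioned above come from this kind
of per-lane indexing replacing a single thread's straight-line loop.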

[libc][NFC] Remove template from GPU allocator reference counter

2025-06-11T16:37:51+00:00

Summary:
We don't need this to be generic, precommit for
https://github.com/llvm/llvm-project/pull/143607

[libc] Coalesce bitfield access in GPU malloc (#142692)

2025-06-05T01:32:07+00:00

Summary:
This improves performance by reducing the number of RMW operations we
need to do on a single slot. It speeds up repeated allocations without
much contention by about ten percent.
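A sketch of the coalescing idea: lanes targeting the same 32-bit word
combine their bits first, then a single atomic `fetch_or` is issued on
their behalf. On the GPU the combine step would use wave-level
operations; here it is a plain loop. `claim_bits_coalesced` is a
hypothetical name:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the mask of bits this call newly set, i.e. the lanes that won.
uint32_t claim_bits_coalesced(std::atomic<uint32_t> &word,
                              const uint32_t *lane_bits, size_t num_lanes) {
  uint32_t combined = 0;
  for (size_t i = 0; i < num_lanes; ++i)
    combined |= lane_bits[i];                 // merge the lanes' requests
  uint32_t before = word.fetch_or(combined);  // one RMW instead of many
  return combined & ~before;                  // bits that were free before
}
```

A lane whose bit appears in the returned mask owns that slot; a lane
whose bit was already set has to retry elsewhere.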

[libc] Implement efficient 'malloc' on the GPU (#140156)

2025-05-28T13:21:43+00:00

Summary:
This is the big patch that implements an efficient device-side `malloc`
on the GPU. This is the first pass and many improvements will be made
later.

The scheme revolves around using a global reference-counted pointer to
hand out access to a dynamically created and destroyed slab interface.
The slab is simply a large bitfield with one bit for each chunk. All
allocations in a slab are the same size, so different sized allocations
are done through different slabs.
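A minimal sketch of a reference counter that could guard such a shared
slab pointer. This is a hypothetical design for illustration, not the
allocator's actual counter: the high bit marks the slab as being torn
down, and exactly one releaser wins the right to destroy it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct RefCounter {
  static constexpr uint64_t INVALID = uint64_t(1) << 63;
  std::atomic<uint64_t> count{0};

  // Take a reference unless the slab is already being destroyed.
  bool acquire() {
    uint64_t c = count.load();
    while (!(c & INVALID)) {
      if (count.compare_exchange_weak(c, c + 1))
        return true;  // on failure `c` is reloaded and rechecked
    }
    return false;
  }

  // Drop a reference. Returns true only for the caller that dropped the
  // last reference and managed to mark the counter invalid; that caller
  // then owns freeing the slab.
  bool release() {
    if (count.fetch_sub(1) != 1)
      return false;
    uint64_t expected = 0;  // fails if someone re-acquired in between
    return count.compare_exchange_strong(expected, INVALID);
  }

  // Re-arm the counter once a fresh slab has been installed.
  void reset() { count.store(0); }
};
```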

Allocation is thus searching for or creating a slab for the desired
size, reserving space, and then searching for a free bit. Freeing is
clearing the bit and then releasing the space.
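A simplified sketch of the bit search and clear within one slab. It
assumes a fixed chunk size and scans linearly from bit 0; the space
reservation step and the real allocator's search strategy are omitted.
`Slab`, `allocate`, and `deallocate` are hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t CHUNK_SIZE = 64;   // assumed: all chunks the same size
constexpr size_t NUM_CHUNKS = 128;  // one bit per chunk

struct Slab {
  std::atomic<uint32_t> bits[NUM_CHUNKS / 32];
  alignas(16) unsigned char memory[CHUNK_SIZE * NUM_CHUNKS];

  Slab() {
    for (auto &w : bits) w.store(0);  // every chunk starts out free
  }

  // Search for a clear bit and set it atomically; if it was clear
  // before our fetch_or, the chunk is ours.
  void *allocate() {
    for (size_t i = 0; i < NUM_CHUNKS; ++i) {
      uint32_t mask = uint32_t(1) << (i % 32);
      if (!(bits[i / 32].fetch_or(mask) & mask))
        return memory + i * CHUNK_SIZE;
    }
    return nullptr;  // slab is full
  }

  // Recover the chunk index from the pointer and clear its bit.
  void deallocate(void *ptr) {
    size_t offset = static_cast<size_t>(
        static_cast<unsigned char *>(ptr) - memory);
    size_t i = offset / CHUNK_SIZE;
    bits[i / 32].fetch_and(~(uint32_t(1) << (i % 32)));
  }
};
```

Setting an already-set bit is harmless, so a failed probe leaves the
bitfield unchanged.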

This interface allows memory to dynamically grow and shrink. Future
patches will have different modes to allow fast first-time-use as well
as a non-RPC version.

[libc][NFC] Rename RPC opcodes to better reflect their usage

2024-12-02T21:35:08+00:00

Summary:
`RPC_` is a generic prefix here; use `LIBC_` to indicate that these are
opcodes used to implement the C library.