path: root/libc/src/stdlib/gpu
Age | Commit message | Author
2025-07-02 | [libc] Efficiently implement `aligned_alloc` for AMDGPU (#146585) | Joseph Huber
Summary: This patch uses the actual allocator interface to implement `aligned_alloc`. We do this by simply rounding up the amount allocated. Because of how index calculation works, any offset within an allocated pointer will still map to the same chunk, so we can just adjust internally and it will free all the same.
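The rounding trick described above can be sketched with a toy allocator. This is a minimal illustration, not the code from the patch: the slab layout, `kChunkSize`, `chunk_index`, and the `toy_*` functions are all hypothetical names invented here. It assumes fixed-size chunks and a `free` that recovers the chunk from any address inside it, which is the property the summary relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy slab: fixed-size chunks carved out of one static buffer.
constexpr std::size_t kChunkSize = 200;
constexpr std::size_t kNumChunks = 8;
alignas(8) unsigned char slab[kChunkSize * kNumChunks];
bool used[kNumChunks];

// Map any address inside the slab to the index of the chunk containing it.
std::size_t chunk_index(const void *p) {
  auto offset = static_cast<std::size_t>(
      static_cast<const unsigned char *>(p) - slab);
  return offset / kChunkSize;
}

// Hand out the first free chunk; requests larger than a chunk fail.
void *toy_malloc(std::size_t size) {
  if (size == 0 || size > kChunkSize)
    return nullptr;
  for (std::size_t i = 0; i < kNumChunks; ++i) {
    if (!used[i]) {
      used[i] = true;
      return slab + i * kChunkSize;
    }
  }
  return nullptr;
}

// Free works for any pointer into the chunk, not only its first byte.
void toy_free(void *p) {
  if (p)
    used[chunk_index(p)] = false;
}

// Over-allocate by (alignment - 1) bytes, then round the address up.
// The adjusted pointer stays inside the same chunk, so toy_free still works.
void *toy_aligned_alloc(std::size_t alignment, std::size_t size) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  void *raw = toy_malloc(size + alignment - 1);
  if (!raw)
    return nullptr;
  auto addr = reinterpret_cast<std::uintptr_t>(raw);
  auto mask = static_cast<std::uintptr_t>(alignment) - 1;
  return reinterpret_cast<void *>((addr + mask) & ~mask);
}
```

Because the padded request never exceeds one chunk, the aligned pointer plus `size` bytes always fits inside the chunk that `toy_malloc` returned, and no separate bookkeeping for the original pointer is needed.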
2025-06-30 | [libc] Efficiently implement 'realloc' for AMDGPU devices (#145960) | Joseph Huber
Summary: Now that we have `malloc` we can implement `realloc` efficiently. This uses the known chunk sizes to avoid unnecessary allocations. We just return nullptr for NVPTX. I'd remove the list for the entrypoint but then the libc++ code would stop working. When someone writes the NVPTX support this will be trivial.
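As a rough illustration of how known chunk sizes let `realloc` skip allocations, here is a sketch built on the host `std::malloc`. It is not the AMDGPU implementation: the `Header` struct, the power-of-two size classes, and the `toy_*` names are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical header recording the usable size of the chunk that follows it.
struct alignas(std::max_align_t) Header {
  std::size_t chunk_size;
};

// Round a request up to a power-of-two chunk size (minimum 16 bytes).
// Overflow for very large sizes is ignored in this sketch.
std::size_t round_to_chunk(std::size_t size) {
  std::size_t chunk = 16;
  while (chunk < size)
    chunk *= 2;
  return chunk;
}

void *toy_malloc(std::size_t size) {
  std::size_t chunk = round_to_chunk(size);
  auto *h = static_cast<Header *>(std::malloc(sizeof(Header) + chunk));
  if (!h)
    return nullptr;
  h->chunk_size = chunk;
  return h + 1; // payload starts right after the header
}

std::size_t toy_chunk_size(void *p) {
  return (static_cast<Header *>(p) - 1)->chunk_size;
}

void toy_free(void *p) {
  if (p)
    std::free(static_cast<Header *>(p) - 1);
}

// Reuse the existing chunk whenever the new size still fits in it;
// only allocate and copy when the request outgrows the chunk.
void *toy_realloc(void *p, std::size_t size) {
  if (!p)
    return toy_malloc(size);
  std::size_t old_chunk = toy_chunk_size(p);
  if (size <= old_chunk)
    return p;
  void *q = toy_malloc(size);
  if (!q)
    return nullptr; // the original block is left untouched on failure
  std::memcpy(q, p, old_chunk);
  toy_free(p);
  return q;
}
```

Growing a 10-byte allocation to 16 bytes returns the same pointer here, since both sizes fall in the 16-byte chunk; only a request beyond the chunk size triggers a new allocation and a copy.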
2025-01-24 | [libc] Use the NVIDIA device allocator for GPU malloc (#124277) | Joseph Huber
Summary: This is a blocker on another patch in the OpenMP runtime. The problem is that NVIDIA truly doesn't handle RPC-based allocations very well. It cannot reliably update the MMU while a kernel is running and it will usually deadlock if called from a separate thread due to internal use of TLS. This patch just removes the definition of `malloc` and `free` for NVPTX. The result here is that they will be undefined, which is the cue for the `nvlink` linker to define them for us. So, as far as `libc` is concerned it still implements malloc.
2024-12-02 | [libc][NFC] Rename RPC opcodes to better reflect their usage | Joseph Huber
Summary: RPC_ is a generic prefix here; use LIBC_ to indicate that these are opcodes used to implement the C library.
2024-11-19 | [libc] Replace usage of GPU helpers with ones from 'gpuintrin.h' (#116454) | Joseph Huber
Summary: These are provided by a resource header now, so cut them from the dependencies and only provide the ones we use for RPC.
2024-10-15 | [libc] Remove dependency on `cpp::function` in `rpc.h` (#112422) | Joseph Huber
Summary: I'm going to attempt to move the `rpc.h` header to a separate folder that we can install and include outside of `libc`. Before doing this I'm going to try to trim up the file so there aren't as many things I need to copy to make it work. This dependency on `cpp::function` is low-hanging fruit. I only used it so that I could overload the argument of the work function, making the id optional in the lambda. That's not a *huge* deal, and requiring it makes things more explicit, I suppose.
2024-09-23 | [libc] Add GPU support for the 'system' function (#109687) | Joseph Huber
Summary: This function can easily be implemented by forwarding it to the host process. It shows up in a few places that we might want to test on the GPU, so it should be provided. Also, I find the idea of the GPU offloading work to the CPU via `system` very funny.
2024-07-30 | [libc] Implement placeholder memory functions on the GPU (#101082) | Joseph Huber
Summary: These functions are needed for `libc++` to link successfully. We can't implement them well currently, so simply provide some stand-in implementations. `realloc` will currently copy garbage and potentially fault, and `aligned_alloc` will work unless your alignment is more than 4K. However, these should work in practice to get tests running. I will write a real allocator soon™.
2024-07-12 | [libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597) | Petr Hosek
This is a part of #97655.
2024-07-12 | Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593) | Mehdi Amini
Reverts llvm/llvm-project#98075. Bots are broken.
2024-07-11 | [libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075) | Petr Hosek
This is a part of #97655.
2024-03-10 | [libc][NFC] Move GPU allocator implementation to common header (#84690) | Joseph Huber
Summary: This is an NFC move preceding more radical functional changes to the allocator implementation. We just move it to a common utility so it will be easier to write these in tandem.
2023-09-26 | [libc] Mass replace enclosing namespace (#67032) | Guillaume Chatelet
This is step 4 of https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
2023-09-12 | [libc][NFC] Factor GPU exiting into a common function (#66093) | Joseph Huber
Summary: We currently call the GPU routine to terminate the current thread in three separate locations. This should be wrapped into a helper function to simplify the implementation.
2023-08-31 | [libc] Implement the 'abort' function on the GPU | Joseph Huber
This patch implements the `abort` function on the GPU. The implementation here closely mirrors the `exit` call, where we first synchronize with the RPC server to make sure it's listening and then we exit on the GPU. I was unsure if this should be a simple `__builtin_assert` on the GPU. I elected to go with an RPC approach to make this a more "true" `abort` call. That is, it should invoke some signal handlers and exit with the proper code according to the implemented C library on the server. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159210
2023-06-15 | [libc] Export GPU extensions to `libc` for external use | Joseph Huber
The GPU port of the LLVM C library needs to export a few extensions to the interface such that users can interface with it. This patch adds the necessary logic to define a GPU extension. Currently, this only exports a `rpc_reset_client` function. This allows us to use the server in D147054 to set up the RPC interface outside of `libc`. Depends on https://reviews.llvm.org/D147054 Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152283
2023-06-05 | [libc] Implement basic `malloc` and `free` support on the GPU | Joseph Huber
This patch adds support for the `malloc` and `free` functions. These currently aren't implemented in-tree, so we first add the interface files. This patch provides the most basic support for a true `malloc` and `free` by using the RPC interface. This is functional, but in the future we will want to implement a more intelligent system and primarily use the RPC interface more as a `brk()` or `sbrk()` interface only called when absolutely necessary. We will need to design an intelligent allocator in the future. The semantics of these memory allocations will need to be checked. I am somewhat iffy on the details. I've heard that HSA can allocate asynchronously, which seems to work with my tests at least. CUDA uses an implicit synchronization scheme, so we need to use an explicitly separate stream from the one launching the kernel or the default stream. I will need to test the NVPTX case. I would appreciate it if anyone more experienced with the implementation details here could chime in for the HSA and CUDA cases. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D151735