llvm-project.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2025-05-28	[libc] Implement efficient 'malloc' on the GPU (#140156)	Joseph Huber
	Summary: This is the big patch that implements an efficient device-side `malloc` on the GPU. This is the first pass and many improvements will be made later. The scheme revolves around using a global reference counted pointer to hand out access to a dynamically created and destroyed slab interface. The slab is simply a large bitfield with one bit for each slab. All allocations are the same size in a slab, so different sized allocations are done through different slabs. Allocation is thus searching for or creating a slab for the desired slab, reserving space, and then searching for a free bit. Freeing is clearing the bit and then releasing the space. This interface allows memory to dynamically grow and shrink. Future patches will have different modes to allow fast first-time-use as well as a non-RPC version.
2023-09-26	[libc] Mass replace enclosing namespace (#67032)	Guillaume Chatelet
	This is step 4 of https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
2023-06-05	[libc] Implement basic `malloc` and `free` support on the GPU	Joseph Huber
	This patch adds support for the `malloc` and `free` functions. These currently aren't implemented in-tree so we first add the interface filies. This patch provides the most basic support for a true `malloc` and `free` by using the RPC interface. This is functional, but in the future we will want to implement a more intelligent system and primarily use the RPC interface more as a `brk()` or `sbrk()` interface only called when absolutely necessary. We will need to design an intelligent allocator in the future. The semantics of these memory allocations will need to be checked. I am somewhat iffy on the details. I've heard that HSA can allocate asynchronously which seems to work with my tests at least. CUDA uses an implicit synchronization scheme so we need to use an explicitly separate stream from the one launching the kernel or the default stream. I will need to test the NVPTX case. I would appreciate if anyone more experienced with the implementation details here could chime in for the HSA and CUDA cases. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D151735