llvm-project.git/llvm/lib/IR/DataLayout.cpp, branch main

[DataLayout][LangRef] Split non-integral and unstable pointer properties

2025-09-23T18:16:47+00:00

This commit adds finer-grained versions of isNonIntegralAddressSpace() and
isNonIntegralPointerType(), whose current semantics prohibit the
introduction of both ptrtoint and inttoptr instructions. The current
semantics are too strict for some targets (e.g. AMDGPU/CHERI) where
ptrtoint has a stable value, but the pointer has additional metadata.
Currently, marking a pointer address space as non-integral also marks it
as having an unstable bitwise representation (e.g. when pointers can be
changed by a copying GC). This property inhibits a lot of
optimizations that are perfectly legal for other non-integral pointers
such as fat pointers or CHERI capabilities that have a well-defined
bitwise representation but can't be created with only an address.

This change splits the properties of non-integral pointers and allows
for address spaces to be marked as unstable or non-integral (or both)
independently using the 'p' part of the DataLayout string.
A 'u' following the 'p' marks the address space as unstable, and
specifying an index width that differs from the representation width
marks it as non-integral. Finally, we also add an 'e' flag to mark
pointers with external state (such as CHERI capability validity). These
pointers require special handling of loads and stores in addition to
being non-integral.
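To make the new 'p' flags concrete, here is a minimal sketch of parsing one pointer entry into the properties described above. It assumes a simplified grammar, p[flags][addrspace]:<size>:<abi>[:<pref>[:<idx>]], with the flags placed directly after the 'p'; the PointerSpec struct, parsePointerSpec() and the rule that external state implies non-integral are illustrative assumptions, not the actual DataLayout parser.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical, simplified model of one pointer entry in a DataLayout string.
// Assumed grammar: p[flags][addrspace]:<size>:<abi>[:<pref>[:<idx>]]
// where flags is any mix of 'u' (unstable) and 'e' (external state).
struct PointerSpec {
  unsigned AddrSpace = 0;
  unsigned BitWidth = 0;       // representation width in bits
  unsigned IndexWidth = 0;     // defaults to BitWidth when omitted
  bool Unstable = false;       // set by 'u'
  bool ExternalState = false;  // set by 'e'

  // Assumption for this sketch: index width != representation width means
  // non-integral, and external state also implies non-integral.
  bool isNonIntegral() const { return ExternalState || IndexWidth != BitWidth; }
};

// Parses a non-empty run of decimal digits.
static std::optional<unsigned> parseUnsigned(std::string_view S) {
  if (S.empty()) return std::nullopt;
  unsigned V = 0;
  for (char C : S) {
    if (C < '0' || C > '9') return std::nullopt;
    V = V * 10 + static_cast<unsigned>(C - '0');
  }
  return V;
}

std::optional<PointerSpec> parsePointerSpec(std::string_view Spec) {
  if (Spec.empty() || Spec[0] != 'p') return std::nullopt;
  Spec.remove_prefix(1);
  PointerSpec PS;
  // Flags are assumed to come directly after 'p', before the address space.
  while (!Spec.empty() && (Spec[0] == 'u' || Spec[0] == 'e')) {
    (Spec[0] == 'u' ? PS.Unstable : PS.ExternalState) = true;
    Spec.remove_prefix(1);
  }
  // Split the rest on ':' into at most five fields.
  std::string_view F[5];
  std::size_t N = 0;
  for (;;) {
    if (N == 5) return std::nullopt;  // too many fields
    std::size_t Pos = Spec.find(':');
    F[N++] = Spec.substr(0, Pos);
    if (Pos == std::string_view::npos) break;
    Spec.remove_prefix(Pos + 1);
  }
  if (N < 3) return std::nullopt;  // need addrspace, size and ABI alignment
  if (!F[0].empty()) {
    auto AS = parseUnsigned(F[0]);
    if (!AS) return std::nullopt;
    PS.AddrSpace = *AS;
  }
  auto Size = parseUnsigned(F[1]);
  if (!Size || !parseUnsigned(F[2])) return std::nullopt;
  PS.BitWidth = PS.IndexWidth = *Size;
  if (N == 5) {  // F[3] is the preferred alignment, unused here
    auto Idx = parseUnsigned(F[4]);
    if (!Idx) return std::nullopt;
    PS.IndexWidth = *Idx;
  }
  return PS;
}
```

Keeping "unstable" and "non-integral" as separate fields is the point of the change: a fat pointer with a narrower index width comes out non-integral but stable, while a 'u' entry with matching widths comes out unstable but integral.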

This does not change the checks in any of the passes yet - we
currently keep the existing non-integral behaviour. In the future I plan
to audit calls to DL.isNonIntegral[PointerType]() and replace them with
the DL.mustNotIntroduce{IntToPtr,PtrToInt}() checks that allow for more
optimizations.
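The difference between the old single query and the planned finer-grained checks can be sketched as follows. The AddrSpaceProps struct and these free functions are hypothetical; the function names echo the DL.mustNotIntroduce{IntToPtr,PtrToInt}() checks mentioned above, but their signatures and exact semantics here are one plausible reading, not the LLVM implementation.

```cpp
#include <cassert>

// Hypothetical per-address-space properties; not LLVM's DataLayout API.
struct AddrSpaceProps {
  bool NonIntegral = false;  // cannot be recreated from an address alone
  bool Unstable = false;     // bit pattern may change (e.g. copying GC)
};

// Old, coarse query: one "non-integral" bit covered both properties, so
// either one blocked introducing ptrtoint and inttoptr alike.
bool oldIsNonIntegral(const AddrSpaceProps &P) { return P.NonIntegral || P.Unstable; }

// Assumed reading: ptrtoint is only a problem when the resulting integer
// would not be stable...
bool mustNotIntroducePtrToInt(const AddrSpaceProps &P) { return P.Unstable; }

// ...while inttoptr is a problem whenever an address is not enough to
// rebuild the pointer, or the representation itself is unstable.
bool mustNotIntroduceIntToPtr(const AddrSpaceProps &P) { return P.NonIntegral || P.Unstable; }
```

Under this reading a fat pointer or CHERI capability (non-integral but stable) would again allow passes to introduce ptrtoint, which the old query forbade.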

RFC: https://discourse.llvm.org/t/rfc-finer-grained-non-integral-pointer-properties/83176

Reviewed By: nikic, krzysz00

Pull Request: https://github.com/llvm/llvm-project/pull/105735

[llvm] Move data layout string computation to TargetParser (#157612)

2025-09-11T18:05:29+00:00

Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules. MLIR clients often need it as well,
since many MLIR users lower to LLVM IR.

Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.

By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.

This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.

The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, whose members include codegen concepts such as
ABI physical registers, which can't live in TargetParser. I think the
string logic of looking for "n32", "n64", etc. is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.
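A sketch of how the two inputs (triple and ABI name) can drive layout-relevant choices without any codegen types. The enum and all function names are hypothetical, not the TargetParser API; the fallback to O32 for 32-bit and N64 for 64-bit MIPS is an assumed default, and the substring search is only a stand-in for the duplicated "n32"/"n64" string logic.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical sketch of deriving layout-relevant choices from only the
// target and the ABI name. Names are illustrative, not the TargetParser API.
enum class MipsABI { O32, N32, N64 };

// Look for a known ABI name in the string; with no match, fall back to O32
// for 32-bit and N64 for 64-bit MIPS (assumed defaults for this sketch).
MipsABI pickMipsABI(std::string_view ABIName, bool Is64Bit) {
  if (ABIName.find("n32") != std::string_view::npos) return MipsABI::N32;
  if (ABIName.find("n64") != std::string_view::npos) return MipsABI::N64;
  if (ABIName.find("o32") != std::string_view::npos) return MipsABI::O32;
  return Is64Bit ? MipsABI::N64 : MipsABI::O32;
}

// Pointer width is one of the things the ABI decides: N32 keeps 32-bit
// pointers even on 64-bit hardware.
unsigned mipsPointerBits(MipsABI ABI) { return ABI == MipsABI::N64 ? 64 : 32; }

// NVPTX: the short-pointer cl::opt is passed through as the ABI name "shortptr".
bool nvptxUsesShortPointers(std::string_view ABIName) { return ABIName == "shortptr"; }
```

Only plain strings and booleans cross the interface, which is what lets this logic live below CodeGen in the library layering.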

---------

Co-authored-by: Matt Arsenault 
Co-authored-by: Sergei Barannikov

[DataLayout] Remove i1 alignment entry (#156657)

2025-09-08T07:51:16+00:00

I don't think we need to explicitly specify i1 alignment, as this is
going to fall back to i8 alignment.

This may change behavior if a data layout explicitly sets i8 alignment
without also setting i1 alignment, but I'd expect this to be a bug fix
in that case.
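The fallback can be illustrated with a simplified integer-alignment table, assuming the lookup picks the first entry at least as wide as the requested type and uses the widest entry when none is. IntAlignEntry, intABIAlign() and the table contents are hypothetical stand-ins, not DataLayout's internals or its default layout.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for an integer alignment table: entries sorted by
// bit width, each with an ABI alignment in bytes.
struct IntAlignEntry {
  unsigned BitWidth;
  unsigned ABIAlign;
};

// Assumed lookup rule: first entry whose width is >= the requested width;
// if the type is wider than every entry, use the widest one.
unsigned intABIAlign(const std::vector<IntAlignEntry> &Table, unsigned Bits) {
  assert(!Table.empty() && "table must have at least one entry");
  for (const IntAlignEntry &E : Table)
    if (E.BitWidth >= Bits) return E.ABIAlign;
  return Table.back().ABIAlign;
}
```

With no i1 entry, i1 resolves to the i8 entry, so a layout that raises i8 alignment (say to 2 bytes) now raises i1 alignment with it; that is the behavior change the message describes.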

[DataLayout] Specialize the getTypeAllocSize() implementation (#156687)

2025-09-04T07:27:33+00:00

getTypeAllocSize() currently works by taking the type store size and
aligning it to the ABI alignment. However, this ends up doing redundant
work in various cases, for example arrays will unnecessarily repeat the
alignment step, and structs will fetch the StructLayout multiple times.

As this code is rather hot (it is called every time we need to calculate
GEP offsets for example), specialize the implementation. This repeats a
small amount of logic from getAlignment(), but I think that's
worthwhile.
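The redundancy for arrays can be seen in a toy model: an array's store size is already N times the element's alloc size, which is a multiple of the element alignment, so aligning it again changes nothing. The Ty struct, the power-of-two-capped-at-8 alignment rule and both allocSize functions are hypothetical; structs, where the real code fetches StructLayout, are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Toy type model (hypothetical): an integer of IntBits bits, or an array.
struct Ty {
  unsigned IntBits = 0;      // nonzero for integer types
  std::shared_ptr<Ty> Elem;  // non-null for array types
  uint64_t NumElems = 0;
};

Ty intTy(unsigned Bits) { Ty T; T.IntBits = Bits; return T; }
Ty arrayTy(const Ty &E, uint64_t N) {
  Ty T; T.Elem = std::make_shared<Ty>(E); T.NumElems = N; return T;
}

uint64_t alignTo(uint64_t Size, uint64_t Align) { return (Size + Align - 1) / Align * Align; }

// Assumed ABI alignment rule for the toy model: next power of two of the
// byte size, capped at 8; arrays use their element's alignment.
uint64_t abiAlign(const Ty &T) {
  if (T.Elem) return abiAlign(*T.Elem);
  uint64_t Bytes = (T.IntBits + 7) / 8, A = 1;
  while (A < Bytes && A < 8) A *= 2;
  return A;
}

uint64_t allocSizeGeneric(const Ty &T);

// Store size: bytes holding the value; arrays hold N elements at alloc size.
uint64_t storeSize(const Ty &T) {
  if (T.Elem) return T.NumElems * allocSizeGeneric(*T.Elem);
  return (T.IntBits + 7) / 8;
}

// Generic path: align the store size to the ABI alignment. For arrays this
// re-aligns a value that is already aligned, at every nesting level.
uint64_t allocSizeGeneric(const Ty &T) { return alignTo(storeSize(T), abiAlign(T)); }

// Specialized path: arrays skip the final alignTo entirely.
uint64_t allocSizeSpecialized(const Ty &T) {
  if (T.Elem) return T.NumElems * allocSizeSpecialized(*T.Elem);
  return alignTo((T.IntBits + 7) / 8, abiAlign(T));
}
```

For example, with i24 taking 4 bytes under this toy rule, a [5 x [3 x i24]] comes out at 60 bytes on both paths, but only the generic one repeats the alignment step per level.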

[DataLayout] Use linear scan to determine integer alignment (NFC)

2025-09-03T12:04:54+00:00

The number of alignment entries is usually very small (5-7), so
it is more efficient to use a linear scan than a binary search.
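Both strategies find the same entry (the first one at least as wide as the request); the change only swaps how it is found. A minimal sketch with a hypothetical IntAlignEntry table, using std::lower_bound for the binary search and std::find_if for the linear scan:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical table entry: integer bit width and its ABI alignment in bytes.
struct IntAlignEntry {
  unsigned BitWidth;
  unsigned ABIAlign;
};
using Table = std::vector<IntAlignEntry>;

// Binary search over entries sorted by width: first entry with width >= Bits.
Table::const_iterator findBinary(const Table &T, unsigned Bits) {
  return std::lower_bound(T.begin(), T.end(), Bits,
                          [](const IntAlignEntry &E, unsigned B) { return E.BitWidth < B; });
}

// Linear scan with the same result; for a handful of entries this simple
// forward loop is the cheaper option according to the commit message.
Table::const_iterator findLinear(const Table &T, unsigned Bits) {
  return std::find_if(T.begin(), T.end(),
                      [Bits](const IntAlignEntry &E) { return E.BitWidth >= Bits; });
}
```

Because the results are identical for any sorted table, the change is NFC; only the search cost differs.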

[DataLayout] Explicitly call getFixedValue() (NFC)

2025-09-02T07:52:16+00:00

Call getFixedValue() explicitly instead of relying on the implicit
conversion. The scalable case is explicitly checked beforehand.
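The pattern is: dispatch on the scalable case first, then ask for the fixed value by name. The SizeQuantity class below is a toy stand-in; its getFixedValue()/getKnownMinValue() method names are borrowed from LLVM's TypeSize, but the class, bytesForCopy() and its VScale parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a size that is either fixed or scaled by a runtime
// factor (vscale); not LLVM's TypeSize.
class SizeQuantity {
  uint64_t MinValue;
  bool Scalable;

public:
  SizeQuantity(uint64_t V, bool S) : MinValue(V), Scalable(S) {}
  bool isScalable() const { return Scalable; }
  uint64_t getKnownMinValue() const { return MinValue; }
  // Only meaningful for fixed sizes; the assert documents the precondition.
  uint64_t getFixedValue() const {
    assert(!Scalable && "fixed value requested for a scalable size");
    return MinValue;
  }
};

// Handle the scalable case explicitly, then call getFixedValue() rather
// than relying on an implicit conversion to an integer.
uint64_t bytesForCopy(const SizeQuantity &S, uint64_t VScale) {
  if (S.isScalable()) return S.getKnownMinValue() * VScale;
  return S.getFixedValue();
}
```

Spelling out getFixedValue() makes the "this is definitely fixed" assumption visible at the call site instead of hiding it in a conversion operator.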

[IR] Use llvm::upper_bound (NFC) (#139656)

2025-05-13T05:59:05+00:00

[NFC][llvm] Drop isOsWindowsOrUEFI API (#138733)

2025-05-06T22:41:35+00:00

The isOSWindowsOrUEFI() functions in the Triple and Subtarget APIs are
not preferred, so drop them.

[nfc][llvm] Clean up isUEFI checks (#124845)

2025-01-28T23:18:10+00:00

The check for `isOSWindows() || isUEFI()` is used in several places
across the codebase. Introduce `isOSWindowsOrUEFI()` in Triple.h to
simplify these checks.

[DataLayout] Remove getMaxIndexSizeInBits() API

2024-12-13T12:01:01+00:00

The last use was removed in #119365, and we should not add more
uses of this concept in the future either.