| Age | Commit message (Collapse) | Author |
|
For Swift async code, we need to use a debug intrinsic that behaves like
an llvm.dbg.declare but can take any location type rather than just a
pointer or integer.
To solve this, a new debug intrinsic called llvm.dbg.declare_value has
been created, which behaves exactly like an llvm.dbg.declare but can
take location types other than pointers and integers.
More information here:
https://discourse.llvm.org/t/rfc-introduce-new-llvm-dbg-coroframe-entry-intrinsic/88269
This is the first patch in a stack of patches; the one
succeeding it is: https://github.com/llvm/llvm-project/pull/168134
|
|
(#166170)
…nd update docs
|
|
Fixes: https://github.com/llvm/llvm-project/issues/167132
|
|
GlobalISel now selects the hadd family of intrinsics without falling
back to SDAG.
|
|
This commit adds the below fence intrinsics:
- llvm.nvvm.fence.acquire.sync_restrict.space.cluster.scope.cluster
- llvm.nvvm.fence.release.sync_restrict.space.cta.scope.cluster
- llvm.nvvm.fence.mbarrier_init.release.cluster
- llvm.nvvm.fence.proxy.async.generic.acquire.sync_restrict.space.cluster.scope.cluster
- llvm.nvvm.fence.proxy.async.generic.release.sync_restrict.space.cta.scope.cluster
- llvm.nvvm.fence.proxy.alias
- llvm.nvvm.fence.proxy.async
- llvm.nvvm.fence.proxy.async.global
- llvm.nvvm.fence.proxy.async.shared_cluster
- llvm.nvvm.fence.proxy.async.shared_cta
For more information, please refer to the [PTX
ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar)
|
|
This change drops the use of the "Layout" type and instead uses explicit
padding throughout the compiler to represent types in HLSL buffers.
There are a few parts to this, though it's difficult to split them up as
they're very interdependent:
1. Refactor HLSLBufferLayoutBuilder to allow us to calculate the padding
of arbitrary types.
2. Teach Clang CodeGen to use HLSL specific paths for cbuffers when
generating aggregate copies, array accesses, and structure accesses.
3. Simplify DXILCBufferAccesses such that it directly replaces accesses
with dx.resource.getpointer rather than recalculating the layout.
4. Basic infrastructure for SPIR-V handling, but the implementation
itself will need work in follow ups.
Fixes several issues, including #138996, #144573, and #156084.
Resolves #147352.
|
|
deduced (#164440)
Previously, the handling of the `cleanup` attribute had some checks
based on the type, but we were deducing the type after handling the
attribute.
This PR fixes the way we deal with type checks for the `cleanup`
attribute by delaying these checks until after the type has been deduced.
The fix is structured so that the solution can be adapted for other
attributes that do type-based checks.
This is the list of C/C++ attributes that are doing type based checks
and will need to be fixed in additional PRs:
- CUDAShared
- MutualExclusions
- PassObjectSize
- InitPriority
- Sentinel
- AcquireCapability
- RequiresCapability
- LocksExcluded
- AcquireHandle
NB: Some attributes could have been missed in my shallow search.
Fixes #129631
|
|
This patch introduces preliminary support for additional memory
locations.
They are target_mem0 and target_mem1, and they model memory locations
that cannot be represented with existing memory locations.
This solution was suggested in:
https://discourse.llvm.org/t/rfc-improving-fpmr-handling-for-fp8-intrinsics-in-llvm/86868/6
Currently, these locations are not yet target-specific. The goal is to
enable the compiler to express read/write effects on these resources.
|
|
Add documentation about CMAKE_OSX_SYSROOT so that folks bringing up on
OSX can have a clean test run.
|
|
LDS block size should be 2048 bytes (512 dwords) based on current spec.
|
|
This patch is limited to hyphenation to ease the review process.
|
|
(#168226)
The CIBestPractices.rst document uses `releases/*` as the branch name
filter for push events. The actual release branch names match the
pattern `release/*`.
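A quick way to see why the old filter never fired is to test both patterns against a release branch name. This is only a sketch: Python's `fnmatch` approximates GitHub's branch filter globbing rather than reproducing it, and `release/21.x` is an example branch name chosen for illustration.

```python
from fnmatch import fnmatchcase

# Example branch name, chosen for illustration.
branch = "release/21.x"

# The filter previously shown in the document (note the extra "s").
old_matches = fnmatchcase(branch, "releases/*")
# The filter that matches the actual branch naming scheme.
new_matches = fnmatchcase(branch, "release/*")

print(old_matches, new_matches)
```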
|
|
This patch is limited to single-word replacements to fix spelling
and/or grammar to ease the review process. Punctuation and markdown
fixes are specifically excluded.
|
|
|
|
This also improves the error message to be more clear for folks who
haven't used a lot of rst.
|
|
This was previously under the ELF specific options section, but is
actually only supported for Mach-O
|
|
Reduces memory usage compiling backend sources, most notably for
AMDGPU by ~98 MB per source on average.
AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and
is even larger downstream. At the same time, it is included in
nearly all backend sources, typically just for a small portion of
its content, resulting in compilation being unnecessarily
memory-hungry, which in turn stresses buildbots and wastes their
resources.
Splitting .inc files also helps avoid extra ccache misses
where changes in .td files don't cause changes in all parts of
what previously was a single .inc file.
Rather than building on top of the current
single-output-file design of TableGen, e.g., using `split-file`,
it seems preferable to recognise the need for multi-file
outputs and give it proper first-class support directly in
TableGen.
|
|
The Transactional Memory Extension (TME) was introduced as part of
Armv9-A but has not been adopted by the ecosystem. This mirrors what
Arm has observed with similar extensions in other architectures.
Therefore, remove FEAT_TME assembly and ACLE code from LLVM, because
support for TME has now been officially withdrawn, as noted here:
```
FEAT_TME is withdrawn from all future versions of Arm®
Architecture Reference Manual for A-profile architecture.
```
referenced in Known Issue D24093, documented here:
https://developer.arm.com/documentation/102105/lb-05/
|
|
(#164912)
Add assembly/disassembly support for AArch64 `FEAT_S1POE2` (Stage 1
Permission Overlay Extension 2), as blogged about here:
* https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/future-architecture-technologies-poe2-and-vmte
and as documented here:
* https://developer.arm.com/documentation/109697/2025_09/Future-Architecture-Technologies
Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
|
|
(#166625)
GitHub's Update Branch button is a helpful tool for quickly updating a
PR before merging, but it might also be important to point out that it
creates a merge commit without additional prompting, which may or may
not be desired behavior for a given LLVM contributor.
Opened on the suggestion of @lamb-j
|
|
This patch adds a Clang-compatible --save-stats option to opt, to
provide an easy to use way to save LLVM statistics files when working
with opt on the middle end.
This is a follow up on the addition to `llc`:
https://github.com/llvm/llvm-project/pull/163967
Like on Clang, one can specify --save-stats, --save-stats=cwd, and
--save-stats=obj with the same semantics and JSON format. The
pre-existing --stats option is not affected.
The implementation extracts the flag and its methods into the common
`CodeGen/CommandFlags` as `LLVM_ABI`, using a new registration class to
conservatively enable opt-in rather than let all tools take it. It's only
needed for llc and opt for now. Then it refactors llc and adds support
for opt.
|
|
This patch adds a TMA intrinsic for Global to
shared::cta copy, which was introduced with ptx86.
Also remove the NoCapture<> annotation from the
pointer arguments to these intrinsics, since the
copy operations are asynchronous in nature.
lit tests are verified with a ptxas from cuda-12.8.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
|
|
(#167553)
This commit adds documentation clarifying the meaning of `align` on ptr
addrspace(7) (buffer fat pointer) and ptr addrspace(9) (buffer
structured pointer) operations (specifying that both the base and the
offset need to be aligned) and documents the meaning of the `align`
attribute when used as an argument on *.buffer.ptr.* intrinsics.
|
|
Add the following `FEAT_MOPS_GO` instructions:
* `SETGOP`, `SETGOM`, `SETGOE`
* `SETGOPN`, `SETGOMN`, `SETGOEN`
* `SETGOPT`, `SETGOMT`, `SETGOET`
* `SETGOPTN`, `SETGOMTN`, `SETGOETN`
as blogged about here:
* https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/future-architecture-technologies-poe2-and-vmte
and as documented here:
* https://developer.arm.com/documentation/109697/2025_09/Future-Architecture-Technologies
|
|
|
|
A new InstCombine transform uses this attribute to rewrite calls to a
modular version of the implementation along with llvm.reloc.none
relocations against aspects of the implementation needed by the call.
This change only adds support for the 'float' aspect, but it also builds
the structure needed for others.
See issue #146159
|
|
|
|
spec:
https://github.com/riscv/riscv-isa-manual/blob/smpmpmt/src/smpmpmt.adoc
Co-Authored-by: Jesse Huang <jesse.huang@sifive.com>
|
|
extension (#166257)
This enables support for atomic RMW ops (add, sub, min and max to be
precise) with `bfloat16` operands, via the [SPV_INTEL_16bit_atomics
extension](https://github.com/intel/llvm/pull/20009). It's logically a
successor to #166031 (I should've used a stack), but I'm putting it up
for early review.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
This patch adds a Clang-compatible `--save-stats` option, to provide an
easy to use way to save LLVM statistics files when working with llc on
the backend.
Like on Clang, one can specify `--save-stats`, `--save-stats=cwd`, and
`--save-stats=obj` with the same semantics and JSON format.
The implementation uses 2 methods `MaybeEnableStats` and
`MaybeSaveStats` called before and after `compileModule` respectively
that externally own the statistics related logic, while `compileModule`
is now required to return the resolved output filename via an output
param.
Note: like on Clang, the pre-existing `--stats` option is not affected.
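The difference between the `cwd` and `obj` modes can be sketched as follows. This is an illustration under assumptions, not llc's implementation: the helper name `stats_path` is hypothetical, and deriving the `.stats` file name from the output file name is an assumption based on the description above.

```python
from pathlib import PurePosixPath


def stats_path(mode: str, output_file: str) -> PurePosixPath:
    """Hypothetical helper: choose where the statistics file would go."""
    # Assumption: the stats file reuses the output file's base name
    # with a ".stats" extension.
    name = PurePosixPath(output_file).with_suffix(".stats").name
    if mode == "cwd":
        # --save-stats and --save-stats=cwd: current working directory.
        return PurePosixPath(name)
    if mode == "obj":
        # --save-stats=obj: next to the output file.
        return PurePosixPath(output_file).parent / name
    raise ValueError(f"unknown --save-stats mode: {mode}")


print(stats_path("cwd", "build/foo.o"))
print(stats_path("obj", "build/foo.o"))
```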
|
|
This patch is limited to single-word replacements to fix spelling
and/or grammar to ease the review process. Punctuation and markdown
fixes are specifically excluded.
|
|
|
|
With this intrinsic, and supporting SelectionDAG nodes, we can better
make use of instructions such as AArch64's `FDOT`.
|
|
This declares PR #147427.
|
|
This intrinsic emits a BFD_RELOC_NONE relocation at the point of call,
which allows optimizations and languages to explicitly pull in symbols
from static libraries without there being any code or data that has an
effectual relocation against such a symbol.
See issue #146159 for context.
|
|
|
|
- Added support for the extension SPV_ALTERA_blocking_pipes
- Added test files for the extension SPV_ALTERA_blocking_pipes
|
|
|
|
Address spaces 10 and 11 are reserved for future use in the sense that
we plan to upstream their use.
Address space 12 is used by LLPC. It is used in a workaround for an
issue with SMEM accesses to PRT buffers that is specific to the LLPC
ecosystem and makes no sense to upstream.
|
|
Enable the `SPV_INTEL_bfloat16_arithmetic` extension, which allows arithmetic, relational and `OpExtInst` instructions to take `bfloat16` arguments. This patch only adds support to arithmetic and relational ops. The extension itself is rather fresh, but `bfloat16` is ubiquitous at this point and not supporting these ops is limiting.
|
|
|
|
`SPV_INTEL_kernel_attributes` (#165891)
This adds BE support for the
[`SPV_INTEL_kernel_attributes`](https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_kernel_attributes.html)
extension. The extension is necessary to encode the rather useful
`max_work_group_size` kernel attribute, via `OpExecutionMode
MaxWorkgroupSizeINTEL`, which is the only Execution Mode added by the
extension that this patch adds full processing for. Future patches will
add the other Execution Modes and Capabilities. The test is adapted from
the equivalent Translator test; it depends on #165815.
|
|
Also add a corresponding intrinsic property that can be used to mark
intrinsics that do not introduce poison, for example simple arithmetic
intrinsics that propagate poison just like a simple arithmetic
instruction.
As a smoke test this patch adds the new property to
llvm.amdgcn.fmul.legacy.
|
|
This patch adds a new option `--child-tags` (`-t` for short), which
makes dwarfdump only dump children whose DWARF tag is in the list of
tags specified by the user.
Motivating examples are:
* dumping all global variables in a CU
* dumping all non-static data members of a structure
* dumping all module import declarations of a CU
* etc.
For tags not known to dwarfdump, we pretend that the tag wasn't
specified.
Note, this flag only takes effect when `--show-children` is set (either
explicitly or implicitly). We error out when trying to use the flag
without dumping children.
Example:
```
$ builds/release/bin/llvm-dwarfdump -t DW_TAG_structure_type a.out.dSYM
...
0x0000000c: DW_TAG_compile_unit
DW_AT_producer ("clang version 22.0.0git (git@github.com:Michael137/llvm-project.git 737da3347c2fb01dd403420cf83e9b8fbea32618)")
DW_AT_language (DW_LANG_C11)
...
0x0000002a: DW_TAG_structure_type
DW_AT_APPLE_block (true)
DW_AT_byte_size (0x20)
0x00000067: DW_TAG_structure_type
DW_AT_APPLE_block (true)
DW_AT_name ("__block_descriptor")
DW_AT_byte_size (0x10)
...
```
```
$ builds/release/bin/llvm-dwarfdump -t DW_TAG_structure_type -t DW_TAG_member a.out.dSYM
...
0x0000000c: DW_TAG_compile_unit
DW_AT_producer ("clang version 22.0.0git (git@github.com:Michael137/llvm-project.git 737da3347c2fb01dd403420cf83e9b8fbea32618)")
DW_AT_language (DW_LANG_C11)
DW_AT_name ("macro.c")
...
0x0000002a: DW_TAG_structure_type
DW_AT_APPLE_block (true)
DW_AT_byte_size (0x20)
0x0000002c: DW_TAG_member
DW_AT_name ("__isa")
DW_AT_type (0x00000051 "void *")
DW_AT_data_member_location (0x00)
0x00000033: DW_TAG_member
DW_AT_name ("__flags")
DW_AT_type (0x00000052 "int")
DW_AT_data_member_location (0x08)
0x0000003a: DW_TAG_member
DW_AT_name ("__reserved")
DW_AT_type (0x00000052 "int")
DW_AT_data_member_location (0x0c)
0x00000041: DW_TAG_member
DW_AT_name ("__FuncPtr")
DW_AT_type (0x00000056 "void (*)(int)")
DW_AT_data_member_location (0x10)
0x00000048: DW_TAG_member
DW_AT_name ("__descriptor")
DW_AT_type (0x00000062 "__block_descriptor *")
DW_AT_alignment (8)
DW_AT_data_member_location (0x18)
0x00000067: DW_TAG_structure_type
DW_AT_APPLE_block (true)
DW_AT_name ("__block_descriptor")
DW_AT_byte_size (0x10)
0x0000006a: DW_TAG_member
DW_AT_name ("reserved")
DW_AT_type (0x00000079 "unsigned long")
DW_AT_data_member_location (0x00)
0x00000071: DW_TAG_member
DW_AT_name ("Size")
DW_AT_type (0x00000079 "unsigned long")
DW_AT_data_member_location (0x08)
...
```
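One reading of the filtering that is consistent with the two dumps above: the root DIE is always printed, and a child is printed (and descended into) only if its tag is in the requested set. The sketch below models that reading. The `Die` class and `dump` function are hypothetical and are not taken from llvm-dwarfdump's source.

```python
from dataclasses import dataclass, field


@dataclass
class Die:
    """Hypothetical stand-in for a DWARF debugging information entry."""
    tag: str
    children: list["Die"] = field(default_factory=list)


def dump(die, wanted, depth=0, out=None):
    """Collect the tags that would be printed, indented by depth."""
    out = [] if out is None else out
    out.append("  " * depth + die.tag)
    for child in die.children:
        # Assumption: children with other tags are skipped entirely.
        if child.tag in wanted:
            dump(child, wanted, depth + 1, out)
    return out


cu = Die("DW_TAG_compile_unit", [
    Die("DW_TAG_variable"),
    Die("DW_TAG_structure_type", [Die("DW_TAG_member")]),
])

structs_only = dump(cu, {"DW_TAG_structure_type"})
structs_and_members = dump(cu, {"DW_TAG_structure_type", "DW_TAG_member"})
print("\n".join(structs_and_members))
```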
|
|
They were introduced in #164349
|
|
The default behavior is to _not_ copy such swiftmodules into the dSYM,
as previously implemented in 96f95c9d89d8a1784d3865fa941fb1c510f4e2d7.
This patch adds the option to override the behavior, so that such
swiftmodules can be copied into the dSYM.
This is useful when the dSYM will be used on a machine which has a
different Xcode/SDK than where the swiftmodules were built. Without
this, when LLDB is asked to "p/po" a Swift variable, the underlying
Swift compiler code would rebuild the dependent `.swiftmodule` files of
the Swift stdlibs, which takes ~1 minute in some cases.
See PR for tests.
|
|
|
|
|
|
This is a follow-up for https://github.com/llvm/llvm-project/pull/103397
|
|
paths (#103397)
If any of the printed paths by llvm-config contain quotes, spaces,
backslashes or dollar sign characters, these paths will be quoted and
escaped, but only if using `--quote-paths`. The previous behavior is
retained for compatibility and `--quote-paths` is there to acknowledge
the migration to the new behavior.
Following discussion in #76304
Fixes #28117
Supersedes https://github.com/llvm/llvm-project/pull/97305
I could also do what @tothambrus11 suggests in
https://github.com/llvm/llvm-project/pull/97305#issuecomment-2282847990
but that makes all Windows paths quoted & escaped since they all contain
backslashes.
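The opt-in behavior can be sketched roughly as below. This is an illustration only: the function name `quote_path` is hypothetical, and the escaping scheme (double quotes with backslash escapes) is an assumption, not a description of what llvm-config actually emits.

```python
# Characters that trigger quoting, per the description above.
SPECIAL = set(' "\\$')


def quote_path(path: str, quote_paths: bool) -> str:
    """Hypothetical sketch of opt-in path quoting."""
    # Without --quote-paths, or with no special characters, the path
    # is printed unchanged (the pre-existing behavior).
    if not quote_paths or not (SPECIAL & set(path)):
        return path
    # Assumed scheme: backslash-escape quotes, backslashes and dollar
    # signs, then wrap the result in double quotes.
    escaped = "".join("\\" + c if c in '"\\$' else c for c in path)
    return f'"{escaped}"'


print(quote_path("/opt/llvm/bin", True))
print(quote_path("/opt/my llvm/bin", True))
print(quote_path("/opt/my llvm/bin", False))
```

Under this scheme a Windows path such as `C:\llvm` is always quoted once the flag is on, which matches the concern raised above about backslashes.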
|