llvm-project.git/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp, branch users/mingmingl-llvm/samplefdo-profile-format

[NVPTX] Support i256 load/store with 256-bit vector load (#155198)

2025-08-28T19:29:05+00:00

[NVPTX] Fixup NVPTXPrologEpilogPass for opt-bisect-limit (#144136)

2025-06-27T15:31:08+00:00

Currently, the NVPTXPrologEpilogPass will crash if LIFETIME_START or
LIFETIME_END instructions are encountered. Usually this isn't a problem
since a couple earlier passes will always remove them. However, when
using opt-bisect-limit crashes can occur. This can hinder debugging and
reveals a potential future problem if these optimization passes change
their behavior. https://cuda.godbolt.org/z/E81xxKGdb

This change updates NVPTXPrologEpilogPass and
NVPTXRegisterInfo::eliminateFrameIndex to gracefully handle these
instructions by simply removing them. While I'm here I also did some
general fixup in NVPTXPrologEpilogPass to make it look more like
PrologEpilogInserter (from which it was copied).

[llvm] annotate interfaces in llvm/Target for DLL export (#143615)

2025-06-17T20:28:45+00:00

## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.

## Background

This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

A sub-set of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.

The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementation is required here
because they do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for this functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it I was concerned it would unnecessarily impact compile times.

In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang

[AA] Move Target Specific AA before BasicAA (#125965)

2025-05-07T22:25:48+00:00

In this change, NVPTX AA is moved before Basic AA to potentially improve
compile time. Additionally, it introduces a flag in the
`ExternalAAWrapper` that allows other backends to run their
target-specific AA passes before Basic AA, if desired.

The change works for both New Pass Manager and Legacy Pass Manager.

Original implementation by Princeton Ferro

Register assembly printer passes (#138348)

2025-05-07T01:01:17+00:00

Register assembly printer passes in the pass registry.

This makes it possible to use `llc -start-before=-asm-printer ...` in tests.

Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow
targets to use the `INITIALIZE_PASS` macros and register the pass in the
pass registry. This currently has a default parameter so it won't break
any targets that have not been updated.

[NVPTX] Pull invariant load identification into IR pass (#138015)

2025-05-01T17:19:46+00:00

Pull invariant load identification, which was previously part of
DAGToDAG ISel, into a new IR pass NVPTXTagInvariantLoads. This makes it
possible to disable this optimization at O0 and reduces the complexity
of the SelectionDAG pass. Moving this logic to an IR pass also allows
for implementing a more powerful traversal in the future.

Fixes https://github.com/llvm/llvm-project/issues/138138

[TTI] Simplify implementation (NFCI) (#136674)

2025-04-26T12:25:40+00:00

Replace "concept based polymorphism" with simpler PImpl idiom.

This pursues two goals:
* Enforce static type checking. Previously, target implementations hid
base class methods and type checking was impossible. Now that they
override the methods, the compiler will complain on mismatched
signatures.
* Make the code easier to navigate. Previously, if you asked your
favorite LSP server to show a method (e.g. `getInstructionCost()`), it
would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`,
`TTIImplBase`, and target overrides. Now it is two less :)

There are three commits to hopefully simplify the review.

The first commit removes `TTI::Model`. This is done by deriving
`TargetTransformInfoImplBase` from `TTI::Concept`. This is possible
because they implement the same set of interfaces with identical
signatures.

The first commit makes `TargetTransformImplBase` polymorphic, which
means all derived classes should `override` its methods. This is done in
second commit to make the first one smaller. It appeared infeasible to
extract this into a separate PR because the first commit landed
separately would result in tons of `-Woverloaded-virtual` warnings (and
break `-Werror` builds).

The third commit eliminates `TTI::Concept` by merging it with the only
derived class `TargetTransformImplBase`. This commit could be extracted
into a separate PR, but it touches the same lines in
`TargetTransformInfoImpl.h` (removes `override` added by the second
commit and adds `virtual`), so I thought it may make sense to land these
two commits together.

Pull Request: https://github.com/llvm/llvm-project/pull/136674

[NVPTX] Add support for Shared Cluster Memory address space [1/2] (#135444)

2025-04-22T22:14:39+00:00

Adds support for new Shared Cluster Memory Address Space
(SHARED_CLUSTER, addrspace 7). See
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#distributed-shared-memory
for details.

1. Update address space structures and datalayout to contain the new
space
2. Add new intrinsics that use this new address space
3. Update NVPTX alias analysis

The existing intrinsics are updated in
https://github.com/llvm/llvm-project/pull/136768

[NVPTX] Improve NVVMReflect Efficiency (#134416)

2025-04-11T01:33:37+00:00

The NVVMReflect pass simply replaces calls to nvvm-reflect functions
with the appropriate constant, either the architecture number, or
nvvm-reflect-ftz, found in the module's metadata.

The implementation is inefficient and does this by traversing through
all instructions to find calls. The common case is that you never call
nvvm-reflect, so this traversal is costly.

This PR:
- Updates the pass so that it finds the reflect functions by name, and
then traverses through their uses to find the calls directly.
- Adds a line (245) to make sure the dead nvvm-reflect definitions are
erased.
- Adds the ability to set reflect values via command line

[NFC][LLVM][NVPTX] Cleanup pass initialization for NVPTX (#134311)

2025-04-07T15:54:44+00:00

- Move all pass initialization function calls to NVPTX target
initialization and out of individual pass constructors.
- Move all pass initialization function declaration to NVPTX.h.
- https://github.com/llvm/llvm-project/issues/111767