| Age | Commit message (Collapse) | Author |
|
|
|
This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.
The problem was a conflict with
https://github.com/llvm/llvm-project/commit/e55d6f5ea2656bf842973d8bee86c3ace31bc865
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).
...if only we had a merge queue.
|
|
(#108054)"" (#108341)
Reverts llvm/llvm-project#108173
si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.
|
|
This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.
The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.
|
|
Breaks bots, see #105822.
Reverts llvm/llvm-project#105822
|
|
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.
As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):
```
entry:
%func_exec = call i1 @llvm.amdgcn.init.whole.wave()
br i1 %func_exec, label %func, label %tail
func:
; ... stuff that should run with the actual EXEC mask
br label %tail
tail:
; ... stuff that runs with all the lanes enabled;
; can contain more than one basic block
```
It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).
The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).
|
|
"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a
function uses dynamic LDS.
hidden argument will be added in below cases:
- LDS global is used in the kernel.
- Kernel calls a function which uses LDS global.
- LDS pointer is passed as argument to kernel itself.
|
|
|
|
Introduce a helper to check if a function is at the bottom of the stack,
i.e. if it's an entry function or a chain function.
This was suggested in #71913.
|
|
This will represent functions with the amdgpu_cs_chain or
amdgpu_cs_chain_preserve calling conventions.
Differential Revision: https://reviews.llvm.org/D156410
|
|
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.
Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.
After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.
Deleted the now dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D155190
|
|
The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.
1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel
2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables
3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen
4/ The assembler builds the lookup table using the metadata
5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use
Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144233
|
|
Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.
This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144221
|
|
This fixes what I consider to be an API flaw I've tripped over
multiple times. The point this is constructed isn't well defined, so
depending on where this is first called, you can conclude different
information based on the MachineFunction. For example, the AMDGPU
implementation inspected the MachineFrameInfo on construction for the
stack objects and if the frame has calls. This kind of worked in
SelectionDAG which visited all allocas up front, but broke in
GlobalISel which hasn't visited any of the IR when arguments are
lowered.
I've run into similar problems before with the MIR parser and trying
to make use of other MachineFunction fields, so I think it's best to
just categorically disallow dependency on the MachineFunction state in
the constructor and to always construct this at the same time as the
MachineFunction itself.
A missing feature I still could use is a way to access an custom
analysis pass on the IR here.
|
|
|
|
Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.
Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139433
|
|
This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed.
Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862
|
|
Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.
Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139433
|
|
|
|
It does not belong to a general AMDGPU MFI.
Differential Revision: https://reviews.llvm.org/D134666
|
|
A kernel may have an associated struct for laying out LDS variables.
This patch puts that instance, if present, at a deterministic address by
allocating it at the same time as the module scope instance.
This is relatively likely to be where the instance was allocated anyway (~NFC)
but will allow later patches to calculate where a given field can be found,
which means a function which is only reachable from a single kernel will be
able to access a LDS variable with zero overhead. That will be particularly
helpful for applications that instantiate a function template containing LDS
variables once per kernel.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127052
|
|
Break msan bots. Details in D134666.
This reverts commit 0ce96e06ee0226938e723bd0c8e16e3d2d51f203.
|
|
It does not belong to a general AMDGPU MFI.
Differential Revision: https://reviews.llvm.org/D134666
|
|
Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
|
|
Differential Revision: https://reviews.llvm.org/D128118
|
|
Running iwyu-diff on LLVM codebase since 7030654296a0416bd9402a0278 detected a few
regressions, fixing them.
Differential Revision: https://reviews.llvm.org/D126417
|
|
Introduces a string attribute, amdgpu-requires-module-lds, to allow
eliding the module.lds block from kernels. Will allocate the block as before
if the attribute is missing or has its default value of true.
Patch uses the new attribute to detect the simplest possible instance of this,
where a kernel makes no calls and thus cannot call any functions that use LDS.
Tests updated to match, coverage was already good. Interesting cases is in
lower-module-lds-offsets where annotating the kernel allows the backend to pick
a different (in this case better) variable ordering than previously. A later
patch will avoid moving kernel variables into module.lds when the kernel can
have this attribute, allowing optimal ordering and locally unused variable
elimination.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122091
|
|
Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few
regressions, fixing them.
Differential Revision: https://reviews.llvm.org/D124847
|
|
These don't seem to be very well used or tested, but try to make the
behavior a bit more consistent with LDS globals.
I'm not sure what the definition for amdgpu-gds-size is supposed to
mean. For now I assumed it's allocating a static size at the beginning
of the allocation, and any known globals are allocated after it.
|
|
|
|
allocateModuleLDSGlobal"
Reconsidered, better to handle per-function state in the constructor as before.
This reverts commit 98e474c1b3210d90e313457bf6a6e39a7edb4d2b.
|
|
|
|
|
|
[amdgpu] Implement lower function LDS pass
Local variables are allocated at kernel launch. This pass collects global
variables that are used from non-kernel functions, moves them into a new struct
type, and allocates an instance of that type in every kernel. Uses are then
replaced with a constantexpr offset.
Prior to this pass, accesses from a function are compiled to trap. With this
pass, most such accesses are removed before reaching codegen. The trap logic
is left unchanged by this pass. It is still reachable for the cases this pass
misses, notably the extern shared construct from hip and variables marked
constant which survive the optimizer.
This is of interest to the openmp project because the deviceRTL runtime library
uses cuda shared variables from functions that cannot be inlined. Trunk llvm
therefore cannot compile some openmp kernels for amdgpu. In addition to the
unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled
and the function pointer hashing scheme deleted passes the openmp suite.
This lowering will use more LDS than strictly necessary. It is intended to be
a functionally correct fallback for cases that are difficult to target from
future optimisation passes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D94648
|
|
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
|
- Allows lds allocations
- Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata
Differential Revision: https://reviews.llvm.org/D92946
|
|
Summary:
- HIP uses an unsized extern array `extern __shared__ T s[]` to declare
the dynamic shared memory, which size is not known at the
compile time.
Reviewers: arsenm, yaxunl, kpyzhov, b-sumner
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82496
|
|
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
Differential Revision: https://reviews.llvm.org/D80368
|
|
|
|
Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.
This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.
|
|
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, jdoerfert
Reviewed By: courbet
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68792
llvm-svn: 374884
|
|
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
Reverts r337079 with fix for msan error.
llvm-svn: 337535
|
|
This reverts commit r337021.
WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
#1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
#2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
#3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
#4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
#5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
#6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
#7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
#8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26
[...]
Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
#0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192
llvm-svn: 337079
|
|
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.
The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.
Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR)
llvm-svn: 337021
|
|
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
|
|
We have too many mechanisms for tracking the various offsets
used for kernel arguments, so remove one. There's still a lot of
confusion with these because there are two different "implicit"
argument areas located at the beginning and end of the kernarg
segment.
Additionally, the offset was determined based on the memory
size of the split element types. This would break in a future
commit where v3i32 is decomposed into separate i32 pieces.
llvm-svn: 335830
|
|
AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded up size, so that
the end of the kernel segment is known to be dereferencable so
a wider s_load_dword can be used for a short argument at the end
of the segment.
llvm-svn: 333456
|
|
This is adoption of HSAIL perfhint pass. Two types of hints are produced:
1. Function is memory bound.
2. Kernel can use wave limiter.
Currently these hints are used in the scheduler. If a function is suspected
to be memory bound we allow occupancy to decrease to 4 waves in the course
of scheduling.
Differential Revision: https://reviews.llvm.org/D46992
llvm-svn: 333289
|
|
"not not" -> "not" etc
llvm-svn: 330083
|