llvm-project.git/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp, branch users/kerbowa/amdgpu-load-lat-scale

[AMDGPU] Dynamically set load latency in the scheduler

2025-09-16T05:51:53+00:00

[AMDGPU][Scheduler] Consistent occupancy calculation during rematerialization (#149224)

2025-08-08T12:26:04+00:00

The `RPTarget`'s way of determining whether VGPRs are beneficial to save
and whether the target has been reached w.r.t. VGPR usage currently
assumes, if `CombinedVGPRSavings` is true, that free slots in one VGPR
RC can always be used for the other. Implicitly, this makes the
rematerialization stage (only current user of `RPTarget`) follow a
different occupancy calculation than the "regular one" that the
scheduler uses, one that assumes that ArchVGPR/AGPR usage can be
balanced perfectly and at no cost, which is untrue in general. This
ultimately yields suboptimal rematerialization decisions that require
cross-VGPR-RC copies unnecessarily.

This fixes that, making the `RPTarget`'s internal model of occupancy
consistent with the regular one. The `CombinedVGPRSavings` flag is
removed, and a form of cross-VGPR-RC saving implemented only for unified
RFs, which is where it makes the most sense. Only when the amount of
free VGPRs in a given VGPR RC (ArchVPGR or AGPR) is lower than the
excess VGPR usage in the other VGPR RC does the `RPTarget` consider that
a pressure reduction in the former will be beneficial to the latter.

[AMDGPU][Scheduler] Delete RegionsWithMinOcc bitvector from scheduler (NFC) (#142361)

2025-08-01T11:20:05+00:00

The `GCNScheduleDAGMILive`'s `RegionsWithMinOcc` bitvector is only used
by the `UnclusteredHighRPStage`. Its presence in the scheduler's state
forces us to maintain its value throughout scheduling even though it is
of no use to the iterative scheduling process itself. At any point
during scheduling it is possible to cheaply compute the occupancy
induced by a particular register pressure. Furthermore, the field
doesn't appear to be updated correctly throughout scheduling i.e., bits
corresponding to regions at minimum occupancy are not always set in the
vector.

This removes the bitvector from `GCNScheduleDAGMILive`.
`UnclusteredHighRPStage::initGCNRegion` now directly computes the
occupancy of possibly reschedulable regions instead of querying the
vector. Since it is the most expensive check, it is done last in the
list.

[MachineScheduler] Make cluster check more efficient (#150884)

2025-08-01T08:00:42+00:00

[AMDGPU] Don't skip regions in getRegionLiveInMap (#151423)

2025-07-31T22:05:18+00:00

Currently, this skips any region that is not the first region in a
block. This is because the only user of it only cares about the LiveIns
per-block. However, as named, this is supposed to compute the per-region
LiveIns.

This doesn't have any effect on scheduling / CodeGen currently (aside
from computing LiveIns for all regions) since only the per-block LiveIns
are needed. However, I'm working on something that will use this.

Intended User:

https://github.com/llvm/llvm-project/pull/149367/


https://github.com/llvm/llvm-project/blob/c62a2f127cba5d6df350474dfd4a6e5f9250fe4f/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp#L1351

[AMDGPU][Scheduler] Use `AMDGPU::NoSubRegister` instead of 0 (NFC) (#150610)

2025-07-25T12:25:13+00:00

[AMDGPU][Scheduler] Fix usage of `TII.reMaterialize` (NFC) (#150259)

2025-07-25T10:51:21+00:00

Any non-zero `SubIdx` passed to the method is composed with the
rematerialized instruction's first operand's subregister to determine
the new register's subregister. In our case we want the new register to
have the same subregister as the old one, so we should pass 0.

[AMDGPU] NFC: Decouple getRealRegPressure from current region (#149219)

2025-07-17T02:59:38+00:00

We're already accepting a RegionIdx for the LiveIns, also use this for
the instruction iterators.

Enables querying RP for other regions -- useful for function wide
transformations (e.g. rematerialization, rewriting, etc).

[AMDGPU] Add `GCNRPTarget` to track register pressure against a target (#145765)

2025-06-26T11:11:20+00:00

This adds the `GCNRPTarget` class which models a register pressure
target (i.e., maximum number of SGPRs/VGPRS) that one can track register
savings against. The only current use of this class is in the
scheduler's rematerialization stage. It replaces the more ad-hoc (and
now deleted) `ExcessRP` class which used to serve the same purpose.

This is only NFC~ish because `GCNRPTarget` tracks VGPR usage more
accurately than `ExcessRP` used to. To estimate required combined VGPR
savings we now additionally take into account the number of available
VGPRs in both banks (ArchVGPR and AGPR) at the time where the RP target
is created, whereas we used to only consider explicit savings made from
the starting RP. This makes VGPR savings estimations more accurate in
cases where we allow for savings in one VGPR bank to help towards
reducing pressure in another VGPR bank (see
`GCNRPTarget::CombineVGPRSavings`). This is the cause for unit test
changes.

[AMDGPU][Scheduler] Support for rematerializing SGPRs and AGPRs (#140036)

2025-06-24T17:30:27+00:00

This adds the ability to rematerialize SGPRs and AGPRs to the
scheduler's `PreRARematStage`, which can currently only rematerialize
ArchVGPRs. This also fixes a small potential issue in the stage where,
in case of spilling, the target occupancy could be set to a lower than
expected value when the function had either one of the "amdgpu-num-sgpr"
or "amdgpu-num-vgpr" attributes set.