llvm-project.git/llvm/lib/CodeGen/RegAllocFast.cpp, branch users/shawbyoung/spr/main.boltnfc-refactoring-callgraph

[CodeGen][NewPM] Extract MachineFunctionProperties modification part to an RAII class (#94854)

2024-06-22T09:34:03+00:00

Modifying MachineFunctionProperties in PassModel means that running a
pass directly, as in `PassT P; P.run(...);`, does not work properly.
This is a necessary compromise.
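
As an illustration of the RAII idea, here is a minimal sketch. Every
name in it (`FunctionState`, `DemoPass`, `PropsGuard`, the bit masks) is
hypothetical and stands in for the real LLVM types; it assumes the guard
checks required properties on entry and applies the set/cleared
properties on exit, which is not confirmed by the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a function carrying a property bit set.
struct FunctionState {
  uint32_t Props = 0;
};

// Hypothetical pass that declares which properties it needs and changes.
struct DemoPass {
  static constexpr uint32_t Required = 0x1;
  static constexpr uint32_t Set = 0x2;
  static constexpr uint32_t Cleared = 0x1;
  bool run(FunctionState &F);
};

// RAII guard: verifies the required properties when constructed and
// applies the pass's set/cleared properties when it goes out of scope.
template <typename PassT> class PropsGuard {
  FunctionState &F;

public:
  explicit PropsGuard(FunctionState &F) : F(F) {
    assert((F.Props & PassT::Required) == PassT::Required &&
           "required property missing");
  }
  ~PropsGuard() {
    F.Props |= PassT::Set;
    F.Props &= ~PassT::Cleared;
  }
  PropsGuard(const PropsGuard &) = delete;
  PropsGuard &operator=(const PropsGuard &) = delete;
};

inline bool DemoPass::run(FunctionState &F) {
  PropsGuard<DemoPass> Guard(F); // property update happens at scope exit
  // ... the actual work of the pass would go here ...
  return true;
}
```

Because the guard lives inside `run`, the properties are updated even
when the pass is invoked directly rather than through a wrapper.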

[RegAllocFast] Replace UsedInInstr with vector (#96323)

2024-06-21T17:35:29+00:00

A SparseSet adds an avoidable layer of indirection and possibly looping
control flow. Avoid this overhead by using a vector to store
UsedInInstrs and PhysRegUses.

To avoid clearing the vector after every instruction, use a
monotonically increasing counter. The two maps are now merged and the
lowest bit indicates whether the use is relevant for the livethrough
handling code only.
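
The counter scheme can be sketched as follows. This is an illustrative
model, not the code from the patch: the class and method names are
hypothetical, and the polarity of the low bit (set for a regular use,
clear for a livethrough-only use) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracker: one stamp per register unit instead of a set
// that has to be cleared after every instruction.
class UsedUnitTracker {
  std::vector<uint32_t> Stamp; // generation of the last use, plus flag bit
  uint32_t Gen = 2;            // always even and non-zero

public:
  explicit UsedUnitTracker(std::size_t NumUnits) : Stamp(NumUnits, 0) {}

  // Start a new instruction: bumping the generation invalidates all
  // older stamps without touching the vector.
  void nextInstruction() {
    Gen += 2;
    if (Gen == 0) { // counter wrapped around: reset everything once
      std::fill(Stamp.begin(), Stamp.end(), 0u);
      Gen = 2;
    }
  }

  // Regular use: low bit set.
  void markUsed(std::size_t Unit) {
    assert(Unit < Stamp.size());
    Stamp[Unit] = Gen | 1u;
  }

  // Use that matters only to livethrough handling: low bit clear.
  // std::max keeps an existing regular use from being downgraded.
  void markLiveThroughOnly(std::size_t Unit) {
    assert(Unit < Stamp.size());
    Stamp[Unit] = std::max(Stamp[Unit], Gen);
  }

  bool isUsed(std::size_t Unit, bool IncludeLiveThroughOnly) const {
    assert(Unit < Stamp.size());
    return Stamp[Unit] >= (IncludeLiveThroughOnly ? Gen : (Gen | 1u));
  }
};
```

A stamp written for an earlier instruction is at most the old
generation plus one, which is always smaller than the current
generation, so stale entries read as unused.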

[RegAlloc] Don't call always-true ShouldAllocClass (#96296)

2024-06-21T11:18:35+00:00

Previously, there was at least one virtual function call for every
allocated register. The only users of this feature are AMDGPU and RISC-V
(RVV); other targets don't use it. To easily identify these cases,
change the default functor to nullptr and call it only when it is set,
instead of for every allocated register.
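
A minimal sketch of the pattern, assuming a `std::function`-based
filter. `RegClassFilter` and `selectAllocatable` are hypothetical names
used only for illustration, and register classes are modelled as plain
integers.

```cpp
#include <functional>
#include <vector>

// Hypothetical filter type: returns true if the class should be allocated.
using RegClassFilter = std::function<bool(unsigned RegClassID)>;

// Keep the register classes accepted by the filter. An empty (nullptr)
// filter means "allocate everything" and is never invoked.
std::vector<unsigned> selectAllocatable(const std::vector<unsigned> &Classes,
                                        RegClassFilter Filter = nullptr) {
  std::vector<unsigned> Out;
  for (unsigned RC : Classes) {
    if (Filter && !Filter(RC)) // only pay for the call when a filter is set
      continue;
    Out.push_back(RC);
  }
  return Out;
}
```

With a null default, targets that do not filter pay for a pointer test
instead of an indirect call that always returns true.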

[RegAllocFast] Handle single-vdef instrs faster (#96284)

2024-06-21T10:30:59+00:00

On x86, many instructions have tied operands, so allocateInstruction
uses the more complex assignment strategy, which computes the assignment
order of virtual defs first. This involves iterating over all register
classes (or register aliases for physical defs) to compute the possible
number of defs per register class.

However, this information is only used for sorting virtual defs and
therefore not required when there's only one virtual def -- which is a
very common case. As iterating over all register classes/aliases is not
cheap, do this only when there's more than one virtual def.
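
The shortcut can be sketched like this. The names are hypothetical, and
the sort key (classes with more defs first) is an assumption made for
the example; the commit message does not state the real ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record of a virtual def: operand position and class.
struct VirtDef {
  unsigned OperandIdx;
  unsigned RegClassID;
};

// Order the defs for assignment. Returns true if the per-class counting
// (the costly part) was actually performed.
bool orderVirtDefs(std::vector<VirtDef> &Defs, unsigned NumRegClasses) {
  if (Defs.size() <= 1)
    return false; // a single def needs no ordering, so skip the counting

  std::vector<unsigned> DefsPerClass(NumRegClasses, 0);
  for (const VirtDef &D : Defs) {
    assert(D.RegClassID < NumRegClasses);
    ++DefsPerClass[D.RegClassID];
  }
  // Assumed key: classes with more competing defs come first.
  std::stable_sort(Defs.begin(), Defs.end(),
                   [&](const VirtDef &A, const VirtDef &B) {
                     return DefsPerClass[A.RegClassID] >
                            DefsPerClass[B.RegClassID];
                   });
  return true;
}
```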

[RegAllocFast] Use unsigned for operand indices

2024-06-21T10:25:28+00:00

MachineInstr operand indices can currently be up to 24 bits wide. Use
unsigned as the consistent data type for operand indices instead of
uint16_t.
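
To see why the narrower type is a problem, the following sketch stores
an index in a field of a given type and reads it back. `roundTrip` is a
hypothetical helper written for this example only.

```cpp
#include <cstdint>

// Store an operand index in a field of type IndexT and read it back.
template <typename IndexT> unsigned roundTrip(unsigned OpIdx) {
  return static_cast<unsigned>(static_cast<IndexT>(OpIdx));
}
```

Any index above 65535 is silently truncated by `uint16_t`, while
`unsigned` (at least 32 bits on the platforms LLVM supports) holds the
full 24-bit range.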

[NewPM][CodeGen] Port `regallocfast` to new pass manager (#94426)

2024-06-07T04:22:42+00:00

This pull request ports `regallocfast` to the new pass manager. It
exposes the parameter `filter` to handle different register classes for
AMDGPU. IIUC, AMDGPU needs to allocate different register classes
separately, so it needs to implement its own `---regalloc`. Now users
can use e.g. `-passes=regallocfast` to allocate a specific register
class. The command line option `--regalloc-npm` is still a work in
progress; the plan is to reuse the syntax of passes, e.g. use
`--regalloc-npm=regallocfast,greedy` to replace `--sgpr-regalloc` and
`--vgpr-regalloc`.

[CodeGen] Do not pass MF into MachineRegisterInfo methods. NFC. (#84770)

2024-03-11T15:35:05+00:00

MachineRegisterInfo already knows the MF so there is no need to pass it
in as an argument.

[RegAllocFast] Lazily initialize InstrPosIndexes for each MBB (#76275)

2023-12-25T01:42:31+00:00

Most basic blocks never need a dominance query. Defer initialization of
InstrPosIndexes to the first query for each MBB.
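
A minimal sketch of the deferral, with hypothetical names and with
instructions modelled as unique integer ids (both are assumptions made
for the example, not the real data structures):

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical lazily built position table for the current block.
class LazyPositions {
  const std::vector<int> *Block = nullptr;
  std::unordered_map<int, std::size_t> Pos; // instruction id -> position
  bool Built = false;

  void ensureBuilt() {
    if (Built)
      return;
    for (std::size_t I = 0; I < Block->size(); ++I)
      Pos[(*Block)[I]] = I;
    Built = true;
  }

public:
  // Switching blocks only records the block; no indexing work is done.
  void enterBlock(const std::vector<int> &B) {
    Block = &B;
    Pos.clear();
    Built = false;
  }

  bool built() const { return Built; }

  // The first query for a block pays for building the table.
  bool comesBefore(int A, int B) {
    ensureBuilt();
    return Pos.at(A) < Pos.at(B);
  }
};
```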

[RegAllocFast] Avoid duplicate hash lookup (NFC)

2023-12-22T15:52:20+00:00

[RegAllocFast] Refactor dominates algorithm for large basic block (#72250)

2023-12-22T15:06:16+00:00

The original brute-force dominates algorithm has O(n) complexity, so it
is very slow for very large machine basic blocks, which are very common
at O0. This patch adds InstrPosIndexes to assign an index to each
instruction and uses it to determine dominance. The complexity is now
O(1).
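
The difference between the two approaches can be sketched as follows.
`Instr`, `PosIndex`, and `comesBeforeLinear` are hypothetical names; the
sketch numbers a fixed block once and does not handle instructions
inserted later, which a real implementation would have to deal with.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical instruction type; identity is the object's address.
struct Instr {
  int Opcode;
};

// Number every instruction once, then answer ordering queries with two
// hash lookups and a comparison (expected O(1) per query).
class PosIndex {
  std::unordered_map<const Instr *, uint64_t> Index;

public:
  explicit PosIndex(const std::vector<Instr> &Block) {
    uint64_t Next = 0;
    for (const Instr &I : Block)
      Index[&I] = Next++;
  }

  bool comesBefore(const Instr &A, const Instr &B) const {
    auto ItA = Index.find(&A);
    auto ItB = Index.find(&B);
    assert(ItA != Index.end() && ItB != Index.end() && "unknown instruction");
    return ItA->second < ItB->second;
  }
};

// Brute-force baseline: walk the block until A or B is found, O(n).
bool comesBeforeLinear(const std::vector<Instr> &Block, const Instr &A,
                       const Instr &B) {
  for (const Instr &I : Block) {
    if (&I == &A)
      return &A != &B;
    if (&I == &B)
      return false;
  }
  return false;
}
```

Within a single basic block, one instruction dominates another exactly
when it comes first, so the index comparison is sufficient there.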