| Age | Commit message (Collapse) | Author |
|
This changes `MCRegUnit` type from `unsigned` to `enum class : unsigned`
and inserts necessary casts.
The added `MCRegUnitToIndex` functor is used with `SparseSet`,
`SparseMultiSet` and `IndexedMap` in a few places.
`MCRegUnit` is opaque to users, so it didn't seem worth making it a
full-fledged class like `Register`.
Static type checking has detected one issue in
`PrologueEpilogueInserter.cpp`, where `BitVector` created for
`MCRegister` is indexed by both `MCRegister` and `MCRegUnit`.
The number of casts could be reduced by using `IndexedMap` in more
places and/or adding a `BitVector` adaptor, but the number of casts *per
file* is still small and `IndexedMap` has limitations, so it didn't seem
worth the effort.
Pull Request: https://github.com/llvm/llvm-project/pull/167943
|
|
|
|
Previously this took hints from subregister extract of physreg,
like %vreg.sub = COPY $physreg
This now also handles the rarer case:
$physreg_sub = COPY %vreg
Also make an accidental bug here before explicit; this was
only using the superregister as a hint if it was already
in the copy, and not if using the existing assignment. There are
a handful of regressions in that case, so leave that extension
for a future change.
|
|
|
|
|
|
Instead of checking if the recoloring candidate is a virtual register,
avoid adding it to the candidates in the first place.
|
|
(#160424)
For subregister copies, do a subregister live check instead of checking
the main range. Doesn't do much yet, the split analysis still does not
track live ranges.
|
|
(#160294)
This is essentially the same patch as
116ca9522e89f1e4e02676b5bbe505e80c4d4933;
when trying to match a physreg hint, try to find a compatible physreg if
there is
a subregister copy. This has the slight difference of using getSubReg on
the hint
instead of getMatchingSuperReg (the other use should also use getSubReg
instead,
it's faster).
At the moment this turns out to have very little effect. The adjacent
code needs
better handling of subregisters, so continue adding this piecemeal. The
X86 test
shows a net reduction in real instructions, plus a few new kills.
|
|
If a COPY uses Reg but only in an implicit operand then the new
implementation ignores it but the old implementation would have treated
it as a copy of Reg. Probably this case never occurs in practice. Other
than that, this patch is NFC.
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
Previously this would only accept full copy hints. This relaxes
this to accept some subregister copies. Specifically, this now
accepts:
- Copies to/from physical registers if there is a compatible
super register
- Subreg-to-subreg copies
This has the potential to repeatedly add the same hint to the
hint vector, but not sure if that's a real problem.
|
|
(#140002)
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to
MachineFunctionProperties.
|
|
This experimental option was introduced in 2015 via commit 1192294, and
the target hook was added in 2020 via commit 99e865b6. There does not
appear to have ever been a use of this target hook in tree.
This code is complicating one of the most complicated and hard to
understand parts of our code base, and was an experiment introduced
nearly 10 years ago. Let's get rid of it.
Note that the idea described in the original patch is not neccessarily a
bad one, and we might return to it someday.
|
|
Current implementation tries to fold the operand before
rematerialization because it can reduce one register usage. But if there
is a physical register available we can still rematerialize it without
causing high register pressure.
This patch do this check to find the better choice. Then we can produce
xorps %xmm1, %xmm1
ucomiss %xmm1, %xmm0
instead of
ucomiss LCPI0_1(%rip), %xmm0
|
|
|
|
|
|
This is based off of 63efd8e7e68bc.
The following table shows the percent change to the dynamic instruction
count when the function in this patch returns 0 (default) versus other
values.
| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16
over 0 | % speedup 64 over 0 | % speedup 128 over 0 |
| --------------- | ---------------------- | --------------------- |
--------------------- | -------------------- | -------------------- |
| 500.perlbench_r | 0.001018570165 | 0.001049508358 | 0.001001106529 |
0.03382582818 | 0.03395354577 |
| 502.gcc_r | 0.02850551412 | 0.02170512371 | 0.01453021263 |
0.06011008637 | 0.1215691521 |
| 505.mcf_r | -0.00009506373338 | -0.00009090057642 | -0.0000860991497 |
-0.00005027849766 | 0.00001251173791 |
| 520.omnetpp_r | 0.2958940288 | 0.2959715925 | 0.2961141505 |
0.2959823497 | 0.2963124341 |
| 523.xalancbmk_r | -0.0327074721 | -0.01037021046 | -0.3226810542 |
0.02127133714 | 0.02765388389 |
| 525.x264_r | 0.0000001381714403 | -0.00000007041540345 |
-0.00000002156399465 | 0.0000002108993364 | 0.0000002463382874 |
| 531.deepsjeng_r | 0.00000000339777238 | 0.000000003874652714 |
0.000000003636212547 | 0.000000003874652714 | 0.000000003159332213 |
| 541.leela_r | 0.0009186059953 | -0.000424159199 | 0.0004984456879 |
0.274948447 | 0.8135521414 |
| 557.xz_r | -0.000000003547118854 | -0.00004896449559 |
-0.00004910691576 | -0.0000491109983 | -0.00004895599589 |
| geomean | 0.03265937388 | 0.03424232324 | -0.00107917442 |
0.07629116165 | 0.1439913192 |
The following table shows the percent change to the runtime when the
function in this patch returns 0 (default) versus other values.
| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16
over 0 | % speedup 64 over 0 | %speedup 128 over 0 |
| --------------- | ------------------ | ------------------ |
------------------- | ------------------- | ------------------- |
| 500.perlbench_r | 0.1722356761 | 0.2269681109 | 0.2596825578 |
0.361573851 | 1.15041305 |
| 502.gcc_r | -0.548415855 | -0.06187002799 | -0.5553684674 |
-0.8876686237 | -0.4668665535 |
| 505.mcf_r | -0.8786414258 | -0.4150938441 | -1.035517726 |
-0.1860770377 | -0.01904825648 |
| 520.omnetpp_r | 0.4130256072 | 0.6595976188 | 0.897332171 |
0.6252625622 | 0.3869467278 |
| 523.xalancbmk_r | 1.318132014 | -0.003927574 | 1.025962975 |
1.090320253 | -0.789206202 |
| 525.x264_r | -0.03112871796 | -0.00167557587 | 0.06932423155 |
-0.1919840015 | -0.1203585732 |
| 531.deepsjeng_r | -0.259516072 | -0.01973455652 | -0.2723227894 |
-0.005417022257 | -0.02222388177 |
| 541.leela_r | -0.3497178495 | -0.3510447393 | 0.1274508001 |
0.6485542452 | 0.2880651727 |
| 557.xz_r | 0.7683565263 | -0.2197509447 | -0.0431183874 |
0.07518130872 | 0.5236853039 |
| geomean | 0.06506952742 | -0.0211865386 | 0.05072694648 | 0.1684530637
| 0.1020533557 |
I chose to set the value to 5 on RISC-V because it has improvement to
both the dynamic IC and the runtime and because it showed good results
empirically and had a similar effect as setting it to higher numbers.
I looked at some diff and it seems like this patch leads to two things:
1. Less spilling -- not spilling the CSR led to better register
allocation and helped us avoid spills down the line
2. Avoid spilling CSR but spill more on paths that static heuristics
estimate as cold.
|
|
|
|
This fixes an assert after allocation failure.
Rather than collecting failed virtual registers and hacking
on the uses after the fact, directly hack on the uses and rewrite
the registers to the dummy assignment immediately.
Previously we were bypassing LiveRegMatrix and directly assigning
in the VirtRegMap. This resulted in inconsistencies where illegal
overlapping assignments were missing. Rather than try to hack in
some system to manage these in LiveRegMatrix (i.e. hacking around
cases with invalid iterators), avoid this by directly using the
physreg. This should also allow removal of special casing in
virtregrewriter for failed allocations.
|
|
(#128400)
Reapply "RegAlloc: Fix verifier error after failed allocation (#119690)"
This reverts commit 0c50054820799578be8f62b6fd2cc3fbc751c01e.
Reapply with more fixes to avoid expensive_checks failures. Make sure to
call splitSeparateComponents after shrinkToUses, and update the VirtRegMap
with the split registers. Also set undef on all physical register aliases to
the assigned register.
Move physreg handling. Not sure if necessary
Remove intervals from regunits. Not sure if necessary
|
|
Leaving out NPM command line support for the next patch.
|
|
|
|
|
|
return a physical register.
The callers of these functions return the value as an MCRegister
so this removes some casts from Register to MCRegister.
|
|
|
|
|
|
This reverts commit 34167f99668ce4d4d6a1fb88453a8d5b56d16ed5.
Different set of verifier errors appears after other regalloc failure
tests with EXPENSIVE_CHECKS.
|
|
In some cases after reporting an allocation failure, this would fail
the verifier. It picks the first allocatable register and assigns it,
but didn't update the liveness appropriately. When VirtRegRewriter
relied on the liveness to set kill flags, it would incorrectly add
kill flags if there was another overlapping kill of the virtual
register.
We can't properly assign the register to an overlapping range, so
break the liveness of the failing register (and any other interfering
registers) instead. Give the virtual register dummy liveness by
effectively deleting all the uses by setting them to undef.
The edge case not tested here which I'm worried about is if the read
of the register is a def of a subregister. I've been unable to come up
with a test where this occurs.
https://reviews.llvm.org/D122616
|
|
Similar to #117309.
The advisor and logger are accessed through the provider, which is
served by the new PM. Legacy PM forwards calls to the provider.
New PM is a machine function analysis that lazily initializes the
provider.
|
|
Legacy pass used to provide the advisor, so this extracts that logic
into a provider class used by both analysis passes.
All three (Default, Release, Development) legacy passes
`*AdvisorAnalysis` are basically renamed to `*AdvisorProvider`, so the
actual legacy wrapper passes are `*AdvisorAnalysisLegacy`.
There is only one NPM analysis `RegAllocEvictionAnalysis` that switches
between the three providers in the `::run` method, to be cached by the
NPM.
Also adds `RequireAnalysis<RegAllocEvictionAnalysis>` to the optimized
target reg alloc codegen builder.
|
|
|
|
`RegisterClassInfo` was supposed to be kept alive between pass runs,
which wasn't being done leading to recomputations increasing the compile
time.
Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.
---------
Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
|
|
This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I
investigate what's causing the compile-time regression.
|
|
|
|
|
|
using(). NFC
Many of these are indexing BitVectors or something where we can't
using MCRegister and need the register number.
|
|
(#122665)
Makes Inline Spiller amenable to the new PM.
This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted
because of two unused private members reported on sanitizer bots.
|
|
(#122426)
…181)"
This reverts commit a531800344dc54e9c197a13b22e013f919f3f5e1.
|
|
Makes Inline Spiller amenable to the new PM.
|
|
|
|
Last chance recoloring can delete the current fixed interval
during recursive assignment of interfering live intervals. Check
if the virtual register value was assigned before attempting the
unassignment, as is done in other scenarios. This relies on the fact
that we do not recycle virtual register numbers.
I have only seen this occur in error situations where the allocation
will fail, but I think this can theoretically happen in working
allocations.
This feels very brute force, but I've spent over a week debugging
this and this is what works without any lit regressions. The surprising
piece to me was that unspillable live ranges may be spilled, and
a number of tests rely on optimizations occurring on them. My other
attempts to fixed this mostly revolved around not identifying unspillable
live ranges as snippet copies. I've also discovered we're making some
unproductive live range splits with subranges. If we avoid such splits,
some of the unspillable copies disappear but mandating that be precise
to fix a use after free doesn't sound right.
|
|
I regularly struggle reproducing failures in greedy due to changes
in priority when resuming the allocation from MIR vs. a complete
compilation starting at IR. That is, the fix in
e0919b189bf2df4f97f22ba40260ab5153988b14 did not really fix the
problem of the instruction distance mattering.
Add a way to bypass all of the priority heuristics for MIR tests,
by prioritizing only by virtual register number. Could also
give this a more specific name, like PrioritizeLowVirtRegNumber
|
|
|
|
Greedy and fast would hit different assertions on undef uses if all
registers in a class were reserved.
|
|
|
|
The existing analysis was already a pimpl wrapper.
I have extracted legacy pass logic to a LDVImpl wrapper named
`LiveDebugVariables` which is the analysis::Result now. This controls
whether to activate the LDV (depending on `-live-debug-variables` and
DIsubprogram) itself.
The legacy and new analysis only construct the LiveDebugVariables.
VirtRegRewriter will test this.
|
|
|
|
|
|
Identified with misc-include-cleaner.
|
|
|
|
|