summaryrefslogtreecommitdiff
path: root/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
AgeCommit message (Collapse)Author
2025-11-14[AMDGPU] Make use of getFunction and getMF. NFC. (#167872)Jay Foad
2025-10-31[AMDGPU] Record old VGPR MSBs in the high bits of s_set_vgpr_msb (#165035)Stanislav Mekhanoshin
Fixes: SWDEV-562450
2025-09-24[AMDGPU][True16][CodeGen] true16 isel pattern for fma_mix_f16/bf16 (#159648)Brox Chen
This patch includes: 1. fma_mix inst takes fp16 type as input, but place the operand in vgpr32. Update selector to insert vgpr32 for true16 mode if necessary. 2. fma_mix inst returns fp16 type as output, but place the vdst in vgpr32. Create a fma_mix_t16 pesudo inst for isel pattern, and lower it to mix_lo/hi in the mc lowering pass. These stop isel from emitting illegal `vgpr32 = COPY vgpr16` and improve code quality
2025-09-04[AMDGPU] High VGPR lowering on gfx1250 (#156965)Stanislav Mekhanoshin
2025-07-15[AMDGPU] gfx1250 64-bit relocations and fixups (#148951)Stanislav Mekhanoshin
2025-04-25Reland [AMDGPU] Support block load/store for CSR #130013 (#137169)Diana Picus
Add support for using the existing SCRATCH_STORE_BLOCK and SCRATCH_LOAD_BLOCK instructions for saving and restoring callee-saved VGPRs. This is controlled by a new subtarget feature, block-vgpr-csr. It does not include WWM registers - those will be saved and restored individually, just like before. This patch does not change the ABI. Use of this feature may lead to slightly increased stack usage, because the memory is not compacted if certain registers don't have to be transferred (this will happen in practice for calling conventions where the callee and caller saved registers are interleaved in groups of 8). However, if the registers at the end of the block of 32 don't have to be transferred, we don't need to use a whole 128-byte stack slot - we can trim some space off the end of the range. In order to implement this feature, we need to rely less on the target-independent code in the PrologEpilogInserter, so we override several new methods in SIFrameLowering. We also add new pseudos, SI_BLOCK_SPILL_V1024_SAVE/RESTORE. One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the SCRATCH_LOAD_BLOCK instructions will have all the registers that are not transferred added as implicit uses. This is done in order to inform LiveRegUnits that those registers are not available before the restore (since we're not really restoring them - so we can't afford to scavenge them). Unfortunately, this trick doesn't work with the save, so before the save all the registers in the block will be unavailable (see the unit test). This was reverted due to failures in the builds with expensive checks on, now fixed by always updating LiveIntervals and SlotIndexes in SILowerSGPRSpills.
2025-04-23Revert "[AMDGPU] Support block load/store for CSR" (#136846)Diana Picus
Reverts llvm/llvm-project#130013 due to failures with expensive checks on.
2025-04-23[AMDGPU] Support block load/store for CSR (#130013)Diana Picus
Add support for using the existing `SCRATCH_STORE_BLOCK` and `SCRATCH_LOAD_BLOCK` instructions for saving and restoring callee-saved VGPRs. This is controlled by a new subtarget feature, `block-vgpr-csr`. It does not include WWM registers - those will be saved and restored individually, just like before. This patch does not change the ABI. Use of this feature may lead to slightly increased stack usage, because the memory is not compacted if certain registers don't have to be transferred (this will happen in practice for calling conventions where the callee and caller saved registers are interleaved in groups of 8). However, if the registers at the end of the block of 32 don't have to be transferred, we don't need to use a whole 128-byte stack slot - we can trim some space off the end of the range. In order to implement this feature, we need to rely less on the target-independent code in the PrologEpilogInserter, so we override several new methods in `SIFrameLowering`. We also add new pseudos, `SI_BLOCK_SPILL_V1024_SAVE/RESTORE`. One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the SCRATCH_LOAD_BLOCK instructions will have all the registers that are not transferred added as implicit uses. This is done in order to inform LiveRegUnits that those registers are not available before the restore (since we're not really restoring them - so we can't afford to scavenge them). Unfortunately, this trick doesn't work with the save, so before the save all the registers in the block will be unavailable (see the unit test).
2025-03-30Move relocation specifiers to AMDGPUMCExpr::SpecifierFangrui Song
Similar to previous migration done for all other ELF targets. Switch from the confusing `VariantKind` to `Specifier`, which aligns with Arm and IBM AIX's documentation. Moving forward, relocation specifiers should be integrated into AMDGPUMCExpr rather than MCSymbolRefExpr::SubclassData. (Note: the term AMDGPUMCExpr::VariantKind is for expressions without relocation specifiers: https://github.com/llvm/llvm-project/pull/82022 It's up to AMDGPU maintainers to integrate these constants into Specifier. ) Pull Request: https://github.com/llvm/llvm-project/pull/133608
2025-03-29[RISCV] Replace @plt/@gotpcrel in data directives with %pltpcrel %gotpcrelFangrui Song
clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and `@gotpcrel` specifiers in data directives. The syntax is not used in humand-written assembly code, and is not supported by GNU assembler. Note: the `@plt` in `.word foo@plt` is different from the legacy `call func@plt` (where `@plt` is simply ignored). The `@plt` syntax was selected was simply due to a quirk of AsmParser: the syntax was supported by all targets until I updated it to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc, and we should follow this convention. This PR adds support for `.word %pltpcrel(foo+offset)` and `.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`. * MCValue::SymA can no longer have a SymbolVariant. Add an assert similar to that of AArch64ELFObjectWriter.cpp before https://reviews.llvm.org/D81446 (see my analysis at https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers if intrigued) * `jump foo@plt, x31` now has a different diagnostic. Pull Request: https://github.com/llvm/llvm-project/pull/132569
2025-02-18[AMDGPU][True16][CodeGen] reopen "FLAT_load using D16 pseudo instruction" ↵Brox Chen
(#127673) Previous patch is merged https://github.com/llvm/llvm-project/pull/114500 and it hit a buildbot failure and thus reverted It seems the AMDGPU::OpName::OPERAND_LAST is removed at the meantime when previous patch is merged and that's causing the compile error. Fixed and reopen it here
2025-02-18Revert "[AMDGPU][True16][CodeGen] FLAT_load using D16 pseudo instruction ↵Nikita Popov
(#114500)" This reverts commit f7a5f067885b7f6cc4a000c8392adf6b777a9108. Fails to build with: llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:126:37: error: no member named 'OPERAND_LAST' in 'llvm::AMDGPU::OpName' 126 | uint16_t OpName = AMDGPU::OpName::OPERAND_LAST;
2025-02-18[AMDGPU][True16][CodeGen] FLAT_load using D16 pseudo instruction (#114500)Brox Chen
Implement new pseudos with the suffix _t16 for FLAT_LOAD which have VGPR_16 as the load dst. Lower the pseudos to the existing real instructions with VGPR_32 src or dst (which makes them consistent with the hardware encoding). This patch reduces VGPR usage by making hi halves of VGPRs available for other values. There are more 8/16 bits ld/st instructions to be supported in the up-coming patches
2025-02-13[AMDGPU] Try to fix -mattr=dumpcode on big-endian hosts (#127073)Jay Foad
Blind fix for #116982 failing on big-endian buildbots.
2024-11-13[AMDGPU] Remove unused includes (NFC) (#116154)Kazu Hirata
Identified with misc-include-cleaner.
2024-09-20[llvm] Don't call raw_string_ostream::flush() (NFC)Youngsuk Kim
Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
2024-07-29[AsmPrinter] Don't EmitToStreamer instructions lowered by tblgenned code ↵Sergei Barannikov
(#100803) This allows lowering individual instructions in a bundle before a single call to EmitToStreamer for VLIW targets.
2024-07-21[MC] Drop unnecessary MCSymbol::setExternal calls for ELFFangrui Song
Similar to e4c360a897fe062914519d331e8f1e28b2b1fbfd (2020).
2023-07-05[AMDGPU] Fix expensive-checks build.Ivan Kosarev
Completes <https://reviews.llvm.org/D154337>.
2023-04-27AMDGPU: Define sub-class of SGPR_64 for tail call returnChangpeng Fang
Summary: Registers for tail call return should not be clobbered by callee. So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold the tail call return address. Because GFX and C calling conventions have different CSR, we need to define the sub-class separately. This work is an extension of D147096 with the consideration of GFX calling convention. Based on the calling conventions, different instructions will be selected with different sub-class of SGPR_64 as the input. Reviewers: arsenm, cdevadas and sebastian-ne Differential Revision: https://reviews.llvm.org/D148824
2023-04-06[MC] Always encode instruction into SmallVectorAlexis Engelke
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream to encode the instruction into a SmallVector. The raw_ostream however incurs some overhead for the actual encoding. This change allows an MCCodeEmitter to directly emit an instruction into a SmallVector without using a raw_ostream and therefore allow for performance improvments in encoding. A default path that uses existing raw_ostream implementations is provided. Reviewed By: MaskRay, Amir Differential Revision: https://reviews.llvm.org/D145791
2023-03-12[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MDJon Chesterfield
Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses. This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144221
2022-12-07[amdgpu] Reimplement LDS loweringJon Chesterfield
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433
2022-12-06Revert "[amdgpu] Reimplement LDS lowering"Nico Weber
This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed. Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862
2022-12-06[amdgpu] Reimplement LDS loweringJon Chesterfield
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433
2022-08-19[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategyAustin Kerbow
Adds a builtin that serves as an optimization hint to apply specific optimized DAG mutations during scheduling. This also disables any other mutations or clustering that may interfere with the desired pipeline. The first optimization strategy that is added here is designed to improve the performance of small gemm kernels on gfx90a. Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D132079
2022-07-28[AMDGPU] Add amdgcn_sched_group_barrier builtinAustin Kerbow
This builtin allows the creation of custom scheduling pipelines on a per-region basis. Like the sched_barrier builtin this is intended to be used either for testing, in situations where the default scheduler heuristics cannot be improved, or in critical kernels where users are trying to get performance that is close to handwritten assembly. Obviously using these builtins will require extra work from the kernel writer to maintain the desired behavior. The builtin can be used to create groups of instructions called "scheduling groups" where ordering between the groups is enforced by the scheduler. __builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter is a mask that determines the types of instructions that you would like to synchronize around and add to a scheduling group. These instructions will be selected from the bottom up starting from the sched_group_barrier's location during instruction scheduling. The second parameter is the number of matching instructions that will be associated with this sched_group_barrier. The third parameter is an identifier which is used to describe what other sched_group_barriers should be synchronized with. Note that multiple sched_group_barriers must be added in order for them to be useful since they only synchronize with other sched_group_barriers. Only "scheduling groups" with a matching third parameter will have any enforced ordering between them. As an example, the code below tries to create a pipeline of 1 VMEM_READ instruction followed by 1 VALU instruction followed by 5 MFMA instructions... // 1 VMEM_READ __builtin_amdgcn_sched_group_barrier(32, 1, 0) // 1 VALU __builtin_amdgcn_sched_group_barrier(2, 1, 0) // 5 MFMA __builtin_amdgcn_sched_group_barrier(8, 5, 0) // 1 VMEM_READ __builtin_amdgcn_sched_group_barrier(32, 1, 0) // 3 VALU __builtin_amdgcn_sched_group_barrier(2, 3, 0) // 2 VMEM_WRITE __builtin_amdgcn_sched_group_barrier(64, 2, 0) Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D128158
2022-07-14[CodeGen] Move instruction predicate verification to emitInstructionDavid Green
D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. The allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Recommitted with some fixes for the leftover MCII variables in release builds. Differential Revision: https://reviews.llvm.org/D129506
2022-07-13Revert "Move instruction predicate verification to emitInstruction"David Green
This reverts commit e2fb8c0f4b940e0285ee36c112469fa75d4b60ff as it does not build for Release builds, and some buildbots are giving more warning than I saw locally. Reverting to fix those issues.
2022-07-13Move instruction predicate verification to emitInstructionDavid Green
D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. The allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Differential Revision: https://reviews.llvm.org/D129506
2022-05-11[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsicAustin Kerbow
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If there is a need to have highly optimized codegen and kernel developers have knowledge of inter-wave runtime behavior which is unknown to the compiler this builtin can be used to tune scheduling. This intrinsic creates a barrier between scheduling regions. The immediate parameter is a mask to determine the types of instructions that should be prevented from crossing the sched_barrier. In this initial patch, there are only two variations. A mask of 0 means that no instructions may be scheduled across the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing instructions may cross the sched_barrier. Note that this intrinsic is only meant to work with the scheduling passes. Any other transformations that may move code will not be impacted in the ways described above. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D124700
2022-03-09[AMDGPU] Move call clobbered return address registers s[30:31] to callee ↵Venkata Ramanaiah Nalamothu
saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm, ronlieb Differential Revision: https://reviews.llvm.org/D114652
2022-02-16[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`Shao-Ce SUN
Reviewed By: skan Differential Revision: https://reviews.llvm.org/D119846
2021-12-22Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to ↵Ron Lieberman
callee saved range" This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3. Failed amdgpu runtime buildbot # 3514
2021-12-22[AMDGPU] Move call clobbered return address registers s[30:31] to callee ↵RamNalamothu
saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D114652
2021-11-17[NFC][llvm] Inclusive language: reword and remove uses of sanity in ↵Zarko Todorovski
llvm/lib/Target Reworded removed code comments that contain `sanity check` and `sanity test`.
2021-11-04[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 ↵Thomas Symalla
callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D111637
2021-10-01[NFC][AMDGPU] Reduce includes dependencies, part 2Daniil Fukalov
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Minor forward declarations, redundant includes and flags in GCNSubtarget cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D109351
2021-09-27[AMDGPU] Ignore KILLs when forming clausesSebastian Neubauer
KILL instructions are sometimes present and prevented hard clauses from being formed. Fix this by ignoring all meta instructions in clauses. Differential Revision: https://reviews.llvm.org/D106042
2021-08-25[NFC][AMDGPU] Reduce includes dependencies.Daniil Fukalov
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had different return value type than base class. 4. Minor forward declarations cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108596
2021-07-26[amdgpu] Add 64-bit PC support when expanding unconditional branches.Michael Liao
Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106445
2021-03-09[AMDGPU] Remove SI_MASK_BRANCHRuiling Song
This is already deprecated, so remove code working on this. Also update the tests by using S_CBRANCH_EXECZ instead of SI_MASK_BRANCH. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97545
2021-01-20[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargetsdfukalov
... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036
2021-01-07[NFC][AMDGPU] Reduce include files dependency.dfukalov
Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813
2020-10-23AMDGPU: Increase branch size estimate with offset bugMatt Arsenault
This will be relaxed to insert a nop if the offset hits the bad value, so over estimate branch instruction sizes.
2020-02-13[AsmPrinter][MCStreamer] De-capitalize EmitInstruction and EmitCFI*Fangrui Song
2020-01-06[MC] Add parameter `Address` to MCInstPrinter::printInstFangrui Song
printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172
2019-10-11[AMDGPU] link dpp pseudos and real instructions on gfx10Stanislav Mekhanoshin
This defaults to zero fi operand, but we do not expose it anyway. Should we expose it later it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 llvm-svn: 374604
2019-06-16AMDGPU: Prepare for explicit absolute relocations in code generationNicolai Haehnle
Summary: We will use absolute relocations for LDS symbols. Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517
2019-06-05AMDGPU: Fix using 2 different enums for same operand flagsMatt Arsenault
These enums are really for the same namespace of flags set on arbitrary MachineOperands, so merge them to avoid value collisions. llvm-svn: 362640