| author | Tamar Christina <tamar.christina@arm.com> | 2025-11-19 14:27:55 +0000 |
|---|---|---|
| committer | Tamar Christina <tamar.christina@arm.com> | 2025-11-19 14:27:55 +0000 |
| commit | 3027010d8bcc854eb43425cb1da573ff7345a5ac (patch) | |
| tree | 21098af4cc868406a83ef9882f5ca4f48abd45a4 /libjava/java/sql | |
| parent | a3e97daf1f7452d060d2e5e4eb2fea7717343f18 (diff) | |
AArch64: expand extractions of Adv.SIMD registers from SVE as separate insn.
For this example, which uses the Adv.SIMD/SVE bridge:

    #include <arm_neon.h>
    #include <arm_neon_sve_bridge.h>
    #include <stdint.h>

    svint16_t sub_neon_i16_sve_bridged(svint8_t a, svint8_t b) {
        return svset_neonq_s16(svundef_s16(),
                               vsubq_s16(vmovl_high_s8(svget_neonq(a)),
                                         vmovl_high_s8(svget_neonq(b))));
    }

we generate:

    sub_neon_i16_sve_bridged(__SVInt8_t, __SVInt8_t):
            sxtl2   v0.8h, v0.16b
            ssubw2  v0.8h, v0.8h, v1.16b
            ret

instead of just:

    sub_neon_i16_sve_bridged(__SVInt8_t, __SVInt8_t):
            ssubl2  v0.8h, v0.16b, v1.16b
            ret
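To see why the single `ssubl2` is a valid replacement for the `sxtl2`/`ssubw2` pair, here is a minimal scalar C model of the three instructions. It assumes little-endian lane numbering, where lanes 8..15 of a `.16b` vector form its high half. The function names are hypothetical and are not GCC or ACLE APIs; the sketch only checks the arithmetic equivalence, not how GCC performs the fold.

```c
#include <stdint.h>
#include <string.h>

/* SXTL2 Vd.8H, Vn.16B: sign-extend bytes 8..15 of n to 16 bits. */
static void sxtl2(int16_t d[8], const int8_t n[16])
{
    for (int i = 0; i < 8; i++)
        d[i] = (int16_t)n[8 + i];
}

/* SSUBW2 Vd.8H, Vn.8H, Vm.16B: subtract the sign-extended
   bytes 8..15 of m from the halfwords of n. */
static void ssubw2(int16_t d[8], const int16_t n[8], const int8_t m[16])
{
    for (int i = 0; i < 8; i++)
        d[i] = (int16_t)(n[i] - (int16_t)m[8 + i]);
}

/* SSUBL2 Vd.8H, Vn.16B, Vm.16B: sign-extend bytes 8..15 of both
   operands and subtract.  The difference of two int8 values always
   fits in int16, so no lane wraps. */
static void ssubl2(int16_t d[8], const int8_t n[16], const int8_t m[16])
{
    for (int i = 0; i < 8; i++)
        d[i] = (int16_t)((int16_t)n[8 + i] - (int16_t)m[8 + i]);
}

/* Return 1 if the two-instruction sequence matches the single SSUBL2. */
static int two_insn_matches_one(const int8_t a[16], const int8_t b[16])
{
    int16_t tmp[8], two[8], one[8];

    sxtl2(tmp, a);
    ssubw2(two, tmp, b);
    ssubl2(one, a, b);
    return memcmp(two, one, sizeof one) == 0;
}
```

Because each 16-bit lane holds the exact difference of two sign-extended bytes, widening one operand first and then subtracting with the other widened gives the same lanes as widening both inside one instruction.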
Commit g:abf865732a7313cf79ffa325faed3467ed28d8b8 added a framework to fold
uses of intrinsics combined with lo/hi extractions into the appropriate lowpart
or highpart instructions.

However, this does not trigger here, because the Adv.SIMD-from-SVE extraction
code for

    vmovl_high_s8(svget_neonq(a))

does not have one of its arguments as a constant, and the fold only supports
combining two insns into one, not three.
In RTL, the above generates:

    (insn 7 4 8 2 (set (reg:V8QI 103 [ _6 ])
            (vec_select:V8QI (subreg:V16QI (reg/v:VNx16QI 109 [ a ]) 0)
                (parallel:V16QI [
                        (const_int 8 [0x8])
                        (const_int 9 [0x9])
                        (const_int 10 [0xa])
                        (const_int 11 [0xb])
                        (const_int 12 [0xc])
                        (const_int 13 [0xd])
                        (const_int 14 [0xe])
                        (const_int 15 [0xf])
                    ]))) "":3174:43 -1
         (nil))
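A scalar C sketch of what insn 7 computes may help. `(subreg:V16QI ... 0)` reads the low 16 bytes of the SVE register (the bytes that the Adv. SIMD register aliases), and the `vec_select` with `parallel [8 ... 15]` then picks the high eight of those bytes. Representing the scalable register as a byte array of runtime length (assumed to be at least 16 bytes) and the helper names below are illustrative assumptions, not GCC internals.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of (subreg:V16QI (reg:VNx16QI) 0): the low 16 bytes of an SVE
   vector stored as SVE_BYTES bytes.  Returns -1 if the vector is too
   short to contain a 128-bit Adv. SIMD part. */
static int subreg_v16qi(int8_t out[16], const int8_t *sve, size_t sve_bytes)
{
    if (sve_bytes < 16)
        return -1;
    memcpy(out, sve, 16);
    return 0;
}

/* Model of (vec_select (parallel [...])): out[i] = in[sel[i]]. */
static void vec_select(int8_t *out, const int8_t *in,
                       const size_t *sel, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[sel[i]];
}

/* Model of insn 7: take the low 128 bits of register a, then lanes 8..15. */
static int insn7_model(int8_t out[8], const int8_t *sve, size_t sve_bytes)
{
    static const size_t high_half[8] = {8, 9, 10, 11, 12, 13, 14, 15};
    int8_t q[16];

    if (subreg_v16qi(q, sve, sve_bytes) != 0)
        return -1;
    vec_select(out, q, high_half, 8);
    return 0;
}
```

With a 32-byte (256-bit) vector holding bytes 0..31, `insn7_model` yields bytes 8..15; anything beyond the first 16 bytes is never read, which matches the subreg only viewing the Adv. SIMD part.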
Since the SVE and Adv. SIMD modes are tieable, this is a valid instruction to
generate. However, it is suboptimal because it cannot be folded into the
existing instruction patterns. Eventually early-ra splits the SVE register off
from these patterns, but by then we are past combine and the insn folding
passes, so all of those optimizations are missed.
This patch introduces vec_extract optabs for extracting 128-bit and 64-bit
Adv.SIMD vectors from SVE registers, and emits an explicit, separate
instruction for the subregs. This gives combine and RTL folding the
opportunity to form the combined instructions; if they do not, we arrive at
the same RTL after early-ra as before.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (vec_extract<mode><v128>,
vec_extract<mode><v64>): New.
* config/aarch64/iterators.md (V64, v64): New.
* config/aarch64/predicates.md (const0_to_1_operand): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/fold_to_highpart_6.c: Update codegen.
* gcc.target/aarch64/sve/fold_to_highpart_1.c: New test.
* gcc.target/aarch64/sve/fold_to_highpart_2.c: New test.
