gcc.git/gcc/config, branch master

LoongArch: extract the base address to promote the combine of RTX.

2025-11-22T07:38:23+00:00

When a 256-bit vector is used to move src to dest, extract the base
address from the plus operation so that RTX combine can do its work.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc: Extract plus operation.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/lasx-struct-move.c: New test.

LoongArch: Optimize statement to use bstrins.{w|d}

2025-11-22T07:13:25+00:00

For a statement of the form (a << imm1) | (b & imm2), when imm2 equals
(1 << imm1) - 1, it can be optimized to use the bstrins.{w|d} instruction.
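
The equivalence can be checked with a small software model.  The sketch
below is illustrative only and is not taken from the patch: model_bstrins_d
is a hypothetical helper that mimics a bit-string insert (replace bits
[msb:lsb] of the destination with the low bits of the source), and imm1 is
assumed to be 8, so imm2 is 0xff.

```c
#include <stdint.h>

/* Hypothetical software model of a bit-string insert: replace bits
   [msb:lsb] of RD with the low (msb - lsb + 1) bits of RJ.  */
static uint64_t
model_bstrins_d (uint64_t rd, uint64_t rj, unsigned msb, unsigned lsb)
{
  unsigned width = msb - lsb + 1;
  uint64_t field = width == 64 ? ~UINT64_C (0)
			       : (UINT64_C (1) << width) - 1;
  return (rd & ~(field << lsb)) | ((rj & field) << lsb);
}

/* The source-level form from the commit message, with imm1 = 8 and
   imm2 = (1 << 8) - 1 = 0xff (values chosen for illustration).  */
static uint64_t
ior_ashift_and (uint64_t a, uint64_t b)
{
  return (a << 8) | (b & 0xff);
}

/* The same value computed as a shift followed by a bit-string insert
   of the low 8 bits of B into bits [7:0].  */
static uint64_t
via_bstrins (uint64_t a, uint64_t b)
{
  return model_bstrins_d (a << 8, b, 7, 0);
}
```

Both functions return the same value for any a and b, because the low
imm1 bits of (a << imm1) are always zero and are exactly the bits the
insert overwrites.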

gcc/ChangeLog:

	* config/loongarch/loongarch.md
	(*bstrins_w_for_ior_ashift_and_extend): New template.
	(*bstrins_d_for_ior_ashift_and): New template.
	* config/loongarch/predicates.md (const_uimm63_operand): New
	predicate.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/bstrins-5.c: New test.
	* gcc.target/loongarch/bstrins-6.c: New test.

LoongArch: Optimize V4SImode vec_construct for load index length of two.

2025-11-22T06:46:47+00:00

In V4SImode, a vec_construct with the load index {0, 1, 0, 1} now uses
vldrepl.d, and a vec_construct with the load index {0, 1, 0, 0} uses
vldrepl.d followed by vshuf4i, which reduces the use of scalar loads and
vinsgr2vr.
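
As a rough illustration of the two sequences, the following sketch models
the instructions in plain C.  The type v4si_model and both model_*
functions are hypothetical names introduced here, and the shuffle
immediate 0x04 is an assumption derived from the index list {0, 1, 0, 0},
not a value quoted from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector of four 32-bit words.  */
typedef struct { int32_t w[4]; } v4si_model;

/* Model of vldrepl.d: load one 64-bit value from memory and replicate
   it into both 64-bit halves, giving {p[0], p[1], p[0], p[1]}.  */
static v4si_model
model_vldrepl_d (const int32_t *p)
{
  v4si_model v;
  memcpy (&v.w[0], p, 8);
  memcpy (&v.w[2], p, 8);
  return v;
}

/* Model of vshuf4i.w: result word i is the source word selected by
   bits [2i+1:2i] of the 8-bit immediate.  */
static v4si_model
model_vshuf4i_w (v4si_model src, unsigned imm)
{
  v4si_model r;
  for (int i = 0; i < 4; i++)
    r.w[i] = src.w[(imm >> (2 * i)) & 3];
  return r;
}
```

With p = {10, 20}, model_vldrepl_d yields {10, 20, 10, 20}, and applying
model_vshuf4i_w with immediate 0x04 yields {10, 20, 10, 10}, so one 64-bit
load covers both elements in either case.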

gcc/ChangeLog:

	* config/loongarch/lsx.md (lsx_vshuf4i_mem_w_0): Add template.
	(lsx_vldrepl_merge_w_0): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lsx/lsx-vec-construct-opt.c:

aarch64: Extract aarch64_indirect_branch_asm for sibcall codegen

2025-11-22T03:27:24+00:00

Extract indirect branch assembly generation into a new function
aarch64_indirect_branch_asm, paralleling the existing
aarch64_indirect_call_asm function.  Replace the open-coded versions in
the sibcall patterns (*sibcall_insn and *sibcall_value_insn) so there
is a common helper for indirect branches where things like SLS mitigation
need to be handled.

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (aarch64_indirect_branch_asm):
	Declare.
	* config/aarch64/aarch64.cc (aarch64_indirect_branch_asm): New
	function to generate indirect branch with SLS barrier.
	* config/aarch64/aarch64.md (*sibcall_insn): Use
	aarch64_indirect_branch_asm.
	(*sibcall_value_insn): Likewise.

Signed-off-by: Kees Cook

i386: Remove cond_{ashl,lshr,ashr}v{64,16,32}qi expanders [PR122598]

2025-11-21T13:06:05+00:00

As mentioned in the PR, the COND_SH{L,R} internal fns are expanded without
fallback, so their expansion must succeed; furthermore, they don't
differentiate between scalar and vector shift counts, so both have
to be supported.  That is the case for the {ashl,lshr,ashr}v*[hsd]i
patterns, which use the nonimmediate_or_const_vec_dup_operand predicate for
the shift count, so if the argument isn't a const vec dup, it can always be
legitimized by loading it into a vector register.
This is not the case for the QImode element conditional vector shifts:
there is no fallback for those, and in the non-conditional case we emit
individual element shifts when the shift count is not a constant.

So, I'm afraid we can't announce such an expander, because the
vectorizer etc. would then count on it being fully available.

As I've tried to show in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122598#c9
even without this pattern we can sometimes emit
        vgf2p8affineqb  $0, .LC0(%rip), %ymm0, %ymm0{%k1}
etc. instructions.

2025-11-21  Jakub Jelinek  

	PR target/122598
	* config/i386/predicates.md (const_vec_dup_operand): Remove.
	* config/i386/sse.md (cond_<insn><mode> with VI1_AVX512VL iterator):
	Remove.

	* gcc.target/i386/pr122598.c: New test.

arc: emit clobber of CC for -mcpu=em x >> 31

2025-11-21T09:45:01+00:00

Address PR target/120375

Devices without a barrel shifter end up using a sequence of
instructions. These can use the condition codes and/or loop count
register, so those need to be marked as 'clobbered'. These clobbers were
previously added only after split1, which is too late. This patch adds
these clobbers from the beginning, in the define_expand.

Previously, define_insn_and_split *si3_nobs would match any shift or
rotate instruction and would generate the necessary patterns to emulate a
barrel shifter, but it did not have any output assembly for itself.
In many cases this would create a loop with parallel clobbers. This pattern
is then matched by the si3_loop pattern.

In the no-barrel-shifter.c test, the tree code:

;; no-barrel-shifter.c:9:     int sign = (x >> 31) & 1;
_2 = x.0_1 >> 31;

in the expand pass becomes the following pattern that matches *lshrsi3_nobs:

(insn 18 17 19 4 (set (reg:SI 153 [ _2 ])
        (lshiftrt:SI (reg/v:SI 156 [ x ])
            (const_int 31 [0x1f]))) "test2.c":9:24 -1
     (nil))

This pattern misses the necessary clobbers and remains untouched until the
split1 pass. Together with the later branch it becomes

;; no-barrel-shifter.c:9:     int sign = (x >> 31) & 1;
	add.f	0,r0,r0
;; no-barrel-shifter.c:14:     if (mag == 0x7f800000)
	beq.d	.L8
;; no-barrel-shifter.c:9:     int sign = (x >> 31) & 1;
	rlc	r0,0

This leads to an issue: the add.f instruction overwrites CC, but beq expects
CC to still contain the earlier value indicating mag == 0x7f800000.
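
To make the CC dependence concrete, here is a minimal C model of the
add.f/rlc pair shown above.  It is a sketch under stated assumptions, not
code from the patch: arc_state and the model_* helpers are hypothetical,
and only the carry flag is modeled (the real add.f updates the other
flags as well).

```c
#include <stdint.h>

/* Hypothetical machine state: one register plus the carry flag.  */
typedef struct { uint32_t r0; unsigned carry; } arc_state;

/* Model of "add.f 0,r0,r0": the sum is discarded, but the carry flag
   receives the carry out of r0 + r0, which is bit 31 of r0.  */
static void
model_add_f_discard (arc_state *s)
{
  uint64_t wide = (uint64_t) s->r0 + s->r0;
  s->carry = (unsigned) (wide >> 32) & 1;
}

/* Model of "rlc r0,0": rotate the source value 0 left through carry,
   so r0 becomes (0 << 1) | carry.  */
static void
model_rlc_zero (arc_state *s)
{
  s->r0 = (0u << 1) | s->carry;
}

/* x >> 31 as emulated without a barrel shifter.  */
static uint32_t
lshr31_no_barrel_shifter (uint32_t x)
{
  arc_state s = { x, 0 };
  model_add_f_discard (&s);	/* Overwrites the flags.  */
  model_rlc_zero (&s);		/* Consumes the carry flag.  */
  return s.r0;
}
```

The result travels through the carry flag between the two instructions,
which is why any flag-consuming branch scheduled across or between them
sees a clobbered CC.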

Now, these are combined in define_insn_and_split si3_loop that is
explicitly emitted in the define_expand and already contains the clobbers.
This can then be split into another pattern or remain the loop pattern.

In the expand pass, the same example now becomes:

(insn 18 17 19 4 (parallel [
            (set (reg:SI 153 [ _2 ])
                (lshiftrt:SI (reg/v:SI 156 [ x ])
                    (const_int 31 [0x1f])))
            (clobber (reg:SI 60 lp_count))
            (clobber (reg:CC 61 cc))
        ]) "test2.c":9:24 -1
     (nil))

Because the correct clobbers are now taken into account, the branch condition
is reevaluated by using breq instead of br.

;; no-barrel-shifter.c:9:     int sign = (x >> 31) & 1;
	add.f	0,r0,r0
	rlc	r0,0
;; no-barrel-shifter.c:14:     if (mag == 0x7f800000)
	breq	r2,2139095040,.L8

Regtested for arc.

	PR target/120375

gcc/ChangeLog:

	* config/arc/arc.md (*si3_nobs): merged with si3_loop.
	(si3_loop): splits to relevant pattern or emits loop assembly.
	(si3_cnt1_clobber): Removes clobber for shift or rotate by
	const1.

gcc/testsuite/ChangeLog:

	* gcc.target/arc/no-barrel-shifter.c: New test.

Co-authored-by: Keith Packard 
Signed-off-by: Loeka Rogge

arc: Use correct input operand for *extvsi_n_0 define_insn_and_split

2025-11-21T08:52:19+00:00

Correct the split condition of the instruction so that the split happens
after reload. Relax the operand 1 constraint too.

gcc/ChangeLog:

	* config/arc/arc.md: Modify define_insn_and_split "*extvsi_n_0"

gcc/testsuite/ChangeLog:

	* gcc.target/arc/extvsi-3.c: New test.

Co-authored-by: Michiel Derhaeg 
Signed-off-by: Claudiu Zissulescu

LoongArch: Add more numbers supported for {x}vldi

2025-11-21T06:42:11+00:00

When the most significant bit of the 13-bit immediate value in the LoongArch
{x}vldi instruction is set to 1, the instruction can generate different
numbers based on the algorithm.  This patch adds support for generating
these numbers with the {x}vldi instruction.

gcc/ChangeLog:

	* config/loongarch/constraints.md: Update constraint YI to support
	more numbers.
	* config/loongarch/loongarch-protos.h
	(loongarch_const_vector_vrepli): Rename.
	(loongarch_const_vector_vldi): Ditto.
	* config/loongarch/loongarch.cc (VLDI_NEG_MASK): New macro.
	(loongarch_parse_vldi_const): New function to check if numbers can
	be generated by {x}vldi instruction.
	(loongarch_const_vector_vrepli): Rename.
	(loongarch_const_vector_vldi): Use above function.
	(loongarch_const_insns): Call renamed function.
	(loongarch_split_vector_move_p): Ditto.
	(loongarch_output_move): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/lasx-builtin.c: Replace xvrepli
	with xvldi.
	* gcc.target/loongarch/vector/lasx/lasx-vec-init-2.c: Fix test.
	* gcc.target/loongarch/vector/lsx/lsx-builtin.c: Replace vrepli with
	vldi.
	* gcc.target/loongarch/vrepli.c: Ditto.
	* gcc.target/loongarch/vector/lasx/lasx-xvldi-2.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-vldi-2.c: New test.

LoongArch: Fix operands[2] predicate of lsx_vreplvei_mirror.

2025-11-21T06:38:34+00:00

UNSPEC_LSX_VREPLVEI_MIRROR describes the mirroring operation that copies
the lower 64 bits of a 128-bit register to the upper 64 bits. So in any
mode, the value range of op2 can only be 0 or 1 for the vreplvei.d insn.
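
The range restriction can be pictured with a small C model.  This is an
illustrative sketch only: v2di_model and model_vreplvei_d are hypothetical
names, and the function simply rejects indices that do not name one of
the two 64-bit elements of a 128-bit register.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register viewed as two
   64-bit elements.  */
typedef struct { uint64_t d[2]; } v2di_model;

/* Model of vreplvei.d: replicate 64-bit element IDX into both halves.
   Only IDX 0 or 1 names a valid element; return -1 otherwise.  */
static int
model_vreplvei_d (v2di_model src, unsigned idx, v2di_model *dst)
{
  if (idx > 1)
    return -1;
  dst->d[0] = src.d[idx];
  dst->d[1] = src.d[idx];
  return 0;
}
```

With idx 0 the lower 64 bits end up in both halves, which is the
mirroring behaviour described above; an index such as 2 or 3, valid for
narrower element sizes, has no meaning for the .d form.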

gcc/ChangeLog:

	* config/loongarch/lsx.md: Fix predicate.

RISC-V: Add RTL pass to combine cm.popret with zero return value

2025-11-20T15:38:14+00:00

This patch implements a new RTL pass that combines "li a0, 0" and
"cm.popret" into a single "cm.popretz" instruction for the Zcmp
extension.

This optimization cannot be done during prologue/epilogue expansion
because it would cause shrink-wrapping to generate incorrect code as
documented in PR113715. The dedicated RTL pass runs after shrink-wrap
but before branch shortening, safely performing this combination.
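
For illustration, functions of the following shape are the kind of input
the pass targets.  This is a hedged sketch rather than the contents of
the new test: bar, counter and both test_* names are made up here,
whether cm.popretz is actually emitted depends on the -march/-mabi
options and on cm.push/cm.popret being used for the frame, and the float
variant assumes an ABI that returns float in a0.

```c
/* Hypothetical callee; the call forces the return address to be saved,
   so the caller needs a real prologue and epilogue.  */
static int counter;

__attribute__ ((noinline)) static void
bar (void)
{
  counter++;
}

/* Returns integer zero after a call: the epilogue would otherwise be
   "li a0, 0" followed by "cm.popret".  */
int
test_popretz_int (void)
{
  bar ();
  return 0;
}

/* Returns floating-point zero; matching on CONST0_RTX rather than
   const0_rtx is what lets the pass treat (const_double:SF 0.0) the
   same way, assuming the value is returned in a0.  */
float
test_popretz_float (void)
{
  bar ();
  return 0.0f;
}
```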

Changes since v2:
- Apply Jeff's comment
  - Use CONST0_RTX rather than const0_rtx; this makes the pass able to
    handle (const_double:SF 0.0) as well.
- Add a test case for float/double zero return values.
Changes since v1:
- Tweak the testcase.

gcc/ChangeLog:

	* config/riscv/riscv-opt-popretz.cc: New file.
	* config/riscv/riscv-passes.def: Insert pass_combine_popretz before
	pass_shorten_branches.
	* config/riscv/riscv-protos.h (make_pass_combine_popretz): New
	declaration.
	* config/riscv/t-riscv: Add riscv-opt-popretz.o build rule.
	* config.gcc (riscv*): Add riscv-opt-popretz.o to extra_objs.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr113715.c: New test.
	* gcc.target/riscv/rv32e_zcmp.c: Update expected output for
	test_popretz.
	* gcc.target/riscv/rv32i_zcmp.c: Likewise.