<feed xmlns='http://www.w3.org/2005/Atom'>
<title>gcc.git/gcc/config, branch master</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/'/>
<entry>
<title>LoongArch: extract the base address to promote the combine of RTX.</title>
<updated>2025-11-22T07:38:23+00:00</updated>
<author>
<name>zhaozhou</name>
<email>zhaozhou@loongson.cn</email>
</author>
<published>2025-11-14T03:05:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=799ca4cda74dec3d567f2c6f417ab09cb53b5025'/>
<id>799ca4cda74dec3d567f2c6f417ab09cb53b5025</id>
<content type='text'>
When 256-bit vectors are used to move src to dest, extract the plus
operation of the base address to promote RTX combining.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc: Extract plus operation.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/lasx-struct-move.c: New test.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When 256-bit vectors are used to move src to dest, extract the plus
operation of the base address to promote RTX combining.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc: Extract plus operation.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/lasx-struct-move.c: New test.
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Optimize statement to use bstrins.{w|d}</title>
<updated>2025-11-22T07:13:25+00:00</updated>
<author>
<name>Deng Jianbo</name>
<email>dengjianbo@loongson.cn</email>
</author>
<published>2025-11-14T02:22:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=0483d99124fbd2c6dfebb0c735fdf077b37838da'/>
<id>0483d99124fbd2c6dfebb0c735fdf077b37838da</id>
<content type='text'>
For the statement (a &lt;&lt; imm1) | (b &amp; imm2), when imm2 equals
(1 &lt;&lt; imm1) - 1, it can be optimized to use the bstrins.{w|d} instruction.

gcc/ChangeLog:

	* config/loongarch/loongarch.md
	(*bstrins_w_for_ior_ashift_and_extend): New template.
	(*bstrins_d_for_ior_ashift_and): New template.
	* config/loongarch/predicates.md (const_uimm63_operand): New
	predicate.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/bstrins-5.c: New test.
	* gcc.target/loongarch/bstrins-6.c: New test.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For the statement (a &lt;&lt; imm1) | (b &amp; imm2), when imm2 equals
(1 &lt;&lt; imm1) - 1, it can be optimized to use the bstrins.{w|d} instruction.

gcc/ChangeLog:

	* config/loongarch/loongarch.md
	(*bstrins_w_for_ior_ashift_and_extend): New template.
	(*bstrins_d_for_ior_ashift_and): New template.
	* config/loongarch/predicates.md (const_uimm63_operand): New
	predicate.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/bstrins-5.c: New test.
	* gcc.target/loongarch/bstrins-6.c: New test.
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Optimize V4SImode vec_construct for load index length of two.</title>
<updated>2025-11-22T06:46:47+00:00</updated>
<author>
<name>zhaozhou</name>
<email>zhaozhou@loongson.cn</email>
</author>
<published>2025-11-14T03:18:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=4962d1309be98585ed05980eb7064dd5cc0d113a'/>
<id>4962d1309be98585ed05980eb7064dd5cc0d113a</id>
<content type='text'>
In V4SImode, a vec_construct with the load index {0, 1, 0, 1} uses
vldrepl.d, and a vec_construct with the load index {0, 1, 0, 0} uses
vldrepl.d and vshuf4i, reducing the use of scalar loads and vinsgr2vr.

gcc/ChangeLog:

	* config/loongarch/lsx.md (lsx_vshuf4i_mem_w_0): Add template.
	(lsx_vldrepl_merge_w_0): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lsx/lsx-vec-construct-opt.c:
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In V4SImode, a vec_construct with the load index {0, 1, 0, 1} uses
vldrepl.d, and a vec_construct with the load index {0, 1, 0, 0} uses
vldrepl.d and vshuf4i, reducing the use of scalar loads and vinsgr2vr.

gcc/ChangeLog:

	* config/loongarch/lsx.md (lsx_vshuf4i_mem_w_0): Add template.
	(lsx_vldrepl_merge_w_0): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lsx/lsx-vec-construct-opt.c:
</pre>
</div>
</content>
</entry>
<entry>
<title>aarch64: Extract aarch64_indirect_branch_asm for sibcall codegen</title>
<updated>2025-11-22T03:27:24+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2025-11-21T18:24:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=59a5fecfb260456dd60be687491717f3dbdb354f'/>
<id>59a5fecfb260456dd60be687491717f3dbdb354f</id>
<content type='text'>
Extract indirect branch assembly generation into a new function
aarch64_indirect_branch_asm, paralleling the existing
aarch64_indirect_call_asm function.  Replace the open-coded versions in
the sibcall patterns (*sibcall_insn and *sibcall_value_insn) so there
is a common helper for indirect branches where things like SLS mitigation
need to be handled.

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (aarch64_indirect_branch_asm):
	Declare.
	* config/aarch64/aarch64.cc (aarch64_indirect_branch_asm): New
	function to generate indirect branch with SLS barrier.
	* config/aarch64/aarch64.md (*sibcall_insn): Use
	aarch64_indirect_branch_asm.
	(*sibcall_value_insn): Likewise.

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Extract indirect branch assembly generation into a new function
aarch64_indirect_branch_asm, paralleling the existing
aarch64_indirect_call_asm function.  Replace the open-coded versions in
the sibcall patterns (*sibcall_insn and *sibcall_value_insn) so there
is a common helper for indirect branches where things like SLS mitigation
need to be handled.

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (aarch64_indirect_branch_asm):
	Declare.
	* config/aarch64/aarch64.cc (aarch64_indirect_branch_asm): New
	function to generate indirect branch with SLS barrier.
	* config/aarch64/aarch64.md (*sibcall_insn): Use
	aarch64_indirect_branch_asm.
	(*sibcall_value_insn): Likewise.

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>i386: Remove cond_{ashl,lshr,ashr}v{64,16,32}qi expanders [PR122598]</title>
<updated>2025-11-21T13:06:05+00:00</updated>
<author>
<name>Jakub Jelinek</name>
<email>jakub@redhat.com</email>
</author>
<published>2025-11-21T13:06:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=1c0897caa516bc564258266860e3b75054b9e78e'/>
<id>1c0897caa516bc564258266860e3b75054b9e78e</id>
<content type='text'>
As mentioned in the PR, the COND_SH{L,R} internal fns are expanded without
fallback, their expansion must succeed, and furthermore they don't
differentiate between scalar and vector shift counts, so again both have
to be supported.  That is the case of the {ashl,lshr,ashr}v*[hsd]i
patterns which use nonimmediate_or_const_vec_dup_operand predicate for
the shift count, so if the argument isn't a const vec dup, it can always be
legitimized by loading it into a vector register.
This is not the case for the QImode element conditional vector shifts;
there is no fallback for those, and we emit individual element shifts
in that case when not conditional and the shift count is not a constant.

So, I'm afraid we can't announce such an expander, because then the
vectorizer etc. count on it being fully available.

As I've tried to show in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122598#c9
even without this pattern we can sometimes emit
        vgf2p8affineqb  $0, .LC0(%rip), %ymm0, %ymm0{%k1}
etc. instructions.

2025-11-21  Jakub Jelinek  &lt;jakub@redhat.com&gt;

	PR target/122598
	* config/i386/predicates.md (const_vec_dup_operand): Remove.
	* config/i386/sse.md (cond&lt;&lt;insn&gt;&lt;mode&gt; with VI1_AVX512VL iterator):
	Remove.

	* gcc.target/i386/pr122598.c: New test.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As mentioned in the PR, the COND_SH{L,R} internal fns are expanded without
fallback, their expansion must succeed, and furthermore they don't
differentiate between scalar and vector shift counts, so again both have
to be supported.  That is the case of the {ashl,lshr,ashr}v*[hsd]i
patterns which use nonimmediate_or_const_vec_dup_operand predicate for
the shift count, so if the argument isn't a const vec dup, it can always be
legitimized by loading it into a vector register.
This is not the case for the QImode element conditional vector shifts;
there is no fallback for those, and we emit individual element shifts
in that case when not conditional and the shift count is not a constant.

So, I'm afraid we can't announce such an expander, because then the
vectorizer etc. count on it being fully available.

As I've tried to show in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122598#c9
even without this pattern we can sometimes emit
        vgf2p8affineqb  $0, .LC0(%rip), %ymm0, %ymm0{%k1}
etc. instructions.

2025-11-21  Jakub Jelinek  &lt;jakub@redhat.com&gt;

	PR target/122598
	* config/i386/predicates.md (const_vec_dup_operand): Remove.
	* config/i386/sse.md (cond&lt;&lt;insn&gt;&lt;mode&gt; with VI1_AVX512VL iterator):
	Remove.

	* gcc.target/i386/pr122598.c: New test.
</pre>
</div>
</content>
</entry>
<entry>
<title>arc: emit clobber of CC for -mcpu=em x &gt;&gt; 31</title>
<updated>2025-11-21T09:45:01+00:00</updated>
<author>
<name>Loeka Rogge</name>
<email>loeka@synopsys.com</email>
</author>
<published>2025-11-21T09:45:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=c435bbdd22a7fc2d18129f261fd2050e72f98411'/>
<id>c435bbdd22a7fc2d18129f261fd2050e72f98411</id>
<content type='text'>
Address PR target/120375

Devices without a barrel shifter end up using a sequence of
instructions. These can use the condition codes and/or loop count
register, so those need to be marked as 'clobbered'. These clobbers were
previously added only after split1, which is too late. This patch adds
these clobbers from the beginning, in the define_expand.

Previously, define_insn_and_split *&lt;insn&gt;si3_nobs would match any shift or
rotate instruction and would generate the necessary patterns to emulate a
barrel shifter, but it did not have any output assembly for itself.
In many cases this would create a loop with parallel clobbers. This pattern
is then matched by the &lt;insn&gt;si3_loop pattern.

In the no-barrel-shifter.c test tree code:

;; no-barrel-shifter.c:9:     int sign = (x &gt;&gt; 31) &amp; 1;
_2 = x.0_1 &gt;&gt; 31;

in the expand pass becomes the following pattern that matches *lshrsi3_nobs:

(insn 18 17 19 4 (set (reg:SI 153 [ _2 ])
        (lshiftrt:SI (reg/v:SI 156 [ x ])
            (const_int 31 [0x1f]))) "test2.c":9:24 -1
     (nil))

This pattern misses the necessary clobbers and remains untouched until the
split1 pass. Together with the later branch it becomes

;; no-barrel-shifter.c:9:     int sign = (x &gt;&gt; 31) &amp; 1;
	add.f	0,r0,r0
;; no-barrel-shifter.c:14:     if (mag == 0x7f800000)
	beq.d	.L8
;; no-barrel-shifter.c:9:     int sign = (x &gt;&gt; 31) &amp; 1;
	rlc	r0,0

This leads to an issue: the add.f instruction overwrites CC, but beq expects
CC to contain an earlier value indicating mag == 0x7f800000.

Now, these are combined in define_insn_and_split &lt;insn&gt;si3_loop that is
explicitly emitted in the define_expand and already contains the clobbers.
This can then be split into another pattern or remain the loop pattern.

In the expand pass, the same example now becomes:

(insn 18 17 19 4 (parallel [
            (set (reg:SI 153 [ _2 ])
                (lshiftrt:SI (reg/v:SI 156 [ x ])
                    (const_int 31 [0x1f])))
            (clobber (reg:SI 60 lp_count))
            (clobber (reg:CC 61 cc))
        ]) "test2.c":9:24 -1
     (nil))

Because the correct clobbers are now taken into account, the branch condition
is reevaluated by using breq instead of br.

;; no-barrel-shifter.c:9:     int sign = (x &gt;&gt; 31) &amp; 1;
	add.f	0,r0,r0
	rlc	r0,0
;; no-barrel-shifter.c:14:     if (mag == 0x7f800000)
	breq	r2,2139095040,.L8

Regtested for arc.

	PR target/120375

gcc/ChangeLog:

	* config/arc/arc.md (*&lt;insn&gt;si3_nobs): merged with &lt;insn&gt;si3_loop.
	(&lt;insn&gt;si3_loop): splits to relevant pattern or emits loop assembly.
	(&lt;insn&gt;si3_cnt1_clobber): Removes clobber for shift or rotate by
	const1.

gcc/testsuite/ChangeLog:

	* gcc.target/arc/no-barrel-shifter.c: New test.

Co-authored-by: Keith Packard &lt;keithp@keithp.com&gt;
Signed-off-by: Loeka Rogge &lt;loeka@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Address PR target/120375

Devices without a barrel shifter end up using a sequence of
instructions. These can use the condition codes and/or loop count
register, so those need to be marked as 'clobbered'. These clobbers were
previously added only after split1, which is too late. This patch adds
these clobbers from the beginning, in the define_expand.

Previously, define_insn_and_split *&lt;insn&gt;si3_nobs would match any shift or
rotate instruction and would generate the necessary patterns to emulate a
barrel shifter, but it did not have any output assembly for itself.
In many cases this would create a loop with parallel clobbers. This pattern
is then matched by the &lt;insn&gt;si3_loop pattern.

In the no-barrel-shifter.c test tree code:

;; no-barrel-shifter.c:9:     int sign = (x &gt;&gt; 31) &amp; 1;
_2 = x.0_1 &gt;&gt; 31;

in the expand pass becomes the following pattern that matches *lshrsi3_nobs:

(insn 18 17 19 4 (set (reg:SI 153 [ _2 ])
        (lshiftrt:SI (reg/v:SI 156 [ x ])
            (const_int 31 [0x1f]))) "test2.c":9:24 -1
     (nil))

This pattern misses the necessary clobbers and remains untouched until the
split1 pass. Together with the later branch it becomes

;; no-barrel-shifter.c:9:     int sign = (x &gt;&gt; 31) &amp; 1;
	add.f	0,r0,r0
;; no-barrel-shifter.c:14:     if (mag == 0x7f800000)
	beq.d	.L8
;; no-barrel-shifter.c:9:     int sign = (x &gt;&gt; 31) &amp; 1;
	rlc	r0,0

This leads to an issue: the add.f instruction overwrites CC, but beq expects
CC to contain an earlier value indicating mag == 0x7f800000.

Now, these are combined in define_insn_and_split &lt;insn&gt;si3_loop that is
explicitly emitted in the define_expand and already contains the clobbers.
This can then be split into another pattern or remain the loop pattern.

In the expand pass, the same example now becomes:

(insn 18 17 19 4 (parallel [
            (set (reg:SI 153 [ _2 ])
                (lshiftrt:SI (reg/v:SI 156 [ x ])
                    (const_int 31 [0x1f])))
            (clobber (reg:SI 60 lp_count))
            (clobber (reg:CC 61 cc))
        ]) "test2.c":9:24 -1
     (nil))

Because the correct clobbers are now taken into account, the branch condition
is reevaluated by using breq instead of br.

;; no-barrel-shifter.c:9:     int sign = (x &gt;&gt; 31) &amp; 1;
	add.f	0,r0,r0
	rlc	r0,0
;; no-barrel-shifter.c:14:     if (mag == 0x7f800000)
	breq	r2,2139095040,.L8

Regtested for arc.

	PR target/120375

gcc/ChangeLog:

	* config/arc/arc.md (*&lt;insn&gt;si3_nobs): merged with &lt;insn&gt;si3_loop.
	(&lt;insn&gt;si3_loop): splits to relevant pattern or emits loop assembly.
	(&lt;insn&gt;si3_cnt1_clobber): Removes clobber for shift or rotate by
	const1.

gcc/testsuite/ChangeLog:

	* gcc.target/arc/no-barrel-shifter.c: New test.

Co-authored-by: Keith Packard &lt;keithp@keithp.com&gt;
Signed-off-by: Loeka Rogge &lt;loeka@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arc: Use correct input operand for *extvsi_n_0 define_insn_and_split</title>
<updated>2025-11-21T08:52:19+00:00</updated>
<author>
<name>Claudiu Zissulescu</name>
<email>claziss@gmail.com</email>
</author>
<published>2025-11-21T08:46:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=56595464ff06ae7bdba7848f2ae583f2cac5dcac'/>
<id>56595464ff06ae7bdba7848f2ae583f2cac5dcac</id>
<content type='text'>
Correct the split condition of the instruction so that the split happens
after reload. Relax the operand 1 constraint too.

gcc/ChangeLog:

	* config/arc/arc.md: Modify define_insn_and_split "*extvsi_n_0"

gcc/testsuite/ChangeLog:

	* gcc.target/arc/extvsi-3.c: New test.

Co-authored-by: Michiel Derhaeg &lt;michiel@synopsys.com&gt;
Signed-off-by: Claudiu Zissulescu &lt;claziss@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Correct the split condition of the instruction so that the split happens
after reload. Relax the operand 1 constraint too.

gcc/ChangeLog:

	* config/arc/arc.md: Modify define_insn_and_split "*extvsi_n_0"

gcc/testsuite/ChangeLog:

	* gcc.target/arc/extvsi-3.c: New test.

Co-authored-by: Michiel Derhaeg &lt;michiel@synopsys.com&gt;
Signed-off-by: Claudiu Zissulescu &lt;claziss@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Add more numbers supported for {x}vldi</title>
<updated>2025-11-21T06:42:11+00:00</updated>
<author>
<name>Deng Jianbo</name>
<email>dengjianbo@loongson.cn</email>
</author>
<published>2025-11-17T07:28:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=cabfea3350ac55de8697bebf99a1456e2be4172c'/>
<id>cabfea3350ac55de8697bebf99a1456e2be4172c</id>
<content type='text'>
When the most significant bit of the 13-bit immediate value in the LoongArch
{x}vldi instruction is set to 1, the instruction can generate different
numbers based on the algorithm.  This patch adds support for generating
these numbers with the {x}vldi instruction.

gcc/ChangeLog:

	* config/loongarch/constraints.md: Update constraint YI to support
	more numbers.
	* config/loongarch/loongarch-protos.h
	(loongarch_const_vector_vrepli): Rename.
	(loongarch_const_vector_vldi): Ditto.
	* config/loongarch/loongarch.cc (VLDI_NEG_MASK): New macro.
	(loongarch_parse_vldi_const): New function to check if numbers can
	be generated by {x}vldi instruction.
	(loongarch_const_vector_vrepli): Rename.
	(loongarch_const_vector_vldi): Use above function.
	(loongarch_const_insns): Call renamed function.
	(loongarch_split_vector_move_p): Ditto.
	(loongarch_output_move): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/lasx-builtin.c: Replace xvrepli
	with xvldi.
	* gcc.target/loongarch/vector/lasx/lasx-vec-init-2.c: Fix test.
	* gcc.target/loongarch/vector/lsx/lsx-builtin.c: Replace vrepli with
	vldi.
	* gcc.target/loongarch/vrepli.c: Ditto.
	* gcc.target/loongarch/vector/lasx/lasx-xvldi-2.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-vldi-2.c: New test.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the most significant bit of the 13-bit immediate value in the LoongArch
{x}vldi instruction is set to 1, the instruction can generate different
numbers based on the algorithm.  This patch adds support for generating
these numbers with the {x}vldi instruction.

gcc/ChangeLog:

	* config/loongarch/constraints.md: Update constraint YI to support
	more numbers.
	* config/loongarch/loongarch-protos.h
	(loongarch_const_vector_vrepli): Rename.
	(loongarch_const_vector_vldi): Ditto.
	* config/loongarch/loongarch.cc (VLDI_NEG_MASK): New macro.
	(loongarch_parse_vldi_const): New function to check if numbers can
	be generated by {x}vldi instruction.
	(loongarch_const_vector_vrepli): Rename.
	(loongarch_const_vector_vldi): Use above function.
	(loongarch_const_insns): Call renamed function.
	(loongarch_split_vector_move_p): Ditto.
	(loongarch_output_move): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/lasx-builtin.c: Replace xvrepli
	with xvldi.
	* gcc.target/loongarch/vector/lasx/lasx-vec-init-2.c: Fix test.
	* gcc.target/loongarch/vector/lsx/lsx-builtin.c: Replace vrepli with
	vldi.
	* gcc.target/loongarch/vrepli.c: Ditto.
	* gcc.target/loongarch/vector/lasx/lasx-xvldi-2.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-vldi-2.c: New test.
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Fix operands[2] predicate of lsx_vreplvei_mirror.</title>
<updated>2025-11-21T06:38:34+00:00</updated>
<author>
<name>zhaozhou</name>
<email>zhaozhou@loongson.cn</email>
</author>
<published>2025-11-14T03:09:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=1efe2bdc08796912cf872b21a0b960f1af673043'/>
<id>1efe2bdc08796912cf872b21a0b960f1af673043</id>
<content type='text'>
UNSPEC_LSX_VREPLVEI_MIRROR describes the mirroring operation that copies
the lower 64 bits of a 128-bit register to the upper 64 bits. So in any
mode, the value range of op2 can only be 0 or 1 for the vreplvei.d insn.

gcc/ChangeLog:

	* config/loongarch/lsx.md: Fix predicate.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
UNSPEC_LSX_VREPLVEI_MIRROR describes the mirroring operation that copies
the lower 64 bits of a 128-bit register to the upper 64 bits. So in any
mode, the value range of op2 can only be 0 or 1 for the vreplvei.d insn.

gcc/ChangeLog:

	* config/loongarch/lsx.md: Fix predicate.
</pre>
</div>
</content>
</entry>
<entry>
<title>RISC-V: Add RTL pass to combine cm.popret with zero return value</title>
<updated>2025-11-20T15:38:14+00:00</updated>
<author>
<name>Kito Cheng</name>
<email>kito.cheng@sifive.com</email>
</author>
<published>2025-11-05T09:55:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=c738d4ef524e66c1aa13e949e701473a7dcc4db6'/>
<id>c738d4ef524e66c1aa13e949e701473a7dcc4db6</id>
<content type='text'>
This patch implements a new RTL pass that combines "li a0, 0" and
"cm.popret" into a single "cm.popretz" instruction for the Zcmp
extension.

This optimization cannot be done during prologue/epilogue expansion
because it would cause shrink-wrapping to generate incorrect code as
documented in PR113715. The dedicated RTL pass runs after shrink-wrap
but before branch shortening, safely performing this combination.

Changes since v2:
- Apply Jeff's comment
  - Use CONST0_RTX rather than const0_rtx; this makes the pass able to
    handle (const_double:SF 0.0) as well.
- Add a test case for float/double zero return values.
Changes since v1:
- Tweak the testcase.

gcc/ChangeLog:

	* config/riscv/riscv-opt-popretz.cc: New file.
	* config/riscv/riscv-passes.def: Insert pass_combine_popretz before
	pass_shorten_branches.
	* config/riscv/riscv-protos.h (make_pass_combine_popretz): New
	declaration.
	* config/riscv/t-riscv: Add riscv-opt-popretz.o build rule.
	* config.gcc (riscv*): Add riscv-opt-popretz.o to extra_objs.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr113715.c: New test.
	* gcc.target/riscv/rv32e_zcmp.c: Update expected output for
	test_popretz.
	* gcc.target/riscv/rv32i_zcmp.c: Likewise.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch implements a new RTL pass that combines "li a0, 0" and
"cm.popret" into a single "cm.popretz" instruction for the Zcmp
extension.

This optimization cannot be done during prologue/epilogue expansion
because it would cause shrink-wrapping to generate incorrect code as
documented in PR113715. The dedicated RTL pass runs after shrink-wrap
but before branch shortening, safely performing this combination.

Changes since v2:
- Apply Jeff's comment
  - Use CONST0_RTX rather than const0_rtx; this makes the pass able to
    handle (const_double:SF 0.0) as well.
- Add a test case for float/double zero return values.
Changes since v1:
- Tweak the testcase.

gcc/ChangeLog:

	* config/riscv/riscv-opt-popretz.cc: New file.
	* config/riscv/riscv-passes.def: Insert pass_combine_popretz before
	pass_shorten_branches.
	* config/riscv/riscv-protos.h (make_pass_combine_popretz): New
	declaration.
	* config/riscv/t-riscv: Add riscv-opt-popretz.o build rule.
	* config.gcc (riscv*): Add riscv-opt-popretz.o to extra_objs.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr113715.c: New test.
	* gcc.target/riscv/rv32e_zcmp.c: Update expected output for
	test_popretz.
	* gcc.target/riscv/rv32i_zcmp.c: Likewise.
</pre>
</div>
</content>
</entry>
</feed>
