<feed xmlns='http://www.w3.org/2005/Atom'>
<title>gcc.git, branch basepoints/gcc-14</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/'/>
<entry>
<title>Bump BASE-VER.</title>
<updated>2023-04-17T11:52:56+00:00</updated>
<author>
<name>Jakub Jelinek</name>
<email>jakub@redhat.com</email>
</author>
<published>2023-04-17T11:52:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=48f0f297774bbc38ae8cb8bf12212e124fe479ad'/>
<id>48f0f297774bbc38ae8cb8bf12212e124fe479ad</id>
<content type='text'>
2023-04-17  Jakub Jelinek  &lt;jakub@redhat.com&gt;

	* BASE-VER: Set to 14.0.0.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
2023-04-17  Jakub Jelinek  &lt;jakub@redhat.com&gt;

	* BASE-VER: Set to 14.0.0.
</pre>
</div>
</content>
</entry>
<entry>
<title>ipa: Fix double reference-count decrements for the same edge (PR 107769, PR 109318)</title>
<updated>2023-04-17T11:04:49+00:00</updated>
<author>
<name>Martin Jambor</name>
<email>mjambor@suse.cz</email>
</author>
<published>2023-04-17T10:59:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=8e08c7886eed5824bebd0e011526ec302d622844'/>
<id>8e08c7886eed5824bebd0e011526ec302d622844</id>
<content type='text'>
It turns out that since addition of the code that can identify globals
which are only read from, the code that keeps track of the references
can decrement their count for the same calls, once during IPA-CP and
then again during inlining.  Fixed by adding a special flag to the
pass-through variant and simply wiping out the reference to the
refdesc structure from the constant ones.

Moreover, during debugging of the issue I have discovered that the
code removing references could remove a reference associated with the
same statement but of a wrong type.  In all cases it wanted to remove
an IPA_REF_ADDR reference so removing a lesser one instead should do
no harm in practice, but we should try to be consistent and so this
patch extends symtab_node::find_reference so that it searches for a
reference of a given type only.

gcc/ChangeLog:

2023-04-14  Martin Jambor  &lt;mjambor@suse.cz&gt;

	PR ipa/107769
	PR ipa/109318
	* cgraph.h (symtab_node::find_reference): Add parameter use_type.
	* ipa-prop.h (ipa_pass_through_data): New flag refdesc_decremented.
	(ipa_zap_jf_refdesc): New function.
	(ipa_get_jf_pass_through_refdesc_decremented): Likewise.
	(ipa_set_jf_pass_through_refdesc_decremented): Likewise.
	* ipa-cp.cc (ipcp_discover_new_direct_edges): Provide a value for
	the new parameter of find_reference.
	(adjust_references_in_caller): Likewise. Make sure the constant jump
	function is not used to decrement a refdec counter again.  Only
	decrement refdesc counters when the pass_through jump function allows
	it.  Added a detailed dump when decrementing refdesc counters.
	* ipa-prop.cc (ipa_print_node_jump_functions_for_edge): Dump new flag.
	(ipa_set_jf_simple_pass_through): Initialize the new flag.
	(ipa_set_jf_unary_pass_through): Likewise.
	(ipa_set_jf_arith_pass_through): Likewise.
	(remove_described_reference): Provide a value for the new parameter of
	find_reference.
	(update_jump_functions_after_inlining): Zap refdesc of new jfunc if
	the previous pass_through had a flag mandating that we do so.
	(propagate_controlled_uses): Likewise.  Only decrement refdesc
	counters when the pass_through jump function allows it.
	(ipa_edge_args_sum_t::duplicate): Provide a value for the new
	parameter of find_reference.
	(ipa_write_jump_function): Assert the new flag does not have to be
	streamed.
	* symtab.cc (symtab_node::find_reference): Add parameter use_type, use
	it in searching.

gcc/testsuite/ChangeLog:

2023-04-06  Martin Jambor  &lt;mjambor@suse.cz&gt;

	PR ipa/107769
	PR ipa/109318
	* gcc.dg/ipa/pr109318.c: New test.
	* gcc.dg/lto/pr107769_0.c: Likewise.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It turns out that since addition of the code that can identify globals
which are only read from, the code that keeps track of the references
can decrement their count for the same calls, once during IPA-CP and
then again during inlining.  Fixed by adding a special flag to the
pass-through variant and simply wiping out the reference to the
refdesc structure from the constant ones.

Moreover, during debugging of the issue I have discovered that the
code removing references could remove a reference associated with the
same statement but of a wrong type.  In all cases it wanted to remove
an IPA_REF_ADDR reference so removing a lesser one instead should do
no harm in practice, but we should try to be consistent and so this
patch extends symtab_node::find_reference so that it searches for a
reference of a given type only.

gcc/ChangeLog:

2023-04-14  Martin Jambor  &lt;mjambor@suse.cz&gt;

	PR ipa/107769
	PR ipa/109318
	* cgraph.h (symtab_node::find_reference): Add parameter use_type.
	* ipa-prop.h (ipa_pass_through_data): New flag refdesc_decremented.
	(ipa_zap_jf_refdesc): New function.
	(ipa_get_jf_pass_through_refdesc_decremented): Likewise.
	(ipa_set_jf_pass_through_refdesc_decremented): Likewise.
	* ipa-cp.cc (ipcp_discover_new_direct_edges): Provide a value for
	the new parameter of find_reference.
	(adjust_references_in_caller): Likewise. Make sure the constant jump
	function is not used to decrement a refdec counter again.  Only
	decrement refdesc counters when the pass_through jump function allows
	it.  Added a detailed dump when decrementing refdesc counters.
	* ipa-prop.cc (ipa_print_node_jump_functions_for_edge): Dump new flag.
	(ipa_set_jf_simple_pass_through): Initialize the new flag.
	(ipa_set_jf_unary_pass_through): Likewise.
	(ipa_set_jf_arith_pass_through): Likewise.
	(remove_described_reference): Provide a value for the new parameter of
	find_reference.
	(update_jump_functions_after_inlining): Zap refdesc of new jfunc if
	the previous pass_through had a flag mandating that we do so.
	(propagate_controlled_uses): Likewise.  Only decrement refdesc
	counters when the pass_through jump function allows it.
	(ipa_edge_args_sum_t::duplicate): Provide a value for the new
	parameter of find_reference.
	(ipa_write_jump_function): Assert the new flag does not have to be
	streamed.
	* symtab.cc (symtab_node::find_reference): Add parameter use_type, use
	it in searching.

gcc/testsuite/ChangeLog:

2023-04-06  Martin Jambor  &lt;mjambor@suse.cz&gt;

	PR ipa/107769
	PR ipa/109318
	* gcc.dg/ipa/pr109318.c: New test.
	* gcc.dg/lto/pr107769_0.c: Likewise.
</pre>
</div>
</content>
</entry>
<entry>
<title>aarch64: disable LDP via tuning structure for -mcpu=ampere1</title>
<updated>2023-04-17T10:15:04+00:00</updated>
<author>
<name>Philipp Tomsich</name>
<email>philipp.tomsich@vrull.eu</email>
</author>
<published>2023-03-23T18:47:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=f200c56787f2c6f93ffb739d57d01a294ab72f68'/>
<id>f200c56787f2c6f93ffb739d57d01a294ab72f68</id>
<content type='text'>
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.

This commit:
 * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
 * allows -moverride=tune=... to override this

These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
   503.bwaves_r.      -0.88%
   507.cactuBSSN_r     0.35%
   508.namd_r          3.09%
   510.parest_r       -2.99%
   511.povray_r        5.54%
   519.lbm_r          15.83%
   521.wrf_r           0.56%
   526.blender_r       2.47%
   527.cam4_r          0.70%
   538.imagick_r       0.00%
   544.nab_r          -0.33%
   549.fotonik3d_r.   -0.42%
   554.roms_r          0.00%
   -------------------------
   = total             1.79%

Signed-off-by: Philipp Tomsich &lt;philipp.tomsich@vrull.eu&gt;
Co-Authored-By: Di Zhao &lt;di.zhao@amperecomputing.com&gt;

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
	Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
	* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
	Check for the above tuning option when processing loads.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.

This commit:
 * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
 * allows -moverride=tune=... to override this

These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
   503.bwaves_r.      -0.88%
   507.cactuBSSN_r     0.35%
   508.namd_r          3.09%
   510.parest_r       -2.99%
   511.povray_r        5.54%
   519.lbm_r          15.83%
   521.wrf_r           0.56%
   526.blender_r       2.47%
   527.cam4_r          0.70%
   538.imagick_r       0.00%
   544.nab_r          -0.33%
   549.fotonik3d_r.   -0.42%
   554.roms_r          0.00%
   -------------------------
   = total             1.79%

Signed-off-by: Philipp Tomsich &lt;philipp.tomsich@vrull.eu&gt;
Co-Authored-By: Di Zhao &lt;di.zhao@amperecomputing.com&gt;

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
	Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
	* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
	Check for the above tuning option when processing loads.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
</pre>
</div>
</content>
</entry>
<entry>
<title>testsuite: Fix up vect-simd-clone-1[678]f.c tests some more</title>
<updated>2023-04-17T09:45:53+00:00</updated>
<author>
<name>Jakub Jelinek</name>
<email>jakub@redhat.com</email>
</author>
<published>2023-04-17T09:45:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=7ec03c41de320fb747fa2a90f5d3b6db3aa4dde1'/>
<id>7ec03c41de320fb747fa2a90f5d3b6db3aa4dde1</id>
<content type='text'>
With
make check-gcc check-g++ -j32 -k RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mavx,-m32/-mavx512f,-m32/-march=cascadelake,-m64,-m64/-mavx,-m64/-mavx512f,-m64/-march=cascadelake\}
+vect.exp=vect-simd-clone*'
the vect-simd-clone-1[678]f.c tests fail with -m32/-mavx512f and -m32/-march=cascadelake,
in that case there are zero matches rather than the 4 expected for ia32.
-m64/-mavx512f and -m64/-march=cascadelake works fine though (2 expected
matches).

So, the following patch just adds -mno-avx512f for x86 non-lp64.

2023-04-17  Jakub Jelinek  &lt;jakub@redhat.com&gt;

	* gcc.dg/vect/vect-simd-clone-16f.c: Add -mno-avx512f for non-lp64 x86.
	* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With
make check-gcc check-g++ -j32 -k RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mavx,-m32/-mavx512f,-m32/-march=cascadelake,-m64,-m64/-mavx,-m64/-mavx512f,-m64/-march=cascadelake\}
+vect.exp=vect-simd-clone*'
the vect-simd-clone-1[678]f.c tests fail with -m32/-mavx512f and -m32/-march=cascadelake,
in that case there are zero matches rather than the 4 expected for ia32.
-m64/-mavx512f and -m64/-march=cascadelake works fine though (2 expected
matches).

So, the following patch just adds -mno-avx512f for x86 non-lp64.

2023-04-17  Jakub Jelinek  &lt;jakub@redhat.com&gt;

	* gcc.dg/vect/vect-simd-clone-16f.c: Add -mno-avx512f for non-lp64 x86.
	* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
</pre>
</div>
</content>
</entry>
<entry>
<title>tree-optimization/109524 - ICE with VRP edge removal</title>
<updated>2023-04-17T09:02:29+00:00</updated>
<author>
<name>Richard Biener</name>
<email>rguenther@suse.de</email>
</author>
<published>2023-04-17T07:22:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=f66ae49bba7d3b8c999498a0e166c0a2f99ec61a'/>
<id>f66ae49bba7d3b8c999498a0e166c0a2f99ec61a</id>
<content type='text'>
VRP queues edges to process late for updating global ranges for
__builtin_unreachable.  But this interferes with edge removal
from substitute_and_fold.  The following deals with this by
looking up the edge with source/dest block indices which do not
become stale.

	PR tree-optimization/109524
	* tree-vrp.cc (remove_unreachable::m_list): Change to a
	vector of pairs of block indices.
	(remove_unreachable::maybe_register_block): Adjust.
	(remove_unreachable::remove_and_update_globals): Likewise.
	Deal with removed blocks.

	* g++.dg/pr109524.C: New testcase.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
VRP queues edges to process late for updating global ranges for
__builtin_unreachable.  But this interferes with edge removal
from substitute_and_fold.  The following deals with this by
looking up the edge with source/dest block indices which do not
become stale.

	PR tree-optimization/109524
	* tree-vrp.cc (remove_unreachable::m_list): Change to a
	vector of pairs of block indices.
	(remove_unreachable::maybe_register_block): Adjust.
	(remove_unreachable::remove_and_update_globals): Likewise.
	Deal with removed blocks.

	* g++.dg/pr109524.C: New testcase.
</pre>
</div>
</content>
</entry>
<entry>
<title>testsuite: update builtins-5-p9-runnable.c for BE</title>
<updated>2023-04-17T02:39:26+00:00</updated>
<author>
<name>Jiufu Guo</name>
<email>guojiufu@linux.ibm.com</email>
</author>
<published>2023-04-14T02:50:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=a1f25e04b8d10bbe5dcaf98bb7e17cdaec9f169d'/>
<id>a1f25e04b8d10bbe5dcaf98bb7e17cdaec9f169d</id>
<content type='text'>
Hi,

As PR108809 mentioned, vec_xl_len_r and vec_xst_len_r are tested
in gcc.target/powerpc/builtins-5-p9-runnable.c.
The vector operand of these two bifs are different from the view
of v16_int8 between BE and LE, even it is same from the view of
128bits(uint128/V1TI).

The test case gcc.target/powerpc/builtins-5-p9-runnable.c was
written for LE environment, this patch updates it for BE.

Tested on ppc64 BE and LE.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/testsuite/ChangeLog:

	PR testsuite/108809
	* gcc.target/powerpc/builtins-5-p9-runnable.c: Update for BE.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hi,

As PR108809 mentioned, vec_xl_len_r and vec_xst_len_r are tested
in gcc.target/powerpc/builtins-5-p9-runnable.c.
The vector operand of these two bifs are different from the view
of v16_int8 between BE and LE, even it is same from the view of
128bits(uint128/V1TI).

The test case gcc.target/powerpc/builtins-5-p9-runnable.c was
written for LE environment, this patch updates it for BE.

Tested on ppc64 BE and LE.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/testsuite/ChangeLog:

	PR testsuite/108809
	* gcc.target/powerpc/builtins-5-p9-runnable.c: Update for BE.
</pre>
</div>
</content>
</entry>
<entry>
<title>RISC-V: Fix testsuite fail on RV32</title>
<updated>2023-04-17T01:51:36+00:00</updated>
<author>
<name>Kito Cheng</name>
<email>kito.cheng@sifive.com</email>
</author>
<published>2023-04-14T07:34:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=2e6b57196dd0e1f4b308abb958f5f905f0ba6aba'/>
<id>2e6b57196dd0e1f4b308abb958f5f905f0ba6aba</id>
<content type='text'>
gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/scalar_move-2.c: Adjust include way
	for riscv_vector.h
	* gcc.target/riscv/rvv/base/spill-sp-adjust.c: Add missing
	-mabi.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/scalar_move-2.c: Adjust include way
	for riscv_vector.h
	* gcc.target/riscv/rvv/base/spill-sp-adjust.c: Add missing
	-mabi.
</pre>
</div>
</content>
</entry>
<entry>
<title>RISC-V: Add test cases for the RVV mask insn shortcut.</title>
<updated>2023-04-17T01:51:35+00:00</updated>
<author>
<name>Pan Li</name>
<email>pan2.li@intel.com</email>
</author>
<published>2023-04-14T03:25:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=0c4d366ef757da28800f786fb5ea02b6e4918719'/>
<id>0c4d366ef757da28800f786fb5ea02b6e4918719</id>
<content type='text'>
There are sorts of shortcut codegen for the RVV mask insn. For
example.

vmxor vd, va, va =&gt; vmclr vd.

We would like to add more optimization like this but first of all
we must add the tests for the existing shortcut optimization, to
ensure we don't break existing optimization from underlying shortcut
optimization.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.

Signed-off-by: Pan Li &lt;pan2.li@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are sorts of shortcut codegen for the RVV mask insn. For
example.

vmxor vd, va, va =&gt; vmclr vd.

We would like to add more optimization like this but first of all
we must add the tests for the existing shortcut optimization, to
ensure we don't break existing optimization from underlying shortcut
optimization.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.

Signed-off-by: Pan Li &lt;pan2.li@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Daily bump.</title>
<updated>2023-04-17T00:17:00+00:00</updated>
<author>
<name>GCC Administrator</name>
<email>gccadmin@gcc.gnu.org</email>
</author>
<published>2023-04-17T00:17:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=a167416a239a4afcc7e89d2ccdea3ffa318defac'/>
<id>a167416a239a4afcc7e89d2ccdea3ffa318defac</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[committed] [PR target/109508] Adjust conditional move expansion for SFB</title>
<updated>2023-04-16T15:56:21+00:00</updated>
<author>
<name>Jeff Law</name>
<email>jlaw@ventanamicro</email>
</author>
<published>2023-04-16T15:55:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/gcc.git/commit/?id=a647198fcf7463a42c8e035a429200e7998735dc'/>
<id>a647198fcf7463a42c8e035a429200e7998735dc</id>
<content type='text'>
Recently the conditional move expander's predicates were loosened for the
benefit of the THEAD processors.  In particular one operand that was
previously "register_operand" is now "reg_or_0_operand".  That's fine for
THEAD, but breaks for SFB which requires a register for that operand.

This results in an ICE when compiling the testcase an SFB target such as
the sifive s76.

This change adjusts the expansion code slightly to copy the value into
a register for SFB.

Bootstrapped and regression tested (c,c++,fortran only) with a toolchain
configured to enable SFB by default.

	PR target/109508
gcc/

	* config/riscv/riscv.cc (riscv_expand_conditional_move): For
	TARGET_SFB_ALU, force the true arm into a register.

gcc/testsuite
	* gcc.target/riscv/pr109508.c: New test.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recently the conditional move expander's predicates were loosened for the
benefit of the THEAD processors.  In particular one operand that was
previously "register_operand" is now "reg_or_0_operand".  That's fine for
THEAD, but breaks for SFB which requires a register for that operand.

This results in an ICE when compiling the testcase an SFB target such as
the sifive s76.

This change adjusts the expansion code slightly to copy the value into
a register for SFB.

Bootstrapped and regression tested (c,c++,fortran only) with a toolchain
configured to enable SFB by default.

	PR target/109508
gcc/

	* config/riscv/riscv.cc (riscv_expand_conditional_move): For
	TARGET_SFB_ALU, force the true arm into a register.

gcc/testsuite
	* gcc.target/riscv/pr109508.c: New test.
</pre>
</div>
</content>
</entry>
</feed>
