gcc.git, branch basepoints/gcc-14

Bump BASE-VER.

2023-04-17T11:52:56+00:00

2023-04-17  Jakub Jelinek  

	* BASE-VER: Set to 14.0.0.

ipa: Fix double reference-count decrements for the same edge (PR 107769, PR 109318)

2023-04-17T11:04:49+00:00

It turns out that since addition of the code that can identify globals
which are only read from, the code that keeps track of the references
can decrement their count for the same calls, once during IPA-CP and
then again during inlining.  Fixed by adding a special flag to the
pass-through variant and simply wiping out the reference to the
refdesc structure from the constant ones.

Moreover, during debugging of the issue I have discovered that the
code removing references could remove a reference associated with the
same statement but of a wrong type.  In all cases it wanted to remove
an IPA_REF_ADDR reference so removing a lesser one instead should do
no harm in practice, but we should try to be consistent and so this
patch extends symtab_node::find_reference so that it searches for a
reference of a given type only.

gcc/ChangeLog:

2023-04-14  Martin Jambor  

	PR ipa/107769
	PR ipa/109318
	* cgraph.h (symtab_node::find_reference): Add parameter use_type.
	* ipa-prop.h (ipa_pass_through_data): New flag refdesc_decremented.
	(ipa_zap_jf_refdesc): New function.
	(ipa_get_jf_pass_through_refdesc_decremented): Likewise.
	(ipa_set_jf_pass_through_refdesc_decremented): Likewise.
	* ipa-cp.cc (ipcp_discover_new_direct_edges): Provide a value for
	the new parameter of find_reference.
	(adjust_references_in_caller): Likewise. Make sure the constant jump
	function is not used to decrement a refdec counter again.  Only
	decrement refdesc counters when the pass_through jump function allows
	it.  Added a detailed dump when decrementing refdesc counters.
	* ipa-prop.cc (ipa_print_node_jump_functions_for_edge): Dump new flag.
	(ipa_set_jf_simple_pass_through): Initialize the new flag.
	(ipa_set_jf_unary_pass_through): Likewise.
	(ipa_set_jf_arith_pass_through): Likewise.
	(remove_described_reference): Provide a value for the new parameter of
	find_reference.
	(update_jump_functions_after_inlining): Zap refdesc of new jfunc if
	the previous pass_through had a flag mandating that we do so.
	(propagate_controlled_uses): Likewise.  Only decrement refdesc
	counters when the pass_through jump function allows it.
	(ipa_edge_args_sum_t::duplicate): Provide a value for the new
	parameter of find_reference.
	(ipa_write_jump_function): Assert the new flag does not have to be
	streamed.
	* symtab.cc (symtab_node::find_reference): Add parameter use_type, use
	it in searching.

gcc/testsuite/ChangeLog:

2023-04-06  Martin Jambor  

	PR ipa/107769
	PR ipa/109318
	* gcc.dg/ipa/pr109318.c: New test.
	* gcc.dg/lto/pr107769_0.c: Likewise.

aarch64: disable LDP via tuning structure for -mcpu=ampere1

2023-04-17T10:15:04+00:00

AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.

This commit:
 * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
 * allows -moverride=tune=... to override this

These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
   503.bwaves_r.      -0.88%
   507.cactuBSSN_r     0.35%
   508.namd_r          3.09%
   510.parest_r       -2.99%
   511.povray_r        5.54%
   519.lbm_r          15.83%
   521.wrf_r           0.56%
   526.blender_r       2.47%
   527.cam4_r          0.70%
   538.imagick_r       0.00%
   544.nab_r          -0.33%
   549.fotonik3d_r.   -0.42%
   554.roms_r          0.00%
   -------------------------
   = total             1.79%

Signed-off-by: Philipp Tomsich 
Co-Authored-By: Di Zhao 

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
	Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
	* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
	Check for the above tuning option when processing loads.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.

testsuite: Fix up vect-simd-clone-1[678]f.c tests some more

2023-04-17T09:45:53+00:00

With
make check-gcc check-g++ -j32 -k RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mavx,-m32/-mavx512f,-m32/-march=cascadelake,-m64,-m64/-mavx,-m64/-mavx512f,-m64/-march=cascadelake\}
+vect.exp=vect-simd-clone*'
the vect-simd-clone-1[678]f.c tests fail with -m32/-mavx512f and -m32/-march=cascadelake,
in that case there are zero matches rather than the 4 expected for ia32.
-m64/-mavx512f and -m64/-march=cascadelake works fine though (2 expected
matches).

So, the following patch just adds -mno-avx512f for x86 non-lp64.

2023-04-17  Jakub Jelinek  

	* gcc.dg/vect/vect-simd-clone-16f.c: Add -mno-avx512f for non-lp64 x86.
	* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.

tree-optimization/109524 - ICE with VRP edge removal

2023-04-17T09:02:29+00:00

VRP queues edges to process late for updating global ranges for
__builtin_unreachable.  But this interferes with edge removal
from substitute_and_fold.  The following deals with this by
looking up the edge with source/dest block indices which do not
become stale.

	PR tree-optimization/109524
	* tree-vrp.cc (remove_unreachable::m_list): Change to a
	vector of pairs of block indices.
	(remove_unreachable::maybe_register_block): Adjust.
	(remove_unreachable::remove_and_update_globals): Likewise.
	Deal with removed blocks.

	* g++.dg/pr109524.C: New testcase.

testsuite: update builtins-5-p9-runnable.c for BE

2023-04-17T02:39:26+00:00

Hi,

As PR108809 mentioned, vec_xl_len_r and vec_xst_len_r are tested
in gcc.target/powerpc/builtins-5-p9-runnable.c.
The vector operand of these two bifs are different from the view
of v16_int8 between BE and LE, even it is same from the view of
128bits(uint128/V1TI).

The test case gcc.target/powerpc/builtins-5-p9-runnable.c was
written for LE environment, this patch updates it for BE.

Tested on ppc64 BE and LE.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/testsuite/ChangeLog:

	PR testsuite/108809
	* gcc.target/powerpc/builtins-5-p9-runnable.c: Update for BE.

RISC-V: Fix testsuite fail on RV32

2023-04-17T01:51:36+00:00

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/scalar_move-2.c: Adjust include way
	for riscv_vector.h
	* gcc.target/riscv/rvv/base/spill-sp-adjust.c: Add missing
	-mabi.

RISC-V: Add test cases for the RVV mask insn shortcut.

2023-04-17T01:51:35+00:00

There are sorts of shortcut codegen for the RVV mask insn. For
example.

vmxor vd, va, va => vmclr vd.

We would like to add more optimization like this but first of all
we must add the tests for the existing shortcut optimization, to
ensure we don't break existing optimization from underlying shortcut
optimization.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.

Signed-off-by: Pan Li

Daily bump.

2023-04-17T00:17:00+00:00

[committed] [PR target/109508] Adjust conditional move expansion for SFB

2023-04-16T15:56:21+00:00

Recently the conditional move expander's predicates were loosened for the
benefit of the THEAD processors.  In particular one operand that was
previously "register_operand" is now "reg_or_0_operand".  That's fine for
THEAD, but breaks for SFB which requires a register for that operand.

This results in an ICE when compiling the testcase an SFB target such as
the sifive s76.

This change adjusts the expansion code slightly to copy the value into
a register for SFB.

Bootstrapped and regression tested (c,c++,fortran only) with a toolchain
configured to enable SFB by default.

	PR target/109508
gcc/

	* config/riscv/riscv.cc (riscv_expand_conditional_move): For
	TARGET_SFB_ALU, force the true arm into a register.

gcc/testsuite
	* gcc.target/riscv/pr109508.c: New test.