author     Jiamei Xie <xiejiamei@hygon.cn>  2025-10-14 20:14:11 +0800
committer  Sam James <sam@gentoo.org>  2025-11-04 12:23:32 +0000
commit     945cb7d935485a334a22225f5d2bc4b2f3d19286
tree       551d6c3889a784ee3a0bdd6b84271e75c200b7db
parent     99338e3841dd8dcc36e35c274e1063c33b5e5e87

x86: fix wmemset ifunc stray '!' (bug 33542) [release/2.36/master]

The ifunc selector for wmemset had a stray '!' in the
X86_ISA_CPU_FEATURES_ARCH_P (...) check:

  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
                                      AVX_Fast_Unaligned_Load, !))

This negated the predicate and caused the AVX2/AVX512 paths to be
skipped, so the dispatcher fell back to the SSE2 implementation even
on CPUs where AVX2/AVX512 are available. The regression causes a
noticeable throughput loss for wmemset.

Remove the stray '!' so that the AVX_Fast_Unaligned_Load capability is
tested as intended and the correct AVX2/EVEX variants are selected.

Impact:
- On AVX2/AVX512-capable x86_64, wmemset no longer incorrectly falls
  back to SSE2; perf now shows the __wmemset_evex/avx2 variants.

Testing:
- benchtests/bench-wmemset shows improved bandwidth across sizes.
- perf confirms the selected symbol is no longer the SSE2 variant.

Signed-off-by: xiejiamei <xiejiamei@hygon.com>
Signed-off-by: Li jing <lijing@hygon.cn>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4d86b6cdd8132e0410347e07262239750f86dfb4)
-rw-r--r-- sysdeps/x86_64/multiarch/ifunc-wmemset.h 2
1 file changed, 1 insertion, 1 deletion
diff --git a/sysdeps/x86_64/multiarch/ifunc-wmemset.h b/sysdeps/x86_64/multiarch/ifunc-wmemset.h
index 3810c719c6..9e58bc00b5 100644
--- a/sysdeps/x86_64/multiarch/ifunc-wmemset.h
+++ b/sysdeps/x86_64/multiarch/ifunc-wmemset.h
@@ -35,7 +35,7 @@ IFUNC_SELECTOR (void)
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
-                                      AVX_Fast_Unaligned_Load, !))
+                                      AVX_Fast_Unaligned_Load,))
     {
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL))
         {