math: Use tgamma from CORE-MATH

The current implementation shows the following accuracy on one range
([-20,20]), with 10e9 uniformly distributed random inputs (first column
is the error in ULP, with '0' being correctly rounded; second is the
number of samples with that error):

* Range [-20,20]

 * FE_TONEAREST
     0:      4504877808  45.05%
     1:      4402224940  44.02%
     2:       947652295   9.48%
     3:       131076831   1.31%
     4:        13222216   0.13%
     5:          910045   0.01%
     6:           35253   0.00%
     7:             606   0.00%
     8:               6   0.00%

 * FE_UPWARD
     0:      3477307921  34.77%
     1:      4838637866  48.39%
     2:      1413942684  14.14%
     3:       240762564   2.41%
     4:        27113094   0.27%
     5:         2130934   0.02%
     6:          102599   0.00%
     7:            2324   0.00%
     8:              14   0.00%

 * FE_DOWNWARD
     0:      3923545410  39.24%
     1:      4745067290  47.45%
     2:      1137899814  11.38%
     3:       171596912   1.72%
     4:        20013805   0.20%
     5:         1773899   0.02%
     6:           99911   0.00%
     7:            2928   0.00%
     8:              31   0.00%

 * FE_TOWARDZERO
     0:      3697160741  36.97%
     1:      4731951491  47.32%
     2:      1303092738  13.03%
     3:       231969191   2.32%
     4:        32344517   0.32%
     5:         3283092   0.03%
     6:          193010   0.00%
     7:            5175   0.00%
     8:              45   0.00%

The CORE-MATH implementation is correctly rounded for any rounding mode.
The code was adapted to glibc style and to use the definitions of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

reciprocal-throughput        master        patched   improvement
x86_64                     237.7960       175.4090        26.24%
x86_64v2                   232.9320       163.4460        29.83%
x86_64v3                   193.0680        89.7721        53.50%
aarch64                    113.6340        56.7350        50.07%
power10                     92.0617        26.6137        71.09%

Latency                      master        patched   improvement
x86_64                     266.7190       208.0130        22.01%
x86_64v2                   263.6070       200.0280        24.12%
x86_64v3                   214.0260       146.5180        31.54%
aarch64                    114.4760        58.5235        48.88%
power10                     84.3718        35.7473        57.63%

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
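
For readers unfamiliar with the ULP columns in the accuracy tables: one common way to bucket errors like this is to count how many representable doubles separate the computed result from a correctly rounded reference. The sketch below is an illustration only and is not part of this patch or the glibc test suite; the names ordered_bits and ulp_distance are hypothetical, and it assumes IEEE 754 binary64 doubles and finite inputs.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a finite double onto a signed integer line that is monotonic in
   the value, so that adjacent doubles differ by exactly 1 and both
   zeros map to 0.  Assumes IEEE 754 binary64.  */
int64_t
ordered_bits (double x)
{
  int64_t i;
  memcpy (&i, &x, sizeof i);
  /* Negative doubles have the sign bit set; mirror them below zero.  */
  return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between A and B (0 means equal).
   With A a correctly rounded reference and B a computed result, this
   is the kind of integer error count a histogram can be built from.  */
uint64_t
ulp_distance (double a, double b)
{
  assert (isfinite (a) && isfinite (b));
  int64_t ia = ordered_bits (a);
  int64_t ib = ordered_bits (b);
  /* Subtract as unsigned to avoid signed overflow for far-apart values.  */
  return ia > ib ? (uint64_t) ia - (uint64_t) ib
                 : (uint64_t) ib - (uint64_t) ia;
}
```

A harness along these lines would call ulp_distance (reference, tgamma (x)) for each sample under each rounding mode, where the reference has to come from a higher-precision source rounded in the same mode (for example an arbitrary-precision library; that choice is an assumption here, the commit message does not say which tool produced the numbers).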
author: Adhemerval Zanella <adhemerval.zanella@linaro.org> 2025-10-10 15:15:28 -0300
committer: Adhemerval Zanella <adhemerval.zanella@linaro.org> 2025-10-27 09:34:04 -0300
commit: 1cae0550e8e0024b348d6962827d47f2db5df475 (patch)
tree: 01a1ef322acd3cbd25cc115bdba583edeca5163c /sysdeps/i386
parent: d67d2f468872c3fe9d3ba2b60eab0e421f906ff2 (diff)
1 files changed, 1 insertions, 6 deletions
diff --git a/sysdeps/i386/Makefile b/sysdeps/i386/Makefile
index ec1dfd98ee..9497c47997 100644
--- a/sysdeps/i386/Makefile
+++ b/sysdeps/i386/Makefile
@@ -6,15 +6,10 @@ asm-CPPFLAGS += -DGAS_SYNTAX
 long-double-fcts = yes
 
 ifeq ($(subdir),math)
-# These functions change the rounding mode internally and need to
-# update both the SSE2 rounding mode and the 387 rounding mode.  See
-# the handling of MATH_SET_BOTH_ROUNDING_MODES in
-# sysdeps/i386/fpu/fenv_private.h.
-CFLAGS-e_gamma_r.c += -DMATH_SET_BOTH_ROUNDING_MODES
-
 # The CORE-MATH implementation assumes FLT_EVAL_METHOD == 0 to provide
 # correctly rounded results.
 CFLAGS-e_lgamma_r.c += -fexcess-precision=standard
+CFLAGS-e_gamma_r.c += -fexcess-precision=standard
 endif
 
 ifeq ($(subdir),gmon)
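
As background for the FLT_EVAL_METHOD comment in the hunk above: correctly rounded routines commonly depend on each double operation being rounded to double, for instance in error-free transformations such as Fast2Sum. The sketch below is a generic illustration under that assumption and is not taken from the CORE-MATH tgamma source; double_pair and fast_two_sum are hypothetical names. On i386 with x87 arithmetic, intermediate results may carry extended precision, and GCC's -fexcess-precision=standard makes assignments and casts discard that excess as ISO C requires.

```c
#include <assert.h>
#include <float.h>

typedef struct
{
  double hi;
  double lo;
} double_pair;

/* Nonzero when expressions are evaluated in their declared type, which
   is the condition the Makefile comment refers to.  */
int
evaluates_in_declared_type (void)
{
  return FLT_EVAL_METHOD == 0;
}

/* Fast2Sum: for |a| >= |b| and round-to-nearest, hi is the rounded sum
   and lo is the exact rounding error, so hi + lo == a + b exactly.
   This only holds if every operation below is rounded to double;
   excess precision in the intermediates breaks the guarantee.  */
double_pair
fast_two_sum (double a, double b)
{
  assert ((a < 0 ? -a : a) >= (b < 0 ? -b : b));
  double_pair r;
  r.hi = a + b;
  double t = r.hi - a;   /* The part of b that made it into hi.  */
  r.lo = b - t;          /* The part of b that was rounded away.  */
  return r;
}
```

For example, fast_two_sum (1.0, 0x1p-60) is expected to yield hi == 1.0 and lo == 0x1p-60 when FLT_EVAL_METHOD is 0, since the small term is lost from the rounded sum and recovered in lo.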