glibc.git/sysdeps/x86_64/fpu/multiarch/s_modff-avx.c, branch master

math: Don't redirect inlined builtin math functions

2025-11-17T14:17:07+00:00

When we want to inline builtin math functions, like truncf, for

  extern float truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
  extern float __truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));

  float (truncf) (float) asm ("__truncf");

the compiler may redirect truncf calls to __truncf instead of inlining
them (clang does this, for instance).  USE_TRUNCF_BUILTIN is 1 to
indicate that truncf should be inlined.  In this case, we don't want the
truncf redirection:

  1. For each math function which may be inlined, we define

   #if USE_TRUNCF_BUILTIN
   # define NO_truncf_BUILTIN inline_truncf
   #else
   # define NO_truncf_BUILTIN truncf
   #endif

     in the math-use-builtin header.

  2. Include the math-use-builtin header in include/math.h.

  3. Change MATH_REDIRECT to

   #define MATH_REDIRECT(FUNC, PREFIX, ARGS)		\
    float (NO_ ## FUNC ## f ## _BUILTIN) (ARGS (float))	\
      asm (PREFIX #FUNC "f");

With this change, if USE_TRUNCF_BUILTIN is 0, we get

  float (truncf) (float) asm ("__truncf");

and truncf will be redirected to __truncf.

If USE_TRUNCF_BUILTIN is 1, we get

  float (inline_truncf) (float) asm ("__truncf");

In both cases either truncf will be inlined or the internal alias
(__truncf) will be called.
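
The name selection can be checked in isolation.  The following is a
minimal sketch, not the glibc code: DEMO_STR, DEMO_XSTR,
DEMO_REDIRECT_NAME and the demo_* functions are hypothetical helpers, and
USE_TRUNCF_BUILTIN is simply assumed to be 1.  It stringifies the
identifier that the token pasting in MATH_REDIRECT would declare, rather
than emitting the asm redirection itself.

```c
#include <string.h>

/* Assumption for this sketch: the compiler can inline truncf.  */
#ifndef USE_TRUNCF_BUILTIN
# define USE_TRUNCF_BUILTIN 1
#endif

#if USE_TRUNCF_BUILTIN
# define NO_truncf_BUILTIN inline_truncf
#else
# define NO_truncf_BUILTIN truncf
#endif

/* Two-level stringification, so that the pasted NO_truncf_BUILTIN token
   is macro-expanded before it is turned into a string.  */
#define DEMO_STR(x) #x
#define DEMO_XSTR(x) DEMO_STR (x)

/* Hypothetical helper: same token pasting as MATH_REDIRECT, but yielding
   the declared name as a string instead of a declaration.  */
#define DEMO_REDIRECT_NAME(FUNC) DEMO_XSTR (NO_ ## FUNC ## f ## _BUILTIN)

/* Name that would receive the asm ("__truncf") redirection.  */
const char *
demo_redirect_name (void)
{
  return DEMO_REDIRECT_NAME (trunc);
}

/* Nonzero if truncf itself is left without a redirection, so the
   compiler stays free to inline it.  */
int
demo_truncf_left_alone (void)
{
  return strcmp (demo_redirect_name (), "truncf") != 0;
}
```

Building the sketch with -DUSE_TRUNCF_BUILTIN=0 makes the same helper
yield "truncf", which corresponds to the redirected case.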

This is not required for every math-use-builtin symbol, only for the
ones declared in math.h.  It also allows removing all explicit
math-use-builtin inclusions, since the header is now implicitly included
by math.h.

For MIPS, some math-use-builtin headers include sysdep.h, which in turn
pulls in many extra headers that prevent ldbl-128 code from overriding
alias definitions (math.h would include some stdlib.h definitions).  The
math-use-builtin headers only require __mips_isa_rev, so move its
definition to sgidefs.h.

Signed-off-by: H.J. Lu 
Co-authored-by: Adhemerval Zanella  
Reviewed-by: H.J. Lu

math: Fix x86_64 build for -Os (BZ 33367)

2025-09-11T13:23:33+00:00

The compiler might not inline the trunc function call for
USE_TRUNC_BUILTIN [1].

This patch adds an optimized __trunc/__truncf for x86, used by the modf
ifunc variants, to avoid the trunc libcall.
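
To see why a trunc libcall matters here, recall that modf can be written
almost entirely in terms of trunc.  The following is a minimal sketch,
not the glibc implementation: sketch_truncf is a hypothetical portable
stand-in for the single rounding instruction an optimized variant would
use, and copy_sign_bit and sketch_modff are likewise illustrative names.

```c
#include <stdint.h>
#include <string.h>

/* Return MAG with its sign bit replaced by the sign bit of SGN.  */
static float
copy_sign_bit (float mag, float sgn)
{
  uint32_t m, s;
  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sgn, sizeof s);
  m = (m & 0x7fffffffu) | (s & 0x80000000u);
  memcpy (&mag, &m, sizeof m);
  return mag;
}

/* Hypothetical stand-in for a hardware trunc.  */
static float
sketch_truncf (float x)
{
  /* Floats with magnitude >= 2^23, infinities and NaNs have no
     fractional bits to drop.  */
  if (!(x > -8388608.0f && x < 8388608.0f))
    return x;
  /* The C float-to-integer conversion truncates toward zero.  */
  float t = (float) (int32_t) x;
  /* Keep the sign, so that e.g. -0.25 truncates to -0.0.  */
  return copy_sign_bit (t, x);
}

/* modff expressed through trunc: integral part via *IPTR, fractional
   part as the return value, both carrying the sign of X.  */
float
sketch_modff (float x, float *iptr)
{
  float t = sketch_truncf (x);
  *iptr = t;
  /* For integral or infinite X the fraction is a signed zero; Inf - Inf
     would otherwise produce NaN.  */
  float frac = (x == t) ? 0.0f : x - t;
  return copy_sign_bit (frac, x);
}
```

If the trunc step is a real function call instead of an inlined
instruction, every modf call pays for it, which is the cost the patch
avoids.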

Checked on x86_64, x86_64-v2, x86_64-v3, and x86_64-v4 with both -O2 and
-Os.  Performed a full make check on x86_64 with both optimization
levels.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121861
Reviewed-by: H.J. Lu

x86-64: Properly compile ISA optimized modf and modff

2025-07-18T17:22:19+00:00

There are 3 variants of modf and modff: SSE2, SSE4.1 and AVX.  s_modf.c
and s_modff.c include the generic implementation compiled with the minimum
x86 ISA level.  The IFUNC selector is used only if the minimum ISA level
is less than AVX.  The SSE4.1 variant is included only if the ISA level is
less than SSE4.1.  The AVX variant is included only if the ISA level is
less than AVX.
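
The inclusion rules above can be written down as a small decision
function.  This is a sketch under stated assumptions, not the glibc
build logic: the numbering (2 for the level that provides SSE4.1, 3 for
the level that provides AVX) and all demo_*/DEMO_* names are
hypothetical.

```c
/* Assumed numbering for this sketch: 1 = baseline x86-64 (SSE2),
   2 = x86-64-v2 (provides SSE4.1), 3 = x86-64-v3 (provides AVX).  */
enum
{
  DEMO_LEVEL_SSE4_1 = 2,
  DEMO_LEVEL_AVX = 3
};

/* Which pieces get built for a given minimum ISA level.  */
struct demo_modf_build
{
  int use_ifunc;    /* IFUNC selector present.  */
  int build_sse41;  /* Separate SSE4.1 variant compiled.  */
  int build_avx;    /* Separate AVX variant compiled.  */
};

struct demo_modf_build
demo_modf_plan (int min_isa_level)
{
  struct demo_modf_build plan;
  /* The selector is only needed while a better variant may exist.  */
  plan.use_ifunc = min_isa_level < DEMO_LEVEL_AVX;
  /* At or above the SSE4.1 level the generic build already covers it.  */
  plan.build_sse41 = min_isa_level < DEMO_LEVEL_SSE4_1;
  /* At or above the AVX level the generic build already covers it.  */
  plan.build_avx = min_isa_level < DEMO_LEVEL_AVX;
  return plan;
}
```

Under these assumptions, a v2 build keeps the selector and the AVX
variant but drops the separate SSE4.1 variant, and a v3 or v4 build
contains only the generic implementation.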

The AVX variant should be compiled with -mavx, not -msse2avx -DSSE2AVX,
which are used to encode SSE assembly sources with VEX encoding.

The routines that are shared between libc and libm should use different
rules so that they do not share the same MODULE_NAME.  This avoids
potential issues like BZ #33165, where __stack_chk_fail is not routed to
the internal symbol.

Tested with -march=x86-64, -march=x86-64-v2, -march=x86-64-v3 and
-march=x86-64-v4.

This fixes BZ #33165 and BZ #33173.

Co-authored-by: Adhemerval Zanella 
Signed-off-by: H.J. Lu 
Reviewed-by: Adhemerval Zanella

x86_64: Optimize modf/modff for x86_64-v2

2025-07-11T16:01:31+00:00

SSE4.1 provides a direct instruction for trunc, which improves
modf/modff performance and reduces text size.  On Ryzen 9 (zen3) with
gcc 14.2.1:

x86_64-v2
reciprocal-throughput        master        patch       difference
workload-0_1                 7.9610       7.7914            2.13%
workload-1_maxint            9.4323       7.8021           17.28%
workload-maxint_maxfloat     8.7379       7.8049           10.68%
workload-integral            7.9492       7.7991            1.89%

latency                      master        patch       difference
workload-0_1                 7.9511      10.8910          -36.97%
workload-1_maxint           15.8278      10.9048           31.10%
workload-maxint_maxfloat    11.3495      10.9139            3.84%
workload-integral           11.5938      10.9071            5.92%

x86_64-v3
reciprocal-throughput        master        patch       difference
workload-0_1                 8.7522       7.9781            8.84%
workload-1_maxint            9.6690       7.9872           17.39%
workload-maxint_maxfloat     8.7634       7.9857            8.87%
workload-integral            8.7397       7.9893            8.59%

latency                      master        patch       difference
workload-0_1                 8.7447       9.5589           -9.31%
workload-1_maxint           13.7480       9.5690           30.40%
workload-maxint_maxfloat    10.0092       9.5680            4.41%
workload-integral            9.7518       9.5743            1.82%
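
The difference column is consistent with a relative improvement over
master, where a negative value marks a regression.  The formula below is
inferred from the figures rather than stated in the commit, and
demo_improvement_percent is a hypothetical helper name.

```c
/* Relative improvement of PATCH over MASTER, in percent.  Positive
   means the patched build is faster (lower time), negative means it is
   slower.  Inferred from the tables, not taken from the benchmark
   tooling.  */
double
demo_improvement_percent (double master, double patch)
{
  return (master - patch) / master * 100.0;
}
```

For example, the x86_64-v2 latency row for workload-0_1 (7.9511 on
master, 10.8910 with the patch) gives a negative value, matching the
regression reported in the table.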

For x86_64-v1 the optimization is done through a new ifunc selector.
The AVX variant follows the other SSE4_1 optimizations (like trunc), so
that x86_64-v3 does not need the ifunc.

Checked on x86_64-linux-gnu.
Tested-by: Carlos O'Donell 
Reviewed-by: Carlos O'Donell