Age | Commit message | Author
2025-11-21 | ppc64le: Restore optimized strncmp for power10 [release/2.41/master] | Sachin Monga
This patch addresses the actual cause of CVE-2025-5745. The non-volatile vector registers are no longer used for the 32-byte load and comparison operations. Additionally, the assembler workaround previously used for the lxvp instruction is replaced with the actual instruction. Signed-off-by: Sachin Monga <smonga@linux.ibm.com> Co-authored-by: Paul Murphy <paumurph@redhat.com> (cherry picked from commit 2ea943f7d487d6a4166658b32af7c5365889fc34)
2025-11-21 | ppc64le: Restore optimized strcmp for power10 | Sachin Monga
This patch addresses the actual cause of CVE-2025-5702. The non-volatile vector registers are no longer used for the 32-byte load and comparison operations. Additionally, the assembler workaround previously used for the lxvp instruction is replaced with the actual instruction. Signed-off-by: Sachin Monga <smonga@linux.ibm.com> Co-authored-by: Paul Murphy <paumurph@redhat.com> (cherry picked from commit 9a40b1cda519cc4f532acb6d020390829df3d81b)
2025-11-18 | AArch64: Fix and improve SVE pow(f) special cases | Pierre Blanchard
powf: Update the scalar special-case function to make best use of the new interface. pow: Make specialcase NOINLINE to prevent str/ldr leaking into the fast path. Remove the dependency on sv_call2, as the new callback implementation is not a performance gain; replace it with a vectorised specialcase, since the structure of the scalar routine is fairly simple. Throughput gain of about 5-10% on V1 for large values and 25% for subnormal `x`. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit bb6519de1e6fe73d79bc71588ec4e5668907f080)
2025-11-18 | AArch64: fix SVE tanpi(f) [BZ #33642] | Pierre Blanchard
Fix svld1rq calls that used incorrect predicates (BZ #33642). Performance is essentially unchanged (tested on V1). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit e889160273a4c2b68870c9adf341955867d76a7d)
2025-11-18 | AArch64: Fix instability in AdvSIMD sinh | Joe Ramsay
Previously, the presence of special cases in one lane could affect the results in other lanes because of an unconditional scalar fallback. The old WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has been removed from AOR, making this easier to spot and fix. No measured change in performance. This patch applies cleanly as far back as 2.41; however, there are conflicts with 2.40, where sinh was first introduced. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit e45af510bc816e860c8e2e1d4a652b4fe15c4b34)
2025-11-18 | AArch64: Fix instability in AdvSIMD tan | Joe Ramsay
Previously, the presence of special cases in one lane could affect the results in other lanes because of an unconditional scalar fallback. The old WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has been removed from AOR, making this easier to spot and fix. 4% improvement in throughput with GCC 14 on Neoverse V1. This bug is present as far back as 2.39 (where tan was first introduced). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit 6c22823da57aa5218f717f569c04c9573c0448c5)
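The lane instability described in the two entries above can be illustrated with a scalar C model in which a "vector" is just an array of lanes. This is a sketch, not the AdvSIMD code: the two polynomials and the `is_special` threshold are hypothetical stand-ins for a fast vector path and a more accurate scalar fallback. With an unconditional fallback, lane 0's result depends on whether some other lane is special; with a per-lane fallback it does not.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical fast vector path: a short polynomial (approximates tan near 0).  */
static double fast_path (double x) { return x + x * x * x / 3.0; }

/* Hypothetical scalar fallback: a longer polynomial with slightly different results.  */
static double scalar_fallback (double x)
{
  double x3 = x * x * x;
  return x + x3 / 3.0 + 2.0 * x3 * x * x / 15.0;
}

/* Hypothetical special-case test: |x| >= 1 takes the fallback.  */
static bool is_special (double x) { return x >= 1.0 || x <= -1.0; }

/* Unstable pattern: one special lane sends every lane through the fallback.  */
static void vec_unconditional_fallback (const double in[LANES], double out[LANES])
{
  bool any = false;
  for (size_t i = 0; i < LANES; i++)
    any = any || is_special (in[i]);
  for (size_t i = 0; i < LANES; i++)
    out[i] = any ? scalar_fallback (in[i]) : fast_path (in[i]);
}

/* Stable pattern: only the special lanes take the fallback.  */
static void vec_per_lane_fallback (const double in[LANES], double out[LANES])
{
  for (size_t i = 0; i < LANES; i++)
    out[i] = is_special (in[i]) ? scalar_fallback (in[i]) : fast_path (in[i]);
}
```

Feeding `{0.5, 0.1, 0.2, 0.3}` and `{0.5, 5.0, 0.2, 0.3}` to the first function gives different values in lane 0; the second function gives the same lane-0 value for both inputs.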
2025-11-18 | AArch64: Optimise SVE scalar callbacks | Joe Ramsay
Instead of using SVE instructions to marshal special results into the correct lane, just write the entire vector (and the predicate) to memory, then use cheaper scalar operations. Geomean speedup of 16% in special intervals on Neoverse with GCC 14. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit 5b82fb18827e962af9f080fdf3c1a69802783f67)
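The "write it all to memory, then fix up with scalar code" idea can be sketched in plain C, assuming the vector and its predicate have already been stored to arrays (`input` and `special` below). The function names and the `scalar_fn` callback type are hypothetical; the real helper works on SVE registers spilled to the stack.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar routine type used for the special lanes.  */
typedef double (*scalar_fn) (double);

/* `input` and `special` model the vector and predicate written to memory.
   A plain scalar loop then overwrites only the lanes whose predicate is set;
   all other lanes keep the result computed by the vector fast path.  */
static void scalar_fixup (double *result, const double *input,
                          const bool *special, size_t lanes, scalar_fn f)
{
  for (size_t i = 0; i < lanes; i++)
    if (special[i])
      result[i] = f (input[i]);
}

/* Hypothetical stand-in for the scalar special-case routine.  */
static double negate (double x) { return -x; }
```

For example, with `input = {1, 2, 3, 4}`, `result = {10, 20, 30, 40}` and predicate `{0, 1, 0, 1}`, `scalar_fixup (result, input, special, 4, negate)` leaves `{10, -2, 30, -4}`. No per-lane insert/extract instructions are needed, which is the cost the commit avoids.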
2025-11-13 | aarch64: fix includes in SME tests | Yury Khrustalev
Use the correct include for the SIGCHLD macro: <signal.h>. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit a9c426bcca59a9e228c4fbe75e75154217ec4ada) (cherry picked from commit 17c3eab387c3ceb6972e57888a89b1480793f81a)
2025-11-13 | aarch64: Do not link conform tests with -Wl,-z,force-bti (bug 33601) | Florian Weimer
If the toolchain does not configure GCC to generate BTI markers by default, the main program for the conform runtime tests will not have the BTI marker that -Wl,-z,force-bti requires. Without -Wl,-z,force-bti, the link editor will not tell the dynamic linker to enable BTI, and the missing BTI marker is harmless. Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
2025-11-10 | aarch64: fix cfi directives around __libc_arm_za_disable | Yury Khrustalev
An incorrect CFI directive corrupted the call stack information and prevented debuggers from displaying the call stack correctly. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 2f77aec043f61e8533487850b11941a640ae2dea) (cherry picked from commit de1fe81f471496366580ad728b8986a3424b2fd7)
2025-11-10 | aarch64: tests for SME | Yury Khrustalev
This commit adds tests for the following use cases relevant to the handling of the SME state:
- fork() and vfork()
- clone() and clone3()
- signal handler
While most cases are trivial, the case of clone3() is more complicated, since the clone3() symbol is not public in Glibc. To avoid having to check all possible ways clone3() may be called via other public functions (e.g. vfork() or pthread_create()), we put together a test that links directly with clone3.o. Existing functions that contain calls to clone3() may not actually use it, in which case the outcome of such tests would be unexpected. Having a direct call to the clone3() symbol in the test allows us to check precisely what we need to test: that the __arm_za_disable() function is indeed called and has the desired effect. Linking to clone3.o also requires linking to __arm_za_disable.o, which in turn requires the _dl_hwcap2 hidden symbol; the test provides this symbol and initialises it before use. Co-authored-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit ecb0fc2f0f839f36cd2a106283142c9df8ea8214) (cherry picked from commit 71874f167aa5bb1538ff7e394beaacee28ebe65f)
2025-11-10 | aarch64: clear ZA state of SME before clone and clone3 syscalls | Yury Khrustalev
This change adds a call to the __arm_za_disable() function immediately before the SVC instruction inside the clone() and clone3() wrappers. It also adds a macro for the inline clone() used in fork() and adds the same call to the vfork implementation. This sets the ZA state of SME to "off" on return from these functions (for both the child and the parent). The __arm_za_disable() function is described in [1] (8.1.3). Note that the internal Glibc name for this function is __libc_arm_za_disable().
When this change was originally proposed [2,3], it generated a long discussion in which several questions and concerns were raised. Here we address these concerns and explain why this change is useful and, in fact, necessary. In a nutshell, a C library that conforms to the AAPCS64 spec [1] (pertinent to this change are mainly chapters 6.2 and 6.6) should call the __arm_za_disable() function in its clone() and clone3() wrappers. The following explains in detail why this is the case.
When we consider using the __arm_za_disable() function inside the clone() and clone3() libc wrappers, we are talking about the C library subroutines clone() and clone3() rather than the syscalls with similar names. In the current version of Glibc, clone() is public and clone3() is private, but the fact that clone3() is private is not pertinent to this discussion.
We begin by stating that this change is NOT a bug fix for something in the kernel. The requirement to call __arm_za_disable() does NOT come from the kernel. It is also NOT needed to satisfy a contract between the kernel and userspace. This is why it is not up to the kernel documentation to describe this requirement. Instead, this requirement is needed to satisfy a purely userspace scheme outlined in [1], and to make sure that software that uses Glibc (or any other C library that handles SME states correctly (see below)) conforms to [1] without having to become unnecessarily SME-aware and thus lose portability.
To recap (see [1] (6.2)), the SME extension defines the SME state, which is part of the processor state. Part of this SME state is the ZA state, which is necessary to manage the ZA storage register in the context of the ZA lazy saving scheme [1] (6.6). This scheme exists because it would be challenging to handle the ZA storage of SME in either a callee-saved or a caller-saved manner. There are 3 kinds of ZA state, defined in terms of the PSTATE.ZA bit and the TPIDR2_EL0 register (see [1] (6.6.3)):
- "off":     PSTATE.ZA == 0
- "active":  PSTATE.ZA == 1, TPIDR2_EL0 == null
- "dormant": PSTATE.ZA == 1, TPIDR2_EL0 != null
As [1] (6.7.2) outlines, every subroutine has exactly one SME-interface, depending on the permitted ZA-states on entry to and on normal return from a call to this subroutine. Callers of a subroutine must know and respect the ZA-interface of the subroutines they are using. Using a subroutine in a way that is not permitted by its ZA-interface is undefined behaviour. In particular, clone() and clone3() (the C library functions) have the private-ZA interface. This means that the permitted ZA-states on entry are "off" and "dormant", and that the permitted states on return are "off" or "dormant" (but "dormant" if and only if it was "dormant" on entry). Both functions in question should therefore correctly handle both the "off" and "dormant" ZA-states on entry. The conforming states on return are "off" and "dormant" (if the inbound state was already "dormant"). This change ensures that the ZA-state on return is always "off". Note that, in the context of clone() and clone3(), "on return" means the point at which execution resumes at a certain address after control leaves clone() or clone3(). For the caller (we may refer to it as the "parent"), this is the return address in the link register, to which the RET instruction jumps. For the "child", this is the target branch address. So the "off" state on return is permitted and conformant.
Why can't we retain the "dormant" state?
In theory we can, but we shouldn't. Here is why. Every subroutine with a private-ZA interface, including clone() and clone3(), must comply with the lazy saving scheme [1] (6.7.2). This puts additional responsibility on a subroutine if the ZA-state on return is "dormant", because this state has a special meaning. The "caller" (that is, the place in code to which execution is transferred, so this includes both the "parent" and the "child") may check the ZA-state and use it as per the specification of the "dormant" state outlined in [1] (6.6.6 and 6.6.7). Conforming to this would require more code inside clone() and clone3(), which is hardly desirable. For the return to the "parent" this could be achieved in theory, but given that neither clone() nor clone3() is supposed to be used in the middle of an SME operation, it wouldn't be useful. For the "return" to the "child" this would be particularly difficult to achieve, given the complexity of these functions and their interfaces. Most importantly, it would be illegal and somewhat meaningless to allow a "child" to start execution in the "dormant" ZA-state, because the very essence of the "dormant" state implies that there is a place to return to and that there is some outer context that we are allowed to interact with. To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when execution resumes after a call to clone() or clone3() is correct and also the simplest way to conform to [1].
Can there be situations when we can avoid calling __arm_za_disable()? Calling __arm_za_disable() implies a certain (sufficiently small) overhead, so one might reasonably consider skipping the call when it is safe to do so. The most trivial cases like this (e.g. when the calling thread doesn't have access to SME or to the TPIDR2_EL0 register) are already handled by this function (see [1] (8.1.3 and 8.1.2)).
Reasoning about other possible use cases would make the code inside clone() and clone3() more complicated, and it would defeat the point of trying to optimise away the call to __arm_za_disable().
Why can't the kernel do this instead? The handling of SME state by the kernel is described in [4]. In short, the kernel must not impose a specific ZA-interface onto a userspace function. Interaction with the kernel happens (among other things) via system calls. In Glibc, many of the system calls (notably including SYS_clone and SYS_clone3) are used via wrappers; the kernel has no control over them and, moreover, it cannot dictate how these wrappers should behave, because that is simply outside the kernel's remit. However, in certain cases the kernel may ensure that a "child" doesn't start in an incorrect state. This is what the recent change included in the 6.16 kernel [5] does. This is not enough to ensure that code that uses the clone() and clone3() functions conforms to [1] when it runs on a system that provides SME, hence this change.
[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 27effb3d50424fb9634be77a2acd614b0386ff25) (cherry picked from commit 256030b9842a10b1f22851b1de0c119761417544)
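The three ZA states and the private-ZA entry/return rules described in this commit message can be modelled as plain data. This is a sketch under stated assumptions: the enum and both function names are hypothetical, and real code reads PSTATE.ZA and TPIDR2_EL0 with SME instructions, which this model does not do. It only encodes the classification table and the permitted-state rule as the message states them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the three ZA states.  */
enum za_state { ZA_OFF, ZA_ACTIVE, ZA_DORMANT };

/* Classify from the two architectural inputs named in the message:
   PSTATE.ZA == 0 -> "off"; otherwise a null TPIDR2_EL0 -> "active",
   a non-null TPIDR2_EL0 -> "dormant".  */
static enum za_state za_classify (unsigned pstate_za, uint64_t tpidr2_el0)
{
  if (pstate_za == 0)
    return ZA_OFF;
  return tpidr2_el0 == 0 ? ZA_ACTIVE : ZA_DORMANT;
}

/* Private-ZA rule as described above: entry must be "off" or "dormant";
   return may be "off", or "dormant" only if entry was "dormant".  */
static bool za_private_call_ok (enum za_state on_entry, enum za_state on_return)
{
  if (on_entry != ZA_OFF && on_entry != ZA_DORMANT)
    return false;
  if (on_return == ZA_OFF)
    return true;
  return on_return == ZA_DORMANT && on_entry == ZA_DORMANT;
}
```

In this model, forcing the return state to `ZA_OFF` (the effect the commit obtains by calling __arm_za_disable()) satisfies `za_private_call_ok` for every permitted entry state, which is why it is the simplest conforming choice.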
2025-11-10 | aarch64: define macro for calling __libc_arm_za_disable | Yury Khrustalev
A common sequence of instructions is used in several places in assembly files, so define it in one place as an assembly macro. Note that PAC instructions are not included in the new macro because they are redundant given how we call the arm_za_disable function (return address is not saved on stack, so no need to sign it). (based on commits 6de12fc9ad56bc19fa6fcbd8ee502f29b5170d47 and c0f0db2d59e0908057205b22b21dd9d626d780c1) Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-11-10 | aarch64: update tests for SME | Yury Khrustalev
Add a test that checks that the ZA state is disabled after setjmp and sigsetjmp. Update the existing SME test that uses setjmp. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit 251f93262483b9c1184f5b72993d77a5d1c95f68)
2025-11-10 | aarch64: Disable ZA state of SME in setjmp and sigsetjmp | Yury Khrustalev
Due to the nature of the ZA state, setjmp() should clear it in the same manner as longjmp already does. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit a7f6fd976c17b82dc198290b4ab7087f35855a0e)
2025-11-04 | x86: fix wmemset ifunc stray '!' (bug 33542) | Jiamei Xie
The ifunc selector for wmemset had a stray '!' in the X86_ISA_CPU_FEATURES_ARCH_P(...) check:
    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, !))
This effectively negated the predicate and caused the AVX2/AVX512 paths to be skipped, making the dispatcher fall back to the SSE2 implementation even on CPUs where AVX2/AVX512 are available. The regression leads to a noticeable throughput loss for wmemset. Remove the stray '!' so that the AVX_Fast_Unaligned_Load capability is tested as intended and the correct AVX2/EVEX variants are selected.
Impact:
- On AVX2/AVX512-capable x86_64, wmemset no longer incorrectly falls back to SSE2; perf now shows the __wmemset_evex/avx2 variants.
Testing:
- benchtests/bench-wmemset shows improved bandwidth across sizes.
- perf confirms that the selected symbol is no longer SSE2.
Signed-off-by: xiejiamei <xiejiamei@hygon.com> Signed-off-by: Li jing <lijing@hygon.cn> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 4d86b6cdd8132e0410347e07262239750f86dfb4)
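How a single stray token flips the selection can be shown with a simplified, hypothetical model of the check. `MODEL_ARCH_P` only mimics the shape of X86_ISA_CPU_FEATURES_ARCH_P (a third argument pasted in front of the test); it is not the glibc definition, and `struct cpu_model` stands in for the real feature bits.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the CPU feature bits.  */
struct cpu_model
{
  bool avx2_usable;
  bool fast_unaligned_load;
};

/* Hypothetical macro: the third argument is pasted in front of the test,
   so `!` negates it and an empty argument leaves it unchanged.  */
#define MODEL_ARCH_P(ptr, field, neg) (neg (ptr)->field)

enum wmemset_impl { WMEMSET_SSE2, WMEMSET_AVX2 };

/* With the stray '!': a CPU that HAS fast unaligned loads fails the test.  */
static enum wmemset_impl select_buggy (const struct cpu_model *cf)
{
  if (cf->avx2_usable && MODEL_ARCH_P (cf, fast_unaligned_load, !))
    return WMEMSET_AVX2;
  return WMEMSET_SSE2;
}

/* Fixed: the capability is tested as intended.  */
static enum wmemset_impl select_fixed (const struct cpu_model *cf)
{
  if (cf->avx2_usable && MODEL_ARCH_P (cf, fast_unaligned_load, ))
    return WMEMSET_AVX2;
  return WMEMSET_SSE2;
}
```

On a model CPU with both bits set, `select_buggy` returns `WMEMSET_SSE2` and `select_fixed` returns `WMEMSET_AVX2`, matching the fallback the commit describes.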
2025-10-08 | x86: Detect Intel Nova Lake Processor | Sunil K Pandey
Detect the Intel Nova Lake processor and tune it similarly to Intel Panther Lake. See https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit a114e29ddd530962d2b44aa9d89f1f6075abe7fa)
2025-10-08 | x86: Detect Intel Wildcat Lake Processor | Sunil K Pandey
Detect the Intel Wildcat Lake processor and tune it similarly to Intel Panther Lake. See https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit f8dd52901b72805a831d5a4cb7d971e4a3c9970b)
2025-09-19 | nss: Group merge does not react to ERANGE during merge (bug 33361) | Florian Weimer
The break statement in CHECK_MERGE is expected to exit the surrounding while loop, not the do-while loop within the macro. Remove the do-while loop from the macro. It is not needed to turn the macro expansion into a single statement, due to the way CHECK_MERGE is used (and the statement expression would cover this anyway). Reviewed-by: Collin Funk <collin.funk1@gmail.com> (cherry picked from commit 0fceed254559836b57ee05188deac649bc505d05)
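The pitfall behind this fix is general C: a `break` inside a `do { ... } while (0)` macro body leaves only that inner loop. The macros below are hypothetical and are not the glibc CHECK_MERGE definition; the fixed variant uses an `if ... else (void) 0` shape (a different single-statement trick from the statement expression the commit mentions) so that `break` reaches the caller's loop.

```c
/* Hypothetical macros illustrating the bug and the fix.  */
#define CHECK_IN_DO_WHILE(cond) do { if (cond) break; } while (0)
#define CHECK_PLAIN(cond) if (cond) { break; } else (void) 0

/* Buggy: the break exits the macro's do-while, so the scan never stops early.  */
static int scan_buggy (const int *v, int n)
{
  int i = 0;
  while (i < n)
    {
      CHECK_IN_DO_WHILE (v[i] < 0);
      i++;
    }
  return i;
}

/* Fixed: the break exits the surrounding while loop, as intended.  */
static int scan_fixed (const int *v, int n)
{
  int i = 0;
  while (i < n)
    {
      CHECK_PLAIN (v[i] < 0);
      i++;
    }
  return i;
}
```

For `{1, 2, -1, 4}`, `scan_buggy` returns 4 (the error condition is silently ignored) while `scan_fixed` returns 2, the index of the first negative element.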
2025-09-03 | AArch64: Fix SVE powf routine [BZ #33299] | Pierre Blanchard
Fix a bug in the predicate logic introduced in the last change. This also gives a slight performance improvement by relying on all-true predicates during the conversion from single to double. This fixes BZ #33299. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit aac077645a645bba0d67f3250e82017c539d0f4b)
2025-08-20 | Optimize __libc_tsd_* thread variable access | Florian Weimer
These variables are not exported, and libc.so TLS is initial-exec anyway. Declare these variables as hidden and use the initial-exec TLS model. Reviewed-by: Frédéric Bérat <fberat@redhat.com> (cherry picked from commit a894f04d877653bea1639fc9a4adf73bd9347bf4)
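The declaration style this commit describes (hidden visibility plus the initial-exec TLS model) looks roughly like the sketch below. The variable and function names are hypothetical; `__thread` and the `tls_model ("initial-exec")` and `visibility ("hidden")` attributes are real GCC/Clang syntax. The attributes appear on both the declaration and the definition, which is what the later 2025-08-14 commit restores for the __libc_tsd_CTYPE_* definitions.

```c
/* Hypothetical variable standing in for a __libc_tsd_* variable.  */
extern __thread int example_tsd_counter
  __attribute__ ((tls_model ("initial-exec"), visibility ("hidden")));

/* The definition repeats the same attributes.  */
__thread int example_tsd_counter
  __attribute__ ((tls_model ("initial-exec"), visibility ("hidden"))) = 0;

/* Each thread sees its own copy; the initial-exec model lets the compiler
   use a fixed offset from the thread pointer instead of a __tls_get_addr call.  */
static int example_tsd_increment (void)
{
  return ++example_tsd_counter;
}
```

Initial-exec is only valid for objects present at program start, which holds for libc.so; a library meant to be dlopen'ed would normally keep the default model.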
2025-08-19 | i386: Add GLIBC_ABI_GNU_TLS version [BZ #33221] | H.J. Lu
On i386, programs and shared libraries that use __thread may fail silently at run-time against glibc without the TLS run-time fix for bug 32996 (https://sourceware.org/bugzilla/show_bug.cgi?id=32996). Add the GLIBC_ABI_GNU_TLS version to indicate that glibc has the working GNU TLS run-time. The linker can add the GLIBC_ABI_GNU_TLS version to binaries that depend on the working TLS run-time, so that such programs and shared libraries will fail to load and run against a libc.so without the GLIBC_ABI_GNU_TLS version, instead of failing silently at random. This fixes BZ #33221. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> (cherry picked from commit ed1b7a5a489ab555a27fad9c101ebe2e1c1ba881)
2025-08-19 | i386: Also add GLIBC_ABI_GNU2_TLS version [BZ #33129] | H.J. Lu
Since the GNU2 TLS run-time bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31372 affects both i386 and x86-64, also add GLIBC_ABI_GNU2_TLS version to i386 to indicate the working GNU2 TLS run-time. For x86-64, the additional GNU2 TLS run-time bug fix is needed for https://sourceware.org/bugzilla/show_bug.cgi?id=31501 Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> (cherry picked from commit bd4628f3f18ac312408782eea450429c6f044860)
2025-08-18 | i386: Update ___tls_get_addr to preserve vector registers | H.J. Lu
The compiler generates the following instruction sequence for dynamic TLS access:
    leal tls_var@tlsgd(,%ebx,1), %eax
    call ___tls_get_addr@PLT
The CALL instruction is transparent to the compiler, which assumes that all registers except EFLAGS, AX, CX, and DX are unchanged after the CALL. But ___tls_get_addr is a normal function that doesn't preserve any vector registers.
1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:
    _dl_tlsdesc_dynamic:
        /* Like all TLS resolvers, preserve call-clobbered registers.
           We need two scratch regs anyway.  */
        subl $32, %esp
        cfi_adjust_cfa_offset (32)
   It is wrong to use
        movl %ebx, -28(%esp)
        movl %esp, %ebx
        cfi_def_cfa_register(%ebx)
        ...
        mov %ebx, %esp
        cfi_def_cfa_register(%esp)
        movl -28(%esp), %ebx
   to preserve EBX on the stack, since -28(%esp) lies below the stack pointer, outside the 32 bytes just reserved. Fix it with:
        movl %ebx, 28(%esp)
        movl %esp, %ebx
        cfi_def_cfa_register(%ebx)
        ...
        mov %ebx, %esp
        cfi_def_cfa_register(%esp)
        movl 28(%esp), %ebx
4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with the traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.
This fixes BZ #32996. Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 848f0e46f03f22404ed9a8aabf3fd5ce8809a1be)
2025-08-18 | elf: Preserve _rtld_global layout for the release branch | Florian Weimer
Backporting commit 9d6577fdff801a856383e69322f15e63424ad312 ("elf: Introduce _dl_debug_change_state") removed the _ns_debug member. Keep it to preserve the struct layout.
2025-08-18 | elf: Test dlopen (NULL, RTLD_LAZY) from an ELF constructor | Florian Weimer
This call must not complete initialization of all shared objects in the global scope because the ELF constructor which makes the call likely has not finished initialization. Calling more constructors at this point would expose those to a partially constructed dependency. This completes the revert of commit 9897ced8e78db5d813166a7ccccfd5a ("elf: Run constructors on cyclic recursive dlopen (bug 31986)"). (cherry picked from commit d604f9c500570e80febfcc6a52b63a002b466f35)
2025-08-18 | elf: Fix handling of symbol versions which hash to zero (bug 29190) | Florian Weimer
This was found through code inspection. No application impact is known. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 46d31980943d8be2f421c1e3276b265c7552636e)
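For context, the hash attached to symbol versions (the vd_hash and vna_hash fields) is the standard System V ELF hash, sketched below. This shows only the hash function itself, not the glibc fix. Nothing in the algorithm excludes 0 as a result (the empty string hashes to 0, for instance), so a hash of 0 is an ordinary value rather than a "no version" marker.

```c
#include <stdint.h>

/* System V ABI ELF hash, as used for symbol-version hashes.  */
static uint32_t elf_hash (const char *name)
{
  const unsigned char *p = (const unsigned char *) name;
  uint32_t h = 0;
  while (*p != '\0')
    {
      h = (h << 4) + *p++;
      uint32_t g = h & 0xf0000000u;  /* top nibble about to overflow */
      if (g != 0)
        h ^= g >> 24;                /* fold it back into the low bits */
      h &= ~g;                       /* and clear it */
    }
  return h;
}
```

Small values are easy to check by hand: `"a"` hashes to 97 and `"ab"` to (97 << 4) + 98 = 1650.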
2025-08-15 | x86-64: Add GLIBC_ABI_DT_X86_64_PLT [BZ #33212] | H.J. Lu
When the linker -z mark-plt option is used to add DT_X86_64_PLT, DT_X86_64_PLTSZ and DT_X86_64_PLTENT, the r_addend field of the R_X86_64_JUMP_SLOT relocation stores the offset of the indirect branch instruction. However, glibc versions without the commit:
    commit f8587a61892cbafd98ce599131bf4f103466f084
    Author: H.J. Lu <hjl.tools@gmail.com>
    Date:   Fri May 20 19:21:48 2022 -0700
        x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT
        According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLOT. Since linkers always set their r_addends to 0, we can ignore their r_addends.
        Reviewed-by: Fangrui Song <maskray@google.com>
won't ignore the r_addend value in the R_X86_64_JUMP_SLOT relocation. Such programs and shared libraries will fail randomly at run-time. Add the GLIBC_ABI_DT_X86_64_PLT version to indicate that glibc is compatible with DT_X86_64_PLT. The linker can add the GLIBC_ABI_DT_X86_64_PLT version dependency whenever -z mark-plt is passed to the linker. The resulting programs and shared libraries will fail to load at run-time against a libc.so without the GLIBC_ABI_DT_X86_64_PLT version, instead of failing randomly. This fixes BZ #33212. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> (cherry picked from commit 399384e0c8193e31aea014220ccfa24300ae5938)
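The psABI rule quoted in this message can be sketched with the real <elf.h> types and relocation constants. The `reloc_value` function is hypothetical and models only three relocation types: for GLOB_DAT and JUMP_SLOT the stored value is just the symbol value, so a non-zero r_addend (such as the branch offset written under -z mark-plt) must not change it, while R_X86_64_64 does add the addend (S + A).

```c
#include <elf.h>

/* Hypothetical helper: the value a loader would store for a relocation,
   given the resolved symbol value.  Only three types are modelled.  */
static Elf64_Addr reloc_value (const Elf64_Rela *r, Elf64_Addr sym_value)
{
  switch (ELF64_R_TYPE (r->r_info))
    {
    case R_X86_64_GLOB_DAT:
    case R_X86_64_JUMP_SLOT:
      return sym_value;                              /* r_addend ignored */
    case R_X86_64_64:
      return sym_value + (Elf64_Addr) r->r_addend;   /* S + A */
    default:
      return 0;                                      /* not modelled */
    }
}
```

A JUMP_SLOT relocation with `r_addend = 0x10` and symbol value `0x1000` yields `0x1000`; an older loader that added the addend would store `0x1010` and branch to the wrong place.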
2025-08-15 | x86-64: Add GLIBC_ABI_GNU2_TLS version [BZ #33129] | H.J. Lu
Programs and shared libraries compiled with -mtls-dialect=gnu2 may fail silently at run-time against glibc without the GNU2 TLS run-time fix for bug 31372 (https://sourceware.org/bugzilla/show_bug.cgi?id=31372). Add the GLIBC_ABI_GNU2_TLS version to indicate that glibc has the working GNU2 TLS run-time. The linker can add the GLIBC_ABI_GNU2_TLS version to binaries that depend on the working GNU2 TLS run-time (https://sourceware.org/bugzilla/show_bug.cgi?id=33130), so that such programs and shared libraries will fail to load and run against a libc.so without the GLIBC_ABI_GNU2_TLS version, instead of failing silently at random. This fixes BZ #33129. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> (cherry picked from commit 9df8fa397d515dc86ff5565f6c45625e672d539e)
2025-08-15 | elf: Compile _dl_debug_state separately (bug 33224) | Florian Weimer
This ensures that the compiler will not inline it, so that debuggers which do not use the Systemtap probes can reliably set a breakpoint on it. Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org> Tested-by: Andreas K. Huettel <dilfridge@gentoo.org> (cherry picked from commit 620f0730f311635cd0e175a3ae4d0fc700c76366)
2025-08-15 | elf: Restore support for _r_debug interpositions and copy relocations | Florian Weimer
The changes in commit a93d9e03a31ec14405cb3a09aa95413b67067380 ("Extend struct r_debug to support multiple namespaces [BZ #15971]") break the dyninst dynamic instrumentation tool, which brings its own definition of _r_debug (rather than a declaration). Furthermore, it turns out to be rather hard to use the proposed handshake for accessing _r_debug via DT_DEBUG. If applications want to access _r_debug, they can do so directly if the relevant code has been built as PIC. To protect against harm from accidental copy relocations due to linker relaxations, this commit restores copy relocation support by adjusting both copies if interposition or copy relocations are in play. Therefore, it is possible to use a hidden reference in ld.so to access _r_debug. Only perform the copy relocation initialization if libc has been loaded. Otherwise, the ld.so search scope can be empty, and the lookup of the _r_debug symbol may fail. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit ea85e7d55087075376a29261e722e4fae14ecbe7)
2025-08-15 | elf: Introduce _dl_debug_change_state | Florian Weimer
It combines updating r_state with the debugger notification. The second change to _dl_open introduces an additional debugger notification for dlmopen, but debuggers are expected to ignore it. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 8329939a37f483a16013dd8af8303cbcb86d92cb)
2025-08-15 | elf: Introduce separate _r_debug_array variable | Florian Weimer
It replaces the ns_debug member of the namespaces. Previously, the base namespace had an unused ns_debug member. This change also fixes a concurrency issue: Now _dl_debug_initialize only updates r_next of the previous namespace's r_debug after the new r_debug is initialized, so that only the initialized version is observed. (Client code accessing _r_debug will benefit from load dependency tracking in CPUs even without explicit barriers.) Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 7278d11f3a0cd528188c719bab75575b0aea2c6e)
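The ordering fix described here (initialize the new namespace's r_debug first, then link it from the previous namespace's r_next) is the classic "initialize, then publish" pattern. The sketch below uses a hypothetical, simplified `struct ns_debug` rather than glibc's real structure, and makes the ordering explicit with C11 release/acquire atomics; the commit itself notes that readers benefit from CPU load dependency tracking even without explicit barriers.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-namespace r_debug.  */
struct ns_debug
{
  int r_version;
  int r_state;
  struct ns_debug *_Atomic r_next;
};

/* Fill in every field of the new node before making it reachable, so a
   reader that follows r_next never observes a half-initialized node.  */
static void ns_debug_append (struct ns_debug *prev, struct ns_debug *fresh,
                             int version)
{
  fresh->r_version = version;
  fresh->r_state = 0;
  atomic_init (&fresh->r_next, NULL);        /* not yet shared */
  atomic_store_explicit (&prev->r_next, fresh, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above.  */
static struct ns_debug *ns_debug_next (struct ns_debug *node)
{
  return atomic_load_explicit (&node->r_next, memory_order_acquire);
}
```

Reversing the two steps in `ns_debug_append` (linking first, initializing second) is exactly the window the commit closes.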
2025-08-14 | Use TLS initial-exec model for __libc_tsd_CTYPE_* thread variables [BZ #33234] | Jens Remus
Commit 10a66a8e421b ("Remove <libc-tsd.h>") removed the TLS initial-exec (IE) model attribute from the __libc_tsd_CTYPE_* thread variable declarations and definitions. Commit a894f04d8776 ("Optimize __libc_tsd_* thread variable access") restored it on declarations. Restore the TLS initial-exec model attribute on __libc_tsd_CTYPE_* thread variable definitions. This resolves test tst-locale1 failure on s390 32-bit, when using a GNU linker without the fix from GNU binutils commit aefebe82dc89 ("IBM zSystems: Fix offset relative to static TLS"). Reviewed-by: Florian Weimer <fweimer@redhat.com> (cherry picked from commit e5363e6f460c2d58809bf10fc96d70fd1ef8b5b2)
2025-08-14 | ctype: Fallback initialization of TLS using relocations (bug 19341, bug 32483) | Florian Weimer
This ensures that the ctype data pointers in TLS are valid in secondary namespaces even without initialization via __ctype_init. Reviewed-by: Frédéric Bérat <fberat@redhat.com> (cherry picked from commit 2745db8dd3ec31045acd761b612516490085bc20)
2025-08-14 | Use proper extern declaration for _nl_C_LC_CTYPE_{class,toupper,tolower} | Florian Weimer
The existing initializers already contain explicit casts. Keep them because of the int/uint32_t mismatch. Reviewed-by: Frédéric Bérat <fberat@redhat.com> (cherry picked from commit e0c0f856f58ceb68800a964c36c15c606e7a8c4c)
2025-08-14 | Remove <libc-tsd.h> | Florian Weimer
Use __thread variables directly instead. The macros do not save any typing. It seems unlikely that a future port will lack __thread variable support. Some of the __libc_tsd_* variables are referenced from assembler files, so keep their names. Previously, <libc-tls.h> included <tls.h>, which in turn included <errno.h>, so a few direct includes of <errno.h> are now required. Reviewed-by: Frédéric Bérat <fberat@redhat.com> (cherry picked from commit 10a66a8e421b09682b774c795ef1da402235dddc)
2025-08-11 | AArch64: Improve codegen SVE log1p helper | Luna Lamb
Improve codegen by packing coefficients. 4% and 2% improvement in throughput microbenchmark on Neoverse V1, for acosh and atanh respectively. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit 6849c5b791edd216f2ec3fdbe4d138bc69b9b333)
2025-08-11 | AArch64: Optimise SVE FP64 Hyperbolics | Dylan Fleming
Rework the SVE FP64 hyperbolics to use the SVE FEXPA instruction. Also update the special-case handling for large inputs to be entirely vectorised. Performance improvements on Neoverse V1:
- cosh_sve: 19% for |x| < 709, 5x otherwise
- sinh_sve: 24% for |x| < 709, 5.9x otherwise
- tanh_sve: 12% for |x| < 19, 9x otherwise
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit dee22d2a81ab59afc165fb6dcb45d723f13582a0)
2025-08-11 | AArch64: Optimize SVE exp functions | Dylan Fleming
Improve performance of SVE exps by making better use of the SVE FEXPA instruction. Performance improvement on Neoverse V1:
- exp2_sve: 21%
- exp2f_sve: 24%
- exp10f_sve: 23%
- expm1_sve: 25%
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit 1e3d1ddf977ecd653de8d0d10eb083d80ac21cf3)
2025-08-11 | AArch64: Improve codegen in SVE log1p | Luna Lamb
Improve memory access and reformat the evaluation scheme to pack coefficients. 5% improvement in throughput microbenchmark on Neoverse V1. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit da196e6134ede64728006518352d75b6c3902fec)
2025-08-11 | AArch64: Optimize inverse trig functions | Dylan Fleming
Improve the performance of inverse trig functions by altering how coefficients are loaded. Performance improvement on Neoverse V1:
- SVE acos 14%
- AdvSIMD acos 6%
- AdvSIMD asin 6%
- SVE asin 5%
- AdvSIMD asinf 2%
- AdvSIMD atanf 22%
- SVE atanf 20%
- SVE atan 11%
- AdvSIMD atan 5%
- SVE atan2 7%
- SVE atan2f 4%
- AdvSIMD atan2f 3%
- AdvSIMD atan2 2%
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit 1e84509e0041c0a83997aba602a585bb3b8285f0)
2025-08-11 | AArch64: Optimize algorithm in users of SVE expf helper | Pierre Blanchard
The polynomial order was unnecessarily high; lowering it unlocks multiple optimizations. Max error for the new SVE expf is 0.88 +0.5ULP. Max error for the new SVE coshf is 2.56 +0.5ULP. Performance improvement on Neoverse V1: expf (30%), coshf (26%). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit cf56eb28fa277d9dbb301654682ca89f71c30a48)
2025-08-11 | AArch64: Avoid memset ifunc in cpu-features.c [BZ #33112] | Wilco Dijkstra
During early startup memcpy or memset must not be called since many targets use ifuncs for them which won't be initialized yet. Security hardening may use -ftrivial-auto-var-init=zero which inserts calls to memset. Redirect memset to memset_generic by including dl-symbol-redir-ifunc.h in cpu-features.c. This fixes BZ #33112. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 681a24ae4d0cb8ed92de98b4da660308840b09ba)
2025-08-01 | nptl: Fix SYSCALL_CANCEL for return values larger than INT_MAX (BZ 33245) | Adhemerval Zanella
SYSCALL_CANCEL calls __syscall_cancel, which in turn calls __internal_syscall_cancel with an 'int' return type instead of the expected 'long int'. This causes issues with syscalls that return values larger than INT_MAX, such as copy_file_range [1]. Checked on x86_64-linux-gnu. [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=79139 Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org> (cherry picked from commit 7107bebf19286f42dcb0a97581137a5893c16206)
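The truncation is easy to reproduce in isolation. In this sketch, `int64_t` stands in for `long int` on an LP64 target, and the three functions are hypothetical stand-ins for the syscall and the two wrapper layers; 3000000000 is a byte count above INT_MAX such as copy_file_range might return.

```c
#include <stdint.h>

/* Hypothetical raw syscall result: a byte count larger than INT_MAX.  */
static int64_t raw_syscall_result (void)
{
  return INT64_C (3000000000);
}

/* Buggy shape: an int-returning layer in the middle of the call chain.
   The implicit conversion to int cannot represent the value.  */
static int cancel_layer_int (void)
{
  return raw_syscall_result ();
}

/* Fixed shape: the full 64-bit result is passed through unchanged.  */
static int64_t cancel_layer_long (void)
{
  return raw_syscall_result ();
}
```

`cancel_layer_long ()` returns 3000000000, while `cancel_layer_int ()` returns some other value (the conversion result is implementation-defined, typically a wrapped negative number), which callers may then misread as an error.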
2025-08-01 | elf: Handle ld.so with LOAD segment gaps in _dl_find_object (bug 31943) | Florian Weimer
Detect whether ld.so is not contiguous and handle that case in _dl_find_object. Set l_find_object_processed even for initially loaded link maps; otherwise, dlopen of an initially loaded object adds it to _dlfo_loaded_mappings (where maps are expected to be contiguous), in addition to _dlfo_nodelete_mappings. Test elf/tst-link-map-contiguous-ldso iterates over the loader image, reading every word to make sure memory is actually mapped. It only does that if the l_contiguous flag is set for the link map. Otherwise, it finds gaps with mmap and checks that _dl_find_object does not return the ld.so mapping for them. The test elf/tst-link-map-contiguous-main does the same thing for the libc.so shared object. This only works if the kernel loaded the main program, because the glibc dynamic loader may fill the gaps with PROT_NONE mappings in some cases, making it contiguous, but accesses to individual words may still fault. Test elf/tst-link-map-contiguous-libc is again slightly different, because the dynamic loader always fills the gaps with PROT_NONE mappings, so a different form of probing has to be used. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 20681be149b9eb1b6c1f4246bf4bd801221c86cd)
2025-08-01 | elf: Extract rtld_setup_phdr function from dl_main | Florian Weimer
Remove historic binutils reference from comment and update how this data is used by applications. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 2cac9559e06044ba520e785c151fbbd25011865f)
2025-08-01 | elf: Do not add a copy of _dl_find_object to libc.so | Florian Weimer
This reduces code size and dependencies on ld.so internals from libc.so. Fixes commit f4c142bb9fe6b02c0af8cfca8a920091e2dba44b ("arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405)"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 96429bcc91a14f71b177ddc5e716de3069060f2c)
2025-08-01 | stdlib: resolve a double lock init issue after fork [BZ #32994] | Davide Cavalca
The __abort_fork_reset_child call (introduced in d40ac01cbbc66e6d9dbd8e3485605c63b2178251) resets the lock after the fork. This causes a DRD regression in valgrind (https://bugs.kde.org/show_bug.cgi?id=503668), as it is effectively a double initialization, even though it is actually fine in this case. As suggested in https://sourceware.org/bugzilla/show_bug.cgi?id=32994#c2, we replace it here with a memcpy of another initialized lock instead, which makes valgrind happy. Reviewed-by: Florian Weimer <fweimer@redhat.com> (cherry picked from commit d9a348d0927c7a1aec5caf3df3fcd36956b3eb23)
2025-07-24 | iconv: iconv -o should not create executable files (bug 33164) | Florian Weimer
The mistake is that open must use 0666 to pick up the umask, and not 0777 (which is required by mkdir). Fixes commit 8ef3cff9d1ceafe369f982d980678d749fb93bd2 ("iconv: Support in-place conversions (bug 10460, bug 32033)"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit cdcf24ee14c27b77744ff52ab3ae852821207eb0)
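The mode arithmetic behind this fix: the permissions a new file actually gets are the mode passed to open (or mkdir) with the process umask bits cleared. `effective_mode` and `has_exec_bits` below are hypothetical helpers that only compute that expression; they do not call open or umask.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Permissions of a newly created file: requested mode minus umask bits.  */
static mode_t effective_mode (mode_t requested, mode_t umask_bits)
{
  return requested & ~umask_bits;
}

/* True if any of the user/group/other execute bits is set.  */
static bool has_exec_bits (mode_t m)
{
  return (m & 0111) != 0;
}
```

With the common umask 022, 0666 becomes 0644 (rw-r--r--), while 0777 becomes 0755 (rwxr-xr-x), so an output file created with 0777 ends up executable. That is why 0777 is right for mkdir but wrong for `open (..., O_CREAT, mode)`.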