Age | Commit message | Author
2025-11-21 | ppc64le: Restore optimized strncmp for power10 [release/2.39/master] | Sachin Monga
This patch addresses the actual cause of CVE-2025-5745. The vector non-volatile registers are no longer used for the 32-byte load and comparison operations. Additionally, the assembler workaround used earlier for the lxvp instruction is replaced with the actual instruction. Signed-off-by: Sachin Monga <smonga@linux.ibm.com> Co-authored-by: Paul Murphy <paumurph@redhat.com> (cherry picked from commit 2ea943f7d487d6a4166658b32af7c5365889fc34)
2025-11-21 | ppc64le: Restore optimized strcmp for power10 | Sachin Monga
This patch addresses the actual cause of CVE-2025-5702. The vector non-volatile registers are no longer used for the 32-byte load and comparison operations. Additionally, the assembler workaround used earlier for the lxvp instruction is replaced with the actual instruction. Signed-off-by: Sachin Monga <smonga@linux.ibm.com> Co-authored-by: Paul Murphy <paumurph@redhat.com> (cherry picked from commit 9a40b1cda519cc4f532acb6d020390829df3d81b)
2025-11-18 | AArch64: Fix instability in AdvSIMD tan | Joe Ramsay
Previously, the presence of special cases in one lane could affect the results in other lanes due to an unconditional scalar fallback. The old WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has been removed from AOR, making it easier to spot and fix this. 4% improvement in throughput with GCC 14 on Neoverse V1. This bug is present as far back as 2.39 (where tan was first introduced). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit 6c22823da57aa5218f717f569c04c9573c0448c5)
2025-11-18 | AArch64: Optimise SVE scalar callbacks | Joe Ramsay
Instead of using SVE instructions to marshall special results into the correct lane, just write the entire vector (and the predicate) to memory, then use cheaper scalar operations. Geomean speedup of 16% in special intervals on Neoverse with GCC 14. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit 5b82fb18827e962af9f080fdf3c1a69802783f67)
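The idea of spilling the vector and its predicate to memory and then fixing up lanes with scalar code can be sketched in portable C. This is only an illustration, not the libmvec implementation: the fixed lane count `LANES`, the function `fixup_special_lanes` and the stand-in callback `scalar_stand_in` are all hypothetical names, and plain arrays model the spilled vector and predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LANES 4 /* Hypothetical fixed vector width for this sketch.  */

/* Hypothetical stand-in for a scalar fallback routine.  */
double
scalar_stand_in (double v)
{
  return -v;
}

/* X and Y model a vector input and result already written to memory,
   SPECIAL models the spilled predicate.  Only flagged lanes are
   recomputed with the scalar routine; all other lanes keep the value
   the vector code produced.  */
void
fixup_special_lanes (const double x[LANES], double y[LANES],
                     const bool special[LANES], double (*scalar) (double))
{
  assert (scalar != NULL);
  for (size_t i = 0; i < LANES; i++)
    if (special[i])
      y[i] = scalar (x[i]);
}
```

Because each lane is handled independently, a special case in one lane cannot change the result in another, which is the property the AdvSIMD tan fix above also relies on.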
2025-11-14 | aarch64: fix includes in SME tests | Yury Khrustalev
Use the correct include for the SIGCHLD macro: signal.h. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit a9c426bcca59a9e228c4fbe75e75154217ec4ada) (cherry picked from commit 17c3eab387c3ceb6972e57888a89b1480793f81a) (cherry picked from commit 215e9155ea06064342151d05446ae51da16e0f65) (cherry picked from commit a66680adf3b2266c177f94f3f63e4b182e6362fe)
2025-11-14 | aarch64: fix cfi directives around __libc_arm_za_disable | Yury Khrustalev
Incorrect CFI directive corrupted call stack information and prevented debuggers from correctly displaying call stack information. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 2f77aec043f61e8533487850b11941a640ae2dea) (cherry picked from commit de1fe81f471496366580ad728b8986a3424b2fd7) (cherry picked from commit 5bf8ee7ad559fd60bedd3f5ec831d0b12b5000b8) (cherry picked from commit 1c0ad5ea63b2e5d39eb55f734b0e2bea7f766523)
2025-11-14 | aarch64: tests for SME | Yury Khrustalev
This commit adds tests for the following use cases relevant to handling of the SME state:
- fork() and vfork()
- clone() and clone3()
- signal handler
While most cases are trivial, the case of clone3() is more complicated since the clone3() symbol is not public in Glibc. To avoid having to check all possible ways clone3() may be called via other public functions (e.g. vfork() or pthread_create()), we put together a test that links directly with clone3.o. All the existing functions that have calls to clone3() may not actually use it, in which case the outcome of such tests would be unexpected. Having a direct call to the clone3() symbol in the test allows us to check precisely what we need to test: that the __arm_za_disable() function is indeed called and has the desired effect. Linking to clone3.o also requires linking to __arm_za_disable.o, which in turn requires the _dl_hwcap2 hidden symbol, which the test has to provide and initialise before use. Co-authored-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit ecb0fc2f0f839f36cd2a106283142c9df8ea8214) (cherry picked from commit 71874f167aa5bb1538ff7e394beaacee28ebe65f) (cherry picked from commit e4ffcf32b9213352917dcf7dc43adcaa0ff76503) (cherry picked from commit df3bcda9ac85eb720d27bbbc469cdc898cc9b02b)
2025-11-14 | aarch64: clear ZA state of SME before clone and clone3 syscalls | Yury Khrustalev
This change adds a call to the __arm_za_disable() function immediately before the SVC instruction inside clone() and clone3() wrappers. It also adds a macro for inline clone() used in fork() and adds the same call to the vfork implementation. This sets the ZA state of SME to "off" on return from these functions (for both the child and the parent). The __arm_za_disable() function is described in [1] (8.1.3). Note that the internal Glibc name for this function is __libc_arm_za_disable(). When this change was originally proposed [2,3], it generated a long discussion where several questions and concerns were raised. Here we will address these concerns and explain why this change is useful and, in fact, necessary. In a nutshell, a C library that conforms to the AAPCS64 spec [1] (pertinent to this change, mainly, the chapters 6.2 and 6.6), should have a call to the __arm_za_disable() function in clone() and clone3() wrappers. The following explains in detail why this is the case. When we consider using the __arm_za_disable() function inside the clone() and clone3() libc wrappers, we talk about the C library subroutines clone() and clone3() rather than the syscalls with similar names. In the current version of Glibc, clone() is public and clone3() is private, but it being private is not pertinent to this discussion. We will begin with stating that this change is NOT a bug fix for something in the kernel. The requirement to call __arm_za_disable() does NOT come from the kernel. It also is NOT needed to satisfy a contract between the kernel and userspace. This is why it is not for the kernel documentation to describe this requirement. This requirement is instead needed to satisfy a pure userspace scheme outlined in [1] and to make sure that software that uses Glibc (or any other C library that has correct handling of SME states (see below)) conforms to [1] without having to unnecessarily become SME-aware thus losing portability. 
To recap (see [1] (6.2)), the SME extension defines SME state, which is part of the processor state. Part of this SME state is the ZA state that is necessary to manage the ZA storage register in the context of the ZA lazy saving scheme [1] (6.6). This scheme exists because it would be challenging to handle ZA storage of SME in either a callee-saved or a caller-saved manner. There are 3 kinds of ZA state that are defined in terms of the PSTATE.ZA bit and the TPIDR2_EL0 register (see [1] (6.6.3)):
- "off":     PSTATE.ZA == 0
- "active":  PSTATE.ZA == 1, TPIDR2_EL0 == null
- "dormant": PSTATE.ZA == 1, TPIDR2_EL0 != null
As [1] (6.7.2) outlines, every subroutine has exactly one SME-interface depending on the permitted ZA-states on entry and on normal return from a call to this subroutine. Callers of a subroutine must know and respect the ZA-interface of the subroutines they are using. Using a subroutine in a way that is not permitted by its ZA-interface is undefined behaviour. In particular, clone() and clone3() (the C library functions) have the ZA-private interface. This means that the permitted ZA-states on entry are "off" and "dormant" and that the permitted states on return are "off" or "dormant" (but if and only if it was "dormant" on entry). This means that both functions in question should correctly handle both "off" and "dormant" ZA-states on entry. The conforming states on return are "off" and "dormant" (if the inbound state was already "dormant"). This change ensures that the ZA-state on return is always "off". Note that, in the context of clone() and clone3(), "on return" means the point when execution resumes at a certain address after transferring from clone() or clone3(). For the caller (we may refer to it as "parent") this is the return address in the link register where the RET instruction jumps. For the "child", this is the target branch address. So, the "off" state on return is permitted and conformant. Why can't we retain the "dormant" state? 
In theory, we can, but we shouldn't; here is why. Every subroutine with a private-ZA interface, including clone() and clone3(), must comply with the lazy saving scheme [1] (6.7.2). This puts additional responsibility on a subroutine if the ZA-state on return is "dormant" because this state has special meaning. The "caller" (that is, the place in code where execution is transferred to, so this includes both "parent" and "child") may check the ZA-state and use it as per the spec of the "dormant" state that is outlined in [1] (6.6.6 and 6.6.7). Conforming to this would require more code inside of clone() and clone3(), which is hardly desirable. For the return to "parent" this could be achieved in theory, but given that neither clone() nor clone3() are supposed to be used in the middle of an SME operation, it wouldn't be useful. For the "return" to "child" this would be particularly difficult to achieve given the complexity of these functions and their interfaces. Most importantly, it would be illegal and somewhat meaningless to allow a "child" to start execution in the "dormant" ZA-state because the very essence of the "dormant" state implies that there is a place to return and that there is some outer context that we are allowed to interact with. To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when the execution resumes after a call to clone() or clone3() is correct and also the simplest way to conform to [1]. Can there be situations when we can avoid calling __arm_za_disable()? Calling __arm_za_disable() implies certain (sufficiently small) overhead, so one might rightly ponder avoiding making a call to this function when we can afford not to. The most trivial cases like this (e.g. when the calling thread doesn't have access to SME or to the TPIDR2_EL0 register) are already handled by this function (see [1] (8.1.3 and 8.1.2)). 
Reasoning about other possible use cases would require making code inside clone() and clone3() more complicated and it would defeat the point of trying to make an optimisation of not calling __arm_za_disable(). Why can't the kernel do this instead? The handling of SME state by the kernel is described in [4]. In short, the kernel must not impose a specific ZA-interface onto a userspace function. Interaction with the kernel happens (among other things) via system calls. In Glibc many of the system calls (notably, including SYS_clone and SYS_clone3) are used via wrappers, and the kernel has no control of them and, moreover, it cannot dictate how these wrappers should behave because it is simply outside of the kernel's remit. However, in certain cases, the kernel may ensure that a "child" doesn't start in an incorrect state. This is what is done by the recent change included in the 6.16 kernel [5]. This is not enough to ensure that code that uses the clone() and clone3() functions conforms to [1] when it runs on a system that provides SME, hence this change.
[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 27effb3d50424fb9634be77a2acd614b0386ff25) (cherry picked from commit 256030b9842a10b1f22851b1de0c119761417544) (cherry picked from commit 889ae4bdbb4a6fbf37c2303da8cdae3d18880d9e) (cherry picked from commit 899ebf35691f01c357fc582b3e88db87accb5ee1)
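The three ZA states and the private-ZA rule described in this commit message can be modelled in a few lines of C. This is a sketch of the rules as stated in the message only: it does not read PSTATE.ZA or TPIDR2_EL0 from hardware, and the names `za_state`, `classify_za`, `private_za_entry_ok` and `private_za_return_ok` are hypothetical, not part of Glibc or the AAPCS64 support routines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum za_state { ZA_OFF, ZA_ACTIVE, ZA_DORMANT };

/* Classify a ZA state from modelled values of the PSTATE.ZA bit and
   the TPIDR2_EL0 register, following the table in the message.  */
enum za_state
classify_za (bool pstate_za, uint64_t tpidr2_el0)
{
  if (!pstate_za)
    return ZA_OFF;
  return tpidr2_el0 == 0 ? ZA_ACTIVE : ZA_DORMANT;
}

/* A private-ZA subroutine may be entered in "off" or "dormant".  */
bool
private_za_entry_ok (enum za_state entry)
{
  return entry == ZA_OFF || entry == ZA_DORMANT;
}

/* On return, "off" is always permitted; "dormant" is permitted only
   if the state was already "dormant" on entry.  */
bool
private_za_return_ok (enum za_state entry, enum za_state ret)
{
  assert (private_za_entry_ok (entry));
  return ret == ZA_OFF || (ret == ZA_DORMANT && entry == ZA_DORMANT);
}
```

Under this model, always returning in the "off" state passes `private_za_return_ok` for every permitted entry state, which matches the argument the message makes for calling __arm_za_disable() unconditionally.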
2025-11-14 | aarch64: define macro for calling __libc_arm_za_disable | Yury Khrustalev
A common sequence of instructions is used in several places in assembly files, so define it in one place as an assembly macro. Note that PAC instructions are not included in the new macro because they are redundant given how we call the arm_za_disable function (return address is not saved on stack, so no need to sign it). (based on commits 6de12fc9ad56bc19fa6fcbd8ee502f29b5170d47 and c0f0db2d59e0908057205b22b21dd9d626d780c1) Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 1a0ee267147d002d66af29bcf3f5002d19b3c75a) (cherry picked from commit 7af8db46d2493521cb8e0c13907f601a61236ebf)
2025-11-14 | aarch64: update tests for SME | Yury Khrustalev
Add a test that checks that the ZA state is disabled after setjmp and sigsetjmp. Update the existing SME test that uses setjmp. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit 251f93262483b9c1184f5b72993d77a5d1c95f68) (cherry picked from commit 97076e0cf14c635ef1d4ce1241e41c2c497533c8) (cherry picked from commit 51bcc73d95051e74512bb32fe17096b3db329cf3)
2025-11-14 | aarch64: Disable ZA state of SME in setjmp and sigsetjmp | Yury Khrustalev
Due to the nature of the ZA state, setjmp() should clear it in the same manner as it is already done by longjmp. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> (cherry picked from commit a7f6fd976c17b82dc198290b4ab7087f35855a0e) (cherry picked from commit 1f57ffdf35334ab245544cbb88f3abf2e6c77c54) (cherry picked from commit cdc1665bb188319b5f16ee05c04de7f2ba580e27)
2025-11-14 | linux: Also check pkey_get for ENOSYS on tst-pkey (BZ 31996) | Adhemerval Zanella
The powerpc pkey_get/pkey_set support was only added for 64-bit [1], and tst-pkey only checks if the support was present with pkey_alloc (which does not fail on powerpc32, at least running a 64-bit kernel). Checked on powerpc-linux-gnu. [1] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a803367bab167f5ec4fde1f0d0ec447707c29520 Reviewed-By: Andreas K. Huettel <dilfridge@gentoo.org> (cherry picked from commit 6b7e2e1d6139b1fb61b911ab897a956042bf7f89)
2025-11-13 | aarch64: Do not link conform tests with -Wl,-z,force-bti (bug 33601) | Florian Weimer
If the toolchain does not default to generate BTI markers in GCC, the main program for conform runtime tests will not have the BTI marker that -Wl,-z,force-bti requires. Without -Wl,-z,force-bti, the link editor will not tell the dynamic linker to enable BTI, and the missing BTI marker is harmless. Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com> (cherry picked from commit 75b6b263e928eaca01d836f6bb8b539346b6bb2d)
2025-11-04 | x86: fix wmemset ifunc stray '!' (bug 33542) | Jiamei Xie
The ifunc selector for wmemset had a stray '!' in the X86_ISA_CPU_FEATURES_ARCH_P(...) check:
  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, !))
This effectively negated the predicate and caused the AVX2/AVX512 paths to be skipped, making the dispatcher fall back to the SSE2 implementation even on CPUs where AVX2/AVX512 are available. The regression leads to noticeable throughput loss for wmemset. Remove the stray '!' so the AVX_Fast_Unaligned_Load capability is tested as intended and the correct AVX2/EVEX variants are selected.
Impact:
- On AVX2/AVX512-capable x86_64, wmemset no longer incorrectly falls back to SSE2; perf now shows __wmemset_evex/avx2 variants.
Testing:
- benchtests/bench-wmemset shows improved bandwidth across sizes.
- perf confirms the selected symbol is no longer SSE2.
Signed-off-by: xiejiamei <xiejiamei@hygon.com> Signed-off-by: Li jing <lijing@hygon.cn> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 4d86b6cdd8132e0410347e07262239750f86dfb4)
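How a single stray token in a macro argument can invert a dispatcher is easy to reproduce in isolation. The sketch below is not the glibc selector: `FEATURE_P`, the two flag bits and both `select_*` functions are hypothetical, chosen only to mimic a predicate macro whose last parameter is pasted in front of the test as an optional negation.

```c
#include <assert.h>

enum impl { IMPL_SSE2, IMPL_AVX2 };

#define HAS_AVX2       1u /* Hypothetical feature bits.  */
#define FAST_UNALIGNED 2u

/* Hypothetical predicate macro: NOT is either empty or '!'.  */
#define FEATURE_P(flags, bit, not) (not (((flags) & (bit)) != 0))

/* Buggy selector: the stray '!' negates the second test, so a CPU
   with both features falls back to SSE2.  */
enum impl
select_buggy (unsigned int flags)
{
  assert ((flags & ~(HAS_AVX2 | FAST_UNALIGNED)) == 0);
  if (FEATURE_P (flags, HAS_AVX2, )
      && FEATURE_P (flags, FAST_UNALIGNED, !))
    return IMPL_AVX2;
  return IMPL_SSE2;
}

/* Fixed selector: the capability is tested as intended.  */
enum impl
select_fixed (unsigned int flags)
{
  assert ((flags & ~(HAS_AVX2 | FAST_UNALIGNED)) == 0);
  if (FEATURE_P (flags, HAS_AVX2, )
      && FEATURE_P (flags, FAST_UNALIGNED, ))
    return IMPL_AVX2;
  return IMPL_SSE2;
}
```

Both versions compile cleanly, so a mistake of this kind only shows up in which variant gets selected at run time, consistent with the commit relying on perf and benchmarks to confirm the fix.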
2025-10-08 | x86: Detect Intel Nova Lake Processor | Sunil K Pandey
Detect Intel Nova Lake Processor and tune it similarly to Intel Panther Lake. https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit a114e29ddd530962d2b44aa9d89f1f6075abe7fa)
2025-10-08 | x86: Detect Intel Wildcat Lake Processor | Sunil K Pandey
Detect Intel Wildcat Lake Processor and tune it similarly to Intel Panther Lake. https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit f8dd52901b72805a831d5a4cb7d971e4a3c9970b)
2025-09-19 | nss: Group merge does not react to ERANGE during merge (bug 33361) | Florian Weimer
The break statement in CHECK_MERGE is expected to exit the surrounding while loop, not the do-while loop within the macro. Remove the do-while loop from the macro. It is not needed to turn the macro expansion into a single statement due to the way CHECK_MERGE is used (and the statement expression would cover this anyway). Reviewed-by: Collin Funk <collin.funk1@gmail.com> (cherry picked from commit 0fceed254559836b57ee05188deac649bc505d05)
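The pitfall described here, a `break` swallowed by a `do { ... } while (0)` wrapper, can be shown with a minimal sketch. The macros `CHECK_WRAPPED` and `CHECK_PLAIN` and the two counting functions are hypothetical and far simpler than the real CHECK_MERGE; they only demonstrate which loop the `break` leaves.

```c
#include <assert.h>

/* Hypothetical macro with the usual single-statement wrapper.  The
   break only leaves the do-while loop inside the macro.  */
#define CHECK_WRAPPED(stop) do { if (stop) break; } while (0)

/* Hypothetical macro without the wrapper.  The break leaves the loop
   that surrounds the macro use.  */
#define CHECK_PLAIN(stop) if (stop) break

/* Counts loop iterations; the wrapped check never stops the loop.  */
int
count_with_wrapped (int limit, int stop_at)
{
  assert (limit >= 0);
  int iterations = 0;
  while (iterations < limit)
    {
      CHECK_WRAPPED (iterations == stop_at);
      iterations++;
    }
  return iterations;
}

/* Counts loop iterations; the plain check stops at STOP_AT.  */
int
count_with_plain (int limit, int stop_at)
{
  assert (limit >= 0);
  int iterations = 0;
  while (iterations < limit)
    {
      CHECK_PLAIN (iterations == stop_at);
      iterations++;
    }
  return iterations;
}
```

With a limit of 10 and a stop value of 3, the wrapped variant runs all 10 iterations while the plain variant stops after 3, which is the kind of silently ignored exit condition the commit describes.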
2025-08-26 | Rename new tst-sem17 test to tst-sem18 | Joseph Myers
As noted by Adhemerval, we already have a tst-sem17 in nptl. Tested for x86_64. (cherry picked from commit c7dcf594f4c52fa7e2cc76918c8aa9abb98e9625)
2025-08-26 | Avoid uninitialized result in sem_open when file does not exist | Joseph Myers
A static analyzer apparently reported an uninitialized use of the variable result in sem_open in the case where the file is required to exist but does not exist. The report appears to be correct; set result to SEM_FAILED in that case, and add a test for it. Note: the test passes for me even without the sem_open fix, I guess because result happens to get value SEM_FAILED (i.e. 0) when uninitialized. Tested for x86_64. (cherry picked from commit f745d78e2628cd5b13ca119ae0c0e21d08ad1906)
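The case this commit covers, opening a named semaphore that is required to exist but does not, can be exercised from user code with the standard POSIX calls. This is a sketch and not the test added by the commit: the helper name `open_missing_semaphore_fails` is hypothetical, and the caller is assumed to pass a semaphore name that nothing else on the system uses. On older glibc versions the program may need to be linked with -pthread.

```c
#include <assert.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if opening NAME without O_CREAT reports failure via
   SEM_FAILED.  NAME is removed first so that it cannot exist.  */
bool
open_missing_semaphore_fails (const char *name)
{
  assert (name != NULL && name[0] == '/');
  sem_unlink (name);  /* Ignore errors: the name may not exist yet.  */
  sem_t *sem = sem_open (name, 0);  /* No O_CREAT: must already exist.  */
  if (sem == SEM_FAILED)
    return true;
  sem_close (sem);
  return false;
}
```

A conforming implementation returns SEM_FAILED here, so callers should compare against that constant rather than rely on whatever value an uninitialized variable happens to hold.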
2025-08-26 | elf: handle addition overflow in _dl_find_object_update_1 [BZ #32245] | Aurelien Jarno
The remaining_to_add variable can be 0 if (current_used + count) wraps. This is caught by GCC 14+ on hppa, which determines from there that target_seg could be NULL when remaining_to_add is zero, which in turn causes a -Wstringop-overflow warning:
  In file included from ../include/atomic.h:49,
                   from dl-find_object.c:20:
  In function '_dlfo_update_init_seg',
      inlined from '_dl_find_object_update_1' at dl-find_object.c:689:30,
      inlined from '_dl_find_object_update' at dl-find_object.c:805:13:
  ../sysdeps/unix/sysv/linux/hppa/atomic-machine.h:44:4: error: '__atomic_store_4' writing 4 bytes into a region of size 0 overflows the destination [-Werror=stringop-overflow=]
     44 | __atomic_store_n ((mem), (val), __ATOMIC_RELAXED); \
        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  dl-find_object.c:644:3: note: in expansion of macro 'atomic_store_relaxed'
    644 | atomic_store_relaxed (&seg->size, new_seg_size);
        | ^~~~~~~~~~~~~~~~~~~~
  In function '_dl_find_object_update':
  cc1: note: destination object is likely at address zero
In practice, this is not possible as it represents counts of link maps. Link maps have sizes larger than 1 byte, so the sum of any two link map counts will always fit within a size_t without wrapping around. This patch therefore adds a check on remaining_to_add == 0 and tells GCC that this cannot happen using __builtin_unreachable. Thanks to Andreas Schwab for the investigation. Closes: BZ #32245 Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Tested-by: John David Anglin <dave.anglin@bell.net> Reviewed-by: Florian Weimer <fweimer@redhat.com> (cherry picked from commit 6c915c73d08028987232f6dc718f218c61113240)
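The two ingredients of this fix, an unsigned sum that can wrap to zero and a hint telling the compiler that the zero case cannot occur, are sketched below. The helpers `sum_wraps` and `remaining_nonzero` are hypothetical and are not the code in dl-find_object.c; they use the GCC/Clang builtins `__builtin_add_overflow` and `__builtin_unreachable`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stores current_used + count in *SUM and returns true if the
   addition wrapped around in size_t.  */
bool
sum_wraps (size_t current_used, size_t count, size_t *sum)
{
  assert (sum != NULL);
  return __builtin_add_overflow (current_used, count, sum);
}

/* Computes the sum under the caller's guarantee that it is nonzero.
   The hint lets the compiler drop code paths for the zero case;
   calling this with a zero sum is undefined behaviour.  */
size_t
remaining_nonzero (size_t current_used, size_t count)
{
  size_t remaining = current_used + count;
  if (remaining == 0)
    __builtin_unreachable ();
  return remaining;
}
```

The hint is only sound because of the invariant stated in the commit message; it documents an assumption for the optimizer and performs no run-time check.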
2025-08-20 | Optimize __libc_tsd_* thread variable access | Florian Weimer
These variables are not exported, and libc.so TLS is initial-exec anyway. Declare these variables as hidden and use the initial-exec TLS model. Reviewed-by: Frédéric Bérat <fberat@redhat.com> (cherry picked from commit a894f04d877653bea1639fc9a4adf73bd9347bf4)
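Declaring a thread-local variable as hidden with the initial-exec TLS model looks roughly as follows with GCC or Clang attributes. This is a generic sketch, not the glibc declaration: the variable `sketch_tsd_counter` and the accessor `next_tsd_value` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical per-thread variable.  Hidden visibility keeps it out
   of the dynamic symbol table, and the initial-exec model lets the
   compiler address it at a fixed offset from the thread pointer
   instead of calling a TLS resolver.  */
__thread int sketch_tsd_counter
  __attribute__ ((tls_model ("initial-exec"), visibility ("hidden")));

/* Increments and returns the calling thread's copy.  */
int
next_tsd_value (void)
{
  assert (sketch_tsd_counter < INT_MAX);  /* Avoid signed overflow.  */
  return ++sketch_tsd_counter;
}
```

Initial-exec is only appropriate for objects that are part of the initial set of loaded modules, which is why the commit notes that libc.so TLS is initial-exec anyway.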
2025-08-19 | i386: Add GLIBC_ABI_GNU_TLS version [BZ #33221] | H.J. Lu
On i386, programs and shared libraries with __thread usage may fail silently at run-time against glibc without the TLS run-time fix for: https://sourceware.org/bugzilla/show_bug.cgi?id=32996 Add GLIBC_ABI_GNU_TLS version to indicate that glibc has the working GNU TLS run-time. The linker can add the GLIBC_ABI_GNU_TLS version to binaries which depend on the working TLS run-time so that such programs and shared libraries will fail to load and run at run-time against libc.so without the GLIBC_ABI_GNU_TLS version, instead of failing silently at random. This fixes BZ #33221. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> (cherry picked from commit ed1b7a5a489ab555a27fad9c101ebe2e1c1ba881)
2025-08-19 | i386: Also add GLIBC_ABI_GNU2_TLS version [BZ #33129] | H.J. Lu
Since the GNU2 TLS run-time bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31372 affects both i386 and x86-64, also add GLIBC_ABI_GNU2_TLS version to i386 to indicate the working GNU2 TLS run-time. For x86-64, the additional GNU2 TLS run-time bug fix is needed for https://sourceware.org/bugzilla/show_bug.cgi?id=31501 Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> (cherry picked from commit bd4628f3f18ac312408782eea450429c6f044860)
2025-08-19 | debug: Fix tst-longjmp_chk3 build failure on Hurd | Florian Weimer
Explicitly include <unistd.h> for _exit and getpid. (cherry picked from commit 4836a9af89f1b4d482e6c72ff67e36226d36434c)
2025-08-19 | debug: Wire up tst-longjmp_chk3 | Florian Weimer
The test was added in commit ac8cc9e300a002228eb7e660df3e7b333d9a7414 without all the required Makefile scaffolding. Tweak the test so that it actually builds (including with dynamic SIGSTKSZ). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 4b7cfcc3fbfab55a1bbb32a2da69c048060739d6)
2025-08-18 | i386: Update ___tls_get_addr to preserve vector registers | H.J. Lu
The compiler generates the following instruction sequence for dynamic TLS access:
  leal tls_var@tlsgd(,%ebx,1), %eax
  call ___tls_get_addr@PLT
The CALL instruction is transparent to the compiler, which assumes all registers, except for EFLAGS, AX, CX, and DX, are unchanged after CALL. But ___tls_get_addr is a normal function which doesn't preserve any vector registers.
1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:
  _dl_tlsdesc_dynamic:
  /* Like all TLS resolvers, preserve call-clobbered registers. We need two scratch regs anyway. */
  subl $32, %esp
  cfi_adjust_cfa_offset (32)
It is wrong to use
  movl %ebx, -28(%esp)
  movl %esp, %ebx
  cfi_def_cfa_register(%ebx)
  ...
  mov %ebx, %esp
  cfi_def_cfa_register(%esp)
  movl -28(%esp), %ebx
to preserve EBX on stack. Fix it with:
  movl %ebx, 28(%esp)
  movl %esp, %ebx
  cfi_def_cfa_register(%ebx)
  ...
  mov %ebx, %esp
  cfi_def_cfa_register(%esp)
  movl 28(%esp), %ebx
4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.
This fixes BZ #32996. Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 848f0e46f03f22404ed9a8aabf3fd5ce8809a1be)
2025-08-18 | elf: Preserve _rtld_global layout for the release branch | Florian Weimer
Backporting commit 97017da5ef946c6d38c252f56c8cb7c205b732fa ("elf: Introduce _dl_debug_change_state") removed the _ns_debug member. Keep it to preserve struct layout.
2025-08-18 | elf: Compile _dl_debug_state separately (bug 33224) | Florian Weimer
This ensures that the compiler will not inline it, so that debuggers which do not use the Systemtap probes can reliably set a breakpoint on it. Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org> Tested-by: Andreas K. Huettel <dilfridge@gentoo.org> (cherry picked from commit 620f0730f311635cd0e175a3ae4d0fc700c76366)
2025-08-18 | elf: Restore support for _r_debug interpositions and copy relocations | Florian Weimer
The changes in commit a93d9e03a31ec14405cb3a09aa95413b67067380 ("Extend struct r_debug to support multiple namespaces [BZ #15971]") break the dyninst dynamic instrumentation tool. It brings its own definition of _r_debug (rather than a declaration). Furthermore, it turns out it is rather hard to use the proposed handshake for accessing _r_debug via DT_DEBUG. If applications want to access _r_debug, they can do so directly if the relevant code has been built as PIC. To protect against harm from accidental copy relocations due to linker relaxations, this commit restores copy relocation support by adjusting both copies if interposition or copy relocations are in play. Therefore, it is possible to use a hidden reference in ld.so to access _r_debug. Only perform the copy relocation initialization if libc has been loaded. Otherwise, the ld.so search scope can be empty, and the lookup of the _r_debug symbol may fail. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit ea85e7d55087075376a29261e722e4fae14ecbe7)
2025-08-18 | elf: Introduce _dl_debug_change_state | Florian Weimer
It combines updating r_state with the debugger notification. The second change to _dl_open introduces an additional debugger notification for dlmopen, but debuggers are expected to ignore it. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 8329939a37f483a16013dd8af8303cbcb86d92cb)
2025-08-18 | elf: Introduce separate _r_debug_array variable | Florian Weimer
It replaces the ns_debug member of the namespaces. Previously, the base namespace had an unused ns_debug member. This change also fixes a concurrency issue: Now _dl_debug_initialize only updates r_next of the previous namespace's r_debug after the new r_debug is initialized, so that only the initialized version is observed. (Client code accessing _r_debug will benefit from load dependency tracking in CPUs even without explicit barriers.) Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 7278d11f3a0cd528188c719bab75575b0aea2c6e)
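The ordering fix described here, initialize the new object fully and only then link it from its predecessor, is the usual publish-after-initialize pattern. The sketch below uses C11 atomics with a release store and an acquire load; it is not the ld.so code (which, per the message, relies on load dependency tracking rather than explicit barriers), and `struct debug_node`, `append_node` and `next_node` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a chained per-namespace debug record.  */
struct debug_node
{
  int version;
  int state;
  _Atomic (struct debug_node *) next;
};

/* Fully initialize NODE first, then publish it by updating the NEXT
   pointer of PREV, so readers never observe a half-built node.  */
void
append_node (struct debug_node *prev, struct debug_node *node, int version)
{
  assert (prev != NULL && node != NULL);
  node->version = version;
  node->state = 0;
  atomic_init (&node->next, NULL);
  atomic_store_explicit (&prev->next, node, memory_order_release);
}

/* Reader side: an acquire load pairs with the release store above.  */
struct debug_node *
next_node (struct debug_node *node)
{
  assert (node != NULL);
  return atomic_load_explicit (&node->next, memory_order_acquire);
}
```

A reader that follows `next_node` either sees a null pointer or a node whose fields were all written before publication.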
2025-08-18 | elf: Test dlopen (NULL, RTLD_LAZY) from an ELF constructor | Florian Weimer
This call must not complete initialization of all shared objects in the global scope because the ELF constructor which makes the call likely has not finished initialization. Calling more constructors at this point would expose those to a partially constructed dependency. This completes the revert of commit 9897ced8e78db5d813166a7ccccfd5a ("elf: Run constructors on cyclic recursive dlopen (bug 31986)"). (cherry picked from commit d604f9c500570e80febfcc6a52b63a002b466f35)
2025-08-18 | elf: Fix handling of symbol versions which hash to zero (bug 29190) | Florian Weimer
This was found through code inspection. No application impact is known. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 46d31980943d8be2f421c1e3276b265c7552636e)
2025-08-18 | elf: Second ld.so relocation only if libc.so has been loaded | Florian Weimer
Commit 8f8dd904c4a2207699bb666f30acceb5209c8d3f (“elf: rtld_multiple_ref is always true”) removed some code that happened to enable compatibility with programs that do not link against libc.so. Such programs cannot call dlopen or any dynamic linker functions (except __tls_get_addr), so this is not really useful. Still, ld.so should not crash with a null-pointer dereference or undefined symbol reference in these cases. In the main relocation loop, call _dl_relocate_object unconditionally because it already checks if the object has been relocated. If libc.so was loaded, self-relocate ld.so against it and call __rtld_mutex_init and __rtld_malloc_init_real to activate the full implementations. Those are available only if libc.so is there, so skip these initialization steps if libc.so is absent. Without libc.so, the global scope can be completely empty. This can cause ld.so self-relocation to fail if it uses symbol-based relocations, which is why the second ld.so self-relocation is not performed if libc.so is missing. The previous concern regarding GOT updates through self-relocation no longer applies because function pointers are updated explicitly through __rtld_mutex_init and __rtld_malloc_init_real, and not through relocation. However, the second ld.so self-relocation is still delayed, in case there are other symbols being used. Fixes commit 8f8dd904c4a2207699bb666f30acceb5209c8d3f (“elf: rtld_multiple_ref is always true”). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 706209867f1ba89c458033408d419e92d8055f58)
2025-08-18 | elf: Reorder audit events in dlclose to match _dl_fini (bug 32066) | Florian Weimer
This was discovered after extending elf/tst-audit23 to cover dlclose of the dlmopen namespace. Auditors already experience the new order during process shutdown (_dl_fini), so no LAV_CURRENT bump or backwards compatibility code seems necessary. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 495b96e064da605630a23092d1e484ade4bdc093)
2025-08-18 | elf: Call la_objclose for proxy link maps in _dl_fini (bug 32065) | Florian Weimer
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit c4b160744cb39eca20dc36b39c7fa6e10352706c)
2025-08-18 | elf: Signal la_objopen for the proxy link map in dlmopen (bug 31985) | Florian Weimer
Previously, the ld.so link map was silently added to the namespace. This change produces an auditing event for it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 8f36b1469677afe37168f9af1b77402d7a70c673)
2025-08-18 | elf: Add the endswith function to <endswith.h> | Florian Weimer
And include <stdbool.h> for a definition of bool. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit a20bc2f6233a726c7df8eaa332b6e498bd59321f)
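A suffix test of the kind this header provides can be written with `strlen` and `memcmp`. The function below is a sketch under a hypothetical name, `endswith_sketch`, and is not claimed to match the glibc definition; it shows why `<stdbool.h>` is needed for the `bool` return type.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if STR ends with SUFFIX.  An empty suffix matches
   every string.  */
bool
endswith_sketch (const char *str, const char *suffix)
{
  assert (str != NULL && suffix != NULL);
  size_t len = strlen (str);
  size_t suffix_len = strlen (suffix);
  return len >= suffix_len
         && memcmp (str + len - suffix_len, suffix, suffix_len) == 0;
}
```

The length comparison comes first so that the pointer arithmetic never steps before the start of the string when the suffix is longer than the input.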
2025-08-18 | elf: Update DSO list, write audit log to elf/tst-audit23.out | Florian Weimer
After commit 1d5024f4f052c12e404d42d3b5bfe9c3e9fd27c4 ("support: Build with exceptions and asynchronous unwind tables [BZ #30587]"), libgcc_s is expected to show up in the DSO list on 32-bit Arm. Do not update max_objs because vdso is not tracked (which is why the test currently passes even with libgcc_s present). Also write the log output from the auditor to standard output, for easier test debugging. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 4a50fdf8b2c1106b50cd9056b4c6f3a72cdeed5f)
2025-08-18 | elf: Switch to main malloc after final ld.so self-relocation | Florian Weimer
Before commit ee1ada1bdb8074de6e1bdc956ab19aef7b6a7872 ("elf: Rework exception handling in the dynamic loader [BZ #25486]"), the previous order called the main calloc to allocate a shadow GOT/PLT array for auditing support. This happened before libc.so.6 ELF constructors were run, so a user malloc could run without libc.so.6 having been initialized fully. One observable effect was that environ was NULL at this point. It does not seem to be possible at present to trigger such an allocation, but it seems more robust to delay switching to main malloc after ld.so self-relocation is complete. The elf/tst-rtld-no-malloc-audit test case fails with a 2.34-era glibc that does not have this fix. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit c1560f3f75c0e892b5522c16f91b4e303f677094)
2025-08-18 | elf: Introduce _dl_relocate_object_no_relro | Florian Weimer
And make _dl_protect_relro apply RELRO conditionally. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit f2326c2ec0a0a8db7bc7f4db8cce3002768fc3b6)
2025-08-18elf: Do not define consider_profiling, consider_symbind as macrosFlorian Weimer
This avoids surprises when refactoring the code if these identifiers are re-used later in the file. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit a79642204537dec8a1e1c58d1e0a074b3c624f46)
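The hazard can be illustrated with a small sketch. The function names and the SKETCH_PROFILING_ENABLED switch below are hypothetical and do not come from the glibc sources; only the identifiers consider_profiling and consider_symbind are taken from the commit title.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a build-time configuration switch.  */
#define SKETCH_PROFILING_ENABLED 0

/* If "#define consider_profiling SKETCH_PROFILING_ENABLED" were in
   effect here, the preprocessor would rewrite the parameter below to
   "bool 0" and the function would no longer compile.  With ordinary
   variables, normal C scoping applies and the name can be re-used.  */
static bool
needs_plt_tracking (bool consider_profiling, bool consider_symbind)
{
  return consider_profiling || consider_symbind;
}

static bool
sketch_relocate (bool auditing)
{
  /* Plain locals instead of macros: visible only inside this function.  */
  const bool consider_profiling = SKETCH_PROFILING_ENABLED;
  const bool consider_symbind = auditing;
  return needs_plt_tracking (consider_profiling, consider_symbind);
}
```

With locals, a later helper is free to take parameters of the same name, which is the kind of refactoring the macro form would break.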
2025-08-18elf: rtld_multiple_ref is always trueFlorian Weimer
For a long time, libc.so.6 has depended on ld.so, which means that there is a reference to ld.so in all processes, and rtld_multiple_ref is always true. In fact, if rtld_multiple_ref were false, some of the ld.so setup code would not run. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit 8f8dd904c4a2207699bb666f30acceb5209c8d3f)
2025-08-18Revert "elf: Run constructors on cyclic recursive dlopen (bug 31986)"Florian Weimer
This reverts commit 9897ced8e78db5d813166a7ccccfd5a42c69ef20. Adjust the test expectations in elf/tst-dlopen-auditdup-auditmod.c accordingly. (cherry picked from commit 95129e6b8fabdaa8cd8a4a5cc20be0f4cb0ba59f)
2025-08-18elf: Fix map_complete Systemtap probe in dl_open_workerFlorian Weimer
The refactoring did not take the change of variable into account. Fixes commit 43db5e2c0672cae7edea7c9685b22317eae25471 ("elf: Signal RT_CONSISTENT after relocation processing in dlopen (bug 31986)"). (cherry picked from commit ac73067cb7a328bf106ecd041c020fc61be7e087)
2025-08-18elf: Signal RT_CONSISTENT after relocation processing in dlopen (bug 31986)Florian Weimer
Previously, a la_activity audit event was generated before relocation processing completed. This did not match what happened during initial startup in elf/rtld.c (towards the end of dl_main). It also caused various problems if an auditor tried to open the same shared object again using dlmopen: If it was the directly loaded object, it had a search scope associated with it, so the early exit in dl_open_worker_begin was taken even though the object was unrelocated. This caused the r_state == RT_CONSISTENT assert to fail. Avoidance of the assert also depends on reversing the order of r_state update and auditor event (already implemented in a previous commit). At the later point, args->map can be NULL due to failure, so use the assigned namespace ID instead if that is available. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 43db5e2c0672cae7edea7c9685b22317eae25471)
2025-08-18elf: Signal LA_ACT_CONSISTENT to auditors after RT_CONSISTENT switchFlorian Weimer
Auditors can call into the dynamic loader again if LA_ACT_CONSISTENT, and those recursive calls could observe r_state != RT_CONSISTENT. We should consider failing dlopen/dlmopen/dlclose if r_state != RT_CONSISTENT. The dynamic linker is probably not in a state in which it can handle reentrant calls. This needs further investigation. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit e096b7a1896886eb7dd2732ccbf1184b0eec9a63)
2025-08-18elf: Run constructors on cyclic recursive dlopen (bug 31986)Florian Weimer
This is conceptually similar to the reported bug, but does not depend on auditing. The fix is simple: just complete execution of the constructors. This exposed the fact that the link map for statically linked executables does not have l_init_called set, even though constructors have run. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 9897ced8e78db5d813166a7ccccfd5a42c69ef20)
2025-08-18ldconfig: Move endswithn into a new header fileAdam Sampson
is_gdb_python_file is doing a similar test, so it can use this helper function as well. Signed-off-by: Adam Sampson <ats@offog.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit ed2b8d3a866eb37e069f6a71bdf10421cd4c5e54)
2025-08-15x86-64: Add GLIBC_ABI_DT_X86_64_PLT [BZ #33212]H.J. Lu
When the linker -z mark-plt option is used to add DT_X86_64_PLT, DT_X86_64_PLTSZ and DT_X86_64_PLTENT, the r_addend field of the R_X86_64_JUMP_SLOT relocation stores the offset of the indirect branch instruction. However, glibc versions without the commit: commit f8587a61892cbafd98ce599131bf4f103466f084 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri May 20 19:21:48 2022 -0700 x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLOT. Since linkers always set their r_addends to 0, we can ignore their r_addends. Reviewed-by: Fangrui Song <maskray@google.com> won't ignore the r_addend value in the R_X86_64_JUMP_SLOT relocation. Such programs and shared libraries will fail at run-time randomly. Add GLIBC_ABI_DT_X86_64_PLT version to indicate that glibc is compatible with DT_X86_64_PLT. The linker can add the glibc GLIBC_ABI_DT_X86_64_PLT version dependency whenever -z mark-plt is passed to the linker. The resulting programs and shared libraries will fail to load at run-time against libc.so without the GLIBC_ABI_DT_X86_64_PLT version, instead of failing randomly. This fixes BZ #33212. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> (cherry picked from commit 399384e0c8193e31aea014220ccfa24300ae5938)
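The r_addend rule described above can be sketched in isolation. This is a simplified illustration, not glibc's relocation code: the function name sketch_reloc_value is hypothetical, only three relocation types are handled, and the types and constants are assumed to come from the Linux <elf.h> header.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: compute the value to store for RELOC given the
   resolved SYMBOL_VALUE.  Returns false for types this sketch omits.  */
static bool
sketch_reloc_value (const Elf64_Rela *reloc, uint64_t symbol_value,
                    uint64_t *value)
{
  switch (ELF64_R_TYPE (reloc->r_info))
    {
    case R_X86_64_GLOB_DAT:
    case R_X86_64_JUMP_SLOT:
      /* r_addend is ignored, so a nonzero addend written by
         -z mark-plt does not change the stored address.  */
      *value = symbol_value;
      return true;

    case R_X86_64_64:
      /* Ordinary absolute relocation: symbol value plus addend.  */
      *value = symbol_value + (uint64_t) reloc->r_addend;
      return true;

    default:
      return false;
    }
}
```

A loader that instead added r_addend for R_X86_64_JUMP_SLOT would store an address offset by the marked PLT position, which is the run-time failure the new version dependency is meant to turn into a load-time error.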