<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glibc.git, branch release/2.40/master</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/'/>
<entry>
<title>ppc64le: Restore optimized strncmp for power10</title>
<updated>2025-11-21T06:40:03+00:00</updated>
<author>
<name>Sachin Monga</name>
<email>smonga@linux.ibm.com</email>
</author>
<published>2025-11-21T06:39:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=4bfd05e2b8f69a1178272c60c17961a61e4e09b3'/>
<id>4bfd05e2b8f69a1178272c60c17961a61e4e09b3</id>
<content type='text'>
This patch addresses the actual cause of CVE-2025-5745

The vector non-volatile registers are not used anymore for
32 byte load and comparison operation

Additionally, the assembler workaround used earlier for the
instruction lxvp is replaced with actual instruction.

Signed-off-by: Sachin Monga &lt;smonga@linux.ibm.com&gt;
Co-authored-by: Paul Murphy &lt;paumurph@redhat.com&gt;
(cherry picked from commit 2ea943f7d487d6a4166658b32af7c5365889fc34)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch addresses the actual cause of CVE-2025-5745

The vector non-volatile registers are not used anymore for
32 byte load and comparison operation

Additionally, the assembler workaround used earlier for the
instruction lxvp is replaced with actual instruction.

Signed-off-by: Sachin Monga &lt;smonga@linux.ibm.com&gt;
Co-authored-by: Paul Murphy &lt;paumurph@redhat.com&gt;
(cherry picked from commit 2ea943f7d487d6a4166658b32af7c5365889fc34)
</pre>
</div>
</content>
</entry>
<entry>
<title>ppc64le: Restore optimized strcmp for power10</title>
<updated>2025-11-20T15:46:42+00:00</updated>
<author>
<name>Sachin Monga</name>
<email>smonga@linux.ibm.com</email>
</author>
<published>2025-10-07T08:17:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=a460dc7bd9dee24a2a305d87b8646ba683cd57de'/>
<id>a460dc7bd9dee24a2a305d87b8646ba683cd57de</id>
<content type='text'>
This patch addresses the actual cause of CVE-2025-5702

The vector non-volatile registers are not used anymore for
32 byte load and comparison operation

Additionally, the assembler workaround used earlier for the
instruction lxvp is replaced with actual instruction.

Signed-off-by: Sachin Monga &lt;smonga@linux.ibm.com&gt;
Co-authored-by: Paul Murphy &lt;paumurph@redhat.com&gt;
(cherry picked from commit 9a40b1cda519cc4f532acb6d020390829df3d81b)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch addresses the actual cause of CVE-2025-5702

The vector non-volatile registers are not used anymore for
32 byte load and comparison operation

Additionally, the assembler workaround used earlier for the
instruction lxvp is replaced with actual instruction.

Signed-off-by: Sachin Monga &lt;smonga@linux.ibm.com&gt;
Co-authored-by: Paul Murphy &lt;paumurph@redhat.com&gt;
(cherry picked from commit 9a40b1cda519cc4f532acb6d020390829df3d81b)
</pre>
</div>
</content>
</entry>
<entry>
<title>AArch64: Fix and improve SVE pow(f) special cases</title>
<updated>2025-11-18T16:21:55+00:00</updated>
<author>
<name>Pierre Blanchard</name>
<email>pierre.blanchard@arm.com</email>
</author>
<published>2025-11-18T15:09:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=a07f8630bf8b99ebc7f1df829e4c46b84a3ff4cc'/>
<id>a07f8630bf8b99ebc7f1df829e4c46b84a3ff4cc</id>
<content type='text'>
powf:

Update scalar special case function to best use new interface.

pow:

Make specialcase NOINLINE to prevent str/ldr leaking in fast path.
Remove depency in sv_call2, as new callback impl is not a
performance gain.
Replace with vectorised specialcase since structure of scalar
routine is fairly simple.

Throughput gain of about 5-10% on V1 for large values and 25% for subnormal `x`.

Reviewed-by: Wilco Dijkstra  &lt;Wilco.Dijkstra@arm.com&gt;
(cherry picked from commit bb6519de1e6fe73d79bc71588ec4e5668907f080)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
powf:

Update scalar special case function to best use new interface.

pow:

Make specialcase NOINLINE to prevent str/ldr leaking in fast path.
Remove depency in sv_call2, as new callback impl is not a
performance gain.
Replace with vectorised specialcase since structure of scalar
routine is fairly simple.

Throughput gain of about 5-10% on V1 for large values and 25% for subnormal `x`.

Reviewed-by: Wilco Dijkstra  &lt;Wilco.Dijkstra@arm.com&gt;
(cherry picked from commit bb6519de1e6fe73d79bc71588ec4e5668907f080)
</pre>
</div>
</content>
</entry>
<entry>
<title>AArch64: Fix instability in AdvSIMD tan</title>
<updated>2025-11-18T16:18:33+00:00</updated>
<author>
<name>Joe Ramsay</name>
<email>Joe.Ramsay@arm.com</email>
</author>
<published>2025-11-06T18:26:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=b06fc96c20c30c28205725f3f60df4f5a7458bd3'/>
<id>b06fc96c20c30c28205725f3f60df4f5a7458bd3</id>
<content type='text'>
Previously presence of special-cases in one lane could affect the
results in other lanes due to unconditional scalar fallback. The old
WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has
been removed from AOR, making it easier to spot and fix this. 4%
improvement in throughput with GCC 14 on Neoverse V1. This bug is
present as far back as 2.39 (where tan was first introduced).

Reviewed-by: Wilco Dijkstra  &lt;Wilco.Dijkstra@arm.com&gt;
(cherry picked from commit 6c22823da57aa5218f717f569c04c9573c0448c5)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously presence of special-cases in one lane could affect the
results in other lanes due to unconditional scalar fallback. The old
WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has
been removed from AOR, making it easier to spot and fix this. 4%
improvement in throughput with GCC 14 on Neoverse V1. This bug is
present as far back as 2.39 (where tan was first introduced).

Reviewed-by: Wilco Dijkstra  &lt;Wilco.Dijkstra@arm.com&gt;
(cherry picked from commit 6c22823da57aa5218f717f569c04c9573c0448c5)
</pre>
</div>
</content>
</entry>
<entry>
<title>AArch64: Optimise SVE scalar callbacks</title>
<updated>2025-11-18T16:18:23+00:00</updated>
<author>
<name>Joe Ramsay</name>
<email>Joe.Ramsay@arm.com</email>
</author>
<published>2025-11-06T15:36:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=303363922cf027713ed887241cd9b59cf705fe11'/>
<id>303363922cf027713ed887241cd9b59cf705fe11</id>
<content type='text'>
Instead of using SVE instructions to marshall special results into the
correct lane, just write the entire vector (and the predicate) to
memory, then use cheaper scalar operations.

Geomean speedup of 16% in special intervals on Neoverse with GCC 14.

Reviewed-by: Wilco Dijkstra  &lt;Wilco.Dijkstra@arm.com&gt;
(cherry picked from commit 5b82fb18827e962af9f080fdf3c1a69802783f67)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of using SVE instructions to marshall special results into the
correct lane, just write the entire vector (and the predicate) to
memory, then use cheaper scalar operations.

Geomean speedup of 16% in special intervals on Neoverse with GCC 14.

Reviewed-by: Wilco Dijkstra  &lt;Wilco.Dijkstra@arm.com&gt;
(cherry picked from commit 5b82fb18827e962af9f080fdf3c1a69802783f67)
</pre>
</div>
</content>
</entry>
<entry>
<title>aarch64: fix includes in SME tests</title>
<updated>2025-11-13T14:18:42+00:00</updated>
<author>
<name>Yury Khrustalev</name>
<email>yury.khrustalev@arm.com</email>
</author>
<published>2025-11-11T11:40:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=a66680adf3b2266c177f94f3f63e4b182e6362fe'/>
<id>a66680adf3b2266c177f94f3f63e4b182e6362fe</id>
<content type='text'>
Use the correct include for the SIGCHLD macro: signal.h

Reviewed-by: Wilco Dijkstra  &lt;Wilco.Dijkstra@arm.com&gt;
(cherry picked from commit a9c426bcca59a9e228c4fbe75e75154217ec4ada)
(cherry picked from commit 17c3eab387c3ceb6972e57888a89b1480793f81a)
(cherry picked from commit 215e9155ea06064342151d05446ae51da16e0f65)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use the correct include for the SIGCHLD macro: signal.h

Reviewed-by: Wilco Dijkstra  &lt;Wilco.Dijkstra@arm.com&gt;
(cherry picked from commit a9c426bcca59a9e228c4fbe75e75154217ec4ada)
(cherry picked from commit 17c3eab387c3ceb6972e57888a89b1480793f81a)
(cherry picked from commit 215e9155ea06064342151d05446ae51da16e0f65)
</pre>
</div>
</content>
</entry>
<entry>
<title>aarch64: fix cfi directives around __libc_arm_za_disable</title>
<updated>2025-11-13T14:18:42+00:00</updated>
<author>
<name>Yury Khrustalev</name>
<email>yury.khrustalev@arm.com</email>
</author>
<published>2025-10-28T11:01:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=1c0ad5ea63b2e5d39eb55f734b0e2bea7f766523'/>
<id>1c0ad5ea63b2e5d39eb55f734b0e2bea7f766523</id>
<content type='text'>
Incorrect CFI directive corrupted call stack information
and prevented debuggers from correctly displaying call
stack information.

Reviewed-by: Adhemerval Zanella  &lt;adhemerval.zanella@linaro.org&gt;
(cherry picked from commit 2f77aec043f61e8533487850b11941a640ae2dea)
(cherry picked from commit de1fe81f471496366580ad728b8986a3424b2fd7)
(cherry picked from commit 5bf8ee7ad559fd60bedd3f5ec831d0b12b5000b8)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Incorrect CFI directive corrupted call stack information
and prevented debuggers from correctly displaying call
stack information.

Reviewed-by: Adhemerval Zanella  &lt;adhemerval.zanella@linaro.org&gt;
(cherry picked from commit 2f77aec043f61e8533487850b11941a640ae2dea)
(cherry picked from commit de1fe81f471496366580ad728b8986a3424b2fd7)
(cherry picked from commit 5bf8ee7ad559fd60bedd3f5ec831d0b12b5000b8)
</pre>
</div>
</content>
</entry>
<entry>
<title>aarch64: tests for SME</title>
<updated>2025-11-13T14:18:42+00:00</updated>
<author>
<name>Yury Khrustalev</name>
<email>yury.khrustalev@arm.com</email>
</author>
<published>2025-09-26T09:03:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=df3bcda9ac85eb720d27bbbc469cdc898cc9b02b'/>
<id>df3bcda9ac85eb720d27bbbc469cdc898cc9b02b</id>
<content type='text'>
This commit adds tests for the following use cases relevant to handing of
the SME state:

 - fork() and vfork()
 - clone() and clone3()
 - signal handler

While most cases are trivial, the case of clone3() is more complicated since
the clone3() symbol is not public in Glibc.

To avoid having to check all possible ways clone3() may be called via other
public functions (e.g. vfork() or pthread_create()), we put together a test
that links directly with clone3.o. All the existing functions that have calls
to clone3() may not actually use it, in which case the outcome of such tests
would be unexpected. Having a direct call to the clone3() symbol in the test
allows to check precisely what we need to test: that the __arm_za_disable()
function is indeed called and has the desired effect.

Linking to clone3.o also requires linking to __arm_za_disable.o that in
turn requires the _dl_hwcap2 hidden symbol which to provide in the test
and initialise it before using.

Co-authored-by: Adhemerval Zanella Netto &lt;adhemerval.zanella@linaro.org&gt;
Reviewed-by: Adhemerval Zanella  &lt;adhemerval.zanella@linaro.org&gt;
(cherry picked from commit ecb0fc2f0f839f36cd2a106283142c9df8ea8214)
(cherry picked from commit 71874f167aa5bb1538ff7e394beaacee28ebe65f)
(cherry picked from commit e4ffcf32b9213352917dcf7dc43adcaa0ff76503)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit adds tests for the following use cases relevant to handing of
the SME state:

 - fork() and vfork()
 - clone() and clone3()
 - signal handler

While most cases are trivial, the case of clone3() is more complicated since
the clone3() symbol is not public in Glibc.

To avoid having to check all possible ways clone3() may be called via other
public functions (e.g. vfork() or pthread_create()), we put together a test
that links directly with clone3.o. All the existing functions that have calls
to clone3() may not actually use it, in which case the outcome of such tests
would be unexpected. Having a direct call to the clone3() symbol in the test
allows to check precisely what we need to test: that the __arm_za_disable()
function is indeed called and has the desired effect.

Linking to clone3.o also requires linking to __arm_za_disable.o that in
turn requires the _dl_hwcap2 hidden symbol which to provide in the test
and initialise it before using.

Co-authored-by: Adhemerval Zanella Netto &lt;adhemerval.zanella@linaro.org&gt;
Reviewed-by: Adhemerval Zanella  &lt;adhemerval.zanella@linaro.org&gt;
(cherry picked from commit ecb0fc2f0f839f36cd2a106283142c9df8ea8214)
(cherry picked from commit 71874f167aa5bb1538ff7e394beaacee28ebe65f)
(cherry picked from commit e4ffcf32b9213352917dcf7dc43adcaa0ff76503)
</pre>
</div>
</content>
</entry>
<entry>
<title>aarch64: clear ZA state of SME before clone and clone3 syscalls</title>
<updated>2025-11-13T14:18:42+00:00</updated>
<author>
<name>Yury Khrustalev</name>
<email>yury.khrustalev@arm.com</email>
</author>
<published>2025-09-25T14:54:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=899ebf35691f01c357fc582b3e88db87accb5ee1'/>
<id>899ebf35691f01c357fc582b3e88db87accb5ee1</id>
<content type='text'>
This change adds a call to the __arm_za_disable() function immediately
before the SVC instruction inside clone() and clone3() wrappers. It also
adds a macro for inline clone() used in fork() and adds the same call to
the vfork implementation. This sets the ZA state of SME to "off" on return
from these functions (for both the child and the parent).

The __arm_za_disable() function is described in [1] (8.1.3). Note that
the internal Glibc name for this function is __libc_arm_za_disable().

When this change was originally proposed [2,3], it generated a long
discussion where several questions and concerns were raised. Here we
will address these concerns and explain why this change is useful and,
in fact, necessary.

In a nutshell, a C library that conforms to the AAPCS64 spec [1] (pertinent
to this change, mainly, the chapters 6.2 and 6.6), should have a call to the
__arm_za_disable() function in clone() and clone3() wrappers. The following
explains in detail why this is the case.

When we consider using the __arm_za_disable() function inside the clone()
and clone3() libc wrappers, we talk about the C library subroutines clone()
and clone3() rather than the syscalls with similar names. In the current
version of Glibc, clone() is public and clone3() is private, but it being
private is not pertinent to this discussion.

We will begin with stating that this change is NOT a bug fix for something
in the kernel. The requirement to call __arm_za_disable() does NOT come from
the kernel. It also is NOT needed to satisfy a contract between the kernel
and userspace. This is why it is not for the kernel documentation to describe
this requirement. This requirement is instead needed to satisfy a pure userspace
scheme outlined in [1] and to make sure that software that uses Glibc (or any
other C library that has correct handling of SME states (see below)) conforms
to [1] without having to unnecessarily become SME-aware thus losing portability.

To recap (see [1] (6.2)), SME extension defines SME state which is part of
processor state. Part of this SME state is ZA state that is necessary to
manage ZA storage register in the context of the ZA lazy saving scheme [1]
(6.6). This scheme exists because it would be challenging to handle ZA
storage of SME in either callee-saved or caller-saved manner.

There are 3 kinds of ZA state that are defined in terms of the PSTATE.ZA
bit and the TPIDR2_EL0 register (see [1] (6.6.3)):

- "off":       PSTATE.ZA == 0
- "active":    PSTATE.ZA == 1 TPIDR2_EL0 == null
- "dormant":   PSTATE.ZA == 1 TPIDR2_EL0 != null

As [1] (6.7.2) outlines, every subroutine has exactly one SME-interface
depending on the permitted ZA-states on entry and on normal return from
a call to this subroutine. Callers of a subroutine must know and respect
the ZA-interface of the subroutines they are using. Using a subroutine
in a way that is not permitted by its ZA-interface is undefined behaviour.

In particular, clone() and clone3() (the C library functions) have the
ZA-private interface. This means that the permitted ZA-states on entry
are "off" and "dormant" and that the permitted states on return are "off"
or "dormant" (but if and only if it was "dormant" on entry).

This means that both functions in question should correctly handle both
"off" and "dormant" ZA-states on entry. The conforming states on return
are "off" and "dormant" (if inbound state was already "dormant").

This change ensures that the ZA-state on return is always "off". Note,
that, in the context of clone() and clone3(), "on return" means a point
when execution resumes at certain address after transferring from clone()
or clone3(). For the caller (we may refer to it as "parent") this is the
return address in the link register where the RET instruction jumps. For
the "child", this is the target branch address.

So, the "off" state on return is permitted and conformant. Why can't we
retain the "dormant" state? In theory, we can, but we shouldn't, here is
why.

Every subroutine with a private-ZA interface, including clone() and clone3(),
must comply with the lazy saving scheme [1] (6.7.2). This puts additional
responsibility on a subroutine if ZA-state on return is "dormant" because
this state has special meaning. The "caller" (that is the place in code
where execution is transferred to, so this include both "parent" and "child")
may check the ZA-state and use it as per the spec of the "dormant" state that
is outlined in [1] (6.6.6 and 6.6.7).

Conforming to this would require more code inside of clone() and clone3()
which hardly is desirable.

For the return to "parent" this could be achieved in theory, but given that
neither clone() nor clone3() are supposed to be used in the middle of an
SME operation, if wouldn't be useful. For the "return" to "child" this
would be particularly difficult to achieve given the complexity of these
functions and their interfaces. Most importantly, it would be illegal
and somewhat meaningless to allow a "child" to start execution in the
"dormant" ZA-state because the very essence of the "dormant" state implies
that there is a place to return and that there is some outer context that
we are allowed to interact with.

To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when the
execution resumes after a call to clone() or clone3() is correct and also
the most simple way to conform to [1].

Can there be situations when we can avoid calling __arm_za_disable()?

Calling __arm_za_disable() implies certain (sufficiently small) overhead,
so one might rightly ponder avoiding making a call to this function when
we can afford not to. The most trivial cases like this (e.g. when the
calling thread doesn't have access to SME or to the TPIDR2_EL0 register)
are already handled by this function (see [1] (8.1.3 and 8.1.2)). Reasoning
about other possible use cases would require making code inside clone() and
clone3() more complicated and it would defeat the point of trying to make
an optimisation of not calling __arm_za_disable().

Why can't the kernel do this instead?

The handling of SME state by the kernel is described in [4]. In short,
kernel must not impose a specific ZA-interface onto a userspace function.
Interaction with the kernel happens (among other thing) via system calls.
In Glibc many of the system calls (notably, including SYS_clone and
SYS_clone3) are used via wrappers, and the kernel has no control of them
and, moreover, it cannot dictate how these wrappers should behave because
it is simply outside of the kernel's remit.

However, in certain cases, the kernel may ensure that a "child" doesn't
start in an incorrect state. This is what is done by the recent change
included in 6.16 kernel [5]. This is not enough to ensure that code that
uses clone() and clone3() function conforms to [1] when it runs on a
system that provides SME, hence this change.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29

Reviewed-by: Adhemerval Zanella  &lt;adhemerval.zanella@linaro.org&gt;
(cherry picked from commit 27effb3d50424fb9634be77a2acd614b0386ff25)
(cherry picked from commit 256030b9842a10b1f22851b1de0c119761417544)
(cherry picked from commit 889ae4bdbb4a6fbf37c2303da8cdae3d18880d9e)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change adds a call to the __arm_za_disable() function immediately
before the SVC instruction inside clone() and clone3() wrappers. It also
adds a macro for inline clone() used in fork() and adds the same call to
the vfork implementation. This sets the ZA state of SME to "off" on return
from these functions (for both the child and the parent).

The __arm_za_disable() function is described in [1] (8.1.3). Note that
the internal Glibc name for this function is __libc_arm_za_disable().

When this change was originally proposed [2,3], it generated a long
discussion where several questions and concerns were raised. Here we
will address these concerns and explain why this change is useful and,
in fact, necessary.

In a nutshell, a C library that conforms to the AAPCS64 spec [1] (pertinent
to this change, mainly, the chapters 6.2 and 6.6), should have a call to the
__arm_za_disable() function in clone() and clone3() wrappers. The following
explains in detail why this is the case.

When we consider using the __arm_za_disable() function inside the clone()
and clone3() libc wrappers, we talk about the C library subroutines clone()
and clone3() rather than the syscalls with similar names. In the current
version of Glibc, clone() is public and clone3() is private, but it being
private is not pertinent to this discussion.

We will begin with stating that this change is NOT a bug fix for something
in the kernel. The requirement to call __arm_za_disable() does NOT come from
the kernel. It also is NOT needed to satisfy a contract between the kernel
and userspace. This is why it is not for the kernel documentation to describe
this requirement. This requirement is instead needed to satisfy a pure userspace
scheme outlined in [1] and to make sure that software that uses Glibc (or any
other C library that has correct handling of SME states (see below)) conforms
to [1] without having to unnecessarily become SME-aware thus losing portability.

To recap (see [1] (6.2)), SME extension defines SME state which is part of
processor state. Part of this SME state is ZA state that is necessary to
manage ZA storage register in the context of the ZA lazy saving scheme [1]
(6.6). This scheme exists because it would be challenging to handle ZA
storage of SME in either callee-saved or caller-saved manner.

There are 3 kinds of ZA state that are defined in terms of the PSTATE.ZA
bit and the TPIDR2_EL0 register (see [1] (6.6.3)):

- "off":       PSTATE.ZA == 0
- "active":    PSTATE.ZA == 1 TPIDR2_EL0 == null
- "dormant":   PSTATE.ZA == 1 TPIDR2_EL0 != null

As [1] (6.7.2) outlines, every subroutine has exactly one SME-interface
depending on the permitted ZA-states on entry and on normal return from
a call to this subroutine. Callers of a subroutine must know and respect
the ZA-interface of the subroutines they are using. Using a subroutine
in a way that is not permitted by its ZA-interface is undefined behaviour.

In particular, clone() and clone3() (the C library functions) have the
ZA-private interface. This means that the permitted ZA-states on entry
are "off" and "dormant" and that the permitted states on return are "off"
or "dormant" (but if and only if it was "dormant" on entry).

This means that both functions in question should correctly handle both
"off" and "dormant" ZA-states on entry. The conforming states on return
are "off" and "dormant" (if inbound state was already "dormant").

This change ensures that the ZA-state on return is always "off". Note,
that, in the context of clone() and clone3(), "on return" means a point
when execution resumes at certain address after transferring from clone()
or clone3(). For the caller (we may refer to it as "parent") this is the
return address in the link register where the RET instruction jumps. For
the "child", this is the target branch address.

So, the "off" state on return is permitted and conformant. Why can't we
retain the "dormant" state? In theory, we can, but we shouldn't, here is
why.

Every subroutine with a private-ZA interface, including clone() and clone3(),
must comply with the lazy saving scheme [1] (6.7.2). This puts additional
responsibility on a subroutine if ZA-state on return is "dormant" because
this state has special meaning. The "caller" (that is the place in code
where execution is transferred to, so this include both "parent" and "child")
may check the ZA-state and use it as per the spec of the "dormant" state that
is outlined in [1] (6.6.6 and 6.6.7).

Conforming to this would require more code inside of clone() and clone3()
which hardly is desirable.

For the return to "parent" this could be achieved in theory, but given that
neither clone() nor clone3() are supposed to be used in the middle of an
SME operation, if wouldn't be useful. For the "return" to "child" this
would be particularly difficult to achieve given the complexity of these
functions and their interfaces. Most importantly, it would be illegal
and somewhat meaningless to allow a "child" to start execution in the
"dormant" ZA-state because the very essence of the "dormant" state implies
that there is a place to return and that there is some outer context that
we are allowed to interact with.

To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when the
execution resumes after a call to clone() or clone3() is correct and also
the most simple way to conform to [1].

Can there be situations when we can avoid calling __arm_za_disable()?

Calling __arm_za_disable() implies certain (sufficiently small) overhead,
so one might rightly ponder avoiding making a call to this function when
we can afford not to. The most trivial cases like this (e.g. when the
calling thread doesn't have access to SME or to the TPIDR2_EL0 register)
are already handled by this function (see [1] (8.1.3 and 8.1.2)). Reasoning
about other possible use cases would require making code inside clone() and
clone3() more complicated and it would defeat the point of trying to make
an optimisation of not calling __arm_za_disable().

Why can't the kernel do this instead?

The handling of SME state by the kernel is described in [4]. In short,
kernel must not impose a specific ZA-interface onto a userspace function.
Interaction with the kernel happens (among other thing) via system calls.
In Glibc many of the system calls (notably, including SYS_clone and
SYS_clone3) are used via wrappers, and the kernel has no control of them
and, moreover, it cannot dictate how these wrappers should behave because
it is simply outside of the kernel's remit.

However, in certain cases, the kernel may ensure that a "child" doesn't
start in an incorrect state. This is what is done by the recent change
included in 6.16 kernel [5]. This is not enough to ensure that code that
uses clone() and clone3() function conforms to [1] when it runs on a
system that provides SME, hence this change.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29

Reviewed-by: Adhemerval Zanella  &lt;adhemerval.zanella@linaro.org&gt;
(cherry picked from commit 27effb3d50424fb9634be77a2acd614b0386ff25)
(cherry picked from commit 256030b9842a10b1f22851b1de0c119761417544)
(cherry picked from commit 889ae4bdbb4a6fbf37c2303da8cdae3d18880d9e)
</pre>
</div>
</content>
</entry>
<entry>
<title>aarch64: define macro for calling __libc_arm_za_disable</title>
<updated>2025-11-13T14:18:42+00:00</updated>
<author>
<name>Yury Khrustalev</name>
<email>yury.khrustalev@arm.com</email>
</author>
<published>2025-10-21T12:48:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=7af8db46d2493521cb8e0c13907f601a61236ebf'/>
<id>7af8db46d2493521cb8e0c13907f601a61236ebf</id>
<content type='text'>
A common sequence of instructions is used in several places
in assembly files, so define it in one place as an assembly
macro.

Note that PAC instructions are not included in the new macro
because they are redundant given how we call the arm_za_disable
function (return address is not saved on stack, so no need to
sign it).

(based on commits 6de12fc9ad56bc19fa6fcbd8ee502f29b5170d47
 and c0f0db2d59e0908057205b22b21dd9d626d780c1)

Reviewed-by: Carlos O'Donell &lt;carlos@redhat.com&gt;
(cherry picked from commit 1a0ee267147d002d66af29bcf3f5002d19b3c75a)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A common sequence of instructions is used in several places
in assembly files, so define it in one place as an assembly
macro.

Note that PAC instructions are not included in the new macro
because they are redundant given how we call the arm_za_disable
function (return address is not saved on stack, so no need to
sign it).

(based on commits 6de12fc9ad56bc19fa6fcbd8ee502f29b5170d47
 and c0f0db2d59e0908057205b22b21dd9d626d780c1)

Reviewed-by: Carlos O'Donell &lt;carlos@redhat.com&gt;
(cherry picked from commit 1a0ee267147d002d66af29bcf3f5002d19b3c75a)
</pre>
</div>
</content>
</entry>
</feed>
