llvm-project.git/libc/src/string, branch main

[libc] add an SVE implementation of strlen (#167259)

2025-11-11T00:15:34+00:00

This PR creates an SVE-based implementation for strlen by translating
from the AOR code in tree. Microbenchmark shows improvements against
NEON when N>=64. Although both implementations fall behind glibc by a
large margin,
this may be a good start point to explore SVE implementations.

Together with the PR:

1. Added two more tests of strlen with special nul symbols.
2. Added strlen's fuzzer and fix a typo in previous heap fuzzer.

```
=== strlen(16 bytes) ===
libc: 1.56115 ns/call, 9.54499 GiB/s
neon: 1.59393 ns/call, 9.34867 GiB/s
sve: 1.66097 ns/call, 8.97134 GiB/s

=== strlen(64 bytes) ===
libc: 2.06967 ns/call, 28.7991 GiB/s
neon: 2.59914 ns/call, 22.9325 GiB/s
sve: 2.58628 ns/call, 23.0465 GiB/s

=== strlen(256 bytes) ===
libc: 3.74165 ns/call, 63.7202 GiB/s
neon: 8.98243 ns/call, 26.5428 GiB/s
sve: 7.36426 ns/call, 32.3751 GiB/s

=== strlen(1024 bytes) ===
libc: 10.5327 ns/call, 90.5438 GiB/s
neon: 34.363 ns/call, 27.7529 GiB/s
sve: 26.9329 ns/call, 35.4092 GiB/s

=== strlen(4096 bytes) ===
libc: 37.7304 ns/call, 101.104 GiB/s
neon: 145.911 ns/call, 26.144 GiB/s
sve: 103.208 ns/call, 36.9612 GiB/s

=== strlen(1048576 bytes) ===
libc: 9623.4 ns/call, 101.478 GiB/s
neon: 36138.2 ns/call, 27.023 GiB/s
sve: 26605.6 ns/call, 36.7051 GiB/s
```

[libc] Fix stale char_ptr for find_first_character_wide read (#166594)

2025-11-06T14:48:14+00:00

On exit from the loop, char_ptr had not been updated to match block_ptr,
resulting in erroneous results. Moving all updates out of the loop fixes
that.

Adjust derefences to always be inside bounds checks.

[libc] Migrate ctype_utils to use char instead of int where applicable. (#166225)

2025-11-05T19:02:28+00:00

Functions like isalpha / tolower can operate on chars internally. This
allows us to get rid of unnecessary casts and open a way to creating
wchar_t overloads with the same names (e.g. for isalpha), that would
simplify templated code for conversion functions (see
315dfe5865962d8a3d60e21d1fffce5214fe54ef).

Add the int->char converstion to public entrypoints implementation
instead. We also need to introduce bounds check on the input argument
values - these functions' behavior is unspecified if the argument is
neither EOF nor fits in "unsigned char" range, but the tests we've had
verified that they always return false for small negative values. To
preserve this behavior, cover it explicitly.

Move LIBC_CONF_STRING_UNSAFE_WIDE_READ to top-level libc-configuration (#165046)

2025-10-27T21:39:46+00:00

This options sets a compile option when building sources inside the
string directory, and this option affects string_utils.h. But
string_utils.h is #included from more places than just the string
directory (such as from __support/CPP/string.h), leading to both
narrow-reads in those cases, but more seriously, ODR violations when the
two different string_length implementations are included int he same
program.

Having this option at the top level avoids this problem.

Revert "[libc] Implement branchless head-tail comparison for bcmp" (#162859)

2025-10-13T21:25:42+00:00

Reverts llvm/llvm-project#107540

This PR demonstrated improvements on micro-benchmarks but the gains did
not seem to materialize in production. We are reverting this change for
now to get more data. This PR might be reintegrated later once we're
more confident in its effects.

[libc] Use UMAXV.4S to reduce bcmp result.

2025-10-13T18:21:48+00:00

We can use UMAXV.4S to reduce the comparison result in a single
instruction. This improves performance by roughly 4% on Apple M1:

Summary
  bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 ran
    1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.02 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10

(1 = original, 2 = a variant of this patch that uses UMAXV.16B, 3 = this patch)

Reviewers: michaelrj-google, gchatelet, overmighty, SchrodingerZhu

Pull Request: https://github.com/llvm/llvm-project/pull/99260

[libc] Unify and extend no_sanitize attributes for strlen. (#161316)

2025-10-02T00:26:22+00:00

Fast strlen implementations (naive wide-reads, SIMD-based, and
x86_64/aarch64-optimized versions) all may perform
technically-out-of-bound reads, which leads to reports under ASan,
HWASan (on ARM machines), and also TSan (which also has the capability
to detect heap out-of-bound reads). So, we need to explicitly disable
instrumentation in all three cases.

Tragically, Clang didn't support `[[gnu::no_sanitize]]` syntax until
recently, and since we're supporting both GCC and Clang, we have to
revert to `__attribute__` syntax.

[libc][msvc] fix mathlib build on WoA (#161258)

2025-09-29T19:40:21+00:00

Fix build errors encountered when building math library on WoA.

1. Skip FEnv equality check for MSVC
2. Provide a placeholder type for vector types.

[libc] Update the memory helper functions for simd types (#160174)

2025-09-26T13:18:00+00:00

Summary:
This unifies the interface to just be a bunch of `load` and `store`
functions that optionally accept a mask / indices for gathers and
scatters with masks.

I had to rename this from `load` and `store` because it conflicts with
the other version in `op_generic`. I might just work around that with a
trait instead.

[libc] Clean up mask helpers after allowing implicit conversions (#158681)

2025-09-16T17:39:56+00:00

Summary:
I landed a change in clang that allows integral vectors to implicitly
convert to boolean ones. This means I can simplify the interface and
remove the need to cast to bool on every use. Also do some other
cleanups of the traits.