summaryrefslogtreecommitdiff
path: root/libc/src/string/memory_utils
AgeCommit message (Collapse)Author
2025-11-10[libc] add an SVE implementation of strlen (#167259)Schrodinger ZHU Yifan
This PR creates an SVE-based implementation for strlen by translating from the AOR code in tree. Microbenchmark shows improvements against NEON when N>=64. Although both implementations fall behind glibc by a large margin, this may be a good start point to explore SVE implementations. Together with the PR: 1. Added two more tests of strlen with special nul symbols. 2. Added strlen's fuzzer and fix a typo in previous heap fuzzer. ``` === strlen(16 bytes) === libc: 1.56115 ns/call, 9.54499 GiB/s neon: 1.59393 ns/call, 9.34867 GiB/s sve: 1.66097 ns/call, 8.97134 GiB/s === strlen(64 bytes) === libc: 2.06967 ns/call, 28.7991 GiB/s neon: 2.59914 ns/call, 22.9325 GiB/s sve: 2.58628 ns/call, 23.0465 GiB/s === strlen(256 bytes) === libc: 3.74165 ns/call, 63.7202 GiB/s neon: 8.98243 ns/call, 26.5428 GiB/s sve: 7.36426 ns/call, 32.3751 GiB/s === strlen(1024 bytes) === libc: 10.5327 ns/call, 90.5438 GiB/s neon: 34.363 ns/call, 27.7529 GiB/s sve: 26.9329 ns/call, 35.4092 GiB/s === strlen(4096 bytes) === libc: 37.7304 ns/call, 101.104 GiB/s neon: 145.911 ns/call, 26.144 GiB/s sve: 103.208 ns/call, 36.9612 GiB/s === strlen(1048576 bytes) === libc: 9623.4 ns/call, 101.478 GiB/s neon: 36138.2 ns/call, 27.023 GiB/s sve: 26605.6 ns/call, 36.7051 GiB/s ```
2025-10-13Revert "[libc] Implement branchless head-tail comparison for bcmp" (#162859)Guillaume Chatelet
Reverts llvm/llvm-project#107540 This PR demonstrated improvements on micro-benchmarks but the gains did not seem to materialize in production. We are reverting this change for now to get more data. This PR might be reintegrated later once we're more confident in its effects.
2025-10-13[libc] Use UMAXV.4S to reduce bcmp result.Peter Collingbourne
We can use UMAXV.4S to reduce the comparison result in a single instruction. This improves performance by roughly 4% on Apple M1: Summary bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 ran 1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.02 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.05 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 (1 = original, 2 = a variant of this patch that uses UMAXV.16B, 3 = this patch) Reviewers: michaelrj-google, gchatelet, overmighty, SchrodingerZhu Pull Request: https://github.com/llvm/llvm-project/pull/99260
2025-10-01[libc] Unify and extend no_sanitize attributes for strlen. (#161316)Alexey Samsonov
Fast strlen implementations (naive wide-reads, SIMD-based, and x86_64/aarch64-optimized versions) all may perform technically-out-of-bound reads, which leads to reports under ASan, HWASan (on ARM machines), and also TSan (which also has the capability to detect heap out-of-bound reads). So, we need to explicitly disable instrumentation in all three cases. Tragically, Clang didn't support `[[gnu::no_sanitize]]` syntax until recently, and since we're supporting both GCC and Clang, we have to revert to `__attribute__` syntax.
2025-09-29[libc][msvc] fix mathlib build on WoA (#161258)Schrodinger ZHU Yifan
Fix build errors encountered when building math library on WoA. 1. Skip FEnv equality check for MSVC 2. Provide a placeholder type for vector types.
2025-09-26[libc] Update the memory helper functions for simd types (#160174)Joseph Huber
Summary: This unifies the interface to just be a bunch of `load` and `store` functions that optionally accept a mask / indices for gathers and scatters with masks. I had to rename this from `load` and `store` because it conflicts with the other version in `op_generic`. I might just work around that with a trait instead.
2025-09-16[libc] Clean up mask helpers after allowing implicit conversions (#158681)Joseph Huber
Summary: I landed a change in clang that allows integral vectors to implicitly convert to boolean ones. This means I can simplify the interface and remove the need to cast to bool on every use. Also do some other cleanups of the traits.
2025-09-12[libc] Some MSVC compatibility changes for src/string/memory_utils. (#158393)lntue
2025-09-02[libc] Add more elementwise wrapper functions (#156515)Joseph Huber
Summary: Fills out some of the missing fundamental floating point operations. These just wrap the elementwise builtin of the same name.
2025-09-02 [libc] Implement generic SIMD helper 'simd.h' and implement strlen (#152605)Joseph Huber
Summary: This PR introduces a new 'simd.h' header that implements an interface similar to the proposed `stdx::simd` in C++. However, we instead wrap around the LLVM internal type. This makes heavy use of the clang vector extensions and boolean vectors, instead using primitive vector types instead of a class (many benefits to this). I use this interface to implement a generic strlen implementation, but propse we use this for math. Right now this requires a feature only introduced in clang-22.
2025-08-21Fix wide read defaultsJoseph Huber
2025-08-21Reapply "[libc] Enable wide-read memory operations by default on Linux ↵Joseph Huber
(#154602)" (#154640) Reland afterr the sanitizer and arm32 builds complained.
2025-08-20Revert "[libc] Enable wide-read memory operations by default on Linux (#154602)"Joseph Huber
This reverts commit c80d1483c6d787edf62ff9e86b1e97af5eb5abf9.
2025-08-20[libc] Enable wide-read memory operations by default on Linux (#154602)Joseph Huber
Summary: This patch changes the linux build to use the wide reads on the memory operations by default. These memory functions will now potentially read outside of the bounds explicitly allowed by the current function. While technically undefined behavior in the standard, plenty of C library implementations do this. it will not cause a segmentation fault on linux as long as you do not cross a page boundary, and because we are only *reading* memory it should not have atomic effects.
2025-08-19Add vector-based strlen implementation for x86_64 and aarch64 (#152389)Sterling-Augustine
These replace the default LIBC_CONF_STRING_UNSAFE_WIDE_READ implementation on x86_64 and aarch64. These are substantially faster than both the character-by-character implementation and the original unsafe_wide_read implementation. Some below I have been unable to performance-test the aarch64 version, but I suspect speedups similar to avx2. ``` Function: strlen Variant: char wide ull sse2 avx2 avx512 ============================================================================================================================================================= length=1, alignment=1: 13.18 20.47 (-55.24%) 20.21 (-53.27%) 32.50 (-146.54%) 26.05 (-97.61%) 18.03 (-36.74%) length=1, alignment=0: 12.80 34.92 (-172.89%) 20.01 (-56.39%) 17.52 (-36.86%) 17.78 (-38.92%) 18.04 (-40.94%) length=2, alignment=2: 9.91 19.02 (-91.95%) 12.64 (-27.52%) 11.06 (-11.59%) 9.48 ( 4.38%) 9.48 ( 4.34%) length=2, alignment=0: 9.56 26.88 (-181.24%) 12.64 (-32.31%) 11.06 (-15.73%) 11.06 (-15.72%) 11.83 (-23.80%) length=3, alignment=3: 8.31 10.45 (-25.84%) 8.28 ( 0.32%) 8.28 ( 0.36%) 6.21 ( 25.28%) 6.21 ( 25.24%) length=3, alignment=0: 8.39 14.53 (-73.20%) 8.28 ( 1.33%) 7.24 ( 13.69%) 7.56 ( 9.94%) 7.25 ( 13.65%) length=4, alignment=4: 9.84 21.76 (-121.24%) 15.55 (-58.11%) 6.57 ( 33.18%) 5.02 ( 48.98%) 6.00 ( 39.00%) length=4, alignment=0: 8.64 13.70 (-58.51%) 7.28 ( 15.73%) 6.37 ( 26.31%) 6.36 ( 26.36%) 6.36 ( 26.36%) length=5, alignment=5: 11.85 23.81 (-100.97%) 12.17 ( -2.67%) 5.68 ( 52.09%) 4.87 ( 58.94%) 6.48 ( 45.33%) length=5, alignment=0: 11.82 13.64 (-15.42%) 7.27 ( 38.45%) 6.36 ( 46.15%) 6.37 ( 46.11%) 6.36 ( 46.14%) length=6, alignment=6: 10.50 19.37 (-84.56%) 13.64 (-29.93%) 6.54 ( 37.71%) 6.89 ( 34.35%) 9.45 ( 10.01%) length=6, alignment=0: 14.96 14.05 ( 6.04%) 6.49 ( 56.62%) 5.68 ( 62.04%) 5.68 ( 62.04%) 13.15 ( 12.05%) length=7, alignment=7: 10.97 18.02 (-64.35%) 14.59 (-33.06%) 6.36 ( 41.96%) 5.46 ( 50.25%) 5.46 ( 50.25%) length=7, alignment=0: 10.96 15.76 (-43.77%) 15.37 (-40.15%) 6.96 ( 36.51%) 5.68 ( 48.22%) 7.04 ( 35.83%) length=4, alignment=0: 8.66 13.69 (-58.02%) 7.28 ( 16.00%) 6.37 ( 26.44%) 6.37 ( 26.52%) 6.61 ( 23.74%) length=4, alignment=7: 8.87 17.35 (-95.73%) 12.18 (-37.39%) 5.68 ( 35.94%) 4.87 ( 45.11%) 6.00 ( 32.36%) length=4, alignment=2: 8.67 10.05 (-15.91%) 7.28 ( 16.01%) 7.37 ( 15.02%) 5.46 ( 37.02%) 5.47 ( 36.89%) length=2, alignment=2: 5.64 10.01 (-77.64%) 7.29 (-29.34%) 6.37 (-13.04%) 5.46 ( 3.19%) 5.46 ( 3.19%) length=8, alignment=0: 12.78 16.52 (-29.33%) 18.27 (-43.00%) 11.82 ( 7.47%) 9.83 ( 23.03%) 11.46 ( 10.27%) length=8, alignment=7: 14.24 17.30 (-21.49%) 12.16 ( 14.59%) 5.68 ( 60.14%) 4.87 ( 65.83%) 6.23 ( 56.28%) length=8, alignment=3: 12.34 26.15 (-111.98%) 12.20 ( 1.14%) 6.50 ( 47.34%) 4.87 ( 60.54%) 6.18 ( 49.94%) length=5, alignment=3: 10.95 19.74 (-80.30%) 12.17 (-11.11%) 5.68 ( 48.16%) 4.87 ( 55.56%) 5.96 ( 45.55%) length=16, alignment=0: 20.33 29.29 (-44.08%) 36.18 (-77.97%) 5.68 ( 72.06%) 5.68 ( 72.08%) 10.60 ( 47.86%) length=16, alignment=7: 19.29 17.52 ( 9.16%) 12.98 ( 32.73%) 7.05 ( 63.47%) 4.87 ( 74.75%) 6.23 ( 67.71%) length=16, alignment=4: 20.54 25.18 (-22.56%) 15.42 ( 24.92%) 7.31 ( 64.43%) 4.87 ( 76.29%) 5.98 ( 70.88%) length=10, alignment=4: 14.59 21.26 (-45.71%) 12.17 ( 16.58%) 5.68 ( 61.07%) 4.87 ( 66.65%) 6.00 ( 58.91%) length=32, alignment=0: 35.46 22.00 ( 37.95%) 16.22 ( 54.26%) 7.32 ( 79.35%) 5.68 ( 83.98%) 7.01 ( 80.22%) length=32, alignment=7: 35.23 24.14 ( 31.48%) 16.22 ( 53.96%) 7.30 ( 79.28%) 8.76 ( 75.12%) 6.14 ( 82.58%) length=32, alignment=5: 35.16 28.56 ( 18.76%) 16.22 ( 53.87%) 7.30 ( 79.23%) 6.77 ( 80.75%) 9.82 ( 72.07%) length=21, alignment=5: 26.47 27.66 ( -4.49%) 15.04 ( 43.17%) 6.90 ( 73.95%) 4.87 ( 81.60%) 6.04 ( 77.18%) length=64, alignment=0: 66.45 25.16 ( 62.14%) 22.70 ( 65.83%) 12.99 ( 80.44%) 7.47 ( 88.77%) 8.70 ( 86.90%) length=64, alignment=7: 64.75 27.78 ( 57.10%) 22.72 ( 64.91%) 10.85 ( 83.25%) 7.46 ( 88.48%) 8.68 ( 86.60%) length=64, alignment=6: 67.26 28.58 ( 57.51%) 22.70 ( 66.24%) 11.26 ( 83.25%) 9.46 ( 85.94%) 13.90 ( 79.33%) length=42, alignment=6: 73.42 27.97 ( 61.91%) 19.46 ( 73.49%) 8.92 ( 87.84%) 6.49 ( 91.16%) 6.00 ( 91.83%) length=128, alignment=0: 172.07 39.18 ( 77.23%) 35.68 ( 79.26%) 13.02 ( 92.43%) 12.98 ( 92.46%) 9.76 ( 94.33%) length=128, alignment=7: 163.98 43.79 ( 73.30%) 36.03 ( 78.03%) 15.68 ( 90.44%) 11.35 ( 93.08%) 10.51 ( 93.59%) length=128, alignment=7: 185.86 40.27 ( 78.33%) 36.04 ( 80.61%) 13.78 ( 92.58%) 11.35 ( 93.89%) 10.49 ( 94.36%) length=85, alignment=7: 121.61 55.66 ( 54.23%) 32.34 ( 73.40%) 13.88 ( 88.59%) 7.30 ( 94.00%) 8.72 ( 92.83%) length=256, alignment=0: 295.54 66.48 ( 77.50%) 61.63 ( 79.15%) 19.54 ( 93.39%) 12.97 ( 95.61%) 12.45 ( 95.79%) length=256, alignment=7: 308.06 78.92 ( 74.38%) 61.63 ( 80.00%) 22.90 ( 92.57%) 12.97 ( 95.79%) 13.23 ( 95.71%) length=256, alignment=8: 295.32 65.83 ( 77.71%) 61.62 ( 79.13%) 23.19 ( 92.15%) 12.97 ( 95.61%) 13.50 ( 95.43%) length=170, alignment=8: 234.39 48.79 ( 79.18%) 43.79 ( 81.32%) 16.22 ( 93.08%) 13.97 ( 94.04%) 10.48 ( 95.53%) length=512, alignment=0: 563.75 116.89 ( 79.27%) 114.99 ( 79.60%) 62.71 ( 88.88%) 19.58 ( 96.53%) 17.76 ( 96.85%) length=512, alignment=7: 580.53 120.91 ( 79.17%) 114.47 ( 80.28%) 37.75 ( 93.50%) 19.55 ( 96.63%) 18.68 ( 96.78%) length=512, alignment=9: 584.05 128.35 ( 78.02%) 114.74 ( 80.35%) 39.09 ( 93.31%) 19.76 ( 96.62%) 18.71 ( 96.80%) length=341, alignment=9: 405.84 90.87 ( 77.61%) 78.79 ( 80.59%) 28.77 ( 92.91%) 14.60 ( 96.40%) 14.15 ( 96.51%) length=1024, alignment=0: 1143.61 247.03 ( 78.40%) 243.70 ( 78.69%) 75.59 ( 93.39%) 67.02 ( 94.14%) 28.99 ( 97.46%) length=1024, alignment=7: 1124.55 267.87 ( 76.18%) 259.16 ( 76.95%) 64.96 ( 94.22%) 33.05 ( 97.06%) 30.91 ( 97.25%) length=1024, alignment=10: 1459.58 257.79 ( 82.34%) 239.91 ( 83.56%) 65.00 ( 95.55%) 33.10 ( 97.73%) 30.33 ( 97.92%) length=682, alignment=10: 732.89 163.67 ( 77.67%) 170.54 ( 76.73%) 46.48 ( 93.66%) 24.32 ( 96.68%) 21.44 ( 97.07%) length=2048, alignment=0: 2141.96 451.61 ( 78.92%) 448.00 ( 79.08%) 133.24 ( 93.78%) 61.22 ( 97.14%) 80.08 ( 96.26%) length=2048, alignment=7: 2145.05 458.26 ( 78.64%) 449.99 ( 79.02%) 140.19 ( 93.46%) 60.26 ( 97.19%) 51.71 ( 97.59%) length=2048, alignment=11: 2162.61 463.37 ( 78.57%) 448.07 ( 79.28%) 140.29 ( 93.51%) 59.51 ( 97.25%) 51.59 ( 97.61%) length=1365, alignment=11: 1439.74 322.86 ( 77.58%) 310.84 ( 78.41%) 116.08 ( 91.94%) 42.43 ( 97.05%) 36.15 ( 97.49%) length=4096, alignment=0: 4278.68 871.60 ( 79.63%) 865.25 ( 79.78%) 252.50 ( 94.10%) 161.17 ( 96.23%) 94.97 ( 97.78%) length=4096, alignment=7: 4253.01 871.62 ( 79.51%) 864.21 ( 79.68%) 243.90 ( 94.27%) 171.17 ( 95.98%) 95.14 ( 97.76%) length=4096, alignment=12: 4252.18 879.66 ( 79.31%) 863.68 ( 79.69%) 244.26 ( 94.26%) 185.36 ( 95.64%) 93.61 ( 97.80%) length=2730, alignment=12: 2868.22 597.65 ( 79.16%) 586.22 ( 79.56%) 175.09 ( 93.90%) 120.35 ( 95.80%) 101.35 ( 96.47%) length=0, alignment=0: 4.87 8.11 (-66.73%) 6.49 (-33.34%) 5.80 (-19.26%) 5.68 (-16.67%) 6.86 (-40.91%) length=32, alignment=0: 33.82 22.36 ( 33.89%) 17.03 ( 49.66%) 7.30 ( 78.42%) 5.68 ( 83.22%) 7.50 ( 77.83%) length=64, alignment=0: 66.20 26.76 ( 59.58%) 23.22 ( 64.93%) 12.99 ( 80.37%) 7.34 ( 88.92%) 8.44 ( 87.25%) length=96, alignment=0: 130.26 31.62 ( 75.72%) 30.00 ( 76.97%) 11.39 ( 91.26%) 10.54 ( 91.91%) 8.68 ( 93.34%) length=128, alignment=0: 164.66 39.05 ( 76.29%) 35.68 ( 78.33%) 13.07 ( 92.07%) 12.97 ( 92.12%) 9.59 ( 94.18%) length=160, alignment=0: 196.63 45.18 ( 77.02%) 42.16 ( 78.56%) 14.65 ( 92.55%) 10.87 ( 94.47%) 9.31 ( 95.27%) length=192, alignment=0: 225.50 52.71 ( 76.63%) 49.61 ( 78.00%) 16.22 ( 92.81%) 11.36 ( 94.96%) 11.08 ( 95.09%) length=224, alignment=0: 261.08 57.57 ( 77.95%) 55.82 ( 78.62%) 17.84 ( 93.17%) 12.16 ( 95.34%) 11.51 ( 95.59%) length=256, alignment=0: 295.13 65.56 ( 77.79%) 62.59 ( 78.79%) 19.46 ( 93.41%) 13.12 ( 95.56%) 12.33 ( 95.82%) length=288, alignment=0: 325.69 72.16 ( 77.84%) 69.20 ( 78.75%) 21.08 ( 93.53%) 13.94 ( 95.72%) 12.32 ( 96.22%) length=320, alignment=0: 364.18 78.78 ( 78.37%) 75.69 ( 79.21%) 22.71 ( 93.77%) 14.70 ( 95.96%) 14.46 ( 96.03%) length=352, alignment=0: 391.40 84.87 ( 78.32%) 82.15 ( 79.01%) 24.50 ( 93.74%) 15.62 ( 96.01%) 14.27 ( 96.35%) length=384, alignment=0: 428.50 91.43 ( 78.66%) 88.70 ( 79.30%) 26.16 ( 93.90%) 17.29 ( 95.97%) 15.04 ( 96.49%) length=416, alignment=0: 457.30 98.23 ( 78.52%) 95.02 ( 79.22%) 27.81 ( 93.92%) 17.22 ( 96.23%) 15.05 ( 96.71%) length=448, alignment=0: 488.38 104.52 ( 78.60%) 101.87 ( 79.14%) 31.22 ( 93.61%) 18.07 ( 96.30%) 16.89 ( 96.54%) length=480, alignment=0: 526.44 109.61 ( 79.18%) 108.11 ( 79.46%) 31.11 ( 94.09%) 18.88 ( 96.41%) 17.10 ( 96.75%) length=512, alignment=0: 556.50 117.29 ( 78.92%) 113.78 ( 79.56%) 62.57 ( 88.76%) 19.88 ( 96.43%) 17.80 ( 96.80%) length=576, alignment=0: 622.17 152.93 ( 75.42%) 127.58 ( 79.49%) 39.34 ( 93.68%) 21.31 ( 96.58%) 19.99 ( 96.79%) length=640, alignment=0: 691.01 142.56 ( 79.37%) 161.78 ( 76.59%) 39.20 ( 94.33%) 22.98 ( 96.67%) 20.13 ( 97.09%) length=704, alignment=0: 756.90 156.31 ( 79.35%) 176.19 ( 76.72%) 45.03 ( 94.05%) 24.82 ( 96.72%) 22.33 ( 97.05%) length=768, alignment=0: 826.23 193.17 ( 76.62%) 188.41 ( 77.20%) 50.81 ( 93.85%) 27.46 ( 96.68%) 23.25 ( 97.19%) length=832, alignment=0: 890.17 204.81 ( 76.99%) 201.61 ( 77.35%) 53.77 ( 93.96%) 27.73 ( 96.88%) 25.06 ( 97.18%) length=896, alignment=0: 959.52 217.89 ( 77.29%) 213.86 ( 77.71%) 57.99 ( 93.96%) 29.53 ( 96.92%) 26.29 ( 97.26%) length=960, alignment=0: 1024.52 231.06 ( 77.45%) 227.05 ( 77.84%) 60.36 ( 94.11%) 32.29 ( 96.85%) 27.94 ( 97.27%) length=1024, alignment=0: 1086.71 244.17 ( 77.53%) 239.87 ( 77.93%) 64.72 ( 94.04%) 72.38 ( 93.34%) 28.72 ( 97.36%) length=1152, alignment=0: 1231.48 270.22 ( 78.06%) 266.47 ( 78.36%) 73.38 ( 94.04%) 40.24 ( 96.73%) 32.42 ( 97.37%) length=1280, alignment=0: 1349.29 295.45 ( 78.10%) 292.69 ( 78.31%) 111.80 ( 91.71%) 42.44 ( 96.85%) 34.59 ( 97.44%) length=1408, alignment=0: 1487.13 322.57 ( 78.31%) 318.18 ( 78.60%) 84.47 ( 94.32%) 44.35 ( 97.02%) 37.31 ( 97.49%) length=1536, alignment=0: 1623.52 347.98 ( 78.57%) 344.24 ( 78.80%) 108.31 ( 93.33%) 49.82 ( 96.93%) 39.94 ( 97.54%) length=1664, alignment=0: 1748.88 373.80 ( 78.63%) 370.03 ( 78.84%) 118.76 ( 93.21%) 52.89 ( 96.98%) 42.93 ( 97.55%) length=1792, alignment=0: 1886.22 399.59 ( 78.82%) 397.39 ( 78.93%) 127.32 ( 93.25%) 53.64 ( 97.16%) 45.39 ( 97.59%) length=1920, alignment=0: 2018.37 425.98 ( 78.89%) 422.31 ( 79.08%) 126.70 ( 93.72%) 57.08 ( 97.17%) 48.12 ( 97.62%) length=2048, alignment=0: 2167.09 451.70 ( 79.16%) 447.70 ( 79.34%) 141.68 ( 93.46%) 61.63 ( 97.16%) 79.06 ( 96.35%) length=2304, alignment=0: 2422.03 503.63 ( 79.21%) 502.23 ( 79.26%) 149.62 ( 93.82%) 73.10 ( 96.98%) 56.97 ( 97.65%) length=2560, alignment=0: 2678.68 556.84 ( 79.21%) 553.24 ( 79.35%) 161.06 ( 93.99%) 127.74 ( 95.23%) 58.81 ( 97.80%) length=2816, alignment=0: 2941.95 608.70 ( 79.31%) 604.03 ( 79.47%) 171.85 ( 94.16%) 87.11 ( 97.04%) 67.08 ( 97.72%) length=3072, alignment=0: 3229.89 660.14 ( 79.56%) 659.19 ( 79.59%) 183.85 ( 94.31%) 140.25 ( 95.66%) 73.01 ( 97.74%) length=3328, alignment=0: 3496.08 713.05 ( 79.60%) 710.00 ( 79.69%) 209.72 ( 94.00%) 138.78 ( 96.03%) 77.81 ( 97.77%) length=3584, alignment=0: 3756.52 766.19 ( 79.60%) 763.94 ( 79.66%) 214.16 ( 94.30%) 146.36 ( 96.10%) 83.43 ( 97.78%) length=3840, alignment=0: 4017.15 817.43 ( 79.65%) 819.77 ( 79.59%) 242.07 ( 93.97%) 164.56 ( 95.90%) 89.72 ( 97.77%) length=4096, alignment=0: 4281.59 867.87 ( 79.73%) 864.71 ( 79.80%) 243.33 ( 94.32%) 173.11 ( 95.96%) 95.65 ( 97.77%) length=4608, alignment=0: 4810.30 977.80 ( 79.67%) 985.03 ( 79.52%) 271.13 ( 94.36%) 190.62 ( 96.04%) 107.82 ( 97.76%) length=5120, alignment=0: 5380.16 1075.77 ( 80.00%) 1071.80 ( 80.08%) 294.27 ( 94.53%) 206.04 ( 96.17%) 141.90 ( 97.36%) length=5632, alignment=0: 5925.70 1195.61 ( 79.82%) 1193.68 ( 79.86%) 323.42 ( 94.54%) 223.55 ( 96.23%) 125.28 ( 97.89%) length=6144, alignment=0: 6402.20 1285.52 ( 79.92%) 1281.04 ( 79.99%) 342.68 ( 94.65%) 234.84 ( 96.33%) 167.01 ( 97.39%) length=6656, alignment=0: 6997.01 1387.32 ( 80.17%) 1384.21 ( 80.22%) 365.93 ( 94.77%) 269.89 ( 96.14%) 176.40 ( 97.48%) length=7168, alignment=0: 7454.76 1492.10 ( 79.98%) 1488.45 ( 80.03%) 391.92 ( 94.74%) 280.81 ( 96.23%) 187.73 ( 97.48%) length=7680, alignment=0: 8163.34 1608.43 ( 80.30%) 1615.98 ( 80.20%) 460.03 ( 94.36%) 299.86 ( 96.33%) 201.40 ( 97.53%) ```
2025-07-23[libc][NFC] Add stdint.h proxy header to fix dependency issue with ↵lntue
<stdint.h> includes. (#150303) https://github.com/llvm/llvm-project/issues/149993
2025-07-17[libc] Improve Cortex `memset` and `memcpy` functions (#149044)Guillaume Chatelet
The code for `memcpy` is the same as in #148204 but it fixes the build bot error by using `static_assert(cpp::always_false<decltype(access)>)` instead of `static_assert(false)` (older compilers fails on `static_assert(false)` in `constexpr` `else` bodies). The code for `memset` is new and vastly improves performance over the current byte per byte implementation. Both `memset` and `memcpy` implementations use prefetching for sizes >= 64. This lowers a bit the performance for sizes between 64 and 256 but improves throughput for greater sizes.
2025-07-16Revert "[libc][NFC] refactor Cortex `memcpy` code" (#149035)Guillaume Chatelet
Reverts llvm/llvm-project#148204 `libc-arm32-qemu-debian-dbg` is failing, reverting and investigating
2025-07-16[libc][NFC] refactor Cortex `memcpy` code (#148204)Guillaume Chatelet
This patch is in preparation for the Cortex `memset` implementation. It improves the codegen by generating a prefetch for large sizes.
2025-06-26[libc] Improve memcpy for ARM Cortex-M supporting unaligned accesses. (#144872)Guillaume Chatelet
This implementation has been compiled with the [pigweed toolchain](https://pigweed.dev/toolchain.html) and tested on: - Raspberry Pi Pico 2 with the following options\ `--target=armv8m.main-none-eabi` `-march=armv8m.main+fp+dsp` `-mcpu=cortex-m33` - Raspberry Pi Pico with the following options\ `--target=armv6m-none-eabi` `-march=armv6m` `-mcpu=cortex-m0+` They both compile down to a little bit more than 200 bytes and are between 2 and 10 times faster than byte per byte copies. For best performance the following options can be set in the `libc/config/baremetal/arm/config.json` ``` { "codegen": { "LIBC_CONF_KEEP_FRAME_POINTER": { "value": false } }, "general": { "LIBC_ADD_NULL_CHECKS": { "value": false } } } ```
2025-05-02[libc] Add support for string/memory_utils functions for AArch64 without HW ↵William
FP/SIMD (#137592) Add conditional compilation to add support for AArch64 without vector registers and/or hardware FPUs by using the generic implementation. **Context:** A few functions were hard-coded to use vector registers/hardware FPUs. This meant that libc would not compile on architectures that did not support these features. This fix falls back on the generic implementation if a feature is not supported.
2025-03-14[libc] Fix memmove macros for unreocognized targetsJoseph Huber
2025-03-14[libc] Default to `byte_per_byte` instead of erroring (#131340)Joseph Huber
Summary: Right now a lot of the memory functions error if we don't have specific handling for them. This is weird because we have a generic implementation that should just be used whenever someone hasn't written a more optimized version. This allows us to use the `libc` headers with more architectures from the `shared/` directory without worrying about it breaking.
2025-03-10[libc] Add `-Wno-sign-conversion` & re-attempt `-Wconversion` (#129811)Vinay Deshmukh
Relates to https://github.com/llvm/llvm-project/issues/119281#issuecomment-2699470459
2025-03-05Revert "[libc] Enable -Wconversion for tests. (#127523)"Augie Fackler
This reverts commit 1e6e845d49a336e9da7ca6c576ec45c0b419b5f6 because it changed the 1st parameter of adjust() to be unsigned, but libc itself calls adjust() with a negative argument in align_backward() in op_generic.h.
2025-03-04[libc] Fix casts for arm32 after Wconversion (#129771)Michael Jones
Followup to #127523 There were some test failures on arm32 after enabling Wconversion. There were some tests that were failing due to missing casts. Also I changed BigInt's `safe_get_at` back to being signed since it needed the ability to be negative.
2025-03-04[libc] Enable -Wconversion for tests. (#127523)Vinay Deshmukh
Relates to: #119281
2025-02-05[libc] Fix all imports of src/string/memory_utils (#114939)Krishna Pandey
Fixed imports for all files *within* `libc/src/string/memory_utils`. Note: This doesn't include **all** files that need to be fixed. Fixes #86579
2024-11-25[libc] suppress string warning in case intrinsics are defined as macros ↵Schrodinger ZHU Yifan
(#117640)
2024-11-13[libc] Rename libc/src/__support/endian.h to endian_internal.h (#115950)Daniel Thornburgh
This prevents a conflict with the Linux system endian.h when built in overlay mode for CPP files in __support. This issue appeared in PR #106259.
2024-10-22[libc][x86] copy one cache line at a time to prevent the use of `rep;movsb` ↵Guillaume Chatelet
(#113161) When using `-mprefer-vector-width=128` with `-march=sandybridge` copying 3 cache lines in one go (192B) gets converted into `rep;movsb` which translate into a 60% hit in performance. Consecutive calls to `__builtin_memcpy_inline` (implementation behind `builtin::Memcpy::block_offset`) are not coalesced by the compiler and so calling it three times in a row generates the desired assembly. It only differs in the interleaving of the loads and stores and does not affect performance. This is needed to reland https://github.com/llvm/llvm-project/pull/108939.
2024-10-06[libc] Clean up some include in `libc`. (#110980)c8ef
The patch primarily cleans up some incorrect includes. The `LIBC_INLINE` macro is defined in `attributes.h`, not `config.h`. There appears to be no need to change the CMake and Bazel build files.
2024-09-06[libc] Implement branchless head-tail comparison for bcmp (#107540)Vitaly Goldshteyn
Binary size changes: | Bytes (cache lines) | before | after | |---------------------|----------|---------| | sse4 | 419 (7) | 288 (5) | | avx | 430 (7) | 308 (5) | | avx512f | 589 (10) | 390 (7) | Benchmarks for different CPUs using https://github.com/google/fleetbench. - indus-cascadelake ``` name old speed new speed delta BM_LIBC_Bcmp_Fleet_L1 1.96GB/s ± 1% 2.19GB/s ± 0% +11.49% (p=0.000 n=29+24) BM_LIBC_Bcmp_Fleet_L2 1.90GB/s ± 1% 2.14GB/s ± 1% +12.68% (p=0.000 n=29+24) BM_LIBC_Bcmp_Fleet_LLC 513MB/s ± 4% 531MB/s ± 4% +3.53% (p=0.000 n=24+24) BM_LIBC_Bcmp_Fleet_Cold 452MB/s ± 3% 456MB/s ± 4% ~ (p=0.103 n=30+30) BM_LIBC_Bcmp_0_L1 [Bcmp_0] 2.98GB/s ± 1% 3.15GB/s ± 1% +5.59% (p=0.000 n=29+30) BM_LIBC_Bcmp_0_L2 [Bcmp_0] 2.86GB/s ± 1% 3.07GB/s ± 1% +7.21% (p=0.000 n=29+30) BM_LIBC_Bcmp_0_LLC [Bcmp_0] 738MB/s ± 7% 751MB/s ± 3% +1.68% (p=0.000 n=24+25) BM_LIBC_Bcmp_0_Cold [Bcmp_0] 643MB/s ± 3% 642MB/s ± 4% ~ (p=0.522 n=29+30) BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.08GB/s ± 0% 3.25GB/s ± 0% +5.35% (p=0.000 n=28+30) BM_LIBC_Bcmp_1_L2 [Bcmp_1] 2.97GB/s ± 1% 3.17GB/s ± 1% +6.65% (p=0.000 n=29+30) BM_LIBC_Bcmp_1_LLC [Bcmp_1] 901MB/s ±59% 871MB/s ±36% ~ (p=0.676 n=29+27) BM_LIBC_Bcmp_1_Cold [Bcmp_1] 686MB/s ± 4% 686MB/s ± 3% ~ (p=0.934 n=29+30) BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.63GB/s ± 0% 1.80GB/s ± 1% +10.19% (p=0.000 n=29+30) BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.57GB/s ± 1% 1.75GB/s ± 1% +11.46% (p=0.000 n=29+30) BM_LIBC_Bcmp_2_LLC [Bcmp_2] 451MB/s ±61% 427MB/s ±28% ~ (p=0.469 n=29+25) BM_LIBC_Bcmp_2_Cold [Bcmp_2] 353MB/s ± 4% 354MB/s ± 5% ~ (p=0.467 n=30+30) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 1.91GB/s ± 1% 2.10GB/s ± 1% +9.90% (p=0.000 n=29+29) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 1.84GB/s ± 1% 2.03GB/s ± 1% +10.63% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 491MB/s ±24% 538MB/s ±24% +9.66% (p=0.000 n=24+27) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 417MB/s ± 4% 421MB/s ± 3% ~ (p=0.063 n=30+29) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 761MB/s ± 1% 867MB/s ± 1% +14.02% (p=0.000 n=28+30) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 748MB/s ± 1% 860MB/s ± 1% +15.04% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 227MB/s ±29% 260MB/s ±64% +14.70% (p=0.000 n=26+27) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 187MB/s ± 3% 191MB/s ± 5% +2.26% (p=0.000 n=30+30) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.48GB/s ± 1% 1.71GB/s ± 1% +15.26% (p=0.000 n=29+30) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.42GB/s ± 1% 1.67GB/s ± 1% +17.68% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 412MB/s ±34% 519MB/s ±80% +25.87% (p=0.000 n=27+30) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 336MB/s ± 4% 343MB/s ± 6% +2.05% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 2.87GB/s ± 0% 3.24GB/s ± 1% +12.88% (p=0.000 n=26+30) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 2.78GB/s ± 1% 3.20GB/s ± 1% +15.15% (p=0.000 n=26+30) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 926MB/s ±43% 1227MB/s ±76% +32.53% (p=0.000 n=27+30) BM_LIBC_Bcmp_6_Cold [Bcmp_6] 716MB/s ± 4% 737MB/s ± 6% +3.02% (p=0.000 n=28+29) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.54GB/s ± 1% 1.56GB/s ± 0% +1.40% (p=0.000 n=29+30) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.47GB/s ± 1% 1.52GB/s ± 1% +2.97% (p=0.000 n=27+30) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 351MB/s ±23% 436MB/s ±83% +24.04% (p=0.005 n=24+29) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 283MB/s ± 4% 282MB/s ± 4% ~ (p=0.644 n=30+30) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 824MB/s ± 1% 1048MB/s ± 1% +27.18% (p=0.000 n=29+30) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 808MB/s ± 1% 1027MB/s ± 1% +27.12% (p=0.000 n=29+29) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 317MB/s ±79% 332MB/s ±74% ~ (p=0.338 n=30+29) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 207MB/s ± 5% 212MB/s ± 5% +2.27% (p=0.000 n=30+30) ``` - indus-skylake ``` name old speed new speed delta BM_LIBC_Bcmp_Fleet_L1 2.06GB/s ± 2% 2.25GB/s ± 3% +9.66% (p=0.000 n=27+24) BM_LIBC_Bcmp_Fleet_L2 1.96GB/s ± 2% 2.17GB/s ± 2% +10.61% (p=0.000 n=30+24) BM_LIBC_Bcmp_Fleet_LLC 1.18GB/s ± 6% 1.32GB/s ± 5% +12.27% (p=0.000 n=28+28) BM_LIBC_Bcmp_Fleet_Cold 456MB/s ± 2% 466MB/s ± 2% +2.22% (p=0.000 n=28+28) BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.08GB/s ± 2% 3.20GB/s ± 1% +3.72% (p=0.000 n=28+22) BM_LIBC_Bcmp_0_L2 [Bcmp_0] 2.92GB/s ± 1% 3.05GB/s ± 2% +4.49% (p=0.000 n=23+23) BM_LIBC_Bcmp_0_LLC [Bcmp_0] 1.83GB/s ± 8% 1.94GB/s ± 4% +6.24% (p=0.000 n=25+27) BM_LIBC_Bcmp_0_Cold [Bcmp_0] 654MB/s ± 2% 659MB/s ± 2% +0.76% (p=0.012 n=30+29) BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.19GB/s ± 2% 3.34GB/s ± 2% +4.41% (p=0.000 n=26+23) BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.05GB/s ± 2% 3.21GB/s ± 2% +5.32% (p=0.000 n=28+25) BM_LIBC_Bcmp_1_LLC [Bcmp_1] 1.95GB/s ± 4% 2.03GB/s ±10% +3.61% (p=0.000 n=27+30) BM_LIBC_Bcmp_1_Cold [Bcmp_1] 700MB/s ± 2% 702MB/s ± 2% ~ (p=0.150 n=30+30) BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.69GB/s ± 2% 1.85GB/s ± 1% +9.31% (p=0.000 n=30+26) BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.60GB/s ± 2% 1.78GB/s ± 2% +10.90% (p=0.000 n=26+27) BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.01GB/s ± 5% 1.12GB/s ± 5% +11.40% (p=0.000 n=27+28) BM_LIBC_Bcmp_2_Cold [Bcmp_2] 355MB/s ± 3% 360MB/s ± 3% +1.46% (p=0.000 n=30+30) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 1.98GB/s ± 2% 2.15GB/s ± 2% +8.89% (p=0.000 n=29+27) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 1.87GB/s ± 3% 2.05GB/s ± 2% +10.06% (p=0.000 n=30+26) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.19GB/s ± 4% 1.31GB/s ± 6% +9.82% (p=0.000 n=27+29) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 424MB/s ± 3% 431MB/s ± 3% +1.58% (p=0.000 n=28+30) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 849MB/s ± 2% 949MB/s ± 2% +11.84% (p=0.000 n=27+28) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 815MB/s ± 3% 913MB/s ± 3% +12.06% (p=0.000 n=29+30) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 512MB/s ± 9% 571MB/s ± 7% +11.40% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 187MB/s ± 3% 192MB/s ± 2% +2.56% (p=0.000 n=30+28) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.55GB/s ± 2% 1.77GB/s ± 3% +13.93% (p=0.000 n=30+28) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.47GB/s ± 2% 1.70GB/s ± 2% +15.96% (p=0.000 n=27+26) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 939MB/s ± 5% 1084MB/s ± 4% +15.36% (p=0.000 n=28+27) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 340MB/s ± 2% 347MB/s ± 3% +1.93% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.06GB/s ± 3% 3.40GB/s ± 2% +11.13% (p=0.000 n=30+28) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 2.89GB/s ± 3% 3.24GB/s ± 2% +12.20% (p=0.000 n=29+26) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 1.93GB/s ± 4% 2.09GB/s ±11% +8.16% (p=0.000 n=26+30) BM_LIBC_Bcmp_6_Cold [Bcmp_6] 746MB/s ± 2% 762MB/s ± 2% +2.11% (p=0.000 n=30+28) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.59GB/s ± 2% 1.62GB/s ± 2% +1.72% (p=0.000 n=25+27) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.49GB/s ± 2% 1.53GB/s ± 2% +2.62% (p=0.000 n=27+29) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 852MB/s ±10% 909MB/s ± 6% +6.71% (p=0.000 n=30+29) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 283MB/s ± 3% 283MB/s ± 2% ~ (p=0.617 n=30+27) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 891MB/s ± 2% 1083MB/s ± 2% +21.64% (p=0.000 n=27+24) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 855MB/s ± 2% 1045MB/s ± 1% +22.31% (p=0.000 n=25+23) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 568MB/s ± 7% 659MB/s ± 8% +16.04% (p=0.000 n=29+30) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 207MB/s ± 2% 212MB/s ± 2% +2.31% (p=0.000 n=30+27) ``` - arcadia-rome ``` name old speed new speed delta BM_LIBC_Bcmp_Fleet_L1 2.16GB/s ± 2% 2.27GB/s ± 2% +5.13% (p=0.000 n=26+30) BM_LIBC_Bcmp_Fleet_L2 2.15GB/s ± 2% 2.25GB/s ± 2% +4.64% (p=0.000 n=27+30) BM_LIBC_Bcmp_Fleet_LLC 1.73GB/s ± 3% 1.81GB/s ± 3% +4.66% (p=0.000 n=25+28) BM_LIBC_Bcmp_Fleet_Cold 494MB/s ± 1% 496MB/s ± 2% +0.45% (p=0.023 n=22+24) BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.30GB/s ± 1% 3.24GB/s ± 2% -1.70% (p=0.000 n=27+30) BM_LIBC_Bcmp_0_L2 [Bcmp_0] 3.23GB/s ± 2% 3.19GB/s ± 2% -1.28% (p=0.000 n=28+28) BM_LIBC_Bcmp_0_LLC [Bcmp_0] 2.59GB/s ± 3% 2.58GB/s ± 2% -0.65% (p=0.010 n=26+26) BM_LIBC_Bcmp_0_Cold [Bcmp_0] 720MB/s ± 1% 707MB/s ± 3% -1.75% (p=0.000 n=22+25) BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.37GB/s ± 1% 3.36GB/s ± 2% ~ (p=0.102 n=28+29) BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.32GB/s ± 2% 3.30GB/s ± 2% -0.51% (p=0.038 n=28+29) BM_LIBC_Bcmp_1_LLC [Bcmp_1] 2.67GB/s ± 4% 2.70GB/s ± 4% +0.96% (p=0.009 n=28+27) BM_LIBC_Bcmp_1_Cold [Bcmp_1] 755MB/s ± 1% 751MB/s ± 2% -0.57% (p=0.000 n=22+25) BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.79GB/s ± 1% 1.86GB/s ± 2% +3.92% (p=0.000 n=27+29) BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.77GB/s ± 2% 1.82GB/s ± 2% +2.99% (p=0.000 n=28+29) BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.41GB/s ± 4% 1.47GB/s ± 3% +3.97% (p=0.000 n=28+28) BM_LIBC_Bcmp_2_Cold [Bcmp_2] 386MB/s ± 1% 389MB/s ± 1% +0.60% (p=0.000 n=21+23) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 2.07GB/s ± 2% 2.17GB/s ± 2% +4.87% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 2.07GB/s ± 2% 2.13GB/s ± 2% +3.02% (p=0.000 n=28+30) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.66GB/s ± 2% 1.73GB/s ± 2% +4.08% (p=0.000 n=29+26) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 466MB/s ± 2% 469MB/s ± 3% +0.66% (p=0.001 n=22+25) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 861MB/s ± 1% 964MB/s ± 2% +11.98% (p=0.000 n=29+29) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 853MB/s ± 2% 935MB/s ± 2% +9.54% (p=0.000 n=28+29) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 707MB/s ± 3% 743MB/s ± 4% +5.08% (p=0.000 n=29+29) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 199MB/s ± 3% 199MB/s ± 2% ~ (p=0.107 n=29+25) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.65GB/s ± 1% 1.75GB/s ± 2% +6.15% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.64GB/s ± 3% 1.73GB/s ± 2% +5.37% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 1.32GB/s ± 2% 1.40GB/s ± 2% +6.21% (p=0.000 n=28+27) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 370MB/s ± 3% 371MB/s ± 2% +0.16% (p=0.008 n=29+25) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.25GB/s ± 2% 3.47GB/s ± 2% +6.74% (p=0.000 n=28+29) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 3.26GB/s ± 1% 3.44GB/s ± 1% +5.43% (p=0.000 n=28+29) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 2.66GB/s ± 2% 2.79GB/s ± 3% +4.90% (p=0.000 n=27+29) BM_LIBC_Bcmp_6_Cold [Bcmp_6] 812MB/s ± 3% 799MB/s ± 2% -1.57% (p=0.000 n=29+25) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.71GB/s ± 2% 1.66GB/s ± 2% -3.14% (p=0.000 n=29+29) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.63GB/s ± 2% 1.59GB/s ± 2% -2.50% (p=0.000 n=29+28) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 1.25GB/s ± 4% 1.25GB/s ± 2% ~ (p=0.530 n=28+26) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 311MB/s ± 3% 308MB/s ± 1% ~ (p=0.127 n=29+24) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 869MB/s ± 2% 1098MB/s ± 2% +26.28% (p=0.000 n=27+29) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 873MB/s ± 2% 1075MB/s ± 1% +23.06% (p=0.000 n=27+29) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 743MB/s ± 4% 859MB/s ± 4% +15.58% (p=0.000 n=27+27) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 221MB/s ± 4% 221MB/s ± 3% +0.14% (p=0.034 n=29+25) ``` - ixion-haswell ``` name old speed new speed delta BM_LIBC_Bcmp_Fleet_L1 2.27GB/s ± 5% 2.41GB/s ± 6% +6.10% (p=0.000 n=29+28) BM_LIBC_Bcmp_Fleet_L2 2.14GB/s ± 6% 2.33GB/s ± 5% +9.21% (p=0.000 n=29+30) BM_LIBC_Bcmp_Fleet_LLC 1.30GB/s ± 9% 1.43GB/s ± 8% +9.85% (p=0.000 n=30+30) BM_LIBC_Bcmp_Fleet_Cold 475MB/s ± 6% 475MB/s ± 5% ~ (p=0.839 n=30+29) BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.38GB/s ± 7% 3.46GB/s ± 6% +2.35% (p=0.009 n=30+29) BM_LIBC_Bcmp_0_L2 [Bcmp_0] 3.20GB/s ± 5% 3.32GB/s ± 6% +3.52% (p=0.000 n=28+30) BM_LIBC_Bcmp_0_LLC [Bcmp_0] 1.88GB/s ± 9% 2.00GB/s ± 6% +6.63% (p=0.000 n=30+28) BM_LIBC_Bcmp_0_Cold [Bcmp_0] 664MB/s ± 6% 655MB/s ± 6% -1.32% (p=0.025 n=30+30) BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.50GB/s ± 8% 3.61GB/s ±10% +3.09% (p=0.001 n=29+30) BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.32GB/s ± 7% 3.48GB/s ± 8% +4.89% (p=0.000 n=29+30) BM_LIBC_Bcmp_1_LLC [Bcmp_1] 2.02GB/s ± 7% 2.14GB/s ± 9% +5.82% (p=0.000 n=28+29) BM_LIBC_Bcmp_1_Cold [Bcmp_1] 716MB/s ± 6% 709MB/s ± 5% -0.97% (p=0.040 n=30+28) BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.83GB/s ± 7% 1.97GB/s ± 8% +7.90% (p=0.000 n=30+30) BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.74GB/s ± 6% 1.92GB/s ± 6% +10.29% (p=0.000 n=30+29) BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.05GB/s ± 9% 1.15GB/s ± 9% +9.73% (p=0.000 n=30+30) BM_LIBC_Bcmp_2_Cold [Bcmp_2] 379MB/s ± 6% 372MB/s ± 6% -1.74% (p=0.012 n=30+30) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 2.17GB/s ± 5% 2.29GB/s ± 6% +5.61% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 2.02GB/s ± 6% 2.20GB/s ± 6% +8.75% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.22GB/s ± 8% 1.34GB/s ± 9% +9.19% (p=0.000 n=30+30) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 447MB/s ± 3% 441MB/s ± 7% -1.40% (p=0.033 n=30+30) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 902MB/s ± 6% 995MB/s ±10% +10.37% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 863MB/s ± 5% 945MB/s ±11% +9.50% (p=0.000 n=29+30) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 528MB/s ±11% 559MB/s ±12% +5.75% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 183MB/s ± 4% 181MB/s ± 7% ~ (p=0.088 n=28+30) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.70GB/s ± 6% 1.87GB/s ± 8% +10.14% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.60GB/s ± 5% 1.80GB/s ± 9% +12.61% (p=0.000 n=29+30) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 994MB/s ±13% 1094MB/s ± 8% +10.10% (p=0.000 n=29+30) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 362MB/s ± 6% 358MB/s ± 7% ~ (p=0.123 n=30+30) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.31GB/s ± 5% 3.67GB/s ± 6% +10.90% (p=0.000 n=28+30) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 3.11GB/s ± 5% 3.53GB/s ± 5% +13.59% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 1.98GB/s ± 9% 2.18GB/s ± 8% +10.34% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_Cold [Bcmp_6] 754MB/s ± 5% 752MB/s ± 5% ~ (p=0.592 n=30+30) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.72GB/s ± 5% 1.72GB/s ± 6% ~ (p=0.549 n=29+29) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.61GB/s ± 7% 1.63GB/s ± 8% ~ (p=0.191 n=30+29) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 913MB/s ± 8% 905MB/s ± 9% ~ (p=0.423 n=30+30) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 304MB/s ± 6% 287MB/s ± 4% -5.57% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 961MB/s ± 5% 1124MB/s ± 6% +16.94% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 915MB/s ± 8% 1100MB/s ± 7% +20.16% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 593MB/s ± 8% 669MB/s ± 8% +12.92% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 220MB/s ± 4% 220MB/s ± 6% ~ (p=0.572 n=30+30) ``` Co-authored-by: goldvitaly@google.com <%username%@google.com>
2024-08-29[libc][x86] Use prefetch for write for memcpy (#90450)Guillaume Chatelet
Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we prefetch memory for read on the source buffer. This patch adds prefetch for write on the destination buffer.
2024-07-12[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597)Petr Hosek
This is a part of #97655.
2024-07-12Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace ↵Mehdi Amini
declaration" (#98593) Reverts llvm/llvm-project#98075 bots are broken
2024-07-11[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075)Petr Hosek
This is a part of #97655.
2024-05-31[libc][NFC] Allow compilation of `memcpy` with `-m32` (#93790)Guillaume Chatelet
Needed to support i386 (#93709).
2024-05-14[libc][bug] Fix out of bound write in memcpy w/ software prefetching (#90591)Guillaume Chatelet
This patch adds tests for `memcpy` and `memset` making sure that we don't access buffers out of bounds. It relies on POSIX `mmap` / `mprotect` and works only when FULL_BUILD_MODE is disabled. The bug showed up while enabling software prefetching. `loop_and_tail_offset` is always running at least one iteration but in some configurations loop unrolled prefetching was actually needing only the tail operation and no loop iterations at all.
2024-03-27[libc] Remove obsolete LIBC_HAS_BUILTIN macro (#86554)Marc Auberer
Fixes #86546 and removes the macro `LIBC_HAS_BUILTIN`. This was necessary to support older compilers that did not support `__has_builtin`. All of the compilers we support already have this builtin. See: https://libc.llvm.org/compiler_support.html All uses now use `__has_builtin` directly cc @nickdesaulniers
2024-03-09[libc] Provide `LIBC_TYPES_HAS_INT64` (#83441)Guillaume Chatelet
Umbrella bug #83182
2024-03-05[libc] suppress readability-identifier-naming for std::numeric_limits ↵Nick Desaulniers
interfaces (#83921) These templates are made to match the ergonomics of std::numeric_limits. Because our style for constexpr variables is ALL_CAPS, we must silence the linter for these manually. Link: https://clang.llvm.org/extra/clang-tidy/#suppressing-undesired-diagnostics
2024-03-05[libc] fix readability-identifier-naming in memory_utils/utils.h (#83919)Nick Desaulniers
Fixes: libc/src/string/memory_utils/utils.h:345:13: warning: invalid case style for member 'offset_' [readability-identifier-naming] Having a trailing underscore for members is a google3 style, not LLVM style. Removing the underscore is insufficient, as we would then have 2 members with the same identifier which is not allowed (it is a compile time error). Remove the getter, and just access the renamed member that's now made public.
2024-03-05[libc] fix more readability-identifier-naming lints (#83914)Nick Desaulniers
Found via: $ ninja -k2000 libc-lint 2>&1 | grep readability-identifier-naming Auto fixed via: $ clang-tidy -p build/compile_commands.json \ -checks="-*,readability-identifier-naming" \ <filename> --fix This doesn't fix all instances, just the obvious simple cases where it makes sense to change the identifier names. Subsequent PRs will fix up the stragglers.
2024-02-28[libc] fix typo introduced in inline_bcmp_byte_per_byte (#83356)Nick Desaulniers
My global find+replace was overzealous and broke post submit unit tests. Link: #83345
2024-02-28[libc] fix readability-identifier-naming.ConstexprFunctionCase (#83345)Nick Desaulniers
Codify that we use lower_case for readability-identifier-naming.ConstexprFunctionCase and then fix the 11 violations (rather than codify UPPER_CASE and have to fix the 170 violations).
2024-02-28[libc] fix clang-tidy llvm-header-guard warnings (#82679)Nick Desaulniers
Towards the goal of getting `ninja libc-lint` back to green, fix the numerous instances of: warning: header guard does not follow preferred style [llvm-header-guard] This is because many of our header guards start with `__LLVM` rather than `LLVM`. To filter just these warnings: $ ninja -k2000 libc-lint 2>&1 | grep llvm-header-guard To automatically apply fixits: $ find libc/src libc/include libc/test -name \*.h | \ xargs -n1 -I {} clang-tidy {} -p build/compile_commands.json \ -checks='-*,llvm-header-guard' --fix --quiet Some manual cleanup is still necessary as headers that were missing header guards outright will have them inserted before the license block (we prefer them after).
2024-01-18[libc][NFC] Selectively disable GCC warnings (#78462)Guillaume Chatelet
2024-01-11[libc][NFC] Use 16-byte indices for _mmXXX_shuffle_epi8 (#77781)Guillaume Chatelet
This is less confusing since the implementation only cares about the 4 lower bits.
2024-01-11[libc] Fix buggy AVX2 / AVX512 `memcmp` (#77081)Guillaume Chatelet
Fixes #77080.