path: root/libc/src/string/memory_utils
Age | Commit message | Author
2025-06-26 | [libc] Improve memcpy for ARM Cortex-M supporting unaligned accesses. (#144872) | Guillaume Chatelet
This implementation has been compiled with the [pigweed toolchain](https://pigweed.dev/toolchain.html) and tested on:
- Raspberry Pi Pico 2 with the following options\
  `--target=armv8m.main-none-eabi` `-march=armv8m.main+fp+dsp` `-mcpu=cortex-m33`
- Raspberry Pi Pico with the following options\
  `--target=armv6m-none-eabi` `-march=armv6m` `-mcpu=cortex-m0+`

They both compile down to a little bit more than 200 bytes and are between 2 and 10 times faster than byte per byte copies.

For best performance the following options can be set in the `libc/config/baremetal/arm/config.json`
```
{
  "codegen": {
    "LIBC_CONF_KEEP_FRAME_POINTER": { "value": false }
  },
  "general": {
    "LIBC_ADD_NULL_CHECKS": { "value": false }
  }
}
```
2025-05-02 | [libc] Add support for string/memory_utils functions for AArch64 without HW FP/SIMD (#137592) | William
Add conditional compilation to add support for AArch64 without vector registers and/or hardware FPUs by using the generic implementation.

**Context:** A few functions were hard-coded to use vector registers/hardware FPUs. This meant that libc would not compile on architectures that did not support these features. This fix falls back on the generic implementation if a feature is not supported.
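The fallback described above can be illustrated with a minimal sketch. It is not the code from the PR: the helper name `block16_differs` and the macro `SKETCH_HAS_NEON` are hypothetical, and `__ARM_NEON` is used here as one plausible feature test.

```cpp
#include <cstring>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_HAS_NEON 1 // hypothetical macro, not an LLVM libc name
#else
#define SKETCH_HAS_NEON 0
#endif

// Hypothetical helper: reports whether two 16-byte blocks differ.
inline bool block16_differs(const unsigned char *a, const unsigned char *b) {
#if SKETCH_HAS_NEON
  // Vector path: XOR the blocks, then reduce to the largest byte.
  const uint8x16_t diff = veorq_u8(vld1q_u8(a), vld1q_u8(b));
  return vmaxvq_u8(diff) != 0;
#else
  // Generic path: used whenever vector registers are unavailable.
  return std::memcmp(a, b, 16) != 0;
#endif
}
```

Both branches return the same answer, so callers do not need to know which one was compiled in.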
2025-03-14 | [libc] Fix memmove macros for unrecognized targets | Joseph Huber
2025-03-14 | [libc] Default to `byte_per_byte` instead of erroring (#131340) | Joseph Huber
Summary: Right now a lot of the memory functions error if we don't have specific handling for them. This is weird because we have a generic implementation that should just be used whenever someone hasn't written a more optimized version. This allows us to use the `libc` headers with more architectures from the `shared/` directory without worrying about it breaking.
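As a rough illustration of the idea (a sketch only, with hypothetical names such as `sketch_memcpy` and `SKETCH_TARGET_HAS_TUNED_MEMCPY`), a dispatcher can select the generic byte-per-byte loop in the branch where an `#error` would otherwise stop the build:

```cpp
#include <cstddef>

// Generic fallback: correct on any target, not tuned for any of them.
inline void copy_byte_per_byte(char *dst, const char *src, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    dst[i] = src[i];
}

inline void sketch_memcpy(char *dst, const char *src, std::size_t count) {
#if defined(SKETCH_TARGET_HAS_TUNED_MEMCPY)
  // Hypothetical per-target implementation, assumed to exist elsewhere.
  sketch_tuned_memcpy(dst, src, count);
#else
  // Unknown target: use the generic version instead of failing to compile.
  copy_byte_per_byte(dst, src, count);
#endif
}
```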
2025-03-10 | [libc] Add `-Wno-sign-conversion` & re-attempt `-Wconversion` (#129811) | Vinay Deshmukh
Relates to https://github.com/llvm/llvm-project/issues/119281#issuecomment-2699470459
2025-03-05 | Revert "[libc] Enable -Wconversion for tests. (#127523)" | Augie Fackler
This reverts commit 1e6e845d49a336e9da7ca6c576ec45c0b419b5f6 because it changed the 1st parameter of adjust() to be unsigned, but libc itself calls adjust() with a negative argument in align_backward() in op_generic.h.
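Why the signedness of that parameter matters can be shown with a simplified stand-in. The signature below is an assumption made for illustration and is not the one in `utils.h`.

```cpp
#include <cstddef>

// Simplified, hypothetical stand-in for an adjust()-style helper: moves the
// pointer by `offset` bytes and shrinks (or grows) the remaining count.
// The offset must be signed so that callers can step backward.
inline void adjust(std::ptrdiff_t offset, const char *&ptr, std::size_t &count) {
  ptr += offset;
  // Unsigned wraparound makes subtracting a negative offset grow `count`.
  count -= static_cast<std::size_t>(offset);
}
```

With an unsigned first parameter, a call such as `adjust(-3, ptr, count)` would convert `-3` to a huge positive value and advance the pointer far past the buffer instead of stepping back by three bytes.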
2025-03-04 | [libc] Fix casts for arm32 after Wconversion (#129771) | Michael Jones
Followup to #127523

There were some test failures on arm32 after enabling Wconversion. There were some tests that were failing due to missing casts. Also I changed BigInt's `safe_get_at` back to being signed since it needed the ability to be negative.
2025-03-04 | [libc] Enable -Wconversion for tests. (#127523) | Vinay Deshmukh
Relates to: #119281
2025-02-05 | [libc] Fix all imports of src/string/memory_utils (#114939) | Krishna Pandey
Fixed imports for all files *within* `libc/src/string/memory_utils`.

Note: This doesn't include **all** files that need to be fixed.

Fixes #86579
2024-11-25 | [libc] suppress string warning in case intrinsics are defined as macros (#117640) | Schrodinger ZHU Yifan
2024-11-13 | [libc] Rename libc/src/__support/endian.h to endian_internal.h (#115950) | Daniel Thornburgh
This prevents a conflict with the Linux system endian.h when built in overlay mode for CPP files in __support. This issue appeared in PR #106259.
2024-10-22 | [libc][x86] copy one cache line at a time to prevent the use of `rep;movsb` (#113161) | Guillaume Chatelet
When using `-mprefer-vector-width=128` with `-march=sandybridge`, copying 3 cache lines in one go (192B) gets converted into `rep;movsb`, which translates into a 60% hit in performance.

Consecutive calls to `__builtin_memcpy_inline` (implementation behind `builtin::Memcpy::block_offset`) are not coalesced by the compiler and so calling it three times in a row generates the desired assembly. It only differs in the interleaving of the loads and stores and does not affect performance.

This is needed to reland https://github.com/llvm/llvm-project/pull/108939.
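A minimal sketch of the "one cache line at a time" shape follows. The function names are hypothetical, the 64-byte line size is an assumption, and the `__builtin_memcpy` fallback exists only so the sketch also builds with compilers that lack `__builtin_memcpy_inline`.

```cpp
#include <cstddef>

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

constexpr std::size_t kCacheLine = 64; // assumed cache line size

// Copies exactly one cache line with a compile-time constant size.
inline void copy_cache_line(char *dst, const char *src) {
#if __has_builtin(__builtin_memcpy_inline)
  __builtin_memcpy_inline(dst, src, kCacheLine);
#else
  __builtin_memcpy(dst, src, kCacheLine);
#endif
}

// Three separate 64-byte copies rather than a single 192-byte copy.
inline void copy_three_cache_lines(char *dst, const char *src) {
  copy_cache_line(dst, src);
  copy_cache_line(dst + kCacheLine, src + kCacheLine);
  copy_cache_line(dst + 2 * kCacheLine, src + 2 * kCacheLine);
}
```

Whether a given compiler keeps the three copies separate depends on its version and flags, so the generated assembly is worth checking rather than assuming.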
2024-10-06 | [libc] Clean up some include in `libc`. (#110980) | c8ef
The patch primarily cleans up some incorrect includes. The `LIBC_INLINE` macro is defined in `attributes.h`, not `config.h`. There appears to be no need to change the CMake and Bazel build files.
2024-09-06 | [libc] Implement branchless head-tail comparison for bcmp (#107540) | Vitaly Goldshteyn
Binary size changes: | Bytes (cache lines) | before | after | |---------------------|----------|---------| | sse4 | 419 (7) | 288 (5) | | avx | 430 (7) | 308 (5) | | avx512f | 589 (10) | 390 (7) | Benchmarks for different CPUs using https://github.com/google/fleetbench. - indus-cascadelake ``` name old speed new speed delta BM_LIBC_Bcmp_Fleet_L1 1.96GB/s ± 1% 2.19GB/s ± 0% +11.49% (p=0.000 n=29+24) BM_LIBC_Bcmp_Fleet_L2 1.90GB/s ± 1% 2.14GB/s ± 1% +12.68% (p=0.000 n=29+24) BM_LIBC_Bcmp_Fleet_LLC 513MB/s ± 4% 531MB/s ± 4% +3.53% (p=0.000 n=24+24) BM_LIBC_Bcmp_Fleet_Cold 452MB/s ± 3% 456MB/s ± 4% ~ (p=0.103 n=30+30) BM_LIBC_Bcmp_0_L1 [Bcmp_0] 2.98GB/s ± 1% 3.15GB/s ± 1% +5.59% (p=0.000 n=29+30) BM_LIBC_Bcmp_0_L2 [Bcmp_0] 2.86GB/s ± 1% 3.07GB/s ± 1% +7.21% (p=0.000 n=29+30) BM_LIBC_Bcmp_0_LLC [Bcmp_0] 738MB/s ± 7% 751MB/s ± 3% +1.68% (p=0.000 n=24+25) BM_LIBC_Bcmp_0_Cold [Bcmp_0] 643MB/s ± 3% 642MB/s ± 4% ~ (p=0.522 n=29+30) BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.08GB/s ± 0% 3.25GB/s ± 0% +5.35% (p=0.000 n=28+30) BM_LIBC_Bcmp_1_L2 [Bcmp_1] 2.97GB/s ± 1% 3.17GB/s ± 1% +6.65% (p=0.000 n=29+30) BM_LIBC_Bcmp_1_LLC [Bcmp_1] 901MB/s ±59% 871MB/s ±36% ~ (p=0.676 n=29+27) BM_LIBC_Bcmp_1_Cold [Bcmp_1] 686MB/s ± 4% 686MB/s ± 3% ~ (p=0.934 n=29+30) BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.63GB/s ± 0% 1.80GB/s ± 1% +10.19% (p=0.000 n=29+30) BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.57GB/s ± 1% 1.75GB/s ± 1% +11.46% (p=0.000 n=29+30) BM_LIBC_Bcmp_2_LLC [Bcmp_2] 451MB/s ±61% 427MB/s ±28% ~ (p=0.469 n=29+25) BM_LIBC_Bcmp_2_Cold [Bcmp_2] 353MB/s ± 4% 354MB/s ± 5% ~ (p=0.467 n=30+30) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 1.91GB/s ± 1% 2.10GB/s ± 1% +9.90% (p=0.000 n=29+29) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 1.84GB/s ± 1% 2.03GB/s ± 1% +10.63% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 491MB/s ±24% 538MB/s ±24% +9.66% (p=0.000 n=24+27) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 417MB/s ± 4% 421MB/s ± 3% ~ (p=0.063 n=30+29) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 761MB/s ± 1% 867MB/s ± 1% +14.02% (p=0.000 n=28+30) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 748MB/s ± 1% 860MB/s ± 
1% +15.04% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 227MB/s ±29% 260MB/s ±64% +14.70% (p=0.000 n=26+27) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 187MB/s ± 3% 191MB/s ± 5% +2.26% (p=0.000 n=30+30) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.48GB/s ± 1% 1.71GB/s ± 1% +15.26% (p=0.000 n=29+30) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.42GB/s ± 1% 1.67GB/s ± 1% +17.68% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 412MB/s ±34% 519MB/s ±80% +25.87% (p=0.000 n=27+30) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 336MB/s ± 4% 343MB/s ± 6% +2.05% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 2.87GB/s ± 0% 3.24GB/s ± 1% +12.88% (p=0.000 n=26+30) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 2.78GB/s ± 1% 3.20GB/s ± 1% +15.15% (p=0.000 n=26+30) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 926MB/s ±43% 1227MB/s ±76% +32.53% (p=0.000 n=27+30) BM_LIBC_Bcmp_6_Cold [Bcmp_6] 716MB/s ± 4% 737MB/s ± 6% +3.02% (p=0.000 n=28+29) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.54GB/s ± 1% 1.56GB/s ± 0% +1.40% (p=0.000 n=29+30) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.47GB/s ± 1% 1.52GB/s ± 1% +2.97% (p=0.000 n=27+30) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 351MB/s ±23% 436MB/s ±83% +24.04% (p=0.005 n=24+29) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 283MB/s ± 4% 282MB/s ± 4% ~ (p=0.644 n=30+30) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 824MB/s ± 1% 1048MB/s ± 1% +27.18% (p=0.000 n=29+30) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 808MB/s ± 1% 1027MB/s ± 1% +27.12% (p=0.000 n=29+29) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 317MB/s ±79% 332MB/s ±74% ~ (p=0.338 n=30+29) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 207MB/s ± 5% 212MB/s ± 5% +2.27% (p=0.000 n=30+30) ``` - indus-skylake ``` name old speed new speed delta BM_LIBC_Bcmp_Fleet_L1 2.06GB/s ± 2% 2.25GB/s ± 3% +9.66% (p=0.000 n=27+24) BM_LIBC_Bcmp_Fleet_L2 1.96GB/s ± 2% 2.17GB/s ± 2% +10.61% (p=0.000 n=30+24) BM_LIBC_Bcmp_Fleet_LLC 1.18GB/s ± 6% 1.32GB/s ± 5% +12.27% (p=0.000 n=28+28) BM_LIBC_Bcmp_Fleet_Cold 456MB/s ± 2% 466MB/s ± 2% +2.22% (p=0.000 n=28+28) BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.08GB/s ± 2% 3.20GB/s ± 1% +3.72% (p=0.000 n=28+22) BM_LIBC_Bcmp_0_L2 [Bcmp_0] 2.92GB/s ± 1% 3.05GB/s ± 2% +4.49% (p=0.000 n=23+23) BM_LIBC_Bcmp_0_LLC 
[Bcmp_0] 1.83GB/s ± 8% 1.94GB/s ± 4% +6.24% (p=0.000 n=25+27) BM_LIBC_Bcmp_0_Cold [Bcmp_0] 654MB/s ± 2% 659MB/s ± 2% +0.76% (p=0.012 n=30+29) BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.19GB/s ± 2% 3.34GB/s ± 2% +4.41% (p=0.000 n=26+23) BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.05GB/s ± 2% 3.21GB/s ± 2% +5.32% (p=0.000 n=28+25) BM_LIBC_Bcmp_1_LLC [Bcmp_1] 1.95GB/s ± 4% 2.03GB/s ±10% +3.61% (p=0.000 n=27+30) BM_LIBC_Bcmp_1_Cold [Bcmp_1] 700MB/s ± 2% 702MB/s ± 2% ~ (p=0.150 n=30+30) BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.69GB/s ± 2% 1.85GB/s ± 1% +9.31% (p=0.000 n=30+26) BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.60GB/s ± 2% 1.78GB/s ± 2% +10.90% (p=0.000 n=26+27) BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.01GB/s ± 5% 1.12GB/s ± 5% +11.40% (p=0.000 n=27+28) BM_LIBC_Bcmp_2_Cold [Bcmp_2] 355MB/s ± 3% 360MB/s ± 3% +1.46% (p=0.000 n=30+30) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 1.98GB/s ± 2% 2.15GB/s ± 2% +8.89% (p=0.000 n=29+27) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 1.87GB/s ± 3% 2.05GB/s ± 2% +10.06% (p=0.000 n=30+26) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.19GB/s ± 4% 1.31GB/s ± 6% +9.82% (p=0.000 n=27+29) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 424MB/s ± 3% 431MB/s ± 3% +1.58% (p=0.000 n=28+30) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 849MB/s ± 2% 949MB/s ± 2% +11.84% (p=0.000 n=27+28) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 815MB/s ± 3% 913MB/s ± 3% +12.06% (p=0.000 n=29+30) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 512MB/s ± 9% 571MB/s ± 7% +11.40% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 187MB/s ± 3% 192MB/s ± 2% +2.56% (p=0.000 n=30+28) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.55GB/s ± 2% 1.77GB/s ± 3% +13.93% (p=0.000 n=30+28) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.47GB/s ± 2% 1.70GB/s ± 2% +15.96% (p=0.000 n=27+26) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 939MB/s ± 5% 1084MB/s ± 4% +15.36% (p=0.000 n=28+27) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 340MB/s ± 2% 347MB/s ± 3% +1.93% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.06GB/s ± 3% 3.40GB/s ± 2% +11.13% (p=0.000 n=30+28) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 2.89GB/s ± 3% 3.24GB/s ± 2% +12.20% (p=0.000 n=29+26) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 1.93GB/s ± 4% 2.09GB/s ±11% +8.16% (p=0.000 n=26+30) 
BM_LIBC_Bcmp_6_Cold [Bcmp_6] 746MB/s ± 2% 762MB/s ± 2% +2.11% (p=0.000 n=30+28) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.59GB/s ± 2% 1.62GB/s ± 2% +1.72% (p=0.000 n=25+27) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.49GB/s ± 2% 1.53GB/s ± 2% +2.62% (p=0.000 n=27+29) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 852MB/s ±10% 909MB/s ± 6% +6.71% (p=0.000 n=30+29) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 283MB/s ± 3% 283MB/s ± 2% ~ (p=0.617 n=30+27) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 891MB/s ± 2% 1083MB/s ± 2% +21.64% (p=0.000 n=27+24) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 855MB/s ± 2% 1045MB/s ± 1% +22.31% (p=0.000 n=25+23) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 568MB/s ± 7% 659MB/s ± 8% +16.04% (p=0.000 n=29+30) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 207MB/s ± 2% 212MB/s ± 2% +2.31% (p=0.000 n=30+27) ``` - arcadia-rome ``` name old speed new speed delta BM_LIBC_Bcmp_Fleet_L1 2.16GB/s ± 2% 2.27GB/s ± 2% +5.13% (p=0.000 n=26+30) BM_LIBC_Bcmp_Fleet_L2 2.15GB/s ± 2% 2.25GB/s ± 2% +4.64% (p=0.000 n=27+30) BM_LIBC_Bcmp_Fleet_LLC 1.73GB/s ± 3% 1.81GB/s ± 3% +4.66% (p=0.000 n=25+28) BM_LIBC_Bcmp_Fleet_Cold 494MB/s ± 1% 496MB/s ± 2% +0.45% (p=0.023 n=22+24) BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.30GB/s ± 1% 3.24GB/s ± 2% -1.70% (p=0.000 n=27+30) BM_LIBC_Bcmp_0_L2 [Bcmp_0] 3.23GB/s ± 2% 3.19GB/s ± 2% -1.28% (p=0.000 n=28+28) BM_LIBC_Bcmp_0_LLC [Bcmp_0] 2.59GB/s ± 3% 2.58GB/s ± 2% -0.65% (p=0.010 n=26+26) BM_LIBC_Bcmp_0_Cold [Bcmp_0] 720MB/s ± 1% 707MB/s ± 3% -1.75% (p=0.000 n=22+25) BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.37GB/s ± 1% 3.36GB/s ± 2% ~ (p=0.102 n=28+29) BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.32GB/s ± 2% 3.30GB/s ± 2% -0.51% (p=0.038 n=28+29) BM_LIBC_Bcmp_1_LLC [Bcmp_1] 2.67GB/s ± 4% 2.70GB/s ± 4% +0.96% (p=0.009 n=28+27) BM_LIBC_Bcmp_1_Cold [Bcmp_1] 755MB/s ± 1% 751MB/s ± 2% -0.57% (p=0.000 n=22+25) BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.79GB/s ± 1% 1.86GB/s ± 2% +3.92% (p=0.000 n=27+29) BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.77GB/s ± 2% 1.82GB/s ± 2% +2.99% (p=0.000 n=28+29) BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.41GB/s ± 4% 1.47GB/s ± 3% +3.97% (p=0.000 n=28+28) BM_LIBC_Bcmp_2_Cold [Bcmp_2] 386MB/s ± 1% 389MB/s ± 
1% +0.60% (p=0.000 n=21+23) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 2.07GB/s ± 2% 2.17GB/s ± 2% +4.87% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 2.07GB/s ± 2% 2.13GB/s ± 2% +3.02% (p=0.000 n=28+30) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.66GB/s ± 2% 1.73GB/s ± 2% +4.08% (p=0.000 n=29+26) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 466MB/s ± 2% 469MB/s ± 3% +0.66% (p=0.001 n=22+25) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 861MB/s ± 1% 964MB/s ± 2% +11.98% (p=0.000 n=29+29) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 853MB/s ± 2% 935MB/s ± 2% +9.54% (p=0.000 n=28+29) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 707MB/s ± 3% 743MB/s ± 4% +5.08% (p=0.000 n=29+29) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 199MB/s ± 3% 199MB/s ± 2% ~ (p=0.107 n=29+25) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.65GB/s ± 1% 1.75GB/s ± 2% +6.15% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.64GB/s ± 3% 1.73GB/s ± 2% +5.37% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 1.32GB/s ± 2% 1.40GB/s ± 2% +6.21% (p=0.000 n=28+27) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 370MB/s ± 3% 371MB/s ± 2% +0.16% (p=0.008 n=29+25) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.25GB/s ± 2% 3.47GB/s ± 2% +6.74% (p=0.000 n=28+29) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 3.26GB/s ± 1% 3.44GB/s ± 1% +5.43% (p=0.000 n=28+29) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 2.66GB/s ± 2% 2.79GB/s ± 3% +4.90% (p=0.000 n=27+29) BM_LIBC_Bcmp_6_Cold [Bcmp_6] 812MB/s ± 3% 799MB/s ± 2% -1.57% (p=0.000 n=29+25) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.71GB/s ± 2% 1.66GB/s ± 2% -3.14% (p=0.000 n=29+29) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.63GB/s ± 2% 1.59GB/s ± 2% -2.50% (p=0.000 n=29+28) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 1.25GB/s ± 4% 1.25GB/s ± 2% ~ (p=0.530 n=28+26) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 311MB/s ± 3% 308MB/s ± 1% ~ (p=0.127 n=29+24) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 869MB/s ± 2% 1098MB/s ± 2% +26.28% (p=0.000 n=27+29) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 873MB/s ± 2% 1075MB/s ± 1% +23.06% (p=0.000 n=27+29) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 743MB/s ± 4% 859MB/s ± 4% +15.58% (p=0.000 n=27+27) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 221MB/s ± 4% 221MB/s ± 3% +0.14% (p=0.034 n=29+25) ``` - ixion-haswell ``` name old speed new speed delta 
BM_LIBC_Bcmp_Fleet_L1 2.27GB/s ± 5% 2.41GB/s ± 6% +6.10% (p=0.000 n=29+28) BM_LIBC_Bcmp_Fleet_L2 2.14GB/s ± 6% 2.33GB/s ± 5% +9.21% (p=0.000 n=29+30) BM_LIBC_Bcmp_Fleet_LLC 1.30GB/s ± 9% 1.43GB/s ± 8% +9.85% (p=0.000 n=30+30) BM_LIBC_Bcmp_Fleet_Cold 475MB/s ± 6% 475MB/s ± 5% ~ (p=0.839 n=30+29) BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.38GB/s ± 7% 3.46GB/s ± 6% +2.35% (p=0.009 n=30+29) BM_LIBC_Bcmp_0_L2 [Bcmp_0] 3.20GB/s ± 5% 3.32GB/s ± 6% +3.52% (p=0.000 n=28+30) BM_LIBC_Bcmp_0_LLC [Bcmp_0] 1.88GB/s ± 9% 2.00GB/s ± 6% +6.63% (p=0.000 n=30+28) BM_LIBC_Bcmp_0_Cold [Bcmp_0] 664MB/s ± 6% 655MB/s ± 6% -1.32% (p=0.025 n=30+30) BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.50GB/s ± 8% 3.61GB/s ±10% +3.09% (p=0.001 n=29+30) BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.32GB/s ± 7% 3.48GB/s ± 8% +4.89% (p=0.000 n=29+30) BM_LIBC_Bcmp_1_LLC [Bcmp_1] 2.02GB/s ± 7% 2.14GB/s ± 9% +5.82% (p=0.000 n=28+29) BM_LIBC_Bcmp_1_Cold [Bcmp_1] 716MB/s ± 6% 709MB/s ± 5% -0.97% (p=0.040 n=30+28) BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.83GB/s ± 7% 1.97GB/s ± 8% +7.90% (p=0.000 n=30+30) BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.74GB/s ± 6% 1.92GB/s ± 6% +10.29% (p=0.000 n=30+29) BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.05GB/s ± 9% 1.15GB/s ± 9% +9.73% (p=0.000 n=30+30) BM_LIBC_Bcmp_2_Cold [Bcmp_2] 379MB/s ± 6% 372MB/s ± 6% -1.74% (p=0.012 n=30+30) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 2.17GB/s ± 5% 2.29GB/s ± 6% +5.61% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 2.02GB/s ± 6% 2.20GB/s ± 6% +8.75% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.22GB/s ± 8% 1.34GB/s ± 9% +9.19% (p=0.000 n=30+30) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 447MB/s ± 3% 441MB/s ± 7% -1.40% (p=0.033 n=30+30) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 902MB/s ± 6% 995MB/s ±10% +10.37% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 863MB/s ± 5% 945MB/s ±11% +9.50% (p=0.000 n=29+30) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 528MB/s ±11% 559MB/s ±12% +5.75% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 183MB/s ± 4% 181MB/s ± 7% ~ (p=0.088 n=28+30) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.70GB/s ± 6% 1.87GB/s ± 8% +10.14% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 
1.60GB/s ± 5% 1.80GB/s ± 9% +12.61% (p=0.000 n=29+30) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 994MB/s ±13% 1094MB/s ± 8% +10.10% (p=0.000 n=29+30) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 362MB/s ± 6% 358MB/s ± 7% ~ (p=0.123 n=30+30) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.31GB/s ± 5% 3.67GB/s ± 6% +10.90% (p=0.000 n=28+30) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 3.11GB/s ± 5% 3.53GB/s ± 5% +13.59% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 1.98GB/s ± 9% 2.18GB/s ± 8% +10.34% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_Cold [Bcmp_6] 754MB/s ± 5% 752MB/s ± 5% ~ (p=0.592 n=30+30) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.72GB/s ± 5% 1.72GB/s ± 6% ~ (p=0.549 n=29+29) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.61GB/s ± 7% 1.63GB/s ± 8% ~ (p=0.191 n=30+29) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 913MB/s ± 8% 905MB/s ± 9% ~ (p=0.423 n=30+30) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 304MB/s ± 6% 287MB/s ± 4% -5.57% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 961MB/s ± 5% 1124MB/s ± 6% +16.94% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 915MB/s ± 8% 1100MB/s ± 7% +20.16% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 593MB/s ± 8% 669MB/s ± 8% +12.92% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 220MB/s ± 4% 220MB/s ± 6% ~ (p=0.572 n=30+30) ``` Co-authored-by: goldvitaly@google.com <%username%@google.com>
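The general head-tail idea behind a branchless comparison can be sketched as follows. This is not the code from the PR: the function name and the restriction to sizes 8 through 16 are assumptions chosen to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Loads 8 bytes from a possibly unaligned address.
inline std::uint64_t load64(const unsigned char *p) {
  std::uint64_t value;
  std::memcpy(&value, p, sizeof(value));
  return value;
}

// Returns true iff the buffers differ. Precondition: 8 <= count <= 16.
// The first and the last 8 bytes overlap when count < 16, so two loads per
// buffer cover every byte without branching on the exact size.
inline bool differs_8_to_16(const unsigned char *a, const unsigned char *b,
                            std::size_t count) {
  assert(count >= 8 && count <= 16);
  const std::size_t tail = count - 8;
  const std::uint64_t head_diff = load64(a) ^ load64(b);
  const std::uint64_t tail_diff = load64(a + tail) ^ load64(b + tail);
  return (head_diff | tail_diff) != 0;
}
```

The two XOR results are combined with OR, so there is a single comparison at the end instead of one per block.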
2024-08-29 | [libc][x86] Use prefetch for write for memcpy (#90450) | Guillaume Chatelet
Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we prefetch memory for read on the source buffer. This patch adds prefetch for write on the destination buffer.
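A hedged sketch of what prefetching both buffers can look like is given below. The function name, the 64-byte block size, and the prefetch distance are all assumptions, and `__builtin_prefetch` (GCC and Clang) takes `0` for a read hint and `1` for a write hint as its second argument.

```cpp
#include <cstddef>
#include <cstring>

inline void copy_with_prefetch(char *dst, const char *src, std::size_t count) {
  constexpr std::size_t kCacheLine = 64;            // assumed line size
  constexpr std::size_t kDistance = 2 * kCacheLine; // hypothetical distance
  std::size_t offset = 0;
  for (; offset + kCacheLine <= count; offset += kCacheLine) {
    // Only prefetch addresses that are still inside the buffers.
    if (offset + kDistance < count) {
      __builtin_prefetch(src + offset + kDistance, 0, 3); // prefetch for read
      __builtin_prefetch(dst + offset + kDistance, 1, 3); // prefetch for write
    }
    std::memcpy(dst + offset, src + offset, kCacheLine);
  }
  // Copy whatever is left after the last full cache line.
  std::memcpy(dst + offset, src + offset, count - offset);
}
```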
2024-07-12 | [libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597) | Petr Hosek
This is a part of #97655.
2024-07-12 | Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593) | Mehdi Amini
Reverts llvm/llvm-project#98075

bots are broken
2024-07-11 | [libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075) | Petr Hosek
This is a part of #97655.
2024-05-31 | [libc][NFC] Allow compilation of `memcpy` with `-m32` (#93790) | Guillaume Chatelet
Needed to support i386 (#93709).
2024-05-14 | [libc][bug] Fix out of bound write in memcpy w/ software prefetching (#90591) | Guillaume Chatelet
This patch adds tests for `memcpy` and `memset` making sure that we don't access buffers out of bounds. It relies on POSIX `mmap` / `mprotect` and works only when FULL_BUILD_MODE is disabled.

The bug showed up while enabling software prefetching. `loop_and_tail_offset` always runs at least one iteration, but in some configurations the loop-unrolled prefetching path needed only the tail operation and no loop iterations at all.
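The failure mode can be sketched with a simplified model. The names mirror the commit message, but the block size, signatures, and the guard in `copy_from_offset` are assumptions for illustration and not the actual fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kBlock = 32; // assumed block size

inline void copy_block(char *dst, const char *src) {
  std::memcpy(dst, src, kBlock);
}

// do-while loop: always copies at least one block starting at `offset`, then
// a final block aligned to the end. Only safe if a full block fits.
inline void loop_and_tail_offset(char *dst, const char *src, std::size_t count,
                                 std::size_t offset) {
  assert(offset + kBlock <= count);
  do {
    copy_block(dst + offset, src + offset);
    offset += kBlock;
  } while (offset + kBlock < count);
  copy_block(dst + count - kBlock, src + count - kBlock); // tail
}

// Hypothetical caller: bytes [0, offset) were already copied. When the tail
// block alone covers the remainder, the loop must be skipped entirely.
inline void copy_from_offset(char *dst, const char *src, std::size_t count,
                             std::size_t offset) {
  assert(count >= kBlock && offset <= count);
  if (offset + kBlock < count)
    loop_and_tail_offset(dst, src, count, offset);
  else
    copy_block(dst + count - kBlock, src + count - kBlock); // tail only
}
```

Calling `loop_and_tail_offset` directly with `count = 40` and `offset = 32` would write 24 bytes past the end of the destination, which is the kind of access a guard-page test is meant to catch.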
2024-03-27 | [libc] Remove obsolete LIBC_HAS_BUILTIN macro (#86554) | Marc Auberer
Fixes #86546 and removes the macro `LIBC_HAS_BUILTIN`. The macro was necessary to support older compilers that did not support `__has_builtin`. All of the compilers we support already have this builtin. See: https://libc.llvm.org/compiler_support.html

All uses now use `__has_builtin` directly.

cc @nickdesaulniers
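Using `__has_builtin` directly looks roughly like the sketch below. The function `count_leading_zeros` is a hypothetical example and not a function touched by the PR; it assumes a compiler that provides `__has_builtin`, as the commit message states for all supported compilers.

```cpp
#include <cassert>
#include <limits>

// Counts leading zero bits of a non-zero value.
inline int count_leading_zeros(unsigned value) {
  assert(value != 0); // like __builtin_clz, undefined for zero
#if __has_builtin(__builtin_clz)
  return __builtin_clz(value);
#else
  // Portable fallback: scan from the most significant bit downward.
  int zeros = 0;
  for (unsigned mask = 1u << (std::numeric_limits<unsigned>::digits - 1);
       (value & mask) == 0; mask >>= 1)
    ++zeros;
  return zeros;
#endif
}
```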
2024-03-09 | [libc] Provide `LIBC_TYPES_HAS_INT64` (#83441) | Guillaume Chatelet
Umbrella bug #83182
2024-03-05 | [libc] suppress readability-identifier-naming for std::numeric_limits interfaces (#83921) | Nick Desaulniers
These templates are made to match the ergonomics of std::numeric_limits. Because our style for constexpr variables is ALL_CAPS, we must silence the linter for these manually.

Link: https://clang.llvm.org/extra/clang-tidy/#suppressing-undesired-diagnostics
2024-03-05 | [libc] fix readability-identifier-naming in memory_utils/utils.h (#83919) | Nick Desaulniers
Fixes:

    libc/src/string/memory_utils/utils.h:345:13: warning: invalid case style for member 'offset_' [readability-identifier-naming]

Having a trailing underscore for members is a google3 style, not LLVM style. Removing the underscore is insufficient, as we would then have 2 members with the same identifier which is not allowed (it is a compile time error). Remove the getter, and just access the renamed member that's now made public.
2024-03-05 | [libc] fix more readability-identifier-naming lints (#83914) | Nick Desaulniers
Found via:

    $ ninja -k2000 libc-lint 2>&1 | grep readability-identifier-naming

Auto fixed via:

    $ clang-tidy -p build/compile_commands.json \
        -checks="-*,readability-identifier-naming" \
        <filename> --fix

This doesn't fix all instances, just the obvious simple cases where it makes sense to change the identifier names. Subsequent PRs will fix up the stragglers.
2024-02-28 | [libc] fix typo introduced in inline_bcmp_byte_per_byte (#83356) | Nick Desaulniers
My global find+replace was overzealous and broke post submit unit tests. Link: #83345
2024-02-28 | [libc] fix readability-identifier-naming.ConstexprFunctionCase (#83345) | Nick Desaulniers
Codify that we use lower_case for readability-identifier-naming.ConstexprFunctionCase and then fix the 11 violations (rather than codify UPPER_CASE and have to fix the 170 violations).
2024-02-28 | [libc] fix clang-tidy llvm-header-guard warnings (#82679) | Nick Desaulniers
Towards the goal of getting `ninja libc-lint` back to green, fix the numerous instances of:

    warning: header guard does not follow preferred style [llvm-header-guard]

This is because many of our header guards start with `__LLVM` rather than `LLVM`.

To filter just these warnings:

    $ ninja -k2000 libc-lint 2>&1 | grep llvm-header-guard

To automatically apply fixits:

    $ find libc/src libc/include libc/test -name \*.h | \
        xargs -n1 -I {} clang-tidy {} -p build/compile_commands.json \
        -checks='-*,llvm-header-guard' --fix --quiet

Some manual cleanup is still necessary as headers that were missing header guards outright will have them inserted before the license block (we prefer them after).
2024-01-18 | [libc][NFC] Selectively disable GCC warnings (#78462) | Guillaume Chatelet
2024-01-11 | [libc][NFC] Use 16-byte indices for _mmXXX_shuffle_epi8 (#77781) | Guillaume Chatelet
This is less confusing since the implementation only cares about the 4 lower bits.
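The remark about the 4 lower bits can be made concrete with a scalar model of a byte shuffle on one 128-bit lane. This is an illustrative emulation with hypothetical names, not the intrinsic itself; it follows the documented `PSHUFB` behavior where the low 4 bits of each index select a source byte and a set high bit produces zero.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Lane = std::array<std::uint8_t, 16>;

// Scalar model of a byte shuffle within one 16-byte lane.
inline Lane shuffle_lane(const Lane &source, const Lane &indices) {
  Lane result{};
  for (std::size_t i = 0; i < result.size(); ++i) {
    const std::uint8_t index = indices[i];
    // Bit 7 set: the output byte is zero. Otherwise only bits 0..3 matter.
    result[i] = (index & 0x80) ? 0 : source[index & 0x0F];
  }
  return result;
}
```

In this model an index of `0x13` selects the same byte as `0x03`, which is why indices in the range 0 to 15 are the less confusing way to write the tables.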
2024-01-11 | [libc] Fix buggy AVX2 / AVX512 `memcmp` (#77081) | Guillaume Chatelet
Fixes #77080.
2024-01-08 | [libc] fix up #77384 | Nick Desaulniers
2024-01-08 | [libc] fix -Wconversion (#77384) | Nick Desaulniers
Fixes the following from GCC:

    llvm-project/libc/src/string/memory_utils/op_x86.h:236:24: error: conversion from ‘long unsigned int’ to ‘uint32_t’ {aka ‘unsigned int’} may change value [-Werror=conversion]
      236 |   return (xored >> 32) | (xored & 0xFFFFFFFF);
          |          ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~

Link: https://lab.llvm.org/buildbot/#/builders/250/builds/16236/steps/8/logs/stdio
Link: https://github.com/llvm/llvm-project/pull/74506
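One conventional way to satisfy `-Wconversion` for an expression of this shape is an explicit cast, sketched below. The function name is hypothetical and the exact form used in the PR is not shown in the log, so treat this as an assumption.

```cpp
#include <cstdint>

// Folds a 64-bit XOR result into 32 bits. The result is non-zero exactly
// when `xored` is non-zero, which is all a "do they differ?" check needs.
inline std::uint32_t fold_to_u32(std::uint64_t xored) {
  // The cast states that the narrowing from 64 to 32 bits is intended.
  return static_cast<std::uint32_t>((xored >> 32) | (xored & 0xFFFFFFFF));
}
```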
2024-01-05 | [libc] fix -Warray-bounds in block_offset (#77001) | Nick Desaulniers
GCC reports an instance of -Warray-bounds in block_offset. Reimplement block_offset in terms of memcpy_inline which was created to avoid this diagnostic. See the linked issue for the full trace of diagnostic. Fixes: https://github.com/llvm/llvm-project/issues/76877
2023-12-19 | [libc] Remove unnecessary call in memfunction dispatchers (#75800) | Guillaume Chatelet
Before this patch the compiler could generate unnecessary calls to the selected implementation. https://clang.llvm.org/docs/AttributeReference.html#flatten
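The linked attribute can be illustrated with a small dispatcher. The function names and the 16-byte threshold are hypothetical; `[[gnu::flatten]]` is the GCC and Clang spelling of the attribute, which asks the compiler to inline the calls made inside the attributed function where possible.

```cpp
#include <cstddef>
#include <cstring>

inline void set_small(char *dst, char value, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    dst[i] = value;
}

inline void set_large(char *dst, char value, std::size_t count) {
  std::memset(dst, value, count);
}

// Dispatcher: with `flatten`, the selected implementation is meant to be
// inlined into this function instead of being reached through a call.
[[gnu::flatten]] void sketch_memset(char *dst, char value, std::size_t count) {
  if (count <= 16)
    return set_small(dst, value, count);
  return set_large(dst, value, count);
}
```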
2023-12-05 | [reland][libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939) (#74446) | Guillaume Chatelet
Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h` that was still using `deferred_static_assert`.
2023-12-05 | Revert "[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead" (#74444) | Guillaume Chatelet
Reverts llvm/llvm-project#73939

This broke libc-aarch64-ubuntu build bot https://lab.llvm.org/buildbot/#/builders/138/builds/56186
2023-12-05 | [libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939) | Guillaume Chatelet
2023-12-04 | [libc] Fix UB in memory utils (#74295) | Guillaume Chatelet
The [standard](https://eel.is/c++draft/expr.add#4.3) forbids forming pointers to invalid objects even if the pointer is never read from or written to. This patch makes sure that we don't do pointer arithmetic on invalid pointers. Co-authored-by: Vitaly Buka <vitalybuka@google.com>
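A common way to stay within that rule is to do the arithmetic on integer offsets and only form pointers that are known to be in range. The sketch below uses a hypothetical function, `sum_full_blocks`, purely to show the loop shape.

```cpp
#include <cassert>
#include <cstddef>

// Sums the bytes of every full block in `data`.
// A pointer-based loop such as
//   for (p = data; p + block <= data + count; p += block)
// may form `p + block` beyond one-past-the-end, which is undefined behavior
// even if the pointer is never dereferenced. Integer offsets avoid that.
inline unsigned sum_full_blocks(const unsigned char *data, std::size_t count,
                                std::size_t block) {
  assert(block > 0);
  unsigned sum = 0;
  // Invariant: offset <= count, so `count - offset` never wraps around.
  for (std::size_t offset = 0; count - offset >= block; offset += block)
    for (std::size_t i = 0; i < block; ++i)
      sum += data[offset + i];
  return sum;
}
```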
2023-11-10 | [libc] Adding a version of memset with software prefetching (#70857) | doshimili
Software prefetching helps recover performance when hardware prefetching is disabled. The 'LIBC_COPT_MEMSET_X86_USE_SOFTWARE_PREFETCHING' compile time option allows users to use this patch.
2023-11-07 | [libc] Optimize memcpy size thresholds (#70049) | Dmitry Vyukov
Adjust boundary conditions for sizes = 16/32/64. See the added comment for explanations.

Results on a machine with AVX2, so sizes 64/128 affected:
```
                 │  baseline   │              adjusted               │
                 │   sec/op    │   sec/op     vs base                │
memcpy/Google_A   5.701n ± 0%   5.551n ± 1%   -2.63% (n=100)
memcpy/Google_B   3.817n ± 0%   3.776n ± 0%   -1.07% (p=0.000 n=100)
memcpy/Google_D   11.35n ± 1%   11.32n ± 0%        ~ (p=0.066 n=100)
memcpy/Google_U   3.874n ± 1%   3.821n ± 1%   -1.37% (p=0.001 n=100)
memcpy/64         3.843n ± 0%   3.105n ± 3%  -19.22% (n=50)
memcpy/128        4.842n ± 0%   3.818n ± 0%  -21.15% (p=0.000 n=50)
```
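Why the boundaries at 16/32/64 matter can be seen in a simplified size dispatcher. The thresholds and names below are assumptions for illustration and do not reproduce the PR. A head-tail copy with blocks of N bytes handles every size from N to 2N inclusive, so a test written as `count <= 16` keeps size 16 on the cheaper path, whereas `count < 16` would push it to the next tier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `count` bytes, N <= count <= 2N, as two possibly overlapping blocks.
template <std::size_t N>
inline void copy_head_tail(char *dst, const char *src, std::size_t count) {
  assert(count >= N && count <= 2 * N);
  std::memcpy(dst, src, N);                         // head
  std::memcpy(dst + count - N, src + count - N, N); // tail
}

// Hypothetical dispatcher for sizes up to 64 bytes.
inline void small_memcpy(char *dst, const char *src, std::size_t count) {
  assert(count <= 64);
  if (count < 8) {
    for (std::size_t i = 0; i < count; ++i)
      dst[i] = src[i];
    return;
  }
  if (count <= 16) // inclusive: 16 bytes fit in two 8-byte blocks
    return copy_head_tail<8>(dst, src, count);
  if (count <= 32) // inclusive: 32 bytes fit in two 16-byte blocks
    return copy_head_tail<16>(dst, src, count);
  return copy_head_tail<32>(dst, src, count); // 33..64
}
```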
2023-11-06 | Fix load64_aligned (#71391) | Guillaume Chatelet
Fix #64758

`load64_aligned` was missing a case for `alignment == 6`.
2023-10-26 | [libc] memmove optimizations (#70043) | Dmitry Vyukov
1. Remove is_disjoint check for smaller sizes and reduce code bloat. inline_memmove may handle some small sizes as efficiently as inline_memcpy. For these sizes we may not do is_disjoint check. This both avoids additional code for the most frequent smaller sizes and removes code bloat (we don't need the memcpy logic for small sizes). Here we heavily rely on inlining and dead code elimination: from the first inline_memmove we should get only handling of small sizes, and from the second inline_memmove and inline_memcpy we should get only handling of larger sizes. 2. Use the memcpy thresholds for memmove. Memcpy thresholds were more carefully tuned. This becomes more important since we use memmove for all small sizes always now. 3. Fix boundary conditions for sizes = 16/32/64. See the added comment for explanations. Memmove function size drops from 885 to 715 bytes due to removed duplication. ``` │ baseline │ small-size │ │ sec/op │ sec/op vs base │ memmove/Google_A 3.208n ± 0% 2.911n ± 0% -9.25% (n=100) memmove/Google_B 4.113n ± 1% 3.428n ± 0% -16.65% (n=100) memmove/Google_D 5.838n ± 0% 4.158n ± 0% -28.78% (n=100) memmove/Google_S 4.712n ± 1% 3.899n ± 0% -17.25% (n=100) memmove/Google_U 3.609n ± 0% 3.247n ± 1% -10.02% (n=100) memmove/0 2.982n ± 0% 2.169n ± 0% -27.26% (n=50) memmove/1 3.253n ± 0% 2.168n ± 0% -33.34% (n=50) memmove/2 3.255n ± 0% 2.169n ± 0% -33.38% (n=50) memmove/3 3.259n ± 2% 2.175n ± 0% -33.27% (p=0.000 n=50) memmove/4 3.259n ± 0% 2.168n ± 5% -33.46% (p=0.000 n=50) memmove/5 2.488n ± 0% 1.926n ± 0% -22.57% (p=0.000 n=50) memmove/6 2.490n ± 0% 1.928n ± 0% -22.59% (p=0.000 n=50) memmove/7 2.492n ± 0% 1.927n ± 0% -22.65% (p=0.000 n=50) memmove/8 2.737n ± 0% 2.711n ± 0% -0.97% (p=0.000 n=50) memmove/9 2.736n ± 0% 2.711n ± 0% -0.94% (p=0.000 n=50) memmove/10 2.739n ± 0% 2.711n ± 0% -1.04% (p=0.000 n=50) memmove/11 2.740n ± 0% 2.711n ± 0% -1.07% (p=0.000 n=50) memmove/12 2.740n ± 0% 2.711n ± 0% -1.09% (p=0.000 n=50) memmove/13 2.744n ± 0% 2.711n ± 0% 
-1.22% (p=0.000 n=50) memmove/14 2.742n ± 0% 2.711n ± 0% -1.14% (p=0.000 n=50) memmove/15 2.742n ± 0% 2.711n ± 0% -1.15% (p=0.000 n=50) memmove/16 2.997n ± 0% 2.981n ± 0% -0.52% (p=0.000 n=50) memmove/17 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50) memmove/18 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50) memmove/19 2.999n ± 0% 2.982n ± 0% -0.59% (p=0.000 n=50) memmove/20 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50) memmove/21 3.000n ± 0% 2.981n ± 0% -0.61% (p=0.000 n=50) memmove/22 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50) memmove/23 3.002n ± 0% 2.981n ± 0% -0.67% (p=0.000 n=50) memmove/24 3.002n ± 0% 2.981n ± 0% -0.70% (n=50) memmove/25 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50) memmove/26 3.004n ± 0% 2.982n ± 0% -0.74% (p=0.000 n=50) memmove/27 3.005n ± 0% 2.981n ± 0% -0.79% (n=50) memmove/28 3.005n ± 0% 2.982n ± 0% -0.77% (n=50) memmove/29 3.009n ± 0% 2.981n ± 0% -0.92% (n=50) memmove/30 3.008n ± 0% 2.981n ± 0% -0.89% (n=50) memmove/31 3.007n ± 0% 2.982n ± 0% -0.86% (n=50) memmove/32 3.540n ± 0% 2.998n ± 0% -15.31% (p=0.000 n=50) memmove/33 3.544n ± 0% 2.997n ± 0% -15.44% (p=0.000 n=50) memmove/34 3.546n ± 0% 2.999n ± 0% -15.42% (n=50) memmove/35 3.545n ± 0% 2.999n ± 0% -15.40% (n=50) memmove/36 3.548n ± 0% 2.998n ± 0% -15.52% (p=0.000 n=50) memmove/37 3.546n ± 0% 3.000n ± 0% -15.41% (n=50) memmove/38 3.549n ± 0% 2.999n ± 0% -15.49% (p=0.000 n=50) memmove/39 3.549n ± 0% 2.999n ± 0% -15.48% (p=0.000 n=50) memmove/40 3.549n ± 0% 3.000n ± 0% -15.46% (p=0.000 n=50) memmove/41 3.550n ± 0% 3.001n ± 0% -15.47% (n=50) memmove/42 3.549n ± 0% 3.001n ± 0% -15.43% (n=50) memmove/43 3.552n ± 0% 3.001n ± 0% -15.52% (p=0.000 n=50) memmove/44 3.552n ± 0% 3.001n ± 0% -15.51% (n=50) memmove/45 3.552n ± 0% 3.002n ± 0% -15.48% (n=50) memmove/46 3.554n ± 0% 3.001n ± 0% -15.55% (p=0.000 n=50) memmove/47 3.556n ± 0% 3.002n ± 0% -15.58% (p=0.000 n=50) memmove/48 3.555n ± 0% 3.003n ± 0% -15.54% (n=50) memmove/49 3.557n ± 0% 3.002n ± 0% -15.59% (p=0.000 n=50) memmove/50 3.557n ± 
0% 3.004n ± 0% -15.55% (p=0.000 n=50)
memmove/51 3.556n ± 0% 3.004n ± 0% -15.53% (p=0.000 n=50)
memmove/52 3.561n ± 0% 3.004n ± 0% -15.65% (p=0.000 n=50)
memmove/53 3.558n ± 0% 3.004n ± 0% -15.57% (p=0.000 n=50)
memmove/54 3.561n ± 0% 3.005n ± 0% -15.62% (n=50)
memmove/55 3.560n ± 0% 3.006n ± 0% -15.57% (n=50)
memmove/56 3.562n ± 0% 3.006n ± 0% -15.60% (p=0.000 n=50)
memmove/57 3.563n ± 0% 3.006n ± 0% -15.64% (n=50)
memmove/58 3.565n ± 0% 3.007n ± 0% -15.64% (p=0.000 n=50)
memmove/59 3.564n ± 0% 3.006n ± 0% -15.66% (p=0.000 n=50)
memmove/60 3.570n ± 0% 3.008n ± 0% -15.74% (p=0.000 n=50)
memmove/61 3.566n ± 0% 3.009n ± 0% -15.63% (p=0.000 n=50)
memmove/62 3.567n ± 0% 3.007n ± 0% -15.70% (p=0.000 n=50)
memmove/63 3.568n ± 0% 3.008n ± 0% -15.71% (p=0.000 n=50)
memmove/64 4.104n ± 0% 3.008n ± 0% -26.70% (p=0.000 n=50)
memmove/65 4.126n ± 0% 3.662n ± 0% -11.26% (p=0.000 n=50)
memmove/66 4.128n ± 0% 3.662n ± 0% -11.29% (n=50)
memmove/67 4.129n ± 0% 3.662n ± 0% -11.31% (n=50)
memmove/68 4.129n ± 0% 3.661n ± 0% -11.33% (p=0.000 n=50)
memmove/69 4.130n ± 0% 3.662n ± 0% -11.34% (p=0.000 n=50)
memmove/70 4.130n ± 0% 3.662n ± 0% -11.33% (n=50)
memmove/71 4.132n ± 0% 3.662n ± 0% -11.38% (p=0.000 n=50)
memmove/72 4.131n ± 0% 3.661n ± 0% -11.39% (n=50)
memmove/73 4.135n ± 0% 3.661n ± 0% -11.45% (p=0.000 n=50)
memmove/74 4.137n ± 0% 3.662n ± 0% -11.49% (n=50)
memmove/75 4.138n ± 0% 3.662n ± 0% -11.51% (p=0.000 n=50)
memmove/76 4.139n ± 0% 3.661n ± 0% -11.56% (p=0.000 n=50)
memmove/77 4.136n ± 0% 3.662n ± 0% -11.47% (p=0.000 n=50)
memmove/78 4.143n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/79 4.142n ± 0% 3.661n ± 0% -11.60% (n=50)
memmove/80 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/81 4.140n ± 0% 3.661n ± 0% -11.57% (n=50)
memmove/82 4.146n ± 0% 3.661n ± 0% -11.69% (n=50)
memmove/83 4.143n ± 0% 3.661n ± 0% -11.63% (p=0.000 n=50)
memmove/84 4.143n ± 0% 3.661n ± 0% -11.63% (n=50)
memmove/85 4.147n ± 0% 3.661n ± 0% -11.73% (p=0.000 n=50)
memmove/86 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/87 4.147n ± 0% 3.661n ± 0% -11.72% (p=0.000 n=50)
memmove/88 4.148n ± 0% 3.661n ± 0% -11.74% (n=50)
memmove/89 4.152n ± 0% 3.661n ± 0% -11.84% (n=50)
memmove/90 4.151n ± 0% 3.661n ± 0% -11.81% (n=50)
memmove/91 4.150n ± 0% 3.661n ± 0% -11.78% (n=50)
memmove/92 4.153n ± 0% 3.661n ± 0% -11.86% (n=50)
memmove/93 4.158n ± 0% 3.661n ± 0% -11.95% (n=50)
memmove/94 4.157n ± 0% 3.661n ± 0% -11.95% (p=0.000 n=50)
memmove/95 4.155n ± 0% 3.661n ± 0% -11.90% (p=0.000 n=50)
memmove/96 4.149n ± 0% 3.660n ± 0% -11.79% (n=50)
memmove/97 4.157n ± 0% 3.661n ± 0% -11.94% (n=50)
memmove/98 4.157n ± 0% 3.661n ± 0% -11.94% (n=50)
memmove/99 4.168n ± 0% 3.661n ± 0% -12.17% (p=0.000 n=50)
memmove/100 4.159n ± 0% 3.660n ± 0% -12.00% (p=0.000 n=50)
memmove/101 4.161n ± 0% 3.660n ± 0% -12.03% (p=0.000 n=50)
memmove/102 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50)
memmove/103 4.164n ± 0% 3.661n ± 0% -12.08% (n=50)
memmove/104 4.164n ± 0% 3.660n ± 0% -12.11% (n=50)
memmove/105 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50)
memmove/106 4.166n ± 0% 3.660n ± 0% -12.15% (n=50)
memmove/107 4.171n ± 0% 3.660n ± 1% -12.26% (p=0.000 n=50)
memmove/108 4.173n ± 0% 3.660n ± 0% -12.30% (p=0.000 n=50)
memmove/109 4.170n ± 0% 3.660n ± 0% -12.24% (n=50)
memmove/110 4.174n ± 0% 3.660n ± 0% -12.31% (n=50)
memmove/111 4.176n ± 0% 3.660n ± 0% -12.35% (p=0.000 n=50)
memmove/112 4.174n ± 0% 3.659n ± 0% -12.34% (p=0.000 n=50)
memmove/113 4.176n ± 0% 3.660n ± 0% -12.35% (n=50)
memmove/114 4.182n ± 0% 3.660n ± 0% -12.49% (n=50)
memmove/115 4.185n ± 0% 3.660n ± 0% -12.55% (n=50)
memmove/116 4.184n ± 0% 3.659n ± 0% -12.54% (n=50)
memmove/117 4.182n ± 0% 3.660n ± 0% -12.50% (n=50)
memmove/118 4.188n ± 0% 3.660n ± 0% -12.61% (n=50)
memmove/119 4.186n ± 0% 3.660n ± 0% -12.57% (p=0.000 n=50)
memmove/120 4.189n ± 0% 3.659n ± 0% -12.63% (n=50)
memmove/121 4.187n ± 0% 3.660n ± 0% -12.60% (n=50)
memmove/122 4.186n ± 0% 3.660n ± 0% -12.58% (n=50)
memmove/123 4.187n ± 0% 3.660n ± 0% -12.60% (n=50)
memmove/124 4.189n ± 0% 3.659n ± 0% -12.65% (n=50)
memmove/125 4.195n ± 0% 3.659n ± 0% -12.78% (n=50)
memmove/126 4.197n ± 0% 3.659n ± 0% -12.81% (n=50)
memmove/127 4.194n ± 0% 3.659n ± 0% -12.75% (n=50)
memmove/128 5.035n ± 0% 3.659n ± 0% -27.32% (n=50)
memmove/129 5.127n ± 0% 5.164n ± 0% +0.73% (p=0.000 n=50)
memmove/130 5.130n ± 0% 5.176n ± 0% +0.88% (p=0.000 n=50)
memmove/131 5.127n ± 0% 5.180n ± 0% +1.05% (p=0.000 n=50)
memmove/132 5.131n ± 0% 5.169n ± 0% +0.75% (p=0.000 n=50)
memmove/133 5.137n ± 0% 5.179n ± 0% +0.81% (p=0.000 n=50)
memmove/134 5.140n ± 0% 5.178n ± 0% +0.74% (p=0.000 n=50)
memmove/135 5.141n ± 0% 5.187n ± 0% +0.88% (p=0.000 n=50)
memmove/136 5.133n ± 0% 5.184n ± 0% +0.99% (p=0.000 n=50)
memmove/137 5.148n ± 0% 5.186n ± 0% +0.73% (p=0.000 n=50)
memmove/138 5.143n ± 0% 5.189n ± 0% +0.88% (p=0.000 n=50)
memmove/139 5.142n ± 0% 5.192n ± 0% +0.97% (p=0.000 n=50)
memmove/140 5.141n ± 0% 5.192n ± 0% +1.01% (p=0.000 n=50)
memmove/141 5.155n ± 0% 5.188n ± 0% +0.64% (p=0.000 n=50)
memmove/142 5.146n ± 0% 5.192n ± 0% +0.90% (p=0.000 n=50)
memmove/143 5.142n ± 0% 5.203n ± 0% +1.19% (p=0.000 n=50)
memmove/144 5.146n ± 0% 5.197n ± 0% +0.99% (p=0.000 n=50)
memmove/145 5.146n ± 0% 5.196n ± 0% +0.97% (p=0.000 n=50)
memmove/146 5.151n ± 0% 5.207n ± 0% +1.10% (p=0.000 n=50)
memmove/147 5.151n ± 0% 5.205n ± 0% +1.06% (p=0.000 n=50)
memmove/148 5.156n ± 0% 5.190n ± 0% +0.66% (p=0.000 n=50)
memmove/149 5.158n ± 0% 5.212n ± 0% +1.04% (p=0.000 n=50)
memmove/150 5.160n ± 0% 5.203n ± 0% +0.84% (p=0.000 n=50)
memmove/151 5.167n ± 0% 5.210n ± 0% +0.83% (p=0.000 n=50)
memmove/152 5.157n ± 0% 5.206n ± 0% +0.94% (p=0.000 n=50)
memmove/153 5.170n ± 0% 5.211n ± 0% +0.80% (p=0.000 n=50)
memmove/154 5.169n ± 0% 5.222n ± 0% +1.02% (p=0.000 n=50)
memmove/155 5.171n ± 0% 5.215n ± 0% +0.87% (p=0.000 n=50)
memmove/156 5.174n ± 0% 5.214n ± 0% +0.78% (p=0.000 n=50)
memmove/157 5.171n ± 0% 5.218n ± 0% +0.92% (p=0.000 n=50)
memmove/158 5.168n ± 0% 5.224n ± 0% +1.09% (p=0.000 n=50)
memmove/159 5.179n ± 0% 5.218n ± 0% +0.76% (p=0.000 n=50)
memmove/160 5.170n ± 0% 5.219n ± 0% +0.95% (p=0.000 n=50)
memmove/161 5.187n ± 0% 5.220n ± 0% +0.64% (p=0.000 n=50)
memmove/162 5.189n ± 0% 5.234n ± 0% +0.86% (p=0.000 n=50)
memmove/163 5.199n ± 0% 5.250n ± 0% +0.99% (p=0.000 n=50)
memmove/164 5.205n ± 0% 5.260n ± 0% +1.04% (p=0.000 n=50)
memmove/165 5.208n ± 0% 5.261n ± 0% +1.01% (p=0.000 n=50)
memmove/166 5.227n ± 0% 5.275n ± 0% +0.91% (p=0.000 n=50)
memmove/167 5.233n ± 0% 5.281n ± 0% +0.92% (p=0.000 n=50)
memmove/168 5.236n ± 0% 5.295n ± 0% +1.12% (p=0.000 n=50)
memmove/169 5.256n ± 0% 5.297n ± 0% +0.79% (p=0.000 n=50)
memmove/170 5.259n ± 0% 5.302n ± 0% +0.80% (p=0.000 n=50)
memmove/171 5.269n ± 0% 5.321n ± 0% +0.97% (p=0.000 n=50)
memmove/172 5.266n ± 0% 5.318n ± 0% +0.98% (p=0.000 n=50)
memmove/173 5.272n ± 0% 5.330n ± 0% +1.09% (p=0.000 n=50)
memmove/174 5.284n ± 0% 5.331n ± 0% +0.89% (p=0.000 n=50)
memmove/175 5.284n ± 0% 5.322n ± 0% +0.72% (p=0.000 n=50)
memmove/176 5.298n ± 0% 5.337n ± 0% +0.74% (p=0.000 n=50)
memmove/177 5.282n ± 0% 5.338n ± 0% +1.04% (p=0.000 n=50)
memmove/178 5.299n ± 0% 5.337n ± 0% +0.71% (p=0.000 n=50)
memmove/179 5.296n ± 0% 5.343n ± 0% +0.88% (p=0.000 n=50)
memmove/180 5.292n ± 0% 5.343n ± 0% +0.97% (p=0.000 n=50)
memmove/181 5.303n ± 0% 5.335n ± 0% +0.60% (p=0.000 n=50)
memmove/182 5.305n ± 0% 5.338n ± 0% +0.62% (p=0.000 n=50)
memmove/183 5.298n ± 0% 5.329n ± 0% +0.59% (p=0.000 n=50)
memmove/184 5.299n ± 0% 5.333n ± 0% +0.64% (p=0.000 n=50)
memmove/185 5.291n ± 0% 5.330n ± 0% +0.73% (p=0.000 n=50)
memmove/186 5.296n ± 0% 5.332n ± 0% +0.68% (p=0.000 n=50)
memmove/187 5.297n ± 0% 5.320n ± 0% +0.44% (p=0.000 n=50)
memmove/188 5.286n ± 0% 5.314n ± 0% +0.53% (p=0.000 n=50)
memmove/189 5.293n ± 0% 5.318n ± 0% +0.46% (p=0.000 n=50)
memmove/190 5.294n ± 0% 5.318n ± 0% +0.45% (p=0.000 n=50)
memmove/191 5.292n ± 0% 5.314n ± 0% +0.40% (p=0.032 n=50)
memmove/192 5.272n ± 0% 5.304n ± 0% +0.60% (p=0.000 n=50)
memmove/193 5.279n ± 0% 5.310n ± 0% +0.57% (p=0.000 n=50)
memmove/194 5.294n ± 0% 5.308n ± 0% +0.26% (p=0.018 n=50)
memmove/195 5.302n ± 0% 5.311n ± 0% +0.18% (p=0.010 n=50)
memmove/196 5.301n ± 0% 5.316n ± 0% +0.28% (p=0.023 n=50)
memmove/197 5.302n ± 0% 5.327n ± 0% +0.47% (p=0.000 n=50)
memmove/198 5.310n ± 0% 5.326n ± 0% +0.30% (p=0.003 n=50)
memmove/199 5.303n ± 0% 5.319n ± 0% +0.30% (p=0.009 n=50)
memmove/200 5.312n ± 0% 5.330n ± 0% +0.35% (p=0.001 n=50)
memmove/201 5.307n ± 0% 5.333n ± 0% +0.50% (p=0.000 n=50)
memmove/202 5.311n ± 0% 5.334n ± 0% +0.44% (p=0.000 n=50)
memmove/203 5.313n ± 0% 5.335n ± 0% +0.41% (p=0.006 n=50)
memmove/204 5.312n ± 0% 5.332n ± 0% +0.36% (p=0.002 n=50)
memmove/205 5.318n ± 0% 5.345n ± 0% +0.50% (p=0.000 n=50)
memmove/206 5.311n ± 0% 5.333n ± 0% +0.42% (p=0.002 n=50)
memmove/207 5.310n ± 0% 5.338n ± 0% +0.52% (p=0.000 n=50)
memmove/208 5.319n ± 0% 5.341n ± 0% +0.40% (p=0.004 n=50)
memmove/209 5.330n ± 0% 5.346n ± 0% +0.30% (p=0.004 n=50)
memmove/210 5.329n ± 0% 5.349n ± 0% +0.38% (p=0.002 n=50)
memmove/211 5.318n ± 0% 5.340n ± 0% +0.41% (p=0.000 n=50)
memmove/212 5.339n ± 0% 5.343n ± 0% ~ (p=0.396 n=50)
memmove/213 5.329n ± 0% 5.343n ± 0% +0.25% (p=0.017 n=50)
memmove/214 5.339n ± 0% 5.358n ± 0% +0.35% (p=0.035 n=50)
memmove/215 5.342n ± 0% 5.346n ± 0% ~ (p=0.063 n=50)
memmove/216 5.338n ± 0% 5.359n ± 0% +0.39% (p=0.002 n=50)
memmove/217 5.341n ± 0% 5.362n ± 0% +0.39% (p=0.015 n=50)
memmove/218 5.354n ± 0% 5.373n ± 0% +0.36% (p=0.041 n=50)
memmove/219 5.352n ± 0% 5.362n ± 0% ~ (p=0.143 n=50)
memmove/220 5.344n ± 0% 5.370n ± 0% +0.50% (p=0.001 n=50)
memmove/221 5.345n ± 0% 5.373n ± 0% +0.53% (p=0.000 n=50)
memmove/222 5.348n ± 0% 5.360n ± 0% +0.23% (p=0.014 n=50)
memmove/223 5.354n ± 0% 5.377n ± 0% +0.43% (p=0.024 n=50)
memmove/224 5.352n ± 0% 5.363n ± 0% ~ (p=0.052 n=50)
memmove/225 5.372n ± 0% 5.380n ± 0% ~ (p=0.481 n=50)
memmove/226 5.368n ± 0% 5.386n ± 0% +0.34% (p=0.004 n=50)
memmove/227 5.386n ± 0% 5.402n ± 0% +0.29% (p=0.028 n=50)
memmove/228 5.400n ± 0% 5.408n ± 0% ~ (p=0.174 n=50)
memmove/229 5.423n ± 0% 5.427n ± 0% ~ (p=0.444 n=50)
memmove/230 5.411n ± 0% 5.429n ± 0% +0.33% (p=0.020 n=50)
memmove/231 5.420n ± 0% 5.433n ± 0% +0.24% (p=0.034 n=50)
memmove/232 5.435n ± 0% 5.441n ± 0% ~ (p=0.235 n=50)
memmove/233 5.446n ± 0% 5.462n ± 0% ~ (p=0.590 n=50)
memmove/234 5.467n ± 0% 5.461n ± 0% ~ (p=0.921 n=50)
memmove/235 5.472n ± 0% 5.478n ± 0% ~ (p=0.883 n=50)
memmove/236 5.466n ± 0% 5.478n ± 0% ~ (p=0.324 n=50)
memmove/237 5.471n ± 0% 5.489n ± 0% ~ (p=0.132 n=50)
memmove/238 5.485n ± 0% 5.489n ± 0% ~ (p=0.460 n=50)
memmove/239 5.484n ± 0% 5.488n ± 0% ~ (p=0.833 n=50)
memmove/240 5.483n ± 0% 5.495n ± 0% ~ (p=0.095 n=50)
memmove/241 5.498n ± 0% 5.514n ± 0% ~ (p=0.077 n=50)
memmove/242 5.518n ± 0% 5.517n ± 0% ~ (p=0.481 n=50)
memmove/243 5.514n ± 0% 5.511n ± 0% ~ (p=0.503 n=50)
memmove/244 5.510n ± 0% 5.497n ± 0% -0.24% (p=0.038 n=50)
memmove/245 5.516n ± 0% 5.505n ± 0% ~ (p=0.317 n=50)
memmove/246 5.513n ± 1% 5.494n ± 0% ~ (p=0.147 n=50)
memmove/247 5.518n ± 0% 5.499n ± 0% -0.36% (p=0.011 n=50)
memmove/248 5.503n ± 0% 5.492n ± 0% ~ (p=0.267 n=50)
memmove/249 5.498n ± 0% 5.497n ± 0% ~ (p=0.765 n=50)
memmove/250 5.485n ± 0% 5.493n ± 0% ~ (p=0.348 n=50)
memmove/251 5.503n ± 0% 5.482n ± 0% -0.37% (p=0.013 n=50)
memmove/252 5.497n ± 0% 5.485n ± 0% ~ (p=0.077 n=50)
memmove/253 5.489n ± 0% 5.496n ± 0% ~ (p=0.850 n=50)
memmove/254 5.497n ± 0% 5.491n ± 0% ~ (p=0.548 n=50)
memmove/255 5.484n ± 1% 5.494n ± 0% ~ (p=0.888 n=50)
memmove/256 6.952n ± 0% 7.676n ± 0% +10.41% (p=0.000 n=50)
geomean 4.406n 4.127n -6.33%
```
2023-10-24[libc] Speed up memmove overlapping check (#70017)Dmitry Vyukov
Use a check that requires fewer instructions and is cheaper.

Current code:
```
1b704: 48 39 f7      cmp %rsi,%rdi
1b707: 48 89 f0      mov %rsi,%rax
1b70a: 48 0f 47 c7   cmova %rdi,%rax
1b70e: 48 89 f9      mov %rdi,%rcx
1b711: 48 0f 47 ce   cmova %rsi,%rcx
1b715: 48 01 d1      add %rdx,%rcx
1b718: 48 39 c1      cmp %rax,%rcx
```
New code:
```
1b704: 48 89 f8      mov %rdi,%rax
1b707: 48 29 f0      sub %rsi,%rax
1b70a: 48 89 c1      mov %rax,%rcx
1b70d: 48 f7 d9      neg %rcx
1b710: 48 0f 48 c8   cmovs %rax,%rcx
1b714: 48 39 d1      cmp %rdx,%rcx
```
```
│ baseline │ disjoint │
│ sec/op │ sec/op vs base │
memmove/Google_A 3.910n ± 0% 3.861n ± 1% -1.26% (p=0.000 n=50)
```
```
│ baseline │ disjoint │
│ sec/op │ sec/op vs base │
memmove/1 2.724n ± 3% 2.441n ± 0% -10.37% (n=50)
memmove/2 2.878n ± 0% 2.713n ± 0% -5.73% (n=50)
memmove/3 2.835n ± 0% 2.593n ± 0% -8.54% (n=50)
memmove/4 3.032n ± 0% 2.776n ± 0% -8.45% (p=0.000 n=50)
memmove/5 2.833n ± 0% 2.600n ± 0% -8.20% (p=0.000 n=50)
memmove/6 2.758n ± 0% 2.744n ± 0% -0.52% (p=0.000 n=50)
memmove/7 2.762n ± 0% 2.744n ± 0% -0.63% (p=0.000 n=50)
memmove/8 2.763n ± 0% 2.750n ± 0% -0.46% (p=0.000 n=50)
memmove/9 3.182n ± 0% 3.269n ± 0% +2.75% (p=0.000 n=50)
memmove/10 3.185n ± 0% 3.270n ± 0% +2.64% (p=0.000 n=50)
memmove/11 3.188n ± 0% 3.277n ± 0% +2.79% (p=0.000 n=50)
memmove/12 3.190n ± 0% 3.279n ± 0% +2.82% (p=0.000 n=50)
memmove/13 3.194n ± 0% 3.281n ± 0% +2.73% (p=0.000 n=50)
memmove/14 3.197n ± 0% 3.285n ± 0% +2.77% (p=0.000 n=50)
memmove/15 3.198n ± 0% 3.282n ± 0% +2.62% (p=0.000 n=50)
memmove/16 3.201n ± 0% 3.284n ± 0% +2.61% (p=0.000 n=50)
memmove/17 3.564n ± 0% 3.320n ± 0% -6.86% (p=0.000 n=50)
memmove/18 3.572n ± 0% 3.313n ± 0% -7.25% (p=0.000 n=50)
memmove/19 3.572n ± 0% 3.325n ± 0% -6.94% (p=0.000 n=50)
memmove/20 3.575n ± 0% 3.319n ± 0% -7.15% (p=0.000 n=50)
memmove/21 3.578n ± 0% 3.327n ± 0% -7.03% (p=0.000 n=50)
memmove/22 3.581n ± 0% 3.330n ± 0% -7.01% (p=0.000 n=50)
memmove/23 3.582n ± 0% 3.354n ± 1% -6.37% (p=0.000 n=50)
memmove/24 3.587n ± 0% 3.347n ± 1% -6.71% (p=0.000 n=50)
memmove/25 3.591n ± 0% 3.320n ± 0% -7.55% (p=0.000 n=50)
memmove/26 3.593n ± 0% 3.348n ± 0% -6.82% (p=0.000 n=50)
memmove/27 3.596n ± 0% 3.346n ± 0% -6.94% (p=0.000 n=50)
memmove/28 3.597n ± 0% 3.357n ± 0% -6.67% (p=0.000 n=50)
memmove/29 3.601n ± 0% 3.340n ± 0% -7.23% (p=0.000 n=50)
memmove/30 3.602n ± 0% 3.345n ± 0% -7.12% (p=0.000 n=50)
memmove/31 3.608n ± 0% 3.357n ± 0% -6.94% (p=0.000 n=50)
memmove/32 3.605n ± 0% 3.352n ± 0% -7.01% (p=0.000 n=50)
memmove/33 4.128n ± 1% 3.829n ± 0% -7.23% (p=0.000 n=50)
memmove/34 4.149n ± 0% 3.836n ± 0% -7.54% (p=0.000 n=50)
memmove/35 4.134n ± 0% 3.839n ± 0% -7.15% (n=50)
memmove/36 4.151n ± 0% 3.842n ± 0% -7.45% (n=50)
memmove/37 4.152n ± 0% 3.841n ± 0% -7.49% (p=0.000 n=50)
memmove/38 4.159n ± 0% 3.844n ± 0% -7.58% (p=0.000 n=50)
memmove/39 4.165n ± 0% 3.841n ± 0% -7.78% (p=0.000 n=50)
memmove/40 4.162n ± 0% 3.837n ± 0% -7.81% (p=0.000 n=50)
memmove/41 4.161n ± 0% 3.845n ± 0% -7.58% (p=0.000 n=50)
memmove/42 4.164n ± 0% 3.851n ± 0% -7.53% (p=0.000 n=50)
memmove/43 4.165n ± 0% 3.843n ± 0% -7.74% (p=0.000 n=50)
memmove/44 4.175n ± 0% 3.847n ± 0% -7.83% (p=0.000 n=50)
memmove/45 4.170n ± 0% 3.849n ± 0% -7.70% (p=0.000 n=50)
memmove/46 4.175n ± 0% 3.850n ± 0% -7.79% (p=0.000 n=50)
memmove/47 4.180n ± 0% 3.851n ± 0% -7.87% (p=0.000 n=50)
memmove/48 4.178n ± 0% 3.852n ± 0% -7.81% (p=0.000 n=50)
memmove/49 4.175n ± 0% 3.851n ± 0% -7.76% (n=50)
memmove/50 4.178n ± 0% 3.855n ± 0% -7.73% (p=0.000 n=50)
memmove/51 4.190n ± 0% 3.859n ± 0% -7.91% (p=0.000 n=50)
memmove/52 4.188n ± 0% 3.859n ± 0% -7.84% (p=0.000 n=50)
memmove/53 4.191n ± 0% 3.863n ± 0% -7.82% (p=0.000 n=50)
memmove/54 4.192n ± 0% 3.860n ± 0% -7.91% (p=0.000 n=50)
memmove/55 4.192n ± 0% 3.869n ± 0% -7.70% (p=0.000 n=50)
memmove/56 4.204n ± 0% 3.866n ± 0% -8.05% (p=0.000 n=50)
memmove/57 4.198n ± 0% 3.864n ± 0% -7.95% (p=0.000 n=50)
memmove/58 4.202n ± 0% 3.865n ± 0% -8.02% (p=0.000 n=50)
memmove/59 4.208n ± 0% 3.868n ± 0% -8.09% (p=0.000 n=50)
memmove/60 4.205n ± 0% 3.873n ± 0% -7.89% (p=0.000 n=50)
memmove/61 4.212n ± 0% 3.872n ± 0% -8.08% (p=0.000 n=50)
memmove/62 4.214n ± 0% 3.870n ± 0% -8.16% (p=0.000 n=50)
memmove/63 4.215n ± 0% 3.877n ± 0% -8.02% (p=0.000 n=50)
memmove/64 4.217n ± 0% 3.881n ± 0% -7.99% (p=0.000 n=50)
memmove/65 4.990n ± 0% 4.683n ± 0% -6.15% (p=0.000 n=50)
memmove/66 5.022n ± 0% 4.719n ± 0% -6.03% (p=0.000 n=50)
memmove/67 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50)
memmove/68 5.035n ± 0% 4.724n ± 0% -6.18% (p=0.000 n=50)
memmove/69 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50)
memmove/70 5.040n ± 0% 4.728n ± 0% -6.19% (p=0.000 n=50)
memmove/71 5.053n ± 0% 4.728n ± 0% -6.43% (p=0.000 n=50)
memmove/72 5.050n ± 0% 4.732n ± 0% -6.29% (p=0.000 n=50)
memmove/73 5.049n ± 0% 4.733n ± 0% -6.24% (p=0.000 n=50)
memmove/74 5.054n ± 0% 4.734n ± 0% -6.34% (p=0.000 n=50)
memmove/75 5.063n ± 0% 4.736n ± 0% -6.46% (p=0.000 n=50)
memmove/76 5.046n ± 0% 4.741n ± 0% -6.04% (p=0.000 n=50)
memmove/77 5.057n ± 0% 4.741n ± 0% -6.25% (p=0.000 n=50)
memmove/78 5.077n ± 0% 4.739n ± 0% -6.65% (p=0.000 n=50)
memmove/79 5.074n ± 0% 4.746n ± 0% -6.46% (p=0.000 n=50)
memmove/80 5.085n ± 0% 4.747n ± 0% -6.65% (p=0.000 n=50)
memmove/81 5.077n ± 0% 4.735n ± 0% -6.74% (p=0.000 n=50)
memmove/82 5.087n ± 0% 4.747n ± 0% -6.68% (p=0.000 n=50)
memmove/83 5.087n ± 0% 4.754n ± 0% -6.56% (p=0.000 n=50)
memmove/84 5.096n ± 0% 4.753n ± 0% -6.73% (p=0.000 n=50)
memmove/85 5.082n ± 0% 4.749n ± 0% -6.55% (p=0.000 n=50)
memmove/86 5.103n ± 0% 4.752n ± 0% -6.87% (p=0.000 n=50)
memmove/87 5.096n ± 0% 4.760n ± 0% -6.61% (p=0.000 n=50)
memmove/88 5.099n ± 0% 4.765n ± 0% -6.55% (p=0.000 n=50)
memmove/89 5.104n ± 0% 4.757n ± 0% -6.79% (p=0.000 n=50)
memmove/90 5.117n ± 0% 4.767n ± 0% -6.84% (p=0.000 n=50)
memmove/91 5.100n ± 0% 4.766n ± 0% -6.54% (p=0.000 n=50)
memmove/92 5.103n ± 0% 4.763n ± 0% -6.67% (p=0.000 n=50)
memmove/93 5.115n ± 0% 4.772n ± 0% -6.71% (p=0.000 n=50)
memmove/94 5.117n ± 0% 4.769n ± 0% -6.80% (p=0.000 n=50)
memmove/95 5.131n ± 0% 4.775n ± 0% -6.94% (p=0.000 n=50)
memmove/96 5.129n ± 0% 4.772n ± 0% -6.97% (p=0.000 n=50)
memmove/97 5.130n ± 0% 4.764n ± 0% -7.13% (p=0.000 n=50)
memmove/98 5.134n ± 0% 4.780n ± 0% -6.89% (p=0.000 n=50)
memmove/99 5.141n ± 0% 4.780n ± 0% -7.03% (p=0.000 n=50)
memmove/100 5.141n ± 0% 4.780n ± 0% -7.02% (p=0.000 n=50)
memmove/101 5.150n ± 0% 4.782n ± 0% -7.14% (p=0.000 n=50)
memmove/102 5.150n ± 0% 4.790n ± 0% -6.99% (p=0.000 n=50)
memmove/103 5.156n ± 0% 4.788n ± 0% -7.14% (n=50)
memmove/104 5.157n ± 0% 4.793n ± 0% -7.05% (p=0.000 n=50)
memmove/105 5.147n ± 0% 4.791n ± 0% -6.90% (p=0.000 n=50)
memmove/106 5.167n ± 0% 4.793n ± 0% -7.23% (p=0.000 n=50)
memmove/107 5.165n ± 0% 4.801n ± 0% -7.06% (p=0.000 n=50)
memmove/108 5.173n ± 0% 4.800n ± 0% -7.21% (p=0.000 n=50)
memmove/109 5.173n ± 0% 4.797n ± 0% -7.27% (p=0.000 n=50)
memmove/110 5.171n ± 0% 4.808n ± 0% -7.01% (p=0.000 n=50)
memmove/111 5.180n ± 0% 4.799n ± 0% -7.36% (p=0.000 n=50)
memmove/112 5.185n ± 0% 4.812n ± 0% -7.19% (p=0.000 n=50)
memmove/113 5.187n ± 0% 4.797n ± 0% -7.53% (p=0.000 n=50)
memmove/114 5.183n ± 0% 4.809n ± 0% -7.21% (n=50)
memmove/115 5.193n ± 0% 4.811n ± 0% -7.36% (p=0.000 n=50)
memmove/116 5.196n ± 0% 4.815n ± 0% -7.32% (p=0.000 n=50)
memmove/117 5.199n ± 0% 4.816n ± 0% -7.37% (p=0.000 n=50)
memmove/118 5.198n ± 0% 4.811n ± 0% -7.45% (p=0.000 n=50)
memmove/119 5.203n ± 0% 4.818n ± 0% -7.40% (p=0.000 n=50)
memmove/120 5.195n ± 0% 4.823n ± 0% -7.16% (p=0.000 n=50)
memmove/121 5.203n ± 0% 4.812n ± 0% -7.51% (p=0.000 n=50)
memmove/122 5.204n ± 0% 4.818n ± 0% -7.42% (n=50)
memmove/123 5.202n ± 0% 4.822n ± 0% -7.31% (p=0.000 n=50)
memmove/124 5.216n ± 0% 4.823n ± 0% -7.54% (p=0.000 n=50)
memmove/125 5.227n ± 0% 4.823n ± 0% -7.72% (p=0.000 n=50)
memmove/126 5.235n ± 0% 4.830n ± 0% -7.74% (p=0.000 n=50)
memmove/127 5.237n ± 0% 4.833n ± 0% -7.72% (p=0.000 n=50)
memmove/128 5.241n ± 0% 4.832n ± 0% -7.81% (p=0.000 n=50)
memmove/129 6.460n ± 0% 5.858n ± 0% -9.31% (p=0.000 n=50)
memmove/130 7.539n ± 0% 6.634n ± 0% -12.00% (p=0.000 n=50)
memmove/131 7.542n ± 0% 6.623n ± 0% -12.18% (p=0.000 n=50)
memmove/132 7.527n ± 0% 6.667n ± 1% -11.43% (p=0.000 n=50)
memmove/133 7.521n ± 0% 6.631n ± 0% -11.83% (p=0.000 n=50)
memmove/134 7.531n ± 0% 6.642n ± 0% -11.81% (p=0.000 n=50)
memmove/135 7.541n ± 0% 6.692n ± 1% -11.25% (p=0.000 n=50)
memmove/136 7.549n ± 0% 6.657n ± 0% -11.81% (p=0.000 n=50)
memmove/137 7.544n ± 0% 6.646n ± 0% -11.90% (p=0.000 n=50)
memmove/138 7.557n ± 0% 6.673n ± 1% -11.70% (p=0.000 n=50)
memmove/139 7.545n ± 0% 6.654n ± 0% -11.81% (n=50)
memmove/140 7.559n ± 0% 6.680n ± 1% -11.63% (p=0.000 n=50)
memmove/141 7.560n ± 0% 6.664n ± 0% -11.85% (p=0.000 n=50)
memmove/142 7.556n ± 0% 6.679n ± 0% -11.62% (p=0.000 n=50)
memmove/143 7.570n ± 0% 6.683n ± 1% -11.71% (p=0.000 n=50)
memmove/144 7.586n ± 0% 6.683n ± 0% -11.91% (p=0.000 n=50)
memmove/145 7.593n ± 0% 6.665n ± 0% -12.22% (p=0.000 n=50)
memmove/146 7.591n ± 0% 6.665n ± 0% -12.20% (p=0.000 n=50)
memmove/147 7.598n ± 0% 6.665n ± 0% -12.27% (p=0.000 n=50)
memmove/148 7.598n ± 0% 6.670n ± 0% -12.21% (p=0.000 n=50)
memmove/149 7.593n ± 0% 6.691n ± 0% -11.88% (p=0.000 n=50)
memmove/150 7.625n ± 0% 6.713n ± 1% -11.97% (p=0.000 n=50)
memmove/151 7.603n ± 0% 6.710n ± 1% -11.74% (p=0.000 n=50)
memmove/152 7.613n ± 0% 6.701n ± 1% -11.97% (p=0.000 n=50)
memmove/153 7.595n ± 0% 6.710n ± 0% -11.65% (p=0.000 n=50)
memmove/154 7.614n ± 0% 6.721n ± 0% -11.74% (p=0.000 n=50)
memmove/155 7.615n ± 0% 6.709n ± 0% -11.89% (p=0.000 n=50)
memmove/156 7.613n ± 0% 6.693n ± 0% -12.08% (p=0.000 n=50)
memmove/157 7.628n ± 0% 6.708n ± 0% -12.05% (p=0.000 n=50)
memmove/158 7.629n ± 0% 6.706n ± 0% -12.10% (p=0.000 n=50)
memmove/159 7.639n ± 0% 6.724n ± 0% -11.98% (p=0.000 n=50)
memmove/160 7.619n ± 0% 6.702n ± 0% -12.04% (p=0.000 n=50)
memmove/161 7.653n ± 0% 6.698n ± 0% -12.49% (p=0.000 n=50)
memmove/162 8.104n ± 0% 7.140n ± 1% -11.89% (p=0.000 n=50)
memmove/163 8.141n ± 0% 7.187n ± 1% -11.72% (p=0.000 n=50)
memmove/164 8.154n ± 0% 7.107n ± 0% -12.84% (p=0.000 n=50)
memmove/165 8.143n ± 0% 7.117n ± 0% -12.59% (p=0.000 n=50)
memmove/166 8.176n ± 0% 7.110n ± 0% -13.04% (p=0.000 n=50)
memmove/167 8.194n ± 0% 7.168n ± 1% -12.52% (p=0.000 n=50)
memmove/168 8.214n ± 0% 7.188n ± 1% -12.50% (p=0.000 n=50)
memmove/169 8.220n ± 0% 7.242n ± 1% -11.90% (p=0.000 n=50)
memmove/170 8.228n ± 0% 7.244n ± 1% -11.96% (p=0.000 n=50)
memmove/171 8.263n ± 0% 7.184n ± 0% -13.06% (p=0.000 n=50)
memmove/172 8.259n ± 0% 7.325n ± 1% -11.31% (p=0.000 n=50)
memmove/173 8.271n ± 0% 7.225n ± 0% -12.65% (p=0.000 n=50)
memmove/174 8.284n ± 0% 7.287n ± 1% -12.04% (p=0.000 n=50)
memmove/175 8.289n ± 0% 7.282n ± 1% -12.15% (p=0.000 n=50)
memmove/176 8.309n ± 0% 7.328n ± 1% -11.81% (p=0.000 n=50)
memmove/177 8.317n ± 0% 7.264n ± 1% -12.67% (p=0.000 n=50)
memmove/178 8.302n ± 0% 7.342n ± 1% -11.57% (p=0.000 n=50)
memmove/179 8.309n ± 0% 7.357n ± 1% -11.45% (p=0.000 n=50)
memmove/180 8.304n ± 0% 7.318n ± 1% -11.87% (p=0.000 n=50)
memmove/181 8.312n ± 0% 7.363n ± 1% -11.42% (p=0.000 n=50)
memmove/182 8.315n ± 0% 7.320n ± 1% -11.96% (p=0.000 n=50)
memmove/183 8.330n ± 0% 7.286n ± 1% -12.53% (p=0.000 n=50)
memmove/184 8.310n ± 0% 7.324n ± 1% -11.86% (p=0.000 n=50)
memmove/185 8.303n ± 0% 7.267n ± 1% -12.47% (p=0.000 n=50)
memmove/186 8.287n ± 0% 7.312n ± 1% -11.76% (p=0.000 n=50)
memmove/187 8.298n ± 0% 7.395n ± 2% -10.88% (p=0.000 n=50)
memmove/188 8.296n ± 0% 7.339n ± 1% -11.54% (p=0.000 n=50)
memmove/189 8.306n ± 0% 7.299n ± 1% -12.12% (p=0.000 n=50)
memmove/190 8.281n ± 0% 7.309n ± 1% -11.74% (p=0.000 n=50)
memmove/191 8.299n ± 0% 7.282n ± 1% -12.26% (p=0.000 n=50)
memmove/192 8.281n ± 0% 7.335n ± 1% -11.41% (p=0.000 n=50)
memmove/193 8.299n ± 0% 7.325n ± 1% -11.74% (p=0.000 n=50)
memmove/194 8.641n ± 0% 8.034n ± 0% -7.02% (p=0.000 n=50)
memmove/195 8.667n ± 0% 8.073n ± 0% -6.85% (p=0.000 n=50)
memmove/196 8.666n ± 0% 8.030n ± 0% -7.34% (p=0.000 n=50)
memmove/197 8.660n ± 0% 8.096n ± 1% -6.51% (p=0.000 n=50)
memmove/198 8.688n ± 0% 8.047n ± 0% -7.39% (p=0.000 n=50)
memmove/199 8.678n ± 0% 8.061n ± 0% -7.11% (p=0.000 n=50)
memmove/200 8.669n ± 0% 8.034n ± 0% -7.32% (p=0.000 n=50)
memmove/201 8.692n ± 0% 8.061n ± 0% -7.26% (p=0.000 n=50)
memmove/202 8.668n ± 0% 8.060n ± 0% -7.02% (p=0.000 n=50)
memmove/203 8.687n ± 0% 8.066n ± 0% -7.15% (p=0.000 n=50)
memmove/204 8.699n ± 0% 8.076n ± 0% -7.16% (p=0.000 n=50)
memmove/205 8.676n ± 0% 8.085n ± 0% -6.82% (p=0.000 n=50)
memmove/206 8.684n ± 0% 8.101n ± 1% -6.71% (p=0.000 n=50)
memmove/207 8.725n ± 0% 8.099n ± 0% -7.18% (p=0.000 n=50)
memmove/208 8.674n ± 0% 8.073n ± 0% -6.92% (p=0.000 n=50)
memmove/209 8.697n ± 0% 8.088n ± 0% -7.01% (p=0.000 n=50)
memmove/210 8.733n ± 0% 8.076n ± 0% -7.53% (p=0.000 n=50)
memmove/211 8.732n ± 0% 8.104n ± 0% -7.19% (p=0.000 n=50)
memmove/212 8.730n ± 0% 8.091n ± 0% -7.32% (p=0.000 n=50)
memmove/213 8.728n ± 0% 8.100n ± 0% -7.19% (p=0.000 n=50)
memmove/214 8.744n ± 1% 8.081n ± 1% -7.57% (p=0.000 n=50)
memmove/215 8.734n ± 0% 8.150n ± 0% -6.68% (p=0.000 n=50)
memmove/216 8.748n ± 0% 8.116n ± 0% -7.23% (p=0.000 n=50)
memmove/217 8.751n ± 0% 8.129n ± 1% -7.11% (p=0.000 n=50)
memmove/218 8.747n ± 0% 8.114n ± 0% -7.23% (p=0.000 n=50)
memmove/219 8.733n ± 0% 8.159n ± 0% -6.57% (p=0.000 n=50)
memmove/220 8.764n ± 0% 8.145n ± 0% -7.06% (p=0.000 n=50)
memmove/221 8.764n ± 0% 8.142n ± 0% -7.10% (p=0.000 n=50)
memmove/222 8.775n ± 0% 8.152n ± 0% -7.10% (p=0.000 n=50)
memmove/223 8.771n ± 0% 8.143n ± 0% -7.16% (p=0.000 n=50)
memmove/224 8.778n ± 0% 8.175n ± 1% -6.87% (p=0.000 n=50)
memmove/225 8.794n ± 0% 8.138n ± 0% -7.45% (p=0.000 n=50)
memmove/226 10.13n ± 0% 10.06n ± 0% -0.71% (p=0.000 n=50)
memmove/227 10.14n ± 0% 10.08n ± 0% -0.53% (p=0.000 n=50)
memmove/228 10.13n ± 0% 10.08n ± 0% -0.56% (p=0.000 n=50)
memmove/229 10.17n ± 0% 10.11n ± 0% -0.56% (p=0.000 n=50)
memmove/230 10.17n ± 0% 10.13n ± 0% -0.38% (p=0.003 n=50)
memmove/231 10.16n ± 0% 10.12n ± 0% -0.41% (p=0.001 n=50)
memmove/232 10.19n ± 0% 10.12n ± 0% -0.67% (p=0.000 n=50)
memmove/233 10.21n ± 0% 10.14n ± 0% -0.71% (p=0.000 n=50)
memmove/234 10.24n ± 0% 10.16n ± 0% -0.79% (p=0.000 n=50)
memmove/235 10.24n ± 0% 10.16n ± 0% -0.76% (p=0.000 n=50)
memmove/236 10.25n ± 0% 10.16n ± 0% -0.81% (p=0.000 n=50)
memmove/237 10.24n ± 0% 10.17n ± 0% -0.69% (p=0.000 n=50)
memmove/238 10.27n ± 0% 10.19n ± 0% -0.79% (p=0.000 n=50)
memmove/239 10.29n ± 0% 10.19n ± 0% -0.90% (p=0.000 n=50)
memmove/240 10.30n ± 0% 10.20n ± 0% -0.95% (p=0.000 n=50)
memmove/241 10.29n ± 0% 10.20n ± 0% -0.91% (p=0.000 n=50)
memmove/242 10.30n ± 0% 10.22n ± 0% -0.80% (p=0.000 n=50)
memmove/243 10.32n ± 0% 10.23n ± 0% -0.87% (p=0.000 n=50)
memmove/244 10.32n ± 0% 10.24n ± 0% -0.74% (p=0.000 n=50)
memmove/245 10.33n ± 0% 10.23n ± 0% -0.97% (p=0.000 n=50)
memmove/246 10.33n ± 0% 10.24n ± 0% -0.92% (p=0.000 n=50)
memmove/247 10.31n ± 0% 10.24n ± 0% -0.69% (p=0.000 n=50)
memmove/248 10.32n ± 0% 10.26n ± 0% -0.55% (p=0.000 n=50)
memmove/249 10.33n ± 0% 10.28n ± 0% -0.52% (p=0.000 n=50)
memmove/250 10.34n ± 0% 10.27n ± 0% -0.66% (p=0.000 n=50)
memmove/251 10.32n ± 0% 10.27n ± 0% -0.45% (p=0.000 n=50)
memmove/252 10.34n ± 0% 10.30n ± 0% -0.39% (p=0.005 n=50)
memmove/253 10.33n ± 0% 10.27n ± 0% -0.57% (p=0.000 n=50)
memmove/254 10.33n ± 0% 10.27n ± 0% -0.54% (p=0.000 n=50)
memmove/255 10.34n ± 0% 10.29n ± 0% -0.50% (p=0.002 n=50)
memmove/256 10.36n ± 0% 10.31n ± 0% -0.44% (p=0.006 n=50)
memmove/257 10.33n ± 0% 10.29n ± 0% -0.36% (p=0.004 n=50)
geomean 6.142n 5.696n -7.26%
```
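Read back into C++, the two instruction sequences quoted in the "Speed up memmove overlapping check" message appear to correspond to two ways of asking whether the ranges `[dst, dst + size)` and `[src, src + size)` are disjoint. The sketch below is only an illustration of that reading: the function names are hypothetical and are not the identifiers used in LLVM libc, and the addresses are compared as unsigned integers for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names, for illustration only.

// Older shape: order the two addresses, then test whether the lower range
// ends at or before the start of the higher one (two conditional moves, an
// add, and a compare in the quoted disassembly).
inline bool is_disjoint_minmax(const void *p1, const void *p2,
                               std::size_t size) {
  const auto a = reinterpret_cast<std::uintptr_t>(p1);
  const auto b = reinterpret_cast<std::uintptr_t>(p2);
  const std::uintptr_t lo = a < b ? a : b;
  const std::uintptr_t hi = a < b ? b : a;
  return lo + size <= hi; // Assumes lo + size does not wrap around.
}

// Newer shape: compare the absolute distance between the two addresses with
// the size (a subtract, a negate, one conditional move, and a compare).
inline bool is_disjoint_distance(const void *p1, const void *p2,
                                 std::size_t size) {
  const auto a = reinterpret_cast<std::uintptr_t>(p1);
  const auto b = reinterpret_cast<std::uintptr_t>(p2);
  const std::uintptr_t distance = a < b ? b - a : a - b;
  return distance >= size;
}

// Sanity check that both forms give the same answer for one pair of ranges.
inline void expect_same_answer(const void *p1, const void *p2,
                               std::size_t size) {
  assert(is_disjoint_minmax(p1, p2, size) ==
         is_disjoint_distance(p1, p2, size));
}
```

Both forms answer the same question for ordinary buffers; the distance form simply needs fewer dependent operations, which is consistent with the shorter instruction sequence shown in the message.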
2023-10-04[libc] Change the GPU to use builtin memory functions (#68003)Joseph Huber
Summary: The GPU build is special in that an up-to-date `clang` is always the compiler. This lets us rely directly on builtins and push much of this complexity into the backend. Backend implementations are favored on the GPU because they allow many more target-specific optimizations. This patch switches the common memory functions to builtin versions when building for AMDGPU or NVPTX.
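A minimal sketch of the dispatch this summary describes, assuming the clang predefined macros `__AMDGPU__` and `__NVPTX__` as the target test and `__builtin_memcpy` as the builtin. The function name `copy_bytes` and the byte-loop fallback are hypothetical and only stand in for whatever the library actually selects on other targets.

```cpp
#include <cstddef>

// Hypothetical wrapper, for illustration only.
inline void copy_bytes(void *dst, const void *src, std::size_t count) {
#if defined(__AMDGPU__) || defined(__NVPTX__)
  // On GPU targets, hand the copy to the compiler builtin so that the
  // backend decides how to lower it.
  __builtin_memcpy(dst, src, count);
#else
  // Elsewhere, a plain byte loop stands in for the non-GPU implementation.
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < count; ++i)
    d[i] = s[i];
#endif
}
```

The point of the design is that the library source stays trivial on the GPU while target-specific tuning lives in the compiler backend.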
2023-09-26[libc] Mass replace enclosing namespace (#67032)Guillaume Chatelet
This is step 4 of https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
2023-09-21[libc][clang-tidy] Add llvm-header-guard to get consistent naming and ↵Guillaume Chatelet
prevent file copy/paste issues. (#66477)
2023-09-12[libc] Fix a typo in a CMakeLists.txt - replace DEPS with DEPENDS. (#66130)Siva Chandra
2023-08-07[libc] Clean up required LIBC_INLINE uses in src/stringRoland McGrath
This was generated using clang-tidy and clang-apply-replacements, on src/string/*.cpp for just the llvmlibc-inline-function-decl check, after applying https://reviews.llvm.org/D157164, and then some manual fixup. Reviewed By: abrachet Differential Revision: https://reviews.llvm.org/D157169
2023-07-19[libc][NFC] Rename filesGuillaume Chatelet
This patch mostly renames files so that their names better reflect the functions they declare. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D155607