This implementation has been compiled with the [pigweed toolchain](https://pigweed.dev/toolchain.html) and tested on:
- Raspberry Pi Pico 2 with the following options\
`--target=armv8m.main-none-eabi`
`-march=armv8m.main+fp+dsp`
`-mcpu=cortex-m33`
- Raspberry Pi Pico with the following options\
`--target=armv6m-none-eabi`
`-march=armv6m`
`-mcpu=cortex-m0+`
Both builds compile to slightly more than 200 bytes and are 2 to 10 times faster than byte-by-byte copies.
For best performance, the following options can be set in `libc/config/baremetal/arm/config.json`:
```json
{
  "codegen": {
    "LIBC_CONF_KEEP_FRAME_POINTER": {
      "value": false
    }
  },
  "general": {
    "LIBC_ADD_NULL_CHECKS": {
      "value": false
    }
  }
}
```
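The word-at-a-time idea behind speedups like these can be sketched as follows. This is a hypothetical illustration (`word_memcpy` is not a libc name), not the code from this commit: it assumes a byte-wise head and tail around a 32-bit body, and it only takes the word path when source and destination end up equally aligned, because ARMv6-M cores such as the Cortex-M0+ fault on unaligned word accesses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch, not the libc implementation: copy byte by byte until
// `dst` is 4-byte aligned, move whole 32-bit words if `src` is then aligned
// too, and finish the remaining bytes one at a time.
void *word_memcpy(void *__restrict dst, const void *__restrict src,
                  std::size_t count) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  // Head: advance until the destination is word aligned.
  while (count > 0 && (reinterpret_cast<std::uintptr_t>(d) & 3) != 0) {
    *d++ = *s++;
    --count;
  }
  // Body: only when the source is word aligned as well.
  if ((reinterpret_cast<std::uintptr_t>(s) & 3) == 0) {
    for (; count >= 4; count -= 4, d += 4, s += 4) {
      // A fixed-size std::memcpy is the portable, aliasing-safe way to
      // express a 32-bit load followed by a 32-bit store.
      std::uint32_t word;
      std::memcpy(&word, s, sizeof(word));
      std::memcpy(d, &word, sizeof(word));
    }
  }
  // Tail, or everything left when src and dst are mutually misaligned.
  while (count > 0) {
    *d++ = *s++;
    --count;
  }
  return dst;
}
```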
|
|
FP/SIMD (#137592)
Add conditional compilation to support AArch64 targets without vector
registers and/or hardware FPUs by falling back to the generic implementation.
**Context:**
A few functions were hard-coded to use vector registers/hardware FPUs.
As a result, libc would not compile on architectures that did not
support these features. This fix falls back to the generic
implementation when a feature is not supported.
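A minimal sketch of this fallback pattern, assuming the ACLE feature macro `__ARM_NEON` is the condition being tested; `equal16` is a hypothetical helper, not a libc function. On targets without Advanced SIMD the scalar loop is compiled instead, so the file still builds.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: returns true when the 16 bytes at `a` and `b` match.
inline bool equal16(const std::uint8_t *a, const std::uint8_t *b) {
#if defined(__aarch64__) && defined(__ARM_NEON)
  // Vector path: lanes are 0xFF where bytes are equal, 0x00 otherwise.
  const uint8x16_t eq = vceqq_u8(vld1q_u8(a), vld1q_u8(b));
  return vminvq_u8(eq) == 0xFF; // the minimum lane is 0xFF only if all matched
#else
  // Generic path: no vector registers required.
  for (std::size_t i = 0; i < 16; ++i)
    if (a[i] != b[i])
      return false;
  return true;
#endif
}
```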
|
|
|
|
Summary:
Currently, many of the memory functions fail to compile when there is no
architecture-specific handling for them. This is unnecessary because we
have a generic implementation that should be used whenever no one has
written a more optimized version. This allows us to use the `libc` headers
from the `shared/` directory on more architectures without worrying about
the build breaking.
|
|
Relates to
https://github.com/llvm/llvm-project/issues/119281#issuecomment-2699470459
|
|
This reverts commit 1e6e845d49a336e9da7ca6c576ec45c0b419b5f6 because it
changed the first parameter of adjust() to be unsigned, but libc itself
calls adjust() with a negative argument in align_backward() in
op_generic.h.
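To see why the first parameter has to be signed, here is a hypothetical sketch modeled on the description above. The names mirror `adjust()` and `align_backward()`, but the bodies are illustrative assumptions, not the libc code. Moving pointers down to the previous aligned address needs a negative offset; with an unsigned parameter, `-5` would turn into a huge value and the pointer arithmetic would be undefined.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of an adjust()-style helper: move both pointers by `offset` and
// shrink `count` by the same amount. A negative offset moves the pointers
// backward and (via modular size_t arithmetic) grows `count`.
template <typename T1, typename T2>
inline void adjust(std::ptrdiff_t offset, T1 *&p1, T2 *&p2, std::size_t &count) {
  p1 += offset;
  p2 += offset;
  count -= static_cast<std::size_t>(offset);
}

// Hypothetical caller in the spirit of align_backward(): move p1 down to the
// previous multiple of `alignment`, which requires a negative offset.
template <std::size_t alignment, typename T1, typename T2>
inline void align_backward_sketch(T1 *&p1, T2 *&p2, std::size_t &count) {
  static_assert(alignment > 0 && (alignment & (alignment - 1)) == 0,
                "alignment must be a power of two");
  const auto misalign = static_cast<std::ptrdiff_t>(
      reinterpret_cast<std::uintptr_t>(p1) & (alignment - 1));
  adjust(-misalign, p1, p2, count);
}
```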
|
|
Followup to #127523
After enabling -Wconversion, some tests failed on arm32 due to missing
casts. I also changed BigInt's `safe_get_at` back to being signed since
it must accept negative arguments.
|
|
Relates to: #119281
|
|
Fixed imports for all files *within* `libc/src/string/memory_utils`.
Note: This doesn't include **all** files that need to be fixed.
Fixes #86579
|
|
(#117640)
|
|
This prevents a conflict with the Linux system endian.h when built in
overlay mode for CPP files in __support.
This issue appeared in PR #106259.
|
|
(#113161)
When using `-mprefer-vector-width=128` with `-march=sandybridge`, copying
3 cache lines in one go (192B) gets converted into `rep;movsb`, which
translates into a 60% performance hit.
Consecutive calls to `__builtin_memcpy_inline` (the implementation behind
`builtin::Memcpy::block_offset`) are not coalesced by the compiler, so
calling it three times in a row generates the desired assembly. The result
differs only in the interleaving of the loads and stores, which does not
affect performance.
This is needed to reland
https://github.com/llvm/llvm-project/pull/108939.
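The workaround amounts to issuing three fixed-size 64-byte copies instead of one 192-byte copy. A sketch, assuming Clang's `__builtin_memcpy_inline` (with `__builtin_memcpy` as a fallback on compilers that lack it); `block_copy` and `copy_three_cache_lines` are hypothetical names, not the libc ones.

```cpp
#include <cstddef>

// Copies exactly N bytes. __builtin_memcpy_inline (Clang) requires a
// compile-time size and is never lowered to a library call.
template <std::size_t N>
inline void block_copy(char *__restrict dst, const char *__restrict src) {
#if __has_builtin(__builtin_memcpy_inline)
  __builtin_memcpy_inline(dst, src, N);
#else
  __builtin_memcpy(dst, src, N);
#endif
}

// Hypothetical sketch: move 192 bytes (three 64-byte cache lines) as three
// separate 64-byte blocks rather than a single 192-byte block, which the
// commit message reports could be turned into `rep;movsb`.
inline void copy_three_cache_lines(char *__restrict dst,
                                   const char *__restrict src) {
  block_copy<64>(dst, src);
  block_copy<64>(dst + 64, src + 64);
  block_copy<64>(dst + 128, src + 128);
}
```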
|
|
The patch primarily cleans up some incorrect includes. The `LIBC_INLINE`
macro is defined in `attributes.h`, not `config.h`. There appears to be
no need to change the CMake and Bazel build files.
|
|
Binary size changes:
| Target | before: bytes (cache lines) | after: bytes (cache lines) |
|--------|-----------------------------|----------------------------|
| sse4 | 419 (7) | 288 (5) |
| avx | 430 (7) | 308 (5) |
| avx512f | 589 (10) | 390 (7) |
Benchmarks for different CPUs using
https://github.com/google/fleetbench.
- indus-cascadelake
```
name old speed new speed delta
BM_LIBC_Bcmp_Fleet_L1 1.96GB/s ± 1% 2.19GB/s ± 0% +11.49% (p=0.000 n=29+24)
BM_LIBC_Bcmp_Fleet_L2 1.90GB/s ± 1% 2.14GB/s ± 1% +12.68% (p=0.000 n=29+24)
BM_LIBC_Bcmp_Fleet_LLC 513MB/s ± 4% 531MB/s ± 4% +3.53% (p=0.000 n=24+24)
BM_LIBC_Bcmp_Fleet_Cold 452MB/s ± 3% 456MB/s ± 4% ~ (p=0.103 n=30+30)
BM_LIBC_Bcmp_0_L1 [Bcmp_0] 2.98GB/s ± 1% 3.15GB/s ± 1% +5.59% (p=0.000 n=29+30)
BM_LIBC_Bcmp_0_L2 [Bcmp_0] 2.86GB/s ± 1% 3.07GB/s ± 1% +7.21% (p=0.000 n=29+30)
BM_LIBC_Bcmp_0_LLC [Bcmp_0] 738MB/s ± 7% 751MB/s ± 3% +1.68% (p=0.000 n=24+25)
BM_LIBC_Bcmp_0_Cold [Bcmp_0] 643MB/s ± 3% 642MB/s ± 4% ~ (p=0.522 n=29+30)
BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.08GB/s ± 0% 3.25GB/s ± 0% +5.35% (p=0.000 n=28+30)
BM_LIBC_Bcmp_1_L2 [Bcmp_1] 2.97GB/s ± 1% 3.17GB/s ± 1% +6.65% (p=0.000 n=29+30)
BM_LIBC_Bcmp_1_LLC [Bcmp_1] 901MB/s ±59% 871MB/s ±36% ~ (p=0.676 n=29+27)
BM_LIBC_Bcmp_1_Cold [Bcmp_1] 686MB/s ± 4% 686MB/s ± 3% ~ (p=0.934 n=29+30)
BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.63GB/s ± 0% 1.80GB/s ± 1% +10.19% (p=0.000 n=29+30)
BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.57GB/s ± 1% 1.75GB/s ± 1% +11.46% (p=0.000 n=29+30)
BM_LIBC_Bcmp_2_LLC [Bcmp_2] 451MB/s ±61% 427MB/s ±28% ~ (p=0.469 n=29+25)
BM_LIBC_Bcmp_2_Cold [Bcmp_2] 353MB/s ± 4% 354MB/s ± 5% ~ (p=0.467 n=30+30)
BM_LIBC_Bcmp_3_L1 [Bcmp_3] 1.91GB/s ± 1% 2.10GB/s ± 1% +9.90% (p=0.000 n=29+29)
BM_LIBC_Bcmp_3_L2 [Bcmp_3] 1.84GB/s ± 1% 2.03GB/s ± 1% +10.63% (p=0.000 n=29+30)
BM_LIBC_Bcmp_3_LLC [Bcmp_3] 491MB/s ±24% 538MB/s ±24% +9.66% (p=0.000 n=24+27)
BM_LIBC_Bcmp_3_Cold [Bcmp_3] 417MB/s ± 4% 421MB/s ± 3% ~ (p=0.063 n=30+29)
BM_LIBC_Bcmp_4_L1 [Bcmp_4] 761MB/s ± 1% 867MB/s ± 1% +14.02% (p=0.000 n=28+30)
BM_LIBC_Bcmp_4_L2 [Bcmp_4] 748MB/s ± 1% 860MB/s ± 1% +15.04% (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_LLC [Bcmp_4] 227MB/s ±29% 260MB/s ±64% +14.70% (p=0.000 n=26+27)
BM_LIBC_Bcmp_4_Cold [Bcmp_4] 187MB/s ± 3% 191MB/s ± 5% +2.26% (p=0.000 n=30+30)
BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.48GB/s ± 1% 1.71GB/s ± 1% +15.26% (p=0.000 n=29+30)
BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.42GB/s ± 1% 1.67GB/s ± 1% +17.68% (p=0.000 n=29+29)
BM_LIBC_Bcmp_5_LLC [Bcmp_5] 412MB/s ±34% 519MB/s ±80% +25.87% (p=0.000 n=27+30)
BM_LIBC_Bcmp_5_Cold [Bcmp_5] 336MB/s ± 4% 343MB/s ± 6% +2.05% (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_L1 [Bcmp_6] 2.87GB/s ± 0% 3.24GB/s ± 1% +12.88% (p=0.000 n=26+30)
BM_LIBC_Bcmp_6_L2 [Bcmp_6] 2.78GB/s ± 1% 3.20GB/s ± 1% +15.15% (p=0.000 n=26+30)
BM_LIBC_Bcmp_6_LLC [Bcmp_6] 926MB/s ±43% 1227MB/s ±76% +32.53% (p=0.000 n=27+30)
BM_LIBC_Bcmp_6_Cold [Bcmp_6] 716MB/s ± 4% 737MB/s ± 6% +3.02% (p=0.000 n=28+29)
BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.54GB/s ± 1% 1.56GB/s ± 0% +1.40% (p=0.000 n=29+30)
BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.47GB/s ± 1% 1.52GB/s ± 1% +2.97% (p=0.000 n=27+30)
BM_LIBC_Bcmp_7_LLC [Bcmp_7] 351MB/s ±23% 436MB/s ±83% +24.04% (p=0.005 n=24+29)
BM_LIBC_Bcmp_7_Cold [Bcmp_7] 283MB/s ± 4% 282MB/s ± 4% ~ (p=0.644 n=30+30)
BM_LIBC_Bcmp_8_L1 [Bcmp_8] 824MB/s ± 1% 1048MB/s ± 1% +27.18% (p=0.000 n=29+30)
BM_LIBC_Bcmp_8_L2 [Bcmp_8] 808MB/s ± 1% 1027MB/s ± 1% +27.12% (p=0.000 n=29+29)
BM_LIBC_Bcmp_8_LLC [Bcmp_8] 317MB/s ±79% 332MB/s ±74% ~ (p=0.338 n=30+29)
BM_LIBC_Bcmp_8_Cold [Bcmp_8] 207MB/s ± 5% 212MB/s ± 5% +2.27% (p=0.000 n=30+30)
```
- indus-skylake
```
name old speed new speed delta
BM_LIBC_Bcmp_Fleet_L1 2.06GB/s ± 2% 2.25GB/s ± 3% +9.66% (p=0.000 n=27+24)
BM_LIBC_Bcmp_Fleet_L2 1.96GB/s ± 2% 2.17GB/s ± 2% +10.61% (p=0.000 n=30+24)
BM_LIBC_Bcmp_Fleet_LLC 1.18GB/s ± 6% 1.32GB/s ± 5% +12.27% (p=0.000 n=28+28)
BM_LIBC_Bcmp_Fleet_Cold 456MB/s ± 2% 466MB/s ± 2% +2.22% (p=0.000 n=28+28)
BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.08GB/s ± 2% 3.20GB/s ± 1% +3.72% (p=0.000 n=28+22)
BM_LIBC_Bcmp_0_L2 [Bcmp_0] 2.92GB/s ± 1% 3.05GB/s ± 2% +4.49% (p=0.000 n=23+23)
BM_LIBC_Bcmp_0_LLC [Bcmp_0] 1.83GB/s ± 8% 1.94GB/s ± 4% +6.24% (p=0.000 n=25+27)
BM_LIBC_Bcmp_0_Cold [Bcmp_0] 654MB/s ± 2% 659MB/s ± 2% +0.76% (p=0.012 n=30+29)
BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.19GB/s ± 2% 3.34GB/s ± 2% +4.41% (p=0.000 n=26+23)
BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.05GB/s ± 2% 3.21GB/s ± 2% +5.32% (p=0.000 n=28+25)
BM_LIBC_Bcmp_1_LLC [Bcmp_1] 1.95GB/s ± 4% 2.03GB/s ±10% +3.61% (p=0.000 n=27+30)
BM_LIBC_Bcmp_1_Cold [Bcmp_1] 700MB/s ± 2% 702MB/s ± 2% ~ (p=0.150 n=30+30)
BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.69GB/s ± 2% 1.85GB/s ± 1% +9.31% (p=0.000 n=30+26)
BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.60GB/s ± 2% 1.78GB/s ± 2% +10.90% (p=0.000 n=26+27)
BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.01GB/s ± 5% 1.12GB/s ± 5% +11.40% (p=0.000 n=27+28)
BM_LIBC_Bcmp_2_Cold [Bcmp_2] 355MB/s ± 3% 360MB/s ± 3% +1.46% (p=0.000 n=30+30)
BM_LIBC_Bcmp_3_L1 [Bcmp_3] 1.98GB/s ± 2% 2.15GB/s ± 2% +8.89% (p=0.000 n=29+27)
BM_LIBC_Bcmp_3_L2 [Bcmp_3] 1.87GB/s ± 3% 2.05GB/s ± 2% +10.06% (p=0.000 n=30+26)
BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.19GB/s ± 4% 1.31GB/s ± 6% +9.82% (p=0.000 n=27+29)
BM_LIBC_Bcmp_3_Cold [Bcmp_3] 424MB/s ± 3% 431MB/s ± 3% +1.58% (p=0.000 n=28+30)
BM_LIBC_Bcmp_4_L1 [Bcmp_4] 849MB/s ± 2% 949MB/s ± 2% +11.84% (p=0.000 n=27+28)
BM_LIBC_Bcmp_4_L2 [Bcmp_4] 815MB/s ± 3% 913MB/s ± 3% +12.06% (p=0.000 n=29+30)
BM_LIBC_Bcmp_4_LLC [Bcmp_4] 512MB/s ± 9% 571MB/s ± 7% +11.40% (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_Cold [Bcmp_4] 187MB/s ± 3% 192MB/s ± 2% +2.56% (p=0.000 n=30+28)
BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.55GB/s ± 2% 1.77GB/s ± 3% +13.93% (p=0.000 n=30+28)
BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.47GB/s ± 2% 1.70GB/s ± 2% +15.96% (p=0.000 n=27+26)
BM_LIBC_Bcmp_5_LLC [Bcmp_5] 939MB/s ± 5% 1084MB/s ± 4% +15.36% (p=0.000 n=28+27)
BM_LIBC_Bcmp_5_Cold [Bcmp_5] 340MB/s ± 2% 347MB/s ± 3% +1.93% (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.06GB/s ± 3% 3.40GB/s ± 2% +11.13% (p=0.000 n=30+28)
BM_LIBC_Bcmp_6_L2 [Bcmp_6] 2.89GB/s ± 3% 3.24GB/s ± 2% +12.20% (p=0.000 n=29+26)
BM_LIBC_Bcmp_6_LLC [Bcmp_6] 1.93GB/s ± 4% 2.09GB/s ±11% +8.16% (p=0.000 n=26+30)
BM_LIBC_Bcmp_6_Cold [Bcmp_6] 746MB/s ± 2% 762MB/s ± 2% +2.11% (p=0.000 n=30+28)
BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.59GB/s ± 2% 1.62GB/s ± 2% +1.72% (p=0.000 n=25+27)
BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.49GB/s ± 2% 1.53GB/s ± 2% +2.62% (p=0.000 n=27+29)
BM_LIBC_Bcmp_7_LLC [Bcmp_7] 852MB/s ±10% 909MB/s ± 6% +6.71% (p=0.000 n=30+29)
BM_LIBC_Bcmp_7_Cold [Bcmp_7] 283MB/s ± 3% 283MB/s ± 2% ~ (p=0.617 n=30+27)
BM_LIBC_Bcmp_8_L1 [Bcmp_8] 891MB/s ± 2% 1083MB/s ± 2% +21.64% (p=0.000 n=27+24)
BM_LIBC_Bcmp_8_L2 [Bcmp_8] 855MB/s ± 2% 1045MB/s ± 1% +22.31% (p=0.000 n=25+23)
BM_LIBC_Bcmp_8_LLC [Bcmp_8] 568MB/s ± 7% 659MB/s ± 8% +16.04% (p=0.000 n=29+30)
BM_LIBC_Bcmp_8_Cold [Bcmp_8] 207MB/s ± 2% 212MB/s ± 2% +2.31% (p=0.000 n=30+27)
```
- arcadia-rome
```
name old speed new speed delta
BM_LIBC_Bcmp_Fleet_L1 2.16GB/s ± 2% 2.27GB/s ± 2% +5.13% (p=0.000 n=26+30)
BM_LIBC_Bcmp_Fleet_L2 2.15GB/s ± 2% 2.25GB/s ± 2% +4.64% (p=0.000 n=27+30)
BM_LIBC_Bcmp_Fleet_LLC 1.73GB/s ± 3% 1.81GB/s ± 3% +4.66% (p=0.000 n=25+28)
BM_LIBC_Bcmp_Fleet_Cold 494MB/s ± 1% 496MB/s ± 2% +0.45% (p=0.023 n=22+24)
BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.30GB/s ± 1% 3.24GB/s ± 2% -1.70% (p=0.000 n=27+30)
BM_LIBC_Bcmp_0_L2 [Bcmp_0] 3.23GB/s ± 2% 3.19GB/s ± 2% -1.28% (p=0.000 n=28+28)
BM_LIBC_Bcmp_0_LLC [Bcmp_0] 2.59GB/s ± 3% 2.58GB/s ± 2% -0.65% (p=0.010 n=26+26)
BM_LIBC_Bcmp_0_Cold [Bcmp_0] 720MB/s ± 1% 707MB/s ± 3% -1.75% (p=0.000 n=22+25)
BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.37GB/s ± 1% 3.36GB/s ± 2% ~ (p=0.102 n=28+29)
BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.32GB/s ± 2% 3.30GB/s ± 2% -0.51% (p=0.038 n=28+29)
BM_LIBC_Bcmp_1_LLC [Bcmp_1] 2.67GB/s ± 4% 2.70GB/s ± 4% +0.96% (p=0.009 n=28+27)
BM_LIBC_Bcmp_1_Cold [Bcmp_1] 755MB/s ± 1% 751MB/s ± 2% -0.57% (p=0.000 n=22+25)
BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.79GB/s ± 1% 1.86GB/s ± 2% +3.92% (p=0.000 n=27+29)
BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.77GB/s ± 2% 1.82GB/s ± 2% +2.99% (p=0.000 n=28+29)
BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.41GB/s ± 4% 1.47GB/s ± 3% +3.97% (p=0.000 n=28+28)
BM_LIBC_Bcmp_2_Cold [Bcmp_2] 386MB/s ± 1% 389MB/s ± 1% +0.60% (p=0.000 n=21+23)
BM_LIBC_Bcmp_3_L1 [Bcmp_3] 2.07GB/s ± 2% 2.17GB/s ± 2% +4.87% (p=0.000 n=29+30)
BM_LIBC_Bcmp_3_L2 [Bcmp_3] 2.07GB/s ± 2% 2.13GB/s ± 2% +3.02% (p=0.000 n=28+30)
BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.66GB/s ± 2% 1.73GB/s ± 2% +4.08% (p=0.000 n=29+26)
BM_LIBC_Bcmp_3_Cold [Bcmp_3] 466MB/s ± 2% 469MB/s ± 3% +0.66% (p=0.001 n=22+25)
BM_LIBC_Bcmp_4_L1 [Bcmp_4] 861MB/s ± 1% 964MB/s ± 2% +11.98% (p=0.000 n=29+29)
BM_LIBC_Bcmp_4_L2 [Bcmp_4] 853MB/s ± 2% 935MB/s ± 2% +9.54% (p=0.000 n=28+29)
BM_LIBC_Bcmp_4_LLC [Bcmp_4] 707MB/s ± 3% 743MB/s ± 4% +5.08% (p=0.000 n=29+29)
BM_LIBC_Bcmp_4_Cold [Bcmp_4] 199MB/s ± 3% 199MB/s ± 2% ~ (p=0.107 n=29+25)
BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.65GB/s ± 1% 1.75GB/s ± 2% +6.15% (p=0.000 n=29+29)
BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.64GB/s ± 3% 1.73GB/s ± 2% +5.37% (p=0.000 n=29+29)
BM_LIBC_Bcmp_5_LLC [Bcmp_5] 1.32GB/s ± 2% 1.40GB/s ± 2% +6.21% (p=0.000 n=28+27)
BM_LIBC_Bcmp_5_Cold [Bcmp_5] 370MB/s ± 3% 371MB/s ± 2% +0.16% (p=0.008 n=29+25)
BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.25GB/s ± 2% 3.47GB/s ± 2% +6.74% (p=0.000 n=28+29)
BM_LIBC_Bcmp_6_L2 [Bcmp_6] 3.26GB/s ± 1% 3.44GB/s ± 1% +5.43% (p=0.000 n=28+29)
BM_LIBC_Bcmp_6_LLC [Bcmp_6] 2.66GB/s ± 2% 2.79GB/s ± 3% +4.90% (p=0.000 n=27+29)
BM_LIBC_Bcmp_6_Cold [Bcmp_6] 812MB/s ± 3% 799MB/s ± 2% -1.57% (p=0.000 n=29+25)
BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.71GB/s ± 2% 1.66GB/s ± 2% -3.14% (p=0.000 n=29+29)
BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.63GB/s ± 2% 1.59GB/s ± 2% -2.50% (p=0.000 n=29+28)
BM_LIBC_Bcmp_7_LLC [Bcmp_7] 1.25GB/s ± 4% 1.25GB/s ± 2% ~ (p=0.530 n=28+26)
BM_LIBC_Bcmp_7_Cold [Bcmp_7] 311MB/s ± 3% 308MB/s ± 1% ~ (p=0.127 n=29+24)
BM_LIBC_Bcmp_8_L1 [Bcmp_8] 869MB/s ± 2% 1098MB/s ± 2% +26.28% (p=0.000 n=27+29)
BM_LIBC_Bcmp_8_L2 [Bcmp_8] 873MB/s ± 2% 1075MB/s ± 1% +23.06% (p=0.000 n=27+29)
BM_LIBC_Bcmp_8_LLC [Bcmp_8] 743MB/s ± 4% 859MB/s ± 4% +15.58% (p=0.000 n=27+27)
BM_LIBC_Bcmp_8_Cold [Bcmp_8] 221MB/s ± 4% 221MB/s ± 3% +0.14% (p=0.034 n=29+25)
```
- ixion-haswell
```
name old speed new speed delta
BM_LIBC_Bcmp_Fleet_L1 2.27GB/s ± 5% 2.41GB/s ± 6% +6.10% (p=0.000 n=29+28)
BM_LIBC_Bcmp_Fleet_L2 2.14GB/s ± 6% 2.33GB/s ± 5% +9.21% (p=0.000 n=29+30)
BM_LIBC_Bcmp_Fleet_LLC 1.30GB/s ± 9% 1.43GB/s ± 8% +9.85% (p=0.000 n=30+30)
BM_LIBC_Bcmp_Fleet_Cold 475MB/s ± 6% 475MB/s ± 5% ~ (p=0.839 n=30+29)
BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.38GB/s ± 7% 3.46GB/s ± 6% +2.35% (p=0.009 n=30+29)
BM_LIBC_Bcmp_0_L2 [Bcmp_0] 3.20GB/s ± 5% 3.32GB/s ± 6% +3.52% (p=0.000 n=28+30)
BM_LIBC_Bcmp_0_LLC [Bcmp_0] 1.88GB/s ± 9% 2.00GB/s ± 6% +6.63% (p=0.000 n=30+28)
BM_LIBC_Bcmp_0_Cold [Bcmp_0] 664MB/s ± 6% 655MB/s ± 6% -1.32% (p=0.025 n=30+30)
BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.50GB/s ± 8% 3.61GB/s ±10% +3.09% (p=0.001 n=29+30)
BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.32GB/s ± 7% 3.48GB/s ± 8% +4.89% (p=0.000 n=29+30)
BM_LIBC_Bcmp_1_LLC [Bcmp_1] 2.02GB/s ± 7% 2.14GB/s ± 9% +5.82% (p=0.000 n=28+29)
BM_LIBC_Bcmp_1_Cold [Bcmp_1] 716MB/s ± 6% 709MB/s ± 5% -0.97% (p=0.040 n=30+28)
BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.83GB/s ± 7% 1.97GB/s ± 8% +7.90% (p=0.000 n=30+30)
BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.74GB/s ± 6% 1.92GB/s ± 6% +10.29% (p=0.000 n=30+29)
BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.05GB/s ± 9% 1.15GB/s ± 9% +9.73% (p=0.000 n=30+30)
BM_LIBC_Bcmp_2_Cold [Bcmp_2] 379MB/s ± 6% 372MB/s ± 6% -1.74% (p=0.012 n=30+30)
BM_LIBC_Bcmp_3_L1 [Bcmp_3] 2.17GB/s ± 5% 2.29GB/s ± 6% +5.61% (p=0.000 n=29+30)
BM_LIBC_Bcmp_3_L2 [Bcmp_3] 2.02GB/s ± 6% 2.20GB/s ± 6% +8.75% (p=0.000 n=29+30)
BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.22GB/s ± 8% 1.34GB/s ± 9% +9.19% (p=0.000 n=30+30)
BM_LIBC_Bcmp_3_Cold [Bcmp_3] 447MB/s ± 3% 441MB/s ± 7% -1.40% (p=0.033 n=30+30)
BM_LIBC_Bcmp_4_L1 [Bcmp_4] 902MB/s ± 6% 995MB/s ±10% +10.37% (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_L2 [Bcmp_4] 863MB/s ± 5% 945MB/s ±11% +9.50% (p=0.000 n=29+30)
BM_LIBC_Bcmp_4_LLC [Bcmp_4] 528MB/s ±11% 559MB/s ±12% +5.75% (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_Cold [Bcmp_4] 183MB/s ± 4% 181MB/s ± 7% ~ (p=0.088 n=28+30)
BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.70GB/s ± 6% 1.87GB/s ± 8% +10.14% (p=0.000 n=29+29)
BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.60GB/s ± 5% 1.80GB/s ± 9% +12.61% (p=0.000 n=29+30)
BM_LIBC_Bcmp_5_LLC [Bcmp_5] 994MB/s ±13% 1094MB/s ± 8% +10.10% (p=0.000 n=29+30)
BM_LIBC_Bcmp_5_Cold [Bcmp_5] 362MB/s ± 6% 358MB/s ± 7% ~ (p=0.123 n=30+30)
BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.31GB/s ± 5% 3.67GB/s ± 6% +10.90% (p=0.000 n=28+30)
BM_LIBC_Bcmp_6_L2 [Bcmp_6] 3.11GB/s ± 5% 3.53GB/s ± 5% +13.59% (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_LLC [Bcmp_6] 1.98GB/s ± 9% 2.18GB/s ± 8% +10.34% (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_Cold [Bcmp_6] 754MB/s ± 5% 752MB/s ± 5% ~ (p=0.592 n=30+30)
BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.72GB/s ± 5% 1.72GB/s ± 6% ~ (p=0.549 n=29+29)
BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.61GB/s ± 7% 1.63GB/s ± 8% ~ (p=0.191 n=30+29)
BM_LIBC_Bcmp_7_LLC [Bcmp_7] 913MB/s ± 8% 905MB/s ± 9% ~ (p=0.423 n=30+30)
BM_LIBC_Bcmp_7_Cold [Bcmp_7] 304MB/s ± 6% 287MB/s ± 4% -5.57% (p=0.000 n=30+30)
BM_LIBC_Bcmp_8_L1 [Bcmp_8] 961MB/s ± 5% 1124MB/s ± 6% +16.94% (p=0.000 n=30+30)
BM_LIBC_Bcmp_8_L2 [Bcmp_8] 915MB/s ± 8% 1100MB/s ± 7% +20.16% (p=0.000 n=30+30)
BM_LIBC_Bcmp_8_LLC [Bcmp_8] 593MB/s ± 8% 669MB/s ± 8% +12.92% (p=0.000 n=30+30)
BM_LIBC_Bcmp_8_Cold [Bcmp_8] 220MB/s ± 4% 220MB/s ± 6% ~ (p=0.572 n=30+30)
```
Co-authored-by: goldvitaly@google.com <%username%@google.com>
|
|
Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we
prefetch memory for read on the source buffer. This patch adds prefetch
for write on the destination buffer.
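A sketch of a copy loop that prefetches both buffers, assuming GCC/Clang's `__builtin_prefetch(addr, rw, locality)`, where `rw = 1` requests a prefetch for write. The prefetch distance and block size are made-up values, and `copy_with_prefetch` is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical tuning constants; a real implementation would measure these.
constexpr std::size_t kPrefetchDistance = 256;
constexpr std::size_t kBlock = 64;

// Copies 64-byte blocks while prefetching ahead: the source for read
// (rw = 0) and the destination for write (rw = 1). Locality 3 asks to keep
// the lines in all cache levels.
inline void copy_with_prefetch(char *__restrict dst, const char *__restrict src,
                               std::size_t count) {
  std::size_t offset = 0;
  for (; offset + kBlock <= count; offset += kBlock) {
    // Only prefetch addresses inside the buffers: even computing an
    // out-of-bounds pointer is undefined behavior.
    if (offset + kPrefetchDistance < count) {
      __builtin_prefetch(src + offset + kPrefetchDistance, /*rw=*/0, /*locality=*/3);
      __builtin_prefetch(dst + offset + kPrefetchDistance, /*rw=*/1, /*locality=*/3);
    }
    std::memcpy(dst + offset, src + offset, kBlock);
  }
  std::memcpy(dst + offset, src + offset, count - offset); // remaining tail
}
```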
|
|
This is a part of #97655.
|
|
declaration" (#98593)
Reverts llvm/llvm-project#98075
Build bots are broken.
|
|
This is a part of #97655.
|
|
Needed to support i386 (#93709).
|
|
This patch adds tests for `memcpy` and `memset` making sure that we
don't access buffers out of bounds. It relies on POSIX `mmap` /
`mprotect` and works only when FULL_BUILD_MODE is disabled.
The bug showed up while enabling software prefetching.
`loop_and_tail_offset` always runs at least one loop iteration, but in some
configurations the loop-unrolled prefetching path needed only the tail
operation and no loop iterations at all.
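For comparison, a loop-and-tail copy that tests its condition before the first iteration handles the "tail only" case naturally. This is a hypothetical sketch with a made-up name, not `loop_and_tail_offset` itself; it requires `count >= SIZE` and non-overlapping buffers.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch: copy full SIZE-byte blocks from the front, then one
// final SIZE-byte block ending exactly at `count` (it may overlap the last
// loop block, which is harmless for non-overlapping src/dst). Because the
// condition is checked before the first iteration, count == SIZE runs zero
// loop iterations and only the tail.
template <std::size_t SIZE>
void loop_and_tail(char *__restrict dst, const char *__restrict src,
                   std::size_t count) {
  static_assert(SIZE > 0, "block size must be positive");
  // Precondition (not checked here): count >= SIZE.
  for (std::size_t offset = 0; offset + SIZE < count; offset += SIZE)
    std::memcpy(dst + offset, src + offset, SIZE);
  const std::size_t tail = count - SIZE;
  std::memcpy(dst + tail, src + tail, SIZE);
}
```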
|
|
Fixes #86546 and removes the macro `LIBC_HAS_BUILTIN`. This was
necessary to support older compilers that did not support
`__has_builtin`. All of the compilers we support already have this
builtin.
See: https://libc.llvm.org/compiler_support.html
All call sites now use `__has_builtin` directly.
cc @nickdesaulniers
|
|
Umbrella bug #83182
|
|
interfaces (#83921)
These templates are made to match the ergonomics of std::numeric_limits.
Because our style for constexpr variables is ALL_CAPS, we must silence the
linter for these manually.
Link:
https://clang.llvm.org/extra/clang-tidy/#suppressing-undesired-diagnostics
|
|
Fixes:
libc/src/string/memory_utils/utils.h:345:13: warning: invalid case style
for member 'offset_' [readability-identifier-naming]
Having a trailing underscore for members is a google3 style, not LLVM style.
Removing the underscore is insufficient, as we would then have two members
with the same identifier (the data member and its getter), which is a
compile-time error. Remove the getter and access the renamed member, which
is now public, directly.
|
|
Found via:
$ ninja -k2000 libc-lint 2>&1 | grep readability-identifier-naming
Auto fixed via:
$ clang-tidy -p build/compile_commands.json \
-checks="-*,readability-identifier-naming" \
<filename> --fix
This doesn't fix all instances, just the obvious simple cases where it makes
sense to change the identifier names. Subsequent PRs will fix up the
stragglers.
|
|
My global find+replace was overzealous and broke post submit unit tests.
Link: #83345
|
|
Codify that we use lower_case for
readability-identifier-naming.ConstexprFunctionCase and then fix the 11
violations (rather than codify UPPER_CASE and have to fix the 170 violations).
|
|
Towards the goal of getting `ninja libc-lint` back to green, fix the numerous
instances of:
warning: header guard does not follow preferred style [llvm-header-guard]
This is because many of our header guards start with `__LLVM` rather than
`LLVM`.
To filter just these warnings:
$ ninja -k2000 libc-lint 2>&1 | grep llvm-header-guard
To automatically apply fixits:
$ find libc/src libc/include libc/test -name \*.h | \
xargs -n1 -I {} clang-tidy {} -p build/compile_commands.json \
-checks='-*,llvm-header-guard' --fix --quiet
Some manual cleanup is still necessary as headers that were missing header
guards outright will have them inserted before the license block (we prefer
them after).
|
|
|
|
This is less confusing since the implementation only cares about the 4
lower bits.
|
|
Fixes #77080.
|
|
|
|
Fixes the following from GCC:
llvm-project/libc/src/string/memory_utils/op_x86.h:236:24: error:
conversion from ‘long unsigned int’ to ‘uint32_t’ {aka ‘unsigned int’}
may change value [-Werror=conversion]
236 | return (xored >> 32) | (xored & 0xFFFFFFFF);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
Link:
https://lab.llvm.org/buildbot/#/builders/250/builds/16236/steps/8/logs/stdio
Link: https://github.com/llvm/llvm-project/pull/74506
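One way to make the narrowing explicit is a `static_cast` on the whole expression. `fold_xor` is a hypothetical stand-in for the function at op_x86.h:236, and the actual patch may differ in detail.

```cpp
#include <cstdint>

// Hypothetical stand-in: fold a 64-bit XOR difference into 32 bits. The OR of
// the two halves is nonzero exactly when some bit differed. The expression
// has a 64-bit type, so the conversion to uint32_t is spelled out.
inline std::uint32_t fold_xor(std::uint64_t xored) {
  return static_cast<std::uint32_t>((xored >> 32) | (xored & 0xFFFFFFFF));
}
```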
|
|
GCC reports an instance of -Warray-bounds in block_offset. Reimplement
block_offset in terms of memcpy_inline which was created to avoid this
diagnostic. See the linked issue for the full diagnostic trace.
Fixes: https://github.com/llvm/llvm-project/issues/76877
|
|
Before this patch the compiler could generate unnecessary calls to the
selected implementation.
https://clang.llvm.org/docs/AttributeReference.html#flatten
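A sketch of applying the attribute to a dispatch function, assuming the GCC/Clang spelling `__attribute__((flatten))`; the helpers and `dispatching_copy` are hypothetical stand-ins for the selected implementations.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical size-specific building blocks.
static inline void copy_small(char *dst, const char *src, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    dst[i] = src[i];
}
static inline void copy_large(char *dst, const char *src, std::size_t count) {
  std::memcpy(dst, src, count); // stand-in for a tuned large-size path
}

// `flatten` asks the compiler to inline the calls made inside this body where
// possible, so the dispatcher does not call out-of-line copies of the
// implementation it selected.
__attribute__((flatten)) void *dispatching_copy(void *dst, const void *src,
                                                std::size_t count) {
  auto *d = static_cast<char *>(dst);
  const auto *s = static_cast<const char *>(src);
  if (count < 16)
    copy_small(d, s, count);
  else
    copy_large(d, s, count);
  return dst;
}
```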
|
|
instead (#73939) (#74446)
Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h`
that was still using `deferred_static_assert`.
|
|
instead" (#74444)
Reverts llvm/llvm-project#73939
This broke libc-aarch64-ubuntu build bot
https://lab.llvm.org/buildbot/#/builders/138/builds/56186
|
|
|
|
The [standard](https://eel.is/c++draft/expr.add#4.3) forbids forming
pointers to invalid objects even if the pointer is never read from or
written to. This patch makes sure that we don't do pointer arithmetic on
invalid pointers.
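A typical trap is a backward loop that decrements a pointer past the start of the buffer. Here is a hypothetical sketch that avoids it by tracking a remaining byte count instead. It assumes `dst >= src` (or disjoint buffers), the case where memmove must copy back to front; `copy_backward` is a made-up name.

```cpp
#include <cstddef>
#include <cstring>

// A pointer-based loop such as
//   for (const char *p = src + count - 16; p >= src; p -= 16)
// ends by computing a pointer before `src` (and starts with one when
// count < 16), which is undefined even if it is never dereferenced.
// Counting down the remaining bytes keeps every pointer in [src, src + count].
inline void copy_backward(char *dst, const char *src, std::size_t count) {
  while (count >= 16) {
    count -= 16;
    // memmove: the 16-byte block may overlap itself when dst is close to src.
    std::memmove(dst + count, src + count, 16);
  }
  while (count > 0) { // fewer than 16 leading bytes remain
    --count;
    dst[count] = src[count];
  }
}
```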
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
|
|
Software prefetching helps recover performance when hardware prefetching
is disabled. The `LIBC_COPT_MEMSET_X86_USE_SOFTWARE_PREFETCHING` compile
time option enables this behavior.
|
|
Adjust boundary conditions for sizes = 16/32/64.
See the added comment for explanations.
Results on a machine with AVX2, where sizes 64 and 128 are affected:
```
│ baseline │ adjusted │
│ sec/op │ sec/op vs base │
memcpy/Google_A 5.701n ± 0% 5.551n ± 1% -2.63% (n=100)
memcpy/Google_B 3.817n ± 0% 3.776n ± 0% -1.07% (p=0.000 n=100)
memcpy/Google_D 11.35n ± 1% 11.32n ± 0% ~ (p=0.066 n=100)
memcpy/Google_U 3.874n ± 1% 3.821n ± 1% -1.37% (p=0.001 n=100)
memcpy/64 3.843n ± 0% 3.105n ± 3% -19.22% (n=50)
memcpy/128 4.842n ± 0% 3.818n ± 0% -21.15% (p=0.000 n=50)
```
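A sketch of what such a boundary condition can look like, assuming the common overlapping head/tail technique (the helper names are hypothetical, not the libc ones): two N-byte copies, one at the start and one ending at `count`, cover every size in [N, 2N]. Testing `count <= 2 * N` instead of `count < 2 * N` keeps sizes 16, 32 and 64 in the smaller-block branch instead of falling through to the next, larger block size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies [0, count) with two possibly overlapping N-byte blocks. Only valid
// for N <= count <= 2 * N and non-overlapping src/dst.
template <std::size_t N>
inline void head_tail(char *dst, const char *src, std::size_t count) {
  assert(N <= count && count <= 2 * N);
  std::memcpy(dst, src, N);
  std::memcpy(dst + count - N, src + count - N, N);
}

// Hypothetical dispatch for sizes up to 64. Each test is inclusive (`<=`), so
// a size of exactly 2 * N (e.g. 32) is served by two N-byte (16-byte) copies.
inline void copy_upto_64(char *dst, const char *src, std::size_t count) {
  assert(count <= 64);
  if (count == 0)
    return;
  if (count == 1) {
    dst[0] = src[0];
  } else if (count <= 4) {
    head_tail<2>(dst, src, count);
  } else if (count <= 8) {
    head_tail<4>(dst, src, count);
  } else if (count <= 16) {
    head_tail<8>(dst, src, count);
  } else if (count <= 32) {
    head_tail<16>(dst, src, count);
  } else {
    head_tail<32>(dst, src, count);
  }
}
```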
|
|
Fix #64758: `load64_aligned` was missing a case for `alignment == 6`.
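A hypothetical sketch of such a case, assuming `alignment` denotes the pointer's offset modulo 8 and a little-endian target: an address that is 6 mod 8 can be read with a 16-bit, a 32-bit and a 16-bit load, each naturally aligned (`p` is 2-byte aligned, `p + 2` is 8-byte aligned, `p + 6` is 4-byte aligned). The function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch (little-endian assumed): assemble the 8 bytes at `p`,
// where p % 8 == 6, from three naturally aligned loads.
inline std::uint64_t load64_offset6(const unsigned char *p) {
  assert(reinterpret_cast<std::uintptr_t>(p) % 8 == 6);
  std::uint16_t lo;
  std::uint32_t mid;
  std::uint16_t hi;
  std::memcpy(&lo, p, sizeof(lo));
  std::memcpy(&mid, p + 2, sizeof(mid));
  std::memcpy(&hi, p + 6, sizeof(hi));
  // Little-endian: the byte at the lowest address is the least significant.
  return static_cast<std::uint64_t>(lo) |
         (static_cast<std::uint64_t>(mid) << 16) |
         (static_cast<std::uint64_t>(hi) << 48);
}
```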
|
|
1. Remove the is_disjoint check for smaller sizes and reduce code bloat.
inline_memmove may handle some small sizes as efficiently as
inline_memcpy. For these sizes we can skip the is_disjoint check.
This both avoids additional code for the most frequent (smaller) sizes
and removes code bloat (we don't need the memcpy logic for small sizes).
Here we rely heavily on inlining and dead code elimination: from the
first inline_memmove we should get only the handling of small sizes, and
from the second inline_memmove and inline_memcpy only the handling of
larger sizes.
2. Use the memcpy thresholds for memmove.
The memcpy thresholds were more carefully tuned. This becomes more
important now that we always use memmove for all small sizes.
3. Fix boundary conditions for sizes = 16/32/64.
See the added comment for explanations.
Memmove function size drops from 885 to 715 bytes
due to the removed duplication.
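A sketch of the resulting control flow, under the assumption that small sizes use an overlap-safe "load everything, then store" scheme. The names are hypothetical, and the byte loops at the end stand in for the real forward/backward block loops. Sizes up to 64 bytes never reach `is_disjoint`; only larger sizes pay for the check before taking the memcpy path.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Overlap-safe for N <= count <= 2 * N: both blocks are loaded into
// temporaries before anything is stored, so src and dst may overlap.
template <std::size_t N>
inline void move_head_tail(char *dst, const char *src, std::size_t count) {
  char head[N], tail[N];
  std::memcpy(head, src, N);
  std::memcpy(tail, src + count - N, N);
  std::memcpy(dst, head, N);
  std::memcpy(dst + count - N, tail, N);
}

inline bool is_disjoint(const void *a, const void *b, std::size_t count) {
  const auto x = reinterpret_cast<std::uintptr_t>(a);
  const auto y = reinterpret_cast<std::uintptr_t>(b);
  return (x > y ? x - y : y - x) >= count;
}

// Hypothetical memmove sketch.
inline void *memmove_sketch(void *dst_ptr, const void *src_ptr, std::size_t count) {
  auto *dst = static_cast<char *>(dst_ptr);
  const auto *src = static_cast<const char *>(src_ptr);
  // Small sizes: every branch is overlap-safe, so no is_disjoint check.
  if (count == 0)
    return dst_ptr;
  if (count == 1) {
    dst[0] = src[0];
  } else if (count <= 4) {
    move_head_tail<2>(dst, src, count);
  } else if (count <= 8) {
    move_head_tail<4>(dst, src, count);
  } else if (count <= 16) {
    move_head_tail<8>(dst, src, count);
  } else if (count <= 32) {
    move_head_tail<16>(dst, src, count);
  } else if (count <= 64) {
    move_head_tail<32>(dst, src, count);
  } else if (is_disjoint(dst, src, count)) {
    std::memcpy(dst, src, count); // large and disjoint: memcpy path
  } else if (reinterpret_cast<std::uintptr_t>(dst) <
             reinterpret_cast<std::uintptr_t>(src)) {
    for (std::size_t i = 0; i < count; ++i) // overlapping, copy forward
      dst[i] = src[i];
  } else {
    for (std::size_t i = count; i > 0; --i) // overlapping, copy backward
      dst[i - 1] = src[i - 1];
  }
  return dst_ptr;
}
```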
```
│ baseline │ small-size │
│ sec/op │ sec/op vs base │
memmove/Google_A 3.208n ± 0% 2.911n ± 0% -9.25% (n=100)
memmove/Google_B 4.113n ± 1% 3.428n ± 0% -16.65% (n=100)
memmove/Google_D 5.838n ± 0% 4.158n ± 0% -28.78% (n=100)
memmove/Google_S 4.712n ± 1% 3.899n ± 0% -17.25% (n=100)
memmove/Google_U 3.609n ± 0% 3.247n ± 1% -10.02% (n=100)
memmove/0 2.982n ± 0% 2.169n ± 0% -27.26% (n=50)
memmove/1 3.253n ± 0% 2.168n ± 0% -33.34% (n=50)
memmove/2 3.255n ± 0% 2.169n ± 0% -33.38% (n=50)
memmove/3 3.259n ± 2% 2.175n ± 0% -33.27% (p=0.000 n=50)
memmove/4 3.259n ± 0% 2.168n ± 5% -33.46% (p=0.000 n=50)
memmove/5 2.488n ± 0% 1.926n ± 0% -22.57% (p=0.000 n=50)
memmove/6 2.490n ± 0% 1.928n ± 0% -22.59% (p=0.000 n=50)
memmove/7 2.492n ± 0% 1.927n ± 0% -22.65% (p=0.000 n=50)
memmove/8 2.737n ± 0% 2.711n ± 0% -0.97% (p=0.000 n=50)
memmove/9 2.736n ± 0% 2.711n ± 0% -0.94% (p=0.000 n=50)
memmove/10 2.739n ± 0% 2.711n ± 0% -1.04% (p=0.000 n=50)
memmove/11 2.740n ± 0% 2.711n ± 0% -1.07% (p=0.000 n=50)
memmove/12 2.740n ± 0% 2.711n ± 0% -1.09% (p=0.000 n=50)
memmove/13 2.744n ± 0% 2.711n ± 0% -1.22% (p=0.000 n=50)
memmove/14 2.742n ± 0% 2.711n ± 0% -1.14% (p=0.000 n=50)
memmove/15 2.742n ± 0% 2.711n ± 0% -1.15% (p=0.000 n=50)
memmove/16 2.997n ± 0% 2.981n ± 0% -0.52% (p=0.000 n=50)
memmove/17 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50)
memmove/18 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50)
memmove/19 2.999n ± 0% 2.982n ± 0% -0.59% (p=0.000 n=50)
memmove/20 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50)
memmove/21 3.000n ± 0% 2.981n ± 0% -0.61% (p=0.000 n=50)
memmove/22 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50)
memmove/23 3.002n ± 0% 2.981n ± 0% -0.67% (p=0.000 n=50)
memmove/24 3.002n ± 0% 2.981n ± 0% -0.70% (n=50)
memmove/25 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50)
memmove/26 3.004n ± 0% 2.982n ± 0% -0.74% (p=0.000 n=50)
memmove/27 3.005n ± 0% 2.981n ± 0% -0.79% (n=50)
memmove/28 3.005n ± 0% 2.982n ± 0% -0.77% (n=50)
memmove/29 3.009n ± 0% 2.981n ± 0% -0.92% (n=50)
memmove/30 3.008n ± 0% 2.981n ± 0% -0.89% (n=50)
memmove/31 3.007n ± 0% 2.982n ± 0% -0.86% (n=50)
memmove/32 3.540n ± 0% 2.998n ± 0% -15.31% (p=0.000 n=50)
memmove/33 3.544n ± 0% 2.997n ± 0% -15.44% (p=0.000 n=50)
memmove/34 3.546n ± 0% 2.999n ± 0% -15.42% (n=50)
memmove/35 3.545n ± 0% 2.999n ± 0% -15.40% (n=50)
memmove/36 3.548n ± 0% 2.998n ± 0% -15.52% (p=0.000 n=50)
memmove/37 3.546n ± 0% 3.000n ± 0% -15.41% (n=50)
memmove/38 3.549n ± 0% 2.999n ± 0% -15.49% (p=0.000 n=50)
memmove/39 3.549n ± 0% 2.999n ± 0% -15.48% (p=0.000 n=50)
memmove/40 3.549n ± 0% 3.000n ± 0% -15.46% (p=0.000 n=50)
memmove/41 3.550n ± 0% 3.001n ± 0% -15.47% (n=50)
memmove/42 3.549n ± 0% 3.001n ± 0% -15.43% (n=50)
memmove/43 3.552n ± 0% 3.001n ± 0% -15.52% (p=0.000 n=50)
memmove/44 3.552n ± 0% 3.001n ± 0% -15.51% (n=50)
memmove/45 3.552n ± 0% 3.002n ± 0% -15.48% (n=50)
memmove/46 3.554n ± 0% 3.001n ± 0% -15.55% (p=0.000 n=50)
memmove/47 3.556n ± 0% 3.002n ± 0% -15.58% (p=0.000 n=50)
memmove/48 3.555n ± 0% 3.003n ± 0% -15.54% (n=50)
memmove/49 3.557n ± 0% 3.002n ± 0% -15.59% (p=0.000 n=50)
memmove/50 3.557n ± 0% 3.004n ± 0% -15.55% (p=0.000 n=50)
memmove/51 3.556n ± 0% 3.004n ± 0% -15.53% (p=0.000 n=50)
memmove/52 3.561n ± 0% 3.004n ± 0% -15.65% (p=0.000 n=50)
memmove/53 3.558n ± 0% 3.004n ± 0% -15.57% (p=0.000 n=50)
memmove/54 3.561n ± 0% 3.005n ± 0% -15.62% (n=50)
memmove/55 3.560n ± 0% 3.006n ± 0% -15.57% (n=50)
memmove/56 3.562n ± 0% 3.006n ± 0% -15.60% (p=0.000 n=50)
memmove/57 3.563n ± 0% 3.006n ± 0% -15.64% (n=50)
memmove/58 3.565n ± 0% 3.007n ± 0% -15.64% (p=0.000 n=50)
memmove/59 3.564n ± 0% 3.006n ± 0% -15.66% (p=0.000 n=50)
memmove/60 3.570n ± 0% 3.008n ± 0% -15.74% (p=0.000 n=50)
memmove/61 3.566n ± 0% 3.009n ± 0% -15.63% (p=0.000 n=50)
memmove/62 3.567n ± 0% 3.007n ± 0% -15.70% (p=0.000 n=50)
memmove/63 3.568n ± 0% 3.008n ± 0% -15.71% (p=0.000 n=50)
memmove/64 4.104n ± 0% 3.008n ± 0% -26.70% (p=0.000 n=50)
memmove/65 4.126n ± 0% 3.662n ± 0% -11.26% (p=0.000 n=50)
memmove/66 4.128n ± 0% 3.662n ± 0% -11.29% (n=50)
memmove/67 4.129n ± 0% 3.662n ± 0% -11.31% (n=50)
memmove/68 4.129n ± 0% 3.661n ± 0% -11.33% (p=0.000 n=50)
memmove/69 4.130n ± 0% 3.662n ± 0% -11.34% (p=0.000 n=50)
memmove/70 4.130n ± 0% 3.662n ± 0% -11.33% (n=50)
memmove/71 4.132n ± 0% 3.662n ± 0% -11.38% (p=0.000 n=50)
memmove/72 4.131n ± 0% 3.661n ± 0% -11.39% (n=50)
memmove/73 4.135n ± 0% 3.661n ± 0% -11.45% (p=0.000 n=50)
memmove/74 4.137n ± 0% 3.662n ± 0% -11.49% (n=50)
memmove/75 4.138n ± 0% 3.662n ± 0% -11.51% (p=0.000 n=50)
memmove/76 4.139n ± 0% 3.661n ± 0% -11.56% (p=0.000 n=50)
memmove/77 4.136n ± 0% 3.662n ± 0% -11.47% (p=0.000 n=50)
memmove/78 4.143n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/79 4.142n ± 0% 3.661n ± 0% -11.60% (n=50)
memmove/80 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/81 4.140n ± 0% 3.661n ± 0% -11.57% (n=50)
memmove/82 4.146n ± 0% 3.661n ± 0% -11.69% (n=50)
memmove/83 4.143n ± 0% 3.661n ± 0% -11.63% (p=0.000 n=50)
memmove/84 4.143n ± 0% 3.661n ± 0% -11.63% (n=50)
memmove/85 4.147n ± 0% 3.661n ± 0% -11.73% (p=0.000 n=50)
memmove/86 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/87 4.147n ± 0% 3.661n ± 0% -11.72% (p=0.000 n=50)
memmove/88 4.148n ± 0% 3.661n ± 0% -11.74% (n=50)
memmove/89 4.152n ± 0% 3.661n ± 0% -11.84% (n=50)
memmove/90 4.151n ± 0% 3.661n ± 0% -11.81% (n=50)
memmove/91 4.150n ± 0% 3.661n ± 0% -11.78% (n=50)
memmove/92 4.153n ± 0% 3.661n ± 0% -11.86% (n=50)
memmove/93 4.158n ± 0% 3.661n ± 0% -11.95% (n=50)
memmove/94 4.157n ± 0% 3.661n ± 0% -11.95% (p=0.000 n=50)
memmove/95 4.155n ± 0% 3.661n ± 0% -11.90% (p=0.000 n=50)
memmove/96 4.149n ± 0% 3.660n ± 0% -11.79% (n=50)
memmove/97 4.157n ± 0% 3.661n ± 0% -11.94% (n=50)
memmove/98 4.157n ± 0% 3.661n ± 0% -11.94% (n=50)
memmove/99 4.168n ± 0% 3.661n ± 0% -12.17% (p=0.000 n=50)
memmove/100 4.159n ± 0% 3.660n ± 0% -12.00% (p=0.000 n=50)
memmove/101 4.161n ± 0% 3.660n ± 0% -12.03% (p=0.000 n=50)
memmove/102 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50)
memmove/103 4.164n ± 0% 3.661n ± 0% -12.08% (n=50)
memmove/104 4.164n ± 0% 3.660n ± 0% -12.11% (n=50)
memmove/105 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50)
memmove/106 4.166n ± 0% 3.660n ± 0% -12.15% (n=50)
memmove/107 4.171n ± 0% 3.660n ± 1% -12.26% (p=0.000 n=50)
memmove/108 4.173n ± 0% 3.660n ± 0% -12.30% (p=0.000 n=50)
memmove/109 4.170n ± 0% 3.660n ± 0% -12.24% (n=50)
memmove/110 4.174n ± 0% 3.660n ± 0% -12.31% (n=50)
memmove/111 4.176n ± 0% 3.660n ± 0% -12.35% (p=0.000 n=50)
memmove/112 4.174n ± 0% 3.659n ± 0% -12.34% (p=0.000 n=50)
memmove/113 4.176n ± 0% 3.660n ± 0% -12.35% (n=50)
memmove/114 4.182n ± 0% 3.660n ± 0% -12.49% (n=50)
memmove/115 4.185n ± 0% 3.660n ± 0% -12.55% (n=50)
memmove/116 4.184n ± 0% 3.659n ± 0% -12.54% (n=50)
memmove/117 4.182n ± 0% 3.660n ± 0% -12.50% (n=50)
memmove/118 4.188n ± 0% 3.660n ± 0% -12.61% (n=50)
memmove/119 4.186n ± 0% 3.660n ± 0% -12.57% (p=0.000 n=50)
memmove/120 4.189n ± 0% 3.659n ± 0% -12.63% (n=50)
memmove/121 4.187n ± 0% 3.660n ± 0% -12.60% (n=50)
memmove/122 4.186n ± 0% 3.660n ± 0% -12.58% (n=50)
memmove/123 4.187n ± 0% 3.660n ± 0% -12.60% (n=50)
memmove/124 4.189n ± 0% 3.659n ± 0% -12.65% (n=50)
memmove/125 4.195n ± 0% 3.659n ± 0% -12.78% (n=50)
memmove/126 4.197n ± 0% 3.659n ± 0% -12.81% (n=50)
memmove/127 4.194n ± 0% 3.659n ± 0% -12.75% (n=50)
memmove/128 5.035n ± 0% 3.659n ± 0% -27.32% (n=50)
memmove/129 5.127n ± 0% 5.164n ± 0% +0.73% (p=0.000 n=50)
memmove/130 5.130n ± 0% 5.176n ± 0% +0.88% (p=0.000 n=50)
memmove/131 5.127n ± 0% 5.180n ± 0% +1.05% (p=0.000 n=50)
memmove/132 5.131n ± 0% 5.169n ± 0% +0.75% (p=0.000 n=50)
memmove/133 5.137n ± 0% 5.179n ± 0% +0.81% (p=0.000 n=50)
memmove/134 5.140n ± 0% 5.178n ± 0% +0.74% (p=0.000 n=50)
memmove/135 5.141n ± 0% 5.187n ± 0% +0.88% (p=0.000 n=50)
memmove/136 5.133n ± 0% 5.184n ± 0% +0.99% (p=0.000 n=50)
memmove/137 5.148n ± 0% 5.186n ± 0% +0.73% (p=0.000 n=50)
memmove/138 5.143n ± 0% 5.189n ± 0% +0.88% (p=0.000 n=50)
memmove/139 5.142n ± 0% 5.192n ± 0% +0.97% (p=0.000 n=50)
memmove/140 5.141n ± 0% 5.192n ± 0% +1.01% (p=0.000 n=50)
memmove/141 5.155n ± 0% 5.188n ± 0% +0.64% (p=0.000 n=50)
memmove/142 5.146n ± 0% 5.192n ± 0% +0.90% (p=0.000 n=50)
memmove/143 5.142n ± 0% 5.203n ± 0% +1.19% (p=0.000 n=50)
memmove/144 5.146n ± 0% 5.197n ± 0% +0.99% (p=0.000 n=50)
memmove/145 5.146n ± 0% 5.196n ± 0% +0.97% (p=0.000 n=50)
memmove/146 5.151n ± 0% 5.207n ± 0% +1.10% (p=0.000 n=50)
memmove/147 5.151n ± 0% 5.205n ± 0% +1.06% (p=0.000 n=50)
memmove/148 5.156n ± 0% 5.190n ± 0% +0.66% (p=0.000 n=50)
memmove/149 5.158n ± 0% 5.212n ± 0% +1.04% (p=0.000 n=50)
memmove/150 5.160n ± 0% 5.203n ± 0% +0.84% (p=0.000 n=50)
memmove/151 5.167n ± 0% 5.210n ± 0% +0.83% (p=0.000 n=50)
memmove/152 5.157n ± 0% 5.206n ± 0% +0.94% (p=0.000 n=50)
memmove/153 5.170n ± 0% 5.211n ± 0% +0.80% (p=0.000 n=50)
memmove/154 5.169n ± 0% 5.222n ± 0% +1.02% (p=0.000 n=50)
memmove/155 5.171n ± 0% 5.215n ± 0% +0.87% (p=0.000 n=50)
memmove/156 5.174n ± 0% 5.214n ± 0% +0.78% (p=0.000 n=50)
memmove/157 5.171n ± 0% 5.218n ± 0% +0.92% (p=0.000 n=50)
memmove/158 5.168n ± 0% 5.224n ± 0% +1.09% (p=0.000 n=50)
memmove/159 5.179n ± 0% 5.218n ± 0% +0.76% (p=0.000 n=50)
memmove/160 5.170n ± 0% 5.219n ± 0% +0.95% (p=0.000 n=50)
memmove/161 5.187n ± 0% 5.220n ± 0% +0.64% (p=0.000 n=50)
memmove/162 5.189n ± 0% 5.234n ± 0% +0.86% (p=0.000 n=50)
memmove/163 5.199n ± 0% 5.250n ± 0% +0.99% (p=0.000 n=50)
memmove/164 5.205n ± 0% 5.260n ± 0% +1.04% (p=0.000 n=50)
memmove/165 5.208n ± 0% 5.261n ± 0% +1.01% (p=0.000 n=50)
memmove/166 5.227n ± 0% 5.275n ± 0% +0.91% (p=0.000 n=50)
memmove/167 5.233n ± 0% 5.281n ± 0% +0.92% (p=0.000 n=50)
memmove/168 5.236n ± 0% 5.295n ± 0% +1.12% (p=0.000 n=50)
memmove/169 5.256n ± 0% 5.297n ± 0% +0.79% (p=0.000 n=50)
memmove/170 5.259n ± 0% 5.302n ± 0% +0.80% (p=0.000 n=50)
memmove/171 5.269n ± 0% 5.321n ± 0% +0.97% (p=0.000 n=50)
memmove/172 5.266n ± 0% 5.318n ± 0% +0.98% (p=0.000 n=50)
memmove/173 5.272n ± 0% 5.330n ± 0% +1.09% (p=0.000 n=50)
memmove/174 5.284n ± 0% 5.331n ± 0% +0.89% (p=0.000 n=50)
memmove/175 5.284n ± 0% 5.322n ± 0% +0.72% (p=0.000 n=50)
memmove/176 5.298n ± 0% 5.337n ± 0% +0.74% (p=0.000 n=50)
memmove/177 5.282n ± 0% 5.338n ± 0% +1.04% (p=0.000 n=50)
memmove/178 5.299n ± 0% 5.337n ± 0% +0.71% (p=0.000 n=50)
memmove/179 5.296n ± 0% 5.343n ± 0% +0.88% (p=0.000 n=50)
memmove/180 5.292n ± 0% 5.343n ± 0% +0.97% (p=0.000 n=50)
memmove/181 5.303n ± 0% 5.335n ± 0% +0.60% (p=0.000 n=50)
memmove/182 5.305n ± 0% 5.338n ± 0% +0.62% (p=0.000 n=50)
memmove/183 5.298n ± 0% 5.329n ± 0% +0.59% (p=0.000 n=50)
memmove/184 5.299n ± 0% 5.333n ± 0% +0.64% (p=0.000 n=50)
memmove/185 5.291n ± 0% 5.330n ± 0% +0.73% (p=0.000 n=50)
memmove/186 5.296n ± 0% 5.332n ± 0% +0.68% (p=0.000 n=50)
memmove/187 5.297n ± 0% 5.320n ± 0% +0.44% (p=0.000 n=50)
memmove/188 5.286n ± 0% 5.314n ± 0% +0.53% (p=0.000 n=50)
memmove/189 5.293n ± 0% 5.318n ± 0% +0.46% (p=0.000 n=50)
memmove/190 5.294n ± 0% 5.318n ± 0% +0.45% (p=0.000 n=50)
memmove/191 5.292n ± 0% 5.314n ± 0% +0.40% (p=0.032 n=50)
memmove/192 5.272n ± 0% 5.304n ± 0% +0.60% (p=0.000 n=50)
memmove/193 5.279n ± 0% 5.310n ± 0% +0.57% (p=0.000 n=50)
memmove/194 5.294n ± 0% 5.308n ± 0% +0.26% (p=0.018 n=50)
memmove/195 5.302n ± 0% 5.311n ± 0% +0.18% (p=0.010 n=50)
memmove/196 5.301n ± 0% 5.316n ± 0% +0.28% (p=0.023 n=50)
memmove/197 5.302n ± 0% 5.327n ± 0% +0.47% (p=0.000 n=50)
memmove/198 5.310n ± 0% 5.326n ± 0% +0.30% (p=0.003 n=50)
memmove/199 5.303n ± 0% 5.319n ± 0% +0.30% (p=0.009 n=50)
memmove/200 5.312n ± 0% 5.330n ± 0% +0.35% (p=0.001 n=50)
memmove/201 5.307n ± 0% 5.333n ± 0% +0.50% (p=0.000 n=50)
memmove/202 5.311n ± 0% 5.334n ± 0% +0.44% (p=0.000 n=50)
memmove/203 5.313n ± 0% 5.335n ± 0% +0.41% (p=0.006 n=50)
memmove/204 5.312n ± 0% 5.332n ± 0% +0.36% (p=0.002 n=50)
memmove/205 5.318n ± 0% 5.345n ± 0% +0.50% (p=0.000 n=50)
memmove/206 5.311n ± 0% 5.333n ± 0% +0.42% (p=0.002 n=50)
memmove/207 5.310n ± 0% 5.338n ± 0% +0.52% (p=0.000 n=50)
memmove/208 5.319n ± 0% 5.341n ± 0% +0.40% (p=0.004 n=50)
memmove/209 5.330n ± 0% 5.346n ± 0% +0.30% (p=0.004 n=50)
memmove/210 5.329n ± 0% 5.349n ± 0% +0.38% (p=0.002 n=50)
memmove/211 5.318n ± 0% 5.340n ± 0% +0.41% (p=0.000 n=50)
memmove/212 5.339n ± 0% 5.343n ± 0% ~ (p=0.396 n=50)
memmove/213 5.329n ± 0% 5.343n ± 0% +0.25% (p=0.017 n=50)
memmove/214 5.339n ± 0% 5.358n ± 0% +0.35% (p=0.035 n=50)
memmove/215 5.342n ± 0% 5.346n ± 0% ~ (p=0.063 n=50)
memmove/216 5.338n ± 0% 5.359n ± 0% +0.39% (p=0.002 n=50)
memmove/217 5.341n ± 0% 5.362n ± 0% +0.39% (p=0.015 n=50)
memmove/218 5.354n ± 0% 5.373n ± 0% +0.36% (p=0.041 n=50)
memmove/219 5.352n ± 0% 5.362n ± 0% ~ (p=0.143 n=50)
memmove/220 5.344n ± 0% 5.370n ± 0% +0.50% (p=0.001 n=50)
memmove/221 5.345n ± 0% 5.373n ± 0% +0.53% (p=0.000 n=50)
memmove/222 5.348n ± 0% 5.360n ± 0% +0.23% (p=0.014 n=50)
memmove/223 5.354n ± 0% 5.377n ± 0% +0.43% (p=0.024 n=50)
memmove/224 5.352n ± 0% 5.363n ± 0% ~ (p=0.052 n=50)
memmove/225 5.372n ± 0% 5.380n ± 0% ~ (p=0.481 n=50)
memmove/226 5.368n ± 0% 5.386n ± 0% +0.34% (p=0.004 n=50)
memmove/227 5.386n ± 0% 5.402n ± 0% +0.29% (p=0.028 n=50)
memmove/228 5.400n ± 0% 5.408n ± 0% ~ (p=0.174 n=50)
memmove/229 5.423n ± 0% 5.427n ± 0% ~ (p=0.444 n=50)
memmove/230 5.411n ± 0% 5.429n ± 0% +0.33% (p=0.020 n=50)
memmove/231 5.420n ± 0% 5.433n ± 0% +0.24% (p=0.034 n=50)
memmove/232 5.435n ± 0% 5.441n ± 0% ~ (p=0.235 n=50)
memmove/233 5.446n ± 0% 5.462n ± 0% ~ (p=0.590 n=50)
memmove/234 5.467n ± 0% 5.461n ± 0% ~ (p=0.921 n=50)
memmove/235 5.472n ± 0% 5.478n ± 0% ~ (p=0.883 n=50)
memmove/236 5.466n ± 0% 5.478n ± 0% ~ (p=0.324 n=50)
memmove/237 5.471n ± 0% 5.489n ± 0% ~ (p=0.132 n=50)
memmove/238 5.485n ± 0% 5.489n ± 0% ~ (p=0.460 n=50)
memmove/239 5.484n ± 0% 5.488n ± 0% ~ (p=0.833 n=50)
memmove/240 5.483n ± 0% 5.495n ± 0% ~ (p=0.095 n=50)
memmove/241 5.498n ± 0% 5.514n ± 0% ~ (p=0.077 n=50)
memmove/242 5.518n ± 0% 5.517n ± 0% ~ (p=0.481 n=50)
memmove/243 5.514n ± 0% 5.511n ± 0% ~ (p=0.503 n=50)
memmove/244 5.510n ± 0% 5.497n ± 0% -0.24% (p=0.038 n=50)
memmove/245 5.516n ± 0% 5.505n ± 0% ~ (p=0.317 n=50)
memmove/246 5.513n ± 1% 5.494n ± 0% ~ (p=0.147 n=50)
memmove/247 5.518n ± 0% 5.499n ± 0% -0.36% (p=0.011 n=50)
memmove/248 5.503n ± 0% 5.492n ± 0% ~ (p=0.267 n=50)
memmove/249 5.498n ± 0% 5.497n ± 0% ~ (p=0.765 n=50)
memmove/250 5.485n ± 0% 5.493n ± 0% ~ (p=0.348 n=50)
memmove/251 5.503n ± 0% 5.482n ± 0% -0.37% (p=0.013 n=50)
memmove/252 5.497n ± 0% 5.485n ± 0% ~ (p=0.077 n=50)
memmove/253 5.489n ± 0% 5.496n ± 0% ~ (p=0.850 n=50)
memmove/254 5.497n ± 0% 5.491n ± 0% ~ (p=0.548 n=50)
memmove/255 5.484n ± 1% 5.494n ± 0% ~ (p=0.888 n=50)
memmove/256 6.952n ± 0% 7.676n ± 0% +10.41% (p=0.000 n=50)
geomean 4.406n 4.127n -6.33%
```
|
|
Use a disjointness check that requires fewer and cheaper instructions.
Current code:
```
1b704: 48 39 f7 cmp %rsi,%rdi
1b707: 48 89 f0 mov %rsi,%rax
1b70a: 48 0f 47 c7 cmova %rdi,%rax
1b70e: 48 89 f9 mov %rdi,%rcx
1b711: 48 0f 47 ce cmova %rsi,%rcx
1b715: 48 01 d1 add %rdx,%rcx
1b718: 48 39 c1 cmp %rax,%rcx
```
New code:
```
1b704: 48 89 f8 mov %rdi,%rax
1b707: 48 29 f0 sub %rsi,%rax
1b70a: 48 89 c1 mov %rax,%rcx
1b70d: 48 f7 d9 neg %rcx
1b710: 48 0f 48 c8 cmovs %rax,%rcx
1b714: 48 39 d1 cmp %rdx,%rcx
```
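The two assembly listings can be read as two formulations of the same overlap test. The C++ sketch below is illustrative only: the function names are hypothetical and it is not the actual LLVM libc source. The old form orders the two pointers (`cmova`) and checks whether the lower buffer ends before the higher one begins. The new form computes the absolute distance between the pointers (`sub`/`neg`/`cmovs`) and compares it with `count` (`%rdx`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical name. Old-style check: order the two addresses with min/max,
// then test whether [lo, lo + count) ends at or before hi.
inline bool is_disjoint_minmax(const void *p1, const void *p2,
                               std::size_t count) {
  auto a = reinterpret_cast<std::uintptr_t>(p1);
  auto b = reinterpret_cast<std::uintptr_t>(p2);
  std::uintptr_t lo = a < b ? a : b;
  std::uintptr_t hi = a < b ? b : a;
  return lo + count <= hi;
}

// Hypothetical name. New-style check: the buffers are disjoint when the
// absolute distance between the start addresses is at least count. A compiler
// may lower the absolute difference to sub/neg/cmov, as in the listing above.
inline bool is_disjoint_absdiff(const void *p1, const void *p2,
                                std::size_t count) {
  auto a = reinterpret_cast<std::uintptr_t>(p1);
  auto b = reinterpret_cast<std::uintptr_t>(p2);
  std::uintptr_t dist = a > b ? a - b : b - a;
  return dist >= count;
}
```

Both forms agree for buffers that do not wrap around the address space. The second form avoids materializing both the minimum and the maximum, which saves a `cmov` and a register.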
```
│ baseline │ disjoint │
│ sec/op │ sec/op vs base │
memmove/Google_A 3.910n ± 0% 3.861n ± 1% -1.26% (p=0.000 n=50)
```
```
│ baseline │ disjoint │
│ sec/op │ sec/op vs base │
memmove/1 2.724n ± 3% 2.441n ± 0% -10.37% (n=50)
memmove/2 2.878n ± 0% 2.713n ± 0% -5.73% (n=50)
memmove/3 2.835n ± 0% 2.593n ± 0% -8.54% (n=50)
memmove/4 3.032n ± 0% 2.776n ± 0% -8.45% (p=0.000 n=50)
memmove/5 2.833n ± 0% 2.600n ± 0% -8.20% (p=0.000 n=50)
memmove/6 2.758n ± 0% 2.744n ± 0% -0.52% (p=0.000 n=50)
memmove/7 2.762n ± 0% 2.744n ± 0% -0.63% (p=0.000 n=50)
memmove/8 2.763n ± 0% 2.750n ± 0% -0.46% (p=0.000 n=50)
memmove/9 3.182n ± 0% 3.269n ± 0% +2.75% (p=0.000 n=50)
memmove/10 3.185n ± 0% 3.270n ± 0% +2.64% (p=0.000 n=50)
memmove/11 3.188n ± 0% 3.277n ± 0% +2.79% (p=0.000 n=50)
memmove/12 3.190n ± 0% 3.279n ± 0% +2.82% (p=0.000 n=50)
memmove/13 3.194n ± 0% 3.281n ± 0% +2.73% (p=0.000 n=50)
memmove/14 3.197n ± 0% 3.285n ± 0% +2.77% (p=0.000 n=50)
memmove/15 3.198n ± 0% 3.282n ± 0% +2.62% (p=0.000 n=50)
memmove/16 3.201n ± 0% 3.284n ± 0% +2.61% (p=0.000 n=50)
memmove/17 3.564n ± 0% 3.320n ± 0% -6.86% (p=0.000 n=50)
memmove/18 3.572n ± 0% 3.313n ± 0% -7.25% (p=0.000 n=50)
memmove/19 3.572n ± 0% 3.325n ± 0% -6.94% (p=0.000 n=50)
memmove/20 3.575n ± 0% 3.319n ± 0% -7.15% (p=0.000 n=50)
memmove/21 3.578n ± 0% 3.327n ± 0% -7.03% (p=0.000 n=50)
memmove/22 3.581n ± 0% 3.330n ± 0% -7.01% (p=0.000 n=50)
memmove/23 3.582n ± 0% 3.354n ± 1% -6.37% (p=0.000 n=50)
memmove/24 3.587n ± 0% 3.347n ± 1% -6.71% (p=0.000 n=50)
memmove/25 3.591n ± 0% 3.320n ± 0% -7.55% (p=0.000 n=50)
memmove/26 3.593n ± 0% 3.348n ± 0% -6.82% (p=0.000 n=50)
memmove/27 3.596n ± 0% 3.346n ± 0% -6.94% (p=0.000 n=50)
memmove/28 3.597n ± 0% 3.357n ± 0% -6.67% (p=0.000 n=50)
memmove/29 3.601n ± 0% 3.340n ± 0% -7.23% (p=0.000 n=50)
memmove/30 3.602n ± 0% 3.345n ± 0% -7.12% (p=0.000 n=50)
memmove/31 3.608n ± 0% 3.357n ± 0% -6.94% (p=0.000 n=50)
memmove/32 3.605n ± 0% 3.352n ± 0% -7.01% (p=0.000 n=50)
memmove/33 4.128n ± 1% 3.829n ± 0% -7.23% (p=0.000 n=50)
memmove/34 4.149n ± 0% 3.836n ± 0% -7.54% (p=0.000 n=50)
memmove/35 4.134n ± 0% 3.839n ± 0% -7.15% (n=50)
memmove/36 4.151n ± 0% 3.842n ± 0% -7.45% (n=50)
memmove/37 4.152n ± 0% 3.841n ± 0% -7.49% (p=0.000 n=50)
memmove/38 4.159n ± 0% 3.844n ± 0% -7.58% (p=0.000 n=50)
memmove/39 4.165n ± 0% 3.841n ± 0% -7.78% (p=0.000 n=50)
memmove/40 4.162n ± 0% 3.837n ± 0% -7.81% (p=0.000 n=50)
memmove/41 4.161n ± 0% 3.845n ± 0% -7.58% (p=0.000 n=50)
memmove/42 4.164n ± 0% 3.851n ± 0% -7.53% (p=0.000 n=50)
memmove/43 4.165n ± 0% 3.843n ± 0% -7.74% (p=0.000 n=50)
memmove/44 4.175n ± 0% 3.847n ± 0% -7.83% (p=0.000 n=50)
memmove/45 4.170n ± 0% 3.849n ± 0% -7.70% (p=0.000 n=50)
memmove/46 4.175n ± 0% 3.850n ± 0% -7.79% (p=0.000 n=50)
memmove/47 4.180n ± 0% 3.851n ± 0% -7.87% (p=0.000 n=50)
memmove/48 4.178n ± 0% 3.852n ± 0% -7.81% (p=0.000 n=50)
memmove/49 4.175n ± 0% 3.851n ± 0% -7.76% (n=50)
memmove/50 4.178n ± 0% 3.855n ± 0% -7.73% (p=0.000 n=50)
memmove/51 4.190n ± 0% 3.859n ± 0% -7.91% (p=0.000 n=50)
memmove/52 4.188n ± 0% 3.859n ± 0% -7.84% (p=0.000 n=50)
memmove/53 4.191n ± 0% 3.863n ± 0% -7.82% (p=0.000 n=50)
memmove/54 4.192n ± 0% 3.860n ± 0% -7.91% (p=0.000 n=50)
memmove/55 4.192n ± 0% 3.869n ± 0% -7.70% (p=0.000 n=50)
memmove/56 4.204n ± 0% 3.866n ± 0% -8.05% (p=0.000 n=50)
memmove/57 4.198n ± 0% 3.864n ± 0% -7.95% (p=0.000 n=50)
memmove/58 4.202n ± 0% 3.865n ± 0% -8.02% (p=0.000 n=50)
memmove/59 4.208n ± 0% 3.868n ± 0% -8.09% (p=0.000 n=50)
memmove/60 4.205n ± 0% 3.873n ± 0% -7.89% (p=0.000 n=50)
memmove/61 4.212n ± 0% 3.872n ± 0% -8.08% (p=0.000 n=50)
memmove/62 4.214n ± 0% 3.870n ± 0% -8.16% (p=0.000 n=50)
memmove/63 4.215n ± 0% 3.877n ± 0% -8.02% (p=0.000 n=50)
memmove/64 4.217n ± 0% 3.881n ± 0% -7.99% (p=0.000 n=50)
memmove/65 4.990n ± 0% 4.683n ± 0% -6.15% (p=0.000 n=50)
memmove/66 5.022n ± 0% 4.719n ± 0% -6.03% (p=0.000 n=50)
memmove/67 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50)
memmove/68 5.035n ± 0% 4.724n ± 0% -6.18% (p=0.000 n=50)
memmove/69 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50)
memmove/70 5.040n ± 0% 4.728n ± 0% -6.19% (p=0.000 n=50)
memmove/71 5.053n ± 0% 4.728n ± 0% -6.43% (p=0.000 n=50)
memmove/72 5.050n ± 0% 4.732n ± 0% -6.29% (p=0.000 n=50)
memmove/73 5.049n ± 0% 4.733n ± 0% -6.24% (p=0.000 n=50)
memmove/74 5.054n ± 0% 4.734n ± 0% -6.34% (p=0.000 n=50)
memmove/75 5.063n ± 0% 4.736n ± 0% -6.46% (p=0.000 n=50)
memmove/76 5.046n ± 0% 4.741n ± 0% -6.04% (p=0.000 n=50)
memmove/77 5.057n ± 0% 4.741n ± 0% -6.25% (p=0.000 n=50)
memmove/78 5.077n ± 0% 4.739n ± 0% -6.65% (p=0.000 n=50)
memmove/79 5.074n ± 0% 4.746n ± 0% -6.46% (p=0.000 n=50)
memmove/80 5.085n ± 0% 4.747n ± 0% -6.65% (p=0.000 n=50)
memmove/81 5.077n ± 0% 4.735n ± 0% -6.74% (p=0.000 n=50)
memmove/82 5.087n ± 0% 4.747n ± 0% -6.68% (p=0.000 n=50)
memmove/83 5.087n ± 0% 4.754n ± 0% -6.56% (p=0.000 n=50)
memmove/84 5.096n ± 0% 4.753n ± 0% -6.73% (p=0.000 n=50)
memmove/85 5.082n ± 0% 4.749n ± 0% -6.55% (p=0.000 n=50)
memmove/86 5.103n ± 0% 4.752n ± 0% -6.87% (p=0.000 n=50)
memmove/87 5.096n ± 0% 4.760n ± 0% -6.61% (p=0.000 n=50)
memmove/88 5.099n ± 0% 4.765n ± 0% -6.55% (p=0.000 n=50)
memmove/89 5.104n ± 0% 4.757n ± 0% -6.79% (p=0.000 n=50)
memmove/90 5.117n ± 0% 4.767n ± 0% -6.84% (p=0.000 n=50)
memmove/91 5.100n ± 0% 4.766n ± 0% -6.54% (p=0.000 n=50)
memmove/92 5.103n ± 0% 4.763n ± 0% -6.67% (p=0.000 n=50)
memmove/93 5.115n ± 0% 4.772n ± 0% -6.71% (p=0.000 n=50)
memmove/94 5.117n ± 0% 4.769n ± 0% -6.80% (p=0.000 n=50)
memmove/95 5.131n ± 0% 4.775n ± 0% -6.94% (p=0.000 n=50)
memmove/96 5.129n ± 0% 4.772n ± 0% -6.97% (p=0.000 n=50)
memmove/97 5.130n ± 0% 4.764n ± 0% -7.13% (p=0.000 n=50)
memmove/98 5.134n ± 0% 4.780n ± 0% -6.89% (p=0.000 n=50)
memmove/99 5.141n ± 0% 4.780n ± 0% -7.03% (p=0.000 n=50)
memmove/100 5.141n ± 0% 4.780n ± 0% -7.02% (p=0.000 n=50)
memmove/101 5.150n ± 0% 4.782n ± 0% -7.14% (p=0.000 n=50)
memmove/102 5.150n ± 0% 4.790n ± 0% -6.99% (p=0.000 n=50)
memmove/103 5.156n ± 0% 4.788n ± 0% -7.14% (n=50)
memmove/104 5.157n ± 0% 4.793n ± 0% -7.05% (p=0.000 n=50)
memmove/105 5.147n ± 0% 4.791n ± 0% -6.90% (p=0.000 n=50)
memmove/106 5.167n ± 0% 4.793n ± 0% -7.23% (p=0.000 n=50)
memmove/107 5.165n ± 0% 4.801n ± 0% -7.06% (p=0.000 n=50)
memmove/108 5.173n ± 0% 4.800n ± 0% -7.21% (p=0.000 n=50)
memmove/109 5.173n ± 0% 4.797n ± 0% -7.27% (p=0.000 n=50)
memmove/110 5.171n ± 0% 4.808n ± 0% -7.01% (p=0.000 n=50)
memmove/111 5.180n ± 0% 4.799n ± 0% -7.36% (p=0.000 n=50)
memmove/112 5.185n ± 0% 4.812n ± 0% -7.19% (p=0.000 n=50)
memmove/113 5.187n ± 0% 4.797n ± 0% -7.53% (p=0.000 n=50)
memmove/114 5.183n ± 0% 4.809n ± 0% -7.21% (n=50)
memmove/115 5.193n ± 0% 4.811n ± 0% -7.36% (p=0.000 n=50)
memmove/116 5.196n ± 0% 4.815n ± 0% -7.32% (p=0.000 n=50)
memmove/117 5.199n ± 0% 4.816n ± 0% -7.37% (p=0.000 n=50)
memmove/118 5.198n ± 0% 4.811n ± 0% -7.45% (p=0.000 n=50)
memmove/119 5.203n ± 0% 4.818n ± 0% -7.40% (p=0.000 n=50)
memmove/120 5.195n ± 0% 4.823n ± 0% -7.16% (p=0.000 n=50)
memmove/121 5.203n ± 0% 4.812n ± 0% -7.51% (p=0.000 n=50)
memmove/122 5.204n ± 0% 4.818n ± 0% -7.42% (n=50)
memmove/123 5.202n ± 0% 4.822n ± 0% -7.31% (p=0.000 n=50)
memmove/124 5.216n ± 0% 4.823n ± 0% -7.54% (p=0.000 n=50)
memmove/125 5.227n ± 0% 4.823n ± 0% -7.72% (p=0.000 n=50)
memmove/126 5.235n ± 0% 4.830n ± 0% -7.74% (p=0.000 n=50)
memmove/127 5.237n ± 0% 4.833n ± 0% -7.72% (p=0.000 n=50)
memmove/128 5.241n ± 0% 4.832n ± 0% -7.81% (p=0.000 n=50)
memmove/129 6.460n ± 0% 5.858n ± 0% -9.31% (p=0.000 n=50)
memmove/130 7.539n ± 0% 6.634n ± 0% -12.00% (p=0.000 n=50)
memmove/131 7.542n ± 0% 6.623n ± 0% -12.18% (p=0.000 n=50)
memmove/132 7.527n ± 0% 6.667n ± 1% -11.43% (p=0.000 n=50)
memmove/133 7.521n ± 0% 6.631n ± 0% -11.83% (p=0.000 n=50)
memmove/134 7.531n ± 0% 6.642n ± 0% -11.81% (p=0.000 n=50)
memmove/135 7.541n ± 0% 6.692n ± 1% -11.25% (p=0.000 n=50)
memmove/136 7.549n ± 0% 6.657n ± 0% -11.81% (p=0.000 n=50)
memmove/137 7.544n ± 0% 6.646n ± 0% -11.90% (p=0.000 n=50)
memmove/138 7.557n ± 0% 6.673n ± 1% -11.70% (p=0.000 n=50)
memmove/139 7.545n ± 0% 6.654n ± 0% -11.81% (n=50)
memmove/140 7.559n ± 0% 6.680n ± 1% -11.63% (p=0.000 n=50)
memmove/141 7.560n ± 0% 6.664n ± 0% -11.85% (p=0.000 n=50)
memmove/142 7.556n ± 0% 6.679n ± 0% -11.62% (p=0.000 n=50)
memmove/143 7.570n ± 0% 6.683n ± 1% -11.71% (p=0.000 n=50)
memmove/144 7.586n ± 0% 6.683n ± 0% -11.91% (p=0.000 n=50)
memmove/145 7.593n ± 0% 6.665n ± 0% -12.22% (p=0.000 n=50)
memmove/146 7.591n ± 0% 6.665n ± 0% -12.20% (p=0.000 n=50)
memmove/147 7.598n ± 0% 6.665n ± 0% -12.27% (p=0.000 n=50)
memmove/148 7.598n ± 0% 6.670n ± 0% -12.21% (p=0.000 n=50)
memmove/149 7.593n ± 0% 6.691n ± 0% -11.88% (p=0.000 n=50)
memmove/150 7.625n ± 0% 6.713n ± 1% -11.97% (p=0.000 n=50)
memmove/151 7.603n ± 0% 6.710n ± 1% -11.74% (p=0.000 n=50)
memmove/152 7.613n ± 0% 6.701n ± 1% -11.97% (p=0.000 n=50)
memmove/153 7.595n ± 0% 6.710n ± 0% -11.65% (p=0.000 n=50)
memmove/154 7.614n ± 0% 6.721n ± 0% -11.74% (p=0.000 n=50)
memmove/155 7.615n ± 0% 6.709n ± 0% -11.89% (p=0.000 n=50)
memmove/156 7.613n ± 0% 6.693n ± 0% -12.08% (p=0.000 n=50)
memmove/157 7.628n ± 0% 6.708n ± 0% -12.05% (p=0.000 n=50)
memmove/158 7.629n ± 0% 6.706n ± 0% -12.10% (p=0.000 n=50)
memmove/159 7.639n ± 0% 6.724n ± 0% -11.98% (p=0.000 n=50)
memmove/160 7.619n ± 0% 6.702n ± 0% -12.04% (p=0.000 n=50)
memmove/161 7.653n ± 0% 6.698n ± 0% -12.49% (p=0.000 n=50)
memmove/162 8.104n ± 0% 7.140n ± 1% -11.89% (p=0.000 n=50)
memmove/163 8.141n ± 0% 7.187n ± 1% -11.72% (p=0.000 n=50)
memmove/164 8.154n ± 0% 7.107n ± 0% -12.84% (p=0.000 n=50)
memmove/165 8.143n ± 0% 7.117n ± 0% -12.59% (p=0.000 n=50)
memmove/166 8.176n ± 0% 7.110n ± 0% -13.04% (p=0.000 n=50)
memmove/167 8.194n ± 0% 7.168n ± 1% -12.52% (p=0.000 n=50)
memmove/168 8.214n ± 0% 7.188n ± 1% -12.50% (p=0.000 n=50)
memmove/169 8.220n ± 0% 7.242n ± 1% -11.90% (p=0.000 n=50)
memmove/170 8.228n ± 0% 7.244n ± 1% -11.96% (p=0.000 n=50)
memmove/171 8.263n ± 0% 7.184n ± 0% -13.06% (p=0.000 n=50)
memmove/172 8.259n ± 0% 7.325n ± 1% -11.31% (p=0.000 n=50)
memmove/173 8.271n ± 0% 7.225n ± 0% -12.65% (p=0.000 n=50)
memmove/174 8.284n ± 0% 7.287n ± 1% -12.04% (p=0.000 n=50)
memmove/175 8.289n ± 0% 7.282n ± 1% -12.15% (p=0.000 n=50)
memmove/176 8.309n ± 0% 7.328n ± 1% -11.81% (p=0.000 n=50)
memmove/177 8.317n ± 0% 7.264n ± 1% -12.67% (p=0.000 n=50)
memmove/178 8.302n ± 0% 7.342n ± 1% -11.57% (p=0.000 n=50)
memmove/179 8.309n ± 0% 7.357n ± 1% -11.45% (p=0.000 n=50)
memmove/180 8.304n ± 0% 7.318n ± 1% -11.87% (p=0.000 n=50)
memmove/181 8.312n ± 0% 7.363n ± 1% -11.42% (p=0.000 n=50)
memmove/182 8.315n ± 0% 7.320n ± 1% -11.96% (p=0.000 n=50)
memmove/183 8.330n ± 0% 7.286n ± 1% -12.53% (p=0.000 n=50)
memmove/184 8.310n ± 0% 7.324n ± 1% -11.86% (p=0.000 n=50)
memmove/185 8.303n ± 0% 7.267n ± 1% -12.47% (p=0.000 n=50)
memmove/186 8.287n ± 0% 7.312n ± 1% -11.76% (p=0.000 n=50)
memmove/187 8.298n ± 0% 7.395n ± 2% -10.88% (p=0.000 n=50)
memmove/188 8.296n ± 0% 7.339n ± 1% -11.54% (p=0.000 n=50)
memmove/189 8.306n ± 0% 7.299n ± 1% -12.12% (p=0.000 n=50)
memmove/190 8.281n ± 0% 7.309n ± 1% -11.74% (p=0.000 n=50)
memmove/191 8.299n ± 0% 7.282n ± 1% -12.26% (p=0.000 n=50)
memmove/192 8.281n ± 0% 7.335n ± 1% -11.41% (p=0.000 n=50)
memmove/193 8.299n ± 0% 7.325n ± 1% -11.74% (p=0.000 n=50)
memmove/194 8.641n ± 0% 8.034n ± 0% -7.02% (p=0.000 n=50)
memmove/195 8.667n ± 0% 8.073n ± 0% -6.85% (p=0.000 n=50)
memmove/196 8.666n ± 0% 8.030n ± 0% -7.34% (p=0.000 n=50)
memmove/197 8.660n ± 0% 8.096n ± 1% -6.51% (p=0.000 n=50)
memmove/198 8.688n ± 0% 8.047n ± 0% -7.39% (p=0.000 n=50)
memmove/199 8.678n ± 0% 8.061n ± 0% -7.11% (p=0.000 n=50)
memmove/200 8.669n ± 0% 8.034n ± 0% -7.32% (p=0.000 n=50)
memmove/201 8.692n ± 0% 8.061n ± 0% -7.26% (p=0.000 n=50)
memmove/202 8.668n ± 0% 8.060n ± 0% -7.02% (p=0.000 n=50)
memmove/203 8.687n ± 0% 8.066n ± 0% -7.15% (p=0.000 n=50)
memmove/204 8.699n ± 0% 8.076n ± 0% -7.16% (p=0.000 n=50)
memmove/205 8.676n ± 0% 8.085n ± 0% -6.82% (p=0.000 n=50)
memmove/206 8.684n ± 0% 8.101n ± 1% -6.71% (p=0.000 n=50)
memmove/207 8.725n ± 0% 8.099n ± 0% -7.18% (p=0.000 n=50)
memmove/208 8.674n ± 0% 8.073n ± 0% -6.92% (p=0.000 n=50)
memmove/209 8.697n ± 0% 8.088n ± 0% -7.01% (p=0.000 n=50)
memmove/210 8.733n ± 0% 8.076n ± 0% -7.53% (p=0.000 n=50)
memmove/211 8.732n ± 0% 8.104n ± 0% -7.19% (p=0.000 n=50)
memmove/212 8.730n ± 0% 8.091n ± 0% -7.32% (p=0.000 n=50)
memmove/213 8.728n ± 0% 8.100n ± 0% -7.19% (p=0.000 n=50)
memmove/214 8.744n ± 1% 8.081n ± 1% -7.57% (p=0.000 n=50)
memmove/215 8.734n ± 0% 8.150n ± 0% -6.68% (p=0.000 n=50)
memmove/216 8.748n ± 0% 8.116n ± 0% -7.23% (p=0.000 n=50)
memmove/217 8.751n ± 0% 8.129n ± 1% -7.11% (p=0.000 n=50)
memmove/218 8.747n ± 0% 8.114n ± 0% -7.23% (p=0.000 n=50)
memmove/219 8.733n ± 0% 8.159n ± 0% -6.57% (p=0.000 n=50)
memmove/220 8.764n ± 0% 8.145n ± 0% -7.06% (p=0.000 n=50)
memmove/221 8.764n ± 0% 8.142n ± 0% -7.10% (p=0.000 n=50)
memmove/222 8.775n ± 0% 8.152n ± 0% -7.10% (p=0.000 n=50)
memmove/223 8.771n ± 0% 8.143n ± 0% -7.16% (p=0.000 n=50)
memmove/224 8.778n ± 0% 8.175n ± 1% -6.87% (p=0.000 n=50)
memmove/225 8.794n ± 0% 8.138n ± 0% -7.45% (p=0.000 n=50)
memmove/226 10.13n ± 0% 10.06n ± 0% -0.71% (p=0.000 n=50)
memmove/227 10.14n ± 0% 10.08n ± 0% -0.53% (p=0.000 n=50)
memmove/228 10.13n ± 0% 10.08n ± 0% -0.56% (p=0.000 n=50)
memmove/229 10.17n ± 0% 10.11n ± 0% -0.56% (p=0.000 n=50)
memmove/230 10.17n ± 0% 10.13n ± 0% -0.38% (p=0.003 n=50)
memmove/231 10.16n ± 0% 10.12n ± 0% -0.41% (p=0.001 n=50)
memmove/232 10.19n ± 0% 10.12n ± 0% -0.67% (p=0.000 n=50)
memmove/233 10.21n ± 0% 10.14n ± 0% -0.71% (p=0.000 n=50)
memmove/234 10.24n ± 0% 10.16n ± 0% -0.79% (p=0.000 n=50)
memmove/235 10.24n ± 0% 10.16n ± 0% -0.76% (p=0.000 n=50)
memmove/236 10.25n ± 0% 10.16n ± 0% -0.81% (p=0.000 n=50)
memmove/237 10.24n ± 0% 10.17n ± 0% -0.69% (p=0.000 n=50)
memmove/238 10.27n ± 0% 10.19n ± 0% -0.79% (p=0.000 n=50)
memmove/239 10.29n ± 0% 10.19n ± 0% -0.90% (p=0.000 n=50)
memmove/240 10.30n ± 0% 10.20n ± 0% -0.95% (p=0.000 n=50)
memmove/241 10.29n ± 0% 10.20n ± 0% -0.91% (p=0.000 n=50)
memmove/242 10.30n ± 0% 10.22n ± 0% -0.80% (p=0.000 n=50)
memmove/243 10.32n ± 0% 10.23n ± 0% -0.87% (p=0.000 n=50)
memmove/244 10.32n ± 0% 10.24n ± 0% -0.74% (p=0.000 n=50)
memmove/245 10.33n ± 0% 10.23n ± 0% -0.97% (p=0.000 n=50)
memmove/246 10.33n ± 0% 10.24n ± 0% -0.92% (p=0.000 n=50)
memmove/247 10.31n ± 0% 10.24n ± 0% -0.69% (p=0.000 n=50)
memmove/248 10.32n ± 0% 10.26n ± 0% -0.55% (p=0.000 n=50)
memmove/249 10.33n ± 0% 10.28n ± 0% -0.52% (p=0.000 n=50)
memmove/250 10.34n ± 0% 10.27n ± 0% -0.66% (p=0.000 n=50)
memmove/251 10.32n ± 0% 10.27n ± 0% -0.45% (p=0.000 n=50)
memmove/252 10.34n ± 0% 10.30n ± 0% -0.39% (p=0.005 n=50)
memmove/253 10.33n ± 0% 10.27n ± 0% -0.57% (p=0.000 n=50)
memmove/254 10.33n ± 0% 10.27n ± 0% -0.54% (p=0.000 n=50)
memmove/255 10.34n ± 0% 10.29n ± 0% -0.50% (p=0.002 n=50)
memmove/256 10.36n ± 0% 10.31n ± 0% -0.44% (p=0.006 n=50)
memmove/257 10.33n ± 0% 10.29n ± 0% -0.36% (p=0.004 n=50)
geomean 6.142n 5.696n -7.26%
```
|
|
Summary:
The GPU build is special in that we know an up-to-date `clang` will
always be the compiler. This lets us rely directly on builtins and push
much of this complexity into the backend. Backend implementations are
favored on the GPU because they allow far more target-specific
optimizations. This patch switches the common memory functions to
the builtin versions when building for AMDGPU or NVPTX.
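The dispatch described above can be sketched as follows. This assumes that detection relies on clang's predefined `__AMDGPU__` and `__NVPTX__` macros. The helper name `sketch_memcpy` is hypothetical, and the generic byte loop stands in for whatever fallback the library actually uses on other targets.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the LLVM libc entry point.
inline void sketch_memcpy(void *dst, const void *src, std::size_t count) {
#if defined(__AMDGPU__) || defined(__NVPTX__)
  // GPU targets: clang is always the compiler, so hand the copy to the
  // builtin and let the backend choose a target-specific expansion.
  __builtin_memcpy(dst, src, count);
#else
  // Other targets: plain byte-by-byte copy as a generic fallback.
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < count; ++i)
    d[i] = s[i];
#endif
}
```

This split is only safe on targets where the backend expands the builtin inline. On a host libc, `__builtin_memcpy` can be lowered to a call to `memcpy` itself, which is one reason a dedicated generic implementation is still needed there.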
|
|
This is step 4 of
https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
|
|
prevent file copy/paste issues. (#66477)
|
|
|
|
This was generated using clang-tidy and clang-apply-replacements,
on src/string/*.cpp for just the llvmlibc-inline-function-decl
check, after applying https://reviews.llvm.org/D157164, and then
some manual fixup.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D157169
|
|
This patch mostly renames files so that their names better reflect the functions they declare.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D155607
|