| Age | Commit message (Collapse) | Author |
|
|
|
<stdint.h> includes. (#150303)
https://github.com/llvm/llvm-project/issues/149993
|
|
This implementation has been compiled with the [pigweed toolchain](https://pigweed.dev/toolchain.html) and tested on:
- Raspberry Pi Pico 2 with the following options\
`--target=armv8m.main-none-eabi`
`-march=armv8m.main+fp+dsp`
`-mcpu=cortex-m33`
- Raspberry Pi Pico with the following options\
`--target=armv6m-none-eabi`
`-march=armv6m`
`-mcpu=cortex-m0+`
Both builds compile down to slightly more than 200 bytes and are between 2 and 10 times faster than byte-by-byte copies.
For best performance, the following options can be set in `libc/config/baremetal/arm/config.json`:
```
{
  "codegen": {
    "LIBC_CONF_KEEP_FRAME_POINTER": {
      "value": false
    }
  },
  "general": {
    "LIBC_ADD_NULL_CHECKS": {
      "value": false
    }
  }
}
```
|
|
Relates to
https://github.com/llvm/llvm-project/issues/119281#issuecomment-2699470459
|
|
This reverts commit 1e6e845d49a336e9da7ca6c576ec45c0b419b5f6 because it
changed the 1st parameter of adjust() to be unsigned, but libc itself
calls adjust() with a negative argument in align_backward() in
op_generic.h.
|
|
Relates to: #119281
|
|
Fixed imports for all files *within* `libc/src/string/memory_utils`.
Note: This doesn't include **all** files that need to be fixed.
Fixes #86579
|
|
This prevents a conflict with the Linux system endian.h when built in
overlay mode for CPP files in __support.
This issue appeared in PR #106259.
|
|
This is a part of #97655.
|
|
declaration" (#98593)
Reverts llvm/llvm-project#98075
bots are broken
|
|
This is a part of #97655.
|
|
Needed to support i386 (#93709).
|
|
Fixes #86546 and removes the macro `LIBC_HAS_BUILTIN`. The macro was
only needed to support older compilers that lacked `__has_builtin`.
All of the compilers we support already provide it.
See: https://libc.llvm.org/compiler_support.html
All uses now call `__has_builtin` directly.
cc @nickdesaulniers
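As a minimal sketch of what calling `__has_builtin` directly can look like: the helper name `count_trailing_zeros` and the choice of `__builtin_ctz` are illustrative assumptions, not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only; not part of LLVM libc.
// Returns the number of trailing zero bits of `value`; `value` must be non-zero.
inline int count_trailing_zeros(std::uint32_t value) {
#if __has_builtin(__builtin_ctz)
  // Use the compiler builtin when the compiler reports it as available.
  return __builtin_ctz(value);
#else
  // Portable fallback: shift right until the lowest set bit is reached.
  int count = 0;
  while ((value & 1u) == 0) {
    value >>= 1;
    ++count;
  }
  return count;
#endif
}
```

This only preprocesses cleanly on compilers that define `__has_builtin`, which is the assumption the commit message states for all supported compilers.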
|
|
Fixes:
libc/src/string/memory_utils/utils.h:345:13: warning: invalid case style
for member 'offset_' [readability-identifier-naming]
Having a trailing underscore for members is a google3 style, not LLVM style.
Removing the underscore is insufficient, as we would then have 2 members with
the same identifier which is not allowed (it is a compile time error). Remove
the getter, and just access the renamed member that's now made public.
|
|
Found via:
$ ninja -k2000 libc-lint 2>&1 | grep readability-identifier-naming
Auto fixed via:
$ clang-tidy -p build/compile_commands.json \
-checks="-*,readability-identifier-naming" \
<filename> --fix
This doesn't fix all instances, just the obvious simple cases where it makes
sense to change the identifier names. Subsequent PRs will fix up the
stragglers.
|
|
Codify that we use lower_case for
readability-identifier-naming.ConstexprFunctionCase and then fix the 11
violations (rather than codify UPPER_CASE and have to fix the 170 violations).
|
|
|
|
instead (#73939) (#74446)
Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h`
that was still using `deferred_static_assert`.
|
|
instead" (#74444)
Reverts llvm/llvm-project#73939
This broke libc-aarch64-ubuntu build bot
https://lab.llvm.org/buildbot/#/builders/138/builds/56186
|
|
|
|
The [standard](https://eel.is/c++draft/expr.add#4.3) forbids forming
pointers to invalid objects even if the pointer is never read from or
written to. This patch makes sure that we don't do pointer arithmetic on
invalid pointers.
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
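To illustrate the rule with a sketch that is not taken from the patch: a strided loop written as `p += stride; p < end` can move `p` more than one element past the end of the array, which is already undefined even if `p` is never dereferenced. Tracking an integer offset avoids forming such a pointer. The function name `sum_strided` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example, for illustration only.
// Sums data[0], data[stride], data[2 * stride], ... without ever forming
// a pointer beyond the one-past-the-end position of `data`.
inline unsigned sum_strided(const unsigned char* data, std::size_t size,
                            std::size_t stride) {
  assert(stride > 0 && "a zero stride would never make progress");
  unsigned total = 0;
  std::size_t offset = 0;
  while (offset < size) {
    total += data[offset];
    // Stop before `offset + stride` could reach or pass `size`; comparing the
    // remaining length also avoids overflow in the addition.
    if (size - offset <= stride)
      break;
    offset += stride;
  }
  return total;
}
```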
|
|
Software prefetching helps recover performance when hardware prefetching
is disabled. The 'LIBC_COPT_MEMSET_X86_USE_SOFTWARE_PREFETCHING'
compile-time option enables it.
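The general idea can be sketched as follows. This is not the LLVM libc implementation: the function name, the 64-byte block size, and the prefetch distance are all assumptions chosen for illustration, and `__builtin_prefetch` is a GCC/Clang builtin.

```cpp
#include <cstddef>
#include <cstring>

// Illustrative only; not the LLVM libc implementation.
// Fills `size` bytes at `dst` with `value`, one block at a time, and asks the
// CPU to fetch a later block for writing before the loop reaches it.
inline void memset_with_prefetch(unsigned char* dst, unsigned char value,
                                 std::size_t size) {
  constexpr std::size_t kBlock = 64;             // assumed cache line size
  constexpr std::size_t kDistance = 4 * kBlock;  // assumed prefetch distance
  std::size_t offset = 0;
  while (size - offset >= kBlock) {
    // Only prefetch addresses that still lie inside the buffer.
    if (size - offset > kDistance)
      __builtin_prefetch(dst + offset + kDistance, /*rw=*/1, /*locality=*/3);
    std::memset(dst + offset, value, kBlock);
    offset += kBlock;
  }
  // Handle the tail that is shorter than one block.
  std::memset(dst + offset, value, size - offset);
}
```

Whether this helps depends on the hardware and on the prefetch distance, so any real tuning would need measurements on the target machine.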
|
|
Use a check that requires fewer instructions and is cheaper.
Current code:
```
1b704: 48 39 f7 cmp %rsi,%rdi
1b707: 48 89 f0 mov %rsi,%rax
1b70a: 48 0f 47 c7 cmova %rdi,%rax
1b70e: 48 89 f9 mov %rdi,%rcx
1b711: 48 0f 47 ce cmova %rsi,%rcx
1b715: 48 01 d1 add %rdx,%rcx
1b718: 48 39 c1 cmp %rax,%rcx
```
New code:
```
1b704: 48 89 f8 mov %rdi,%rax
1b707: 48 29 f0 sub %rsi,%rax
1b70a: 48 89 c1 mov %rax,%rcx
1b70d: 48 f7 d9 neg %rcx
1b710: 48 0f 48 c8 cmovs %rax,%rcx
1b714: 48 39 d1 cmp %rdx,%rcx
```
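Read as C++, the two instruction sequences above appear to correspond to the two predicates sketched here. The function names are hypothetical and the mapping is inferred from the disassembly, not copied from the source.

```cpp
#include <cstddef>
#include <cstdint>

// Both helpers are hypothetical and only mirror the instruction sequences.

// Older form: order the two addresses, then test `low + size <= high`.
inline bool is_disjoint_minmax(const void* a, const void* b, std::size_t size) {
  const auto pa = reinterpret_cast<std::uintptr_t>(a);
  const auto pb = reinterpret_cast<std::uintptr_t>(b);
  const std::uintptr_t low = pa < pb ? pa : pb;
  const std::uintptr_t high = pa < pb ? pb : pa;
  return low + size <= high;
}

// Newer form: test `|a - b| >= size` using a single subtraction.
inline bool is_disjoint_absdiff(const void* a, const void* b, std::size_t size) {
  const auto pa = reinterpret_cast<std::uintptr_t>(a);
  const auto pb = reinterpret_cast<std::uintptr_t>(b);
  const std::uintptr_t diff = pa - pb;  // wraps modulo 2^N when pa < pb
  const bool negative = static_cast<std::intptr_t>(diff) < 0;
  const std::uintptr_t distance = negative ? std::uintptr_t{0} - diff : diff;
  return distance >= size;
}
```

For buffers that fit in the address space, both predicates give the same answer; the second needs one subtraction and one conditional move instead of two conditional moves and an addition.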
```
│ baseline │ disjoint │
│ sec/op │ sec/op vs base │
memmove/Google_A 3.910n ± 0% 3.861n ± 1% -1.26% (p=0.000 n=50)
```
```
│ baseline │ disjoint │
│ sec/op │ sec/op vs base │
memmove/1 2.724n ± 3% 2.441n ± 0% -10.37% (n=50)
memmove/2 2.878n ± 0% 2.713n ± 0% -5.73% (n=50)
memmove/3 2.835n ± 0% 2.593n ± 0% -8.54% (n=50)
memmove/4 3.032n ± 0% 2.776n ± 0% -8.45% (p=0.000 n=50)
memmove/5 2.833n ± 0% 2.600n ± 0% -8.20% (p=0.000 n=50)
memmove/6 2.758n ± 0% 2.744n ± 0% -0.52% (p=0.000 n=50)
memmove/7 2.762n ± 0% 2.744n ± 0% -0.63% (p=0.000 n=50)
memmove/8 2.763n ± 0% 2.750n ± 0% -0.46% (p=0.000 n=50)
memmove/9 3.182n ± 0% 3.269n ± 0% +2.75% (p=0.000 n=50)
memmove/10 3.185n ± 0% 3.270n ± 0% +2.64% (p=0.000 n=50)
memmove/11 3.188n ± 0% 3.277n ± 0% +2.79% (p=0.000 n=50)
memmove/12 3.190n ± 0% 3.279n ± 0% +2.82% (p=0.000 n=50)
memmove/13 3.194n ± 0% 3.281n ± 0% +2.73% (p=0.000 n=50)
memmove/14 3.197n ± 0% 3.285n ± 0% +2.77% (p=0.000 n=50)
memmove/15 3.198n ± 0% 3.282n ± 0% +2.62% (p=0.000 n=50)
memmove/16 3.201n ± 0% 3.284n ± 0% +2.61% (p=0.000 n=50)
memmove/17 3.564n ± 0% 3.320n ± 0% -6.86% (p=0.000 n=50)
memmove/18 3.572n ± 0% 3.313n ± 0% -7.25% (p=0.000 n=50)
memmove/19 3.572n ± 0% 3.325n ± 0% -6.94% (p=0.000 n=50)
memmove/20 3.575n ± 0% 3.319n ± 0% -7.15% (p=0.000 n=50)
memmove/21 3.578n ± 0% 3.327n ± 0% -7.03% (p=0.000 n=50)
memmove/22 3.581n ± 0% 3.330n ± 0% -7.01% (p=0.000 n=50)
memmove/23 3.582n ± 0% 3.354n ± 1% -6.37% (p=0.000 n=50)
memmove/24 3.587n ± 0% 3.347n ± 1% -6.71% (p=0.000 n=50)
memmove/25 3.591n ± 0% 3.320n ± 0% -7.55% (p=0.000 n=50)
memmove/26 3.593n ± 0% 3.348n ± 0% -6.82% (p=0.000 n=50)
memmove/27 3.596n ± 0% 3.346n ± 0% -6.94% (p=0.000 n=50)
memmove/28 3.597n ± 0% 3.357n ± 0% -6.67% (p=0.000 n=50)
memmove/29 3.601n ± 0% 3.340n ± 0% -7.23% (p=0.000 n=50)
memmove/30 3.602n ± 0% 3.345n ± 0% -7.12% (p=0.000 n=50)
memmove/31 3.608n ± 0% 3.357n ± 0% -6.94% (p=0.000 n=50)
memmove/32 3.605n ± 0% 3.352n ± 0% -7.01% (p=0.000 n=50)
memmove/33 4.128n ± 1% 3.829n ± 0% -7.23% (p=0.000 n=50)
memmove/34 4.149n ± 0% 3.836n ± 0% -7.54% (p=0.000 n=50)
memmove/35 4.134n ± 0% 3.839n ± 0% -7.15% (n=50)
memmove/36 4.151n ± 0% 3.842n ± 0% -7.45% (n=50)
memmove/37 4.152n ± 0% 3.841n ± 0% -7.49% (p=0.000 n=50)
memmove/38 4.159n ± 0% 3.844n ± 0% -7.58% (p=0.000 n=50)
memmove/39 4.165n ± 0% 3.841n ± 0% -7.78% (p=0.000 n=50)
memmove/40 4.162n ± 0% 3.837n ± 0% -7.81% (p=0.000 n=50)
memmove/41 4.161n ± 0% 3.845n ± 0% -7.58% (p=0.000 n=50)
memmove/42 4.164n ± 0% 3.851n ± 0% -7.53% (p=0.000 n=50)
memmove/43 4.165n ± 0% 3.843n ± 0% -7.74% (p=0.000 n=50)
memmove/44 4.175n ± 0% 3.847n ± 0% -7.83% (p=0.000 n=50)
memmove/45 4.170n ± 0% 3.849n ± 0% -7.70% (p=0.000 n=50)
memmove/46 4.175n ± 0% 3.850n ± 0% -7.79% (p=0.000 n=50)
memmove/47 4.180n ± 0% 3.851n ± 0% -7.87% (p=0.000 n=50)
memmove/48 4.178n ± 0% 3.852n ± 0% -7.81% (p=0.000 n=50)
memmove/49 4.175n ± 0% 3.851n ± 0% -7.76% (n=50)
memmove/50 4.178n ± 0% 3.855n ± 0% -7.73% (p=0.000 n=50)
memmove/51 4.190n ± 0% 3.859n ± 0% -7.91% (p=0.000 n=50)
memmove/52 4.188n ± 0% 3.859n ± 0% -7.84% (p=0.000 n=50)
memmove/53 4.191n ± 0% 3.863n ± 0% -7.82% (p=0.000 n=50)
memmove/54 4.192n ± 0% 3.860n ± 0% -7.91% (p=0.000 n=50)
memmove/55 4.192n ± 0% 3.869n ± 0% -7.70% (p=0.000 n=50)
memmove/56 4.204n ± 0% 3.866n ± 0% -8.05% (p=0.000 n=50)
memmove/57 4.198n ± 0% 3.864n ± 0% -7.95% (p=0.000 n=50)
memmove/58 4.202n ± 0% 3.865n ± 0% -8.02% (p=0.000 n=50)
memmove/59 4.208n ± 0% 3.868n ± 0% -8.09% (p=0.000 n=50)
memmove/60 4.205n ± 0% 3.873n ± 0% -7.89% (p=0.000 n=50)
memmove/61 4.212n ± 0% 3.872n ± 0% -8.08% (p=0.000 n=50)
memmove/62 4.214n ± 0% 3.870n ± 0% -8.16% (p=0.000 n=50)
memmove/63 4.215n ± 0% 3.877n ± 0% -8.02% (p=0.000 n=50)
memmove/64 4.217n ± 0% 3.881n ± 0% -7.99% (p=0.000 n=50)
memmove/65 4.990n ± 0% 4.683n ± 0% -6.15% (p=0.000 n=50)
memmove/66 5.022n ± 0% 4.719n ± 0% -6.03% (p=0.000 n=50)
memmove/67 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50)
memmove/68 5.035n ± 0% 4.724n ± 0% -6.18% (p=0.000 n=50)
memmove/69 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50)
memmove/70 5.040n ± 0% 4.728n ± 0% -6.19% (p=0.000 n=50)
memmove/71 5.053n ± 0% 4.728n ± 0% -6.43% (p=0.000 n=50)
memmove/72 5.050n ± 0% 4.732n ± 0% -6.29% (p=0.000 n=50)
memmove/73 5.049n ± 0% 4.733n ± 0% -6.24% (p=0.000 n=50)
memmove/74 5.054n ± 0% 4.734n ± 0% -6.34% (p=0.000 n=50)
memmove/75 5.063n ± 0% 4.736n ± 0% -6.46% (p=0.000 n=50)
memmove/76 5.046n ± 0% 4.741n ± 0% -6.04% (p=0.000 n=50)
memmove/77 5.057n ± 0% 4.741n ± 0% -6.25% (p=0.000 n=50)
memmove/78 5.077n ± 0% 4.739n ± 0% -6.65% (p=0.000 n=50)
memmove/79 5.074n ± 0% 4.746n ± 0% -6.46% (p=0.000 n=50)
memmove/80 5.085n ± 0% 4.747n ± 0% -6.65% (p=0.000 n=50)
memmove/81 5.077n ± 0% 4.735n ± 0% -6.74% (p=0.000 n=50)
memmove/82 5.087n ± 0% 4.747n ± 0% -6.68% (p=0.000 n=50)
memmove/83 5.087n ± 0% 4.754n ± 0% -6.56% (p=0.000 n=50)
memmove/84 5.096n ± 0% 4.753n ± 0% -6.73% (p=0.000 n=50)
memmove/85 5.082n ± 0% 4.749n ± 0% -6.55% (p=0.000 n=50)
memmove/86 5.103n ± 0% 4.752n ± 0% -6.87% (p=0.000 n=50)
memmove/87 5.096n ± 0% 4.760n ± 0% -6.61% (p=0.000 n=50)
memmove/88 5.099n ± 0% 4.765n ± 0% -6.55% (p=0.000 n=50)
memmove/89 5.104n ± 0% 4.757n ± 0% -6.79% (p=0.000 n=50)
memmove/90 5.117n ± 0% 4.767n ± 0% -6.84% (p=0.000 n=50)
memmove/91 5.100n ± 0% 4.766n ± 0% -6.54% (p=0.000 n=50)
memmove/92 5.103n ± 0% 4.763n ± 0% -6.67% (p=0.000 n=50)
memmove/93 5.115n ± 0% 4.772n ± 0% -6.71% (p=0.000 n=50)
memmove/94 5.117n ± 0% 4.769n ± 0% -6.80% (p=0.000 n=50)
memmove/95 5.131n ± 0% 4.775n ± 0% -6.94% (p=0.000 n=50)
memmove/96 5.129n ± 0% 4.772n ± 0% -6.97% (p=0.000 n=50)
memmove/97 5.130n ± 0% 4.764n ± 0% -7.13% (p=0.000 n=50)
memmove/98 5.134n ± 0% 4.780n ± 0% -6.89% (p=0.000 n=50)
memmove/99 5.141n ± 0% 4.780n ± 0% -7.03% (p=0.000 n=50)
memmove/100 5.141n ± 0% 4.780n ± 0% -7.02% (p=0.000 n=50)
memmove/101 5.150n ± 0% 4.782n ± 0% -7.14% (p=0.000 n=50)
memmove/102 5.150n ± 0% 4.790n ± 0% -6.99% (p=0.000 n=50)
memmove/103 5.156n ± 0% 4.788n ± 0% -7.14% (n=50)
memmove/104 5.157n ± 0% 4.793n ± 0% -7.05% (p=0.000 n=50)
memmove/105 5.147n ± 0% 4.791n ± 0% -6.90% (p=0.000 n=50)
memmove/106 5.167n ± 0% 4.793n ± 0% -7.23% (p=0.000 n=50)
memmove/107 5.165n ± 0% 4.801n ± 0% -7.06% (p=0.000 n=50)
memmove/108 5.173n ± 0% 4.800n ± 0% -7.21% (p=0.000 n=50)
memmove/109 5.173n ± 0% 4.797n ± 0% -7.27% (p=0.000 n=50)
memmove/110 5.171n ± 0% 4.808n ± 0% -7.01% (p=0.000 n=50)
memmove/111 5.180n ± 0% 4.799n ± 0% -7.36% (p=0.000 n=50)
memmove/112 5.185n ± 0% 4.812n ± 0% -7.19% (p=0.000 n=50)
memmove/113 5.187n ± 0% 4.797n ± 0% -7.53% (p=0.000 n=50)
memmove/114 5.183n ± 0% 4.809n ± 0% -7.21% (n=50)
memmove/115 5.193n ± 0% 4.811n ± 0% -7.36% (p=0.000 n=50)
memmove/116 5.196n ± 0% 4.815n ± 0% -7.32% (p=0.000 n=50)
memmove/117 5.199n ± 0% 4.816n ± 0% -7.37% (p=0.000 n=50)
memmove/118 5.198n ± 0% 4.811n ± 0% -7.45% (p=0.000 n=50)
memmove/119 5.203n ± 0% 4.818n ± 0% -7.40% (p=0.000 n=50)
memmove/120 5.195n ± 0% 4.823n ± 0% -7.16% (p=0.000 n=50)
memmove/121 5.203n ± 0% 4.812n ± 0% -7.51% (p=0.000 n=50)
memmove/122 5.204n ± 0% 4.818n ± 0% -7.42% (n=50)
memmove/123 5.202n ± 0% 4.822n ± 0% -7.31% (p=0.000 n=50)
memmove/124 5.216n ± 0% 4.823n ± 0% -7.54% (p=0.000 n=50)
memmove/125 5.227n ± 0% 4.823n ± 0% -7.72% (p=0.000 n=50)
memmove/126 5.235n ± 0% 4.830n ± 0% -7.74% (p=0.000 n=50)
memmove/127 5.237n ± 0% 4.833n ± 0% -7.72% (p=0.000 n=50)
memmove/128 5.241n ± 0% 4.832n ± 0% -7.81% (p=0.000 n=50)
memmove/129 6.460n ± 0% 5.858n ± 0% -9.31% (p=0.000 n=50)
memmove/130 7.539n ± 0% 6.634n ± 0% -12.00% (p=0.000 n=50)
memmove/131 7.542n ± 0% 6.623n ± 0% -12.18% (p=0.000 n=50)
memmove/132 7.527n ± 0% 6.667n ± 1% -11.43% (p=0.000 n=50)
memmove/133 7.521n ± 0% 6.631n ± 0% -11.83% (p=0.000 n=50)
memmove/134 7.531n ± 0% 6.642n ± 0% -11.81% (p=0.000 n=50)
memmove/135 7.541n ± 0% 6.692n ± 1% -11.25% (p=0.000 n=50)
memmove/136 7.549n ± 0% 6.657n ± 0% -11.81% (p=0.000 n=50)
memmove/137 7.544n ± 0% 6.646n ± 0% -11.90% (p=0.000 n=50)
memmove/138 7.557n ± 0% 6.673n ± 1% -11.70% (p=0.000 n=50)
memmove/139 7.545n ± 0% 6.654n ± 0% -11.81% (n=50)
memmove/140 7.559n ± 0% 6.680n ± 1% -11.63% (p=0.000 n=50)
memmove/141 7.560n ± 0% 6.664n ± 0% -11.85% (p=0.000 n=50)
memmove/142 7.556n ± 0% 6.679n ± 0% -11.62% (p=0.000 n=50)
memmove/143 7.570n ± 0% 6.683n ± 1% -11.71% (p=0.000 n=50)
memmove/144 7.586n ± 0% 6.683n ± 0% -11.91% (p=0.000 n=50)
memmove/145 7.593n ± 0% 6.665n ± 0% -12.22% (p=0.000 n=50)
memmove/146 7.591n ± 0% 6.665n ± 0% -12.20% (p=0.000 n=50)
memmove/147 7.598n ± 0% 6.665n ± 0% -12.27% (p=0.000 n=50)
memmove/148 7.598n ± 0% 6.670n ± 0% -12.21% (p=0.000 n=50)
memmove/149 7.593n ± 0% 6.691n ± 0% -11.88% (p=0.000 n=50)
memmove/150 7.625n ± 0% 6.713n ± 1% -11.97% (p=0.000 n=50)
memmove/151 7.603n ± 0% 6.710n ± 1% -11.74% (p=0.000 n=50)
memmove/152 7.613n ± 0% 6.701n ± 1% -11.97% (p=0.000 n=50)
memmove/153 7.595n ± 0% 6.710n ± 0% -11.65% (p=0.000 n=50)
memmove/154 7.614n ± 0% 6.721n ± 0% -11.74% (p=0.000 n=50)
memmove/155 7.615n ± 0% 6.709n ± 0% -11.89% (p=0.000 n=50)
memmove/156 7.613n ± 0% 6.693n ± 0% -12.08% (p=0.000 n=50)
memmove/157 7.628n ± 0% 6.708n ± 0% -12.05% (p=0.000 n=50)
memmove/158 7.629n ± 0% 6.706n ± 0% -12.10% (p=0.000 n=50)
memmove/159 7.639n ± 0% 6.724n ± 0% -11.98% (p=0.000 n=50)
memmove/160 7.619n ± 0% 6.702n ± 0% -12.04% (p=0.000 n=50)
memmove/161 7.653n ± 0% 6.698n ± 0% -12.49% (p=0.000 n=50)
memmove/162 8.104n ± 0% 7.140n ± 1% -11.89% (p=0.000 n=50)
memmove/163 8.141n ± 0% 7.187n ± 1% -11.72% (p=0.000 n=50)
memmove/164 8.154n ± 0% 7.107n ± 0% -12.84% (p=0.000 n=50)
memmove/165 8.143n ± 0% 7.117n ± 0% -12.59% (p=0.000 n=50)
memmove/166 8.176n ± 0% 7.110n ± 0% -13.04% (p=0.000 n=50)
memmove/167 8.194n ± 0% 7.168n ± 1% -12.52% (p=0.000 n=50)
memmove/168 8.214n ± 0% 7.188n ± 1% -12.50% (p=0.000 n=50)
memmove/169 8.220n ± 0% 7.242n ± 1% -11.90% (p=0.000 n=50)
memmove/170 8.228n ± 0% 7.244n ± 1% -11.96% (p=0.000 n=50)
memmove/171 8.263n ± 0% 7.184n ± 0% -13.06% (p=0.000 n=50)
memmove/172 8.259n ± 0% 7.325n ± 1% -11.31% (p=0.000 n=50)
memmove/173 8.271n ± 0% 7.225n ± 0% -12.65% (p=0.000 n=50)
memmove/174 8.284n ± 0% 7.287n ± 1% -12.04% (p=0.000 n=50)
memmove/175 8.289n ± 0% 7.282n ± 1% -12.15% (p=0.000 n=50)
memmove/176 8.309n ± 0% 7.328n ± 1% -11.81% (p=0.000 n=50)
memmove/177 8.317n ± 0% 7.264n ± 1% -12.67% (p=0.000 n=50)
memmove/178 8.302n ± 0% 7.342n ± 1% -11.57% (p=0.000 n=50)
memmove/179 8.309n ± 0% 7.357n ± 1% -11.45% (p=0.000 n=50)
memmove/180 8.304n ± 0% 7.318n ± 1% -11.87% (p=0.000 n=50)
memmove/181 8.312n ± 0% 7.363n ± 1% -11.42% (p=0.000 n=50)
memmove/182 8.315n ± 0% 7.320n ± 1% -11.96% (p=0.000 n=50)
memmove/183 8.330n ± 0% 7.286n ± 1% -12.53% (p=0.000 n=50)
memmove/184 8.310n ± 0% 7.324n ± 1% -11.86% (p=0.000 n=50)
memmove/185 8.303n ± 0% 7.267n ± 1% -12.47% (p=0.000 n=50)
memmove/186 8.287n ± 0% 7.312n ± 1% -11.76% (p=0.000 n=50)
memmove/187 8.298n ± 0% 7.395n ± 2% -10.88% (p=0.000 n=50)
memmove/188 8.296n ± 0% 7.339n ± 1% -11.54% (p=0.000 n=50)
memmove/189 8.306n ± 0% 7.299n ± 1% -12.12% (p=0.000 n=50)
memmove/190 8.281n ± 0% 7.309n ± 1% -11.74% (p=0.000 n=50)
memmove/191 8.299n ± 0% 7.282n ± 1% -12.26% (p=0.000 n=50)
memmove/192 8.281n ± 0% 7.335n ± 1% -11.41% (p=0.000 n=50)
memmove/193 8.299n ± 0% 7.325n ± 1% -11.74% (p=0.000 n=50)
memmove/194 8.641n ± 0% 8.034n ± 0% -7.02% (p=0.000 n=50)
memmove/195 8.667n ± 0% 8.073n ± 0% -6.85% (p=0.000 n=50)
memmove/196 8.666n ± 0% 8.030n ± 0% -7.34% (p=0.000 n=50)
memmove/197 8.660n ± 0% 8.096n ± 1% -6.51% (p=0.000 n=50)
memmove/198 8.688n ± 0% 8.047n ± 0% -7.39% (p=0.000 n=50)
memmove/199 8.678n ± 0% 8.061n ± 0% -7.11% (p=0.000 n=50)
memmove/200 8.669n ± 0% 8.034n ± 0% -7.32% (p=0.000 n=50)
memmove/201 8.692n ± 0% 8.061n ± 0% -7.26% (p=0.000 n=50)
memmove/202 8.668n ± 0% 8.060n ± 0% -7.02% (p=0.000 n=50)
memmove/203 8.687n ± 0% 8.066n ± 0% -7.15% (p=0.000 n=50)
memmove/204 8.699n ± 0% 8.076n ± 0% -7.16% (p=0.000 n=50)
memmove/205 8.676n ± 0% 8.085n ± 0% -6.82% (p=0.000 n=50)
memmove/206 8.684n ± 0% 8.101n ± 1% -6.71% (p=0.000 n=50)
memmove/207 8.725n ± 0% 8.099n ± 0% -7.18% (p=0.000 n=50)
memmove/208 8.674n ± 0% 8.073n ± 0% -6.92% (p=0.000 n=50)
memmove/209 8.697n ± 0% 8.088n ± 0% -7.01% (p=0.000 n=50)
memmove/210 8.733n ± 0% 8.076n ± 0% -7.53% (p=0.000 n=50)
memmove/211 8.732n ± 0% 8.104n ± 0% -7.19% (p=0.000 n=50)
memmove/212 8.730n ± 0% 8.091n ± 0% -7.32% (p=0.000 n=50)
memmove/213 8.728n ± 0% 8.100n ± 0% -7.19% (p=0.000 n=50)
memmove/214 8.744n ± 1% 8.081n ± 1% -7.57% (p=0.000 n=50)
memmove/215 8.734n ± 0% 8.150n ± 0% -6.68% (p=0.000 n=50)
memmove/216 8.748n ± 0% 8.116n ± 0% -7.23% (p=0.000 n=50)
memmove/217 8.751n ± 0% 8.129n ± 1% -7.11% (p=0.000 n=50)
memmove/218 8.747n ± 0% 8.114n ± 0% -7.23% (p=0.000 n=50)
memmove/219 8.733n ± 0% 8.159n ± 0% -6.57% (p=0.000 n=50)
memmove/220 8.764n ± 0% 8.145n ± 0% -7.06% (p=0.000 n=50)
memmove/221 8.764n ± 0% 8.142n ± 0% -7.10% (p=0.000 n=50)
memmove/222 8.775n ± 0% 8.152n ± 0% -7.10% (p=0.000 n=50)
memmove/223 8.771n ± 0% 8.143n ± 0% -7.16% (p=0.000 n=50)
memmove/224 8.778n ± 0% 8.175n ± 1% -6.87% (p=0.000 n=50)
memmove/225 8.794n ± 0% 8.138n ± 0% -7.45% (p=0.000 n=50)
memmove/226 10.13n ± 0% 10.06n ± 0% -0.71% (p=0.000 n=50)
memmove/227 10.14n ± 0% 10.08n ± 0% -0.53% (p=0.000 n=50)
memmove/228 10.13n ± 0% 10.08n ± 0% -0.56% (p=0.000 n=50)
memmove/229 10.17n ± 0% 10.11n ± 0% -0.56% (p=0.000 n=50)
memmove/230 10.17n ± 0% 10.13n ± 0% -0.38% (p=0.003 n=50)
memmove/231 10.16n ± 0% 10.12n ± 0% -0.41% (p=0.001 n=50)
memmove/232 10.19n ± 0% 10.12n ± 0% -0.67% (p=0.000 n=50)
memmove/233 10.21n ± 0% 10.14n ± 0% -0.71% (p=0.000 n=50)
memmove/234 10.24n ± 0% 10.16n ± 0% -0.79% (p=0.000 n=50)
memmove/235 10.24n ± 0% 10.16n ± 0% -0.76% (p=0.000 n=50)
memmove/236 10.25n ± 0% 10.16n ± 0% -0.81% (p=0.000 n=50)
memmove/237 10.24n ± 0% 10.17n ± 0% -0.69% (p=0.000 n=50)
memmove/238 10.27n ± 0% 10.19n ± 0% -0.79% (p=0.000 n=50)
memmove/239 10.29n ± 0% 10.19n ± 0% -0.90% (p=0.000 n=50)
memmove/240 10.30n ± 0% 10.20n ± 0% -0.95% (p=0.000 n=50)
memmove/241 10.29n ± 0% 10.20n ± 0% -0.91% (p=0.000 n=50)
memmove/242 10.30n ± 0% 10.22n ± 0% -0.80% (p=0.000 n=50)
memmove/243 10.32n ± 0% 10.23n ± 0% -0.87% (p=0.000 n=50)
memmove/244 10.32n ± 0% 10.24n ± 0% -0.74% (p=0.000 n=50)
memmove/245 10.33n ± 0% 10.23n ± 0% -0.97% (p=0.000 n=50)
memmove/246 10.33n ± 0% 10.24n ± 0% -0.92% (p=0.000 n=50)
memmove/247 10.31n ± 0% 10.24n ± 0% -0.69% (p=0.000 n=50)
memmove/248 10.32n ± 0% 10.26n ± 0% -0.55% (p=0.000 n=50)
memmove/249 10.33n ± 0% 10.28n ± 0% -0.52% (p=0.000 n=50)
memmove/250 10.34n ± 0% 10.27n ± 0% -0.66% (p=0.000 n=50)
memmove/251 10.32n ± 0% 10.27n ± 0% -0.45% (p=0.000 n=50)
memmove/252 10.34n ± 0% 10.30n ± 0% -0.39% (p=0.005 n=50)
memmove/253 10.33n ± 0% 10.27n ± 0% -0.57% (p=0.000 n=50)
memmove/254 10.33n ± 0% 10.27n ± 0% -0.54% (p=0.000 n=50)
memmove/255 10.34n ± 0% 10.29n ± 0% -0.50% (p=0.002 n=50)
memmove/256 10.36n ± 0% 10.31n ± 0% -0.44% (p=0.006 n=50)
memmove/257 10.33n ± 0% 10.29n ± 0% -0.36% (p=0.004 n=50)
geomean 6.142n 5.696n -7.26%
```
|
|
This is step 4 of
https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
|
|
prevent file copy/paste issues. (#66477)
|
|
This was generated using clang-tidy and clang-apply-replacements,
on src/string/*.cpp for just the llvmlibc-inline-function-decl
check, after applying https://reviews.llvm.org/D157164, and then
some manual fixup.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D157169
|
|
Fix a bunch more instances of incorrect use of the `static`
keyword and missing use of LIBC_INLINE and LIBC_INLINE_VAR
macros. Note that even forward declarations and generic template
declarations must follow the prescribed patterns for libc code so
that they match every definition, all template specializations.
Reviewed By: Caslyn
Differential Revision: https://reviews.llvm.org/D154260
|
|
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector
operations when wide types are available. We also get rid of byte
reloads and serializing '__builtin_ctzll'.
I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.
The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
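A minimal sketch of a branchless three-way comparison on four bytes, assuming the usual approach of loading both operands most-significant byte first so that integer order matches lexicographic order. The helper name `cmp_u32` is hypothetical and the code is not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only.
// Compares the 4 bytes at `p1` and `p2` in memcmp order.
// Returns -1, 0, or 1.
inline int cmp_u32(const unsigned char* p1, const unsigned char* p2) {
  // Assemble the bytes most-significant first so that comparing the integers
  // gives the same result as comparing the bytes left to right.
  const auto load_big_endian = [](const unsigned char* p) -> std::uint32_t {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
  };
  const std::uint32_t a = load_big_endian(p1);
  const std::uint32_t b = load_big_endian(p2);
  // Two flag-setting comparisons and a subtraction; no data-dependent branch
  // is required, although the final code is up to the compiler.
  return static_cast<int>(a > b) - static_cast<int>(a < b);
}
```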
Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
|
|
Once integrated in our codebase the patch triggered a bunch of failing
tests. We do not yet understand where the bug is but we revert it to
move forward with integration.
This reverts commit 5e32765c15ab8df3d2635a2bb5078c5b1d5714d5.
|
|
Most of the time `memmove` is called on buffers that are disjoint; in that case we can use `memcpy`, which is faster.
The additional test is branchless on x86, aarch64 and RISCV with the zbb extension (bitmanip).
On x86 this patch adds a latency of 2 to 3 cycles.
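The dispatch described above can be sketched like this. It is a simplified illustration under stated assumptions, not the LLVM libc code: the name `memmove_sketch` is hypothetical and the overlapping case uses plain byte loops.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch, for illustration only.
inline void* memmove_sketch(void* dst, const void* src, std::size_t size) {
  if (size == 0)
    return dst;
  const auto d = reinterpret_cast<std::uintptr_t>(dst);
  const auto s = reinterpret_cast<std::uintptr_t>(src);
  const std::uintptr_t distance = d > s ? d - s : s - d;
  // Disjoint buffers: the faster memcpy path is safe.
  if (distance >= size)
    return std::memcpy(dst, src, size);
  auto* out = static_cast<unsigned char*>(dst);
  const auto* in = static_cast<const unsigned char*>(src);
  if (d < s) {
    // Destination starts first: copy forward so source bytes are read
    // before they are overwritten.
    for (std::size_t i = 0; i < size; ++i)
      out[i] = in[i];
  } else {
    // Destination starts later: copy backward for the same reason.
    for (std::size_t i = size; i > 0; --i)
      out[i - 1] = in[i - 1];
  }
  return dst;
}
```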
Before
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------
BM_Memmove/0/0_median 5.00 ns 5.00 ns 10 bytes_per_cycle=1.25477/s bytes_per_second=2.62933G/s items_per_second=199.87M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 6.21 ns 6.21 ns 10 bytes_per_cycle=3.22173/s bytes_per_second=6.75106G/s items_per_second=160.955M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 8.09 ns 8.09 ns 10 bytes_per_cycle=5.31462/s bytes_per_second=11.1366G/s items_per_second=123.603M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 5.95 ns 5.95 ns 10 bytes_per_cycle=2.71865/s bytes_per_second=5.69687G/s items_per_second=167.967M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 5.63 ns 5.63 ns 10 bytes_per_cycle=2.28294/s bytes_per_second=4.78383G/s items_per_second=177.615M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 5.68 ns 5.68 ns 10 bytes_per_cycle=2.16798/s bytes_per_second=4.54295G/s items_per_second=176.015M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 7.46 ns 7.46 ns 10 bytes_per_cycle=3.97619/s bytes_per_second=8.332G/s items_per_second=134.044M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 5.40 ns 5.40 ns 10 bytes_per_cycle=1.79695/s bytes_per_second=3.76546G/s items_per_second=185.211M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 5.62 ns 5.62 ns 10 bytes_per_cycle=3.18747/s bytes_per_second=6.67927G/s items_per_second=177.983M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 101 ns 101 ns 10 bytes_per_cycle=9.77359/s bytes_per_second=20.4803G/s items_per_second=9.9333M/s __llvm_libc::memmove,uniform 384 to 4096
```
After
```
BM_Memmove/0/0_median 3.57 ns 3.57 ns 10 bytes_per_cycle=1.71375/s bytes_per_second=3.59112G/s items_per_second=280.411M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 4.52 ns 4.52 ns 10 bytes_per_cycle=4.47557/s bytes_per_second=9.37843G/s items_per_second=221.427M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 5.70 ns 5.70 ns 10 bytes_per_cycle=7.37396/s bytes_per_second=15.4519G/s items_per_second=175.399M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 4.47 ns 4.47 ns 10 bytes_per_cycle=3.4148/s bytes_per_second=7.15563G/s items_per_second=223.743M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 4.53 ns 4.53 ns 10 bytes_per_cycle=2.86071/s bytes_per_second=5.99454G/s items_per_second=220.69M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 4.19 ns 4.19 ns 10 bytes_per_cycle=2.5484/s bytes_per_second=5.3401G/s items_per_second=238.924M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 5.02 ns 5.02 ns 10 bytes_per_cycle=5.94164/s bytes_per_second=12.4505G/s items_per_second=199.14M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 4.03 ns 4.03 ns 10 bytes_per_cycle=2.47028/s bytes_per_second=5.17641G/s items_per_second=247.906M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 4.70 ns 4.70 ns 10 bytes_per_cycle=3.84975/s bytes_per_second=8.06706G/s items_per_second=212.72M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 90.7 ns 90.7 ns 10 bytes_per_cycle=10.8681/s bytes_per_second=22.7739G/s items_per_second=11.02M/s __llvm_libc::memmove,uniform 384 to 4096
```
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D152811
|
|
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector
operations when wide types are available. We also get rid of byte
reloads and serializing '__builtin_ctzll'.
I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.
The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
|
|
This broke aarch64 debug buildbot https://lab.llvm.org/buildbot/#/builders/223/builds/21703
This reverts commit bd4f978754758d5ef29d1f10370f45362da3de37.
|
|
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector
operations when wide types are available. We also get rid of byte
reloads and serializing '__builtin_ctzll'.
I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.
The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
|
|
This reverts commit 9ec6ebd3ceabb29482aa18a64b943788b65223dc.
The patch broke RISCV and aarch64 buildbots.
|
|
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector
operations when wide types are available. We also get rid of byte
reloads and serializing '__builtin_ctzll'.
I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.
The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
|
|
all rounding modes.
Implement double precision log2 function correctly rounded to all
rounding modes.
See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hit rate is about 99.91%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
177.632
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332
-- LIBC latency -- with FMA
459.751
-- LIBC latency -- without FMA
463.850
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D150374
|
|
[libc] Add optimized bcmp for RISCV
This patch adds two versions of bcmp optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.
Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Bcmp on a quad core Linux starfive RISCV 64 board running at 1.5GHz.
Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
L1 Instruction 32 KiB (x4)
L1 Data 32 KiB (x4)
L2 Unified 2048 KiB (x1)
Load Average: 7.03, 5.98, 3.71
----------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------
BM_Bcmp/0/0 102 ns 60.5 ns 11662336 bytes_per_cycle=0.122696/s bytes_per_second=175.518M/s items_per_second=16.5258M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0 328 ns 172 ns 3737600 bytes_per_cycle=0.15256/s bytes_per_second=218.238M/s items_per_second=5.80575M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0 199 ns 99.7 ns 7019520 bytes_per_cycle=0.141897/s bytes_per_second=202.986M/s items_per_second=10.032M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0 173 ns 86.5 ns 8361984 bytes_per_cycle=0.13863/s bytes_per_second=198.312M/s items_per_second=11.5669M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0 105 ns 51.8 ns 13213696 bytes_per_cycle=0.116399/s bytes_per_second=166.51M/s items_per_second=19.2931M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0 167 ns 93.9 ns 7853056 bytes_per_cycle=0.139432/s bytes_per_second=199.459M/s items_per_second=10.6503M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0 262 ns 165 ns 3931136 bytes_per_cycle=0.151516/s bytes_per_second=216.745M/s items_per_second=6.07091M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0 168 ns 105 ns 6665216 bytes_per_cycle=0.143159/s bytes_per_second=204.791M/s items_per_second=9.52163M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0 108 ns 68.0 ns 10175488 bytes_per_cycle=0.125504/s bytes_per_second=179.535M/s items_per_second=14.701M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0 15371 ns 9007 ns 78848 bytes_per_cycle=0.166128/s bytes_per_second=237.648M/s items_per_second=111.031k/s __llvm_libc::bcmp,uniform 384 to 4096
```
After
```
BM_Bcmp/0/0 74.2 ns 49.7 ns 14306304 bytes_per_cycle=0.148927/s bytes_per_second=213.042M/s items_per_second=20.1101M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0 108 ns 68.1 ns 10350592 bytes_per_cycle=0.411197/s bytes_per_second=588.222M/s items_per_second=14.6849M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0 80.2 ns 56.0 ns 12386304 bytes_per_cycle=0.258588/s bytes_per_second=369.912M/s items_per_second=17.8585M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0 92.4 ns 55.7 ns 12555264 bytes_per_cycle=0.206835/s bytes_per_second=295.88M/s items_per_second=17.943M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0 79.3 ns 46.8 ns 14288896 bytes_per_cycle=0.125872/s bytes_per_second=180.061M/s items_per_second=21.3611M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0 98.0 ns 57.9 ns 12232704 bytes_per_cycle=0.268815/s bytes_per_second=384.543M/s items_per_second=17.2711M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0 132 ns 65.5 ns 10474496 bytes_per_cycle=0.417246/s bytes_per_second=596.875M/s items_per_second=15.2673M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0 101 ns 60.9 ns 11505664 bytes_per_cycle=0.253733/s bytes_per_second=362.968M/s items_per_second=16.4202M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0 72.5 ns 50.2 ns 14082048 bytes_per_cycle=0.183262/s bytes_per_second=262.158M/s items_per_second=19.9271M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0 852 ns 803 ns 854016 bytes_per_cycle=1.85028/s bytes_per_second=2.58481G/s items_per_second=1.24597M/s __llvm_libc::bcmp,uniform 384 to 4096
```
For comparison with glibc
```
BM_Bcmp/0/0 106 ns 52.6 ns 12906496 bytes_per_cycle=0.142072/s bytes_per_second=203.235M/s items_per_second=19.0271M/s glibc::bcmp,memcmp Google A
BM_Bcmp/1/0 132 ns 77.1 ns 8905728 bytes_per_cycle=0.365072/s bytes_per_second=522.239M/s items_per_second=12.9782M/s glibc::bcmp,memcmp Google B
BM_Bcmp/2/0 122 ns 62.3 ns 10909696 bytes_per_cycle=0.222667/s bytes_per_second=318.527M/s items_per_second=16.0563M/s glibc::bcmp,memcmp Google D
BM_Bcmp/3/0 99.5 ns 64.2 ns 11074560 bytes_per_cycle=0.185126/s bytes_per_second=264.825M/s items_per_second=15.5674M/s glibc::bcmp,memcmp Google L
BM_Bcmp/4/0 86.6 ns 50.2 ns 13488128 bytes_per_cycle=0.117941/s bytes_per_second=168.717M/s items_per_second=19.9053M/s glibc::bcmp,memcmp Google M
BM_Bcmp/5/0 106 ns 61.4 ns 11344896 bytes_per_cycle=0.248968/s bytes_per_second=356.151M/s items_per_second=16.284M/s glibc::bcmp,memcmp Google Q
BM_Bcmp/6/0 145 ns 71.9 ns 10046464 bytes_per_cycle=0.389814/s bytes_per_second=557.633M/s items_per_second=13.9019M/s glibc::bcmp,memcmp Google S
BM_Bcmp/7/0 119 ns 65.6 ns 10718208 bytes_per_cycle=0.243756/s bytes_per_second=348.696M/s items_per_second=15.2329M/s glibc::bcmp,memcmp Google U
BM_Bcmp/8/0 86.4 ns 54.5 ns 13250560 bytes_per_cycle=0.154831/s bytes_per_second=221.488M/s items_per_second=18.3532M/s glibc::bcmp,memcmp Google W
BM_Bcmp/9/0 1090 ns 604 ns 1186816 bytes_per_cycle=2.53848/s bytes_per_second=3.54622G/s items_per_second=1.65598M/s glibc::bcmp,uniform 384 to 4096
```
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150567
|
|
This patch adds two versions of memcpy optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISC-V 64 and RISC-V 32, but it could be used for 32-bit ARM architectures as well.
Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memcpy` on a quad-core StarFive RISC-V 64 Linux board running at 1.5GHz.
Before:
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
L1 Instruction 32 KiB (x4)
L1 Data 32 KiB (x4)
L2 Unified 2048 KiB (x1)
------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------
BM_Memcpy/0/0 474 ns 474 ns 1483776 bytes_per_cycle=0.243492/s bytes_per_second=348.318M/s items_per_second=2.11097M/s __llvm_libc::memcpy,memcpy Google A
BM_Memcpy/1/0 210 ns 209 ns 3649536 bytes_per_cycle=0.233819/s bytes_per_second=334.481M/s items_per_second=4.77519M/s __llvm_libc::memcpy,memcpy Google B
BM_Memcpy/2/0 1814 ns 1814 ns 396288 bytes_per_cycle=0.247899/s bytes_per_second=354.622M/s items_per_second=551.402k/s __llvm_libc::memcpy,memcpy Google D
BM_Memcpy/3/0 89.3 ns 89.2 ns 7459840 bytes_per_cycle=0.217415/s bytes_per_second=311.014M/s items_per_second=11.2071M/s __llvm_libc::memcpy,memcpy Google L
BM_Memcpy/4/0 134 ns 134 ns 3815424 bytes_per_cycle=0.226584/s bytes_per_second=324.131M/s items_per_second=7.44567M/s __llvm_libc::memcpy,memcpy Google M
BM_Memcpy/5/0 52.8 ns 52.6 ns 11001856 bytes_per_cycle=0.194893/s bytes_per_second=278.797M/s items_per_second=19.0284M/s __llvm_libc::memcpy,memcpy Google Q
BM_Memcpy/6/0 180 ns 180 ns 4101120 bytes_per_cycle=0.231884/s bytes_per_second=331.713M/s items_per_second=5.55957M/s __llvm_libc::memcpy,memcpy Google S
BM_Memcpy/7/0 195 ns 195 ns 3906560 bytes_per_cycle=0.232951/s bytes_per_second=333.239M/s items_per_second=5.1217M/s __llvm_libc::memcpy,memcpy Google U
BM_Memcpy/8/0 152 ns 152 ns 4789248 bytes_per_cycle=0.227507/s bytes_per_second=325.452M/s items_per_second=6.58187M/s __llvm_libc::memcpy,memcpy Google W
BM_Memcpy/9/0 6036 ns 6033 ns 118784 bytes_per_cycle=0.249158/s bytes_per_second=356.423M/s items_per_second=165.75k/s __llvm_libc::memcpy,uniform 384 to 4096
```
After:
```
BM_Memcpy/0/0 126 ns 126 ns 5770240 bytes_per_cycle=1.04707/s bytes_per_second=1.46273G/s items_per_second=7.9385M/s __llvm_libc::memcpy,memcpy Google A
BM_Memcpy/1/0 75.1 ns 75.0 ns 10204160 bytes_per_cycle=0.691143/s bytes_per_second=988.687M/s items_per_second=13.3289M/s __llvm_libc::memcpy,memcpy Google B
BM_Memcpy/2/0 333 ns 333 ns 2174976 bytes_per_cycle=1.39297/s bytes_per_second=1.94596G/s items_per_second=3.00002M/s __llvm_libc::memcpy,memcpy Google D
BM_Memcpy/3/0 49.6 ns 49.5 ns 16092160 bytes_per_cycle=0.710161/s bytes_per_second=1015.89M/s items_per_second=20.1844M/s __llvm_libc::memcpy,memcpy Google L
BM_Memcpy/4/0 57.7 ns 57.7 ns 11213824 bytes_per_cycle=0.561557/s bytes_per_second=803.314M/s items_per_second=17.3228M/s __llvm_libc::memcpy,memcpy Google M
BM_Memcpy/5/0 48.0 ns 47.9 ns 16437248 bytes_per_cycle=0.346708/s bytes_per_second=495.97M/s items_per_second=20.8571M/s __llvm_libc::memcpy,memcpy Google Q
BM_Memcpy/6/0 67.5 ns 67.5 ns 10616832 bytes_per_cycle=0.614173/s bytes_per_second=878.582M/s items_per_second=14.8142M/s __llvm_libc::memcpy,memcpy Google S
BM_Memcpy/7/0 84.7 ns 84.6 ns 10480640 bytes_per_cycle=0.819077/s bytes_per_second=1.14424G/s items_per_second=11.8174M/s __llvm_libc::memcpy,memcpy Google U
BM_Memcpy/8/0 61.7 ns 61.6 ns 11191296 bytes_per_cycle=0.550078/s bytes_per_second=786.893M/s items_per_second=16.2279M/s __llvm_libc::memcpy,memcpy Google W
BM_Memcpy/9/0 981 ns 981 ns 703488 bytes_per_cycle=1.52333/s bytes_per_second=2.12807G/s items_per_second=1019.81k/s __llvm_libc::memcpy,uniform 384 to 4096
```
It is not as good as glibc for now, so there's room for improvement. Given that glibc's numbers for large copies are roughly double, I suspect it has a path that copies 16 bytes at a time.
```
BM_Memcpy/0/1 146 ns 82.5 ns 8576000 bytes_per_cycle=1.35236/s bytes_per_second=1.88922G/s items_per_second=12.1169M/s glibc memcpy,memcpy Google A
BM_Memcpy/1/1 112 ns 63.7 ns 10634240 bytes_per_cycle=0.628018/s bytes_per_second=898.387M/s items_per_second=15.702M/s glibc memcpy,memcpy Google B
BM_Memcpy/2/1 315 ns 180 ns 4079616 bytes_per_cycle=2.65229/s bytes_per_second=3.7052G/s items_per_second=5.54764M/s glibc memcpy,memcpy Google D
BM_Memcpy/3/1 85.3 ns 43.1 ns 15854592 bytes_per_cycle=0.774164/s bytes_per_second=1107.45M/s items_per_second=23.2249M/s glibc memcpy,memcpy Google L
BM_Memcpy/4/1 105 ns 54.3 ns 13427712 bytes_per_cycle=0.7793/s bytes_per_second=1114.8M/s items_per_second=18.4109M/s glibc memcpy,memcpy Google M
BM_Memcpy/5/1 77.1 ns 43.2 ns 16476160 bytes_per_cycle=0.279808/s bytes_per_second=400.269M/s items_per_second=23.1428M/s glibc memcpy,memcpy Google Q
BM_Memcpy/6/1 112 ns 62.7 ns 11236352 bytes_per_cycle=0.676078/s bytes_per_second=967.137M/s items_per_second=15.9387M/s glibc memcpy,memcpy Google S
BM_Memcpy/7/1 131 ns 65.5 ns 11751424 bytes_per_cycle=0.965616/s bytes_per_second=1.34895G/s items_per_second=15.2762M/s glibc memcpy,memcpy Google U
BM_Memcpy/8/1 104 ns 55.0 ns 12314624 bytes_per_cycle=0.583336/s bytes_per_second=834.468M/s items_per_second=18.1937M/s glibc memcpy,memcpy Google W
BM_Memcpy/9/1 932 ns 466 ns 1480704 bytes_per_cycle=3.17342/s bytes_per_second=4.43321G/s items_per_second=2.14679M/s glibc memcpy,uniform 384 to 4096
```
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150202
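The general idea of copying without unaligned accesses can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this patch: the function name `copy_aligned_sketch` is hypothetical, and the strategy shown (byte copies until the destination is word aligned, then word copies only when the source happens to be aligned too) is just one simple way to avoid unaligned word accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch, not the llvm-libc implementation.
// Copies `count` bytes from `src` to `dst`, moving whole words only when
// both pointers are word aligned.
inline void *copy_aligned_sketch(void *dst, const void *src, std::size_t count) {
  assert((dst != nullptr && src != nullptr) || count == 0);
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  constexpr std::size_t kWord = sizeof(std::uintptr_t);

  // Head: copy single bytes until the destination is word aligned.
  while (count > 0 && reinterpret_cast<std::uintptr_t>(d) % kWord != 0) {
    *d++ = *s++;
    --count;
  }

  // Body: copy whole words, but only if the source is now aligned as well.
  if (reinterpret_cast<std::uintptr_t>(s) % kWord == 0) {
    while (count >= kWord) {
      std::uintptr_t word;
      std::memcpy(&word, s, kWord); // aligned word-sized load
      std::memcpy(d, &word, kWord); // aligned word-sized store
      d += kWord;
      s += kWord;
      count -= kWord;
    }
  }

  // Tail (or the whole remainder when the source is misaligned): bytes.
  while (count > 0) {
    *d++ = *s++;
    --count;
  }
  return dst;
}
```

When source and destination are misaligned relative to each other, this sketch falls back to byte copies for the whole buffer. A tuned implementation would more likely keep using aligned loads and assemble each destination word from neighbouring source words.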
|
|
|
|
|
|
|
|
|
|
|
|
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D142398
|
|
This is a first step to support GCC. This patch adds support for builtin and feature detection.
Differential Revision: https://reviews.llvm.org/D139712
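As an illustration of what builtin detection can look like, here is a minimal sketch. The macro `SKETCH_HAS_BUILTIN` and the function `copy_block_sketch` are hypothetical names chosen for this example, not the identifiers introduced by the patch. The sketch assumes a compiler that either provides `__has_builtin` or can safely treat every builtin query as false.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical detection macro: use __has_builtin where the compiler
// provides it, otherwise report every builtin as unavailable.
#if defined(__has_builtin)
#define SKETCH_HAS_BUILTIN(x) __has_builtin(x)
#else
#define SKETCH_HAS_BUILTIN(x) 0
#endif

// Copies exactly N bytes, using Clang's __builtin_memcpy_inline when it is
// available and a plain byte loop otherwise (for example when built with GCC).
template <std::size_t N>
inline void copy_block_sketch(void *dst, const void *src) {
#if SKETCH_HAS_BUILTIN(__builtin_memcpy_inline)
  __builtin_memcpy_inline(dst, src, N);
#else
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < N; ++i)
    d[i] = s[i];
#endif
}
```

Keeping the check behind a single macro means call sites need no compiler-specific `#ifdef` chains, and the fallback path keeps the code buildable on compilers that lack the builtin.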
|
|
`cpp::byte` is preferable to `char`, which is signed or unsigned depending on the platform (it behaves like either `signed char` or `unsigned char`). This ambiguity has introduced subtle arithmetic errors.
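The kind of arithmetic error involved can be shown with a small sketch. It uses `std::byte` from C++17 as a stand-in for `cpp::byte`, and `signed char` to model a platform where plain `char` is signed. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte difference computed the way it would be on a
// platform where plain `char` is signed.
inline int diff_as_signed_char(unsigned char a, unsigned char b) {
  return static_cast<signed char>(a) - static_cast<signed char>(b);
}

// Hypothetical helper: the same difference computed through a byte type
// that has no arithmetic of its own, so the widening to int is explicit.
inline int diff_as_byte(unsigned char a, unsigned char b) {
  const auto x = static_cast<std::byte>(a);
  const auto y = static_cast<std::byte>(b);
  return std::to_integer<int>(x) - std::to_integer<int>(y);
}
```

For the bytes `0x80` and `0x01`, `diff_as_signed_char` yields -129 while `diff_as_byte` yields 127. A `memcmp`-style comparison built on the former would report the wrong ordering.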
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose.
This patch provides the rationale for the framework, the implementation and unit tests, but the functions themselves are still using the previous version. The change in implementation will come in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D136292
|
|
This breaks llvm-libc build bots:
- libc-x86_64-debian-dbg-asan
- libc-x86_64-debian-fullbuild-dbg-asan
Address sanitizers fail with "AddressSanitizer: invalid alignment requested in aligned_alloc: 64, alignment must be a power of two and the requested size 0x41 must be a multiple of alignment (thread T0)"
- libc-aarch64-ubuntu-dbg
- libc-aarch64-ubuntu-fullbuild-dbg
https://lab.llvm.org/buildbot/#/builders/223/builds/8877/steps/7/logs/stdio
- libc-arm32-debian-dbg
https://lab.llvm.org/buildbot/#/builders/229/builds/5201/steps/7/logs/stdio
This reverts commit 903cc71a82431d79e5fb541946a9e7c93750e374.
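The sanitizer message above points at the `aligned_alloc` constraint that the requested size must be a multiple of the alignment. The sketch below shows one way to satisfy it by rounding the size up. It is an assumption for illustration only: the log does not say how the failure was eventually addressed, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: rounds `size` up to the next multiple of `alignment`.
constexpr std::size_t round_up_to_multiple(std::size_t size,
                                           std::size_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

// Hypothetical wrapper: allocates at least `size` bytes with the given
// power-of-two alignment, padding the request so that it is a multiple of
// the alignment. The caller releases the memory with std::free.
inline void *aligned_alloc_rounded(std::size_t alignment, std::size_t size) {
  return std::aligned_alloc(alignment, round_up_to_multiple(size, alignment));
}
```

With an alignment of 64, the failing request of 0x41 bytes becomes a request of 0x80 bytes.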
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose.
This patch provides the rationale for the framework, the implementation and unit tests, but the functions themselves are still using the previous version. The change in implementation will come in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D136292
|
|
|