path: root/libc/src/string/memory_utils/utils.h
Age Commit message Author
2025-09-12[libc] Some MSVC compatibility changes for src/string/memory_utils. (#158393)lntue
2025-07-23[libc][NFC] Add stdint.h proxy header to fix dependency issue with <stdint.h> includes. (#150303)lntue
https://github.com/llvm/llvm-project/issues/149993
2025-06-26[libc] Improve memcpy for ARM Cortex-M supporting unaligned accesses. (#144872)Guillaume Chatelet
This implementation has been compiled with the [pigweed toolchain](https://pigweed.dev/toolchain.html) and tested on:
- Raspberry Pi Pico 2 with the following options\
  `--target=armv8m.main-none-eabi`
  `-march=armv8m.main+fp+dsp`
  `-mcpu=cortex-m33`
- Raspberry Pi Pico with the following options\
  `--target=armv6m-none-eabi`
  `-march=armv6m`
  `-mcpu=cortex-m0+`

They both compile down to a little bit more than 200 bytes and are between 2 and 10 times faster than byte-by-byte copies.

For best performance the following options can be set in the `libc/config/baremetal/arm/config.json`
```
{
  "codegen": {
    "LIBC_CONF_KEEP_FRAME_POINTER": {
      "value": false
    }
  },
  "general": {
    "LIBC_ADD_NULL_CHECKS": {
      "value": false
    }
  }
}
```
2025-03-10[libc] Add `-Wno-sign-conversion` & re-attempt `-Wconversion` (#129811)Vinay Deshmukh
Relates to https://github.com/llvm/llvm-project/issues/119281#issuecomment-2699470459
2025-03-05Revert "[libc] Enable -Wconversion for tests. (#127523)"Augie Fackler
This reverts commit 1e6e845d49a336e9da7ca6c576ec45c0b419b5f6 because it changed the 1st parameter of adjust() to be unsigned, but libc itself calls adjust() with a negative argument in align_backward() in op_generic.h.
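To illustrate why the type of that first parameter matters, here is a minimal sketch. `adjust_sketch` is a hypothetical stand-in, not the real `adjust()` from libc, and its signature is an assumption made for this example. It only shows that a signed offset lets a caller step backward, which an unsigned parameter cannot express without a wrapping conversion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a helper that moves a pointer by a signed byte
// offset and updates the number of bytes left to process.
inline void adjust_sketch(std::ptrdiff_t offset, const char *&ptr,
                          std::size_t &count) {
  ptr += offset;
  // Unsigned arithmetic wraps, so subtracting a converted negative offset
  // increases count by the same amount.
  count -= static_cast<std::size_t>(offset);
}

// Usage: stepping backward by 3 bytes needs a negative offset.
inline void adjust_sketch_example() {
  const char buffer[8] = "abcdefg";
  const char *ptr = buffer + 4;
  std::size_t count = 4;
  adjust_sketch(-3, ptr, count);
  assert(ptr == buffer + 1);
  assert(count == 7);
}
```

With an unsigned first parameter, the `-3` above would be converted to a very large value before the pointer addition, which is the kind of call the revert message describes.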
2025-03-04[libc] Enable -Wconversion for tests. (#127523)Vinay Deshmukh
Relates to: #119281
2025-02-05[libc] Fix all imports of src/string/memory_utils (#114939)Krishna Pandey
Fixed imports for all files *within* `libc/src/string/memory_utils`. Note: This doesn't include **all** files that need to be fixed. Fixes #86579
2024-11-13[libc] Rename libc/src/__support/endian.h to endian_internal.h (#115950)Daniel Thornburgh
This prevents a conflict with the Linux system endian.h when built in overlay mode for CPP files in __support. This issue appeared in PR #106259.
2024-07-12[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597)Petr Hosek
This is a part of #97655.
2024-07-12Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593)Mehdi Amini
Reverts llvm/llvm-project#98075. Bots are broken.
2024-07-11[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075)Petr Hosek
This is a part of #97655.
2024-05-31[libc][NFC] Allow compilation of `memcpy` with `-m32` (#93790)Guillaume Chatelet
Needed to support i386 (#93709).
2024-03-27[libc] Remove obsolete LIBC_HAS_BUILTIN macro (#86554)Marc Auberer
Fixes #86546 and removes the macro `LIBC_HAS_BUILTIN`. The macro was necessary to support older compilers that did not support `__has_builtin`. All of the compilers we support already have this builtin. See: https://libc.llvm.org/compiler_support.html

All uses now use `__has_builtin` directly.

cc @nickdesaulniers
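As a rough illustration of using `__has_builtin` directly, the sketch below guards one builtin with it. `likely_sketch` is a hypothetical helper, not something from libc, and the example assumes a compiler whose preprocessor provides `__has_builtin` (recent Clang, or GCC 10 and later). On a preprocessor without it, the `#if` line itself would not compile.

```cpp
#include <cassert>

// Hypothetical helper: hints that `condition` is usually true when the
// compiler offers __builtin_expect, and is a plain pass-through otherwise.
inline bool likely_sketch(bool condition) {
#if __has_builtin(__builtin_expect)
  return __builtin_expect(condition, 1) != 0;
#else
  return condition;
#endif
}

// Usage: the hint never changes the value, only the expected branch.
inline void likely_sketch_example() {
  assert(likely_sketch(true));
  assert(!likely_sketch(false));
}
```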
2024-03-05[libc] fix readability-identifier-naming in memory_utils/utils.h (#83919)Nick Desaulniers
Fixes:
libc/src/string/memory_utils/utils.h:345:13: warning: invalid case style for member 'offset_' [readability-identifier-naming]

Having a trailing underscore for members is a google3 style, not LLVM style. Removing the underscore is insufficient, as we would then have 2 members with the same identifier which is not allowed (it is a compile time error). Remove the getter, and just access the renamed member that's now made public.
2024-03-05[libc] fix more readability-identifier-naming lints (#83914)Nick Desaulniers
Found via:
$ ninja -k2000 libc-lint 2>&1 | grep readability-identifier-naming

Auto fixed via:
$ clang-tidy -p build/compile_commands.json \
  -checks="-*,readability-identifier-naming" \
  <filename> --fix

This doesn't fix all instances, just the obvious simple cases where it makes sense to change the identifier names. Subsequent PRs will fix up the stragglers.
2024-02-28[libc] fix readability-identifier-naming.ConstexprFunctionCase (#83345)Nick Desaulniers
Codify that we use lower_case for readability-identifier-naming.ConstexprFunctionCase and then fix the 11 violations (rather than codify UPPER_CASE and have to fix the 170 violations).
2024-01-18[libc][NFC] Selectively disable GCC warnings (#78462)Guillaume Chatelet
2023-12-05[reland][libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939) (#74446)Guillaume Chatelet
Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h` that was still using `deferred_static_assert`.
2023-12-05Revert "[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead" (#74444)Guillaume Chatelet
Reverts llvm/llvm-project#73939
This broke libc-aarch64-ubuntu build bot https://lab.llvm.org/buildbot/#/builders/138/builds/56186
2023-12-05[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939)Guillaume Chatelet
2023-12-04[libc] Fix UB in memory utils (#74295)Guillaume Chatelet
The [standard](https://eel.is/c++draft/expr.add#4.3) forbids forming pointers to invalid objects even if the pointer is never read from or written to. This patch makes sure that we don't do pointer arithmetic on invalid pointers. Co-authored-by: Vitaly Buka <vitalybuka@google.com>
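One common way to respect that rule is to compare sizes and offsets instead of forming a pointer that might land beyond one past the end of the buffer. The sketch below is an illustration under that assumption, not the code from the patch; `sum_bytes_sketch` and `kBlockSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBlockSize = 4; // hypothetical block width

// Sums all bytes, processing whole blocks first and the tail afterwards.
inline unsigned sum_bytes_sketch(const unsigned char *data, std::size_t size) {
  unsigned total = 0;
  std::size_t offset = 0;
  // The loop condition compares sizes only. A pointer such as
  // `data + offset + kBlockSize` is never formed unless it stays inside
  // [data, data + size]. Since offset <= size, the subtraction cannot wrap.
  while (size - offset >= kBlockSize) {
    for (std::size_t i = 0; i < kBlockSize; ++i)
      total += data[offset + i];
    offset += kBlockSize;
  }
  for (; offset < size; ++offset)
    total += data[offset];
  return total;
}

// Usage: 10 bytes means two full blocks and a 2-byte tail.
inline void sum_bytes_sketch_example() {
  const unsigned char data[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  assert(sum_bytes_sketch(data, sizeof(data)) == 55u);
}
```

A loop written as `for (p = data; p + kBlockSize <= end; p += kBlockSize)` would compute `p + kBlockSize` even when fewer than `kBlockSize` bytes remain, which is the pattern the commit message warns about.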
2023-11-10[libc] Adding a version of memset with software prefetching (#70857)doshimili
Software prefetching helps recover performance when hardware prefetching is disabled. The 'LIBC_COPT_MEMSET_X86_USE_SOFTWARE_PREFETCHING' compile time option allows users to use this patch.
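The general idea of software prefetching in a fill loop can be sketched as follows. This is not the patch's implementation: the function name, the 64-byte cache line, and the prefetch distance are assumptions chosen for illustration, and `__builtin_prefetch` is a GCC/Clang builtin, so the sketch assumes one of those compilers. For brevity it delegates the actual stores to `std::memset`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kCacheLine = 64;                    // assumed line size
constexpr std::size_t kPrefetchDistance = 4 * kCacheLine; // hypothetical tuning

// Fills `count` bytes one cache line at a time, asking the CPU to fetch a
// line further ahead for writing before the current line is stored.
inline void memset_prefetch_sketch(unsigned char *dst, unsigned char value,
                                   std::size_t count) {
  std::size_t offset = 0;
  while (count - offset >= kCacheLine) {
    // Only prefetch addresses that are still inside the destination buffer.
    if (count - offset > kPrefetchDistance)
      __builtin_prefetch(dst + offset + kPrefetchDistance, /*rw=*/1,
                         /*locality=*/3);
    std::memset(dst + offset, value, kCacheLine);
    offset += kCacheLine;
  }
  // Tail shorter than one cache line (possibly empty).
  std::memset(dst + offset, value, count - offset);
}

// Usage: every byte of the buffer ends up with the fill value.
inline void memset_prefetch_sketch_example() {
  unsigned char buffer[1000] = {};
  memset_prefetch_sketch(buffer, 0xAB, sizeof(buffer));
  for (unsigned char byte : buffer)
    assert(byte == 0xAB);
}
```

Prefetching is only a hint, so the result is identical with or without it; any benefit depends on the hardware and has to be measured.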
2023-10-24[libc] Speed up memmove overlapping check (#70017)Dmitry Vyukov
Use a check that requires fewer instructions and is cheaper.

Current code:
```
1b704: 48 39 f7       cmp    %rsi,%rdi
1b707: 48 89 f0       mov    %rsi,%rax
1b70a: 48 0f 47 c7    cmova  %rdi,%rax
1b70e: 48 89 f9       mov    %rdi,%rcx
1b711: 48 0f 47 ce    cmova  %rsi,%rcx
1b715: 48 01 d1       add    %rdx,%rcx
1b718: 48 39 c1       cmp    %rax,%rcx
```

New code:
```
1b704: 48 89 f8       mov    %rdi,%rax
1b707: 48 29 f0       sub    %rsi,%rax
1b70a: 48 89 c1       mov    %rax,%rcx
1b70d: 48 f7 d9       neg    %rcx
1b710: 48 0f 48 c8    cmovs  %rax,%rcx
1b714: 48 39 d1       cmp    %rdx,%rcx
```

```
                 │  baseline   │             disjoint              │
                 │   sec/op    │   sec/op     vs base              │
memmove/Google_A   3.910n ± 0%   3.861n ± 1%  -1.26% (p=0.000 n=50)
```

``` │ baseline │ disjoint │ │ sec/op │ sec/op vs base │ memmove/1 2.724n ± 3% 2.441n ± 0% -10.37% (n=50) memmove/2 2.878n ± 0% 2.713n ± 0% -5.73% (n=50) memmove/3 2.835n ± 0% 2.593n ± 0% -8.54% (n=50) memmove/4 3.032n ± 0% 2.776n ± 0% -8.45% (p=0.000 n=50) memmove/5 2.833n ± 0% 2.600n ± 0% -8.20% (p=0.000 n=50) memmove/6 2.758n ± 0% 2.744n ± 0% -0.52% (p=0.000 n=50) memmove/7 2.762n ± 0% 2.744n ± 0% -0.63% (p=0.000 n=50) memmove/8 2.763n ± 0% 2.750n ± 0% -0.46% (p=0.000 n=50) memmove/9 3.182n ± 0% 3.269n ± 0% +2.75% (p=0.000 n=50) memmove/10 3.185n ± 0% 3.270n ± 0% +2.64% (p=0.000 n=50) memmove/11 3.188n ± 0% 3.277n ± 0% +2.79% (p=0.000 n=50) memmove/12 3.190n ± 0% 3.279n ± 0% +2.82% (p=0.000 n=50) memmove/13 3.194n ± 0% 3.281n ± 0% +2.73% (p=0.000 n=50) memmove/14 3.197n ± 0% 3.285n ± 0% +2.77% (p=0.000 n=50) memmove/15 3.198n ± 0% 3.282n ± 0% +2.62% (p=0.000 n=50) memmove/16 3.201n ± 0% 3.284n ± 0% +2.61% (p=0.000 n=50) memmove/17 3.564n ± 0% 3.320n ± 0% -6.86% (p=0.000 n=50) memmove/18 3.572n ± 0% 3.313n ± 0% -7.25% (p=0.000 n=50) memmove/19 3.572n ± 0% 3.325n ± 0% -6.94% (p=0.000 n=50) memmove/20 3.575n ± 0% 3.319n ± 0% -7.15% (p=0.000 n=50) memmove/21 3.578n ± 0% 3.327n ± 0% -7.03% (p=0.000 n=50) memmove/22 3.581n ± 0% 3.330n ± 0% -7.01% (p=0.000 n=50) memmove/23 3.582n ± 0% 3.354n ± 1% -6.37% (p=0.000 n=50) memmove/24 3.587n ± 0% 3.347n ± 1% -6.71% 
(p=0.000 n=50) memmove/25 3.591n ± 0% 3.320n ± 0% -7.55% (p=0.000 n=50) memmove/26 3.593n ± 0% 3.348n ± 0% -6.82% (p=0.000 n=50) memmove/27 3.596n ± 0% 3.346n ± 0% -6.94% (p=0.000 n=50) memmove/28 3.597n ± 0% 3.357n ± 0% -6.67% (p=0.000 n=50) memmove/29 3.601n ± 0% 3.340n ± 0% -7.23% (p=0.000 n=50) memmove/30 3.602n ± 0% 3.345n ± 0% -7.12% (p=0.000 n=50) memmove/31 3.608n ± 0% 3.357n ± 0% -6.94% (p=0.000 n=50) memmove/32 3.605n ± 0% 3.352n ± 0% -7.01% (p=0.000 n=50) memmove/33 4.128n ± 1% 3.829n ± 0% -7.23% (p=0.000 n=50) memmove/34 4.149n ± 0% 3.836n ± 0% -7.54% (p=0.000 n=50) memmove/35 4.134n ± 0% 3.839n ± 0% -7.15% (n=50) memmove/36 4.151n ± 0% 3.842n ± 0% -7.45% (n=50) memmove/37 4.152n ± 0% 3.841n ± 0% -7.49% (p=0.000 n=50) memmove/38 4.159n ± 0% 3.844n ± 0% -7.58% (p=0.000 n=50) memmove/39 4.165n ± 0% 3.841n ± 0% -7.78% (p=0.000 n=50) memmove/40 4.162n ± 0% 3.837n ± 0% -7.81% (p=0.000 n=50) memmove/41 4.161n ± 0% 3.845n ± 0% -7.58% (p=0.000 n=50) memmove/42 4.164n ± 0% 3.851n ± 0% -7.53% (p=0.000 n=50) memmove/43 4.165n ± 0% 3.843n ± 0% -7.74% (p=0.000 n=50) memmove/44 4.175n ± 0% 3.847n ± 0% -7.83% (p=0.000 n=50) memmove/45 4.170n ± 0% 3.849n ± 0% -7.70% (p=0.000 n=50) memmove/46 4.175n ± 0% 3.850n ± 0% -7.79% (p=0.000 n=50) memmove/47 4.180n ± 0% 3.851n ± 0% -7.87% (p=0.000 n=50) memmove/48 4.178n ± 0% 3.852n ± 0% -7.81% (p=0.000 n=50) memmove/49 4.175n ± 0% 3.851n ± 0% -7.76% (n=50) memmove/50 4.178n ± 0% 3.855n ± 0% -7.73% (p=0.000 n=50) memmove/51 4.190n ± 0% 3.859n ± 0% -7.91% (p=0.000 n=50) memmove/52 4.188n ± 0% 3.859n ± 0% -7.84% (p=0.000 n=50) memmove/53 4.191n ± 0% 3.863n ± 0% -7.82% (p=0.000 n=50) memmove/54 4.192n ± 0% 3.860n ± 0% -7.91% (p=0.000 n=50) memmove/55 4.192n ± 0% 3.869n ± 0% -7.70% (p=0.000 n=50) memmove/56 4.204n ± 0% 3.866n ± 0% -8.05% (p=0.000 n=50) memmove/57 4.198n ± 0% 3.864n ± 0% -7.95% (p=0.000 n=50) memmove/58 4.202n ± 0% 3.865n ± 0% -8.02% (p=0.000 n=50) memmove/59 4.208n ± 0% 3.868n ± 0% -8.09% (p=0.000 n=50) memmove/60 
4.205n ± 0% 3.873n ± 0% -7.89% (p=0.000 n=50) memmove/61 4.212n ± 0% 3.872n ± 0% -8.08% (p=0.000 n=50) memmove/62 4.214n ± 0% 3.870n ± 0% -8.16% (p=0.000 n=50) memmove/63 4.215n ± 0% 3.877n ± 0% -8.02% (p=0.000 n=50) memmove/64 4.217n ± 0% 3.881n ± 0% -7.99% (p=0.000 n=50) memmove/65 4.990n ± 0% 4.683n ± 0% -6.15% (p=0.000 n=50) memmove/66 5.022n ± 0% 4.719n ± 0% -6.03% (p=0.000 n=50) memmove/67 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50) memmove/68 5.035n ± 0% 4.724n ± 0% -6.18% (p=0.000 n=50) memmove/69 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50) memmove/70 5.040n ± 0% 4.728n ± 0% -6.19% (p=0.000 n=50) memmove/71 5.053n ± 0% 4.728n ± 0% -6.43% (p=0.000 n=50) memmove/72 5.050n ± 0% 4.732n ± 0% -6.29% (p=0.000 n=50) memmove/73 5.049n ± 0% 4.733n ± 0% -6.24% (p=0.000 n=50) memmove/74 5.054n ± 0% 4.734n ± 0% -6.34% (p=0.000 n=50) memmove/75 5.063n ± 0% 4.736n ± 0% -6.46% (p=0.000 n=50) memmove/76 5.046n ± 0% 4.741n ± 0% -6.04% (p=0.000 n=50) memmove/77 5.057n ± 0% 4.741n ± 0% -6.25% (p=0.000 n=50) memmove/78 5.077n ± 0% 4.739n ± 0% -6.65% (p=0.000 n=50) memmove/79 5.074n ± 0% 4.746n ± 0% -6.46% (p=0.000 n=50) memmove/80 5.085n ± 0% 4.747n ± 0% -6.65% (p=0.000 n=50) memmove/81 5.077n ± 0% 4.735n ± 0% -6.74% (p=0.000 n=50) memmove/82 5.087n ± 0% 4.747n ± 0% -6.68% (p=0.000 n=50) memmove/83 5.087n ± 0% 4.754n ± 0% -6.56% (p=0.000 n=50) memmove/84 5.096n ± 0% 4.753n ± 0% -6.73% (p=0.000 n=50) memmove/85 5.082n ± 0% 4.749n ± 0% -6.55% (p=0.000 n=50) memmove/86 5.103n ± 0% 4.752n ± 0% -6.87% (p=0.000 n=50) memmove/87 5.096n ± 0% 4.760n ± 0% -6.61% (p=0.000 n=50) memmove/88 5.099n ± 0% 4.765n ± 0% -6.55% (p=0.000 n=50) memmove/89 5.104n ± 0% 4.757n ± 0% -6.79% (p=0.000 n=50) memmove/90 5.117n ± 0% 4.767n ± 0% -6.84% (p=0.000 n=50) memmove/91 5.100n ± 0% 4.766n ± 0% -6.54% (p=0.000 n=50) memmove/92 5.103n ± 0% 4.763n ± 0% -6.67% (p=0.000 n=50) memmove/93 5.115n ± 0% 4.772n ± 0% -6.71% (p=0.000 n=50) memmove/94 5.117n ± 0% 4.769n ± 0% -6.80% (p=0.000 n=50) memmove/95 
5.131n ± 0% 4.775n ± 0% -6.94% (p=0.000 n=50) memmove/96 5.129n ± 0% 4.772n ± 0% -6.97% (p=0.000 n=50) memmove/97 5.130n ± 0% 4.764n ± 0% -7.13% (p=0.000 n=50) memmove/98 5.134n ± 0% 4.780n ± 0% -6.89% (p=0.000 n=50) memmove/99 5.141n ± 0% 4.780n ± 0% -7.03% (p=0.000 n=50) memmove/100 5.141n ± 0% 4.780n ± 0% -7.02% (p=0.000 n=50) memmove/101 5.150n ± 0% 4.782n ± 0% -7.14% (p=0.000 n=50) memmove/102 5.150n ± 0% 4.790n ± 0% -6.99% (p=0.000 n=50) memmove/103 5.156n ± 0% 4.788n ± 0% -7.14% (n=50) memmove/104 5.157n ± 0% 4.793n ± 0% -7.05% (p=0.000 n=50) memmove/105 5.147n ± 0% 4.791n ± 0% -6.90% (p=0.000 n=50) memmove/106 5.167n ± 0% 4.793n ± 0% -7.23% (p=0.000 n=50) memmove/107 5.165n ± 0% 4.801n ± 0% -7.06% (p=0.000 n=50) memmove/108 5.173n ± 0% 4.800n ± 0% -7.21% (p=0.000 n=50) memmove/109 5.173n ± 0% 4.797n ± 0% -7.27% (p=0.000 n=50) memmove/110 5.171n ± 0% 4.808n ± 0% -7.01% (p=0.000 n=50) memmove/111 5.180n ± 0% 4.799n ± 0% -7.36% (p=0.000 n=50) memmove/112 5.185n ± 0% 4.812n ± 0% -7.19% (p=0.000 n=50) memmove/113 5.187n ± 0% 4.797n ± 0% -7.53% (p=0.000 n=50) memmove/114 5.183n ± 0% 4.809n ± 0% -7.21% (n=50) memmove/115 5.193n ± 0% 4.811n ± 0% -7.36% (p=0.000 n=50) memmove/116 5.196n ± 0% 4.815n ± 0% -7.32% (p=0.000 n=50) memmove/117 5.199n ± 0% 4.816n ± 0% -7.37% (p=0.000 n=50) memmove/118 5.198n ± 0% 4.811n ± 0% -7.45% (p=0.000 n=50) memmove/119 5.203n ± 0% 4.818n ± 0% -7.40% (p=0.000 n=50) memmove/120 5.195n ± 0% 4.823n ± 0% -7.16% (p=0.000 n=50) memmove/121 5.203n ± 0% 4.812n ± 0% -7.51% (p=0.000 n=50) memmove/122 5.204n ± 0% 4.818n ± 0% -7.42% (n=50) memmove/123 5.202n ± 0% 4.822n ± 0% -7.31% (p=0.000 n=50) memmove/124 5.216n ± 0% 4.823n ± 0% -7.54% (p=0.000 n=50) memmove/125 5.227n ± 0% 4.823n ± 0% -7.72% (p=0.000 n=50) memmove/126 5.235n ± 0% 4.830n ± 0% -7.74% (p=0.000 n=50) memmove/127 5.237n ± 0% 4.833n ± 0% -7.72% (p=0.000 n=50) memmove/128 5.241n ± 0% 4.832n ± 0% -7.81% (p=0.000 n=50) memmove/129 6.460n ± 0% 5.858n ± 0% -9.31% (p=0.000 n=50) 
memmove/130 7.539n ± 0% 6.634n ± 0% -12.00% (p=0.000 n=50) memmove/131 7.542n ± 0% 6.623n ± 0% -12.18% (p=0.000 n=50) memmove/132 7.527n ± 0% 6.667n ± 1% -11.43% (p=0.000 n=50) memmove/133 7.521n ± 0% 6.631n ± 0% -11.83% (p=0.000 n=50) memmove/134 7.531n ± 0% 6.642n ± 0% -11.81% (p=0.000 n=50) memmove/135 7.541n ± 0% 6.692n ± 1% -11.25% (p=0.000 n=50) memmove/136 7.549n ± 0% 6.657n ± 0% -11.81% (p=0.000 n=50) memmove/137 7.544n ± 0% 6.646n ± 0% -11.90% (p=0.000 n=50) memmove/138 7.557n ± 0% 6.673n ± 1% -11.70% (p=0.000 n=50) memmove/139 7.545n ± 0% 6.654n ± 0% -11.81% (n=50) memmove/140 7.559n ± 0% 6.680n ± 1% -11.63% (p=0.000 n=50) memmove/141 7.560n ± 0% 6.664n ± 0% -11.85% (p=0.000 n=50) memmove/142 7.556n ± 0% 6.679n ± 0% -11.62% (p=0.000 n=50) memmove/143 7.570n ± 0% 6.683n ± 1% -11.71% (p=0.000 n=50) memmove/144 7.586n ± 0% 6.683n ± 0% -11.91% (p=0.000 n=50) memmove/145 7.593n ± 0% 6.665n ± 0% -12.22% (p=0.000 n=50) memmove/146 7.591n ± 0% 6.665n ± 0% -12.20% (p=0.000 n=50) memmove/147 7.598n ± 0% 6.665n ± 0% -12.27% (p=0.000 n=50) memmove/148 7.598n ± 0% 6.670n ± 0% -12.21% (p=0.000 n=50) memmove/149 7.593n ± 0% 6.691n ± 0% -11.88% (p=0.000 n=50) memmove/150 7.625n ± 0% 6.713n ± 1% -11.97% (p=0.000 n=50) memmove/151 7.603n ± 0% 6.710n ± 1% -11.74% (p=0.000 n=50) memmove/152 7.613n ± 0% 6.701n ± 1% -11.97% (p=0.000 n=50) memmove/153 7.595n ± 0% 6.710n ± 0% -11.65% (p=0.000 n=50) memmove/154 7.614n ± 0% 6.721n ± 0% -11.74% (p=0.000 n=50) memmove/155 7.615n ± 0% 6.709n ± 0% -11.89% (p=0.000 n=50) memmove/156 7.613n ± 0% 6.693n ± 0% -12.08% (p=0.000 n=50) memmove/157 7.628n ± 0% 6.708n ± 0% -12.05% (p=0.000 n=50) memmove/158 7.629n ± 0% 6.706n ± 0% -12.10% (p=0.000 n=50) memmove/159 7.639n ± 0% 6.724n ± 0% -11.98% (p=0.000 n=50) memmove/160 7.619n ± 0% 6.702n ± 0% -12.04% (p=0.000 n=50) memmove/161 7.653n ± 0% 6.698n ± 0% -12.49% (p=0.000 n=50) memmove/162 8.104n ± 0% 7.140n ± 1% -11.89% (p=0.000 n=50) memmove/163 8.141n ± 0% 7.187n ± 1% -11.72% (p=0.000 n=50) 
memmove/164 8.154n ± 0% 7.107n ± 0% -12.84% (p=0.000 n=50) memmove/165 8.143n ± 0% 7.117n ± 0% -12.59% (p=0.000 n=50) memmove/166 8.176n ± 0% 7.110n ± 0% -13.04% (p=0.000 n=50) memmove/167 8.194n ± 0% 7.168n ± 1% -12.52% (p=0.000 n=50) memmove/168 8.214n ± 0% 7.188n ± 1% -12.50% (p=0.000 n=50) memmove/169 8.220n ± 0% 7.242n ± 1% -11.90% (p=0.000 n=50) memmove/170 8.228n ± 0% 7.244n ± 1% -11.96% (p=0.000 n=50) memmove/171 8.263n ± 0% 7.184n ± 0% -13.06% (p=0.000 n=50) memmove/172 8.259n ± 0% 7.325n ± 1% -11.31% (p=0.000 n=50) memmove/173 8.271n ± 0% 7.225n ± 0% -12.65% (p=0.000 n=50) memmove/174 8.284n ± 0% 7.287n ± 1% -12.04% (p=0.000 n=50) memmove/175 8.289n ± 0% 7.282n ± 1% -12.15% (p=0.000 n=50) memmove/176 8.309n ± 0% 7.328n ± 1% -11.81% (p=0.000 n=50) memmove/177 8.317n ± 0% 7.264n ± 1% -12.67% (p=0.000 n=50) memmove/178 8.302n ± 0% 7.342n ± 1% -11.57% (p=0.000 n=50) memmove/179 8.309n ± 0% 7.357n ± 1% -11.45% (p=0.000 n=50) memmove/180 8.304n ± 0% 7.318n ± 1% -11.87% (p=0.000 n=50) memmove/181 8.312n ± 0% 7.363n ± 1% -11.42% (p=0.000 n=50) memmove/182 8.315n ± 0% 7.320n ± 1% -11.96% (p=0.000 n=50) memmove/183 8.330n ± 0% 7.286n ± 1% -12.53% (p=0.000 n=50) memmove/184 8.310n ± 0% 7.324n ± 1% -11.86% (p=0.000 n=50) memmove/185 8.303n ± 0% 7.267n ± 1% -12.47% (p=0.000 n=50) memmove/186 8.287n ± 0% 7.312n ± 1% -11.76% (p=0.000 n=50) memmove/187 8.298n ± 0% 7.395n ± 2% -10.88% (p=0.000 n=50) memmove/188 8.296n ± 0% 7.339n ± 1% -11.54% (p=0.000 n=50) memmove/189 8.306n ± 0% 7.299n ± 1% -12.12% (p=0.000 n=50) memmove/190 8.281n ± 0% 7.309n ± 1% -11.74% (p=0.000 n=50) memmove/191 8.299n ± 0% 7.282n ± 1% -12.26% (p=0.000 n=50) memmove/192 8.281n ± 0% 7.335n ± 1% -11.41% (p=0.000 n=50) memmove/193 8.299n ± 0% 7.325n ± 1% -11.74% (p=0.000 n=50) memmove/194 8.641n ± 0% 8.034n ± 0% -7.02% (p=0.000 n=50) memmove/195 8.667n ± 0% 8.073n ± 0% -6.85% (p=0.000 n=50) memmove/196 8.666n ± 0% 8.030n ± 0% -7.34% (p=0.000 n=50) memmove/197 8.660n ± 0% 8.096n ± 1% -6.51% (p=0.000 
n=50) memmove/198 8.688n ± 0% 8.047n ± 0% -7.39% (p=0.000 n=50) memmove/199 8.678n ± 0% 8.061n ± 0% -7.11% (p=0.000 n=50) memmove/200 8.669n ± 0% 8.034n ± 0% -7.32% (p=0.000 n=50) memmove/201 8.692n ± 0% 8.061n ± 0% -7.26% (p=0.000 n=50) memmove/202 8.668n ± 0% 8.060n ± 0% -7.02% (p=0.000 n=50) memmove/203 8.687n ± 0% 8.066n ± 0% -7.15% (p=0.000 n=50) memmove/204 8.699n ± 0% 8.076n ± 0% -7.16% (p=0.000 n=50) memmove/205 8.676n ± 0% 8.085n ± 0% -6.82% (p=0.000 n=50) memmove/206 8.684n ± 0% 8.101n ± 1% -6.71% (p=0.000 n=50) memmove/207 8.725n ± 0% 8.099n ± 0% -7.18% (p=0.000 n=50) memmove/208 8.674n ± 0% 8.073n ± 0% -6.92% (p=0.000 n=50) memmove/209 8.697n ± 0% 8.088n ± 0% -7.01% (p=0.000 n=50) memmove/210 8.733n ± 0% 8.076n ± 0% -7.53% (p=0.000 n=50) memmove/211 8.732n ± 0% 8.104n ± 0% -7.19% (p=0.000 n=50) memmove/212 8.730n ± 0% 8.091n ± 0% -7.32% (p=0.000 n=50) memmove/213 8.728n ± 0% 8.100n ± 0% -7.19% (p=0.000 n=50) memmove/214 8.744n ± 1% 8.081n ± 1% -7.57% (p=0.000 n=50) memmove/215 8.734n ± 0% 8.150n ± 0% -6.68% (p=0.000 n=50) memmove/216 8.748n ± 0% 8.116n ± 0% -7.23% (p=0.000 n=50) memmove/217 8.751n ± 0% 8.129n ± 1% -7.11% (p=0.000 n=50) memmove/218 8.747n ± 0% 8.114n ± 0% -7.23% (p=0.000 n=50) memmove/219 8.733n ± 0% 8.159n ± 0% -6.57% (p=0.000 n=50) memmove/220 8.764n ± 0% 8.145n ± 0% -7.06% (p=0.000 n=50) memmove/221 8.764n ± 0% 8.142n ± 0% -7.10% (p=0.000 n=50) memmove/222 8.775n ± 0% 8.152n ± 0% -7.10% (p=0.000 n=50) memmove/223 8.771n ± 0% 8.143n ± 0% -7.16% (p=0.000 n=50) memmove/224 8.778n ± 0% 8.175n ± 1% -6.87% (p=0.000 n=50) memmove/225 8.794n ± 0% 8.138n ± 0% -7.45% (p=0.000 n=50) memmove/226 10.13n ± 0% 10.06n ± 0% -0.71% (p=0.000 n=50) memmove/227 10.14n ± 0% 10.08n ± 0% -0.53% (p=0.000 n=50) memmove/228 10.13n ± 0% 10.08n ± 0% -0.56% (p=0.000 n=50) memmove/229 10.17n ± 0% 10.11n ± 0% -0.56% (p=0.000 n=50) memmove/230 10.17n ± 0% 10.13n ± 0% -0.38% (p=0.003 n=50) memmove/231 10.16n ± 0% 10.12n ± 0% -0.41% (p=0.001 n=50) memmove/232 10.19n ± 
0% 10.12n ± 0% -0.67% (p=0.000 n=50) memmove/233 10.21n ± 0% 10.14n ± 0% -0.71% (p=0.000 n=50) memmove/234 10.24n ± 0% 10.16n ± 0% -0.79% (p=0.000 n=50) memmove/235 10.24n ± 0% 10.16n ± 0% -0.76% (p=0.000 n=50) memmove/236 10.25n ± 0% 10.16n ± 0% -0.81% (p=0.000 n=50) memmove/237 10.24n ± 0% 10.17n ± 0% -0.69% (p=0.000 n=50) memmove/238 10.27n ± 0% 10.19n ± 0% -0.79% (p=0.000 n=50) memmove/239 10.29n ± 0% 10.19n ± 0% -0.90% (p=0.000 n=50) memmove/240 10.30n ± 0% 10.20n ± 0% -0.95% (p=0.000 n=50) memmove/241 10.29n ± 0% 10.20n ± 0% -0.91% (p=0.000 n=50) memmove/242 10.30n ± 0% 10.22n ± 0% -0.80% (p=0.000 n=50) memmove/243 10.32n ± 0% 10.23n ± 0% -0.87% (p=0.000 n=50) memmove/244 10.32n ± 0% 10.24n ± 0% -0.74% (p=0.000 n=50) memmove/245 10.33n ± 0% 10.23n ± 0% -0.97% (p=0.000 n=50) memmove/246 10.33n ± 0% 10.24n ± 0% -0.92% (p=0.000 n=50) memmove/247 10.31n ± 0% 10.24n ± 0% -0.69% (p=0.000 n=50) memmove/248 10.32n ± 0% 10.26n ± 0% -0.55% (p=0.000 n=50) memmove/249 10.33n ± 0% 10.28n ± 0% -0.52% (p=0.000 n=50) memmove/250 10.34n ± 0% 10.27n ± 0% -0.66% (p=0.000 n=50) memmove/251 10.32n ± 0% 10.27n ± 0% -0.45% (p=0.000 n=50) memmove/252 10.34n ± 0% 10.30n ± 0% -0.39% (p=0.005 n=50) memmove/253 10.33n ± 0% 10.27n ± 0% -0.57% (p=0.000 n=50) memmove/254 10.33n ± 0% 10.27n ± 0% -0.54% (p=0.000 n=50) memmove/255 10.34n ± 0% 10.29n ± 0% -0.50% (p=0.002 n=50) memmove/256 10.36n ± 0% 10.31n ± 0% -0.44% (p=0.006 n=50) memmove/257 10.33n ± 0% 10.29n ± 0% -0.36% (p=0.004 n=50) geomean 6.142n 5.696n -7.26% ```
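Reading the two assembly listings above, the older sequence appears to compare `min(dst, src) + size` against `max(dst, src)`, while the newer one compares the absolute distance between the two pointers against `size`. The sketch below expresses both checks in C++ as an interpretation of those listings, not as the library's source. The function names are hypothetical, and addresses are compared as `std::uintptr_t` values to keep the example free of cross-object pointer subtraction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Older style: the lower buffer must end at or before the higher one starts.
// Assumes `lo + size` does not wrap, which holds for valid buffers.
inline bool is_disjoint_minmax_sketch(const void *p1, const void *p2,
                                      std::size_t size) {
  const auto a = reinterpret_cast<std::uintptr_t>(p1);
  const auto b = reinterpret_cast<std::uintptr_t>(p2);
  const std::uintptr_t lo = a < b ? a : b;
  const std::uintptr_t hi = a < b ? b : a;
  return lo + size <= hi;
}

// Newer style: the distance between the buffers must be at least `size`.
inline bool is_disjoint_absdiff_sketch(const void *p1, const void *p2,
                                       std::size_t size) {
  const auto a = reinterpret_cast<std::uintptr_t>(p1);
  const auto b = reinterpret_cast<std::uintptr_t>(p2);
  const std::uintptr_t distance = a < b ? b - a : a - b;
  return size <= distance;
}

// Usage: both checks agree on overlapping and disjoint ranges.
inline void is_disjoint_sketch_example() {
  char buffer[32] = {};
  assert(is_disjoint_absdiff_sketch(buffer, buffer + 16, 16));
  assert(!is_disjoint_absdiff_sketch(buffer, buffer + 8, 16));
  assert(is_disjoint_minmax_sketch(buffer, buffer + 16, 16));
  assert(!is_disjoint_minmax_sketch(buffer, buffer + 8, 16));
}
```

The second form needs one subtraction and one comparison once the absolute value is computed, which matches the shorter instruction sequence shown in the commit message.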
2023-09-26[libc] Mass replace enclosing namespace (#67032)Guillaume Chatelet
This is step 4 of https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
2023-09-21[libc][clang-tidy] Add llvm-header-guard to get consistent naming and prevent file copy/paste issues. (#66477)Guillaume Chatelet
2023-08-07[libc] Clean up required LIBC_INLINE uses in src/stringRoland McGrath
This was generated using clang-tidy and clang-apply-replacements, on src/string/*.cpp for just the llvmlibc-inline-function-decl check, after applying https://reviews.llvm.org/D157164, and then some manual fixup. Reviewed By: abrachet Differential Revision: https://reviews.llvm.org/D157169
2023-06-30[libc] Fix more inline definitionsRoland McGrath
Fix a bunch more instances of incorrect use of the `static` keyword and missing use of LIBC_INLINE and LIBC_INLINE_VAR macros. Note that even forward declarations and generic template declarations must follow the prescribed patterns for libc code so that they match every definition and all template specializations.

Reviewed By: Caslyn
Differential Revision: https://reviews.llvm.org/D154260
2023-06-30[libc] Improve memcmp latency and codegenGuillaume Chatelet
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'.

I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches.

The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
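As an illustration of what a branchless three-way comparison of two `uint32_t` values can look like, here is one possible formulation. It is a sketch under stated assumptions and is not claimed to be the code from the patch: `cmp_uint32_sketch` is a hypothetical name, and the right shift of a negative value is assumed to be arithmetic (guaranteed since C++20, and the usual behavior of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Returns a negative, zero, or positive value when a < b, a == b, or a > b.
constexpr std::int32_t cmp_uint32_sketch(std::uint32_t a, std::uint32_t b) {
  // The 64-bit difference cannot overflow for 32-bit inputs.
  const std::int64_t diff =
      static_cast<std::int64_t>(a) - static_cast<std::int64_t>(b);
  // Halving makes the value fit in 32 bits while keeping its sign. OR-ing in
  // the low 16 bits keeps a difference of 1 from collapsing to 0.
  return static_cast<std::int32_t>((diff >> 1) | (diff & 0xFFFF));
}

// Usage: only the sign of the result is meaningful.
inline void cmp_uint32_sketch_example() {
  assert(cmp_uint32_sketch(5, 5) == 0);
  assert(cmp_uint32_sketch(1, 2) < 0);
  assert(cmp_uint32_sketch(2, 1) > 0);
  assert(cmp_uint32_sketch(0xFFFFFFFFu, 0) > 0);
  assert(cmp_uint32_sketch(0, 0xFFFFFFFFu) < 0);
}
```

For a `memcmp`-style lexicographic result, the two words would first have to be loaded so that the earliest byte is the most significant, which on a little-endian machine means byte-swapping them before the comparison.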
2023-06-21Revert D148717 "[libc] Improve memcmp latency and codegen"Guillaume Chatelet
Once integrated in our codebase the patch triggered a bunch of failing tests. We do not yet understand where the bug is but we revert it to move forward with integration. This reverts commit 5e32765c15ab8df3d2635a2bb5078c5b1d5714d5.
2023-06-14[libc] Dispatch memmove to memcpy when buffers are disjointGuillaume Chatelet
Most of the time `memmove` is called on buffers that are disjoint; in that case we can use `memcpy`, which is faster.

The additional test is branchless on x86, aarch64 and RISCV with the zbb extension (bitmanip). On x86 this patch adds a latency of 2 to 3 cycles.

Before
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------
BM_Memmove/0/0_median 5.00 ns 5.00 ns 10 bytes_per_cycle=1.25477/s bytes_per_second=2.62933G/s items_per_second=199.87M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 6.21 ns 6.21 ns 10 bytes_per_cycle=3.22173/s bytes_per_second=6.75106G/s items_per_second=160.955M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 8.09 ns 8.09 ns 10 bytes_per_cycle=5.31462/s bytes_per_second=11.1366G/s items_per_second=123.603M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 5.95 ns 5.95 ns 10 bytes_per_cycle=2.71865/s bytes_per_second=5.69687G/s items_per_second=167.967M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 5.63 ns 5.63 ns 10 bytes_per_cycle=2.28294/s bytes_per_second=4.78383G/s items_per_second=177.615M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 5.68 ns 5.68 ns 10 bytes_per_cycle=2.16798/s bytes_per_second=4.54295G/s items_per_second=176.015M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 7.46 ns 7.46 ns 10 bytes_per_cycle=3.97619/s bytes_per_second=8.332G/s items_per_second=134.044M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 5.40 ns 5.40 ns 10 bytes_per_cycle=1.79695/s bytes_per_second=3.76546G/s items_per_second=185.211M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 5.62 ns 5.62 ns 10 bytes_per_cycle=3.18747/s bytes_per_second=6.67927G/s items_per_second=177.983M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 101 ns 101 ns 10 bytes_per_cycle=9.77359/s bytes_per_second=20.4803G/s items_per_second=9.9333M/s __llvm_libc::memmove,uniform 384 to 4096
```

After
```
BM_Memmove/0/0_median 3.57 ns 3.57 ns 10 bytes_per_cycle=1.71375/s bytes_per_second=3.59112G/s items_per_second=280.411M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 4.52 ns 4.52 ns 10 bytes_per_cycle=4.47557/s bytes_per_second=9.37843G/s items_per_second=221.427M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 5.70 ns 5.70 ns 10 bytes_per_cycle=7.37396/s bytes_per_second=15.4519G/s items_per_second=175.399M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 4.47 ns 4.47 ns 10 bytes_per_cycle=3.4148/s bytes_per_second=7.15563G/s items_per_second=223.743M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 4.53 ns 4.53 ns 10 bytes_per_cycle=2.86071/s bytes_per_second=5.99454G/s items_per_second=220.69M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 4.19 ns 4.19 ns 10 bytes_per_cycle=2.5484/s bytes_per_second=5.3401G/s items_per_second=238.924M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 5.02 ns 5.02 ns 10 bytes_per_cycle=5.94164/s bytes_per_second=12.4505G/s items_per_second=199.14M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 4.03 ns 4.03 ns 10 bytes_per_cycle=2.47028/s bytes_per_second=5.17641G/s items_per_second=247.906M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 4.70 ns 4.70 ns 10 bytes_per_cycle=3.84975/s bytes_per_second=8.06706G/s items_per_second=212.72M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 90.7 ns 90.7 ns 10 bytes_per_cycle=10.8681/s bytes_per_second=22.7739G/s items_per_second=11.02M/s __llvm_libc::memmove,uniform 384 to 4096
```

Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D152811
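The dispatch described in this commit can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: `memmove_dispatch_sketch` is a hypothetical name, the disjointness test is one plausible formulation, and the overlapping case uses plain byte loops where a real implementation would use wider accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

inline void *memmove_dispatch_sketch(void *dst, const void *src,
                                     std::size_t count) {
  const auto d = reinterpret_cast<std::uintptr_t>(dst);
  const auto s = reinterpret_cast<std::uintptr_t>(src);
  const std::uintptr_t distance = d < s ? s - d : d - s;
  // Disjoint buffers: the faster memcpy path is safe.
  if (count <= distance)
    return std::memcpy(dst, src, count);
  auto *out = static_cast<unsigned char *>(dst);
  const auto *in = static_cast<const unsigned char *>(src);
  if (d < s) {
    // Destination starts first: copy forward so bytes are read before
    // they are overwritten.
    for (std::size_t i = 0; i < count; ++i)
      out[i] = in[i];
  } else {
    // Destination starts later: copy backward for the same reason.
    for (std::size_t i = count; i > 0; --i)
      out[i - 1] = in[i - 1];
  }
  return dst;
}

// Usage: one overlapping move in each direction and one disjoint move.
inline void memmove_dispatch_sketch_example() {
  char forward[] = "abcdefgh";
  memmove_dispatch_sketch(forward, forward + 2, 4);
  assert(std::memcmp(forward, "cdefefgh", 8) == 0);

  char backward[] = "abcdefgh";
  memmove_dispatch_sketch(backward + 2, backward, 4);
  assert(std::memcmp(backward, "ababcdgh", 8) == 0);

  char disjoint[] = "abcdefgh";
  memmove_dispatch_sketch(disjoint + 4, disjoint, 4);
  assert(std::memcmp(disjoint, "abcdabcd", 8) == 0);
}
```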
2023-06-12[libc] Improve memcmp latency and codegenGuillaume Chatelet
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'.

I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches.

The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
2023-06-12Revert D148717 "[libc] Improve memcmp latency and codegen"Guillaume Chatelet
This broke the aarch64 debug buildbot: https://lab.llvm.org/buildbot/#/builders/223/builds/21703
This reverts commit bd4f978754758d5ef29d1f10370f45362da3de37.
2023-06-12[libc] Improve memcmp latency and codegenGuillaume Chatelet
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'.

I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches.

The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
2023-06-05Revert D148717 "[libc] Improve memcmp latency and codegen"Guillaume Chatelet
This reverts commit 9ec6ebd3ceabb29482aa18a64b943788b65223dc. The patch broke RISCV and aarch64 buildbots.
2023-06-05[libc] Improve memcmp latency and codegenGuillaume Chatelet
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'.

I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches.

The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
2023-05-23[libc][math] Implement double precision log2 function correctly rounded to all rounding modes.Tue Ly
Implement double precision log2 function correctly rounded to all rounding modes.

See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.

**Performance**

- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%.

- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;
```

- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
```

- Accurate pass latency:
```
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
177.632
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332
-- LIBC latency -- with FMA
459.751
-- LIBC latency -- without FMA
463.850
```

Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D150374
2023-05-16[libc] Add optimized bcmp for RISCVGuillaume Chatelet
This patch adds two versions of bcmp optimized for architectures where unaligned accesses are either illegal or extremely slow. It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Bcmp` on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before:
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
Load Average: 7.03, 5.98, 3.71
----------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------
BM_Bcmp/0/0 102 ns 60.5 ns 11662336 bytes_per_cycle=0.122696/s bytes_per_second=175.518M/s items_per_second=16.5258M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0 328 ns 172 ns 3737600 bytes_per_cycle=0.15256/s bytes_per_second=218.238M/s items_per_second=5.80575M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0 199 ns 99.7 ns 7019520 bytes_per_cycle=0.141897/s bytes_per_second=202.986M/s items_per_second=10.032M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0 173 ns 86.5 ns 8361984 bytes_per_cycle=0.13863/s bytes_per_second=198.312M/s items_per_second=11.5669M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0 105 ns 51.8 ns 13213696 bytes_per_cycle=0.116399/s bytes_per_second=166.51M/s items_per_second=19.2931M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0 167 ns 93.9 ns 7853056 bytes_per_cycle=0.139432/s bytes_per_second=199.459M/s items_per_second=10.6503M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0 262 ns 165 ns 3931136 bytes_per_cycle=0.151516/s bytes_per_second=216.745M/s items_per_second=6.07091M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0 168 ns 105 ns 6665216 bytes_per_cycle=0.143159/s bytes_per_second=204.791M/s items_per_second=9.52163M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0 108 ns 68.0 ns 10175488 bytes_per_cycle=0.125504/s bytes_per_second=179.535M/s items_per_second=14.701M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0 15371 ns 9007 ns 78848 bytes_per_cycle=0.166128/s bytes_per_second=237.648M/s items_per_second=111.031k/s __llvm_libc::bcmp,uniform 384 to 4096
```

After:
```
BM_Bcmp/0/0 74.2 ns 49.7 ns 14306304 bytes_per_cycle=0.148927/s bytes_per_second=213.042M/s items_per_second=20.1101M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0 108 ns 68.1 ns 10350592 bytes_per_cycle=0.411197/s bytes_per_second=588.222M/s items_per_second=14.6849M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0 80.2 ns 56.0 ns 12386304 bytes_per_cycle=0.258588/s bytes_per_second=369.912M/s items_per_second=17.8585M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0 92.4 ns 55.7 ns 12555264 bytes_per_cycle=0.206835/s bytes_per_second=295.88M/s items_per_second=17.943M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0 79.3 ns 46.8 ns 14288896 bytes_per_cycle=0.125872/s bytes_per_second=180.061M/s items_per_second=21.3611M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0 98.0 ns 57.9 ns 12232704 bytes_per_cycle=0.268815/s bytes_per_second=384.543M/s items_per_second=17.2711M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0 132 ns 65.5 ns 10474496 bytes_per_cycle=0.417246/s bytes_per_second=596.875M/s items_per_second=15.2673M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0 101 ns 60.9 ns 11505664 bytes_per_cycle=0.253733/s bytes_per_second=362.968M/s items_per_second=16.4202M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0 72.5 ns 50.2 ns 14082048 bytes_per_cycle=0.183262/s bytes_per_second=262.158M/s items_per_second=19.9271M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0 852 ns 803 ns 854016 bytes_per_cycle=1.85028/s bytes_per_second=2.58481G/s items_per_second=1.24597M/s __llvm_libc::bcmp,uniform 384 to 4096
```

For comparison with glibc:
```
BM_Bcmp/0/0 106 ns 52.6 ns 12906496 bytes_per_cycle=0.142072/s bytes_per_second=203.235M/s items_per_second=19.0271M/s glibc::bcmp,memcmp Google A
BM_Bcmp/1/0 132 ns 77.1 ns 8905728 bytes_per_cycle=0.365072/s bytes_per_second=522.239M/s items_per_second=12.9782M/s glibc::bcmp,memcmp Google B
BM_Bcmp/2/0 122 ns 62.3 ns 10909696 bytes_per_cycle=0.222667/s bytes_per_second=318.527M/s items_per_second=16.0563M/s glibc::bcmp,memcmp Google D
BM_Bcmp/3/0 99.5 ns 64.2 ns 11074560 bytes_per_cycle=0.185126/s bytes_per_second=264.825M/s items_per_second=15.5674M/s glibc::bcmp,memcmp Google L
BM_Bcmp/4/0 86.6 ns 50.2 ns 13488128 bytes_per_cycle=0.117941/s bytes_per_second=168.717M/s items_per_second=19.9053M/s glibc::bcmp,memcmp Google M
BM_Bcmp/5/0 106 ns 61.4 ns 11344896 bytes_per_cycle=0.248968/s bytes_per_second=356.151M/s items_per_second=16.284M/s glibc::bcmp,memcmp Google Q
BM_Bcmp/6/0 145 ns 71.9 ns 10046464 bytes_per_cycle=0.389814/s bytes_per_second=557.633M/s items_per_second=13.9019M/s glibc::bcmp,memcmp Google S
BM_Bcmp/7/0 119 ns 65.6 ns 10718208 bytes_per_cycle=0.243756/s bytes_per_second=348.696M/s items_per_second=15.2329M/s glibc::bcmp,memcmp Google U
BM_Bcmp/8/0 86.4 ns 54.5 ns 13250560 bytes_per_cycle=0.154831/s bytes_per_second=221.488M/s items_per_second=18.3532M/s glibc::bcmp,memcmp Google W
BM_Bcmp/9/0 1090 ns 604 ns 1186816 bytes_per_cycle=2.53848/s bytes_per_second=3.54622G/s items_per_second=1.65598M/s glibc::bcmp,uniform 384 to 4096
```

Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150567
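As a rough illustration of what "optimized for architectures where unaligned accesses are illegal" can mean for bcmp, here is a minimal sketch. It is not the code from this patch: the name `sketch_bcmp`, the 4-byte word size, and the strategy (compare whole words only when both pointers share the same misalignment, otherwise fall back to bytes) are assumptions chosen for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, NOT the LLVM libc implementation.
// Returns 0 when the buffers are equal and non-zero otherwise (bcmp contract).
inline int sketch_bcmp(const void *lhs, const void *rhs, std::size_t count) {
  const unsigned char *a = static_cast<const unsigned char *>(lhs);
  const unsigned char *b = static_cast<const unsigned char *>(rhs);
  constexpr std::size_t kWord = sizeof(std::uint32_t);

  // Whole-word comparisons are only attempted when both pointers have the
  // same offset within a word, so that aligning one aligns the other.
  if (reinterpret_cast<std::uintptr_t>(a) % kWord ==
      reinterpret_cast<std::uintptr_t>(b) % kWord) {
    // Head: compare byte by byte until both pointers are word aligned.
    while (count > 0 && reinterpret_cast<std::uintptr_t>(a) % kWord != 0) {
      if (*a++ != *b++)
        return 1;
      --count;
    }
    // Body: compare one aligned word at a time. The fixed-size memcpy calls
    // express the loads without violating strict aliasing.
    while (count >= kWord) {
      std::uint32_t wa, wb;
      std::memcpy(&wa, a, kWord);
      std::memcpy(&wb, b, kWord);
      if (wa != wb)
        return 1;
      a += kWord;
      b += kWord;
      count -= kWord;
    }
  }
  // Tail (or the whole buffer when the misalignments differ): byte by byte.
  for (; count > 0; --count)
    if (*a++ != *b++)
      return 1;
  return 0;
}
```

Because bcmp only reports equality, the word comparison does not need to locate the first differing byte, which is what keeps the inner loop this simple.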
2023-05-10[libc] Add optimized memcpy for RISCVGuillaume Chatelet
This patch adds two versions of memcpy optimized for architectures where unaligned accesses are either illegal or extremely slow. It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memcpy` on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before:
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------
BM_Memcpy/0/0 474 ns 474 ns 1483776 bytes_per_cycle=0.243492/s bytes_per_second=348.318M/s items_per_second=2.11097M/s __llvm_libc::memcpy,memcpy Google A
BM_Memcpy/1/0 210 ns 209 ns 3649536 bytes_per_cycle=0.233819/s bytes_per_second=334.481M/s items_per_second=4.77519M/s __llvm_libc::memcpy,memcpy Google B
BM_Memcpy/2/0 1814 ns 1814 ns 396288 bytes_per_cycle=0.247899/s bytes_per_second=354.622M/s items_per_second=551.402k/s __llvm_libc::memcpy,memcpy Google D
BM_Memcpy/3/0 89.3 ns 89.2 ns 7459840 bytes_per_cycle=0.217415/s bytes_per_second=311.014M/s items_per_second=11.2071M/s __llvm_libc::memcpy,memcpy Google L
BM_Memcpy/4/0 134 ns 134 ns 3815424 bytes_per_cycle=0.226584/s bytes_per_second=324.131M/s items_per_second=7.44567M/s __llvm_libc::memcpy,memcpy Google M
BM_Memcpy/5/0 52.8 ns 52.6 ns 11001856 bytes_per_cycle=0.194893/s bytes_per_second=278.797M/s items_per_second=19.0284M/s __llvm_libc::memcpy,memcpy Google Q
BM_Memcpy/6/0 180 ns 180 ns 4101120 bytes_per_cycle=0.231884/s bytes_per_second=331.713M/s items_per_second=5.55957M/s __llvm_libc::memcpy,memcpy Google S
BM_Memcpy/7/0 195 ns 195 ns 3906560 bytes_per_cycle=0.232951/s bytes_per_second=333.239M/s items_per_second=5.1217M/s __llvm_libc::memcpy,memcpy Google U
BM_Memcpy/8/0 152 ns 152 ns 4789248 bytes_per_cycle=0.227507/s bytes_per_second=325.452M/s items_per_second=6.58187M/s __llvm_libc::memcpy,memcpy Google W
BM_Memcpy/9/0 6036 ns 6033 ns 118784 bytes_per_cycle=0.249158/s bytes_per_second=356.423M/s items_per_second=165.75k/s __llvm_libc::memcpy,uniform 384 to 4096
```

After:
```
BM_Memcpy/0/0 126 ns 126 ns 5770240 bytes_per_cycle=1.04707/s bytes_per_second=1.46273G/s items_per_second=7.9385M/s __llvm_libc::memcpy,memcpy Google A
BM_Memcpy/1/0 75.1 ns 75.0 ns 10204160 bytes_per_cycle=0.691143/s bytes_per_second=988.687M/s items_per_second=13.3289M/s __llvm_libc::memcpy,memcpy Google B
BM_Memcpy/2/0 333 ns 333 ns 2174976 bytes_per_cycle=1.39297/s bytes_per_second=1.94596G/s items_per_second=3.00002M/s __llvm_libc::memcpy,memcpy Google D
BM_Memcpy/3/0 49.6 ns 49.5 ns 16092160 bytes_per_cycle=0.710161/s bytes_per_second=1015.89M/s items_per_second=20.1844M/s __llvm_libc::memcpy,memcpy Google L
BM_Memcpy/4/0 57.7 ns 57.7 ns 11213824 bytes_per_cycle=0.561557/s bytes_per_second=803.314M/s items_per_second=17.3228M/s __llvm_libc::memcpy,memcpy Google M
BM_Memcpy/5/0 48.0 ns 47.9 ns 16437248 bytes_per_cycle=0.346708/s bytes_per_second=495.97M/s items_per_second=20.8571M/s __llvm_libc::memcpy,memcpy Google Q
BM_Memcpy/6/0 67.5 ns 67.5 ns 10616832 bytes_per_cycle=0.614173/s bytes_per_second=878.582M/s items_per_second=14.8142M/s __llvm_libc::memcpy,memcpy Google S
BM_Memcpy/7/0 84.7 ns 84.6 ns 10480640 bytes_per_cycle=0.819077/s bytes_per_second=1.14424G/s items_per_second=11.8174M/s __llvm_libc::memcpy,memcpy Google U
BM_Memcpy/8/0 61.7 ns 61.6 ns 11191296 bytes_per_cycle=0.550078/s bytes_per_second=786.893M/s items_per_second=16.2279M/s __llvm_libc::memcpy,memcpy Google W
BM_Memcpy/9/0 981 ns 981 ns 703488 bytes_per_cycle=1.52333/s bytes_per_second=2.12807G/s items_per_second=1019.81k/s __llvm_libc::memcpy,uniform 384 to 4096
```

It is not as good as glibc for now, so there's room for improvement. I suspect glibc has a path pumping 16 bytes at once, given the doubled numbers for large copies.
```
BM_Memcpy/0/1 146 ns 82.5 ns 8576000 bytes_per_cycle=1.35236/s bytes_per_second=1.88922G/s items_per_second=12.1169M/s glibc memcpy,memcpy Google A
BM_Memcpy/1/1 112 ns 63.7 ns 10634240 bytes_per_cycle=0.628018/s bytes_per_second=898.387M/s items_per_second=15.702M/s glibc memcpy,memcpy Google B
BM_Memcpy/2/1 315 ns 180 ns 4079616 bytes_per_cycle=2.65229/s bytes_per_second=3.7052G/s items_per_second=5.54764M/s glibc memcpy,memcpy Google D
BM_Memcpy/3/1 85.3 ns 43.1 ns 15854592 bytes_per_cycle=0.774164/s bytes_per_second=1107.45M/s items_per_second=23.2249M/s glibc memcpy,memcpy Google L
BM_Memcpy/4/1 105 ns 54.3 ns 13427712 bytes_per_cycle=0.7793/s bytes_per_second=1114.8M/s items_per_second=18.4109M/s glibc memcpy,memcpy Google M
BM_Memcpy/5/1 77.1 ns 43.2 ns 16476160 bytes_per_cycle=0.279808/s bytes_per_second=400.269M/s items_per_second=23.1428M/s glibc memcpy,memcpy Google Q
BM_Memcpy/6/1 112 ns 62.7 ns 11236352 bytes_per_cycle=0.676078/s bytes_per_second=967.137M/s items_per_second=15.9387M/s glibc memcpy,memcpy Google S
BM_Memcpy/7/1 131 ns 65.5 ns 11751424 bytes_per_cycle=0.965616/s bytes_per_second=1.34895G/s items_per_second=15.2762M/s glibc memcpy,memcpy Google U
BM_Memcpy/8/1 104 ns 55.0 ns 12314624 bytes_per_cycle=0.583336/s bytes_per_second=834.468M/s items_per_second=18.1937M/s glibc memcpy,memcpy Google W
BM_Memcpy/9/1 932 ns 466 ns 1480704 bytes_per_cycle=3.17342/s bytes_per_second=4.43321G/s items_per_second=2.14679M/s glibc memcpy,uniform 384 to 4096
```

Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150202
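For readers unfamiliar with copying on targets that forbid unaligned accesses, the general idea can be sketched as follows. This is an assumption-laden illustration, not the patch's code: `sketch_memcpy` is a hypothetical name, the word size is fixed at 4 bytes, and a source that stays misaligned after the destination is aligned is simply copied byte by byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, NOT the LLVM libc implementation.
// Copies `count` bytes from `src` to `dst` (non-overlapping) and returns `dst`.
inline void *sketch_memcpy(void *dst, const void *src, std::size_t count) {
  unsigned char *d = static_cast<unsigned char *>(dst);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  constexpr std::size_t kWord = sizeof(std::uint32_t);

  // Head: copy single bytes until the destination is word aligned.
  while (count > 0 && reinterpret_cast<std::uintptr_t>(d) % kWord != 0) {
    *d++ = *s++;
    --count;
  }
  // Body: if the source ended up aligned as well, move whole words.
  // The fixed-size std::memcpy calls stand in for aligned load/store
  // instructions; a real libc would not call memcpy from memcpy.
  if (reinterpret_cast<std::uintptr_t>(s) % kWord == 0) {
    while (count >= kWord) {
      std::uint32_t word;
      std::memcpy(&word, s, kWord);
      std::memcpy(d, &word, kWord);
      d += kWord;
      s += kWord;
      count -= kWord;
    }
  }
  // Tail (or everything, when the source is still misaligned): single bytes.
  while (count > 0) {
    *d++ = *s++;
    --count;
  }
  return dst;
}
```

Widening the body loop to move more bytes per iteration is the kind of change the remark about a 16-byte path points at.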
2023-02-09[libc] Introduce a config macro fileGuillaume Chatelet
2023-02-09[libc][NFC] separate macros in several targetsGuillaume Chatelet
2023-02-09[libc][NFC] Move compiler_features.h to properties subfolderGuillaume Chatelet
2023-02-07[libc][NFC] Rename compiler_feature macrosGuillaume Chatelet
2023-02-07[libc][NFC] Move compiler_features to macros folderGuillaume Chatelet
2023-01-24[libc][NFC] Another round of replacement of static inline with LIBC_INLINE.Siva Chandra Reddy
Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D142398
2022-12-13[libc] Add compiler, builtin and feature detectionGuillaume Chatelet
This is a first step to support GCC. This patch adds support for builtin and feature detection. Differential Revision: https://reviews.llvm.org/D139712
2022-10-24[libc] Use cpp::byte instead of char in mem* functionsGuillaume Chatelet
`cpp::byte` is better than `char`, which, depending on the platform, can behave as `signed char` or `unsigned char`. That ambiguity has introduced subtle arithmetic errors.
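To make the signedness hazard concrete, here is a small sketch. It uses the standard `std::byte` as a stand-in for LLVM libc's `cpp::byte`, and both function names are hypothetical. The first function mimics what a byte difference looks like on a platform where `char` is signed.

```cpp
#include <cstddef>

// Hypothetical illustration: difference of two raw bytes when they are
// interpreted as signed char (what plain `char` does on some platforms).
// The conversion of values above 127 is implementation-defined before C++20;
// mainstream two's-complement targets map 0x80 to -128.
inline int diff_as_signed_char(unsigned char raw_a, unsigned char raw_b) {
  return static_cast<signed char>(raw_a) - static_cast<signed char>(raw_b);
}

// Same difference using std::byte, which has no arithmetic of its own and
// must be converted explicitly, so the result is always in [-255, 255] with
// the sign a memcmp-style comparison expects.
inline int diff_as_byte(std::byte a, std::byte b) {
  return std::to_integer<int>(a) - std::to_integer<int>(b);
}
```

For the bytes `0x80` and `0x01`, the signed interpretation yields a negative difference while the byte interpretation yields a positive one, so a comparison routine built on plain `char` can order the same buffers differently depending on the target.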
2022-10-24[libc] mem* framework v3Guillaume Chatelet
This version is more composable and also simpler, at the expense of being more explicit and more verbose. This patch provides the rationale for the framework, the implementation, and unit tests, but the functions themselves are still using the previous version. The change in implementation will come in a follow-up patch. Differential Revision: https://reviews.llvm.org/D136292
2022-10-20Revert D136292 "[libc] mem* framework v3"Guillaume Chatelet
This breaks llvm-libc build bots:
- libc-x86_64-debian-dbg-asan
- libc-x86_64-debian-fullbuild-dbg-asan
  Address sanitizers fail with "AddressSanitizer: invalid alignment requested in aligned_alloc: 64, alignment must be a power of two and the requested size 0x41 must be a multiple of alignment (thread T0)"
- libc-aarch64-ubuntu-dbg
- libc-aarch64-ubuntu-fullbuild-dbg
  https://lab.llvm.org/buildbot/#/builders/223/builds/8877/steps/7/logs/stdio
- libc-arm32-debian-dbg
  https://lab.llvm.org/buildbot/#/builders/229/builds/5201/steps/7/logs/stdio

This reverts commit 903cc71a82431d79e5fb541946a9e7c93750e374.
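The AddressSanitizer message quoted in this revert names two constraints on `aligned_alloc`: the alignment must be a power of two, and the requested size must be a multiple of the alignment. The sketch below shows one common way to satisfy the second constraint by rounding the size up. The helper names are hypothetical, and this is not a claim about how the re-landed patch addressed the failure.

```cpp
#include <cstddef>

// Hypothetical helpers, for illustration only.

// True when `value` is a non-zero power of two.
constexpr bool is_power_of_two(std::size_t value) {
  return value != 0 && (value & (value - 1)) == 0;
}

// Rounds `size` up to the next multiple of `alignment` (alignment must be
// non-zero), so the pair is acceptable to aligned_alloc.
constexpr std::size_t round_up_to_multiple(std::size_t size,
                                           std::size_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

// The failing request from the log: size 0x41 (65) with alignment 64.
static_assert(is_power_of_two(64), "64 is a valid alignment");
static_assert(0x41 % 64 != 0, "65 is not a multiple of 64");
static_assert(round_up_to_multiple(0x41, 64) == 128, "rounded size");
```

With the rounded size, a call such as `aligned_alloc(64, 128)` meets both constraints, at the cost of over-allocating up to `alignment - 1` bytes.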
2022-10-20[libc] mem* framework v3Guillaume Chatelet
This version is more composable and also simpler, at the expense of being more explicit and more verbose. This patch provides the rationale for the framework, the implementation, and unit tests, but the functions themselves are still using the previous version. The change in implementation will come in a follow-up patch. Differential Revision: https://reviews.llvm.org/D136292
2022-10-18[libc][NFC] Cleanup and document utils.hGuillaume Chatelet