| Age | Commit message (Collapse) | Author |
|
|
|
Relates to
https://github.com/llvm/llvm-project/issues/119281#issuecomment-2699470459
|
|
This reverts commit 1e6e845d49a336e9da7ca6c576ec45c0b419b5f6 because it
changed the 1st parameter of adjust() to be unsigned, but libc itself
calls adjust() with a negative argument in align_backward() in
op_generic.h.
|
|
Relates to: #119281
|
|
This is a part of #97655.
|
|
declaration" (#98593)
Reverts llvm/llvm-project#98075
bots are broken
|
|
This is a part of #97655.
|
|
instead (#73939) (#74446)
Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h`
that was still using `deferred_static_assert`.
|
|
instead" (#74444)
Reverts llvm/llvm-project#73939
This broke libc-aarch64-ubuntu build bot
https://lab.llvm.org/buildbot/#/builders/138/builds/56186
|
|
|
|
This is step 4 of
https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
|
|
Most of the time `memmove` is called on buffers that are disjoint, in that case we can use `memcpy` which is faster.
The additional test is branchless on x86, aarch64 and RISCV with the zbb extension (bitmanip).
On x86 this patch adds a latency of 2 to 3 cycles.
Before
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------
BM_Memmove/0/0_median 5.00 ns 5.00 ns 10 bytes_per_cycle=1.25477/s bytes_per_second=2.62933G/s items_per_second=199.87M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 6.21 ns 6.21 ns 10 bytes_per_cycle=3.22173/s bytes_per_second=6.75106G/s items_per_second=160.955M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 8.09 ns 8.09 ns 10 bytes_per_cycle=5.31462/s bytes_per_second=11.1366G/s items_per_second=123.603M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 5.95 ns 5.95 ns 10 bytes_per_cycle=2.71865/s bytes_per_second=5.69687G/s items_per_second=167.967M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 5.63 ns 5.63 ns 10 bytes_per_cycle=2.28294/s bytes_per_second=4.78383G/s items_per_second=177.615M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 5.68 ns 5.68 ns 10 bytes_per_cycle=2.16798/s bytes_per_second=4.54295G/s items_per_second=176.015M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 7.46 ns 7.46 ns 10 bytes_per_cycle=3.97619/s bytes_per_second=8.332G/s items_per_second=134.044M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 5.40 ns 5.40 ns 10 bytes_per_cycle=1.79695/s bytes_per_second=3.76546G/s items_per_second=185.211M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 5.62 ns 5.62 ns 10 bytes_per_cycle=3.18747/s bytes_per_second=6.67927G/s items_per_second=177.983M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 101 ns 101 ns 10 bytes_per_cycle=9.77359/s bytes_per_second=20.4803G/s items_per_second=9.9333M/s __llvm_libc::memmove,uniform 384 to 4096
```
After
```
BM_Memmove/0/0_median 3.57 ns 3.57 ns 10 bytes_per_cycle=1.71375/s bytes_per_second=3.59112G/s items_per_second=280.411M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 4.52 ns 4.52 ns 10 bytes_per_cycle=4.47557/s bytes_per_second=9.37843G/s items_per_second=221.427M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 5.70 ns 5.70 ns 10 bytes_per_cycle=7.37396/s bytes_per_second=15.4519G/s items_per_second=175.399M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 4.47 ns 4.47 ns 10 bytes_per_cycle=3.4148/s bytes_per_second=7.15563G/s items_per_second=223.743M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 4.53 ns 4.53 ns 10 bytes_per_cycle=2.86071/s bytes_per_second=5.99454G/s items_per_second=220.69M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 4.19 ns 4.19 ns 10 bytes_per_cycle=2.5484/s bytes_per_second=5.3401G/s items_per_second=238.924M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 5.02 ns 5.02 ns 10 bytes_per_cycle=5.94164/s bytes_per_second=12.4505G/s items_per_second=199.14M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 4.03 ns 4.03 ns 10 bytes_per_cycle=2.47028/s bytes_per_second=5.17641G/s items_per_second=247.906M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 4.70 ns 4.70 ns 10 bytes_per_cycle=3.84975/s bytes_per_second=8.06706G/s items_per_second=212.72M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 90.7 ns 90.7 ns 10 bytes_per_cycle=10.8681/s bytes_per_second=22.7739G/s items_per_second=11.02M/s __llvm_libc::memmove,uniform 384 to 4096
```
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D152811
|
|
all rounding modes.
Implement double precision log2 function correctly rounded to all
rounding modes.
See https://reviews.llvm.org/D150014 for a more detail description of the algorithm.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
177.632
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332
-- LIBC latency -- with FMA
459.751
-- LIBC latency -- without FMA
463.850
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D150374
|
|
This patch adds two versions of memcpy optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.
Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memcpy` on a quad core Linux starfive RISCV 64 board running at 1.5GHz.
Before:
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
L1 Instruction 32 KiB (x4)
L1 Data 32 KiB (x4)
L2 Unified 2048 KiB (x1)
------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------
BM_Memcpy/0/0 474 ns 474 ns 1483776 bytes_per_cycle=0.243492/s bytes_per_second=348.318M/s items_per_second=2.11097M/s __llvm_libc::memcpy,memcpy Google A
BM_Memcpy/1/0 210 ns 209 ns 3649536 bytes_per_cycle=0.233819/s bytes_per_second=334.481M/s items_per_second=4.77519M/s __llvm_libc::memcpy,memcpy Google B
BM_Memcpy/2/0 1814 ns 1814 ns 396288 bytes_per_cycle=0.247899/s bytes_per_second=354.622M/s items_per_second=551.402k/s __llvm_libc::memcpy,memcpy Google D
BM_Memcpy/3/0 89.3 ns 89.2 ns 7459840 bytes_per_cycle=0.217415/s bytes_per_second=311.014M/s items_per_second=11.2071M/s __llvm_libc::memcpy,memcpy Google L
BM_Memcpy/4/0 134 ns 134 ns 3815424 bytes_per_cycle=0.226584/s bytes_per_second=324.131M/s items_per_second=7.44567M/s __llvm_libc::memcpy,memcpy Google M
BM_Memcpy/5/0 52.8 ns 52.6 ns 11001856 bytes_per_cycle=0.194893/s bytes_per_second=278.797M/s items_per_second=19.0284M/s __llvm_libc::memcpy,memcpy Google Q
BM_Memcpy/6/0 180 ns 180 ns 4101120 bytes_per_cycle=0.231884/s bytes_per_second=331.713M/s items_per_second=5.55957M/s __llvm_libc::memcpy,memcpy Google S
BM_Memcpy/7/0 195 ns 195 ns 3906560 bytes_per_cycle=0.232951/s bytes_per_second=333.239M/s items_per_second=5.1217M/s __llvm_libc::memcpy,memcpy Google U
BM_Memcpy/8/0 152 ns 152 ns 4789248 bytes_per_cycle=0.227507/s bytes_per_second=325.452M/s items_per_second=6.58187M/s __llvm_libc::memcpy,memcpy Google W
BM_Memcpy/9/0 6036 ns 6033 ns 118784 bytes_per_cycle=0.249158/s bytes_per_second=356.423M/s items_per_second=165.75k/s __llvm_libc::memcpy,uniform 384 to 4096
```
After:
```
BM_Memcpy/0/0 126 ns 126 ns 5770240 bytes_per_cycle=1.04707/s bytes_per_second=1.46273G/s items_per_second=7.9385M/s __llvm_libc::memcpy,memcpy Google A
BM_Memcpy/1/0 75.1 ns 75.0 ns 10204160 bytes_per_cycle=0.691143/s bytes_per_second=988.687M/s items_per_second=13.3289M/s __llvm_libc::memcpy,memcpy Google B
BM_Memcpy/2/0 333 ns 333 ns 2174976 bytes_per_cycle=1.39297/s bytes_per_second=1.94596G/s items_per_second=3.00002M/s __llvm_libc::memcpy,memcpy Google D
BM_Memcpy/3/0 49.6 ns 49.5 ns 16092160 bytes_per_cycle=0.710161/s bytes_per_second=1015.89M/s items_per_second=20.1844M/s __llvm_libc::memcpy,memcpy Google L
BM_Memcpy/4/0 57.7 ns 57.7 ns 11213824 bytes_per_cycle=0.561557/s bytes_per_second=803.314M/s items_per_second=17.3228M/s __llvm_libc::memcpy,memcpy Google M
BM_Memcpy/5/0 48.0 ns 47.9 ns 16437248 bytes_per_cycle=0.346708/s bytes_per_second=495.97M/s items_per_second=20.8571M/s __llvm_libc::memcpy,memcpy Google Q
BM_Memcpy/6/0 67.5 ns 67.5 ns 10616832 bytes_per_cycle=0.614173/s bytes_per_second=878.582M/s items_per_second=14.8142M/s __llvm_libc::memcpy,memcpy Google S
BM_Memcpy/7/0 84.7 ns 84.6 ns 10480640 bytes_per_cycle=0.819077/s bytes_per_second=1.14424G/s items_per_second=11.8174M/s __llvm_libc::memcpy,memcpy Google U
BM_Memcpy/8/0 61.7 ns 61.6 ns 11191296 bytes_per_cycle=0.550078/s bytes_per_second=786.893M/s items_per_second=16.2279M/s __llvm_libc::memcpy,memcpy Google W
BM_Memcpy/9/0 981 ns 981 ns 703488 bytes_per_cycle=1.52333/s bytes_per_second=2.12807G/s items_per_second=1019.81k/s __llvm_libc::memcpy,uniform 384 to 4096
```
It is not as good as glibc for now so there's room for improvement. I suspect a path pumping 16 bytes at once given the doubled numbers for large copies.
```
BM_Memcpy/0/1 146 ns 82.5 ns 8576000 bytes_per_cycle=1.35236/s bytes_per_second=1.88922G/s items_per_second=12.1169M/s glibc memcpy,memcpy Google A
BM_Memcpy/1/1 112 ns 63.7 ns 10634240 bytes_per_cycle=0.628018/s bytes_per_second=898.387M/s items_per_second=15.702M/s glibc memcpy,memcpy Google B
BM_Memcpy/2/1 315 ns 180 ns 4079616 bytes_per_cycle=2.65229/s bytes_per_second=3.7052G/s items_per_second=5.54764M/s glibc memcpy,memcpy Google D
BM_Memcpy/3/1 85.3 ns 43.1 ns 15854592 bytes_per_cycle=0.774164/s bytes_per_second=1107.45M/s items_per_second=23.2249M/s glibc memcpy,memcpy Google L
BM_Memcpy/4/1 105 ns 54.3 ns 13427712 bytes_per_cycle=0.7793/s bytes_per_second=1114.8M/s items_per_second=18.4109M/s glibc memcpy,memcpy Google M
BM_Memcpy/5/1 77.1 ns 43.2 ns 16476160 bytes_per_cycle=0.279808/s bytes_per_second=400.269M/s items_per_second=23.1428M/s glibc memcpy,memcpy Google Q
BM_Memcpy/6/1 112 ns 62.7 ns 11236352 bytes_per_cycle=0.676078/s bytes_per_second=967.137M/s items_per_second=15.9387M/s glibc memcpy,memcpy Google S
BM_Memcpy/7/1 131 ns 65.5 ns 11751424 bytes_per_cycle=0.965616/s bytes_per_second=1.34895G/s items_per_second=15.2762M/s glibc memcpy,memcpy Google U
BM_Memcpy/8/1 104 ns 55.0 ns 12314624 bytes_per_cycle=0.583336/s bytes_per_second=834.468M/s items_per_second=18.1937M/s glibc memcpy,memcpy Google W
BM_Memcpy/9/1 932 ns 466 ns 1480704 bytes_per_cycle=3.17342/s bytes_per_second=4.43321G/s items_per_second=2.14679M/s glibc memcpy,uniform 384 to 4096
```
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150202
|
|
This part of the effort to make all test related pieces into the `test`
directory. This helps is excluding test related pieces in a straight
forward manner if LLVM_INCLUDE_TESTS is OFF. Future patches will also move
the MPFR wrapper and testutils into the 'test' directory.
|
|
|
|
This reverts commit https://reviews.llvm.org/D135134 (b3f1d58a131eb546aaf1ac165c77ccb89c40d758)
That revision appears to have broken Arm memcpy in some subtle
ways. Am communicating with the original author to get a
good reproduction.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/chf1Y6eGM
Differential Revision: https://reviews.llvm.org/D135134
|
|
This reverts commit 9721687835a7df5da0c9482cf684c11b8ba97f75.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
|
|
This reverts commit 98bf836f3127a346a81da5ae3e27246935298de4.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
|
|
This reverts commit d55f2d8ab076298cfd745c05c1b4dfd5583f8b9e.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
|
|
This reverts commit 4c19439d249256db720e323a446e39d05496732f.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose.
This patch is not meant to be submitted but gives an idea of the change.
Codegen can be checked in https://godbolt.org/z/6z1dEoWbs by removing the "static inline" before individual functions.
Unittests are coming.
Suggested review order:
- utils
- op_base
- op_builtin
- op_generic
- op_x86 / op_aarch64
- *_implementations.h
Differential Revision: https://reviews.llvm.org/D135134
|
|
|
|
This is a reland of https://reviews.llvm.org/D130773
|
|
This reverts commit 7add0e5fdc5c7cb6f59f60cd436bf161cf9f9eb7.
|
|
Migrating all private STL code to the standard STL case but keeping it under the CPP namespace to avoid confusion.
Differential Revision: https://reviews.llvm.org/D130773
|
|
The idea is to move all pieces related to the actual libc sources to the
"src" directory. This allows downstream users to ship and build just the
"src" directory.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D112653
|
|
Summary:
Having a consistent prefix makes selecting all of the llvm libc tests
easier on any platform that is also using the gtest framework.
This also modifies the TEST and TEST_F macros to enforce this change
moving forward.
Reviewers: sivachandra
Subscribers:
|
|
Summary:
Made all header files consistent based of this documentation: https://llvm.org/docs/CodingStandards.html#file-headers.
And did the same for all source files top of file comments.
Reviewers: sivachandra, abrachet
Reviewed By: sivachandra, abrachet
Subscribers: MaskRay, tschuett, libc-commits
Tags: #libc-project
Differential Revision: https://reviews.llvm.org/D77533
|
|
Summary:
The patch is not ready yet and is here to discuss a few options:
- How do we customize the implementation? (i.e. how to define `kRepMovsBSize`),
- How do we specify custom compilation flags? (We'd need `-fno-builtin-memcpy` to be passed in),
- How do we build? We may want to test in debug but build the libc with `-march=native` for instance,
- Clang has a brand new builtin `__builtin_memcpy_inline` which makes the implementation easy and efficient, but:
- If we compile with `gcc` or `msvc` we can't use it, resorting on less efficient code generation,
- With gcc we can use `__builtin_memcpy` but then we'd need a postprocess step to check that the final assembly do not contain call to `memcpy` (unlikely but allowed),
- For msvc we'd need to resort on the compiler optimization passes.
Reviewers: sivachandra, abrachet
Subscribers: mgorny, MaskRay, tschuett, libc-commits, courbet
Tags: #libc-project
Differential Revision: https://reviews.llvm.org/D74397
|
|
|
|
Building with address-sanitizer shows that using ArrayRef ends up
accessing a temporary outside its scope.
|
|
Summary: This patch adds a few low level functions needed to build libc memory functions.
Reviewers: sivachandra, jfb
Subscribers: mgorny, MaskRay, tschuett, libc-commits, ckennelly
Tags: #libc-project
Differential Revision: https://reviews.llvm.org/D73472
|