| Age | Commit message | Author |
|
The benchtests do not define _LIBC.
|
|
The compiler might not constant-fold the call, which results in a
linker error.
Reviewed-by: Sam James <sam@gentoo.org>
|
|
clang warns about the implicit conversion of RAND_MAX to float:
error: implicit conversion from 'int' to 'float' changes value from
2147483647 to 2147483648 [-Werror,-Wimplicit-const-int-float-conversion]
So make it explicit.
Reviewed-by: Sam James <sam@gentoo.org>
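
A minimal sketch of the kind of explicit cast described above. The helper name random_unit_float is hypothetical and is not taken from the patch; it only illustrates how writing the conversion out silences the warning.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a pseudo-random float in [0, 1].  */
static float
random_unit_float (void)
{
  /* RAND_MAX is commonly 2147483647, which float cannot represent
     exactly.  The explicit cast states that the rounding is intended,
     so -Wimplicit-const-int-float-conversion no longer fires.  */
  return (float) rand () / (float) RAND_MAX;
}
```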
|
|
Similar to tst-printf-bz18872.sh, add the attribute_optimize to avoid
build failures with compilers that do not support "GCC optimize" pragma.
Reviewed-by: Sam James <sam@gentoo.org>
|
|
f128 is not a valid floating constant suffix on clang.
Reviewed-by: Sam James <sam@gentoo.org>
|
|
Random inputs in the range [0,10].
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Random inputs in the range [0,10].
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
All previously forwarded functions are now called directly (either via a
local call in libc, or through a __export).
|
|
These have not been used in GNU/Hurd for a very long time.
|
|
Add benchmark support for frexp, frexpf, and frexpl to measure the
performance improvement of the fast path optimization.
- Created frexp-inputs, frexpf-inputs, frexpl-inputs with random test values
- Added frexp, frexpf, frexpl to bench-math list
- Added CFLAGS to disable builtins for accurate benchmarking
These benchmarks will be used to quantify the performance gains from the
fast path optimization for normal floating-point numbers.
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
|
|
Replace np.complex with complex to fix numpy error:
AttributeError: module 'numpy' has no attribute 'complex'.
`np.complex` was a deprecated alias for the builtin `complex`. To avoid this error in existing code, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
|
|
GCC implements fmod as a built-in on x86, so disable it to
benchmark the C implementation.
Also, make fmod and fmodf use the workload directive to measure
the reciprocal throughput.
|
|
Random inputs in the range [-20.0,20.0].
|
|
The inputs are based on fmodf-inputs.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
The inputs are based on fmod-inputs.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Vector variants of the new C23 log10p1 routines.
Note: Benchmark inputs for log10p1(f) are identical to log1p(f).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Vector variants of the new C23 log2p1 routines.
Note: Benchmark inputs for log2p1(f) are identical to log1p(f).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
Ensure benchtests compile with trunk GCC.
|
|
|
|
|
|
Random inputs in range [-20.00,20.00].
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Random inputs in range [-10,10].
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Random inputs in range [1.00,21.00].
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Vector variants of the new C23 exp2m1 & exp10m1 routines.
Note: Benchmark inputs for exp2m1 & exp10m1 are identical to exp2 & exp10
respectively, this also includes the floating point variations.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Changes with respect to v1:
- added missing rsqrt and rsqrtf in bench-math
|
|
These files were prepared together with Saban Houssein.
|
|
Use uint16_t rather than uint8_t for the size arrays.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
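
The message does not say why the wider type is needed. Assuming the motivation is that some sizes exceed 255, the following sketch shows how a value stored in a uint8_t slot wraps while a uint16_t slot keeps it. Both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: store a size in an 8-bit slot.  Values above 255
   wrap modulo 256.  */
static uint8_t
store_size_u8 (size_t size)
{
  return (uint8_t) size;
}

/* Hypothetical: store a size in a 16-bit slot.  Values up to 65535
   are preserved.  */
static uint16_t
store_size_u16 (size_t size)
{
  return (uint16_t) size;
}
```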
|
|
This reverts commit 09604542d31abf1e35cd00c1db8d9bee9568bdd0.
|
|
Use uint16_t rather than uint8_t for the size arrays.
|
|
Change duration to 3 seconds. Add spaces before '('.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Commit 934d88d used inputs with exponent generated at random in the
whole binary64 exponent range, which yields essentially very large
or very small values of |y/x|. Instead, this commit generates x, y at
random in [-10,10], which should better correspond to real applications.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
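
As an illustration only, inputs drawn uniformly from [-10,10] could be produced along these lines. The helper random_in_range is hypothetical, and the commit does not state how its inputs were actually generated.

```c
#include <stdlib.h>

/* Hypothetical helper: uniform pseudo-random double in [lo, hi].  */
static double
random_in_range (double lo, double hi)
{
  /* Scale rand () to [0, 1], then stretch to the requested range.  */
  double unit = (double) rand () / (double) RAND_MAX;
  return lo + unit * (hi - lo);
}
```

Calling random_in_range (-10.0, 10.0) once for x and once for y gives a pair whose ratio |y/x| is rarely extreme, unlike inputs whose exponents span the whole binary64 range.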
|
|
Random IP addresses in the full range. There is no extra workload
to check the effectiveness of the '::' optimization for a set of 0-oct
sets (although it would be a possible workload).
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Random IP addresses in the full range.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
It adds four ranges, which is how the generic implementation handles
normal numbers:
1. Random inputs in the range [0.0, 1.0];
2. Random inputs in the range [1.0, (double)(UINT64_C(1) << 52)];
3. Random inputs in the range [(double)(UINT64_C(1) << 52), DBL_MAX];
4. Random integral inputs in the range [0.0, (double)(UINT64_C(1) << 52)].
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
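
The four ranges listed above can be sketched as follows. This is an assumption-laden illustration: sample_range and unit_random are hypothetical names, and sampling uniformly is a simplification that may differ from how the committed input file was produced.

```c
#include <float.h>
#include <stdint.h>
#include <stdlib.h>

/* 2^52: at or above this magnitude every double is already an integer.  */
#define TWO52 ((double) (UINT64_C (1) << 52))

/* Hypothetical helper: pseudo-random double in [0, 1].  */
static double
unit_random (void)
{
  return (double) rand () / (double) RAND_MAX;
}

/* Hypothetical helper: draw one value from range 1..4 as numbered in
   the list above.  Any other argument is treated as range 4.  */
static double
sample_range (int range)
{
  double u = unit_random ();
  switch (range)
    {
    case 1:
      return u;					/* [0.0, 1.0] */
    case 2:
      return 1.0 + u * (TWO52 - 1.0);		/* [1.0, 2^52] */
    case 3:
      return TWO52 + u * (DBL_MAX - TWO52);	/* [2^52, DBL_MAX] */
    default:
      /* Integral values in [0.0, 2^52]: truncate via uint64_t.  */
      return (double) (uint64_t) (u * TWO52);
    }
}
```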
|
|
It adds four ranges, which is how the generic implementation handles
normal numbers:
1. Random inputs in the range [0.0, 1.0];
2. Random inputs in the range [1.0, (float)(1U << 23)];
3. Random inputs in the range [(float)(1U << 23), FLT_MAX];
4. Random integral inputs in the range [0.0, (float)(1U << 23)].
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Add initial inputs for asinpi(f), acospi(f), atanpi(f) and atan2pi(f) based
on existing asin/acos/atan inputs.
Benchtests now work on the new libmvec functions.
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
|
|
The existing malloc benchtests seem rather generic, aimed at testing
overall malloc implementation performance. This new benchtest
focuses on reducing any non-tcache side effects, making it possible to
predict the performance impact of tcache code changes more realistically.
The test was inspired by the bench-[cm]alloc-thread code, with severe
simplifications:
- Forces single-threaded execution, reducing concurrency side effects
like cache incoherence penalties due to simultaneous writes to the same
cache pages;
- Focuses on allocating and deallocating a single size for the whole
duration of the benchmark. Since all it does is allocate and
deallocate, it measures the tcache hot path without any
side effects.
- Allows the allocation size to be specified as an input argument.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
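
A rough sketch of the measurement loop described above, not the benchtest itself: a single thread repeatedly allocates and frees one fixed size and reports the average time per pair. The function name time_alloc_free and its parameters are hypothetical, and the use of clock_gettime for timing is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: run ITERS malloc/free pairs of SIZE bytes on the
   calling thread and return the average nanoseconds per pair.  Returns
   -1.0 if an allocation fails and 0.0 if ITERS is zero.  */
static double
time_alloc_free (size_t size, size_t iters)
{
  struct timespec start, end;

  if (iters == 0)
    return 0.0;

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (size_t i = 0; i < iters; i++)
    {
      /* volatile keeps the compiler from eliding the malloc/free pair.  */
      void *volatile p = malloc (size);
      if (p == NULL)
	return -1.0;
      free (p);
    }
  clock_gettime (CLOCK_MONOTONIC, &end);

  double ns = (double) (end.tv_sec - start.tv_sec) * 1e9
	      + (double) (end.tv_nsec - start.tv_nsec);
  return ns / (double) iters;
}
```

Because each block is freed before the next request of the same size, the loop is expected to stay on the per-thread cache path for sizes that tcache handles.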
|
|
|
|
|
|
changes in v2:
* fixed the missing Makefile entry in the first version
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
This patch changes the shell script that selects which arguments are used
for the execution of bench-malloc-thread.
The problem seems to have been introduced in commit:
commit 2d6427a63cad8056ba6bcaaaa8df21977c8dde3d
Author: Wangyang Guo <wangyang.guo@intel.com>
Date: Fri Nov 29 16:05:35 2024 +0800
benchtests: Add calloc test
With the current condition, the following error "/bin/sh: 3: [[: not found"
occurs when executing `make bench BENCHSET="malloc-thread"` and the else
path is taken, using incorrect arguments for bench test execution.
Error is reproducible in Debian based distros.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
Increase iterations so it runs for ~1 second on modern CPUs.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
On SPR, it improves atanh bench performance by:
Before After Improvement
reciprocal-throughput 15.1715 14.8628 2%
latency 57.1941 56.1883 2%
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
On SPR, it improves sinh bench performance by:
Before After Improvement
reciprocal-throughput 14.2017 11.815 17%
latency 36.4917 35.2114 4%
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
The SIGCANCEL signal handler should not issue __syscall_do_cancel,
which calls __do_cancel and __pthread_unwind, if the cancellation
is already in progress (and libgcc unwind is not reentrant). Any
cancellation signal received afterwards is ignored.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
Add a new randomized strlen test similar to bench-random-memcpy. Instead of
repeating the same call to strlen over and over again, it times a large number
of different strings. The distribution of the string length and alignment is
based on SPEC2017.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Adjust sizes between 64KB and 16MB and iterations based on length.
Remove incorrect uses of alloc_bufs since we're not interested in measuring
Linux clear_page time. Use getpagesize() - 1 instead of 4095 when
aligning within a page.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
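
To illustrate the last point, a page-offset computation that queries the page size at run time instead of hard-coding 4095 might look like this. The helper offset_in_page is hypothetical, and the sketch assumes the page size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: byte offset of P within its page.  Assumes the
   page size is a power of two, so (page size - 1) is a valid mask.  */
static size_t
offset_in_page (const void *p)
{
  size_t page_mask = (size_t) getpagesize () - 1;
  return (size_t) ((uintptr_t) p & page_mask);
}
```

With a hard-coded 4095 the mask would be wrong on systems that use 16KB or 64KB pages.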
|