llvm-project.git/libc/benchmarks/gpu/src/math/sin_benchmark.cpp, branch users/mingmingl-llvm/samplefdo-profile-format

[libc][gpu] Add exp/log benchmarks and flexible input generation (#155727)

2025-08-28T02:05:10+00:00

This patch adds GPU benchmarks for the exp (`exp`, `expf`, `expf16`) and
log (`log`, `logf`, `logf16`) families of math functions.

Adding these benchmarks revealed a key limitation in the existing
framework: the input generation mechanism was hardcoded to a single
strategy that sampled numbers with a uniform distribution of their
unbiased exponents.

While this strategy is effective for values spanning multiple orders of
magnitude, it is not suitable for sampling evenly across a linear range,
and the previous framework offered no way to select a different strategy.

### Summary of Changes

**1. Framework Refactoring for Flexible Input Sampling:**
The GPU benchmark framework was refactored to support multiple,
pluggable input sampling strategies.

* **`Random.h`:** A new header was created to house the
`RandomGenerator` and the new distribution classes.
* **Distribution Classes:** Two sampling strategies were implemented:
  * `UniformExponent`: Formalizes the previous logic of sampling numbers
    with a uniform distribution of their unbiased exponents. It can now also
    be configured to produce only positive values, which is essential for
    functions like `log`.
  * `UniformLinear`: A new strategy that samples numbers from a uniform
    distribution over a linear interval `[min, max)`.
* **`MathPerf` Update:** The `MathPerf` class was updated with a generic
`run_throughput` method that is templated on a distribution object. This
makes the framework extensible to future sampling strategies.
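
As a rough host-side illustration of the two strategies described above, the sketch below shows how pluggable distribution objects could be passed to a templated helper. Only the names `UniformExponent` and `UniformLinear` come from this patch; the `SplitMix64` generator, the `fill_inputs` helper, the constructor parameters, and all function bodies are assumptions made for the example and are not the code in `Random.h`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical deterministic generator (SplitMix64); same seed, same stream.
struct SplitMix64 {
  uint64_t state;
  explicit SplitMix64(uint64_t seed) : state(seed) {}
  uint64_t next() {
    uint64_t z = (state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
  }
};

// Samples uniformly over the linear interval [lo, hi).
struct UniformLinear {
  double lo, hi;
  UniformLinear(double lo_, double hi_) : lo(lo_), hi(hi_) { assert(lo_ < hi_); }
  double operator()(SplitMix64 &rng) const {
    // 53 random bits scaled into [0, 1).
    double u = std::ldexp(static_cast<double>(rng.next() >> 11), -53);
    double x = lo + (hi - lo) * u;
    // Rounding can land exactly on hi; step back to keep the interval half-open.
    return x < hi ? x : std::nextafter(hi, lo);
  }
};

// Samples values whose unbiased exponent is uniform in [min_exp, max_exp].
// Assumes the exponents stay within the normal range of double.
struct UniformExponent {
  int min_exp, max_exp;
  bool positive_only;
  UniformExponent(int lo, int hi, bool positive)
      : min_exp(lo), max_exp(hi), positive_only(positive) {
    assert(lo <= hi);
  }
  double operator()(SplitMix64 &rng) const {
    uint64_t span = static_cast<uint64_t>(max_exp - min_exp) + 1;
    // Modulo bias is negligible for spans this small.
    int exp = min_exp + static_cast<int>(rng.next() % span);
    // 52 random bits give a significand in [1, 2) with no rounding.
    double frac = std::ldexp(static_cast<double>(rng.next() >> 12), -52);
    double value = std::ldexp(1.0 + frac, exp);
    bool negative = !positive_only && (rng.next() & 1) != 0;
    return negative ? -value : value;
  }
};

// Generic over the distribution type, in the spirit of a templated
// throughput runner: any object callable with the generator works.
template <typename Dist>
void fill_inputs(const Dist &dist, SplitMix64 &rng, double *out, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = dist(rng);
}
```

With this shape, a `log`-style benchmark would pass `UniformExponent(min_exp, max_exp, true)` to avoid negative inputs, while an `exp`-style benchmark over a bounded interval would pass a `UniformLinear`.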

**2. New Benchmarks for `exp` and `log`:**
Using the newly refactored framework, benchmarks were added for `exp`,
`expf`, `expf16`, `log`, `logf`, and `logf16`. The test intervals were
carefully chosen to measure the performance of distinct behavioral
regions of each function.

[libc] Improve GPU benchmarking (#153512)

2025-08-15T16:00:17+00:00

This patch improves GPU benchmarking in the following ways:

* Replace `rand`/`srand` with a deterministic per-thread RNG seeded by
`call_index`: reproducible, apples-to-apples libc vs vendor comparisons.
* Fix input generation: sample the unbiased exponent uniformly in
`[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and
`+0.0`.
* Fix standard deviation: use an explicit estimator from sums and
sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples.
* Fix throughput overhead: subtract a loop-only baseline inside
NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call
already corrected (no `overhead()` call).
* Adapt existing math benchmarks to the new RNG/timing plumbing (plumb
`call_index`, drop `rand/srand`, clean includes).
* Correct inter-thread aggregation: use iteration-weighted pooling to
compute the global mean/variance, ensuring statistically sound `Cycles
(Mean)` and `Stddev`.
* Remove `Time / Iteration` column from the results table: it reported
per-thread convergence time (not per-call latency) and was
redundant/misleading next to `Cycles (Mean)`.
* Remove unused `BenchmarkLogger` files: dead code that added
maintenance and cognitive overhead without providing functionality.
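
The standard-deviation and aggregation items above can be illustrated with a small host-side sketch. It assumes each thread reports its iteration count, mean, and population variance; the `ThreadResult` and `Pooled` structs and the `pool_results` function are hypothetical names for this example, not identifiers from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-thread summary: n samples, their mean, and their
// population variance (E[x^2] - E[x]^2).
struct ThreadResult {
  uint64_t iterations;
  double mean;
  double variance;
};

struct Pooled {
  double mean;
  double stddev;
};

// Iteration-weighted pooling: rebuild the global sum and sum of squares
// from each thread's summary, then apply sqrt(E[x^2] - E[x]^2).
Pooled pool_results(const std::vector<ThreadResult> &results) {
  double n = 0.0, sum = 0.0, sum_sq = 0.0;
  for (const ThreadResult &r : results) {
    double w = static_cast<double>(r.iterations);
    n += w;
    sum += w * r.mean;
    // Per-thread E[x^2] is variance + mean^2.
    sum_sq += w * (r.variance + r.mean * r.mean);
  }
  assert(n > 0.0);
  double mean = sum / n;
  double variance = sum_sq / n - mean * mean;
  if (variance < 0.0)
    variance = 0.0; // guard against tiny negative values from rounding
  return {mean, std::sqrt(variance)};
}
```

For instance, a thread that measured `{1, 3}` (mean 2, variance 1) pooled with a thread that measured `{5}` gives the same mean and standard deviation as computing them over `{1, 3, 5}` directly, which an unweighted average of per-thread means would not.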

---

## TODO (before merge)

* [ ] Investigate compiler warnings and address their root causes.
* [x] Review how per-thread results are aggregated into the overall
result.

## Follow-ups (future PRs)

* Add support to run throughput benchmarks with uniform (linear) input
distributions, alongside the current log2-uniform scheme.
* Review/adjust the configuration and coverage of existing math
benchmarks.
* Add more math benchmarks (e.g., `exp`/`expf`, others).

[libc] Fix GPU benchmarking

2025-07-18T19:36:23+00:00

[libc][gpu] Add Sinf Benchmarks (#102532)

2024-08-08T21:26:26+00:00

This PR adds benchmarking for `sinf()` using the same setup as `sin()`
but with a smaller range for floats.

[libc] [gpu] Fix Minor Benchmark UI Issues (#102529)

2024-08-08T20:32:20+00:00

Previously, benchmark names such as `AmdgpuSinTwoPow_128` were too long
for their table cells. This PR shortens them to `AmdSin...`

The separator line was also missing some `-` characters. This PR instead
creates the separator string using the length of the headers.
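
A minimal sketch of deriving the separator from the header row rather than hardcoding it. The function names, the ` | ` column delimiter, and the `Benchmark` column name in the usage note are assumptions for illustration; the actual table code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins column headers into one row, using " | " as an assumed delimiter.
std::string join_headers(const std::vector<std::string> &headers) {
  assert(!headers.empty());
  std::string row = headers[0];
  for (std::size_t i = 1; i < headers.size(); ++i)
    row += " | " + headers[i];
  return row;
}

// Builds a separator exactly as wide as the header row, so it cannot
// drift out of sync when column names change.
std::string make_separator(const std::string &header_row) {
  return std::string(header_row.size(), '-');
}
```

Calling `make_separator(join_headers({"Benchmark", "Cycles (Mean)", "Stddev"}))` yields a run of `-` characters whose length always matches the printed header line.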

[libc] [gpu] Add Generic, NvSin, and OcmlSinf64 Throughput Benchmark (#101917)

2024-08-08T20:05:34+00:00

This PR implements
https://github.com/lntue/llvm-project/commit/2a158426d4b90ffaa3eaecc9bc10e5aed11f1bcf
to provide better throughput benchmarking for libc `sin()` and
`__nv_sin()`.

These changes have not been tested on AMDGPU yet, only compiled.

[libc] Add AMDGPU Sin Benchmark (#101120)

2024-07-30T15:19:48+00:00

This PR adds support for benchmarking `__ocml_sin_f64()` against
`sin()`. This PR is currently a draft because I do not have access to an
AMD GPU and was not able to test it, but the code compiled when I
ran `ninja gpu-benchmark` from `runtimes-amdgcn-amd-amdhsa-bins`.

Co-authored-by: Joseph Huber

[libc] Add Generic and NVPTX Sin Benchmark (#99795)

2024-07-30T03:09:11+00:00

This PR adds `sin` benchmarks over a range of values and over a
pregenerated random distribution.