| Age | Commit message | Author |
|
Fixes #111546
---------
Co-authored-by: alyyelashram <150528548+alyyelashram@users.noreply.github.com>
|
|
This is a part of #97655.
|
|
declaration" (#98593)
Reverts llvm/llvm-project#98075
bots are broken
|
|
This is a part of #97655.
|
|
1. Remove the is_disjoint check for smaller sizes and reduce code bloat.
inline_memmove may handle some small sizes as efficiently
as inline_memcpy. For these sizes we can skip the is_disjoint check.
This both avoids additional code for the most frequent smaller sizes
and removes code bloat (we don't need the memcpy logic for small sizes).
Here we rely heavily on inlining and dead code elimination: from the
first inline_memmove we should get only the handling of small sizes, and
from the second inline_memmove and inline_memcpy we should get only the
handling of larger sizes.
2. Use the memcpy thresholds for memmove.
Memcpy thresholds were more carefully tuned.
This matters more now that we always use memmove
for all small sizes.
3. Fix boundary conditions for sizes = 16/32/64.
See the added comment for explanations.
Memmove function size drops from 885 to 715 bytes
due to removed duplication.
```
│ baseline │ small-size │
│ sec/op │ sec/op vs base │
memmove/Google_A 3.208n ± 0% 2.911n ± 0% -9.25% (n=100)
memmove/Google_B 4.113n ± 1% 3.428n ± 0% -16.65% (n=100)
memmove/Google_D 5.838n ± 0% 4.158n ± 0% -28.78% (n=100)
memmove/Google_S 4.712n ± 1% 3.899n ± 0% -17.25% (n=100)
memmove/Google_U 3.609n ± 0% 3.247n ± 1% -10.02% (n=100)
memmove/0 2.982n ± 0% 2.169n ± 0% -27.26% (n=50)
memmove/1 3.253n ± 0% 2.168n ± 0% -33.34% (n=50)
memmove/2 3.255n ± 0% 2.169n ± 0% -33.38% (n=50)
memmove/3 3.259n ± 2% 2.175n ± 0% -33.27% (p=0.000 n=50)
memmove/4 3.259n ± 0% 2.168n ± 5% -33.46% (p=0.000 n=50)
memmove/5 2.488n ± 0% 1.926n ± 0% -22.57% (p=0.000 n=50)
memmove/6 2.490n ± 0% 1.928n ± 0% -22.59% (p=0.000 n=50)
memmove/7 2.492n ± 0% 1.927n ± 0% -22.65% (p=0.000 n=50)
memmove/8 2.737n ± 0% 2.711n ± 0% -0.97% (p=0.000 n=50)
memmove/9 2.736n ± 0% 2.711n ± 0% -0.94% (p=0.000 n=50)
memmove/10 2.739n ± 0% 2.711n ± 0% -1.04% (p=0.000 n=50)
memmove/11 2.740n ± 0% 2.711n ± 0% -1.07% (p=0.000 n=50)
memmove/12 2.740n ± 0% 2.711n ± 0% -1.09% (p=0.000 n=50)
memmove/13 2.744n ± 0% 2.711n ± 0% -1.22% (p=0.000 n=50)
memmove/14 2.742n ± 0% 2.711n ± 0% -1.14% (p=0.000 n=50)
memmove/15 2.742n ± 0% 2.711n ± 0% -1.15% (p=0.000 n=50)
memmove/16 2.997n ± 0% 2.981n ± 0% -0.52% (p=0.000 n=50)
memmove/17 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50)
memmove/18 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50)
memmove/19 2.999n ± 0% 2.982n ± 0% -0.59% (p=0.000 n=50)
memmove/20 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50)
memmove/21 3.000n ± 0% 2.981n ± 0% -0.61% (p=0.000 n=50)
memmove/22 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50)
memmove/23 3.002n ± 0% 2.981n ± 0% -0.67% (p=0.000 n=50)
memmove/24 3.002n ± 0% 2.981n ± 0% -0.70% (n=50)
memmove/25 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50)
memmove/26 3.004n ± 0% 2.982n ± 0% -0.74% (p=0.000 n=50)
memmove/27 3.005n ± 0% 2.981n ± 0% -0.79% (n=50)
memmove/28 3.005n ± 0% 2.982n ± 0% -0.77% (n=50)
memmove/29 3.009n ± 0% 2.981n ± 0% -0.92% (n=50)
memmove/30 3.008n ± 0% 2.981n ± 0% -0.89% (n=50)
memmove/31 3.007n ± 0% 2.982n ± 0% -0.86% (n=50)
memmove/32 3.540n ± 0% 2.998n ± 0% -15.31% (p=0.000 n=50)
memmove/33 3.544n ± 0% 2.997n ± 0% -15.44% (p=0.000 n=50)
memmove/34 3.546n ± 0% 2.999n ± 0% -15.42% (n=50)
memmove/35 3.545n ± 0% 2.999n ± 0% -15.40% (n=50)
memmove/36 3.548n ± 0% 2.998n ± 0% -15.52% (p=0.000 n=50)
memmove/37 3.546n ± 0% 3.000n ± 0% -15.41% (n=50)
memmove/38 3.549n ± 0% 2.999n ± 0% -15.49% (p=0.000 n=50)
memmove/39 3.549n ± 0% 2.999n ± 0% -15.48% (p=0.000 n=50)
memmove/40 3.549n ± 0% 3.000n ± 0% -15.46% (p=0.000 n=50)
memmove/41 3.550n ± 0% 3.001n ± 0% -15.47% (n=50)
memmove/42 3.549n ± 0% 3.001n ± 0% -15.43% (n=50)
memmove/43 3.552n ± 0% 3.001n ± 0% -15.52% (p=0.000 n=50)
memmove/44 3.552n ± 0% 3.001n ± 0% -15.51% (n=50)
memmove/45 3.552n ± 0% 3.002n ± 0% -15.48% (n=50)
memmove/46 3.554n ± 0% 3.001n ± 0% -15.55% (p=0.000 n=50)
memmove/47 3.556n ± 0% 3.002n ± 0% -15.58% (p=0.000 n=50)
memmove/48 3.555n ± 0% 3.003n ± 0% -15.54% (n=50)
memmove/49 3.557n ± 0% 3.002n ± 0% -15.59% (p=0.000 n=50)
memmove/50 3.557n ± 0% 3.004n ± 0% -15.55% (p=0.000 n=50)
memmove/51 3.556n ± 0% 3.004n ± 0% -15.53% (p=0.000 n=50)
memmove/52 3.561n ± 0% 3.004n ± 0% -15.65% (p=0.000 n=50)
memmove/53 3.558n ± 0% 3.004n ± 0% -15.57% (p=0.000 n=50)
memmove/54 3.561n ± 0% 3.005n ± 0% -15.62% (n=50)
memmove/55 3.560n ± 0% 3.006n ± 0% -15.57% (n=50)
memmove/56 3.562n ± 0% 3.006n ± 0% -15.60% (p=0.000 n=50)
memmove/57 3.563n ± 0% 3.006n ± 0% -15.64% (n=50)
memmove/58 3.565n ± 0% 3.007n ± 0% -15.64% (p=0.000 n=50)
memmove/59 3.564n ± 0% 3.006n ± 0% -15.66% (p=0.000 n=50)
memmove/60 3.570n ± 0% 3.008n ± 0% -15.74% (p=0.000 n=50)
memmove/61 3.566n ± 0% 3.009n ± 0% -15.63% (p=0.000 n=50)
memmove/62 3.567n ± 0% 3.007n ± 0% -15.70% (p=0.000 n=50)
memmove/63 3.568n ± 0% 3.008n ± 0% -15.71% (p=0.000 n=50)
memmove/64 4.104n ± 0% 3.008n ± 0% -26.70% (p=0.000 n=50)
memmove/65 4.126n ± 0% 3.662n ± 0% -11.26% (p=0.000 n=50)
memmove/66 4.128n ± 0% 3.662n ± 0% -11.29% (n=50)
memmove/67 4.129n ± 0% 3.662n ± 0% -11.31% (n=50)
memmove/68 4.129n ± 0% 3.661n ± 0% -11.33% (p=0.000 n=50)
memmove/69 4.130n ± 0% 3.662n ± 0% -11.34% (p=0.000 n=50)
memmove/70 4.130n ± 0% 3.662n ± 0% -11.33% (n=50)
memmove/71 4.132n ± 0% 3.662n ± 0% -11.38% (p=0.000 n=50)
memmove/72 4.131n ± 0% 3.661n ± 0% -11.39% (n=50)
memmove/73 4.135n ± 0% 3.661n ± 0% -11.45% (p=0.000 n=50)
memmove/74 4.137n ± 0% 3.662n ± 0% -11.49% (n=50)
memmove/75 4.138n ± 0% 3.662n ± 0% -11.51% (p=0.000 n=50)
memmove/76 4.139n ± 0% 3.661n ± 0% -11.56% (p=0.000 n=50)
memmove/77 4.136n ± 0% 3.662n ± 0% -11.47% (p=0.000 n=50)
memmove/78 4.143n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/79 4.142n ± 0% 3.661n ± 0% -11.60% (n=50)
memmove/80 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/81 4.140n ± 0% 3.661n ± 0% -11.57% (n=50)
memmove/82 4.146n ± 0% 3.661n ± 0% -11.69% (n=50)
memmove/83 4.143n ± 0% 3.661n ± 0% -11.63% (p=0.000 n=50)
memmove/84 4.143n ± 0% 3.661n ± 0% -11.63% (n=50)
memmove/85 4.147n ± 0% 3.661n ± 0% -11.73% (p=0.000 n=50)
memmove/86 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/87 4.147n ± 0% 3.661n ± 0% -11.72% (p=0.000 n=50)
memmove/88 4.148n ± 0% 3.661n ± 0% -11.74% (n=50)
memmove/89 4.152n ± 0% 3.661n ± 0% -11.84% (n=50)
memmove/90 4.151n ± 0% 3.661n ± 0% -11.81% (n=50)
memmove/91 4.150n ± 0% 3.661n ± 0% -11.78% (n=50)
memmove/92 4.153n ± 0% 3.661n ± 0% -11.86% (n=50)
memmove/93 4.158n ± 0% 3.661n ± 0% -11.95% (n=50)
memmove/94 4.157n ± 0% 3.661n ± 0% -11.95% (p=0.000 n=50)
memmove/95 4.155n ± 0% 3.661n ± 0% -11.90% (p=0.000 n=50)
memmove/96 4.149n ± 0% 3.660n ± 0% -11.79% (n=50)
memmove/97 4.157n ± 0% 3.661n ± 0% -11.94% (n=50)
memmove/98 4.157n ± 0% 3.661n ± 0% -11.94% (n=50)
memmove/99 4.168n ± 0% 3.661n ± 0% -12.17% (p=0.000 n=50)
memmove/100 4.159n ± 0% 3.660n ± 0% -12.00% (p=0.000 n=50)
memmove/101 4.161n ± 0% 3.660n ± 0% -12.03% (p=0.000 n=50)
memmove/102 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50)
memmove/103 4.164n ± 0% 3.661n ± 0% -12.08% (n=50)
memmove/104 4.164n ± 0% 3.660n ± 0% -12.11% (n=50)
memmove/105 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50)
memmove/106 4.166n ± 0% 3.660n ± 0% -12.15% (n=50)
memmove/107 4.171n ± 0% 3.660n ± 1% -12.26% (p=0.000 n=50)
memmove/108 4.173n ± 0% 3.660n ± 0% -12.30% (p=0.000 n=50)
memmove/109 4.170n ± 0% 3.660n ± 0% -12.24% (n=50)
memmove/110 4.174n ± 0% 3.660n ± 0% -12.31% (n=50)
memmove/111 4.176n ± 0% 3.660n ± 0% -12.35% (p=0.000 n=50)
memmove/112 4.174n ± 0% 3.659n ± 0% -12.34% (p=0.000 n=50)
memmove/113 4.176n ± 0% 3.660n ± 0% -12.35% (n=50)
memmove/114 4.182n ± 0% 3.660n ± 0% -12.49% (n=50)
memmove/115 4.185n ± 0% 3.660n ± 0% -12.55% (n=50)
memmove/116 4.184n ± 0% 3.659n ± 0% -12.54% (n=50)
memmove/117 4.182n ± 0% 3.660n ± 0% -12.50% (n=50)
memmove/118 4.188n ± 0% 3.660n ± 0% -12.61% (n=50)
memmove/119 4.186n ± 0% 3.660n ± 0% -12.57% (p=0.000 n=50)
memmove/120 4.189n ± 0% 3.659n ± 0% -12.63% (n=50)
memmove/121 4.187n ± 0% 3.660n ± 0% -12.60% (n=50)
memmove/122 4.186n ± 0% 3.660n ± 0% -12.58% (n=50)
memmove/123 4.187n ± 0% 3.660n ± 0% -12.60% (n=50)
memmove/124 4.189n ± 0% 3.659n ± 0% -12.65% (n=50)
memmove/125 4.195n ± 0% 3.659n ± 0% -12.78% (n=50)
memmove/126 4.197n ± 0% 3.659n ± 0% -12.81% (n=50)
memmove/127 4.194n ± 0% 3.659n ± 0% -12.75% (n=50)
memmove/128 5.035n ± 0% 3.659n ± 0% -27.32% (n=50)
memmove/129 5.127n ± 0% 5.164n ± 0% +0.73% (p=0.000 n=50)
memmove/130 5.130n ± 0% 5.176n ± 0% +0.88% (p=0.000 n=50)
memmove/131 5.127n ± 0% 5.180n ± 0% +1.05% (p=0.000 n=50)
memmove/132 5.131n ± 0% 5.169n ± 0% +0.75% (p=0.000 n=50)
memmove/133 5.137n ± 0% 5.179n ± 0% +0.81% (p=0.000 n=50)
memmove/134 5.140n ± 0% 5.178n ± 0% +0.74% (p=0.000 n=50)
memmove/135 5.141n ± 0% 5.187n ± 0% +0.88% (p=0.000 n=50)
memmove/136 5.133n ± 0% 5.184n ± 0% +0.99% (p=0.000 n=50)
memmove/137 5.148n ± 0% 5.186n ± 0% +0.73% (p=0.000 n=50)
memmove/138 5.143n ± 0% 5.189n ± 0% +0.88% (p=0.000 n=50)
memmove/139 5.142n ± 0% 5.192n ± 0% +0.97% (p=0.000 n=50)
memmove/140 5.141n ± 0% 5.192n ± 0% +1.01% (p=0.000 n=50)
memmove/141 5.155n ± 0% 5.188n ± 0% +0.64% (p=0.000 n=50)
memmove/142 5.146n ± 0% 5.192n ± 0% +0.90% (p=0.000 n=50)
memmove/143 5.142n ± 0% 5.203n ± 0% +1.19% (p=0.000 n=50)
memmove/144 5.146n ± 0% 5.197n ± 0% +0.99% (p=0.000 n=50)
memmove/145 5.146n ± 0% 5.196n ± 0% +0.97% (p=0.000 n=50)
memmove/146 5.151n ± 0% 5.207n ± 0% +1.10% (p=0.000 n=50)
memmove/147 5.151n ± 0% 5.205n ± 0% +1.06% (p=0.000 n=50)
memmove/148 5.156n ± 0% 5.190n ± 0% +0.66% (p=0.000 n=50)
memmove/149 5.158n ± 0% 5.212n ± 0% +1.04% (p=0.000 n=50)
memmove/150 5.160n ± 0% 5.203n ± 0% +0.84% (p=0.000 n=50)
memmove/151 5.167n ± 0% 5.210n ± 0% +0.83% (p=0.000 n=50)
memmove/152 5.157n ± 0% 5.206n ± 0% +0.94% (p=0.000 n=50)
memmove/153 5.170n ± 0% 5.211n ± 0% +0.80% (p=0.000 n=50)
memmove/154 5.169n ± 0% 5.222n ± 0% +1.02% (p=0.000 n=50)
memmove/155 5.171n ± 0% 5.215n ± 0% +0.87% (p=0.000 n=50)
memmove/156 5.174n ± 0% 5.214n ± 0% +0.78% (p=0.000 n=50)
memmove/157 5.171n ± 0% 5.218n ± 0% +0.92% (p=0.000 n=50)
memmove/158 5.168n ± 0% 5.224n ± 0% +1.09% (p=0.000 n=50)
memmove/159 5.179n ± 0% 5.218n ± 0% +0.76% (p=0.000 n=50)
memmove/160 5.170n ± 0% 5.219n ± 0% +0.95% (p=0.000 n=50)
memmove/161 5.187n ± 0% 5.220n ± 0% +0.64% (p=0.000 n=50)
memmove/162 5.189n ± 0% 5.234n ± 0% +0.86% (p=0.000 n=50)
memmove/163 5.199n ± 0% 5.250n ± 0% +0.99% (p=0.000 n=50)
memmove/164 5.205n ± 0% 5.260n ± 0% +1.04% (p=0.000 n=50)
memmove/165 5.208n ± 0% 5.261n ± 0% +1.01% (p=0.000 n=50)
memmove/166 5.227n ± 0% 5.275n ± 0% +0.91% (p=0.000 n=50)
memmove/167 5.233n ± 0% 5.281n ± 0% +0.92% (p=0.000 n=50)
memmove/168 5.236n ± 0% 5.295n ± 0% +1.12% (p=0.000 n=50)
memmove/169 5.256n ± 0% 5.297n ± 0% +0.79% (p=0.000 n=50)
memmove/170 5.259n ± 0% 5.302n ± 0% +0.80% (p=0.000 n=50)
memmove/171 5.269n ± 0% 5.321n ± 0% +0.97% (p=0.000 n=50)
memmove/172 5.266n ± 0% 5.318n ± 0% +0.98% (p=0.000 n=50)
memmove/173 5.272n ± 0% 5.330n ± 0% +1.09% (p=0.000 n=50)
memmove/174 5.284n ± 0% 5.331n ± 0% +0.89% (p=0.000 n=50)
memmove/175 5.284n ± 0% 5.322n ± 0% +0.72% (p=0.000 n=50)
memmove/176 5.298n ± 0% 5.337n ± 0% +0.74% (p=0.000 n=50)
memmove/177 5.282n ± 0% 5.338n ± 0% +1.04% (p=0.000 n=50)
memmove/178 5.299n ± 0% 5.337n ± 0% +0.71% (p=0.000 n=50)
memmove/179 5.296n ± 0% 5.343n ± 0% +0.88% (p=0.000 n=50)
memmove/180 5.292n ± 0% 5.343n ± 0% +0.97% (p=0.000 n=50)
memmove/181 5.303n ± 0% 5.335n ± 0% +0.60% (p=0.000 n=50)
memmove/182 5.305n ± 0% 5.338n ± 0% +0.62% (p=0.000 n=50)
memmove/183 5.298n ± 0% 5.329n ± 0% +0.59% (p=0.000 n=50)
memmove/184 5.299n ± 0% 5.333n ± 0% +0.64% (p=0.000 n=50)
memmove/185 5.291n ± 0% 5.330n ± 0% +0.73% (p=0.000 n=50)
memmove/186 5.296n ± 0% 5.332n ± 0% +0.68% (p=0.000 n=50)
memmove/187 5.297n ± 0% 5.320n ± 0% +0.44% (p=0.000 n=50)
memmove/188 5.286n ± 0% 5.314n ± 0% +0.53% (p=0.000 n=50)
memmove/189 5.293n ± 0% 5.318n ± 0% +0.46% (p=0.000 n=50)
memmove/190 5.294n ± 0% 5.318n ± 0% +0.45% (p=0.000 n=50)
memmove/191 5.292n ± 0% 5.314n ± 0% +0.40% (p=0.032 n=50)
memmove/192 5.272n ± 0% 5.304n ± 0% +0.60% (p=0.000 n=50)
memmove/193 5.279n ± 0% 5.310n ± 0% +0.57% (p=0.000 n=50)
memmove/194 5.294n ± 0% 5.308n ± 0% +0.26% (p=0.018 n=50)
memmove/195 5.302n ± 0% 5.311n ± 0% +0.18% (p=0.010 n=50)
memmove/196 5.301n ± 0% 5.316n ± 0% +0.28% (p=0.023 n=50)
memmove/197 5.302n ± 0% 5.327n ± 0% +0.47% (p=0.000 n=50)
memmove/198 5.310n ± 0% 5.326n ± 0% +0.30% (p=0.003 n=50)
memmove/199 5.303n ± 0% 5.319n ± 0% +0.30% (p=0.009 n=50)
memmove/200 5.312n ± 0% 5.330n ± 0% +0.35% (p=0.001 n=50)
memmove/201 5.307n ± 0% 5.333n ± 0% +0.50% (p=0.000 n=50)
memmove/202 5.311n ± 0% 5.334n ± 0% +0.44% (p=0.000 n=50)
memmove/203 5.313n ± 0% 5.335n ± 0% +0.41% (p=0.006 n=50)
memmove/204 5.312n ± 0% 5.332n ± 0% +0.36% (p=0.002 n=50)
memmove/205 5.318n ± 0% 5.345n ± 0% +0.50% (p=0.000 n=50)
memmove/206 5.311n ± 0% 5.333n ± 0% +0.42% (p=0.002 n=50)
memmove/207 5.310n ± 0% 5.338n ± 0% +0.52% (p=0.000 n=50)
memmove/208 5.319n ± 0% 5.341n ± 0% +0.40% (p=0.004 n=50)
memmove/209 5.330n ± 0% 5.346n ± 0% +0.30% (p=0.004 n=50)
memmove/210 5.329n ± 0% 5.349n ± 0% +0.38% (p=0.002 n=50)
memmove/211 5.318n ± 0% 5.340n ± 0% +0.41% (p=0.000 n=50)
memmove/212 5.339n ± 0% 5.343n ± 0% ~ (p=0.396 n=50)
memmove/213 5.329n ± 0% 5.343n ± 0% +0.25% (p=0.017 n=50)
memmove/214 5.339n ± 0% 5.358n ± 0% +0.35% (p=0.035 n=50)
memmove/215 5.342n ± 0% 5.346n ± 0% ~ (p=0.063 n=50)
memmove/216 5.338n ± 0% 5.359n ± 0% +0.39% (p=0.002 n=50)
memmove/217 5.341n ± 0% 5.362n ± 0% +0.39% (p=0.015 n=50)
memmove/218 5.354n ± 0% 5.373n ± 0% +0.36% (p=0.041 n=50)
memmove/219 5.352n ± 0% 5.362n ± 0% ~ (p=0.143 n=50)
memmove/220 5.344n ± 0% 5.370n ± 0% +0.50% (p=0.001 n=50)
memmove/221 5.345n ± 0% 5.373n ± 0% +0.53% (p=0.000 n=50)
memmove/222 5.348n ± 0% 5.360n ± 0% +0.23% (p=0.014 n=50)
memmove/223 5.354n ± 0% 5.377n ± 0% +0.43% (p=0.024 n=50)
memmove/224 5.352n ± 0% 5.363n ± 0% ~ (p=0.052 n=50)
memmove/225 5.372n ± 0% 5.380n ± 0% ~ (p=0.481 n=50)
memmove/226 5.368n ± 0% 5.386n ± 0% +0.34% (p=0.004 n=50)
memmove/227 5.386n ± 0% 5.402n ± 0% +0.29% (p=0.028 n=50)
memmove/228 5.400n ± 0% 5.408n ± 0% ~ (p=0.174 n=50)
memmove/229 5.423n ± 0% 5.427n ± 0% ~ (p=0.444 n=50)
memmove/230 5.411n ± 0% 5.429n ± 0% +0.33% (p=0.020 n=50)
memmove/231 5.420n ± 0% 5.433n ± 0% +0.24% (p=0.034 n=50)
memmove/232 5.435n ± 0% 5.441n ± 0% ~ (p=0.235 n=50)
memmove/233 5.446n ± 0% 5.462n ± 0% ~ (p=0.590 n=50)
memmove/234 5.467n ± 0% 5.461n ± 0% ~ (p=0.921 n=50)
memmove/235 5.472n ± 0% 5.478n ± 0% ~ (p=0.883 n=50)
memmove/236 5.466n ± 0% 5.478n ± 0% ~ (p=0.324 n=50)
memmove/237 5.471n ± 0% 5.489n ± 0% ~ (p=0.132 n=50)
memmove/238 5.485n ± 0% 5.489n ± 0% ~ (p=0.460 n=50)
memmove/239 5.484n ± 0% 5.488n ± 0% ~ (p=0.833 n=50)
memmove/240 5.483n ± 0% 5.495n ± 0% ~ (p=0.095 n=50)
memmove/241 5.498n ± 0% 5.514n ± 0% ~ (p=0.077 n=50)
memmove/242 5.518n ± 0% 5.517n ± 0% ~ (p=0.481 n=50)
memmove/243 5.514n ± 0% 5.511n ± 0% ~ (p=0.503 n=50)
memmove/244 5.510n ± 0% 5.497n ± 0% -0.24% (p=0.038 n=50)
memmove/245 5.516n ± 0% 5.505n ± 0% ~ (p=0.317 n=50)
memmove/246 5.513n ± 1% 5.494n ± 0% ~ (p=0.147 n=50)
memmove/247 5.518n ± 0% 5.499n ± 0% -0.36% (p=0.011 n=50)
memmove/248 5.503n ± 0% 5.492n ± 0% ~ (p=0.267 n=50)
memmove/249 5.498n ± 0% 5.497n ± 0% ~ (p=0.765 n=50)
memmove/250 5.485n ± 0% 5.493n ± 0% ~ (p=0.348 n=50)
memmove/251 5.503n ± 0% 5.482n ± 0% -0.37% (p=0.013 n=50)
memmove/252 5.497n ± 0% 5.485n ± 0% ~ (p=0.077 n=50)
memmove/253 5.489n ± 0% 5.496n ± 0% ~ (p=0.850 n=50)
memmove/254 5.497n ± 0% 5.491n ± 0% ~ (p=0.548 n=50)
memmove/255 5.484n ± 1% 5.494n ± 0% ~ (p=0.888 n=50)
memmove/256 6.952n ± 0% 7.676n ± 0% +10.41% (p=0.000 n=50)
geomean 4.406n 4.127n -6.33%
```
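The structure described in points 1 and 3 can be illustrated with a minimal sketch. It is not the libc code: `move_head_tail` and `sketch_memmove` are hypothetical names, the size thresholds are assumptions chosen for illustration, and `std::memmove` stands in for the large-size path. Small sizes are moved by loading a head and a tail block before storing either, which is safe for overlapping buffers and therefore needs no disjointness check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not the libc API): moves `count` bytes, where
// N <= count <= 2 * N, using a head block and a tail block that may overlap
// each other. Both blocks are loaded before anything is stored, so the move
// stays correct when `dst` and `src` overlap.
template <std::size_t N>
inline void move_head_tail(char *dst, const char *src, std::size_t count) {
  assert(count >= N && count <= 2 * N);
  char head[N];
  char tail[N];
  std::memcpy(head, src, N);
  std::memcpy(tail, src + count - N, N);
  std::memcpy(dst, head, N);
  std::memcpy(dst + count - N, tail, N);
}

// Sketch of the dispatch: small sizes never ask whether the buffers overlap.
// The inclusive upper bounds mean that sizes exactly equal to a power of two
// are handled by the smaller pair of blocks (assumed thresholds).
inline void sketch_memmove(char *dst, const char *src, std::size_t count) {
  if (count == 0) return;
  if (count == 1) { *dst = *src; return; }
  if (count <= 4) return move_head_tail<2>(dst, src, count);
  if (count <= 8) return move_head_tail<4>(dst, src, count);
  if (count <= 16) return move_head_tail<8>(dst, src, count);
  if (count <= 32) return move_head_tail<16>(dst, src, count);
  if (count <= 64) return move_head_tail<32>(dst, src, count);
  if (count <= 128) return move_head_tail<64>(dst, src, count);
  // Stand-in for the large-size path (disjoint check plus memcpy, or a
  // directional move); the real implementation does not call std::memmove.
  std::memmove(dst, src, count);
}
```

With fixed-size blocks a compiler can usually keep `head` and `tail` in registers, so each small-size branch reduces to a few loads and stores.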
|
|
This is step 4 of
https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
|
|
This patch mostly renames files so it better reflects the function they declare.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D155607
|
|
Most of the time `memmove` is called on buffers that are disjoint; in that case we can use `memcpy`, which is faster.
The additional test is branchless on x86, aarch64 and RISC-V with the Zbb extension (bitmanip).
On x86 this patch adds a latency of 2 to 3 cycles.
Before
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------
BM_Memmove/0/0_median 5.00 ns 5.00 ns 10 bytes_per_cycle=1.25477/s bytes_per_second=2.62933G/s items_per_second=199.87M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 6.21 ns 6.21 ns 10 bytes_per_cycle=3.22173/s bytes_per_second=6.75106G/s items_per_second=160.955M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 8.09 ns 8.09 ns 10 bytes_per_cycle=5.31462/s bytes_per_second=11.1366G/s items_per_second=123.603M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 5.95 ns 5.95 ns 10 bytes_per_cycle=2.71865/s bytes_per_second=5.69687G/s items_per_second=167.967M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 5.63 ns 5.63 ns 10 bytes_per_cycle=2.28294/s bytes_per_second=4.78383G/s items_per_second=177.615M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 5.68 ns 5.68 ns 10 bytes_per_cycle=2.16798/s bytes_per_second=4.54295G/s items_per_second=176.015M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 7.46 ns 7.46 ns 10 bytes_per_cycle=3.97619/s bytes_per_second=8.332G/s items_per_second=134.044M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 5.40 ns 5.40 ns 10 bytes_per_cycle=1.79695/s bytes_per_second=3.76546G/s items_per_second=185.211M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 5.62 ns 5.62 ns 10 bytes_per_cycle=3.18747/s bytes_per_second=6.67927G/s items_per_second=177.983M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 101 ns 101 ns 10 bytes_per_cycle=9.77359/s bytes_per_second=20.4803G/s items_per_second=9.9333M/s __llvm_libc::memmove,uniform 384 to 4096
```
After
```
BM_Memmove/0/0_median 3.57 ns 3.57 ns 10 bytes_per_cycle=1.71375/s bytes_per_second=3.59112G/s items_per_second=280.411M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 4.52 ns 4.52 ns 10 bytes_per_cycle=4.47557/s bytes_per_second=9.37843G/s items_per_second=221.427M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 5.70 ns 5.70 ns 10 bytes_per_cycle=7.37396/s bytes_per_second=15.4519G/s items_per_second=175.399M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 4.47 ns 4.47 ns 10 bytes_per_cycle=3.4148/s bytes_per_second=7.15563G/s items_per_second=223.743M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 4.53 ns 4.53 ns 10 bytes_per_cycle=2.86071/s bytes_per_second=5.99454G/s items_per_second=220.69M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 4.19 ns 4.19 ns 10 bytes_per_cycle=2.5484/s bytes_per_second=5.3401G/s items_per_second=238.924M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 5.02 ns 5.02 ns 10 bytes_per_cycle=5.94164/s bytes_per_second=12.4505G/s items_per_second=199.14M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 4.03 ns 4.03 ns 10 bytes_per_cycle=2.47028/s bytes_per_second=5.17641G/s items_per_second=247.906M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 4.70 ns 4.70 ns 10 bytes_per_cycle=3.84975/s bytes_per_second=8.06706G/s items_per_second=212.72M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 90.7 ns 90.7 ns 10 bytes_per_cycle=10.8681/s bytes_per_second=22.7739G/s items_per_second=11.02M/s __llvm_libc::memmove,uniform 384 to 4096
```
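The disjointness test described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: the function names are hypothetical, and whether the selects compile to conditional moves or min/max instructions depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true when [a, a + size) and [b, b + size) do not overlap.
// The distance between the two addresses is computed with selects rather
// than an explicit branch, which compilers can often lower without a jump.
inline bool is_disjoint(const void *a, const void *b, std::size_t size) {
  const auto pa = reinterpret_cast<std::uintptr_t>(a);
  const auto pb = reinterpret_cast<std::uintptr_t>(b);
  const std::uintptr_t lo = pa < pb ? pa : pb;
  const std::uintptr_t hi = pa < pb ? pb : pa;
  return hi - lo >= size;
}

// Hypothetical wrapper: take the faster memcpy path when it is safe,
// and fall back to an overlap-aware move otherwise.
inline void *memmove_with_disjoint_check(void *dst, const void *src,
                                         std::size_t count) {
  assert(dst != nullptr && src != nullptr);
  if (is_disjoint(dst, src, count))
    return std::memcpy(dst, src, count);
  return std::memmove(dst, src, count);
}
```

The check costs a subtraction and a comparison on every call, which is consistent with the small fixed latency the commit message reports.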
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D152811
|
|
Moving memmove implementation to its own file for symmetry with other mem functions.
Differential Revision: https://reviews.llvm.org/D136687
|
|
The new framework makes it explicit which processor feature is being
used and allows for easier per platform customization:
- ARM cpu now uses trivial implementations to reduce code size.
- Memcmp, Bcmp and Memmove have been optimized for x86
- Bcmp has been optimized for aarch64.
This is a reland of https://reviews.llvm.org/D135134 (b3f1d58, 028414881381)
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D136595
|
|
This patch seems to introduce bugs on aarch64.
Reverting while we investigate the root cause.
This reverts commit 02841488138160f9064f334a833d4bf3e80385c6.
|
|
The new framework makes it explicit which processor feature is being
used and allows for easier per platform customization:
- ARM cpu now uses trivial implementations to reduce code size.
- Memcmp, Bcmp and Memmove have been optimized for x86
- Bcmp has been optimized for aarch64.
This is a reland of https://reviews.llvm.org/D135134 (b3f1d58)
Differential Revision: https://reviews.llvm.org/D136595
|
|
This reverts commit https://reviews.llvm.org/D135134 (b3f1d58a131eb546aaf1ac165c77ccb89c40d758)
That revision appears to have broken Arm memcpy in some subtle
ways. I am communicating with the original author to get a
good reproduction.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/chf1Y6eGM
Differential Revision: https://reviews.llvm.org/D135134
|
|
This reverts commit 9721687835a7df5da0c9482cf684c11b8ba97f75.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
|
|
This reverts commit 98bf836f3127a346a81da5ae3e27246935298de4.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
|
|
This reverts commit d55f2d8ab076298cfd745c05c1b4dfd5583f8b9e.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
|
|
This reverts commit 4c19439d249256db720e323a446e39d05496732f.
|
|
This version is more composable and also simpler at the expense of being more explicit and more verbose.
This patch is not meant to be submitted but gives an idea of the change.
Codegen can be checked in https://godbolt.org/z/6z1dEoWbs by removing the "static inline" before individual functions.
Unittests are coming.
Suggested review order:
- utils
- op_base
- op_builtin
- op_generic
- op_x86 / op_aarch64
- *_implementations.h
Differential Revision: https://reviews.llvm.org/D135134
|
|
This implementation relies on storing data in registers for sizes up to 128B.
Then depending on whether `dst` is less (resp. greater) than `src` we move data forward (resp. backward) by chunks of 32B.
We first make sure one of the pointers is aligned to increase performance on large move sizes.
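The forward and backward chunked moves can be sketched like this. It is a simplified illustration, not the committed code: `Block`, `load`, `store` and `move_large` are hypothetical names, the register path for sizes up to 128B and the pointer alignment step are both omitted, and the only precondition assumed is that at least one full 32B block is moved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBlock = 32;  // chunk size in bytes

struct Block { unsigned char bytes[kBlock]; };

inline Block load(const unsigned char *p) {
  Block b;
  std::memcpy(b.bytes, p, kBlock);
  return b;
}

inline void store(unsigned char *p, const Block &b) {
  std::memcpy(p, b.bytes, kBlock);
}

// Moves `count` >= kBlock bytes between possibly overlapping buffers.
inline void move_large(unsigned char *dst, const unsigned char *src,
                       std::size_t count) {
  assert(count >= kBlock);
  const auto d = reinterpret_cast<std::uintptr_t>(dst);
  const auto s = reinterpret_cast<std::uintptr_t>(src);
  if (d < s) {
    // dst is below src: walk forward. The last block is loaded up front
    // because the loop may overwrite it before it is reached.
    const Block tail = load(src + count - kBlock);
    for (std::size_t off = 0; off + kBlock < count; off += kBlock)
      store(dst + off, load(src + off));
    store(dst + count - kBlock, tail);
  } else {
    // dst is at or above src: walk backward, saving the first block.
    const Block head = load(src);
    for (std::size_t end = count; end > kBlock; end -= kBlock)
      store(dst + end - kBlock, load(src + end - kBlock));
    store(dst, head);
  }
}
```

Saving the boundary block before the loop lets the loop use only full chunks, with the final store covering whatever remainder is left.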
Differential Revision: https://reviews.llvm.org/D114637
|
|
This patch applies the lint rules described in the previous patch. There
was also a significant amount of effort put into manually fixing things,
since all of the templated functions, or structs defined in /spec, were
not updated and had to be handled manually.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D114302
|
|
For example, strcpy does not pull memcpy now.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D114300
|
|
|
|
This reverts commit b659b789c03ac339e28d7b91406b67bb887a426d.
|
|
- Replace `move_byte_forward()` with `memcpy`. This relies on the `memcpy`
implementation copying bytes forward from beginning to end; otherwise, the
`memmove` unit tests will break.
- Make `memmove` unit tests work.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D109316
|
|
This eliminates cross-header dependency from stdlib to string.
|
|
`ssize_t` is from POSIX and is not standard, unfortunately.
Rewriting the code so it doesn't depend on it.
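One common way to drop a signed index from a backward copy is to count the remaining bytes with `size_t` instead. The sketch below shows that idiom; `copy_backward` is a hypothetical helper, and the actual patch may have been written differently.

```cpp
#include <cassert>
#include <cstddef>

// Copies `count` bytes from the last byte to the first, which is the safe
// direction when `dst` overlaps `src` at a higher address.
inline void copy_backward(unsigned char *dst, const unsigned char *src,
                          std::size_t count) {
  assert(dst != nullptr && src != nullptr);
  // `i` is the number of bytes still to copy, so the loop ends at 0 and the
  // unsigned counter never needs to represent -1.
  for (std::size_t i = count; i > 0; --i)
    dst[i - 1] = src[i - 1];
}
```

A loop of the form `for (i = count - 1; i >= 0; --i)` only terminates with a signed index, which is why it tends to pull in `ssize_t`.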
Differential Revision: https://reviews.llvm.org/D94760
|
|
Use `memcpy` rather than copying bytes one by one, since there might be
large structs to move.
Reviewed By: gchatelet, sivachandra
Differential Revision: https://reviews.llvm.org/D93195
|