Cranelift: ISLE mid-end performance regression (-9.20%) #12139

Hi,

this is a follow-up to https://github.com/bytecodealliance/wasmtime/issues/12106.

Although we've removed two kinds of performance-regressing mid-end ISLE rules,
a significant performance degradation still remains, along with other suspected cases.
(There is, of course, a bright side: many benchmarks show significant performance improvements!)

Performance Regression:
* shootout-switch
* pulldown-cmark

First, here is the backing data of the performance regression:

Benchmark | No Opt | Main | Main Speedup
-- | -: | -: | -:
blake3-scalar | 317,727 | 317,719 | 0.00%
blake3-simd | 313,115 | 306,232 | 2.25%
bz2 | 87,201,400 | 86,337,330 | 1.00%
pulldown-cmark | 6,580,174 | 6,905,992 | -4.72%
regex | 209,743,816 | 210,183,175 | -0.21%
shootout-ackermann | 8,498,140 | 7,764,439 | 9.45%
shootout-base64 | 381,721,177 | 352,724,661 | 8.22%
shootout-ctype | 830,813,398 | 796,486,698 | 4.31%
shootout-ed25519 | 9,583,747,723 | 9,395,321,203 | 2.01%
shootout-fib2 | 3,009,269,670 | 3,010,509,565 | -0.04%
shootout-gimli | 5,338,258 | 5,401,697 | -1.17%
shootout-heapsort | 2,382,073,831 | 2,375,914,107 | 0.26%
shootout-keccak | 25,168,386 | 21,112,482 | 19.21%
shootout-matrix | 538,696,036 | 544,739,691 | -1.11%
shootout-memmove | 36,156,621 | 36,115,998 | 0.11%
shootout-minicsv | 1,481,713,625 | 1,291,534,227 | 14.73%
shootout-nestedloop | 449 | 442 | 1.43%
shootout-random | 630,328,205 | 439,691,474 | 43.36%
shootout-ratelimit | 39,148,817 | 39,956,714 | -2.02%
shootout-seqhash | 8,869,585,125 | 8,639,110,150 | 2.67%
shootout-sieve | 905,404,028 | 840,777,681 | 7.69%
shootout-switch | 139,525,474 | 153,663,682 | -9.20%
shootout-xblabla20 | 2,891,404 | 2,907,369 | -0.55%
shootout-xchacha20 | 4,384,703 | 4,395,319 | -0.24%
spidermonkey | 636,104,785 | 631,998,404 | 0.65%
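
For reference, the "Main Speedup" column appears to be computed as `No Opt / Main - 1`, with lower raw numbers being better. This formula is inferred from the table values rather than stated by the benchmark tooling, so treat the sketch below as an assumption:

```python
def speedup(no_opt: int, main: int) -> float:
    """Relative speedup of main over no-opt; negative means main is slower.

    Assumption: the raw numbers are "lower is better" measurements and the
    column is no_opt / main - 1 (inferred from the table, not from the tool).
    """
    return no_opt / main - 1.0


# A few rows copied from the table above: (No Opt, Main).
rows = {
    "pulldown-cmark": (6_580_174, 6_905_992),
    "shootout-switch": (139_525_474, 153_663_682),
    "shootout-random": (630_328_205, 439_691_474),
}

for name, (no_opt, main) in rows.items():
    print(f"{name}: {speedup(no_opt, main):+.2%}")
```

With this formula the three rows reproduce the values in the table (-4.72%, -9.20%, and 43.36%).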


Unlike the previous cases, the cause is not obvious.
```
19245 clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif
19241 clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif
```
The number of instructions barely changes between no-opt and main.
However, the applied optimizations make the program use a longer-lived value:

```diff
--- /data/bongjun/clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif	2025-12-08 12:43:58.406738645 +0000
+++ /data/bongjun/clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif	2025-12-08 12:49:01.961085326 +0000

-                                    v8572 = iconst.i32 1066
-                                    v8573 = iconst.i32 0
-@d20b                               v4324 = call fn1(v0, v0, v8572, v8573)  ; v8572 = 1066, v8573 = 0
-                                    v8574 = iadd.i64 v105, v106  ; v106 = 3584
-@d219                               v4333 = load.i32 little heap v8574
-                                    v8575 = iconst.i32 6
-                                    v8576 = icmp uge v4333, v8575  ; v8575 = 6
+                                    v8603 = iconst.i32 1066
+                                    v8604 = iconst.i32 0
+@d20b                               v4324 = call fn1(v0, v0, v8603, v8604)  ; v8603 = 1066, v8604 = 0
+                                    v8605 = iadd.i64 v11, v106  ; v106 = 3584
+@d219                               v4333 = load.i32 little heap v8605
+                                    v8606 = iconst.i32 6
+                                    v8607 = icmp uge v4333, v8606  ; v8606 = 6
```

Compare `v8574` and `v8605`, which use `v105` and `v11`, respectively.
`v11` is defined at the beginning of the function, whereas `v105` is defined later:
```
                                block0(v0: i64, v1: i64):
@01f0                               v5 = load.i32 notrap aligned table v0+256
@01f6                               v6 = iconst.i32 16
@01f8                               v7 = isub v5, v6  ; v6 = 16
@01fb                               store notrap aligned table v7, v0+256
@0203                               v9 = iconst.i32 0x2710
@0207                               v11 = load.i64 notrap aligned readonly can_move checked v0+56

...

@02d6                               v105 = iadd.i64 v11, v4439
```

Using `v11` here keeps it live for longer, which might increase register pressure and cause more spills, degrading performance.
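
To make the suspicion concrete, here is a toy model of live ranges in a straight-line program. It is only a sketch: it is not how Cranelift or its register allocator computes liveness, and the instruction lists are hypothetical. In particular, it assumes `v105` still has another use after the rewrite, which I have not confirmed in the real CLIF.

```python
from typing import Dict, List, Tuple

Inst = Tuple[str, List[str]]  # (defined value, used values)


def live_intervals(insts: List[Inst]) -> Dict[str, Tuple[int, int]]:
    """Map each value to (def index, last-use index)."""
    intervals: Dict[str, Tuple[int, int]] = {}
    for i, (dst, uses) in enumerate(insts):
        intervals[dst] = (i, i)
        for u in uses:
            if u in intervals:
                start, _ = intervals[u]
                intervals[u] = (start, i)  # extend to this later use
    return intervals


def max_pressure(insts: List[Inst]) -> int:
    """Largest number of values live right after any instruction."""
    intervals = live_intervals(insts)
    return max(
        sum(1 for (start, end) in intervals.values() if start <= i < end)
        for i in range(len(insts))
    )


# Hypothetical "no-opt" shape: the late use goes through v105,
# so v11 is dead once v105 has been computed.
before: List[Inst] = [
    ("v11", []),        # base load at function entry
    ("v105", ["v11"]),  # derived address
    ("a", []),          # unrelated work in between
    ("b", ["a"]),
    ("x", ["v105"]),    # another use of v105 (assumed)
    ("y", ["v105"]),    # late use via v105
]

# Hypothetical "main" shape: the late use was rewritten to use v11 directly,
# so v11 and v105 are now live at the same time.
after: List[Inst] = before[:-1] + [("y", ["v11"])]

print(max_pressure(before), max_pressure(after))
```

In this toy example the peak number of simultaneously live values rises from 2 to 3 after the rewrite. Whether the same happens in `__original_main` depends on the remaining uses of `v105`, which still needs to be checked.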



