Hi,
This is a follow-up to #12106.
Although we've removed two kinds of performance-regressing mid-end ISLE rules,
a significant performance degradation still remains, along with other suspected cases.
(There is, of course, a bright side: we have significant performance improvements for many cases!)
Performance Regression:
- shootout-switch
- pulldown-cmark
First, here is the data backing the performance regression:
| Benchmark | No Opt | Main | Main Speedup |
|---|---|---|---|
| blake3-scalar | 317,727 | 317,719 | 0.00% |
| blake3-simd | 313,115 | 306,232 | 2.25% |
| bz2 | 87,201,400 | 86,337,330 | 1.00% |
| pulldown-cmark | 6,580,174 | 6,905,992 | -4.72% |
| regex | 209,743,816 | 210,183,175 | -0.21% |
| shootout-ackermann | 8,498,140 | 7,764,439 | 9.45% |
| shootout-base64 | 381,721,177 | 352,724,661 | 8.22% |
| shootout-ctype | 830,813,398 | 796,486,698 | 4.31% |
| shootout-ed25519 | 9,583,747,723 | 9,395,321,203 | 2.01% |
| shootout-fib2 | 3,009,269,670 | 3,010,509,565 | -0.04% |
| shootout-gimli | 5,338,258 | 5,401,697 | -1.17% |
| shootout-heapsort | 2,382,073,831 | 2,375,914,107 | 0.26% |
| shootout-keccak | 25,168,386 | 21,112,482 | 19.21% |
| shootout-matrix | 538,696,036 | 544,739,691 | -1.11% |
| shootout-memmove | 36,156,621 | 36,115,998 | 0.11% |
| shootout-minicsv | 1,481,713,625 | 1,291,534,227 | 14.73% |
| shootout-nestedloop | 449 | 442 | 1.43% |
| shootout-random | 630,328,205 | 439,691,474 | 43.36% |
| shootout-ratelimit | 39,148,817 | 39,956,714 | -2.02% |
| shootout-seqhash | 8,869,585,125 | 8,639,110,150 | 2.67% |
| shootout-sieve | 905,404,028 | 840,777,681 | 7.69% |
| shootout-switch | 139,525,474 | 153,663,682 | -9.20% |
| shootout-xblabla20 | 2,891,404 | 2,907,369 | -0.55% |
| shootout-xchacha20 | 4,384,703 | 4,395,319 | -0.24% |
| spidermonkey | 636,104,785 | 631,998,404 | 0.65% |
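As a sanity check, the Main Speedup column is consistent with No Opt / Main - 1, so a negative value means Main is slower. A minimal sketch of that assumed formula (the function name is hypothetical, and the unit of the counts is not stated in the table):

```python
def speedup_percent(no_opt: int, main: int) -> float:
    """Assumed formula: relative speedup of Main over No Opt, in percent."""
    return (no_opt / main - 1.0) * 100.0


# Rows copied from the table above: (benchmark, No Opt, Main).
rows = [
    ("pulldown-cmark", 6_580_174, 6_905_992),
    ("shootout-switch", 139_525_474, 153_663_682),
    ("shootout-random", 630_328_205, 439_691_474),
]

for name, no_opt, main in rows:
    print(f"{name}: {speedup_percent(no_opt, main):.2f}%")
```

Under this formula, the two regressions listed above are the rows with the most negative values.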
Unlike the previous cases, the cause is not obvious.
19245 clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif
19241 clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif
The CLIF line count for __original_main barely changes from no-opt to main (19,245 vs. 19,241 lines), so the number of instructions is essentially the same.
However, the applied optimizations make the program use a longer-lived value:
--- /data/bongjun/clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif 2025-12-08 12:43:58.406738645 +0000
+++ /data/bongjun/clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif 2025-12-08 12:49:01.961085326 +0000
- v8572 = iconst.i32 1066
- v8573 = iconst.i32 0
-@d20b v4324 = call fn1(v0, v0, v8572, v8573) ; v8572 = 1066, v8573 = 0
- v8574 = iadd.i64 v105, v106 ; v106 = 3584
-@d219 v4333 = load.i32 little heap v8574
- v8575 = iconst.i32 6
- v8576 = icmp uge v4333, v8575 ; v8575 = 6
+ v8603 = iconst.i32 1066
+ v8604 = iconst.i32 0
+@d20b v4324 = call fn1(v0, v0, v8603, v8604) ; v8603 = 1066, v8604 = 0
+ v8605 = iadd.i64 v11, v106 ; v106 = 3584
+@d219 v4333 = load.i32 little heap v8605
+ v8606 = iconst.i32 6
+ v8607 = icmp uge v4333, v8606 ; v8606 = 6
See v8574 and v8605, which use v105 and v11, respectively.
v11 is defined at the beginning of the function, whereas v105 is defined later, from v11:
block0(v0: i64, v1: i64):
@01f0 v5 = load.i32 notrap aligned table v0+256
@01f6 v6 = iconst.i32 16
@01f8 v7 = isub v5, v6 ; v6 = 16
@01fb store notrap aligned table v7, v0+256
@0203 v9 = iconst.i32 0x2710
@0207 v11 = load.i64 notrap aligned readonly can_move checked v0+56
...
@02d6 v105 = iadd.i64 v11, v4439
This might increase register pressure, causing more spills, which can degrade performance.
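To make the live-range argument concrete, here is a rough Python sketch that measures how far each value lives in a straight-line listing. It is only an illustration: the two listings are heavily shortened toy versions of the excerpts above (real control flow, block parameters, and the actual register allocator are ignored), and `live_ranges` is a hypothetical helper, not part of Cranelift.

```python
import re

VALUE = re.compile(r"\bv\d+\b")


def live_ranges(lines):
    """Return {value: (def_index, last_use_index)} for a straight-line listing."""
    ranges = {}
    for idx, line in enumerate(lines):
        code = line.split(";")[0]          # drop trailing "; vN = ..." comments
        lhs, sep, rhs = code.partition("=")
        if not sep:                        # no result value, e.g. a store
            lhs, rhs = "", code
        for v in VALUE.findall(rhs):       # extend ranges of values used here
            if v in ranges:
                ranges[v] = (ranges[v][0], idx)
        for v in VALUE.findall(lhs):       # record new definitions
            ranges.setdefault(v, (idx, idx))
    return ranges


# Toy listings (assumption: simplified from the diff, not real compiler output).
NO_OPT = [
    "v11 = load.i64 notrap aligned readonly can_move checked v0+56",
    "v105 = iadd.i64 v11, v4439",
    "v4324 = call fn1(v0, v0, v8572, v8573)",
    "v8574 = iadd.i64 v105, v106 ; v106 = 3584",
    "v4333 = load.i32 little heap v8574",
]
MAIN = [
    "v11 = load.i64 notrap aligned readonly can_move checked v0+56",
    "v105 = iadd.i64 v11, v4439",
    "v4324 = call fn1(v0, v0, v8603, v8604)",
    "v8605 = iadd.i64 v11, v106 ; v106 = 3584",
    "v4333 = load.i32 little heap v8605",
]

no_opt_ranges = live_ranges(NO_OPT)
main_ranges = live_ranges(MAIN)
for name, ranges in (("no-opt", no_opt_ranges), ("main", main_ranges)):
    start, end = ranges["v11"]
    print(f"{name}: v11 is live across {end - start} instruction(s)")
```

In this toy input, the range of v11 grows and now spans the call, while v105 no longer does. Whether overall register pressure actually rises in the real function depends on which other values stay live at the same points, so this only shows the mechanism, not the measured cause.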