Cranelift: ISLE mid-end performance regression (-9.20%) #12139

@bongjunj

Hi,

This is a follow-up to #12106.

Although we've removed two kinds of mid-end ISLE rules that were regressing performance,
a significant performance degradation remains, along with a few other suspected cases.
(There is, of course, a bright side: many benchmarks show significant performance improvements!)

Performance Regression:

  • shootout-switch
  • pulldown-cmark

First, here is the data backing the performance regression:

| Benchmark | No Opt | Main | Main Speedup |
|---|---:|---:|---:|
| blake3-scalar | 317,727 | 317,719 | 0.00% |
| blake3-simd | 313,115 | 306,232 | 2.25% |
| bz2 | 87,201,400 | 86,337,330 | 1.00% |
| pulldown-cmark | 6,580,174 | 6,905,992 | -4.72% |
| regex | 209,743,816 | 210,183,175 | -0.21% |
| shootout-ackermann | 8,498,140 | 7,764,439 | 9.45% |
| shootout-base64 | 381,721,177 | 352,724,661 | 8.22% |
| shootout-ctype | 830,813,398 | 796,486,698 | 4.31% |
| shootout-ed25519 | 9,583,747,723 | 9,395,321,203 | 2.01% |
| shootout-fib2 | 3,009,269,670 | 3,010,509,565 | -0.04% |
| shootout-gimli | 5,338,258 | 5,401,697 | -1.17% |
| shootout-heapsort | 2,382,073,831 | 2,375,914,107 | 0.26% |
| shootout-keccak | 25,168,386 | 21,112,482 | 19.21% |
| shootout-matrix | 538,696,036 | 544,739,691 | -1.11% |
| shootout-memmove | 36,156,621 | 36,115,998 | 0.11% |
| shootout-minicsv | 1,481,713,625 | 1,291,534,227 | 14.73% |
| shootout-nestedloop | 449 | 442 | 1.43% |
| shootout-random | 630,328,205 | 439,691,474 | 43.36% |
| shootout-ratelimit | 39,148,817 | 39,956,714 | -2.02% |
| shootout-seqhash | 8,869,585,125 | 8,639,110,150 | 2.67% |
| shootout-sieve | 905,404,028 | 840,777,681 | 7.69% |
| shootout-switch | 139,525,474 | 153,663,682 | -9.20% |
| shootout-xblabla20 | 2,891,404 | 2,907,369 | -0.55% |
| shootout-xchacha20 | 4,384,703 | 4,395,319 | -0.24% |
| spidermonkey | 636,104,785 | 631,998,404 | 0.65% |
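
For readers who want to recheck the "Main Speedup" column, here is a minimal sketch. It assumes the column is computed as `No Opt / Main - 1` (an inference from the numbers, not something stated above) and that lower counts are better. The `-2.0` threshold and the function names are hypothetical, chosen only for illustration.

```python
def speedup_percent(no_opt, main):
    """Speedup of main over no-opt in percent (assumed formula: no_opt / main - 1)."""
    return (no_opt / main - 1.0) * 100.0


# A few (No Opt, Main) pairs copied from the data above.
ROWS = {
    "pulldown-cmark": (6_580_174, 6_905_992),
    "shootout-keccak": (25_168_386, 21_112_482),
    "shootout-random": (630_328_205, 439_691_474),
    "shootout-switch": (139_525_474, 153_663_682),
}


def regressions(rows, threshold=-2.0):
    """Return (name, speedup) pairs below the (hypothetical) threshold, worst first."""
    scored = [(name, speedup_percent(*pair)) for name, pair in rows.items()]
    return sorted((item for item in scored if item[1] < threshold), key=lambda item: item[1])


for name, value in regressions(ROWS):
    print(f"{name}: {value:.2f}%")
```

With the rows above, this flags the two benchmarks listed under "Performance Regression".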

Unlike the previous cases, the cause is not obvious.

19245 clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif
19241 clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif

The number of instructions barely changes from no-opt to main (the file even shrinks slightly, from 19245 to 19241 lines).
However, the applied optimizations make the program use a long-lived value:

--- /data/bongjun/clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif	2025-12-08 12:43:58.406738645 +0000
+++ /data/bongjun/clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif	2025-12-08 12:49:01.961085326 +0000

-                                    v8572 = iconst.i32 1066
-                                    v8573 = iconst.i32 0
-@d20b                               v4324 = call fn1(v0, v0, v8572, v8573)  ; v8572 = 1066, v8573 = 0
-                                    v8574 = iadd.i64 v105, v106  ; v106 = 3584
-@d219                               v4333 = load.i32 little heap v8574
-                                    v8575 = iconst.i32 6
-                                    v8576 = icmp uge v4333, v8575  ; v8575 = 6
+                                    v8603 = iconst.i32 1066
+                                    v8604 = iconst.i32 0
+@d20b                               v4324 = call fn1(v0, v0, v8603, v8604)  ; v8603 = 1066, v8604 = 0
+                                    v8605 = iadd.i64 v11, v106  ; v106 = 3584
+@d219                               v4333 = load.i32 little heap v8605
+                                    v8606 = iconst.i32 6
+                                    v8607 = icmp uge v4333, v8606  ; v8606 = 6

Compare v8574 (no-opt) and v8605 (main): the former uses v105, while the latter uses v11.
v11 is defined at the very beginning of the function, whereas v105 is defined later, from v11:

                                block0(v0: i64, v1: i64):
@01f0                               v5 = load.i32 notrap aligned table v0+256
@01f6                               v6 = iconst.i32 16
@01f8                               v7 = isub v5, v6  ; v6 = 16
@01fb                               store notrap aligned table v7, v0+256
@0203                               v9 = iconst.i32 0x2710
@0207                               v11 = load.i64 notrap aligned readonly can_move checked v0+56

...

@02d6                               v105 = iadd.i64 v11, v4439

In main, v11 therefore has to stay live all the way to this use, far from its definition.
This might increase register pressure and cause more spills, which can degrade performance.
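
To look for such long-lived values more systematically, one could approximate live ranges directly from the CLIF text. The sketch below is not a Cranelift tool: it is a rough heuristic that treats the textual order of lines as program order and ignores control flow, so it only hints at where register pressure may come from. All function names and the sample input are hypothetical.

```python
import re

VALUE = re.compile(r"\bv\d+\b")
# Optional source-location prefix such as "@01f0", then "vN =" or "vN, vM =".
DEFINITION = re.compile(r"^(?:@[0-9a-f]+\s+)?(v\d+(?:\s*,\s*v\d+)*)\s*=")


def live_ranges(clif_text):
    """Map each value to (definition position, last use position) in textual order."""
    ranges = {}
    position = 0
    for raw in clif_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop "; v6 = 16" style comments
        if not line:
            continue
        match = DEFINITION.match(line)
        if match:
            defs = VALUE.findall(match.group(1))
            uses = VALUE.findall(line[match.end():])
        elif line.startswith("block"):
            defs, uses = VALUE.findall(line), []  # block parameters are definitions
        else:
            defs, uses = [], VALUE.findall(line)  # e.g. store, jump, return
        for value in uses:
            start, _ = ranges.get(value, (position, position))
            ranges[value] = (start, position)
        for value in defs:
            ranges.setdefault(value, (position, position))
        position += 1
    return ranges


def longest_lived(clif_text, count=5):
    """Return the `count` values with the longest textual span, as (value, span)."""
    spans = [(name, end - start) for name, (start, end) in live_ranges(clif_text).items()]
    return sorted(spans, key=lambda item: item[1], reverse=True)[:count]


# Made-up toy input (not taken from the benchmark): v1 is defined early and reused late.
SAMPLE = """
block0(v0: i64):
    v1 = load.i64 v0+56
    v2 = iconst.i64 8
    v3 = iadd v1, v2  ; v2 = 8
    v4 = iconst.i64 3584
    v5 = iadd v1, v4  ; v4 = 3584
"""

print(longest_lived(SAMPLE, 3))
```

Running the same function on the no-opt and main dumps and comparing the top entries would show whether values such as v11 span noticeably more of the function in main. A real analysis would need liveness over the control-flow graph.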

Labels: cranelift (Issues related to the Cranelift code generator), cranelift:goal:optimize-speed (Focus area: the speed of the code produced by Cranelift), performance
