Description
Go version
go version go1.24.2 linux/riscv64
Output of go env
in your module/workspace:
AR='ar'
CC='riscv64-unknown-linux-gnu-gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='riscv64-unknown-linux-gnu-g++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='riscv64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -pthread -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1454745784=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='riscv64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/tmp/benchmark_demo/go.mod'
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GORISCV64='rva20u64'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_riscv64'
GOVCS=''
GOVERSION='go1.24.2'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
I was studying branch prediction behavior on real RISC-V hardware (StarFive VisionFive 2, JH7110 => 4x SiFive U74-MC) by creating two nearly identical benchmark functions that differ only in data preparation before b.ResetTimer():
- Created a minimal test case with two benchmark functions: BenchmarkSortedData calls sort.Ints(data) before b.ResetTimer(); BenchmarkUnsortedData is the same function without the sort call
- Both functions have identical benchmark loops after b.ResetTimer()
- Ran the benchmarks with default optimizations: go test -bench=. riscv_bug_test.go
The complete Go source file and the generated RISC-V assembly are attached for full analysis.
What did you see happen?
1. Performance Results (Problematic)
BenchmarkSortedData-4 6843 175703 ns/op (SLOW - 4x slower!)
BenchmarkUnsortedData-4 27356 43874 ns/op (FAST)
2. Assembly Generation Issues
The compiler generated drastically different assembly for the two functions:
BenchmarkSortedData: 152 bytes, 48-byte stack frame ($40-8)
BenchmarkUnsortedData: 124 bytes, 24-byte stack frame ($16-8)
The benchmark loops after b.ResetTimer() are identical in the Go source, but the compiler:
- Uses different stack layouts (56(SP) vs 32(SP) offsets)
- Applies different inlining strategies
- Generates different register allocation patterns
Verification with optimizations disabled
When running with go test -bench=. -gcflags="-N -l" riscv_bug_test.go, the performance difference becomes consistent with the hardware:
BenchmarkSortedData-4 975 1229971 ns/op (Predictable branches - faster)
BenchmarkUnsortedData-4 858 1397883 ns/op (Unpredictable branches - slower)
This shows the 4x artificial difference disappears once optimizations are disabled, revealing the real ~14% difference caused by CPU branch prediction behavior.
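The ~14% figure follows directly from the two -N -l measurements:

```go
package main

import "fmt"

func main() {
	// ns/op measured with -gcflags="-N -l" (from the results above).
	sorted := 1229971.0
	unsorted := 1397883.0
	// Relative slowdown of the unpredictable-branch case.
	fmt.Printf("real slowdown: %.1f%%\n", (unsorted/sorted-1)*100) // prints "real slowdown: 13.7%"
}
```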
Expected Assembly Behavior
Both functions should generate similar assembly code, since their Go source is identical after b.ResetTimer(). The compiler should:
- Use similar stack layouts (same frame size and local variable allocation)
- Apply consistent optimization strategies for the identical benchmark loops
- Generate comparable code size (within a few bytes)
- Respect the b.ResetTimer() optimization boundary: code before the timer reset should not influence code generation after it
Expected Performance Results
The performance difference should reflect real CPU behavior (branch prediction effects), approximately:
BenchmarkSortedData-4 ~1000 ~1200000 ns/op (Predictable branches)
BenchmarkUnsortedData-4 ~900 ~1400000 ns/op (Unpredictable branches)
This would show a realistic ~15% difference due to branch misprediction penalties on the SiFive U74-MC, not an artificial 4x compiler-generated difference.
Expected consistency
The same optimization level should produce the same code structure for logically equivalent functions, regardless of data preparation steps that occur before the measurement boundary.
Root cause analysis (updated)
After further investigation, with collaboration from other AI systems, the root cause appears to be related to the inlining budget heuristics in the RISC-V backend.
The presence of sort.Ints(data) before b.ResetTimer() causes the compiler to:
- Perceive higher function complexity due to the sort operation's internal loops and calls
- Consume inlining budget during the analysis phase
- Adopt different optimization strategies for subsequent code, including the benchmark loop
- Allocate larger stack frames as a preventive measure (48 vs 24 bytes)
- Generate different register allocation patterns due to the altered stack layout
This creates a cascade effect where identical Go code after b.ResetTimer() produces different assembly, due to compiler state changes from code that should not influence the measured performance.
Technical Impact
- Different stack frame sizes: $40-8 vs $16-8
- Different register allocation strategies
- Different code generation for identical source code
- 4x artificial performance difference masking real CPU behavior
Related Issues
This appears related to #50821 (AMD64 register allocation inconsistency), suggesting a broader compiler optimization pipeline issue affecting multiple architectures.