Go 1.11 — Performance updates for POWER

I have been working on several fronts to add new features and improve performance for Go on the IBM POWER architecture in Go 1.11. In preparation for the source tree freeze in May, I recently added the following:

  • Atomics: rewrote most of the atomic operations in runtime/internal/atomic and sync/atomic, as well as the intrinsics in the SSA backend of the compiler, to use lightweight sync instructions whenever possible, according to the ISA recommendations. This gives up to a 30% performance increase in the multithreaded benchmarks. See commit 6633bb2 for details.
  • Big numbers (multi-precision maths): performance optimizations for addVV, subVV, mulAddVWW, addVW and subVW. I unrolled the loops to keep the pipeline full and the processing units busy. Speedups of over 3x were achieved in some cases, with 1.5x on average. See commits fc8967 and a44c728 for details.
  • Race detector: implemented support for Go on POWER in the LLVM thread sanitizer. This is a prerequisite for enabling the race detector for Go on POWER. See here for details.
  • Assembler fixes: fixed some incomplete instruction implementations in the Go assembler. This covers the missing EH field in l*arx / st*cx (commit e1f8fe8) and the newly added add/and/or/xor immediate shifted instructions (commit 9a9a8c0).
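
To give a feel for the loop unrolling mentioned in the big numbers item, here is a minimal pure-Go sketch of a multi-word addition with the loop unrolled four times. It is only an illustration and not the code from the commits above, which work at the assembly level. The function name addVVUnrolled, the unroll factor of four and the use of uint64 words are assumptions made for this example, and bits.Add64 from math/bits requires Go 1.12 or later.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVVUnrolled is a hypothetical, portable illustration of an unrolled
// multi-word add: z = x + y, word by word, returning the final carry.
// It is NOT the POWER implementation; the unroll factor (4) is an assumption.
func addVVUnrolled(z, x, y []uint64) (carry uint64) {
	n := len(z)
	// Reslice so all three slices have the same length
	// (panics if x or y is shorter than z).
	x, y = x[:n], y[:n]

	i := 0
	// Main loop: four words per iteration, so there are fewer branches
	// and more independent loads available to overlap with the carry chain.
	for ; i+4 <= n; i += 4 {
		z[i], carry = bits.Add64(x[i], y[i], carry)
		z[i+1], carry = bits.Add64(x[i+1], y[i+1], carry)
		z[i+2], carry = bits.Add64(x[i+2], y[i+2], carry)
		z[i+3], carry = bits.Add64(x[i+3], y[i+3], carry)
	}
	// Tail loop: handle the remaining 0 to 3 words one at a time.
	for ; i < n; i++ {
		z[i], carry = bits.Add64(x[i], y[i], carry)
	}
	return carry
}

func main() {
	max := ^uint64(0)
	x := []uint64{max, max, max, max, max}
	y := []uint64{1, 0, 0, 0, 0}
	z := make([]uint64, len(x))

	c := addVVUnrolled(z, x, y)
	fmt.Println(z, c) // prints "[0 0 0 0 0] 1": the carry ripples through every word
}
```

The carry still has to flow from one word to the next, so the gain from unrolling comes from reduced loop overhead and from letting loads and stores for later words overlap with the additions. How much that helps depends on the processor, so any speedup from a sketch like this should be measured rather than assumed.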

There are a few more things I want to add for Go 1.11. If they make it in time, I will update the list here.

by Carlos Eduardo Seo