I rewrote most of the math/big arithmetic implementations so that they perform better on Power. Commit 9459c03, which I mentioned here before, added new implementations of addVV/subVV with up to ~3x improvement. Now commit 3cb41be adds better implementations of addMulVVW and mulAddVWW, which matter because the big-number multiplication functions are built on them. Speedups are up to ~1.5x.
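To make it clearer what these two routines compute, here is a minimal portable sketch in plain Go. It is not the Power assembly from the commit, only an illustration of the semantics as I understand them from the math/big declarations (where the second routine is spelled mulAddVWW); it assumes 64-bit words and uses uint64 in place of big.Word.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVW computes z[i] += x[i]*y for every i in x, propagating the
// carry from word to word, and returns the final carry.
// z must be at least as long as x. (Illustrative sketch, not the real code.)
func addMulVVW(z, x []uint64, y uint64) (c uint64) {
	for i := range x {
		hi, lo := bits.Mul64(x[i], y)     // 128-bit product x[i]*y
		lo, cc := bits.Add64(lo, z[i], 0) // add the existing word of z
		hi += cc                          // cannot overflow: hi <= 2^64-2
		lo, cc = bits.Add64(lo, c, 0)     // add the carry from the previous word
		hi += cc
		z[i] = lo
		c = hi
	}
	return c
}

// mulAddVWW computes z = x*y + r, where r is a single word added at the
// lowest position, and returns the final carry.
func mulAddVWW(z, x []uint64, y, r uint64) (c uint64) {
	c = r
	for i := range x {
		hi, lo := bits.Mul64(x[i], y)
		lo, cc := bits.Add64(lo, c, 0)
		z[i] = lo
		c = hi + cc
	}
	return c
}

func main() {
	z := []uint64{1, 2}
	c := addMulVVW(z, []uint64{3, 4}, 5)
	fmt.Println(z, c) // [16 22] 0

	w := make([]uint64, 2)
	c = mulAddVWW(w, []uint64{3, 4}, 5, 7)
	fmt.Println(w, c) // [22 20] 0
}
```

A schoolbook multiplication calls the first routine once per word of the multiplier, which is why speeding up this inner loop shows up directly in multiplication benchmarks.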
My colleague Lynn also added performance optimizations to the compiler: commit 0f19e24 introduces intrinsics for the math operations floor, ceil, and trunc. These show a drastic improvement, reducing benchmark time by 88%.
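If you want to see the effect on your own machine, a rough micro-benchmark along these lines should do. This is only a sketch: roundAll and sink are names I made up for the example, and it is not the benchmark behind the 88% figure. Run it with a Go toolchain from before and after the commit and compare the ns/op values.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// sink keeps the compiler from optimizing the benchmarked calls away.
var sink float64

// roundAll (hypothetical helper) exercises the three functions that the
// commit turns into intrinsics.
func roundAll(xs []float64) float64 {
	var s float64
	for _, x := range xs {
		s += math.Floor(x) + math.Ceil(x) + math.Trunc(x)
	}
	return s
}

func main() {
	xs := []float64{-2.5, -0.5, 0.5, 1.25, 3.75}
	// testing.Benchmark can be called from a regular program, outside "go test".
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = roundAll(xs)
		}
	})
	fmt.Println(res) // iteration count and ns/op
}
```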
Finally, we fixed the compiler on ppc64le-alpine with commit 9aea0e8. There are still some test suite failures on Alpine Linux on Power, but we are working to fix those quickly.
In the short term, we expect to have runtime CPU capability detection on Power, with the new internal/cpu package ported to ppc64/ppc64le, and to enable the Go assembler for the new ISA 3.0 (POWER9).