Go: Changed function alignment to 16 bytes for Power

I just added a simple but important change for Power: all functions are now aligned to 16 bytes, which eliminates inefficiencies in the iBuffer.

This also opens up a line of future work: adding an alignment directive to the assembler, so that loops can be properly aligned both by the compiler and in hand-written assembly.
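
To illustrate what a 16-byte function alignment means in practice, here is a minimal sketch of the usual round-up computation. It is not the code from the commit; the names funcAlign and alignUp and the sample addresses are hypothetical and chosen only for this example.

```go
package main

import "fmt"

// funcAlign is the function alignment discussed above (16 bytes).
// The constant name is an assumption made for this sketch.
const funcAlign = 16

// alignUp rounds addr up to the next multiple of align.
// align must be a power of two.
func alignUp(addr, align uint64) uint64 {
	return (addr + align - 1) &^ (align - 1)
}

func main() {
	// Hypothetical addresses at which a previous function ends.
	for _, end := range []uint64{0x1000, 0x1004, 0x100c, 0x1011} {
		fmt.Printf("previous function ends at %#x, next one starts at %#x\n",
			end, alignUp(end, funcAlign))
	}
}
```

An address that is already a multiple of 16 is left unchanged; any other address is padded up to the next 16-byte boundary.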

Committed as 09b71d5.

by Carlos Eduardo Seo

Go: Performance optimization for addVV for Power

I added a new implementation of addVV (math/big package) for the Power architecture. The new assembly implementation leverages Power-specific instructions and provides a speedup of ~3x over the generic Go implementation. It works on both little-endian and big-endian ppc64 and will be available in the upcoming Go 1.9 release.

In addition, for Go 1.10, I plan to add optimizations to math/big using POWER9 instructions, which will help some of the multiply-and-add functions.
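
For readers unfamiliar with addVV, the operation is a word-by-word addition of two multi-word integers with carry propagation. The following is a minimal sketch in portable Go, assuming plain uint words and using math/bits.Add; it is neither the math/big source nor the new Power assembly, only an illustration of the work the assembly version speeds up.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVV computes z = x + y word by word, least significant word first,
// and returns the final carry (0 or 1). It iterates over len(z);
// x and y must be at least that long.
func addVV(z, x, y []uint) (carry uint) {
	for i := range z {
		z[i], carry = bits.Add(x[i], y[i], carry)
	}
	return carry
}

func main() {
	x := []uint{^uint(0), 1} // low word is all ones, so adding 1 carries
	y := []uint{1, 2}
	z := make([]uint, len(x))
	c := addVV(z, x, y)
	fmt.Println(z, c) // [0 4] 0
}
```

The carry chain between words is the part that benefits from add-with-carry instructions in a hand-written assembly version.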

Committed as 9459c03.

by Carlos Eduardo Seo

Go: Cleanup of legacy code on Big Endian Power architecture

As I mentioned in my previous post, in Go 1.9 the new minimum processor requirement for big-endian ppc64 will be POWER8.

I started cleaning up old code in the SSA backend and in the atomics implementation that was required to maintain compatibility with POWER5 through POWER7. Commits c644a76 and 189053a remove some checks that prevented ppc64 from using (and benefiting from) instruction sequences we added for ppc64le, because those sequences would break support for older processors.

This is another step toward making ppc64 and ppc64le differ only in endianness, and not in functionality or performance. If you spot any other legacy code that needs removal, patches are welcome!

by 

Go: Performance optimization for IndexByte for Power

I added a new implementation of both bytes·IndexByte and strings·IndexByte for the Power architecture. The new lazy search-based algorithm gives a speedup of up to 15x over the previous implementation. It works on both little-endian and big-endian ppc64.

I may add a vectorized version of this algorithm in the future, if it proves more efficient for long slices.

In addition, for Go 1.9, we will drop upstream support for processors older than POWER8. This means that ppc64 and ppc64le will differ only in endianness, not in processor support.
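
As a reminder of what these functions do from the caller's side, here is a small usage sketch. The helper indexByteNaive is a hypothetical byte-at-a-time reference loop written only for comparison; it is not the previous implementation and not the new algorithm.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// indexByteNaive returns the index of the first occurrence of c in b,
// or -1 if c is not present. It checks one byte per iteration.
func indexByteNaive(b []byte, c byte) int {
	for i, v := range b {
		if v == c {
			return i
		}
	}
	return -1
}

func main() {
	data := []byte("ppc64le and ppc64")

	fmt.Println(bytes.IndexByte(data, '6'))           // 3
	fmt.Println(strings.IndexByte(string(data), 'x')) // -1
	fmt.Println(indexByteNaive(data, '6'))            // 3
}
```

Callers do not need to change anything to benefit: the optimized routine sits behind the same bytes.IndexByte and strings.IndexByte entry points.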

Committed as d60166d.

by