Go: Performance optimization for IndexByte for Power

I added a new implementation of both bytes·IndexByte and strings·IndexByte for the Power architecture. The new lazy search-based algorithm gives a speedup of up to 15x over the previous implementation. It works on both little-endian (ppc64le) and big-endian (ppc64) targets.

I may add a vectorized version of this algorithm in the future, if it proves more efficient for long slices.

In addition, for Go 1.9, we will drop upstream support for processors older than POWER8. This means that ppc64 and ppc64le will differ only in endianness, not in processor support.

Committed as d60166d.
