Commit 1b045ee added a vectorized implementation of the main loop in strlen, improving performance for long strings by approximately 200% on POWER8-based systems. Unfortunately, I had to use a few POWER8-only instructions, so this vectorized loop cannot be used on POWER7 systems.
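To give a rough idea of what scanning a string in blocks looks like, here is a minimal portable C sketch. It is not the code from the commit: it does not use any POWER8 instructions, and the names `strlen_blocked` and `has_zero_byte` are hypothetical, chosen only for illustration. It assumes the usual structure of such routines: step byte by byte until the pointer is 16-byte aligned, then test 16 bytes per iteration, then find the exact terminator inside the final block. Like most block-based string routines, it may read a few bytes past the terminator within an aligned block, which goes beyond what strict ISO C guarantees.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if any of the 8 bytes in w is zero.
   This is the classic bit trick, not a POWER8 vector instruction. */
static int has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/* Hypothetical name: an illustrative block-at-a-time strlen. */
size_t strlen_blocked(const char *s)
{
    const char *p = s;

    /* Head: advance one byte at a time until p is 16-byte aligned. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Main loop: examine one aligned 16-byte block per iteration.
       Assumption: reading the whole aligned block is safe even if the
       terminator sits in the middle of it. */
    for (;;) {
        uint64_t lo, hi;
        memcpy(&lo, p, 8);      /* memcpy avoids strict-aliasing problems */
        memcpy(&hi, p + 8, 8);
        if (has_zero_byte(lo) || has_zero_byte(hi))
            break;
        p += 16;
    }

    /* Tail: the block at p contains the terminator; locate it exactly. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

A vectorized version would replace the two 64-bit tests in the main loop with a single vector comparison against zero per block, which is where the gain for long strings would come from.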
Thanks to Anton Blanchard for pointing out this optimization opportunity.
I am also working on a similar implementation for strnlen.