Disassembling the Extend function shows something that looks
much more healthy now. The SSE 4.2 instructions are right
there in the body of the function.
Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz
Before:
crc32c: 1.305 micros/op 766260 ops/sec; 2993.2 MB/s (4K per op)
After:
crc32c: 0.442 micros/op 2263843 ops/sec; 8843.1 MB/s (4K per op)