Rust @programming.dev burntsushi @programming.dev 1 yr. ago

aho-corasick (and thus the regex crate too) now uses SIMD on aarch64 (e.g., Apple silicon) to greatly accelerate some searches

github.com add aarch64 SIMD implementation of Teddy by BurntSushi · Pull Request #129 · BurntSushi/aho-corasick

Up until this point, Teddy was explicitly written using x86-64 SIMD routines. Specifically, ones from SSSE3 and AVX2. This PR shuffles Teddy's main implementation into code that is generic over a n...

2 comments

Cross-posting from reddit:

The PR has more details, but here are a few ad hoc benchmarks using ripgrep on my M2 mac mini while searching a 5.5GB file.

This one is just a case insensitive search. A case insensitive regex expands to something like (ignoring Unicode) [Ss][Hh][Ee][Rr]..., which means that it has multiple literal prefixes. In fact, you can enumerate them! As long as the set is small enough, this is something that the new SIMD acceleration on aarch64 can handle (and has done for a long time on x86-64):

$ time rg-before-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en 3055 real 8.208 user 7.731 sys 0.467 maxmem 5600 MB faults 191 $ time rg-after-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en 3055 real 1.137 user 0.695 sys 0.430 maxmem 5904 MB faults 203

And of course, using multiple literals explicitly also uses this optimization:

$ time rg-before-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en 3804 real 9.055 user 8.580 sys 0.474 maxmem 4912 MB faults 11 $ time rg-after-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en 3804 real 1.121 user 0.697 sys 0.422 maxmem 4832 MB faults 11

And it doesn't just work for prefixes, it also works for inner literals too:

$ time rg-before-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en 773 real 9.065 user 8.586 sys 0.477 maxmem 6384 MB faults 11 $ time rg-after-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en 773 real 1.124 user 0.702 sys 0.421 maxmem 6784 MB faults 11

If you're curious about how the SIMD stuff works, you can read my description of Teddy here. I ported this algorithm out of the Hyperscan project several years ago, and it has been one of the killer ingredients for making ripgrep fast in a lot of common cases. But it only worked on x86-64. With the rise and popularity of aarch64 and Apple silicon, I was motivated to port it over. I just recently finished analogous work for the memchr crate as well.
- This sounds really great and will probably have quite an impact on a lot of users. So, nice work!