author | Andrew Gallant <jamslam@gmail.com> | 2018-07-17 20:23:31 -0400 |
---|---|---|
committer | Andrew Gallant <jamslam@gmail.com> | 2018-07-17 20:27:04 -0400 |
commit | 7e5a59027638952ae55a1cc3d9d3b2ce7d69be98 (patch) | |
tree | 4be0d0d3cb2c877db00c87dd52472dd89ba8a7f2 /grep | |
parent | d17ca45063c53c188c207fe5825c36d885513560 (diff) |
grep: small literal detection fix
This commit tweaks the inner literal detection heuristic such that if it
comes up with any literal that is all whitespace, then it's likely a bad
literal to look for since it's so common. Therefore, we simply reject the
inner literal optimization in this case and let the regex engine do its
thang.
Diffstat (limited to 'grep')
-rw-r--r-- | grep/src/literals.rs | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/grep/src/literals.rs b/grep/src/literals.rs
index 3e1c385b..5e3dc8ea 100644
--- a/grep/src/literals.rs
+++ b/grep/src/literals.rs
@@ -67,6 +67,16 @@ impl LiteralSets {
             lit = req;
         }
 
+        // Special case: if we have any literals that are all whitespace,
+        // then this is probably a failing of the literal detection since
+        // whitespace is typically pretty common. In this case, don't bother
+        // with inner literal scanning at all and just defer to the regex.
+        let any_all_white = req_lits.iter()
+            .any(|lit| lit.iter().all(|&b| (b as char).is_whitespace()));
+        if any_all_white {
+            return None;
+        }
+
         // Special case: if we detected an alternation of inner required
         // literals and its longest literal is bigger than the longest
         // prefix/suffix, then choose the alternation. In practice, this
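
The check added in this hunk can be tried in isolation. The following is a minimal sketch, not the code in `literals.rs`: it assumes the literals are plain `Vec<u8>` byte strings rather than the literal type used there, and the helper name `any_all_whitespace` is hypothetical. The predicate itself is copied from the diff.

```rust
/// Hypothetical standalone helper (not part of the commit): returns true
/// if any literal consists solely of whitespace bytes. The literals are
/// assumed to be plain byte vectors for the sake of the example.
fn any_all_whitespace(lits: &[Vec<u8>]) -> bool {
    lits.iter()
        // Same predicate as in the diff: each byte is widened to a char
        // and tested with `char::is_whitespace`.
        .any(|lit| lit.iter().all(|&b| (b as char).is_whitespace()))
}

fn main() {
    // One of the literals is only a space and a tab, so the inner literal
    // optimization would be rejected for this set.
    let with_blank = vec![b"foo".to_vec(), b" \t".to_vec()];
    assert!(any_all_whitespace(&with_blank));

    // Every literal contains at least one non-whitespace byte.
    let without_blank = vec![b"foo".to_vec(), b"foo bar".to_vec()];
    assert!(!any_all_whitespace(&without_blank));

    // `all` is vacuously true on an empty iterator, so an empty literal
    // also counts as "all whitespace" under this predicate.
    assert!(any_all_whitespace(&[Vec::new()]));
}
```

One property of this predicate is worth keeping in mind: because `all` returns true for an empty iterator, an empty literal is treated the same way as a whitespace-only one.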