
String::Approx (cont.)


This code implements the basic 'bitap' algorithm for approximate string matching within a given edit distance (the Levenshtein measure).
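To make the idea concrete, here is a minimal Python sketch of the bit-parallel recurrence that Wu and Manber published for agrep. It is not String::Approx's own code, and the name bitap_fuzzy is hypothetical. Each bit i of R[d] records whether the first i+1 pattern characters match some suffix of the text read so far with at most d edits. One shift-and step per text character then updates all pattern prefixes at once.

```python
def bitap_fuzzy(text: str, pattern: str, k: int) -> list[int]:
    """Hypothetical helper: return the 0-based end positions in `text`
    where `pattern` matches with at most `k` edits (Levenshtein)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Bit i of masks[c] is set when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit for "whole pattern matched"
    full = (1 << m) - 1    # keep only the m bits we care about
    # Before any text is read, a prefix of length <= d matches the
    # empty string with d deletions, so R[d] starts with d low bits set.
    R = [(1 << d) - 1 for d in range(k + 1)]
    hits = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev_old = R[0]  # R[d-1] as it was before this character
        R[0] = ((R[0] << 1) | 1) & mask & full  # exact matching only
        for d in range(1, k + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & mask)     # match: extend prefix
                    | prev_old                    # insertion: extra text char
                    | ((prev_old << 1) | 1)       # substitution
                    | ((R[d - 1] << 1) | 1)       # deletion: skip pattern char
                    ) & full
            prev_old = old
        if R[k] & accept:
            hits.append(pos)
    return hits
```

For example, bitap_fuzzy("hello world", "world", 0) finds only the exact match ending at position 10, while bitap_fuzzy("hello wrld", "world", 1) finds "wrld" (one deletion) ending at position 9. The cost is O(k) word operations per text character, which is why bitap is fast for short patterns.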

The 'bitap' algorithm is best known as part of the agrep(1) ("approximate grep") implementation.

This code doesn't implement the "partition-scan" improvement, so it could still be made to run faster. Nor does it implement all of the described extensions. Implemented are "sets of characters" (with any-character and case-ignoring matching as special cases) and "patterns with and without errors". Missing are "wild cards" (Kleene star), "unknown number of errors" (finding the edit distance between two given strings), "non-uniform costs", "set of patterns", "long patterns", and "regular expressions". So it can still be made to run slower, too.
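The "sets of characters" extension fits bitap naturally, because each pattern position only contributes a bit to the per-character masks. The Python sketch below is illustrative only. The helpers build_masks and set_bitap_exact and the ANY marker are hypothetical names, not String::Approx's API, and it does exact (k = 0) matching for brevity. A position given as a string of allowed characters sets its bit in each of those characters' masks. An ANY position sets its bit for every character. Case-ignoring adds both case variants.

```python
ANY = None  # hypothetical marker: this pattern position matches any character


def build_masks(pattern: list, ignore_case: bool = False) -> tuple[dict[str, int], int]:
    """Hypothetical helper. `pattern` is a list of positions; each is a
    string of allowed characters or ANY. Returns (per-char masks, any_bits)."""
    masks: dict[str, int] = {}
    any_bits = 0
    for i, allowed in enumerate(pattern):
        if allowed is ANY:
            any_bits |= 1 << i  # bit i is set for every text character
            continue
        chars = set(allowed)
        if ignore_case:
            # Case-ignoring is just a larger character set per position.
            chars |= {c.lower() for c in allowed} | {c.upper() for c in allowed}
        for c in chars:
            masks[c] = masks.get(c, 0) | (1 << i)
    return masks, any_bits


def set_bitap_exact(text: str, pattern: list, ignore_case: bool = False) -> list[int]:
    """Hypothetical helper: end positions of exact matches of a set-pattern."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    masks, any_bits = build_masks(pattern, ignore_case)
    accept = 1 << (len(pattern) - 1)
    r = 0
    hits = []
    for pos, ch in enumerate(text):
        # Shift-and step; the effective mask for `ch` includes the ANY bits.
        r = ((r << 1) | 1) & (masks.get(ch, 0) | any_bits)
        if r & accept:
            hits.append(pos)
    return hits
```

For example, set_bitap_exact("The cat sat", ["cs", ANY, "t"]) reports matches ending at positions 6 ("cat") and 10 ("sat"). Because sets only change how the masks are built, the same masks can be plugged into the k-error recurrence unchanged.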

Shlomo Yona. Israeli Perl Mongers monthly meetings. Last update at: Sat Jan 10 22:18:25 IST 2004