ANSI C implementation of a Suffix Tree

What you will find in this page

The Suffix Tree data structure

A suffix tree is a data structure that exposes the internal structure of a string in a deep way, and can be used to solve the exact matching problem in linear time, but its real virtue comes from its use in linear-time solutions to many string problems more complex than exact string matching.

The following definitions are taken from [1], which contains conprehensive overview of the suffix tree data structure:

Definition A suffix tree T for an m-character string S is a rooted directed tree with exactly m leaves numbered 1 to m. Each internal node, other than the root, has at least two children and each edge is labeled with a nonempty substring of S. No two edges out of a node can have edge-labels beginning with the same character. The key feature of the suffix tree is that for any leaf i, the concatenation of the edge-labels on the path from the root to leaf i exactly spells out the suffix of S that starts at position i. That is, it spells out S[i..m].

Authors and maintainers

The source code was initially by Dotan Tsadok who worked (on and off) on it from 24.12.2001 untill 21.8.2002 as his undergraduate project in Haifa university.

The current maintainer is Shlomo Yona.

A Perl interface for this suffix tree data structure is available thanks to Offer Kaye, with whom I produced the first working version.

TODO

Compilation

Under Linux one would probably do something like this:

make Makefile

Files

or as one tarball: suffix_tree.tar.gz

A Perl Module that interfaces the suffix tree ANSI-C implementation is now available:

Older versions of the Perl Module:

Thomas Mailund (<mailund@birc.dk>) has written a wrapper for the suffix tree in Python. The Python bindings for the suffix tree implementation can be found here.

Li Zhao (<mr.lizhao@gmail.com>) ported the code to C++ and reports higher performance. His port is available here.

click here to read Belorussian translation (provided by Webhostingrating)

License

Copyright (c) 2002, 2003 Shlomo Yona. All rights reserved.

This library is free software. You can redistribute it and/or modify it under the same terms as Perl itself.

References

  1. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield. Hardcover - 534 pages 1st edition (January 15, 1997). Cambridge Univ Pr (Short); ISBN: 0521585198.

Shlomo Yona

Valid HTML 4.01!