CPM201122nd Annual
Symposium on Combinatorial
Pattern
Matching
Palermo, ITALY, 27-29 June 201 |

Gad Landau, Algorithms on Grammar-Compressed Strings Abstract:
Grammar based compression, where one replaces a long string by a small
context-free grammar that generates the string, is a simple and
powerful paradigm that captures many of the popular compression
schemes, including the Lempel-Ziv family, Run-Length Encoding,
Byte-Pair Encoding, Sequitur and Re-Pair.
Let S be a string of length N given as a grammar G(S) of size n. The random access problem is to compactly represent G(S) while supporting fast random access queries. That is, given an index i, report S[i] without decompressing S. We will first present a linear space representations of G(S) that supports O(log N) random access time. This representation extends to efficiently support substring decompression. Namely, we can decompress any substring S[i]...S[j] in the same complexity as a random access query and additional O(j-i) time. Once we obtain an efficient substring decompression method, it can then serve as a basis for a compressed version of classical pattern matching. Namely, we can take any black-box (uncompressed) approximate pattern matching algorithm and turn it into a corresponding algorithm over grammar compressed strings. We will then focus on a specific algorithm for computing the edit distance of two grammar-compressed strings. This algorithm requires O(nN) time and uses the compression in a more complicated way (i.e., not through random access queries). |