This is an implementation of Rabin and Karp's streaming hash, as described
in "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer,
Wilkerson, and Aiken. Following the suggestion of Schleimer, I am using
their second equation:
$H[ $c[2..$k + 1] ] = (( $H[ $c[1..$k] ] - $c[1] * $b ** $k ) + $c[$k+1] ) * $b
where $b is the base of the hash.
The result of this hash encodes information about the next k values in
the stream (hence k-gram). This means that for any given stream of n
integer values (or characters), you will get back n - k + 1 hash values.
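As a rough illustration of the rolling update and of the n - k + 1 count, here is a minimal Python sketch. It follows the form of the paper's second equation with a base b; the function name `kgram_hashes`, the default `base`, and the `modulus` are assumptions made for this example and are not taken from the module.

```python
def kgram_hashes(values, k, base=257, modulus=(1 << 61) - 1):
    """Return one rolling hash per k-gram of `values` (a list of ints).

    `base` and `modulus` are illustrative choices, not the module's.
    """
    values = list(values)
    n = len(values)
    if k <= 0 or n < k:
        return []

    # Hash of the first window: c1*b^k + c2*b^(k-1) + ... + ck*b
    h = 0
    for c in values[:k]:
        h = ((h + c) * base) % modulus

    top = pow(base, k, modulus)  # b^k, the weight of the outgoing value
    hashes = [h]
    for i in range(k, n):
        outgoing, incoming = values[i - k], values[i]
        # ((H - c1 * b^k) + c_{k+1}) * b
        h = ((h - outgoing * top + incoming) * base) % modulus
        hashes.append(h)
    return hashes


codes = [ord(ch) for ch in "abcabc"]
hashes = kgram_hashes(codes, 3)
```

For the six-value stream above with k = 3, the list holds 6 - 3 + 1 = 4 hashes, and the two windows covering "abc" hash to the same value.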
For best results, you will want to create a code generator that filters
your data to remove all unnecessary information. For example, in a large
English document, you should probably remove all white space and all
capitalization.
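A filter of the kind suggested above might look like the following Python sketch. The function name `normalize` is hypothetical, and what counts as "unnecessary" depends on your data; this version only drops white space and folds case.

```python
def normalize(text):
    """Drop all white space and fold case before hashing."""
    return "".join(ch.lower() for ch in text if not ch.isspace())


def to_codes(text):
    """Turn normalized text into the integer stream a k-gram hash consumes."""
    return [ord(ch) for ch in normalize(text)]
```

With this filter, "Hello  World" and "helloworld" produce the same integer stream and therefore the same k-gram hashes.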
BibTeX::Parser is a pure-Perl BibTeX parser.
A Bloom filter is a probabilistic algorithm for doing existence tests
in less memory than a full list of keys would require. The tradeoff to
using Bloom filters is a certain configurable risk of false positives.
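To make the memory and false-positive tradeoff concrete, here is a small Python sketch of the general technique. It is not the module's interface: the class name, the `capacity` and `error_rate` parameters, and the use of SHA-256 with double hashing are all assumptions chosen for illustration. The sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, capacity, error_rate=0.01):
        # Number of bits (m) and hash functions (k) for the target error rate.
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


seen = BloomFilter(capacity=100, error_rate=0.01)
seen.add("apple")
```

A key that was added is always reported as present. A key that was never added is usually reported as absent, but may be reported as present with a probability near the configured error rate.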
CSS::Simple is an interface through which to read/write/manipulate CSS
files while respecting the cascade order.
This module takes a list of CSS files and concatenates them, making sure
to honor any valid @import statements included in the files.
Following the CSS 2.1 spec, @import statements must be the first rules in
a CSS file. Media-specific @import statements will be honored by enclosing
the included file in an @media rule. This has the side effect of actually
improving compatibility in Internet Explorer, which ignores media-specific
@import rules but understands @media rules.
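The media-specific rewrite described above can be sketched in Python as follows. This is only an illustration of the idea, not the module's code: the function `inline_imports`, the `load` callback, and the regular expression are hypothetical, and the sketch does not recurse into nested imports or check that @import statements come first.

```python
import re

# Matches @import "file"; and @import url("file") media-list;
IMPORT_RE = re.compile(
    r'@import\s+(?:url\(\s*)?["\']?([^"\')\s;]+)["\']?\s*\)?\s*([^;]*);'
)


def inline_imports(css, load):
    """Replace each @import with the imported text, using `load(path)` to fetch it."""
    def replace(match):
        path, media = match.group(1), match.group(2).strip()
        body = load(path)
        if media:
            # Media-specific import: enclose the included file in an @media rule.
            return "@media %s {\n%s\n}" % (media, body)
        return body
    return IMPORT_RE.sub(replace, css)


files = {"print.css": "p { margin: 0 }"}
result = inline_imports('@import url("print.css") print;\nbody { color: black }',
                        files.__getitem__)
```

Here `result` contains the rules from print.css wrapped in `@media print { ... }`, followed by the original `body` rule.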
It is possible that future versions will include methods to compact
whitespace and other parts of the CSS itself, but this functionality is
not supported at the current time.
CSS::Tiny is a Perl class to read and write .css stylesheets with as
little code as possible, reducing load time and memory overhead.
This module is primarily for reading and writing simple files, and
anything we write shouldn't need to have documentation/comments. If you
need something with more power, move up to CSS.pm.
Hansjoerg Pehofer <hansjoerg.pehofer@uibk.ac.at>
RNV is an implementation of a Relax NG Compact Syntax validator in ANSI C.
This module can be used, along with a CSS::Parse::* module, to parse CSS
data and represent it as a tree of objects. Using a CSS::Adaptor::* module,
the CSS data tree can then be transformed into other formats.
stringi (pronounced "stringy") is THE R package for fast, correct,
consistent and convenient string/text processing in each locale and
any native character encoding. The use of the ICU library gives R
users a platform-independent set of functions known to Java, Perl,
Python, PHP, and Ruby programmers.