Generic template rendering and notifications with Consul
OpenJade is an implementation of the ISO/IEC 10179:1996 standard DSSSL
language. It is based on the James Clark implementation of DSSSL named
Jade. OpenJade is now developed and maintained by the OpenJade team.
For general information about DSSSL, see the OpenJade home page.
This package is a collection of SGML/XML tools called OpenSP.
It is a fork of James Clark's SP suite. These tools are used to parse,
validate, and normalize SGML and XML files.
OpenToken is a facility for performing token analysis and parsing within
the Ada language. It is designed to provide all the functionality of a
traditional lexical analyzer/parser generator, such as lex/yacc. But due
to the magic of inheritance and runtime polymorphism it is implemented
entirely in Ada as withed-in code. No precompilation step is required, and
no messy tool-generated source code is created. The tradeoff is that the
grammar is generated at runtime.
The Open Text Summarizer is an open source tool for summarizing texts.
The program reads a text and decides which sentences are important and
which are not.
AI::Categorizer is a framework for automatic text categorization. It
consists of a collection of Perl modules that implement common
categorization tasks, and a set of defined relationships among those
modules. The various details are flexible - for example, you can choose
what categorization algorithm to use, what features (words or otherwise)
of the documents should be used (or how to automatically choose these
features), what format the documents are in, and so on.
The basic process of using this module will typically involve obtaining a
collection of pre-categorized documents, creating a "knowledge set"
representation of those documents, training a categorizer on that
knowledge set, and saving the trained categorizer for later use. There are
several ways to carry out this process. The top-level AI::Categorizer
module provides an umbrella class for high-level operations, or you may
use the interfaces of the individual classes in the framework.
A simple sample script that reads a training corpus, trains a categorizer,
and tests the categorizer on a test corpus is distributed as eg/demo.pl.
Perl library that provides several modules to compute or validate check digits.
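As an illustration of what computing and validating a check digit involves, here is a minimal Python sketch of the Luhn algorithm. The choice of Luhn and the function names are assumptions made for this example; the description above does not say which schemes the library's modules cover or what their interfaces look like.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    if not payload.isdigit():
        raise ValueError("payload must contain only decimal digits")
    total = 0
    # Walk right to left, doubling every second digit starting with the
    # rightmost payload digit; doubled values above 9 have 9 subtracted.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check that the last digit of `number` is the correct Luhn check digit."""
    if len(number) < 2 or not number.isdigit():
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])
```

Computing appends one digit to a payload; validating recomputes that digit and compares it with the one supplied.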
This is an implementation of Rabin and Karp's streaming hash, as described
in "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer,
Wilkerson, and Aiken. Following the suggestion of Schleimer, I am using
their second equation:
$H[ $c[2..$k + 1] ] = (( $H[ $c[1..$k] ] - $c[1] ** $k ) + $c[$k+1] ) * $k
The result of this hash encodes information about the next k values in
the stream (hence k-gram). This means for any given stream of length n
integer values (or characters), you will get back n - k + 1 hash values.
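The n - k + 1 count can be seen in a short Python sketch of a rolling k-gram hash. This uses the textbook polynomial form with a modulus, not necessarily the exact equation quoted above; the function names, the base, and the modulus are assumptions chosen for the example.

```python
def kgram_hashes(values, k, base=257, mod=(1 << 61) - 1):
    """Return one hash per k-gram of `values`, updated in O(1) per step."""
    n = len(values)
    if k <= 0 or k > n:
        return []
    high = pow(base, k - 1, mod)  # weight of the value leaving the window
    h = 0
    for v in values[:k]:          # hash of the first window, computed directly
        h = (h * base + v) % mod
    hashes = [h]
    for i in range(k, n):
        # Drop the oldest value, shift the window, append the newest value.
        h = ((h - values[i - k] * high) * base + values[i]) % mod
        hashes.append(h)
    return hashes


def direct_hash(window, base=257, mod=(1 << 61) - 1):
    """Hash a single window from scratch; used only to check the rolling form."""
    h = 0
    for v in window:
        h = (h * base + v) % mod
    return h
```

For a stream of n values the returned list holds one entry for each window position, which is n - k + 1 entries whenever k is between 1 and n.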
For best results, you will want to create a code generator that filters
your data to remove all unnecessary information. For example, in a large
English document, you should probably remove all white space, as well as
removing all capitalization.
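A filter of this kind might look like the following Python generator, which lowercases the text, drops white space, and yields one integer code per remaining character. The name `filtered_codes` is hypothetical, and a real filter may need to strip more (punctuation, markup) depending on the data.

```python
def filtered_codes(text):
    """Yield integer codes for `text` with case and white space removed."""
    # Lowercase the whole string first, since lowercasing a single
    # character can produce more than one character in Unicode.
    for ch in text.lower():
        if ch.isspace():
            continue
        yield ord(ch)
```

With this filter, "A do\tRUN" and "adorun" produce the same stream of codes and therefore the same k-gram hashes.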
BibTeX::Parser is a pure Perl BibTeX parser.
A Bloom filter is a probabilistic algorithm for doing existence tests
in less memory than a full list of keys would require. The tradeoff to
using Bloom filters is a certain configurable risk of false positives.
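The tradeoff can be made concrete with a minimal Python sketch of a Bloom filter. The class name, the parameters `capacity` and `error_rate`, and the use of SHA-256 with double hashing are assumptions for this example and do not describe any particular library's interface.

```python
import hashlib
import math


class BloomFilter:
    """Bit array plus several hash positions per key; no keys are stored."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(1, math.ceil(-capacity * math.log(error_rate)
                                     / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key):
        # Derive all bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```

A key that was added is always reported as present. A key that was never added is usually reported as absent, but can be reported as present with a probability governed by `error_rate`; lowering that rate costs more bits per key.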