Sphinx is an open source full text search server, designed from the
ground up with performance, relevance (aka search quality), and
integration simplicity in mind. It's written in C++ and works on Linux
(RedHat, Ubuntu, etc.), Windows, MacOS, Solaris, FreeBSD, and a few
other systems.
Sphinx lets you either batch index and search data stored in an SQL
database, NoSQL storage, or just files quickly and easily, or index and
search data on the fly, working with Sphinx pretty much as with a
database server.
A variety of text processing features enable fine-tuning Sphinx for
your particular application requirements, and a number of relevance
functions ensure you can tweak search quality as well.
Searching via SphinxAPI is as simple as 3 lines of code, and querying
via SphinxQL is even simpler, with search queries expressed in good
old SQL.
Sphinx clusters scale up to billions of documents and tens of millions
of search queries per day, powering top websites such as Craigslist,
DailyMotion, NetLog, etc.
And last but not least, it's open-sourced under GPLv2, and the
community edition is free to use.
A declarative YAML templating system tuned for BOSH deployment manifests.
SRILM is a toolkit for building and applying statistical language models (LMs),
primarily for use in speech recognition, statistical tagging and segmentation.
It has been under development in the SRI Speech Technology and
Research Laboratory since 1995. The toolkit has also greatly benefitted from
its use and enhancements during the Johns Hopkins University/CLSP summer
workshops in 1995, 1996, and 1997.
SRILM consists of the following components:
* A set of C++ class libraries implementing language models,
supporting data structures and miscellaneous utility functions.
* A set of executable programs built on top of these libraries to
perform standard tasks such as training LMs and testing them on
data, tagging or segmenting text, etc.
* A collection of miscellaneous scripts facilitating minor related tasks.
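
To give a rough feel for what "training an LM and testing it on data" means,
here is a toy bigram model in Python. It is only a sketch and does not use
SRILM or mirror its API: the function names (train_bigram, bigram_prob) and
the sentence markers are assumptions made for illustration, and the estimate
is a plain unsmoothed maximum-likelihood ratio, with none of the smoothing a
real toolkit applies.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    histories = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # Hypothetical boundary markers, added so the first and last words
        # of each sentence also take part in a bigram.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            histories[prev] += 1
            bigrams[(prev, cur)] += 1
    return histories, bigrams


def bigram_prob(histories, bigrams, prev, cur):
    """Unsmoothed estimate of P(cur | prev); 0.0 for an unseen history."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, cur)] / histories[prev]


if __name__ == "__main__":
    h, b = train_bigram(["a b", "a c"])
    print(bigram_prob(h, b, "a", "b"))
```
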
Colordiff is a wrapper for diff and produces the same output as diff but with
coloured syntax highlighting at the command line to improve readability.
The output is similar to how a diff-generated patch might appear in Vim or Emacs
with the appropriate syntax highlighting options enabled.
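
The idea of colouring diff output can be sketched in a few lines of Python.
This is not colordiff's own code: the function names (colorize_diff_line,
colorize_diff) and the colour choices are assumptions for illustration, and
Python's difflib stands in for the external diff program.

```python
import difflib

# ANSI escape sequences; the colour assignments are an assumption.
RED = "\033[31m"
GREEN = "\033[32m"
CYAN = "\033[36m"
RESET = "\033[0m"


def colorize_diff_line(line):
    """Wrap one line of unified diff output in an ANSI colour code."""
    if line.startswith("+++") or line.startswith("---"):
        return line  # file headers are left uncoloured in this sketch
    if line.startswith("@@"):
        return CYAN + line + RESET  # hunk header
    if line.startswith("+"):
        return GREEN + line + RESET  # added line
    if line.startswith("-"):
        return RED + line + RESET  # removed line
    return line  # context line


def colorize_diff(old_lines, new_lines):
    """Diff two lists of lines and return the coloured unified diff."""
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile="old", tofile="new", lineterm=""
    )
    return [colorize_diff_line(line) for line in diff]


if __name__ == "__main__":
    for line in colorize_diff(["a", "b"], ["a", "c"]):
        print(line)
```

The text of each line is unchanged; only escape codes are wrapped around it,
which is why the result still carries the same information as plain diff
output.
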
Aspell Yiddish dictionary.
Yiddish hunspell dictionaries
svn2cl is a simple xsl transformation and shell script wrapper for generating
a classic GNU-style ChangeLog from a subversion repository log. It is made
from several changelog-like scripts using common xslt constructs found in
different places.
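
The transformation can be sketched in Python, using xml.etree in place of
XSLT. This is not how svn2cl itself is implemented: the function name
svn_log_to_changelog is hypothetical, the input is assumed to be the output
of "svn log --xml", and the entry layout is a simplified GNU-style form
(date and author, then a tab-indented message).

```python
import xml.etree.ElementTree as ET


def svn_log_to_changelog(xml_text):
    """Turn 'svn log --xml' output into simplified GNU-style entries."""
    root = ET.fromstring(xml_text)
    blocks = []
    for entry in root.iter("logentry"):
        author = entry.findtext("author", default="(no author)")
        # Keep only the YYYY-MM-DD part of the ISO timestamp.
        date = entry.findtext("date", default="")[:10]
        msg = (entry.findtext("msg") or "").strip()
        lines = msg.splitlines() or ["(no message)"]
        # First message line gets the "* " bullet, the rest are aligned.
        body = "\n".join(
            ("\t* " if i == 0 else "\t  ") + text
            for i, text in enumerate(lines)
        )
        blocks.append(f"{date}  {author}\n\n{body}\n")
    return "\n".join(blocks)


# Made-up sample input, for illustration only.
SAMPLE = """<log>
<logentry revision="2">
<author>alice</author>
<date>2010-03-04T12:30:00.000000Z</date>
<msg>Fix typo in README</msg>
</logentry>
</log>"""

if __name__ == "__main__":
    print(svn_log_to_changelog(SAMPLE))
```
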
TagSoup - Just Keep On Truckin'
TagSoup is a SAX-compliant parser written in Java that, instead of parsing
well-formed or valid XML, parses HTML as it is found in the wild: poor,
nasty and brutish, though quite often far from short. TagSoup is designed
for people who have to process this stuff using some semblance of a rational
application design. By providing a SAX interface, it allows standard XML
tools to be applied to even the worst HTML. TagSoup also includes
a command-line processor that reads HTML files and can generate either
clean HTML or well-formed XML that is a close approximation to XHTML.
Zulu hunspell dictionaries