Sphinx is an open source full text search server, designed from the
ground up with performance, relevance (aka search quality), and
integration simplicity in mind. It's written in C++ and works on Linux
(RedHat, Ubuntu, etc), Windows, MacOS, Solaris, FreeBSD, and a few
other systems.
Sphinx lets you either batch index and search data stored in an SQL
database, NoSQL storage, or just files quickly and easily, or index and
search data on the fly, working with Sphinx pretty much as with a
database server.
A variety of text processing features enable fine-tuning Sphinx for
your particular application requirements, and a number of relevance
functions ensure you can tweak search quality as well.
Searching via SphinxAPI is as simple as 3 lines of code, and querying
via SphinxQL is even simpler, with search queries expressed in good
old SQL.
Sphinx clusters scale up to billions of documents and tens of millions
of search queries per day, powering top websites such as Craigslist,
DailyMotion, NetLog, etc.
And last but not least, it's open-sourced under GPLv2, and the
community edition is free to use.
A declarative YAML templating system tuned for BOSH deployment manifests.
SRILM is a toolkit for building and applying statistical language models (LMs),
primarily for use in speech recognition, statistical tagging and segmentation.
It has been under development in the SRI Speech Technology and
Research Laboratory since 1995. The toolkit has also greatly benefitted from
its use and enhancements during the Johns Hopkins University/CLSP summer
workshops in 1995, 1996, and 1997.
SRILM consists of the following components:
* A set of C++ class libraries implementing language models,
supporting data structures and miscellaneous utility functions.
* A set of executable programs built on top of these libraries to
perform standard tasks such as training LMs and testing them on
data, tagging or segmenting text, etc.
* A collection of miscellaneous scripts facilitating minor related tasks.
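Colordiff is a wrapper for diff that produces the same output as diff,
but displays it with syntax highlighting on the command line to improve
readability.
The output resembles a diff-generated patch as shown in Vim or Emacs
with syntax highlighting enabled.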
- ehaupt
ehaupt@critical.ch
Aspell Yiddish dictionary.
Yiddish hunspell dictionaries
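svn2cl is a simple XSL transformation and shell wrapper for generating
a classic GNU-style ChangeLog from a Subversion repository log. It is
made from several changelog scripts of different styles, found in
different places, that share common XSLT.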
TagSoup - Just Keep On Truckin'
TagSoup is a SAX-compliant parser written in Java that, instead of parsing
well-formed or valid XML, parses HTML as it is found in the wild: poor,
nasty and brutish, though quite often far from short. TagSoup is designed
for people who have to process this stuff using some semblance of a rational
application design. By providing a SAX interface, it allows standard XML
tools to be applied to even the worst HTML. TagSoup also includes
a command-line processor that reads HTML files and can generate either
clean HTML or well-formed XML that is a close approximation to XHTML.
Zulu hunspell dictionaries