bib2html is a script that generates a set of HTML pages from a
BibTeX database.
Mini-XML is a small XML parsing library that you can use to
read XML and XML-like data files in your application without
requiring large non-standard libraries.
Apertium is an open-source machine translation platform, initially aimed
at related-language pairs but recently expanded to deal with more
divergent language pairs (such as English-Catalan). The platform
provides:
1. a language-independent machine translation engine,
2. tools to manage the linguistic data necessary to build a machine
   translation system for a given language pair, and
3. linguistic data for a growing number of language pairs.
Pod::DocBook is a module for translating Pod-formatted documents to
DocBook 4.2 SGML. It is primarily a back end for pod2docbook, but,
as a Pod::Parser subclass, it can be used on its own.
Library to compare files and strings, used in Kompare and KDevelop.
XML::Twig - Tree interface to XML documents allowing chunk by chunk
processing of huge documents.
From the website:
XML-Twig is a Perl module that subclasses XML-Parser to allow easy
processing of XML documents of all sizes. A flush method prints a
sub-document once it has been completely processed, which makes it
possible to process documents of any size.
Real People TTS for StarDict.
StarDict is a cross-platform, international dictionary written in Gtk2.
It has powerful features such as "Glob-style pattern matching," "Scan
selection word," "Fuzzy query," etc.
Mako is a template library written in Python. It provides a familiar,
non-XML syntax which compiles into Python modules for maximum
performance. Mako's syntax and API borrow from the best ideas of many
others, including Django templates, Cheetah, Myghty, and
Genshi. Conceptually, Mako is an embedded Python (i.e. Python Server
Page) language, which refines the familiar ideas of componentized
layout and inheritance to produce one of the most straightforward and
flexible models available, while also maintaining close ties to Python
calling and scoping semantics.
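As a rough illustration of the non-XML syntax described above, the sketch
below renders a one-line template through Mako's Template class. The helper
name render_greeting and the template text are made up for this example, and
the import is guarded on the assumption that Mako may not be installed.

```python
# Mako is a third-party package; guard the import so the module still loads
# on systems where it is not installed.
try:
    from mako.template import Template
except ImportError:
    Template = None


def render_greeting(name):
    """Render a tiny Mako template (hypothetical helper for illustration)."""
    # ${name} is a Mako expression substitution, filled in from the keyword
    # argument passed to render().
    template = Template("Hello, ${name}!")
    return template.render(name=name)


if __name__ == "__main__" and Template is not None:
    print(render_greeting("world"))  # prints "Hello, world!"
```

Larger templates are usually kept in files and can use control lines and
inheritance; this sketch only shows the simplest case.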
NLTK is a leading platform for building Python programs to work with human
language data. It provides easy-to-use interfaces to over 50 corpora and
lexical resources such as WordNet, along with a suite of text processing
libraries for classification, tokenization, stemming, tagging, parsing,
and semantic reasoning, and an active discussion forum.
Thanks to a hands-on guide introducing programming fundamentals alongside
topics in computational linguistics, NLTK is suitable for linguists,
engineers, students, educators, researchers, and industry users alike.
NLTK is available for Windows, Mac OS X, and Linux. Best of all, NLTK is
a free, open source, community-driven project.
NLTK has been called "a wonderful tool for teaching, and working in,
computational linguistics using Python" and "an amazing library to play
with natural language".
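To give a feel for the tokenization and stemming libraries mentioned above,
here is a minimal sketch using wordpunct_tokenize and PorterStemmer, two NLTK
components that should work without downloading any corpus data. The helper
name stem_words is an assumption made for this example, and the import is
guarded in case NLTK is not installed.

```python
# NLTK is a third-party package; guard the import so the module still loads
# on systems where it is not installed.
try:
    from nltk.stem import PorterStemmer
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    PorterStemmer = None
    wordpunct_tokenize = None


def stem_words(text):
    """Tokenize text and stem each token (hypothetical helper)."""
    stemmer = PorterStemmer()
    # wordpunct_tokenize splits on a regular expression, so it needs no
    # trained models or downloaded resources.
    return [stemmer.stem(token) for token in wordpunct_tokenize(text)]


if __name__ == "__main__" and PorterStemmer is not None:
    print(stem_words("dogs running"))  # prints ['dog', 'run']
```

Other NLTK features, such as tagging or the WordNet interface, generally
require fetching extra data first and are not shown here.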