RTF is the Microsoft Rich Text Format, a more portable, mostly-ASCII
formatting language that is exported by word processors like MS Word.
These files generally have the extension .rtf, but occasionally have
.doc extensions as well. This parser is from the Microsoft spec,
"ported" to Unix systems.
PyCHM is a package that provides bindings for Jed Wing's CHMLIB library.
Queequeg is a tiny English grammar checker for non-native speakers who
are not used to verb conjugation and number agreement. We focus
especially on people who are writing academic papers or business documents
where thorough checking is required. We aim to reduce this laborious
work with automated checking.
This is a variant of the Flex fast lexical scanner. Flex was written
in the early 1990s by Vern Paxson. This version has been modified
by Thomas Dickey so that it conforms to ANSI C. It includes other
improvements, but remains compatible with Paxson's 2.5.4 release
(as well as POSIX lex). See the NEWS file for details.
rtfx converts RTF files into a generic XML format. It focuses on keeping
metadata such as style names rather than every bit of formatting.
This makes it handy for converting RTF documents into a custom XML
format (using XSL or an additional processing step).
RTF features supported: page breaks, section breaks, style names,
lists (various types), tables, footnotes, info block, bold, italic,
underline, super/sub script, hidden text, strike out, text color, fonts.
This is a Ruby module to access James Clark's XML Parser Toolkit (expat).
Sary is a suffix array library and tools. It provides fast full-text
search facilities for text files on the order of 10 to 100 MB using a
data structure called a suffix array. It can also search specific
fields in a text file by assigning index points to those fields.
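
The idea behind a suffix array can be sketched in a few lines of Python.
This is only an illustration of the data structure, not Sary's own API or
on-disk format: the function names and the optional `points` argument
(standing in for "index points") are assumptions made for this example,
and the naive construction would be far too slow for files in the 10 to
100 MB range.

```python
def build_suffix_array(text, points=None):
    """Return suffix start positions sorted by the suffix they begin.

    `points` is a hypothetical stand-in for index points: when given,
    only those positions are indexed, so searches match only there.
    """
    positions = range(len(text)) if points is None else points
    # Naive construction: sort positions by the suffix text itself.
    return sorted(positions, key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return every indexed position where `pattern` occurs, in order."""
    k = len(pattern)

    # Lower bound: first suffix whose k-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose k-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in [start, lo) begin with the pattern.
    return sorted(suffix_array[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(find_all(text, sa, "ana"))
```

Because the suffixes are sorted, every match occupies one contiguous run of
the array, so a lookup costs two binary searches instead of a scan of the
whole text.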
The syck extension is a binding to the Syck library which facilitates
YAML parsing.
YAML(tm) (rhymes with "camel") is a straightforward machine-parsable
data serialization format designed for human readability and
interaction with scripting languages. YAML is optimized for data
serialization, configuration settings, log files, Internet
messaging and filtering.
These tools are used to convert XML and HTML to and from a line-oriented
format more amenable to processing by classic Unix pipeline tools,
like grep, sed, awk, cut, shell scripts, and so forth.
The line-oriented format used by these tools closely resembles XPath,
but is not identical to it.
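
To give a feel for what a line-oriented, path-per-line view of XML can look
like, here is a small Python sketch. It is not the tools' implementation and
does not claim to reproduce their exact output: the `flatten` function and
the `path=value` / `@attribute` conventions used here are assumptions chosen
for illustration.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Return one 'path=value' line per attribute and text node.

    Illustrative only: text that follows a child element (tail text)
    is ignored to keep the sketch short.
    """
    lines = []

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        # Emit attributes first, marked with '@' as in XPath.
        for name, value in elem.attrib.items():
            lines.append(f"{path}/@{name}={value}")
        # Emit the element's own leading text, if any.
        text = (elem.text or "").strip()
        if text:
            lines.append(f"{path}={text}")
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return lines


for line in flatten('<doc id="1"><title>Hello</title><p>World</p></doc>'):
    print(line)
```

With each value on its own line and prefixed by its full path, a tool such
as grep can select elements by path without needing an XML parser.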
The Slides doctype and stylesheets are for making presentations.