Source-highlight is a simple program that, given a C/C++, Prolog, Perl,
PHP3, Python, or Java source file, produces an HTML document with syntax
highlighting.
UnRTF is a command-line converter from RTF (Rich Text) to HTML, LaTeX,
PostScript, plain text, and text with VT100 codes. When converting to HTML, it
supports tables, fonts, embedded images, hyperlinks, paragraph alignment, and
more. All other conversions are "alpha", i.e., still under early development.
Weka is a collection of machine learning algorithms for data mining tasks. The
algorithms can either be applied directly to a dataset or called from your own
Java code. Weka contains tools for data pre-processing, classification,
regression, clustering, association rules, and visualization. It is also
well-suited for developing new machine learning schemes.
PPower4 post-processes PDF presentations prepared with (La)TeX to add
dynamic effects. The PDF files can be created with pdf(la)tex or
v(la)tex, or with standard LaTeX and then converted to PDF with
dvipdfm.
PDFMiner is a tool for extracting information from PDF documents. Unlike other
PDF-related tools, it focuses entirely on getting and analyzing text data.
PDFMiner lets you obtain the exact location of text on a page, as well as
other information such as fonts or lines. It includes a PDF converter that can
transform PDF files into other text formats (such as HTML).
It has an extensible PDF parser that can also be used for purposes other than
text analysis.
RSS2Gen is a Python library for generating RSS 2.0 feeds.
Python bindings for the LT XML API and toolkit.
This is a Ruby library for parsing, creating, downloading, and caching
RSS (http://my.netscape.com/publish/help/mnn20/quickstart.html).
RT is a simple and human-readable table format.
RTtool is a converter from RT into various formats.
RT can be incorporated into RD.
At this time, RTtool can convert RT into HTML and plain text.
To convert into plain text, you need w3m.
sgrep (structured grep) is a tool for searching and indexing text, SGML, XML,
and HTML files, and for filtering text streams using structural criteria. The data
model of sgrep is based on regions, which are nonempty substrings of text.
Regions are typically occurrences of constant strings, SGML tags, or meaningful
text elements that are recognizable through delimiting strings or the
built-in SGML, XML, and HTML parser. Regions can be arbitrarily long, arbitrarily
overlapping, and arbitrarily nested.
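The region model described above can be illustrated with a short Python
sketch. This is not sgrep's implementation or its query language; the names
find_regions, contains, and overlaps are hypothetical helpers invented for
this illustration. It assumes a region is a (start, end) pair of character
offsets and that the opening and closing delimiters are distinct strings.

```python
from typing import List, Tuple

# A region is a nonempty substring, stored as (start, end) with end exclusive.
Region = Tuple[int, int]


def find_regions(text: str, opener: str, closer: str) -> List[Region]:
    """Return every region running from an opener to its matching closer.

    Openers are kept on a stack, so nested regions are all reported.
    Assumes opener and closer are distinct, nonempty strings.
    """
    if not opener or not closer or opener == closer:
        raise ValueError("opener and closer must be distinct, nonempty strings")
    regions: List[Region] = []
    stack: List[int] = []
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            stack.append(i)
            i += len(opener)
        elif stack and text.startswith(closer, i):
            start = stack.pop()
            i += len(closer)
            regions.append((start, i))
        else:
            i += 1
    return sorted(regions)


def contains(outer: Region, inner: Region) -> bool:
    """True if inner lies entirely within outer and is a different region."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one character."""
    return a[0] < b[1] and b[0] < a[1]
```

For the text "f(a, g(b), c)" with "(" and ")" as delimiters, the sketch
reports two regions, one nested inside the other, which is the kind of
relationship a structural query can then test for.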
Sgrep is a convenient tool for querying almost any kind of text file with a
well-known structure, including programs, mail folders, news folders, HTML,
and SGML. With relatively simple queries you can display mail messages by
subject or sender, extract titles, links, or any other regions from HTML
files, extract function prototypes from C source, or make complex queries on
SGML files based on the DTD of the file.
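As a rough point of comparison, the "extract titles or links from HTML
files" task can be sketched in Python with the standard library's
html.parser module. This shows the task only, not how sgrep performs it or
what an sgrep query looks like; the class TitleAndLinkExtractor and the
function extract are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TitleAndLinkExtractor(HTMLParser):
    """Collect the text of <title> elements and the href of <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self.links: List[str] = []
        self._in_title = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            # Start buffering character data until the closing tag.
            self._in_title = True
            self._buffer = []
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self.titles.append("".join(self._buffer).strip())
            self._in_title = False


def extract(html: str) -> Tuple[List[str], List[str]]:
    """Return (titles, links) found in the given HTML text."""
    parser = TitleAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.titles, parser.links
```

Anchors without an href attribute are skipped, so only real link targets
are returned.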