Genshi is a Python library that provides an integrated set of components
for parsing, generating, and processing HTML, XML or other textual content
for output generation on the web. Its major feature is a template language
that is heavily inspired by Kid.
Divmod Reverend is a simple, general purpose Bayesian classifier,
written in Python.
It is designed to be easy to adapt and extend for your application.
Stuff you can do with the Reverend:
* classify RSS stories
* classify recipes by cuisine
* find out who you write like: Shakespeare, Dickens or Austen
* detect the language of a document
* tell whether your code is more like Guido's or Peter's
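The tasks above all reduce to training on labelled text and then guessing
the label of new text. A minimal sketch of that idea in plain Python follows.
It is not Reverend's API: the TinyBayes class and its train/guess methods are
hypothetical names chosen for illustration, and the training sentences are
made up.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes:
    """Hypothetical multinomial naive Bayes classifier (not Reverend's API)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training texts

    @staticmethod
    def tokenize(text):
        # Lowercase the text and split it into simple word tokens.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, label, text):
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def guess(self, text):
        """Return the known labels, most likely first."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        tokens = self.tokenize(text)
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the log prior of the label.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokens:
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return sorted(scores, key=scores.get, reverse=True)


clf = TinyBayes()
clf.train("english", "the quick brown fox jumps over the lazy dog")
clf.train("french", "le renard brun saute par dessus le chien paresseux")
best = clf.guess("the dog jumps")[0]
print(best)
```

Language detection is used here only because it needs very little training
text; the same train/guess loop applies to the other tasks in the list.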
Ah yes, INI files. We love them. We hate them. We cannot escape
them. Originally made popular by Windows, INI files are everywhere
including in Samba (www.samba.org) and Trac (trac.edgewall.org). This
gem has one goal: make INI file, structure, and stream manipulation
as fast, safe, and simple as possible. We take a modal approach
with a pluggable parser class.
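The description does not show the gem's own API, so the following is only a
sketch of the same kind of INI stream manipulation (read, modify, write back)
using Python's standard configparser module. The section and key names are
invented, loosely modelled on a Samba-style file.

```python
import configparser
import io

# Hypothetical INI content used only for this illustration.
ini_text = """\
[global]
workgroup = EXAMPLE
server string = Example Server

[homes]
browseable = no
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Read a value, change an existing one, and add a new section.
workgroup = parser["global"]["workgroup"]
parser["homes"]["browseable"] = "yes"
parser.add_section("printers")
parser["printers"]["path"] = "/var/spool/samba"

# Write the modified structure back out to an in-memory stream.
out = io.StringIO()
parser.write(out)
result = out.getvalue()
print(result)
```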
This is a generic, language-neutral framework for extending
Ruby objects with linguistic methods.
It includes an English-language module with inflection,
pluralisation, conjunctions, indefinite articles, present
participles, ordinal numbers, numbers to words, general
quantification, integration with WordNet and CMU's LinkGrammar,
as well as a framework for providing modules for other languages.
Loofah is a general library for manipulating HTML/XML documents and fragments.
It's built on top of Nokogiri and libxml2, so it's fast and has a nice API.
Loofah excels at HTML sanitization (XSS prevention). It includes some nice HTML
sanitizers, which are based on HTML5lib's whitelist.
A generic swappable back-end for XML parsing.
Lots of Ruby libraries utilize XML parsing in some form, and everyone has their
favorite XML library. In order to best support multiple XML parsers and
libraries, multi_xml is a general-purpose swappable XML backend library.
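The swappable-backend idea can be sketched in a few lines. The example below
is not multi_xml's API; it is a Python illustration in which two standard
library parsers sit behind one parse() function. The names BACKENDS,
use_backend and parse are hypothetical, and the converters only handle a
root element with flat, text-only children.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET


def _parse_with_etree(text):
    root = ET.fromstring(text)
    return {root.tag: {child.tag: child.text for child in root}}


def _parse_with_minidom(text):
    root = xml.dom.minidom.parseString(text).documentElement
    children = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
    return {
        root.tagName: {
            c.tagName: (c.firstChild.data if c.firstChild else None)
            for c in children
        }
    }


# Registry of interchangeable backends; callers never touch a parser directly.
BACKENDS = {"etree": _parse_with_etree, "minidom": _parse_with_minidom}
_current = "etree"


def use_backend(name):
    """Select which registered backend parse() delegates to."""
    global _current
    if name not in BACKENDS:
        raise ValueError("unknown XML backend: " + name)
    _current = name


def parse(text):
    return BACKENDS[_current](text)


doc = "<user><name>Ada</name><role>admin</role></user>"
use_backend("etree")
from_etree = parse(doc)
use_backend("minidom")
from_minidom = parse(doc)
print(from_etree == from_minidom)
```

Because both backends return the same plain dictionary, code that calls
parse() does not need to change when the backend is switched.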
Ox, standing for Optimized XML, is an XML parser and object serializer,
which is designed to be a speed-optimized alternative to Nokogiri and Marshal.
* Ox is self-contained, and uses nothing other than standard C libraries.
* Ox writes/parses generic XML documents including HTML documents.
* Ox serializes Objects into human-readable XML, in contrast to Marshal.
* Ox also supports SAX parsing.
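For readers unfamiliar with SAX parsing: instead of building a document
tree, the parser reports events (element start, text, element end) to a
handler as it reads. The sketch below shows the style with Python's standard
xml.sax module rather than Ox itself; the TagCounter class and the sample
document are made up for illustration.

```python
import xml.sax
from collections import Counter


class TagCounter(xml.sax.ContentHandler):
    """Counts element names as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Called once for every opening tag; no tree is kept in memory.
        self.counts[name] += 1


handler = TagCounter()
xml.sax.parseString(b"<feed><item/><item/><title>News</title></feed>", handler)
print(dict(handler.counts))
```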
Saxon is a collection of tools for processing XML documents. The main
components are:
- An XSLT 2.0 processor that can be used from the command line or invoked
from a Java application through the standard JAXP API. Because the
integration uses JAXP, a Java application can switch between different XSLT
processors without changing the application code. As well as conforming
closely with the XSLT 2.0 specification, Saxon offers a number of powerful
extensions.
- An XPath 2.0 processor accessible via an API to Java applications.
- An XQuery 1.0 processor that can be used from the command line, or invoked
from a Java application by use of an API.
- An XML Schema 1.0 processor. This can be used on its own to validate a schema
for correctness, or to validate a source document against the definitions in
a schema. It is also used to support the schema-aware functionality of the
XSLT and XQuery processors.
So you can use Saxon to process XML by writing XSLT stylesheets, by writing
XQuery queries, by writing Java applications, or by a combination of these
approaches.
RT is a simple and human-readable table format.
RTtool is a converter from RT into various formats.
RT can be incorporated into RD.
At this time, RTtool can convert RT into HTML and plain text.
To convert into plain text, you need w3m.
The syck extension is a binding to the Syck library which facilitates
YAML parsing.
YAML(tm) (rhymes with "camel") is a straightforward machine parsable
data serialization format designed for human readability and
interaction with scripting languages. YAML is optimized for data
serialization, configuration settings, log files, Internet
messaging and filtering.