This is a port of HTMLDOC, which can:
Convert HTML files to PDF or PostScript
Generate a table-of-contents for books
Generate indexed HTML files
Generate files on-the-fly for web applications, from the
command-line for batch jobs, or from a GUI for interactive work.
HTMLDOC provides:
A command-line interface for batch and WWW applications.
A graphical interface for interactive work.
In my opinion, HTMLDOC is *fast*, compared to the other solutions I've seen.
HTMLDOC is available under the GPL.
Commercial support is available from the author.
The Haskell XML Toolbox is based on the ideas of HaXml and HXML, but
introduces a more general approach for processing XML with Haskell. The
Haskell XML Toolbox uses a generic data model for representing XML
documents, including the DTD subset and the document subset, in Haskell.
It contains a validating XML parser, an HTML parser, namespace support,
an XPath expression evaluator, an XSLT library, a RelaxNG schema
validator and functions for serialization and deserialization of
user-defined data. The library makes extensive use of the arrow approach for
processing XML.
libtranslate is a library for translating text and web pages between
natural languages. Its modular infrastructure allows new translation
services to be implemented separately from the core library.
libtranslate is shipped with a generic module supporting web-based
translation services such as Babel Fish, Google Language Tools and
SYSTRAN. Moreover, the generic module allows new services to be added
simply by adding a few lines to an XML file (see the services.xml(5)
manual page).
The libtranslate distribution includes a powerful command-line
interface (see the translate(1) manual page).
This module provides an object that matches a data source against a
query expression.
Query expressions are compiled into an internal form when a new object
is created or the `prepare' method is called; they are not recompiled on
each match.
The class provided by this module uses four packages to process the
query. The query parser parses the question and calls a query expression
builder, which produces the internal form of the question. The optimizer
is then called to reduce the complexity of the expression. Finally, the
solver applies the expression to a data source.
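The four stages described above can be sketched in Python as follows. This
is only an illustration of the parse / build / optimize / solve idea, not
this module's actual code or API: the toy query syntax (words joined by
uppercase AND and OR), the function names, and the `Query` class with its
`prepare` and `matches` methods are all assumptions made for the example.

```python
import re


def parse(query):
    """Parser: split the query string into word tokens (operators included)."""
    return re.findall(r"\w+", query)


def build(tokens):
    """Builder: produce the internal form, a list of AND-groups joined by OR."""
    groups, current = [], []
    for tok in tokens:
        if tok == "OR":
            groups.append(current)
            current = []
        elif tok != "AND":
            current.append(tok.lower())
    groups.append(current)
    return groups


def optimize(groups):
    """Optimizer: drop empty groups, duplicate terms and duplicate groups."""
    seen, result = set(), []
    for group in groups:
        key = frozenset(group)
        if key and key not in seen:
            seen.add(key)
            result.append(sorted(key))
    return result


def solve(groups, text):
    """Solver: apply the compiled expression to a data source (a string here)."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(all(term in words for term in group) for group in groups)


class Query:
    """Hypothetical matcher object: compile once, match many times."""

    def __init__(self, query):
        self.prepare(query)

    def prepare(self, query):
        # Compilation happens only here, never inside matches().
        self.compiled = optimize(build(parse(query)))

    def matches(self, text):
        return solve(self.compiled, text)
```

Because the expression is compiled in `prepare`, one `Query` object can be
matched against many data sources without repeating the parsing and
optimization work.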
Dblatex started as a DB2LaTeX clone. So, why this project? It differs
from DB2LaTeX on these points:
(1) The project is end-user oriented; that is, it tries to hide the LaTeX
compilation details as much as possible by providing a single clean
script that directly produces DVI, PostScript and PDF output.
(2) The actual output rendering is done not only by the XSL stylesheets
transformation, but also by a dedicated LaTeX package. The purpose is
to allow a deep LaTeX customisation without changing the XSL
stylesheets.
(3) Post-processing is done by Python, to make publication faster,
convert the images if needed, and do the whole compilation.
XML::Node is a Perl5 module which provides a simplified extension interface
to XML::Parser.
Paraphrasing the README:
Instead of worrying about "start", "end", and "char" callbacks of every
single XML node, you can simply say that you only want to be notified when
a path is found.
Using XML::Node, you can ignore the parts of XML files that you are not
interested in. Additionally, you can register a variable instead of a
callback function. The corresponding string found in an XML file will be
automatically appended to your variable.
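The path-registration idea can be illustrated with a rough Python analogue
built on the standard xml.sax module. This is not XML::Node's Perl
interface: the `PathHandler` class, its `register` method, the "/" path
separator and the sample element names are all hypothetical, and a Python
list stands in for the registered variable because Python strings are
immutable.

```python
import xml.sax


class PathHandler(xml.sax.ContentHandler):
    """Notify registered targets when an element at a given path ends."""

    def __init__(self):
        super().__init__()
        self.path = []      # names of the currently open elements
        self.text = []      # one text buffer per open element
        self.targets = {}   # path string -> callable or list

    def register(self, path, target):
        """target is a callback taking the text, or a list to append to."""
        self.targets[path] = target

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text.append([])

    def characters(self, content):
        if self.text:
            self.text[-1].append(content)

    def endElement(self, name):
        key = "/".join(self.path)
        value = "".join(self.text.pop())
        target = self.targets.get(key)
        if callable(target):
            target(value)
        elif target is not None:
            target.append(value)
        # Elements with no registered path are simply ignored.
        self.path.pop()


def collect(xml_text):
    """Gather titles into a variable and authors through a callback."""
    titles, authors = [], []
    handler = PathHandler()
    handler.register("library/book/title", titles)
    handler.register("library/book/author",
                     lambda text: authors.append(text.upper()))
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return titles, authors
```

Only the two registered paths produce any work for the caller; every other
element in the input passes through without a callback.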
RXP is a very fast validating XML parser written by Richard Tobin
of the University of Edinburgh. It complies fully with the W3C test
suites (although we have compiled it without Unicode support for
the time being). pyRXP is a wrapper around this which constructs a
lightweight in-memory "tuple tree" in a single call. This structure
is the lightest one we could define in Python, and it is constructed
entirely in C code, resulting in unprecedented speed. It is a core
part of ReportLab's forthcoming XML toolkit, which aims to offer
simple, fast and pythonic tools for common XML processing tasks.
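To give a feel for what a "tuple tree" looks like, here is a small pure-Python
sketch that builds a similar structure with the standard library. It does
not call pyRXP and says nothing about its speed; the four-element shape
(tag name, attributes, children, spare slot) and the use of None for empty
attributes or children are assumptions about the layout, and the function
names are made up for this example.

```python
import xml.etree.ElementTree as ET


def to_tuple_tree(element):
    """Convert an ElementTree element into a nested tuple.

    Assumed shape: (tag_name, attributes_or_None, children_or_None, None),
    where children are strings (text) or nested tuples (elements).
    """
    children = []
    if element.text:
        children.append(element.text)
    for child in element:
        children.append(to_tuple_tree(child))
        if child.tail:
            # Text that follows a child element belongs to the parent.
            children.append(child.tail)
    return (element.tag, dict(element.attrib) or None, children or None, None)


def parse_to_tuple_tree(xml_text):
    """Parse a document in a single call and return its tuple tree."""
    return to_tuple_tree(ET.fromstring(xml_text))
```

The result is made only of tuples, lists, dictionaries and strings, so it
can be walked with ordinary indexing and loops rather than a DOM API.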
TagSoup - Just Keep On Truckin'
TagSoup is a SAX-compliant parser written in Java that, instead of parsing
well-formed or valid XML, parses HTML as it is found in the wild: poor,
nasty and brutish, though quite often far from short. TagSoup is designed
for people who have to process this stuff using some semblance of a rational
application design. By providing a SAX interface, it allows standard XML
tools to be applied to even the worst HTML. TagSoup also includes
a command-line processor that reads HTML files and can generate either
clean HTML or well-formed XML that is a close approximation to XHTML.
XML/Ada is a full XML suite for use with Ada compilers, such as GNAT AUX.
XML/Ada is a set of modules that provide simple manipulation of XML
streams. It supports the whole XML 1.1 specification and can parse any file
that follows this standard, including the contents of the DTD, although
documents are not validated against the DTD.
It provides support for a number of standards associated with XML such as
SAX, DOM, and XML schemas. Additionally, it includes a module to manipulate
Unicode streams, since this is required by the XML standard.
Apache Cocoon is a web development framework built around the
concepts of separation of concerns and component-based web development.
Cocoon implements these concepts around the notion of 'component
pipelines', each component on the pipeline specializing in a
particular operation. This makes it possible to use a Lego(tm)-like
approach in building web solutions, hooking together components
into pipelines without any required programming.
Cocoon is "web glue for your web application development needs".
It is a glue that keeps concerns separate and allows parallel
evolution of all aspects of a web application, improving development
pace and reducing the chance of conflicts.