Bindings for the libxml2 SAX interface.
Pandoc is a Haskell library for converting from one markup format to
another, and a command-line tool that uses this library. It can read
markdown and (subsets of) HTML, reStructuredText, LaTeX, DocBook,
MediaWiki markup, TWiki markup, Haddock markup, OPML, Emacs Org-Mode,
txt2tags and Textile, and it can write markdown, reStructuredText,
XHTML, HTML 5, LaTeX, ConTeXt, DocBook, OPML, OpenDocument, ODT, Word
docx, RTF, MediaWiki, DokuWiki, Textile, groff man pages, plain text,
Emacs Org-Mode, AsciiDoc, Haddock markup, EPUB (v2 and v3),
FictionBook2, InDesign ICML, and several kinds of HTML/JavaScript slide
shows (S5, Slidy, Slideous, DZSlides, reveal.js).
Pandoc extends standard markdown syntax with footnotes, embedded LaTeX,
definition lists, tables, and other features. A compatibility mode is
provided for those who need a drop-in replacement for Markdown.pl.
In contrast to existing tools for converting markdown to HTML, which use
regex substitutions, pandoc has a modular design: it consists of a set
of readers, which parse text in a given format and produce a native
representation of the document, and a set of writers, which convert this
native representation into a target format. Thus, adding an input or
output format requires only adding a reader or writer.
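
The reader/writer split described above can be illustrated with a small
sketch. It is written in Python for brevity and is not pandoc's actual
code or API (pandoc itself is a Haskell library). The names Header,
Para, read_markdown, write_html, write_plain, READERS, WRITERS and
convert are all hypothetical, and the toy reader understands only "#"
headers and blank-line-separated paragraphs.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Dict, List, Union


# Toy "native representation": a document is a list of blocks.
@dataclass
class Header:
    level: int
    text: str


@dataclass
class Para:
    text: str


Block = Union[Header, Para]
Document = List[Block]


def read_markdown(source: str) -> Document:
    """Hypothetical reader: parses a tiny markdown subset into blocks."""
    blocks: Document = []
    for chunk in source.strip().split("\n\n"):
        chunk = " ".join(chunk.split())  # collapse internal whitespace
        if chunk.startswith("#"):
            level = len(chunk) - len(chunk.lstrip("#"))
            blocks.append(Header(level, chunk[level:].strip()))
        elif chunk:
            blocks.append(Para(chunk))
    return blocks


def write_html(doc: Document) -> str:
    """Hypothetical writer: renders blocks as HTML."""
    out = []
    for block in doc:
        if isinstance(block, Header):
            out.append(f"<h{block.level}>{escape(block.text)}</h{block.level}>")
        else:
            out.append(f"<p>{escape(block.text)}</p>")
    return "\n".join(out)


def write_plain(doc: Document) -> str:
    """Hypothetical writer: renders blocks as plain text."""
    return "\n\n".join(
        block.text.upper() if isinstance(block, Header) else block.text
        for block in doc
    )


# Adding a format means registering one more function here; no existing
# reader or writer has to change.
READERS: Dict[str, Callable[[str], Document]] = {"markdown": read_markdown}
WRITERS: Dict[str, Callable[[Document], str]] = {
    "html": write_html,
    "plain": write_plain,
}


def convert(source: str, from_format: str, to_format: str) -> str:
    """Read into the native representation, then write it back out."""
    return WRITERS[to_format](READERS[from_format](source))
```

Because every writer consumes the same intermediate blocks, the single
reader above already yields two conversions (markdown to HTML and
markdown to plain text). With n readers and m writers, n*m conversions
are available from n+m components.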
The PCRE backend to accompany regex-base.
Automated proofreader for text files, man pages, and DocBook SGML
source files.
html-pretty (or htmlpty on file systems with unpleasant filename
length restrictions) is a prettyprinter for HTML and SGML. It can
also assist in the conversion of ordinary text files in ASCII or
ISO 8859-1 character sets to HTML.
Simple utilities for manipulating HTML and XML files.
dom4j is an easy-to-use, open source library for working with XML, XPath
and XSLT on the Java platform using the Java Collections Framework and
with full support for DOM, SAX and JAXP.
SGML DTDs for HTML level 0, 1, 2, 3.2, and the 4.0 draft as
defined by the World Wide Web Consortium (W3C). See
http://www.w3.org/ for more information.
These DTDs are useful for validating or processing World Wide Web
pages with SGML tools such as those in the sp or jade ports.
John Fieber
jfieber@FreeBSD.org
html2text is a command-line utility, written in C++, that converts
HTML documents (HTML 3.2) into plain text (ISO 8859-1).
Each HTML document is loaded from a location indicated by a URI or
read from standard input, and formatted into a stream of plain text
characters that is written to standard output or to an output file.
The input URI may specify a remote site, from which the documents are
loaded with the Hypertext Transfer Protocol (HTTP). The program can
even preserve the original positions of table fields, and it also
accepts syntactically incorrect input, attempting to interpret it
"reasonably". The rendering is largely customisable through an RC
file.
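
As a rough illustration of the HTML-to-plain-text step, here is a
minimal Python sketch built on the standard library's html.parser
module. It is not how html2text is implemented (html2text is written in
C++), and it does not attempt table layout, HTTP loading, or RC-file
customisation. The names TextExtractor and html_to_text are
hypothetical, and the choice of which tags start a new line is an
assumption made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, breaking lines at block-level tags."""

    # Assumed set of tags that should start a new output line.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3",
                  "h4", "h5", "h6"}
    # Content of these elements is not shown to the reader.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Normalise whitespace within each line and drop empty lines.
        raw_lines = "".join(self._parts).splitlines()
        lines = (" ".join(line.split()) for line in raw_lines)
        return "\n".join(line for line in lines if line)


def html_to_text(markup):
    """Return a plain-text rendering of an HTML string."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

The parser in html.parser is tolerant of missing end tags, so input
such as a "p" element that is never closed still produces output. That
loosely mirrors the "accepts syntactically incorrect input" behaviour
described above, without claiming to reproduce it.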
html2xhtml converts HTML files into XHTML. It can fix many common
errors in HTML files (e.g. missing end tags, elements with an incorrect
content model, or non-standard elements or attributes). It can
also handle XHTML input that is invalid or not well-formed, and clean
it up to produce well-formed and valid XHTML output. The output document
type can be selected from several XHTML DTDs (1.0, 1.1, Basic, etc.).