A new all-Haskell "tagged" DFA regex engine, inspired by libtre.
This library implements i;unicode-casemap, the simple,
non-locale-sensitive Unicode collation algorithm described in RFC 5051.
Proper Unicode collation can be done using text-icu, but that is a big
dependency that relies on a large C library, and rfc5051 might be
better for some purposes.
TagSoup is a library for parsing HTML/XML. It supports the HTML 5
specification and can be used to parse either well-formed XML or
unstructured and malformed HTML from the web. The library also provides
useful functions to extract information from an HTML document, making it
ideal for screen-scraping.
Tag-stream is a library for parsing HTML/XML to a token stream. It can
parse unstructured and malformed HTML from the web. It also provides an
Enumeratee that parses HTML as a stream, so it consumes constant
memory.
Basic types for representing XML.
Contains renderers and parsers for both XML and HTML 5 document fragments,
which share data structures so that it's easy to work with both. Document
fragments are bits of documents, which are not constrained by some of the
high-level structure rules (in particular, they may contain more than one
root element). Note that this is not a compliant HTML 5 parser. Rather,
it is a parser for HTML 5 compliant documents. It does not implement the
HTML 5 parsing algorithm, and should generally be expected to perform
correctly only on documents that you trust to conform to HTML 5. This is
not a suitable library for implementing web crawlers or other software
that will be exposed to documents from outside sources. The result is also
not the HTML 5 node structure, but rather something closer to the physical
structure. For example, omitted start tags are not inserted (and so, their
corresponding end tags must also be omitted).
A program that converts a single HTML file or a collection of related
HTML files into a single LaTeX file.
String operations the Python way - a package for those of us who miss Python's
string methods while we're working in R.
DTDinst is a program for converting XML DTDs into XML instance
format. The XML instance can be in either a format specific to DTDinst
or RELAX NG format.
mRss is a C library for parsing, writing and creating RSS files or streams.