Marpa::HTML does "high-level" parsing of HTML. It allows handlers to be
specified for elements, terminals and other components in the hierarchical
structure of an HTML document. Marpa::HTML is an extremely liberal HTML parser.
Marpa::HTML does not reject any documents, no matter how poorly they fit the HTML
standards.
Sphinx::Manager provides utilities to start, stop, restart, and reload the
Sphinx search engine binary (searchd), and to run the Sphinx indexer program.
The utilities are designed to handle abnormal conditions, such as PID files not
being present when expected, and so should be robust in most situations.
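As an illustration of the kind of abnormal condition mentioned above, here is a minimal Python sketch of tolerant PID-file handling. It is not Sphinx::Manager's code or API; the function names read_pid and is_running are hypothetical, and the signal-0 probe assumes a POSIX system.

```python
import os


def read_pid(pid_path):
    """Return the PID stored in pid_path, or None if it is missing or garbled."""
    try:
        with open(pid_path) as fh:
            return int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        # A missing or corrupt PID file is treated as "no known process"
        # instead of being raised as an error.
        return None


def is_running(pid):
    """Report whether a process with this PID appears to exist (POSIX only)."""
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False     # stale PID: no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True
```

A caller could then decide to start the daemon when is_running(read_pid(path)) is false, whether the PID file is absent or merely stale.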
String::Flogger::flog() args are mostly just like sprintf arguments, but
non-strings (like references, objects, and undef) are converted to JSON,
and evaluation of parts of the message can be deferred so that they are
not evaluated unless needed.
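The two ideas in this description (JSON conversion of non-strings and deferred evaluation) can be sketched in Python as follows. This is an analogy only, not the String::Flogger interface; render and log_debug are hypothetical names, and treating any callable argument as a deferred value is an assumption of the sketch.

```python
import json


def render(fmt, *args):
    """Fill a printf-style format, converting non-string arguments to JSON."""
    out = []
    for arg in args:
        if callable(arg):
            arg = arg()              # deferred piece: computed only at this point
        if not isinstance(arg, str):
            arg = json.dumps(arg)    # references, objects, None -> JSON text
        out.append(arg)
    return fmt % tuple(out)


def log_debug(enabled, fmt, *args):
    """Build the message only when debug logging is enabled."""
    if not enabled:
        return None                  # deferred arguments are never called
    return render(fmt, *args)
```

With logging disabled, a call such as log_debug(False, "state: %s", expensive_dump) returns without ever invoking expensive_dump, which is the point of deferring.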
Text::Ngrams - Flexible Ngram analysis (for characters, words, and more)
This module implements text n-gram analysis, supporting several types of
analysis, including character and word n-grams.
The module can be used from the command line through the script ngrams.pl
provided with the package.
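For readers unfamiliar with the technique, character and word n-gram counting can be sketched in a few lines of Python. This is not the Text::Ngrams implementation or its interface; the function names are hypothetical, and the sketch applies no text normalization (case folding, punctuation handling), which a real analysis would normally need.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count every run of n consecutive whitespace-separated words."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
```

For example, char_ngrams("abab", 2) counts "ab" twice and "ba" once, and word_ngrams("to be or not to be", 2) counts the pair ("to", "be") twice.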
Text::Xslate::Bridge::TT2Like exports Template-Toolkit variable
methods into the Text::Xslate namespace, so that you can use them on
your variables.
The only difference between this module and Text::Xslate::Bridge::TT2
is that Bridge::TT2 uses Template::Toolkit underneath, while this
module is independent of Template::Toolkit and therefore does not
require TT to be installed.
This module aims to comply exactly with the XPath specification at
http://www.w3.org/TR/xpath and yet allow extensions to be added in the
form of functions. Modules such as XSLT and XPointer may need to do
this as they support functionality beyond XPath.
PEAR::XML_DTD provides parsing of DTD files and DTD validation of XML files.
The XML validation is done with the PHP SAX parser (the xml extension); it
does not use the domxml extension.
It currently supports most of the XML spec, including entities,
elements and attributes. Some uncommon parts of the spec may still be
unsupported.
PPower4 is used to post-process PDF presentations prepared with (La)TeX
in order to add dynamic effects. The PDF files can be created with
pdf(la)tex or v(la)tex, or with standard LaTeX and then converted to PDF
with dvipdfm.
SDCV (StarDict under Console Version) is a simple, cross-platform, text-based
utility for working with dictionaries in StarDict's format.
A word in the "list of words" may be a string with a leading '/' to use the
fuzzy search algorithm, or a string containing '?' and '*' to use regexp search.
It works in both interactive and non-interactive mode.
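The query conventions described above can be illustrated with a small Python sketch. It is not SDCV's code: classify_query and pattern_matches are hypothetical names, and interpreting '?' and '*' as shell-style wildcards (via fnmatch) is an assumption about what the description calls regexp search.

```python
from fnmatch import fnmatchcase


def classify_query(word):
    """Decide which kind of lookup a query word asks for."""
    if word.startswith("/"):
        return ("fuzzy", word[1:])      # leading '/' requests fuzzy search
    if "?" in word or "*" in word:
        return ("pattern", word)        # wildcards request pattern search
    return ("exact", word)


def pattern_matches(pattern, headwords):
    """Return the headwords that match a '?' / '*' pattern, case-sensitively."""
    return [w for w in headwords if fnmatchcase(w, pattern)]
```

Under these assumptions, the query "/helo" is routed to fuzzy search for "helo", while "c?t*" matches headwords such as "cat" and "cots".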
Douglas Thrift's Search Engine is an indexing search engine for use on small
websites such as personal or small business sites. It is designed to be
very similar to Google for end users and its output is customizable. For
indexing, it supports both the Robots Exclusion Protocol and the Robots META
Tag as specified at http://www.robotstxt.org/wc/exclusion.html.
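To show what honoring the Robots Exclusion Protocol means for an indexer, here is a minimal Python sketch using the standard library's robots.txt parser. It does not reflect how this search engine is implemented; the rules, the agent name "ExampleIndexer", and the may_index helper are all made up for illustration, and the Robots META Tag is not covered.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real indexer would fetch it from the site.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /private/",
])


def may_index(url, agent="ExampleIndexer"):
    """Return True if the parsed rules allow this agent to fetch the URL."""
    return rules.can_fetch(agent, url)
```

An indexer following the protocol would call a check like may_index before requesting each page and skip any URL for which it returns False.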