This module is a simple HTML parser. It is similar in concept to HTML::Parser,
but it differs in a couple of important ways.
HTML::StickyQuery::DoCoMoGUID - add guid to the query string for DoCoMo i-mode.
This module is a subclass of HTML::Parser and uses it to
parse an HTML document and add a query string to href attributes.
You can pass a session ID or any form data without using cookies.
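The href rewriting described above can be illustrated with a short sketch. It is written in Python with the standard library only and is not the module's Perl API: the class name StickyQuerySketch, the helper add_query and the example parameter guid=ON are assumptions chosen for illustration. The sketch rewrites every href on an a tag and keeps only the last value of a repeated query key.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


class StickyQuerySketch(HTMLParser):
    """Re-emit a document, merging extra parameters into each <a href>."""

    def __init__(self, params):
        # Keep character references as written so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.params = dict(params)
        self.out = []

    def _sticky(self, url):
        # Merge the sticky parameters into whatever query the link has.
        parts = urlsplit(url)
        query = dict(parse_qsl(parts.query, keep_blank_values=True))
        query.update(self.params)
        return urlunsplit(parts._replace(query=urlencode(query)))

    def _emit(self, tag, attrs, closing=""):
        rendered = []
        for name, value in attrs:
            if value is None:  # attribute without a value, e.g. "disabled"
                rendered.append(f" {name}")
                continue
            if tag == "a" and name == "href":
                value = self._sticky(value)
            rendered.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(rendered)}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def add_query(html, params):
    """Return html with params merged into the query of every <a href>."""
    parser = StickyQuerySketch(params)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, add_query('<a href="/list?page=2">next</a>', {"guid": "ON"}) keeps the existing page parameter and appends guid to the same link.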
Seamus Venasse <svenasse@polaris.ca>
HTML::Strip
===========
This module strips HTML-like markup from text.
It is written in XS, and is therefore about five times faster than using
regular expressions for the same task.
This class provides an easy interface to HTML::StripScripts, using
HTML::Parser to parse the HTML.
See HTML::Parser for details of how to customise how the raw HTML
is parsed into tags, and HTML::StripScripts for details of how to
customise the way those tags are filtered.
This module strips scripting constructs out of HTML, leaving as
much non-scripting markup in place as possible. This allows web
applications to display HTML originating from an untrusted source
without introducing XSS (cross site scripting) vulnerabilities.
You will probably use HTML::StripScripts::Parser rather than using
this module directly.
The process is based on whitelists of tags, attributes and attribute
values. This approach is the most secure against disguised scripting
constructs hidden in malicious HTML documents. As well as removing
scripting constructs, this module ensures that there is a matching
end for each start tag, and that the tags are properly nested.
Previously, in order to customise the output, you needed to subclass
HTML::StripScripts and override methods. Now, most customisation
can be done through the Rules option provided to new(). (See
examples/declaration/ and examples/tags/ for cases where subclassing
is necessary.) The HTML document must be parsed into start tags,
end tags and text before it can be filtered by this module. Use
either HTML::StripScripts::Parser or HTML::StripScripts::Regex
instead if you want to input an unparsed HTML document.
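As a rough illustration of the whitelist approach, the following Python sketch filters a stream of already-parsed start tags, end tags and text. It is not the module's Perl API or its actual rule set: the names ALLOWED, DROP_CONTENT, SAFE_URL_PREFIXES and filter_events, and the event tuple format, are assumptions made for this example.

```python
from html import escape

# Hypothetical whitelist: tag name -> attribute names permitted on that tag.
ALLOWED = {"p": set(), "b": set(), "i": set(), "a": {"href"}}
# Tags whose text content is discarded along with the tag itself.
DROP_CONTENT = {"script", "style"}
# Attribute values are whitelisted too: only these URL schemes survive.
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


def filter_events(events):
    """Filter ("start", tag, attrs), ("end", tag) and ("text", s) events."""
    out, stack, dropping = [], [], None
    for event in events:
        kind = event[0]
        if dropping:
            # Inside <script> or <style>: skip everything up to its end tag.
            if kind == "end" and event[1] == dropping:
                dropping = None
            continue
        if kind == "text":
            out.append(escape(event[1], quote=False))
        elif kind == "start":
            _, tag, attrs = event
            if tag in DROP_CONTENT:
                dropping = tag
                continue
            if tag not in ALLOWED:
                continue  # unknown tag: drop the tag, keep its text
            kept = []
            for name, value in attrs:
                if name not in ALLOWED[tag]:
                    continue
                if name == "href" and not value.lower().startswith(SAFE_URL_PREFIXES):
                    continue
                kept.append(f' {name}="{escape(value, quote=True)}"')
            out.append(f"<{tag}{''.join(kept)}>")
            stack.append(tag)
        elif kind == "end":
            tag = event[1]
            if tag not in stack:
                continue  # stray or disallowed end tag
            # Close any tags left open inside this one to keep nesting valid.
            while stack:
                open_tag = stack.pop()
                out.append(f"</{open_tag}>")
                if open_tag == tag:
                    break
    # Supply the missing end tag for anything still open.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

Because anything not listed is dropped, a scripting construct the author did not anticipate is removed by default, which is the property the whitelist design relies on.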
The HTML::Summary module produces summaries from the textual content of
web pages. It does so using the location heuristic, which determines the value
of a given sentence based on its position and status within the document; for
example, headings, section titles and opening paragraph sentences may be
favoured over other textual content. A LENGTH option can be used to restrict
the length of the summary produced.
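A minimal sketch of a location heuristic, in Python, may make the idea concrete. It is not the module's algorithm: the weights, the naive sentence splitter, the block format and the function name summarize are all assumptions for illustration. Headings score highest, paragraph-opening sentences next, and earlier text slightly outranks later text; sentences are then picked by score until the length budget is used up.

```python
import re


def summarize(blocks, length=80):
    """Summarise (kind, text) blocks; kind is "heading" or "paragraph"."""
    candidates = []
    for kind, text in blocks:
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        for i, sentence in enumerate(sentences):
            if kind == "heading":
                weight = 3.0  # status: headings are favoured most
            elif i == 0:
                weight = 2.0  # position: the opening sentence of a paragraph
            else:
                weight = 1.0
            position = len(candidates)
            # A small penalty makes earlier sentences win ties.
            candidates.append((weight - 0.01 * position, position, sentence))

    chosen, used = [], 0
    for _score, position, sentence in sorted(candidates, key=lambda c: (-c[0], c[1])):
        cost = len(sentence) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= length:
            chosen.append((position, sentence))
            used += cost
    # Emit the selected sentences in their original document order.
    return " ".join(sentence for _, sentence in sorted(chosen))
```

The length argument plays the role of the LENGTH option here: it caps the number of characters in the returned summary.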
This distribution contains the HTML::Summary module, and some supporting
modules. The full list of modules is:
HTML::Summary
Text::Sentence
Lingua::JA::Jcode
Lingua::JA::Jtruncate
This module can be used to parse the content of tables in HTML text. The
parser returns an arrayref consisting of data for each table found within the
passed-in text.
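The shape of such a result can be sketched in Python with the standard library parser. This is an illustration of the idea, not the module's Perl interface: the class name TableCollector and the helper parse_tables are assumptions, and the sketch does not handle tables nested inside other tables.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every table as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # row currently being filled, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
            self.tables[-1].append(self._row)
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr":
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(html):
    """Return one list of rows per table found in html."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    return collector.tables
```

Text outside any table cell is ignored, so the result contains one entry per table and nothing else.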
The HTML::Field set of modules creates objects that represent HTML form fields
and aim to make it easier to interact with CGI objects, databases, and
HTML::Template objects.
The objective of an HTML::Field object is to know how to write its own HTML,
how to get its value out of a CGI object or from a hash, and
how to add its value to a hash suitable for passing into an HTML::Template
or a SQL::Abstract object, for example, and thus to re-use some of the code
which is typically repeated several times in a CGI script.
This bundle also includes HTML::FieldForm, which is a very simple module to
manage sets of HTML::Field objects.
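The three responsibilities listed above can be illustrated with a small Python class. It is only a sketch of the idea and does not mirror the Perl API: the class name TextField and the methods html, load and add_to are hypothetical.

```python
from html import escape


class TextField:
    """A form field that renders itself and moves its value between hashes."""

    def __init__(self, name, value=""):
        self.name = name
        self.value = value

    def html(self):
        # The field knows how to write its own HTML.
        name = escape(self.name, quote=True)
        value = escape(self.value, quote=True)
        return f'<input type="text" name="{name}" value="{value}">'

    def load(self, source):
        # Read the value from a dict of submitted parameters, if present.
        if self.name in source:
            self.value = str(source[self.name])
        return self

    def add_to(self, target):
        # Add the value to a dict destined for a template or a query builder.
        target[self.name] = self.value
        return target
```

A script would typically create each field once, call load on the submitted parameters, and then pass the dict built by add_to to its template or database layer.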
HTML::TableExtract is a module that simplifies the extraction
of information contained in tables within HTML documents.
Tables of note may be specified using Headers, Depth, Count,
or some combination of the three. See the module documentation
for details.
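How the three criteria might combine can be sketched in Python over tables that have already been parsed. This is an assumption-laden illustration, not the module's implementation: the function select_tables, the dict keys depth, count and rows, and the case-insensitive substring match on headers are all choices made for this example. Here depth is taken to mean the nesting level of a table and count its index among tables at that depth.

```python
def _match_headers(rows, headers):
    """Return the data rows reordered to follow headers, or None if no match."""
    for index, row in enumerate(rows):
        columns = []
        for wanted in headers:
            hits = [i for i, cell in enumerate(row) if wanted.lower() in cell.lower()]
            if not hits:
                break  # this row is not the header row
            columns.append(hits[0])
        else:
            # Every wanted header was found: keep only those columns,
            # in the order the headers were requested.
            return [[r[c] if c < len(r) else "" for c in columns]
                    for r in rows[index + 1:]]
    return None


def select_tables(tables, headers=None, depth=None, count=None):
    """Pick tables by any combination of headers, depth and count."""
    selected = []
    for table in tables:
        if depth is not None and table["depth"] != depth:
            continue
        if count is not None and table["count"] != count:
            continue
        rows = table["rows"]
        if headers is not None:
            rows = _match_headers(rows, headers)
            if rows is None:
                continue
        selected.append(rows)
    return selected
```

Selecting by headers is usually the more robust choice when a page's layout changes, since depth and count depend on where the table sits in the document.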