HTML::Shakan is yet another form generator.
This is a simple HTML link extractor designed for the person who does
not want to deal with the intricacies of "HTML::Parser" or the
dereferencing needed to get links out of "HTML::LinkExtor".
This module is a simple HTML parser. It is similar in concept to HTML::Parser,
but it differs in a couple of important ways.
HTML::StickyQuery::DoCoMoGUID - adds guid to the query string for DoCoMo i-mode.
This module is a subclass of HTML::Parser and uses it to parse an
HTML document and append a query string to href attributes.
You can pass a session ID or any form data without using cookies.
Seamus Venasse <svenasse@polaris.ca>
HTML::Strip
===========
This module strips HTML-like markup from text.
It is written in XS, and thus about five times quicker than using
regular expressions for the same task.
This class provides an easy interface to HTML::StripScripts, using
HTML::Parser to parse the HTML.
See HTML::Parser for details of how to customise how the raw HTML
is parsed into tags, and HTML::StripScripts for details of how to
customise the way those tags are filtered.
This module strips scripting constructs out of HTML, leaving as
much non-scripting markup in place as possible. This allows web
applications to display HTML originating from an untrusted source
without introducing XSS (cross site scripting) vulnerabilities.
You will probably use HTML::StripScripts::Parser rather than using
this module directly.
The process is based on whitelists of tags, attributes and attribute
values. This approach is the most secure against disguised scripting
constructs hidden in malicious HTML documents. As well as removing
scripting constructs, this module ensures that there is a matching
end for each start tag, and that the tags are properly nested.
Previously, in order to customise the output, you needed to subclass
HTML::StripScripts and override methods. Now, most customisation
can be done through the Rules option provided to new(). (See
examples/declaration/ and examples/tags/ for cases where subclassing
is necessary.)
The HTML document must be parsed into start tags, end tags and text
before it can be filtered by this module. To filter an unparsed HTML
document, use HTML::StripScripts::Parser or HTML::StripScripts::Regex
instead.
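The whitelist approach described above can be illustrated with a minimal
sketch. It is written in Python with only the standard library and is not
the HTML::StripScripts API; the names ALLOWED, SAFE_SCHEMES, WhitelistFilter
and strip_scripts are hypothetical, and the whitelist is far smaller than
anything a real filter would use. It shows the three ideas from the text:
parse into start tags, end tags and text; keep only whitelisted tags,
attributes and attribute values; and close any tag left open.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration: tag -> allowed attribute names.
ALLOWED = {"p": set(), "b": set(), "i": set(), "a": {"href"}}
# Hypothetical whitelist of href values: only these URL prefixes survive.
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []         # filtered output fragments
        self.open_tags = []   # stack of emitted tags that are still open
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray end tag with no matching start tag
        # Close inner tags first so the output stays properly nested.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()
        # Supply a matching end for every start tag still open.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def strip_scripts(html):
    flt = WhitelistFilter()
    flt.feed(html)
    return flt.result()
```

In this sketch, strip_scripts('<a href="javascript:alert(1)">x</a>') keeps
the link text but drops the href, because the value is not on the whitelist.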
The HTML::Summary module produces summaries from the textual content of
web pages. It does so using the location heuristic, which determines the value
of a given sentence based on its position and status within the document; for
example, headings, section titles and opening paragraph sentences may be
favoured over other textual content. A LENGTH option can be used to restrict
the length of the summary produced.
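As a rough illustration of a location heuristic, the sketch below scores
each sentence by where it appears and then fills a length budget with the
best-scoring sentences. It is written in Python and is not the HTML::Summary
implementation: the weights, the sentence splitter, and the names
split_sentences and summarize are assumptions made for the example, and the
input is taken to be already reduced to (kind, text) blocks.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by space.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summarize(blocks, length=120):
    """blocks is a list of (kind, text) pairs; kind is 'heading' or 'para'."""
    scored = []
    order = 0
    for block_index, (kind, text) in enumerate(blocks):
        for sent_index, sentence in enumerate(split_sentences(text)):
            score = 1.0 / (1 + block_index)   # earlier blocks count more
            if kind == "heading":
                score += 2.0                  # headings are favoured
            elif sent_index == 0:
                score += 1.0                  # so are opening sentences
            scored.append((score, order, sentence))
            order += 1

    chosen = []
    used = 0
    # Take sentences from best to worst while they still fit the budget.
    for score, position, sentence in sorted(scored, key=lambda s: (-s[0], s[1])):
        extra = len(sentence) + (1 if chosen else 0)  # +1 for the joining space
        if used + extra > length:
            continue
        chosen.append((position, sentence))
        used += extra

    # Emit the chosen sentences in their original document order.
    return " ".join(sentence for _, sentence in sorted(chosen))
```

The length argument plays the role of the LENGTH option: the summary never
exceeds it, and lower-scoring sentences are skipped once the budget is full.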
This distribution contains the HTML::Summary module, and some supporting
modules. The full list of modules is:
HTML::Summary
Text::Sentence
Lingua::JA::Jcode
Lingua::JA::Jtruncate
This module can be used to parse the content of tables in HTML text. The
parser returns an arrayref consisting of data for each table found within the
passed-in text.
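The idea of returning one data structure per table can be sketched as
follows. This is a Python illustration using the standard library, not the
module's own interface; the names TableCollector and parse_tables and the
shape of the returned dictionaries ("attrs" and "rows") are assumptions
chosen for the example.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []  # every table seen, in document order
        self.stack = []   # tables currently open (allows nested tables)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            table = {"attrs": dict(attrs), "rows": [], "open": False}
            self.tables.append(table)
            self.stack.append(table)
        elif self.stack and tag == "tr":
            self.stack[-1]["rows"].append([])
            self.stack[-1]["open"] = False
        elif self.stack and tag in ("td", "th"):
            table = self.stack[-1]
            if not table["rows"]:
                table["rows"].append([])  # tolerate a cell outside any <tr>
            table["rows"][-1].append("")
            table["open"] = True

    def handle_endtag(self, tag):
        if not self.stack:
            return
        if tag == "table":
            self.stack.pop()
        elif tag in ("td", "th"):
            self.stack[-1]["open"] = False

    def handle_data(self, data):
        # Text is recorded only while a cell of the innermost table is open.
        if self.stack and self.stack[-1]["open"]:
            self.stack[-1]["rows"][-1][-1] += data


def parse_tables(html):
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    return [
        {"attrs": t["attrs"], "rows": [[c.strip() for c in row] for row in t["rows"]]}
        for t in collector.tables
    ]
```

Each entry in the returned list corresponds to one table found in the input,
with its attributes and a list of rows, each row being a list of cell texts.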