This package provides parsing and rendering functions for XML. It is
based on the datatypes found in the xml-types package. This package is
broken up into the following modules:
* Text.XML: DOM-based parsing and rendering. This is the most commonly
used module.
* Text.XML.Cursor: A wrapper around Text.XML which allows bidirectional
traversal of the DOM, similar to XPath.
* Text.XML.Unresolved: A slight modification to Text.XML which does not
require all entities to be resolved during parsing. The datatypes are
slightly more complicated here, and therefore this module is recommended
only when you need to deal directly with raw entities.
* Text.XML.Stream.Parse: Streaming parser, including some streaming
parser combinators.
* Text.XML.Stream.Render: Streaming renderer.
Libtre is an attempt to create a lightweight, robust, and efficient, fully
POSIX-compliant regexp matching library. There is still some work left, but
the results so far are promising.
At the core of Libtre is a new algorithm for regular expression matching with
submatch addressing. The algorithm runs in linear worst-case time in the
length of the text being searched, and quadratic worst-case time in the
length of the regular expression used. In other words, the time complexity
of the algorithm is O(M^2 N), where M is the length of the regular expression
and N is the length of the text. The space used is also quadratic in the
length of the regex, but does not depend on the searched string. This
quadratic behaviour occurs only in pathological cases which are probably
very rare in practice.
This is an implementation of Rabin and Karp's streaming hash, as described
in "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer,
Wilkerson, and Aiken. Following the suggestion of Schleimer, I am using
their second equation (the standard rolling update, with $k as the base):
$H[ $c[2 .. $k+1] ] = ( $H[ $c[1 .. $k] ] - $c[1] * $k ** ($k - 1) ) * $k + $c[$k+1]
Each hash value encodes information about the next k values in the stream
(hence "k-gram"). This means that for a stream of n integer values (or
characters), you get back n - k + 1 hash values.
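As a concrete illustration, here is a minimal Perl sketch of the rolling
update (the base and modulus are illustrative choices of mine, not values
taken from this module):

    use strict;
    use warnings;

    # Return the Karp-Rabin hashes of all k-grams in @c.
    sub kgram_hashes {
        my ($k, @c) = @_;
        my ($base, $mod) = (256, 2**31 - 1);
        return () if @c < $k;

        # Hash of the first k-gram: c1*base^(k-1) + c2*base^(k-2) + ... + ck
        my $h = 0;
        $h = ($h * $base + $c[$_]) % $mod for 0 .. $k - 1;
        my @hashes = ($h);

        # base^(k-1) mod m, used to remove the outgoing value
        my $top = 1;
        $top = ($top * $base) % $mod for 1 .. $k - 1;

        # Slide the window: drop the leading value, append the next one.
        for my $i ($k .. $#c) {
            $h = ($h - $c[$i - $k] * $top) % $mod;
            $h = ($h * $base + $c[$i]) % $mod;
            push @hashes, $h;
        }
        return @hashes;    # n - k + 1 hashes for n input values
    }

    my @h = kgram_hashes(5, map { ord } split //, "example stream");
    print scalar(@h), " hashes\n";    # prints "10 hashes" (14 - 5 + 1)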
For best results, you will want to filter your data to remove all
unnecessary information before hashing it. For example, in a large English
document, you should probably remove all whitespace, as well as folding
away all capitalization.
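In Perl, such a filter can be as simple as:

    # strip whitespace and fold case before hashing
    (my $clean = $text) =~ s/\s+//g;
    $clean = lc $clean;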
Usage: cdif [-Bvns] [-A #] [-C #] [-D #] [-I #] [-e #] [-[bwcu]] file1 file2
       cdif [-rcs] [-q] [-rrev1 [-rrev2]] [cdif options] file
       cdif [cdif options] [diff-output-file]
Options:
    -B          byte-by-byte compare
    -v          use video standout (default for tty)
    -n          use nroff-style overstrike (default for non-tty)
    -b          ignore trailing blanks
    -w          ignore whitespace
    -c[#]       context diff
    -u[#]       unified diff (if diff has the -u option)
    -e #        expression for a `word' (default is '\w+')
    -s          show statistical information at the end
    -A, -C, -D  (Append, Change, Delete) each take one of:
                    vso: video standout
                    vul: video underline
                    vbd: video bold
                    bd:  nroff-style overstrike
                    ul:  nroff-style underline
                or any escape sequence, or several of these separated
                by commas
    -I          specify the string to be shown at an insertion point;
                the following strings have special meanings:
                    vbar:  print a vertical bar at the point
                    caret: print a caret under the point
    -diff=command
                specify an alternative diff command
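For example, to produce a unified diff with statistical information at the
end (both flags are listed above):

    cdif -u -s old.c new.c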
When working with text, it is common and convenient to truncate strings to
make them fit a desired context. For example, you might have a menu that is
only 100px wide and prefer that its text not wrap, so you would truncate it
to around 15-30 characters, depending on preference and typeface size.
This is trivial with plain text and substr, but with HTML it is somewhat
difficult: whitespace has fluid significance, and open tags that are never
properly closed destroy well-formedness and can wreck an entire layout.
HTML::Truncate attempts to account for those two problems by padding
truncation for spacing and entities and closing any tags that remain
open at the point of truncation.
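A minimal usage sketch (the new, chars, and truncate methods are my reading
of the module's interface; check the module's POD before relying on them):

    use HTML::Truncate;

    my $html = '<p>The <i>quick</i> brown fox jumped over the lazy dog.</p>';
    my $ht   = HTML::Truncate->new();
    $ht->chars(20);                # keep roughly 20 characters of text
    print $ht->truncate($html);   # any still-open tags are closed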
Used in its basic form, this module provides an interface for generating
basic HTML form elements, much like HTML::StickyForms does. The main
difference is that HTML::SuperForm returns HTML::SuperForm::Field objects
rather than plain HTML. This allows for more flexibility when generating
forms for a complex application.
To get the most out of this module, use it as a base (Super) class for
your own form object which generates your own custom fields. If you
don't use it this way, I guess there's really nothing Super about it.
Examples are shown later in the document.
The interface was designed with mod_perl and the Template Toolkit in
mind, but it works equally well in any CGI environment.
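A hedged sketch of basic use (the text-field method and stringification
behaviour here are assumptions about the interface; see the module's POD):

    use HTML::SuperForm;

    my $form  = HTML::SuperForm->new();            # sticky defaults come from the request
    my $field = $form->text(name => 'username');   # an HTML::SuperForm::Field object
    print $field;                                  # stringifies to the <input ...> tag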
This is yet another library for template-based text generation.
Template-based text generation is a way to separate program code and data,
so that non-programmers can control the final result (such as HTML) as
desired without tweaking the program code itself. This makes jobs like
website maintenance much easier, because you can leave the program code
unchanged even when a page redesign is needed.
The idea is simple. Whenever a block of text surrounded by '<%' and '%>'
(or any pair of delimiters you specify) is found, it is taken as a Perl
expression and replaced by its evaluated result.
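The core technique can be sketched in a few lines of raw Perl (an
illustration of the idea, not this module's actual implementation):

    my $template = 'Hello, <% $name %>! You have <% $count * 2 %> points.';
    my ($name, $count) = ('world', 21);
    (my $out = $template) =~ s/<%(.*?)%>/eval $1/ge;
    print $out, "\n";    # Hello, world! You have 42 points.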
The major goals of this library are simplicity and speed. While there are
many modules for template processing, this module runs at nearly raw
Perl-code (i.e., "s|xxx|xxx|ge") speed, while providing a simple-to-use
object interface.
With the Numbers_Words class you can convert numbers written in Arabic
digits to words in several languages. You can convert any integer, from
-infinity to infinity; if your system does not support such large numbers,
you can call Numbers_Words::toWords() with the number passed as a string.
With the Numbers_Words::toCurrency($num, $locale, 'USD') method you can
convert a number (including its fractional part) to words together with
the currency name.
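For example (the output wording is approximate; exact phrasing depends on
the locale data), toWords(42, 'en_US') yields something like "forty-two",
and toCurrency(14.15, 'en_US', 'USD') something like "fourteen dollars
fifteen cents".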
The following languages are supported:
* bg (Bulgarian)
* cs (Czech)
* de (German)
* dk (Danish)
* en_100 (Donald Knuth system, English)
* en_GB (British English)
* en_US (American English)
* es (Spanish Castellano)
* es_AR (Argentinian Spanish)
* et (Estonian)
* fr (French)
* fr_BE (French Belgium)
* he (Hebrew)
* hu_HU (Hungarian)
* id (Indonesian)
* it_IT (Italian)
* lt (Lithuanian)
* nl (Dutch)
* pl (Polish)
* pt_BR (Brazilian Portuguese)
* ru (Russian)
* sv (Swedish)
sgrep (structured grep) is a tool for searching and indexing text, SGML,
XML, and HTML files and for filtering text streams using structural
criteria. The data model of sgrep is based on regions, which are nonempty
substrings of the text. Regions are typically occurrences of constant
strings, SGML tags, or meaningful text elements that are recognizable
through some delimiting strings or through the built-in SGML, XML, and
HTML parser. Regions can be arbitrarily long, arbitrarily overlapping, and
arbitrarily nested.
Sgrep is a convenient tool for making queries against almost any kind of
text file with some well-known structure. These include program source,
mail folders, news folders, HTML, SGML, and so on. With relatively simple
queries you can display mail messages by their subject or sender, extract
titles, links, or any other regions from HTML files, pull function
prototypes from C source, or make complex queries against SGML files based
on the file's DTD.
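For example, a query like the following (using sgrep's region operator
`..`, which selects the text between its two operands) extracts the titles
from a set of HTML files:

    sgrep '"<TITLE>" .. "</TITLE>"' *.html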