MARC-XML is an extension to the MARC-Record distribution for working with
XML data encoded using the MARC21slim XML schema from the Library of Congress.
For more details see: http://www.loc.gov/standards/marcxml/
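To make the MARC21slim layout concrete, here is a minimal sketch in Python (not the Perl API of MARC-XML itself) that reads subfields from a record using only the standard library. The sample record, its field values, and the helper name `subfields` are made up for illustration; only the namespace URI and the element names come from the MARC21slim schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MARC21slim schema.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for this sketch.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">A. N. Author.</subfield>
  </datafield>
</record>"""


def subfields(record_xml, tag):
    """Return (code, text) pairs for every subfield of the given data field tag."""
    root = ET.fromstring(record_xml)
    pairs = []
    for field in root.findall(f"marc:datafield[@tag='{tag}']", NS):
        for sub in field.findall("marc:subfield", NS):
            pairs.append((sub.get("code"), sub.text))
    return pairs
```

Calling `subfields(SAMPLE, "245")` yields the title and statement-of-responsibility subfields of the sample record; a tag that is not present yields an empty list.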
HTML::Fraction encodes fractions as HTML entities. Some very common
fractions have named HTML entities (e.g. 1/2 is &frac12;). Additionally, common
vulgar fractions have Unicode characters (e.g. 1/5 is &#8533;). This
module takes a string and encodes fractions as entities: this means that
it will look pretty in the browser.
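The idea can be sketched in a few lines of Python. This is an illustration of the technique only, not the module's implementation: the function name `encode_fractions`, the two lookup tables, and the choice of which fractions to cover are all assumptions made for the example.

```python
import re

# The three fractions that have named HTML entities.
NAMED = {"1/4": "&frac14;", "1/2": "&frac12;", "3/4": "&frac34;"}

# A few vulgar fractions that exist only as Unicode characters,
# written here as numeric character references (illustrative subset).
NUMERIC = {"1/3": "&#8531;", "2/3": "&#8532;", "1/5": "&#8533;"}

# A fraction is digits/digits that is not part of a longer run such as 1/2/2024.
FRACTION = re.compile(r"(?<![\d/])(\d+)/(\d+)(?![\d/])")


def encode_fractions(text):
    """Replace known fractions with entities; leave everything else untouched."""
    def replace(match):
        key = match.group(0)
        return NAMED.get(key) or NUMERIC.get(key) or key
    return FRACTION.sub(replace, text)
```

With these tables, `encode_fractions("1/2 cup")` gives `&frac12; cup`, while a fraction with no entry in either table is passed through unchanged.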
Par is similar but superior to the fmt(1) command included in the
base system.
Par is a filter that copies its input to its output, changing all
white characters (except newlines) to spaces, and reformatting
each paragraph. Paragraphs are separated by protected, blank, and
bodiless lines (see the Terminology section for definitions), and
optionally delimited by indentation (see the d option in the Options
section).
Each output paragraph is generated from the corresponding input
paragraph as follows:
1) An optional prefix and/or suffix is removed from each input line.
2) The remainder is divided into words (separated by spaces).
3) The words are joined into lines to make an eye-pleasing paragraph.
4) The prefixes and suffixes are reattached.
If there are suffixes, spaces are inserted before them so that they
all end in the same column.
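The four steps above can be sketched in Python as follows. This is a rough approximation under stated assumptions: the prefix and suffix are passed in explicitly rather than detected, line filling is done greedily by textwrap rather than by Par's own layout algorithm, and the function name `reformat` and its parameters are invented for the example.

```python
import textwrap


def reformat(lines, prefix="", suffix="", width=40):
    """Reflow one paragraph, keeping a fixed prefix and suffix on every line."""
    # 1) Remove the optional prefix and suffix from each input line.
    bodies = []
    for line in lines:
        if prefix and line.startswith(prefix):
            line = line[len(prefix):]
        if suffix and line.endswith(suffix):
            line = line[:-len(suffix)]
        bodies.append(line)

    # 2) Divide the remainder into words.
    words = " ".join(bodies).split()

    # 3) Join the words into lines (greedy fill, a stand-in for Par's layout).
    room = width - len(prefix) - len(suffix)
    filled = textwrap.wrap(" ".join(words), width=room)

    # 4) Reattach prefixes and suffixes, padding so the suffixes line up.
    if suffix:
        longest = max((len(body) for body in filled), default=0)
        return [prefix + body.ljust(longest) + suffix for body in filled]
    return [prefix + body for body in filled]
```

For example, reflowing the two lines `# one two #` and `# three four five #` with `prefix="# "`, `suffix=" #"` and `width=14` produces three lines of equal length, each ending in ` #`.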
HTML::HTML5::Entities is a pure Perl, drop-in replacement for HTML::Entities,
providing the character entities defined in HTML5.
Lingua::EN::PluralToSingular converts words denoting a plural in the English
language into words denoting a singular noun.
The Lingua::EN::Sentence module contains the function get_sentences,
which splits text into its constituent sentences, based on a regular
expression and a list of abbreviations (built in and given).
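A minimal Python sketch of this approach, splitting on a regular expression and skipping terminators that follow a known abbreviation, might look like the following. The built-in abbreviation list, the regular expression, and the name `split_sentences` are assumptions made for illustration and are much cruder than what the module uses.

```python
import re

# Tiny illustrative abbreviation list, stored without the trailing period.
ABBREVIATIONS = {"mr", "mrs", "dr", "prof", "etc", "e.g", "i.e"}

# A candidate boundary: terminator(s), whitespace, then a capital or a quote.
BOUNDARY = re.compile(r"[.!?]+\s+(?=[A-Z\"'])")


def split_sentences(text, extra_abbreviations=()):
    """Split text into sentences, ignoring periods that end an abbreviation."""
    abbrevs = ABBREVIATIONS | {a.lower().rstrip(".") for a in extra_abbreviations}
    sentences, start = [], 0
    for match in BOUNDARY.finditer(text):
        before = text[start:match.start()].split()
        last_word = before[-1].lower() if before else ""
        # A period right after an abbreviation is not a sentence boundary.
        if text[match.start()] == "." and last_word in abbrevs:
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

The `extra_abbreviations` argument plays the role of the user-supplied abbreviations: passing `["Fig."]` keeps "See Fig. Two shows it." together as one sentence.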
Seamus Venasse <svenasse@polaris.ca>
Squeeze English text into the most compact format possible, so that it is
barely readable. You should convert all text to lowercase for maximum
compression, because optimizations have been designed mostly for
uncapitalised letters.
This is a simple module which makes an unscientific effort at
summarizing English text. It recognizes simple patterns which look
like statements, abridges them, and concatenates them into something
vaguely resembling a summary. It needs more work on large bodies
of text, but it seems to have a decent effect on small inputs at
the moment.
The module is a probability-based, corpus-trained tagger that assigns
POS tags to English text based on a lookup dictionary and probability
values. The tagger determines appropriate tags based on conditional
probabilities - it looks at the preceding tag to figure out what the
appropriate tag is for the current word. Unknown words will be classified
according to word morphology or can be set to be treated as nouns or
other parts of speech.
The tagger also recursively extracts as many nouns and noun phrases as
it can, using a set of regular expressions.
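The tagging step described above (dictionary lookup, weighted by the preceding tag, with a morphological fallback for unknown words) can be illustrated with a toy Python sketch. Everything in it is hypothetical: the lexicon, the probabilities, the suffix rules, and the names `tag` and `guess_unknown` are invented for the example, whereas a real tagger derives its tables from a training corpus.

```python
# Hypothetical P(tag | word) values; a real tagger learns these from a corpus.
LEXICON = {
    "the": {"DET": 1.0},
    "can": {"MD": 0.7, "NN": 0.3},
    "rusts": {"VBZ": 1.0},
}

# Hypothetical P(tag | previous tag) values; "<s>" marks the sentence start.
TRANSITIONS = {
    ("<s>", "DET"): 0.6,
    ("DET", "NN"): 0.9,
    ("DET", "MD"): 0.01,
    ("NN", "VBZ"): 0.5,
}

UNSEEN = 1e-4  # small probability for tag pairs missing from the table


def guess_unknown(word):
    """Classify an unknown word by its ending, defaulting to noun."""
    if word.endswith("ly"):
        return "RB"
    if word.endswith("ing"):
        return "VBG"
    if word.endswith("ed"):
        return "VBD"
    return "NN"


def tag(words):
    """Assign one tag per word, conditioning on the tag chosen just before."""
    previous, tagged = "<s>", []
    for word in words:
        candidates = LEXICON.get(word.lower())
        if candidates is None:
            best = guess_unknown(word)
        else:
            best = max(
                candidates,
                key=lambda t: candidates[t] * TRANSITIONS.get((previous, t), UNSEEN),
            )
        tagged.append((word, best))
        previous = best
    return tagged
```

In this toy setup "can" on its own is more likely a modal, but after a determiner the transition table makes the noun reading win, which is the effect of looking at the preceding tag.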
Used in its basic form, this module provides an interface for generating
basic HTML form elements much like HTML::StickyForms does. The main
difference is HTML::SuperForm returns HTML::SuperForm::Field objects
rather than plain HTML. This allows for more flexibility when generating
forms for a complex application.
To get the most out of this module, use it as a base (Super) class for
your own form object which generates your own custom fields. If you
don't use it this way, I guess there's really nothing Super about it.
Examples are shown later in the document.
The interface was designed with mod_perl and the Template Toolkit in
mind, but it works equally well in any CGI environment.