This module implements a statistical language identifier.
The filename attributes to the constructor must refer to files
containing tables of n-gram probabilities for languages. These tables
can be generated using the trainlid(1) utility program.
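The idea of scoring text against per-language n-gram tables can be sketched as follows. This is a minimal Python illustration, not the module's API and not the trainlid(1) table format: the function names, the trigram size, the probability floor for unseen n-grams, and the tiny training sentences are all assumptions made for the example.

```python
import math
from collections import Counter


def train_table(text, n=3):
    """Build a table of n-gram relative frequencies from sample text."""
    padded = f" {text.lower()} "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def log_score(text, table, n=3, floor=1e-6):
    """Sum the log-probabilities of the text's n-grams.

    N-grams missing from the table get a small floor probability
    (an assumption of this sketch) so the logarithm stays defined.
    """
    padded = f" {text.lower()} "
    return sum(
        math.log(table.get(padded[i:i + n], floor))
        for i in range(len(padded) - n + 1)
    )


def identify(text, tables):
    """Return the language whose table gives the text the highest score."""
    return max(tables, key=lambda lang: log_score(text, tables[lang]))


# Toy training data; real tables would be built from much larger corpora.
tables = {
    "en": train_table("the quick brown fox jumps over the lazy dog and the cat"),
    "de": train_table("der schnelle braune fuchs springt über den faulen hund und die katze"),
}

print(identify("the dog and the fox", tables))
print(identify("der hund und die katze", tables))
```

With training text this small the sketch only separates very dissimilar inputs; it is meant to show the scoring step, not to be a usable identifier.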
Seamus Venasse <svenasse@polaris.ca>
These are Perl bindings to CLD, the Compact Language Detection library
from Google/Chrome.
HTML::Tiny is a simple, dependency free module for generating HTML (and
XML). It concentrates on generating syntactically correct XHTML using a
simple Perl notation.
In addition to the HTML generation functions, utility functions are provided
to:
* encode and decode URL encoded strings
* entity encode HTML
* build query strings
* JSON encode data structures
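For readers unfamiliar with these four operations, the sketch below shows what each one does, using Python's standard library. These are not HTML::Tiny's methods; the input strings are made up for the example, and HTML::Tiny's own output may differ in details such as how spaces are escaped.

```python
import html
import json
from urllib.parse import quote, unquote, urlencode

# URL-encode and decode a string (safe="" also escapes "/").
encoded = quote("a b&c", safe="")
decoded = unquote(encoded)

# Entity-encode text for safe inclusion in HTML.
escaped = html.escape('<a href="x">')

# Build a query string from key/value pairs.
query = urlencode({"q": "perl html", "page": 2})

# JSON-encode a data structure.
payload = json.dumps({"ok": True, "items": [1, 2]})

print(encoded, decoded, escaped, query, payload, sep="\n")
```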
Lingua::Identify identifies the language a given string or file is
written in.
Lingua::Ispell.pm - a module encapsulating access to the Ispell program.
ispell, when reporting on misspelled words, indicates the string it was
unable to verify, as well as its starting offset in the input line.
No such information is returned for words which are deemed to be
correctly spelled.
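The reporting behaviour described above can be illustrated with a small parser. This Python sketch is not the module's interface. It assumes the line format of ispell's pipe mode (`ispell -a`) as documented in the ispell manual: `&` and `?` lines carry the word, a count, an offset and suggestions, `#` lines carry the word and offset only, and `*`, `+` and `-` lines mark accepted words. The function name and the sample lines are hand-written for the example.

```python
def parse_ispell_line(line):
    """Parse one result line from ispell's pipe mode.

    Returns a dict for a word ispell could not verify, or None for
    lines that report a correctly spelled word (or are empty).
    """
    line = line.strip()
    if not line or line[0] in "*+-":
        return None  # accepted word: no word or offset is reported
    if line[0] in "&?":
        # Assumed shape: "& word count offset: sugg1, sugg2, ..."
        head, _, tail = line.partition(":")
        _, word, _count, offset = head.split()
        suggestions = [s.strip() for s in tail.split(",") if s.strip()]
        return {"word": word, "offset": int(offset), "suggestions": suggestions}
    if line[0] == "#":
        # Assumed shape: "# word offset" (no suggestions available)
        _, word, offset = line.split()
        return {"word": word, "offset": int(offset), "suggestions": []}
    return None


# Hand-written sample lines, not captured from a real ispell run.
print(parse_ispell_line("& teh 3 0: the, tea, ten"))
print(parse_ispell_line("# xyzzyq 12"))
print(parse_ispell_line("*"))
```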
This module provides a way for the user to specify possible languages
in order of preference, and then to pick the best language of those
available. Different 'dialects' given by the 'territory' part of the
language specifier (such as en, en_GB, and en_US) are also supported.
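One plausible way to pick the best available language from an ordered preference list is sketched below in Python. The description does not spell out the module's actual matching rules, so the fallback from a territory-specific tag such as en_GB to the bare language en is an assumption of this example, as is the function name.

```python
def pick_language(preferred, available):
    """Return the best available language for an ordered preference list.

    Each preferred tag is tried in order. An exact match wins; failing
    that, the bare language without its territory part is tried
    (en_GB falls back to en). Returns None when nothing matches.
    """
    available_set = set(available)
    for tag in preferred:
        if tag in available_set:
            return tag
        base = tag.split("_", 1)[0]  # strip the territory, if any
        if base in available_set:
            return base
    return None


print(pick_language(["en_GB", "fr"], ["fr", "en"]))
print(pick_language(["de", "fr"], ["en", "fr"]))
```

In this sketch an earlier preference always beats a later one, even when the earlier one only matches through the territory fallback.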
Seamus Venasse <svenasse@polaris.ca>
IDNA::Punycode is a module to encode Unicode strings into Punycode and
decode them back. Punycode is an efficient encoding of Unicode for use
with IDNA.
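A Punycode round trip can be shown with Python's built-in codecs. This is not IDNA::Punycode's interface; it only illustrates the encoding itself, using the example label "bücher".

```python
label = "bücher"

# Raw Punycode: the ASCII letters come first, then a delimiter and
# the encoded position of the non-ASCII character.
encoded = label.encode("punycode")
decoded = encoded.decode("punycode")

# The "idna" codec additionally adds the "xn--" prefix used in host names.
idna_label = label.encode("idna")

print(encoded)     # b'bcher-kva'
print(decoded)     # bücher
print(idna_label)  # b'xn--bcher-kva'
```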
Lingua::Stem::Ru applies the Porter Stemming Algorithm to its parameters,
returning the stemmed words.
Lingua::Stem - Stemming of words
This routine applies stemming algorithms to its parameters, returning the
stemmed words as appropriate to the selected locale.
Currently supported locales are:
EN - English (also EN-US and EN-UK)
DA - Danish
DE - German
GL - Galician
IT - Italian
NO - Norwegian
PT - Portuguese
SV - Swedish
Strip whitespace and comments from JavaScript code