The module is a probability-based, corpus-trained tagger that assigns
POS tags to English text based on a lookup dictionary and probability
values. The tagger determines appropriate tags based on conditional
probabilities: it looks at the preceding tag to determine the
appropriate tag for the current word. Unknown words are classified
according to word morphology, or can be set to be treated as nouns or
other parts of speech.
The tagger also recursively extracts as many nouns and noun phrases as
it can, using a set of regular expressions.
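The idea of choosing a tag from a lookup dictionary combined with the
preceding tag can be sketched in a few lines of plain Perl. This is only an
illustration, not the module's implementation: the tables %lexicon and
%transition, their probability values, the tag names, and the function
tag_words() are all hypothetical, and unknown words are simply treated as
nouns.

```perl
use strict;
use warnings;

# Hypothetical lookup dictionary: P(tag | word).
my %lexicon = (
    the   => { det => 1.0 },
    can   => { md  => 0.7, nn  => 0.2, vb => 0.1 },
    rusts => { vbz => 0.6, nns => 0.4 },
);

# Hypothetical conditional probabilities: P(tag | preceding tag).
my %transition = (
    start => { det => 0.6, nn  => 0.2,  md => 0.1,  vb => 0.1 },
    det   => { nn  => 0.8, nns => 0.15, md => 0.01, vb => 0.01, vbz => 0.03 },
    nn    => { vbz => 0.5, md  => 0.3,  nn => 0.1,  nns => 0.1 },
);

# Greedy left-to-right tagging: for each word, pick the tag that maximises
# P(tag | word) * P(tag | preceding tag).
sub tag_words {
    my @words = @_;
    my $prev  = 'start';
    my @tags;
    for my $word (@words) {
        # Assumption for this sketch: unknown words are treated as nouns.
        my $candidates = $lexicon{ lc $word } || { nn => 1.0 };
        my ( $best, $best_score ) = ( undef, -1 );
        for my $tag ( sort keys %$candidates ) {
            # Unseen tag pairs get a small fallback probability.
            my $score = $candidates->{$tag} * ( $transition{$prev}{$tag} // 0.001 );
            ( $best, $best_score ) = ( $tag, $score ) if $score > $best_score;
        }
        push @tags, $best;
        $prev = $best;
    }
    return @tags;
}
```

With these toy tables, "can" on its own would most likely be a modal, but
after a determiner the noun reading wins, which is the effect of
conditioning on the preceding tag.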
These are Perl bindings to CLD, the Compact Language Detection library
from Google/Chrome.
Lingua::PT::Stemmer - Stemmers for Portuguese and Galician. While these stemmers
can be used standalone, they are typically used as back ends to the general
stemmer front end provided by textproc/p5-Lingua-Stem.
Lingua::Stem::Ru applies the Porter Stemming Algorithm to its parameters,
returning the stemmed words.
Small module for inflecting pronouns for a variety of different
genders.
Seamus Venasse <svenasse@polaris.ca>
Determine the infinitive form of a conjugated word. Also,
determine the suffix used to identify which rule to apply to
transform the conjugated word into the infinitive form.
Seamus Venasse <svenasse@polaris.ca>
This module provides an easy-to-use interface for encoding and decoding
Internationalized Domain Names (IDNs).
IDNs use characters drawn from a large repertoire (Unicode), but IDNA
allows the non-ASCII characters to be represented using only the ASCII
characters already allowed in so-called host names today
(letter-digit-hyphen, "/[A-Z0-9-]/i").
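The letter-digit-hyphen restriction can be illustrated with a small check in
plain Perl. This is a sketch only: is_ldh_hostname() is a hypothetical helper,
not part of the module, and it tests nothing beyond the character class quoted
above (no length limits, no rules about leading or trailing hyphens).

```perl
use strict;
use warnings;

# Hypothetical helper: true if every dot-separated label consists only of
# letters, digits and hyphens (the LDH rule).
sub is_ldh_hostname {
    my ($name) = @_;
    return 0 unless defined $name && length $name;
    for my $label ( split /\./, $name, -1 ) {
        # An empty label or any character outside A-Z, 0-9 and "-" fails.
        return 0 unless $label =~ /\A[A-Z0-9-]+\z/i;
    }
    return 1;
}

# An ASCII-only encoded name passes; the same name with a raw u-umlaut
# (written here as \x{fc}) does not.
print is_ldh_hostname("xn--mller-kva.example.org") ? "ok\n" : "not LDH\n";
print is_ldh_hostname("m\x{fc}ller.example.org")   ? "ok\n" : "not LDH\n";
```

A name that fails this check is the kind of input that needs encoding before
it can be used where only ordinary host names are accepted.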
The exportable subroutines of Lingua::EN::Inflect provide plural
inflections and "a"/"an" selection for English words.
Plural forms of all nouns, most verbs, and some adjectives are
provided. Where appropriate, "classical" variants (for example:
"brother" -> "brethren", "dogma" -> "dogmata", etc.) are also
provided.
Seamus Venasse <svenasse@polaris.ca>
This is a module for finding IP addresses in plain text.
NetAddr::IP::Find exports one function, find_ipaddrs(). It
works much like URI::Find's find_uris() or
Email::Find's find_emails().
$num_ipaddrs_found = find_ipaddrs($text, \&callback);
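The find-and-callback pattern behind that call can be sketched in plain Perl.
The function below is a hypothetical stand-in written for illustration, not
the module's code: it takes a reference to the text, recognises only dotted
quad IPv4 addresses, and passes the matched string to the callback. The real
find_ipaddrs() may pass different arguments to its callback, so consult the
module's documentation before relying on any of these details.

```perl
use strict;
use warnings;

# Hypothetical stand-in: scan $$text_ref for dotted quads, hand each valid
# address to $callback, substitute whatever the callback returns, and
# return the number of addresses found.
sub find_ipaddrs_sketch {
    my ( $text_ref, $callback ) = @_;
    my $found = 0;
    $$text_ref =~ s{(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?!\d|\.\d)}{
        my $candidate = $1;
        # Reject candidates with an octet above 255; leave them unchanged.
        my $valid = !grep { $_ > 255 } split /\./, $candidate;
        $valid ? do { $found++; $callback->($candidate) } : $candidate;
    }gex;
    return $found;
}

my $text  = "Gateway 192.168.0.1 answered, 999.1.1.1 did not.";
my $count = find_ipaddrs_sketch( \$text, sub { "[$_[0]]" } );
print "$count found: $text\n";
```

Returning the matched string unchanged from the callback leaves the text as
it was, which is convenient when only the count or a list of addresses is
wanted.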
Forenames and surnames are often stored either wholly in UPPERCASE
or wholly in lowercase. This module allows you to convert names
into the correct case where possible.
Although forenames and surnames are normally stored separately, if
they do appear in a single string, whitespace-separated, NameCase
and nc handle them correctly.