This module takes as input one or more personal names in free-format
text and attempts to parse them. If parsing succeeds, each name is broken
down into components, on which useful functions can then be performed.
"Named entities" is the NLP jargon for proper nouns which
represent people, places, organisations, and so on.
This module provides a very simple way of extracting these from a text.
If we run the "extract_entities" routine on a piece of news coverage of
recent UK political events, we should expect to see it return a list of
hash references looking like this:
{ entity => 'Mr Howard', class => 'person', scores => { ... }, },
{ entity => 'Ministry of Defence', class => 'organisation', ... },
{ entity => 'Oxfordshire', class => 'place', ... },
The additional "scores" hash reference in there breaks down the various
possible classes for this entity on an open-ended scale.
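
A list of this shape might be consumed as in the following sketch. It
uses core Perl only and does not call the module. The sample data is
hand-written, the score values are hypothetical (the real contents of
"scores" are not shown above), and top_class is a helper invented for
this example.

```perl
use strict;
use warnings;

# Hand-written stand-in for a list returned by extract_entities.
# The score values are hypothetical, made up for this example.
my @entities = (
    { entity => 'Mr Howard',           class => 'person',
      scores => { person => 7, place => 1, organisation => 2 } },
    { entity => 'Ministry of Defence', class => 'organisation',
      scores => { person => 1, place => 2, organisation => 9 } },
    { entity => 'Oxfordshire',         class => 'place',
      scores => { person => 0, place => 6, organisation => 1 } },
);

# Hypothetical helper: return the class name with the highest score
# (ties are broken alphabetically so the result is deterministic).
sub top_class {
    my ($scores) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
                 keys %$scores;
    return $best;
}

# Group the entity strings by their assigned class.
my %by_class;
push @{ $by_class{ $_->{class} } }, $_->{entity} for @entities;

for my $class (sort keys %by_class) {
    print "$class: ", join(', ', @{ $by_class{$class} }), "\n";
}

# Report the best-scoring class for each entity.
printf "%s scores highest as: %s\n", $_->{entity}, top_class($_->{scores})
    for @entities;
```
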
This module will tell you whether a number, written either in words or
as digits, is a cardinal or an ordinal number.
This is useful if, for example, you want to distinguish between these
types of numbers found with Lingua::EN::FindNumber and take different
actions for each.
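
The distinction itself can be illustrated with a small core-Perl sketch.
This is not the module's implementation or API: classify_number is a
hypothetical helper, and the word lists are deliberately tiny.

```perl
use strict;
use warnings;

# Small illustrative word lists; a fuller version would also need to
# handle compound forms such as "twenty-first".
my %ORDINAL_WORD  = map { $_ => 1 }
    qw(first second third fourth fifth sixth seventh eighth ninth tenth);
my %CARDINAL_WORD = map { $_ => 1 }
    qw(zero one two three four five six seven eight nine ten);

# Hypothetical helper: returns 'ordinal', 'cardinal', or undef when the
# input is not recognised as a number. It does not check that a digit
# suffix agrees with the digits (so "3th" still counts as ordinal).
sub classify_number {
    my ($text) = @_;
    my $t = lc $text;
    $t =~ s/,//g;    # allow thousands separators, e.g. "1,000th"
    return 'ordinal'  if $t =~ /^\d+(?:st|nd|rd|th)$/;
    return 'cardinal' if $t =~ /^\d+$/;
    return 'ordinal'  if $ORDINAL_WORD{$t};
    return 'cardinal' if $CARDINAL_WORD{$t};
    return;
}

for my $n (qw(3 3rd seven seventh banana)) {
    my $kind = classify_number($n);
    printf "%-8s %s\n", $n, defined $kind ? $kind : 'not a number';
}
```
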
Perl module Lingua::EN::Numbers::Easy provides hash access to
Lingua::EN::Numbers objects.
Lingua::EN::Numbers converts arbitrary numbers into human-oriented
English text. Limited support is included for parsing standardly
formatted numbers (e.g. '3,213.23'), but no attempt has been made
to handle any complex formats. The module is designed to allow
multiple variants of English, but currently only "American"
formatting is supported.
HTML::HTML5::Entities is a pure Perl, drop-in replacement for HTML::Entities,
providing the character entities defined in HTML5.
Lingua::EN::PluralToSingular converts words denoting a plural in the English
language into words denoting a singular noun.
The Lingua::EN::Sentence module contains the function get_sentences,
which splits text into its constituent sentences, based on a regular
expression and a list of abbreviations (both built-in and user-supplied).
Seamus Venasse <svenasse@polaris.ca>
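
The idea behind the Lingua::EN::Sentence entry, splitting on sentence
punctuation while consulting an abbreviation list, can be sketched in
core Perl as follows. This is only an illustration under simple
assumptions, not the module's code: split_sentences is a hypothetical
helper, and the regular expression here is far cruder than anything a
real splitter would use.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into sentences, treating a chunk
# that ends in one of the given abbreviations as unfinished.
# Returns an array reference. Whitespace between chunks is normalised
# to a single space.
sub split_sentences {
    my ($text, @abbreviations) = @_;
    my %is_abbrev = map { lc($_) => 1 } @abbreviations;

    my @sentences;
    my $current = '';
    # Break after ., ! or ? when followed by whitespace.
    for my $chunk (split /(?<=[.!?])\s+/, $text) {
        $current .= ($current eq '' ? '' : ' ') . $chunk;
        # The word just before a final full stop, if there is one.
        my ($last_word) = $chunk =~ /(\S+)\.$/;
        next if defined $last_word && $is_abbrev{ lc $last_word };
        push @sentences, $current;
        $current = '';
    }
    push @sentences, $current if $current ne '';
    return \@sentences;
}

my $sentences = split_sentences(
    'Dr. Smith arrived at 9 a.m. sharp. He met Mr. Jones! Did it go well?',
    qw(Dr Mr a.m),
);
print "$_\n" for @$sentences;
```

Abbreviations are passed without their trailing full stop, since the
helper strips that before the lookup.
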
Squeezes English text into the most compact format possible, so that it
is barely readable. You should convert all text to lowercase for maximum
compression, because the optimizations have been designed mostly for
lowercase letters.
This is a simple module which makes an unscientific effort at
summarizing English text. It recognizes simple patterns which look
like statements, abridges them, and concatenates them into something
vaguely resembling a summary. It needs more work on large bodies
of text, but it seems to have a decent effect on small inputs at
the moment.