"Named entities" is the NLP jargon for proper nouns which
represent people, places, organisations, and so on.
This module provides a very simple way of extracting these from a text.
If we run the "extract_entities" routine on a piece of news coverage of
recent UK political events, we should expect to see it return a list of
hash references looking like this:
    { entity => 'Mr Howard', class => 'person', scores => { ... }, },
    { entity => 'Ministry of Defence', class => 'organisation', ... },
    { entity => 'Oxfordshire', class => 'place', ... },
The additional "scores" hash reference gives a score for each possible
class of this entity, on an open-ended scale.
IDNA::Punycode is a module to encode Unicode strings into Punycode and to
decode them back. Punycode is an efficient encoding of Unicode for use with IDNA.
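To make the encoding concrete, here is a minimal sketch of a Punycode round
trip. It uses Python's built-in "punycode" codec rather than IDNA::Punycode
itself, so the helper names to_punycode and from_punycode are hypothetical and
say nothing about the Perl module's interface.

```python
# Illustration only: Python's built-in "punycode" codec, not IDNA::Punycode.

def to_punycode(text: str) -> str:
    """Encode a Unicode string as an ASCII-only Punycode string."""
    return text.encode("punycode").decode("ascii")


def from_punycode(encoded: str) -> str:
    """Decode an ASCII Punycode string back to Unicode."""
    return encoded.encode("ascii").decode("punycode")


label = "münchen"
encoded = to_punycode(label)

print(encoded)                          # mnchen-3ya
print("xn--" + encoded)                 # IDNA adds the "xn--" prefix to encoded labels
print(from_punycode(encoded) == label)  # True
```

The basic ASCII characters are kept in front, and the position and code point
of each non-ASCII character are packed into the suffix after the final hyphen.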
The purpose of the PPIx-Regexp package is to parse regular expressions
in a manner similar to the way the PPI package parses Perl.
Antiword is a free MS Word reader. It converts the binary files from
Word 2, 6, 7, 97, 2000, 2002 and 2003 to plain text and to PostScript.
The Perl-Critic-Tics distribution includes extra policies for Perl::Critic to
address a fairly random assortment of things that make me (rjbs) wince.
Search::Odeum is an interface to the Odeum API. Odeum is the inverted-index API
that is part of the QDBM database library.
This distribution contains SGMLS.pm, a Perl 5 class library for parsing
the output from James Clark's SGMLS and NSGMLS parsers.
String::Print inserts values into (translated) strings. It provides printf and
sprintf alternatives via both an object oriented and a functional interface.
Sort::Fields provides a general purpose technique for efficiently
sorting lists of lines that contain data separated into fields.
-Anton
<tobez@FreeBSD.org>
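The general idea of sorting lines by fields can be sketched as follows. This
is a Python illustration under stated assumptions, not the Sort::Fields
interface: the function sort_by_fields and its (index, numeric, descending)
spec tuples are hypothetical. Each line is split only once, and the keys are
applied from least to most significant using a stable sort.

```python
def sort_by_fields(lines, specs, sep=None):
    """Sort lines by several fields.

    specs is a list of (index, numeric, descending) tuples, most significant
    field first; index is 0-based. sep=None splits on runs of whitespace.
    """
    # Split every line once and keep the original line next to its fields.
    rows = [(line.split(sep), line) for line in lines]

    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key ordering.
    for index, numeric, descending in reversed(specs):
        def key(row, index=index, numeric=numeric):
            value = row[0][index]
            return float(value) if numeric else value
        rows.sort(key=key, reverse=descending)

    return [line for _, line in rows]


lines = [
    "carol 30 london",
    "alice 25 paris",
    "bob 30 berlin",
    "dave 25 oslo",
]

# Second field numerically, highest first; ties broken by the first field.
result = sort_by_fields(lines, [(1, True, True), (0, False, False)])
for line in result:
    print(line)
```

With the sample data this prints the two 30-year-olds first (bob, then carol),
followed by alice and dave.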
This module contains some functions which are useful for quoting strings
which are going to pass through the shell or a shell-like object.
-Anton
<tobez@FreeBSD.org>
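As an illustration of what quoting for the shell involves, here is a small
sketch using Python's standard shlex module. It is not the Perl module's
interface; the helper quote_command is a hypothetical name introduced for this
example.

```python
import shlex


def quote_command(args):
    """Join arguments into one string that a POSIX shell parses back unchanged."""
    return " ".join(shlex.quote(arg) for arg in args)


args = ["grep", "-r", "it's here", "my dir"]
command = quote_command(args)
print(command)

# Splitting the quoted string with shell rules recovers the original arguments.
print(shlex.split(command) == args)  # True
```

Arguments made up only of safe characters pass through unchanged, while
anything containing spaces or quote characters is wrapped so that the shell
sees it as a single word.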