You have two databases of person records that need to be synchronized
or matched up, but they use different keys: maybe one uses SSN and
the other uses employee ID. The only fields you have to match on
are first and last name.
That's what this module is for.
Just feed the first and last names from both records to the name_eq()
function: it returns undef when no match is possible, and a percentage
of certainty (rank) otherwise.
Seamus Venasse <svenasse@polaris.ca>
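The description above gives name_eq()'s contract but not its scoring
rules, so the following Python sketch only illustrates that contract:
compare two first/last name pairs and get back None (Perl's undef) for
no possible match, or a 0-100 rank otherwise. The function name
name_rank, the difflib-based similarity, the 0.6 threshold and the
initial-matching rule are all hypothetical choices, not the module's
actual algorithm.

```python
from difflib import SequenceMatcher
from typing import Optional


def _similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1], using difflib's ratio()."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def name_rank(first1: str, last1: str, first2: str, last2: str,
              threshold: float = 0.6) -> Optional[int]:
    """Hypothetical stand-in for name_eq(): None if no match is plausible,
    otherwise a certainty percentage from 0 to 100."""
    last_score = _similarity(last1, last2)
    first_score = _similarity(first1, first2)

    # Hypothetical rule: an initial ("J") is compatible with a full name ("John").
    f1, f2 = first1.strip().lower(), first2.strip().lower()
    if f1 and f2 and min(len(f1), len(f2)) == 1 and f1[0] == f2[0]:
        first_score = max(first_score, 0.75)

    # Either name too dissimilar: report "no possible match", like undef in Perl.
    if last_score < threshold or first_score < threshold:
        return None
    return round(100 * (last_score + first_score) / 2)
```

With these made-up rules, identical names (ignoring case) score 100,
"J Smith" vs "John Smith" scores somewhere below 100, and "John Smith"
vs "Mary Jones" yields None.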
This class knows how to read two treebank formats, the Penn format
and the Chomsky Normal Form (CNF) format. These formats differ in
how they handle terminal nodes. The Penn format places pre-terminal
part-of-speech tags in the left-hand position of a
parenthesis-delimited pair, just as it does for non-terminal nodes.
The CNF format attaches pre-terminal tags to the word with an
underscore.
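The two leaf conventions can be converted into each other with a pair
of regular expressions. The text says only that CNF joins the tag to
the word with an underscore, so this Python sketch assumes the order
word_TAG (tag after the last underscore) and that leaves always follow
whitespace while node labels follow an opening parenthesis.
cnf_to_penn and penn_to_cnf are illustrative names, not this class's
methods.

```python
import re

# Assumption: a CNF leaf is "word_TAG" (tag after the LAST underscore, so
# "New_York_NNP" -> word "New_York", tag "NNP"), and leaves always follow
# whitespace, while node labels follow "(".
_CNF_LEAF = re.compile(r"(?<=\s)([^\s()]+)_([^\s()]+)")

# A Penn pre-terminal is a pair with no nested parentheses: "(TAG word)".
_PENN_PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def cnf_to_penn(tree: str) -> str:
    """Rewrite each 'word_TAG' leaf as a Penn-style '(TAG word)' pre-terminal."""
    return _CNF_LEAF.sub(lambda m: f"({m.group(2)} {m.group(1)})", tree)


def penn_to_cnf(tree: str) -> str:
    """Fold each '(TAG word)' pre-terminal back into a 'word_TAG' leaf."""
    return _PENN_PRETERMINAL.sub(lambda m: f"{m.group(2)}_{m.group(1)}", tree)


cnf = "(S (NP The_DT dog_NN) (VP barks_VBZ))"
penn = cnf_to_penn(cnf)
print(penn)  # (S (NP (DT The) (NN dog)) (VP (VBZ barks)))
assert penn_to_cnf(penn) == cnf
```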
The POI project consists of APIs for manipulating various file formats based
upon Microsoft's OLE 2 Compound Document format using pure Java. In short, you
can read and write MS Excel files using Java. Soon, you'll be able to read and
write Word files using Java. POI is your Java Excel solution as well as your
Java Word solution. However, we have a complete API for porting other OLE 2
Compound Document formats and welcome others to participate.
Artha is a free cross-platform English thesaurus that works completely
off-line and is based on WordNet. Stable releases for download are
currently available for GNU/Linux and Microsoft Windows; it is tested
on major desktop environments such as GNOME, KDE and Xfce, and on
Microsoft Windows XP, Vista and 7. Artha is released under the GNU
General Public Licence version 2, so you are free to copy and
redistribute it.
This provides a simple interface to Plucene. Plucene is large and
multi-featured, and it is expected that users will subclass it and
tie all the pieces together to suit their own needs. Plucene::Simple is, therefore,
just one way to use Plucene. It's not expected that it will do exactly
what *you* want, but you can always use it as an example of how to
build your own interface.
Set of modules:
* Pod::Parser - base class for creating POD filters and translators
* Pod::Select - extract selected sections of POD from input
* Pod::Usage - print a usage message from embedded pod documentation
* Pod::PlainText - convert POD data to formatted ASCII text
* Pod::InputObjects - objects representing POD input paragraphs, commands, etc.
* Pod::Checker - check pod documents for syntax errors
* Pod::ParseUtils - helpers for POD parsing and conversion
* Pod::Find - find POD documents in directory trees
Regexp::Copy allows you to copy the contents of one Regexp object to another.
A problem that I have found with the qr// operator is that the Regexp objects
it creates are impossible to dereference.
This causes problems if you want to change the data in the regexp without
losing the reference to it. It's impossible.
Regexp::Copy allows you to change the Regexp by copying one object created
through qr// to another.
KDiff3 is a program that:
* compares or merges two or three text input files or directories,
* shows the differences line by line and character by character (!),
* provides an automatic merge facility,
* provides an integrated editor for comfortably resolving merge conflicts,
* supports KIO on KDE (allowing access to ftp, sftp, fish, smb, etc.),
* prints differences,
* supports manual alignment of lines,
* automatically merges version control history (the CVS Log keyword),
* and has an intuitive graphical user interface.
String::Strip is an XS extension that implements four white space
removal routines: StripSpace (remove all white space), StripLSpace
(strip leading white space), StripTSpace (strip trailing white space),
and StripLTSpace (strip leading and trailing white space). All four of
these routines modify the input argument in place rather than returning
a result. The routines tend to be roughly 30% faster than equivalent
regex-based code.
-Anton
<tobez@FreeBSD.org>
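Python strings are immutable, so the closest analogue of routines that
modify their argument in place is a mutable bytearray. The sketch below
mirrors the four routines' described semantics under that assumption;
the snake_case names are illustrative Python counterparts of
StripSpace, StripLSpace, StripTSpace and StripLTSpace, not the module's
API, and "white space" is assumed to mean the six ASCII characters that
C's isspace() accepts in the C locale.

```python
# Assumption: "white space" = the six ASCII characters isspace() accepts in
# the C locale: space, \t, \n, \v (\x0b), \f (\x0c), \r.
_WS = b" \t\n\x0b\x0c\r"


def strip_space(buf: bytearray) -> None:
    """Like StripSpace: remove all white space, modifying buf in place."""
    buf[:] = bytes(b for b in buf if b not in _WS)


def strip_l_space(buf: bytearray) -> None:
    """Like StripLSpace: remove leading white space in place."""
    i = 0
    while i < len(buf) and buf[i] in _WS:
        i += 1
    del buf[:i]


def strip_t_space(buf: bytearray) -> None:
    """Like StripTSpace: remove trailing white space in place."""
    j = len(buf)
    while j > 0 and buf[j - 1] in _WS:
        j -= 1
    del buf[j:]


def strip_lt_space(buf: bytearray) -> None:
    """Like StripLTSpace: remove leading and trailing white space in place."""
    strip_t_space(buf)
    strip_l_space(buf)


line = bytearray(b"  hello world \n")
strip_lt_space(line)  # same object, no new value returned
assert line == b"hello world"
```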
Text::Capitalize provides a few different procedures for title-like
formatting of strings.
For the "capitalize" function (written by Stanislaw Y. Pusep),
title-like formatting consists of ensuring that the first letter of
each word is uppercase and the rest of the word is lowercase.
The "capitalize_title" function tries to get closer to English title
capitalization rules where only the "important" words are supposed
to be capitalized. There are also some customization features that
let the user choose variant rules.
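The two behaviours can be sketched in Python as follows. This assumes
a word is any run of non-whitespace characters and uses a hypothetical
list of "unimportant" words that stay lowercase unless they come first
or last; Text::Capitalize's real word list, exception handling and
customization options are not reproduced here. Note that this
capitalize_title also collapses runs of whitespace to single spaces.

```python
import re

# Hypothetical list of "unimportant" words; not Text::Capitalize's own list.
SMALL_WORDS = frozenset(
    {"a", "an", "and", "as", "at", "but", "by", "for",
     "in", "nor", "of", "on", "or", "the", "to"}
)


def capitalize(text: str) -> str:
    """Upper-case the first character of each word and lower-case the rest."""
    return re.sub(r"\S+", lambda m: m.group(0)[:1].upper() + m.group(0)[1:].lower(), text)


def capitalize_title(text: str) -> str:
    """Title-case, keeping small words lowercase except at either end."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        lower = word.lower()
        if 0 < i < len(words) - 1 and lower in SMALL_WORDS:
            out.append(lower)
        else:
            out.append(lower[:1].upper() + lower[1:])
    return " ".join(out)
```

For example, capitalize_title("the lord OF the rings") gives
"The Lord of the Rings", whereas capitalize() on the same input would
also capitalize "Of" and the second "The".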