Text-Ngram
n-Gram analysis is a field of textual analysis that uses sliding-window
character sequences to aid topic analysis, language
determination and so on. The n-gram spectrum of a document can be
used to compare and filter documents in multiple languages, prepare
word prediction networks, and perform spelling correction.
This module provides an efficient XS-based implementation of n-gram
spectrum analysis.
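
To make the "sliding window" idea concrete, here is a minimal Python sketch of a character n-gram spectrum. It is not the module's XS implementation or its API. The function name ngram_spectrum and the choice to lowercase the input are assumptions made for illustration.

```python
from collections import Counter

def ngram_spectrum(text, n=3):
    """Count every run of n consecutive characters in text (hypothetical helper)."""
    text = text.lower()  # assumption: case is ignored
    # Slide a window of width n across the text, one character at a time.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

spectrum = ngram_spectrum("abcabc", n=3)
print(spectrum.most_common())
```

Two documents can then be compared by comparing their counters, for example by ranking the most frequent n-grams in each.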
This module is a thin wrapper for John Gruber's SmartyPants plugin for
various CMSs.
SmartyPants is a web publishing utility that translates plain ASCII
punctuation characters into "smart" typographic punctuation HTML
entities. SmartyPants can perform the following transformations:
* Straight quotes ( " and ' ) into "curly" quote HTML entities
* Backtick-style quotes (``like this'') into "curly" quote HTML entities
* Dashes (-- and ---) into en- and em-dash entities
* Three consecutive dots (...) into an ellipsis entity
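
The transformations listed above can be pictured with a much simplified Python sketch. This is not SmartyPants' actual algorithm: the function name educate is hypothetical, single quotes and apostrophes are left untouched, and the rule "a double quote at the start or after whitespace is an opening quote" is an assumption that real text will sometimes break.

```python
import re

def educate(text):
    """Replace plain ASCII punctuation with HTML entities (simplified sketch)."""
    # Backtick-style quotes first, so the later quote rules never see them.
    text = text.replace("``", "&#8220;").replace("''", "&#8221;")
    # Longest dash first: --- becomes an em-dash, -- becomes an en-dash.
    text = text.replace("---", "&#8212;").replace("--", "&#8211;")
    # Three consecutive dots become an ellipsis entity.
    text = text.replace("...", "&#8230;")
    # Assumption: a double quote not preceded by a non-space character opens.
    text = re.sub(r'(?<!\S)"', "&#8220;", text)
    # Every remaining double quote is treated as a closing quote.
    return text.replace('"', "&#8221;")

print(educate('"Hi" -- wait...'))
```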
RTF::Tokenizer is an object-orientated low-level RTF reader. If
you're looking to render RTF, or want a higher-level RTF processor,
this is not the module for you - you want RTF::Reader. This is the
sixth release of RTF::Tokenizer - it's faster, higher quality, and
implements the RTF standard better than any previous release.
It's also philosophically a better module, and conforms more
strictly to Object Orientated guidelines - it can be sub-classed
and the interface is cleaner.
dbacl is a digramic Bayesian text classifier. Given some text,
it calculates the posterior probabilities that the input resembles
one of any number of previously learned document collections.
It can be used to sort incoming email into arbitrary categories
such as spam, work, and play, or simply to distinguish an English text
from a French text. It fully supports international character sets,
and uses sophisticated statistical models based on the
Maximum Entropy Principle.
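
As a rough illustration of what "posterior probabilities over learned collections" means, the Python sketch below uses a plain word-count naive Bayes model with add-one smoothing and equal priors. This is a deliberately simpler, swapped-in technique, not dbacl's digramic maximum entropy model. The names learn and posteriors and the toy training texts are invented for the example.

```python
import math
from collections import Counter

def learn(documents):
    """Build a word-count model from a list of training texts."""
    return Counter(word for doc in documents for word in doc.lower().split())

def posteriors(text, models):
    """Return P(category | text) for each category, assuming equal priors."""
    vocab = set().union(*models.values())
    log_scores = {}
    for name, counts in models.items():
        total = sum(counts.values())
        # Add-one smoothing keeps unseen words from zeroing the probability.
        log_scores[name] = sum(
            math.log((counts[w] + 1) / (total + len(vocab)))
            for w in text.lower().split()
        )
    # Normalise relative to the best score to avoid underflow on long texts.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}

models = {
    "english": learn(["the cat sat on the mat", "the dog is here"]),
    "french": learn(["le chat est sur le tapis", "le chien est ici"]),
}
result = posteriors("the dog sat", models)
print(max(result, key=result.get))
```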
This module is a variation on the lovely Text::Diff module. Rather
than generating traditional line-oriented diffs, however, it generates
word-oriented diffs. This can be useful for tracking changes in
narrative documents or documents with very long lines. To diff
source code, one is still best off using Text::Diff. But if you
want to see how a short story changed from one version to the next,
this module will do the job very nicely.
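
The difference between line-oriented and word-oriented diffs is easiest to see in code. The Python sketch below uses the standard difflib module to compare two texts word by word. It only illustrates the idea and is not this module's interface; the function name word_diff and the "=", "-", "+" tags are assumptions.

```python
import difflib

def word_diff(old, new):
    """Return (tag, words) pairs describing how old became new."""
    a, b = old.split(), new.split()
    result = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(("=", a[i1:i2]))
        else:
            # A "replace" is reported as a deletion followed by an insertion.
            if i2 > i1:
                result.append(("-", a[i1:i2]))
            if j2 > j1:
                result.append(("+", b[j1:j2]))
    return result

for tag, words in word_diff("the quick fox", "the slow fox"):
    print(tag, " ".join(words))
```

Because the comparison unit is a word rather than a line, a one-word change in a very long paragraph shows up as a one-word change.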
XML::LibXML::Cache is a cache for XML::LibXML documents loaded from files. It is
useful to speed up loading of XML files in persistent web applications.
This module caches the document object after the first load and returns the
cached version on subsequent loads. Documents are reloaded whenever the document
file changes. Changes to other files referenced during parsing also cause a
reload. This includes external DTDs, external entities or XIncludes.
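
The caching behaviour described above can be sketched in Python with the standard library. This is an illustration under simplifying assumptions, not the module's implementation: the class name DocumentCache is hypothetical, change detection uses only the file's modification time, and files referenced during parsing (DTDs, entities, XIncludes) are not tracked.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

class DocumentCache:
    """Cache parsed documents, reparsing when the file's mtime changes."""

    def __init__(self):
        self._entries = {}  # path -> (mtime in ns, parsed tree)

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]            # unchanged on disk: reuse the tree
        tree = ET.parse(path)          # first load, or the file changed
        self._entries[path] = (mtime, tree)
        return tree

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "doc.xml")
    with open(path, "w") as fh:
        fh.write("<root/>")
    cache = DocumentCache()
    first = cache.load(path)
    second = cache.load(path)          # served from the cache
    # Simulate an edit by moving the modification time forward one second.
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    third = cache.load(path)           # reparsed because the mtime changed
```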
XML::Mini is a set of Perl classes that allow you to access XML data and
create valid XML output with a tree-based hierarchy of elements.
XML::Mini does not require any external libraries or modules and is pure Perl.
If available, XML::Mini will use the Text::Balanced module in order to escape
limitations of the regex-only approach (e.g. "cross-nested" tag parsing).
The goals of this project are simple:
Create a highly configurable, easily modifiable source code beautifier.
What it does:
* Indent code, aligning on parens, assignments, etc.
* Align on '=' and variable definitions
* Align structure initializers
* Align #define stuff
* Align backslash-newline stuff
* Reformat comments (a little bit)
* Fix inter-character spacing
* Add or remove parens on return statements
* Add or remove braces on single-statement if/do/while/for statements
* Supports embedded SQL 'EXEC SQL' stuff
* Highly configurable - 168 configurable options as of version 0.30
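
To show what aligning on '=' means in practice, here is a toy Python sketch that pads a group of assignment lines so their '=' signs line up. It is not how the beautifier itself works: the function name align_assignments is hypothetical, and the sketch naively splits on the first '=' without understanding operators such as '==' or '<='.

```python
def align_assignments(lines):
    """Pad the text before the first '=' so the '=' signs line up."""
    parts = [line.split("=", 1) for line in lines]
    # Width of the longest left-hand side among lines that contain '='.
    width = max((len(p[0].rstrip()) for p in parts if len(p) == 2), default=0)
    out = []
    for p in parts:
        if len(p) == 2:
            out.append(f"{p[0].rstrip():<{width}} = {p[1].strip()}")
        else:
            out.append(p[0])  # no '=' on this line, so leave it alone
    return out

for line in align_assignments(["int a = 1;", "long total = 22;"]):
    print(line)
```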
Zorba is a general-purpose XQuery processor, written in C++, that implements
the W3C family of specifications. It has been designed to be embeddable in a
variety of environments such as other programming languages extended with XML
processing capabilities, browsers, database servers, XML message dispatchers,
or smartphones. Zorba can be accessed through APIs from C, C++, Ruby, Python,
Java, and PHP. Zorba runs on most platforms and is available under the Apache
license v2.
Perl2html is a syntax highlighter for Perl source code that produces a
highlighted HTML file as output.
Perl2html offers the following features:
- fast (single pass conversion using flex)
- doesn't change formatting - only adds <FONT COLOR=#XXXX> tags
  and properly escapes non-ASCII characters
- easy integration with webservers - browse your sources colourized
- gzips HTTP output for browsers to save bandwidth (only in CGI mode)
- documentation and manpage included