Bastardize provides a magical object into which text can be charged
and then returned in various, slightly modified ways.
Among others, bastardize has the following methods:
  rdct     converts English to hyperreductionist English
           (ex. "english" becomes "")
  pig      pig latin
           (ex. "hi there" becomes "ihay erethay")
  k3wlt0k  a k3wlt0kizer developed originally by Fmh
  rot13    implements rot13 "encryption" in perl
           (ex. "foo bar" becomes "sbb one")
  rev      reverses the arrangement of characters
  censor   attempts to censor text which might be inappropriate
  n20e     performs numerical abbreviations
           (ex. "numerical_abbreviation" becomes "n20e")
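
As a rough illustration of three of the transformations listed above, here is a minimal Python sketch. It is not the module's Perl code or API: the function names simply mirror the method names, and the pig latin rule (move leading consonants to the end, then add "ay") is an assumption inferred from the single example given.

```python
import codecs

VOWELS = "aeiouAEIOU"


def rot13(text):
    """Rotate each ASCII letter 13 places; other characters pass through."""
    return codecs.encode(text, "rot_13")


def rev(text):
    """Reverse the order of the characters."""
    return text[::-1]


def pig_word(word):
    """Assumed rule: move leading consonants to the end and append 'ay'."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowel found: just append the suffix


def pig(text):
    """Apply the pig latin rule to each whitespace-separated word."""
    return " ".join(pig_word(w) for w in text.split())


print(rot13("foo bar"))
print(pig("hi there"))
print(rev("foo bar"))
```

The first two calls reproduce the examples quoted in the method list ("sbb one" and "ihay erethay").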
Text::German - German grundform reduction
This is a rather incomplete implementation of work done by Gudrun Putze-Meier
<gudrun.pm@t-online.de>. I have to confess that I never read her original
paper. So all credit belongs to her, all bugs are mine. I tried to get some
insight from an implementation of two students of mine. They remain anonymous
because their work was the worst piece of code I ever saw. My code behaves
mostly as their implementation did except it is about 75 times faster.
Text::WrapI18N intends to be a better Text::Wrap module. This module is needed
to support multibyte character encodings such as UTF-8, EUC-JP, EUC-KR, GB2312,
and Big5. This module also supports characters with irregular widths, such as
combining characters (which occupy zero columns on a terminal, like diacritical
marks in UTF-8) and fullwidth characters (which occupy two columns on a terminal,
like most East Asian characters). Also, minimal handling of languages which
don't use whitespace between words (like Chinese and Japanese) is supported.
Like Text::Wrap, hyphenation and "kinsoku" processing are not supported, to keep
things simple.
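
The column-width idea described above can be sketched in Python with the standard unicodedata module. This is only an illustration of the technique, not the module's Perl implementation: the helper names are hypothetical, and the wrapper breaks between any two characters, as one might for text without spaces, rather than at word boundaries.

```python
import unicodedata


def char_width(ch):
    """Columns a character occupies: 0 for combining marks, 2 for wide/fullwidth."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_width(text):
    """Total terminal columns occupied by a string."""
    return sum(char_width(ch) for ch in text)


def wrap_by_columns(text, columns):
    """Greedy wrap on character boundaries so no line exceeds `columns`."""
    lines, current, used = [], "", 0
    for ch in text:
        w = char_width(ch)
        if used + w > columns and current:
            lines.append(current)
            current, used = "", 0
        current += ch
        used += w
    if current:
        lines.append(current)
    return lines


print(display_width("日本語"))      # three fullwidth characters
print(display_width("e\u0301"))     # 'e' plus a combining acute accent
print(wrap_by_columns("日本語テキスト", 6))
```

Because combining marks count as zero columns, they always stay on the same line as the character they modify.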
XML::DOM::Lite is designed to be a reasonably fast, highly portable,
XML parser kit written in pure Perl, implementing the DOM standard
quite closely while keeping performance up and footprint down.
The standard pattern for using the XML::DOM::Lite parser kit is:
    use XML::DOM::Lite qw(Parser :constants);
Available exports are: Parser, Node, NodeList, NodeIterator,
NodeFilter, XPath, Document, XSLT and the constants.
This is mostly for convenience, so that you can save your key-strokes
for the fun stuff. Alternatively, to avoid polluting your namespace,
you can simply:
    use XML::DOM::Lite::Parser;
    use XML::DOM::Lite::Constants qw(:all); # ... etc
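
To show what DOM-style parsing and node-type constants look like in practice, here is a small Python sketch. It uses Python's standard xml.dom.minidom as a stand-in, since it follows the same W3C DOM model; it does not reproduce the Perl API of XML::DOM::Lite, and the sample document is made up for the example.

```python
from xml.dom import minidom, Node

# A made-up sample document for illustration.
SOURCE = "<root><item id='a'>one</item><item id='b'>two</item></root>"

doc = minidom.parseString(SOURCE)
root = doc.documentElement

# Node-type constants let you tell elements apart from text, comments, etc.
items = [n for n in root.childNodes if n.nodeType == Node.ELEMENT_NODE]

ids = [n.getAttribute("id") for n in items]
texts = [n.firstChild.nodeValue for n in items]

print(root.nodeName, ids, texts)
```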
XML::DOM2 is yet _another_ perl XML module.
* DOM Level2 Compliance in documents, elements and attributes
* NameSpace control for elements and attributes
* XPath (it's just one small method once you have a good DOM)
* Extensibility:
  * Document, Element or Attribute classes can be used as base class
    for other kinds of document, element or attribute.
  * Element and Attribute Handler allows element specific child
    elements and attribute objects.
  * Element and Attribute serialisation overriding.
* Parsing with SAX (use XML::SAX::PurePerl for low dependency installs)
* Internal serialisation
This experimental module is designed to allow for easy creation and
manipulation of OPML files. OPML files are most commonly used for the sharing
of blogrolls or subscriptions - an outlined list of what other blogs an
Internet blogger reads.
This is purely experimental at this point and has a few limitations. This
module may now support attributes in the <outline> element of an embedded
hierarchy, but these are limited to the following attributes: date_added,
date_downloaded, description, email, filename, htmlurl, keywords, text,
title, type, version, and xmlurl. Additionally, the following alternate
spellings are also supported: dateAdded, dateDownloaded, htmlUrl, and xmlUrl.
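
For orientation, the kind of file this module produces can be sketched with Python's standard xml.etree.ElementTree. This is an illustration of the OPML layout only, not the module's Perl interface: the build_opml helper, the version value, and the example.com URLs are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Return an OPML document as a string, one <outline> per feed dict."""
    opml = ET.Element("opml", version="1.1")  # version chosen for illustration
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for feed in feeds:
        # Each dict key becomes an attribute of the <outline> element.
        ET.SubElement(body, "outline", attrib=feed)
    return ET.tostring(opml, encoding="unicode")


feeds = [
    {
        "text": "Example Blog",
        "type": "rss",
        "htmlUrl": "http://example.com/",
        "xmlUrl": "http://example.com/feed.xml",
    }
]
document = build_opml("My blogroll", feeds)
print(document)
```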
Popup is an interactive learning aid for pairs of words. It behaves much like
a stack of flashcards, but handles one-to-many and many-to-one word
relationships better, and includes an integrated scheduler for efficient use
of your 'cards'. Popup was written by Bjorn Ghola and Rob Burns.
Features:
* An editor for cardstack files with support for copying and pasting groups
of words, as well as drag and drop.
* Three quiz styles: multiple choice, spelling, and flashcard.
* Supports quizzes and practice.
* Graduated time interval scheduler.
* Localized for Thai and German.
LICENSE: GPL2 or later
PyTidyLib is a Python package that wraps the HTML Tidy library. This allows
you, from Python code, to "fix" invalid (X)HTML markup. Some of the library's
many capabilities include:
* Clean up unclosed tags and unescaped characters such as ampersands
* Output HTML 4 or XHTML, strict or transitional, and add missing doctypes
* Convert named entities to numeric entities, which can then be used in XML
documents without an HTML doctype.
* Clean up HTML from programs such as Word (to an extent)
* Indent the output, including proper (i.e. no) indenting for pre elements,
which some (X)HTML indenting code overlooks.
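
A minimal usage sketch follows, assuming the pytidylib package and the underlying HTML Tidy shared library are both installed. The option names are standard HTML Tidy settings; the fix_markup wrapper and the particular option set are hypothetical choices for the example.

```python
# Standard HTML Tidy options: emit XHTML, use numeric entities, indent output.
TIDY_OPTIONS = {
    "output-xhtml": 1,
    "numeric-entities": 1,
    "indent": 1,
}


def fix_markup(html, options=TIDY_OPTIONS):
    """Return (cleaned_document, error_report) for a possibly invalid fragment."""
    # Imported here so the sketch can be loaded even where Tidy is missing.
    from tidylib import tidy_document

    document, errors = tidy_document(html, options=options)
    return document, errors
```

Calling fix_markup("<p>fine &amp; dandy<br>unclosed") would return the repaired document together with Tidy's textual list of warnings and errors.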
Whoosh is a fast, featureful full-text indexing and searching library
implemented in pure Python. Programmers can use it to easily add search
functionality to their applications and websites. Every part of how Whoosh
works can be extended or replaced to meet your needs exactly.
Some of Whoosh's features include:
- Pythonic API.
- Pure-Python. No compilation or binary packages needed, no mysterious
crashes.
- Fielded indexing and search.
- Fast indexing and retrieval -- faster than any other pure-Python, scoring,
full-text search solution I know of.
- Pluggable scoring algorithm (including BM25F), text analysis, storage,
posting format, etc.
- Powerful query language.
- Pure Python spell-checker (as far as I know, the only one).
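
To make "fielded indexing and search" concrete, here is a toy inverted index in plain Python. It is a sketch of the general idea only, not Whoosh's API or implementation: the TinyIndex class and its methods are hypothetical, and it does no scoring, analysis, or storage.

```python
from collections import defaultdict


class TinyIndex:
    """Toy fielded inverted index: maps (field, term) to a set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, **fields):
        """Index each field's text, split on whitespace and lowercased."""
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, field, query):
        """Return ids of documents whose `field` contains every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get((field, terms[0]), set()))
        for term in terms[1:]:
            result &= self.postings.get((field, term), set())
        return result


index = TinyIndex()
index.add(1, title="First document", body="This is the first document")
index.add(2, title="Second one", body="The second document is shorter")

print(index.search("body", "document"))
print(index.search("title", "second"))
```

Keeping the field name in the posting key is what lets the same word match in one field without matching in another.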
Tex2im is a simple tool that converts LaTeX formulas into high resolution
pixmap graphics for inclusion in text processors or presentations. I
encountered the problem that the formulas generated by the editors of common
office packages usually were the ugliest part of my scientific presentations;
on the other hand I didn't want to use LaTeX for my transparencies. On the
LaTeX side I'm aware of the slitex and foiltex packages; nevertheless, I
consider them to be masochistic. EPS import can be nice, but commonly you get
either display or printing problems. Also, it's often nice just to copy
formulas out of your LaTeX documents.