ArCHMage is an extensible reader and decompiler for files in the CHM format
(Microsoft HTML Help, also known as Compiled HTML). ArCHMage is based on
chmlib by Jed Wing and is written in Python.
AsmXml is a very fast XML parser and decoder for x86 platforms. It
achieves high speed by using the following features:
* Support of an XML subset only
* Written in pure assembler
* Optimized memory accesses
* Parsing and decoding at the same time
This parser is intended for applications that need intensive processing
of XML. This project will likely appeal to you if XML parsing is a
bottleneck in your data flow. It is especially designed for bulk loads
into databases.
This is not an all-purpose library; it is not designed to be used with
DOM, SAX, XPath and so on. Here, XML is considered only as an
interchange format, not as a working format.
This is a Perl script that extracts URLs from correctly-encoded MIME
email messages or plain text. This can be used either as a
pre-parser for urlview, or to replace urlview entirely.
This is designed primarily for use with the mutt emailer. The idea
is that if you want to access a URL in an email, you pipe the email
to a URL extractor (like this one) which then lets you select a URL
to view in some third program (such as Firefox). An alternative
design is to access URLs from within mutt's pager by defining macros
and tagging the URLs in the display to indicate which macro to use.
A script you can use to do that is tagurl.pl.
Main features:
- Configurable
- Handles URLs that have been broken over several lines in
format=flowed delsp=yes email messages
- Handles quoted-printable email messages
- Sanitizes URLs so that they can't break out of the command shell
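
The decoding and sanitizing steps listed above can be illustrated with a
minimal Python sketch. It is not the Perl script's implementation: the names
extract_urls and browser_command, the URL pattern, and the "firefox" default
are assumptions made for illustration, and format=flowed rejoining is not
covered.

```python
import re
import shlex
from email import message_from_string

# Deliberately simple pattern (an assumption): http/https URLs up to the
# next whitespace, angle bracket, or quote character.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(raw_message):
    """Return the unique URLs found in the text parts of an email, in order."""
    msg = message_from_string(raw_message)
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes quoted-printable or base64 transfer encoding,
        # which also rejoins URLs split by quoted-printable soft line breaks.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            text = payload.decode("utf-8", errors="replace")
        for url in URL_RE.findall(text):
            url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
            if url not in urls:
                urls.append(url)
    return urls


def browser_command(url, browser="firefox"):
    """Build a shell command line in which the URL is safely quoted."""
    return f"{browser} {shlex.quote(url)}"
```

Quoting the URL with shlex.quote keeps characters such as "$", ";" and
backticks from being interpreted when the command line is handed to a shell.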
This script splits up a unified diff into separate patch files,
each of which patches one source file.
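
The splitting idea can be sketched in a few lines of Python. This is an
illustration under simplifying assumptions, not the script itself: the
function name split_unified_diff is hypothetical, a new section is assumed to
start at every "--- " line that is directly followed by a "+++ " line, and
any text before the first such header is dropped.

```python
def split_unified_diff(diff_text):
    """Map each target file name to the part of the diff that patches it."""
    sections = {}
    name = None
    lines = diff_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        following = lines[i + 1] if i + 1 < len(lines) else ""
        # A "--- old" line directly followed by "+++ new" opens a new file.
        if line.startswith("--- ") and following.startswith("+++ "):
            # The file name ends at the tab that precedes an optional timestamp.
            name = following[4:].split("\t")[0].strip()
            sections.setdefault(name, [])
        if name is not None:
            sections[name].append(line)
    return {target: "".join(body) for target, body in sections.items()}
```

Each value in the returned dictionary can be written to its own patch file.
A hunk that removes a line beginning with "-- " immediately before adding one
beginning with "++ " would be mistaken for a file header; a more careful
splitter would track the line counts given in the "@@" hunk headers.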
Dixit is a cross-platform application for offline consultation of a Romanian
dictionary of definitions (DEX). It features a browser-like interface,
with cross-references between definitions, and the ability to add new
definitions from a server.
The distributed database is currently based on 14 dictionaries plus 5 DEX
editions. The database contains more than 235,000 definitions from various
sources. The most important is "Dictionarul explicativ al limbii romane"
(DEX 1998). All 65,542 definitions from DEX have been entered into the
database by "DEX online" Project (dexonline.ro) volunteers.
The following dictionaries are also complete, with help from Siveco and Editura
Litera International: "Dictionar de sinonime" (2002), "Dictionar de antonime"
(2002), "Dictionar ortografic al limbii romane" (2002), "Noul dictionar
explicativ al limbii romane" (2002).
Any resemblance to dict is not entirely coincidental, but the database doesn't
have the same format :(
A batch converter that transforms UNIX-style manpages from the
DocBook SGML DTD into nroff/troff -man macros.
`chpp' is a preprocessor. Its main purpose is therefore to modify
input text by including other input files and by macro expansion.
Two features in particular distinguish `chpp' from other text
processors:
* `chpp' is non-intrusive. This means that you can take your
favorite text and it is very unlikely that it will be changed when
piped through `chpp'. Due to this feature it is pretty easy to
start using `chpp' since you can just start writing your text and
need not concern yourself with `chpp' sitting in the background
changing it for no obvious reason.
* `chpp' is not just a package for performing simple macro expansion,
but can indeed be considered a full-fledged programming language.
Most importantly, it provides support for complex data structures,
namely lists and hashes (associative arrays), which can be nested
arbitrarily.
This version of Crimson supports the Java API for XML Processing
(JAXP) version 1.1 specification by providing implementations for the
following package hierarchies: javax.xml.parsers, org.w3c.dom,
org.xml.sax.*. Note that the javax.xml.transform hierarchy is NOT
supported. One known implementation of the javax.xml.transform
hierarchy is Xalan 2, which is available as a separate port in
java/xalan.
More info about JAXP:
Crimson home page:
dbacl is a digramic Bayesian text classifier. Given some text,
it calculates the posterior probabilities that the input resembles
one of any number of previously learned document collections.
It can be used to sort incoming email into arbitrary categories
such as spam, work, and play, or simply to distinguish an English text
from a French text. It fully supports international character sets,
and uses sophisticated statistical models based on the
Maximum Entropy Principle.
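
To make the idea of posterior probabilities over learned categories concrete,
here is a minimal Python sketch. It is not dbacl's model: it substitutes a
plain naive Bayes estimate over word digrams with add-one smoothing and a
uniform prior for the maximum entropy models mentioned above, and the names
DigramClassifier, learn and posteriors are hypothetical.

```python
import math
from collections import Counter


def digrams(text):
    """Return the list of adjacent word pairs in the text."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


class DigramClassifier:
    def __init__(self):
        self.counts = {}  # category name -> Counter of digrams

    def learn(self, category, text):
        """Add the digrams of one document to a category."""
        self.counts.setdefault(category, Counter()).update(digrams(text))

    def posteriors(self, text):
        """Return P(category | text) for every learned category."""
        vocabulary = set()
        for counter in self.counts.values():
            vocabulary.update(counter)
        size = len(vocabulary) + 1  # one extra slot for unseen digrams
        log_scores = {}
        for category, counter in self.counts.items():
            total = sum(counter.values())
            score = -math.log(len(self.counts))  # uniform prior
            for pair in digrams(text):
                # Add-one smoothing keeps unseen digrams from zeroing the score.
                score += math.log((counter[pair] + 1) / (total + size))
            log_scores[category] = score
        # Normalise in log space to avoid underflow on long inputs.
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return {c: math.exp(s - top) / norm for c, s in log_scores.items()}
```

After calling learn once per category with sample text, posteriors returns a
dictionary whose values sum to 1, and the category whose training text shares
the most digrams with the input receives the largest share.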
Dblatex started as a DB2LaTeX clone. So, why this project? The purpose
is a bit different on these points:
(1) The project is end-user oriented, that is, it tries to hide the LaTeX
compilation details as much as possible by providing a single clean
script that directly produces DVI, PostScript and PDF output.
(2) The actual output rendering is done not only by the XSL stylesheet
transformation, but also by a dedicated LaTeX package. The purpose is
to allow deep LaTeX customisation without changing the XSL
stylesheets.
(3) Post-processing is done by Python, to make publication faster,
convert the images if needed, and do the whole compilation.