GNU mifluz has two main characteristics: it is very simple (one
might say stupid :-) and its index takes up about 50% of the size of
the indexed text. It is simple because it provides only a few basic
functions. It contains no document parsers (HTML, PDF, etc.) and no
full-text query parser, and it provides no result display functions
or other user-friendly features. It only provides functions to store
word occurrences and retrieve them. An index that is 50% of the size
of the indexed text is rather atypical: most well-known full-text
indexing systems use only 30%. In exchange for the extra disk space,
GNU mifluz is fully dynamic (update, delete, insert), uses only a
controlled amount of memory while resolving a query, has higher upper
limits, and has a simple storage scheme.
Miller is like sed, awk, cut, join, and sort for name-indexed data such
as CSV.
With Miller you get to use named fields without needing to count
positional indices.
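As a rough illustration of what addressing fields by name rather than
by position means, here is a minimal sketch in plain Python. It is not
Miller and does not use Miller's syntax; the sample data, the column
names, and the function name are made up for the example.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
SAMPLE = "name,qty,price\nbolt,10,0.25\nnut,4,0.10\n"

def total_by_name(text):
    """Return {name: qty * price}, addressing columns by header name."""
    reader = csv.DictReader(io.StringIO(text))
    # row["qty"] keeps working even if the columns are reordered,
    # which a positional index such as row[1] would not.
    return {row["name"]: int(row["qty"]) * float(row["price"])
            for row in reader}

totals = total_by_name(SAMPLE)
```

Reordering the columns in the input leaves the result unchanged, which
is the property that named fields buy you.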
A maintenance utility for SGML catalog files.
mkcatalog maintains sgml "catalog" files.
usage: mkcatalog [-pq] install|deinstall dtd-subdirectory [catalog-filename]
options:
-p preserve old catalog file.
-q silent mode
commands(required):
install add the DTD configuration to the catalog files.
deinstall remove the DTD configuration from the catalog files.
required arguments:
dtd-subdirectory DTD sub-directory.
(the root SGML directory is ${PREFIX}/share/sgml.)
optional arguments:
catalog-filename DTD catalog filename.
for example:
# mkcatalog install html/4.0
This command performs the following actions:
1. Add `CATALOG "html/catalog"' to ${PREFIX}/share/sgml/catalog.
2. Add `CATALOG "4.0/catalog"'
to ${PREFIX}/share/sgml/html/catalog.
# mkcatalog install docbook/4.1 docbook41.cat
This command performs the following actions:
1. Add `CATALOG "docbook/catalog"' to ${PREFIX}/share/sgml/catalog.
2. Add `CATALOG "4.1/docbook41.cat"'
to ${PREFIX}/share/sgml/docbook/catalog.
# mkcatalog deinstall docbook/4.1 docbook41.cat
This command performs the following actions:
1. Delete `CATALOG "4.1/docbook41.cat"'
from ${PREFIX}/share/sgml/docbook/catalog.
2. Delete `CATALOG "docbook/catalog"'
from ${PREFIX}/share/sgml/catalog.
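The mapping from arguments to catalog entries in the examples above
can be sketched in Python as follows. This is an illustration inferred
from those examples only, not mkcatalog's actual code: the function
name is hypothetical, and it assumes the DTD sub-directory has exactly
two components and that the catalog filename defaults to "catalog".

```python
def catalog_entries(dtd_subdir, catalog_filename="catalog"):
    """Return (catalog file, entry) pairs, relative to the SGML root.

    Assumption: dtd_subdir has exactly two components, e.g. "html/4.0".
    """
    top, version = dtd_subdir.split("/")
    return [
        # Entry in the root catalog, pointing at the per-DTD catalog.
        ("catalog", 'CATALOG "%s/catalog"' % top),
        # Entry in the per-DTD catalog, pointing at the version's catalog.
        ("%s/catalog" % top,
         'CATALOG "%s/%s"' % (version, catalog_filename)),
    ]

for path, entry in catalog_entries("docbook/4.1", "docbook41.cat"):
    print(path, "<-", entry)
```

An install would add these entries in order; a deinstall would delete
the same entries in reverse order.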
Modlogan is for all those who want a log analyzer that is easy
to extend and very flexible. Just write a new output plugin and
create your own report design. You don't have to reinvent the
whole wheel to change the colour of the tire.
Imagine an SQL output plugin that writes the calculated data
into your database, or a memo generator that posts the
monthly stats to your department mailing list for further
investigation.
Msort sorts files in sophisticated ways. Records may be fixed size,
newline-separated blocks, or terminated by any specified character.
Key fields may be selected by position, tag, or character range. For
each key, distinct exclusions, multigraphs, substitutions, and a sort
order may be defined or locale collation rules used. Comparisons may
be lexicographic, numeric, numeric string, hybrid, random, by string
length, angle, date, time, month name, or ISO 8601 timestamp. Keys may
be reversed so as to generate reverse dictionaries. Optional keys are
supported. Unicode is supported, including full case-folding. Msort
itself has a somewhat complex command line interface, but may be
driven by an optional GUI.
MultiMarkdown, or MMD, is a tool to help turn minimally marked-up plain
text into well-formatted documents, including HTML, PDF (by way of
LaTeX), OPML, or OpenDocument (specifically, Flat OpenDocument or
'.fodt', which can in turn be converted into RTF, Microsoft Word, or
virtually any other word-processing format).
MMD is a superset of the Markdown syntax, originally created by John
Gruber. It adds multiple syntax features (tables, footnotes, and
citations, to name a few), in addition to the various output formats
listed above (Markdown only creates HTML). Additionally, it builds in
'smart' typography for various languages (proper left- and right-sided
quotes, for example).
NOTE: To use the mmd2pdf script, you must install print/latexmk.
MyThes is a simple thesaurus that uses a structured text data file and an index
file with binary search to look up words and phrases and return information on
part of speech, meanings, and synonyms.
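The general idea of a sorted index searched by binary search, pointing
into a text data file, can be sketched in Python as below. This is not
MyThes's file format or API: the record layout, the in-memory stand-ins
for the two files, and all names are assumptions made for the example.

```python
import bisect

def build(entries):
    """Return (data, keys, offsets) from {word: [field, ...]}.

    data stands in for the text data file; keys and offsets stand in
    for the index file (sorted words and their byte offsets in data).
    """
    data = bytearray()
    keys, offsets = [], []
    for word in sorted(entries):
        keys.append(word)
        offsets.append(len(data))
        line = word + "|" + "|".join(entries[word]) + "\n"
        data += line.encode("utf-8")
    return bytes(data), keys, offsets

def lookup(data, keys, offsets, word):
    """Binary-search the sorted keys, then read one record from data."""
    i = bisect.bisect_left(keys, word)
    if i == len(keys) or keys[i] != word:
        return None  # word is not in the index
    start = offsets[i]
    end = data.index(b"\n", start)
    return data[start:end].decode("utf-8").split("|")[1:]

# Hypothetical entries: part of speech followed by synonyms.
data, keys, offsets = build({
    "simple": ["(adj)", "easy", "plain"],
    "quick": ["(adj)", "fast"],
})
result = lookup(data, keys, offsets, "simple")
```

Only the index needs to be searched; the data is read at a single
offset once the word has been found.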
This class provides methods to validate:
- ISBN (International Standard Book Number)
- ISSN (International Standard Serial Number)
- ISMN (International Standard Music Number)
- ISRC (International Standard Recording Code)
- EAN/UCC-8 number
- EAN/UCC-13 number
- EAN/UCC-14 number
- UCC-12 (U.P.C.) ID number
- SSCC (Serial Shipping Container Code)
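Most of these identifiers end in a check digit. As a minimal sketch of
what such validation involves (not the class's own API; the function
names are hypothetical), the Python below checks the GS1 mod-10 check
digit shared by EAN/UCC-8, UCC-12, EAN/UCC-13, EAN/UCC-14, and SSCC,
and the mod-11 check used by 10-digit ISBNs.

```python
def gs1_is_valid(code):
    """Check the trailing mod-10 check digit of a GS1 number."""
    if not (code.isascii() and code.isdigit()):
        return False
    if len(code) not in (8, 12, 13, 14, 18):
        return False
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the rightmost
    # digit of the body (the digit just before the check digit).
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check

def isbn10_is_valid(code):
    """Check a 10-digit ISBN; hyphens are ignored, 'X' means 10."""
    s = code.replace("-", "").upper()
    if len(s) != 10 or not s.isascii():
        return False
    if not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    values = [int(c) for c in s[:9]]
    values.append(10 if s[9] == "X" else int(s[9]))
    # Weights run from 10 down to 1; the sum must be divisible by 11.
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0
```

A 13-digit ISBN uses the same check as EAN/UCC-13, so it can be passed
to gs1_is_valid once its hyphens are removed.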
Nux is a small, straightforward, and surprisingly effective open-source
extension of the XOM XML library. Nux is geared towards versatile embedded
integration and interchange, in particular for high-throughput server container
environments (e.g. large-scale peer-to-peer messaging network infrastructures
over high-bandwidth networks, scalable MOMs, etc.). But its simplicity also
makes it useful for client-side XML query/transformation workflow pipelines.
Features include:
- Seamless W3C XQuery support for XOM.
- Efficient and flexible pools and factories for XQueries, XSL Transforms, as
well as Builders that validate against various schema languages, including
W3C XML Schemas, DTDs, RELAX NG, Schematron, etc.
- For simple and complex continuous queries and/or transformations over very
large or infinitely long XML input, a convenient streaming path filter API
combines full XQuery support with straightforward filtering.
- Glue for integration with JAXB and for queries over ill-formed HTML.
- All this is rock-solid, dependable, well documented, and ships in a jar file
that weighs just 60 KB.