humanzip is a compression program that operates on text files. Unlike
most compression algorithms, its output is human readable. Indeed, it
is explicitly meant to be read by humans and might even be easier to read
than the original.
humanzip compresses files by looking for common strings of words and
replacing them with single symbols. The idea is to reduce the screen and
print size of documents. humanzip does not explicitly try to reduce the
size of the file as measured in bytes, although this usually happens
incidentally.
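The idea of replacing common word sequences with symbols can be sketched
as follows. This is only an illustration in Python, not humanzip's actual
algorithm: the function name compress_phrases, the fixed phrase length,
and the symbol set are all assumptions made for the example.

```python
from collections import Counter


def compress_phrases(text, phrase_len=2, min_count=2, symbols="§¶†‡"):
    """Replace frequently repeated word sequences with single symbols.

    Hypothetical sketch: assumes none of `symbols` already occurs in `text`.
    Returns the shortened text and a legend mapping symbol -> phrase.
    """
    words = text.split()
    # Count every run of `phrase_len` consecutive words.
    counts = Counter(
        tuple(words[i:i + phrase_len])
        for i in range(len(words) - phrase_len + 1)
    )
    # Keep repeated phrases, most frequent first, one per available symbol.
    frequent = [p for p, n in counts.most_common() if n >= min_count]
    table = {p: symbols[i] for i, p in enumerate(frequent[:len(symbols)])}

    out, i = [], 0
    while i < len(words):
        key = tuple(words[i:i + phrase_len])
        if key in table:
            out.append(table[key])   # whole phrase becomes one symbol
            i += phrase_len
        else:
            out.append(words[i])
            i += 1
    legend = {symbol: " ".join(phrase) for phrase, symbol in table.items()}
    return " ".join(out), legend
```

The legend is what keeps the output human readable: a reader only needs
the short symbol table to expand the text again.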
Hyper Estraier is a full-text search system. You can search a large number
of documents for those that contain specified words. If you run a web site,
it is useful as a search engine for the pages of your site. It is also useful
as a search utility for mailboxes and file servers.
The characteristics of Hyper Estraier are as follows:
* High performance of search
* High scalability of target documents
* Perfect recall ratio by N-gram method
* Phrase search, attribute search, and similarity search
* Multilingualism with Unicode
* Independent of file format and repository
* Simple and powerful API
* Supporting P2P architecture
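Why an N-gram index gives perfect recall can be shown with a small sketch.
Any document that contains the query string also contains every N-gram of
that string, so intersecting the posting lists never drops a true match.
The class below is a toy in Python written for illustration; the names
NGramIndex, add and search are assumptions and do not reflect the
Hyper Estraier API or its index format.

```python
from collections import defaultdict


def ngrams(text, n=2):
    """Return the set of all character n-grams in `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NGramIndex:
    """Toy inverted index keyed by character n-grams (hypothetical)."""

    def __init__(self, n=2):
        self.n = n
        self.postings = defaultdict(set)   # n-gram -> ids of documents
        self.docs = {}                     # id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def search(self, query):
        grams = ngrams(query, self.n)
        if grams:
            # A matching document must appear in every posting list.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Query shorter than n: fall back to scanning every document.
            candidates = set(self.docs)
        # Candidates may be false positives, so verify the full substring.
        return sorted(d for d in candidates if query in self.docs[d])
```

The final substring check restores precision; recall is already complete
after the intersection step.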
El-Kabong is a high-speed, forgiving, SAX-style HTML parser.
Its aim is to provide consumers with a fast, clean, lightweight
library that parses HTML while forgiving syntactically incorrect
tags.
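For readers unfamiliar with the term, "SAX-style" means the parser reports
events (start tag, end tag, text) to callbacks instead of building a tree.
The sketch below illustrates that model with Python's standard
html.parser module as a stand-in; it is not El-Kabong's API, and the names
EventCollector and parse_events are made up for the example.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Records parser events in the order they are reported."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text to keep the event list short.
        if data.strip():
            self.events.append(("text", data.strip()))


def parse_events(html):
    parser = EventCollector()
    parser.feed(html)
    parser.close()
    return parser.events
```

Feeding it markup with an unclosed tag, such as
parse_events("<p>Hello <b>world</p>"), raises no error: the missing </b>
simply produces no end event, which is the forgiving behaviour described
above.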
info2man converts GNU info files to pod or -man formats.
GNU info can be a pain as it demands its own special pager, it's a binary
format, it's cruder than HTML and less documented, and most GNU-authored
manual entries basically say "we like info so we don't maintain this manual
entry, thus it is probably wrong". info2man thus converts info files so that
they can be read by ordinary tools.
The Digester package lets you configure an XML -> Java object mapping module,
which triggers certain actions called rules whenever a particular pattern of
nested XML elements is recognized. A rich set of predefined rules is available
for your use, or you can also create your own. Advanced features of Digester
include:
- Ability to plug in your own pattern matching engine, if the standard one is
not sufficient for your requirements.
- Optional namespace-aware processing, so that you can define rules that are
relevant only to a particular XML namespace.
- Encapsulation of Rules into RuleSets that can be easily and conveniently
reused in more than one application that requires the same type of
processing.
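The pattern-and-rule mechanism can be illustrated with a short sketch. It
is written in Python on top of xml.sax purely to show the concept; it is
not the Digester API, and the class PatternRules, its add_rule method, and
the slash-separated pattern syntax are assumptions made for this example.

```python
import xml.sax


class PatternRules(xml.sax.ContentHandler):
    """Fires registered actions when the nesting path matches a pattern."""

    def __init__(self):
        super().__init__()
        self.rules = {}   # "outer/inner" pattern -> list of actions
        self.path = []    # names of the currently open elements

    def add_rule(self, pattern, action):
        self.rules.setdefault(pattern, []).append(action)

    def startElement(self, name, attrs):
        self.path.append(name)
        # Look up the full path from the root, e.g. "library/book".
        for action in self.rules.get("/".join(self.path), []):
            action(name, dict(attrs))

    def endElement(self, name):
        self.path.pop()


# Usage: collect titles of <book> elements directly under <library>.
titles = []
handler = PatternRules()
handler.add_rule("library/book", lambda name, attrs: titles.append(attrs["title"]))
xml.sax.parseString(
    b'<library><book title="A"/><shelf><book title="B"/></shelf>'
    b'<book title="C"/></library>',
    handler,
)
```

Here the book inside <shelf> is skipped because its path is
library/shelf/book, which does not match the registered pattern.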
Jaxup defines an interface to update XML documents, through which clients can
work without knowledge of the exact object model that the document uses. The
interface is called Updater, and the idea behind it is the same as with Jaxen's
Navigator interface. In addition, an implementation of xmldb.org's proposed
XUpdate specification is provided. The implementation is in the XUpdate class.
Implementations of the Updater interface are provided for the following object
models:
- DOM
- Dom4j
- JDom
The JRefEntry DTD is a customization of the DocBook RefEntry
model. The purpose of this customization is to mirror the order and
nature of structured comment tags in JavaDoc documentation.
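jshon is a parser that reads and creates JSON.
It is designed to be used from within the shell.
It is more robust than parsing based on grep/sed/awk,
and lighter than parsing based on one-line perl/python scripts.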
filepp is a generic file preprocessor designed to allow the
functionality provided by the C preprocessor cpp(1) to be used with
any file type. filepp is designed to be easily customised and
extended.
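What it means to apply cpp-like functionality to arbitrary text can be
sketched with a tiny macro expander. The example below handles only
"#define NAME value" lines and whole-word substitution; it is a Python
illustration of the idea, not filepp's implementation, and the function
name preprocess is an assumption.

```python
import re


def preprocess(text):
    """Expand '#define NAME value' macros in any text (hypothetical sketch)."""
    macros = {}
    output = []
    for line in text.splitlines():
        match = re.match(r"#define\s+(\w+)\s*(.*)", line)
        if match:
            # Remember the macro and drop the directive from the output.
            macros[match.group(1)] = match.group(2)
            continue
        for name, value in macros.items():
            # \b limits the match to whole words; the lambda inserts the
            # value literally, without interpreting backslashes in it.
            line = re.sub(r"\b%s\b" % re.escape(name),
                          lambda m, v=value: v, line)
        output.append(line)
    return "\n".join(output)
```

Because nothing in the sketch depends on C syntax, the same expansion
works on HTML, configuration files, or plain prose.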
This program converts line endings of text files between MS-DOS and Unix
formats. It detects binary files in a nearly foolproof way and leaves them
alone unless you override this. It will also leave files alone that are already
in the right format and preserves file timestamps. User interrupts are handled
gracefully and no garbage or corrupted files are left behind. 'flip' does not
convert files to a different character set, and it cannot handle Apple
Macintosh line endings (CR only). For that (and more), you can use the 'recode'
program (package 'recode').
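The behaviour described above can be sketched in a few lines of Python.
This is an illustration only, not flip's code: the function names are
made up, and treating any file that contains a NUL byte as binary is an
assumed heuristic that is simpler than whatever flip really does.

```python
import os


def looks_binary(data):
    # Assumed heuristic: text files do not normally contain NUL bytes.
    return b"\0" in data


def convert_line_endings(data, target="unix"):
    """Convert CRLF <-> LF in `data` (bytes); binary data is returned as-is."""
    if looks_binary(data):
        return data
    # Normalise to LF first so mixed input converts cleanly.
    # Lone CR (classic Macintosh) is deliberately left untouched.
    unix = data.replace(b"\r\n", b"\n")
    if target == "unix":
        return unix
    if target == "dos":
        return unix.replace(b"\n", b"\r\n")
    raise ValueError("target must be 'unix' or 'dos'")


def convert_file(path, target="unix"):
    """Convert a file in place; return True if it was changed."""
    stat = os.stat(path)
    with open(path, "rb") as f:
        original = f.read()
    converted = convert_line_endings(original, target)
    if converted == original:
        return False          # already in the right format, or binary
    with open(path, "wb") as f:
        f.write(converted)
    # Restore the original access and modification times.
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns))
    return True
```

Unlike flip, this sketch writes the file in place and makes no attempt to
clean up after an interrupt; a more careful version would write to a
temporary file and rename it over the original.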