The project aims to create a feature-rich dictionary lookup program.
It supports:
* Babylon .BGL files, complete with images and resources;
* StarDict .ifo/.dict/.idx/.syn dictionaries;
* Dictd .index/.dict(.dz) dictionary files;
* ABBYY Lingvo .dsl source files, together with abbreviations.
The files can be optionally compressed with dictzip. Dictionary
resources can be packed together into a .zip file;
* ABBYY Lingvo .lsa/.dat audio archives. Those can be indexed
separately, or be referred to from .dsl files.
info2man converts GNU info files to pod or -man formats.
GNU info can be a pain: it demands its own special pager, it's a binary
format, it's cruder than HTML and less documented, and most GNU-authored
manual entries basically say "we like info so we don't maintain this manual
entry, thus it is probably wrong". info2man thus converts info files so that
they can be read by ordinary tools.
Java2html is a syntax highlighter for Java and C++ source code that
produces a highlighted html file as output.
Java2html offers the following features:
- support for Java and C++
- fast (single pass conversion using flex)
- doesn't change formatting - only adds <FONT COLOR=#XXXX> tags
and properly escapes non-ASCII characters
- easy integration with webservers - browse your sources colourized
- gzips http output for browsers to save bandwidth (only in CGI mode)
- documentation and manpage included
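The "only adds FONT tags" behaviour can be illustrated with a minimal Python
sketch. This is not Java2html's flex scanner: the keyword set, the token
regex and the highlight() function are assumptions made up for illustration,
and only HTML-special characters are escaped here.

```python
import html
import re

# Hypothetical, deliberately tiny keyword set; the real tool knows far more.
KEYWORDS = {"class", "int", "return", "public", "static", "void"}

# An identifier, or a run of anything that cannot start an identifier.
# Together the two alternatives cover every character of the input.
TOKEN = re.compile(r"[A-Za-z_]\w*|[^A-Za-z_]+")

def highlight(source, color="#0000FF"):
    """Wrap keywords in FONT tags; leave whitespace and layout untouched."""
    out = []
    for tok in TOKEN.findall(source):
        escaped = html.escape(tok, quote=False)  # escapes &, < and >
        if tok in KEYWORDS:
            out.append(f"<FONT COLOR={color}>{escaped}</FONT>")
        else:
            out.append(escaped)
    return "".join(out)
```

Because every token is emitted in order and only wrapped or escaped, the
original line breaks and indentation survive unchanged.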
libguess is a high-speed character set detection library.
libguess employs deterministic finite automata (DFAs) to deduce the character set
of the input buffer. The advantage of this is that all character sets
can be checked in parallel, and quickly. Right now, libguess passes a
byte to each DFA on the same pass, meaning that the winning character
set can be deduced as efficiently as possible.
libguess is fully reentrant, using only local stack memory for DFA operations.
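The idea of feeding each byte to every automaton in the same pass can be
sketched in Python. This is a simplified illustration, not libguess's API or
its state tables: the three step functions and surviving_charsets() are
assumptions, and candidates are simply eliminated rather than ranked.

```python
DEAD = -1

def ascii_step(state, byte):
    # Single-state automaton: any byte >= 0x80 is rejected.
    return 0 if byte < 0x80 else DEAD

def utf8_step(state, byte):
    # State = number of continuation bytes still expected.
    if state == 0:
        if byte < 0x80:
            return 0
        if 0xC2 <= byte <= 0xDF:
            return 1
        if 0xE0 <= byte <= 0xEF:
            return 2
        if 0xF0 <= byte <= 0xF4:
            return 3
        return DEAD
    return state - 1 if 0x80 <= byte <= 0xBF else DEAD

def latin1_step(state, byte):
    # Treat the C1 control range as implausible for ISO-8859-1 text.
    return DEAD if 0x80 <= byte <= 0x9F else 0

AUTOMATA = {"ascii": ascii_step, "utf-8": utf8_step, "iso-8859-1": latin1_step}

def surviving_charsets(buf):
    """Advance every live automaton once per byte, in a single pass."""
    states = {name: 0 for name in AUTOMATA}  # local state only
    for byte in buf:
        for name in list(states):
            nxt = AUTOMATA[name](states[name], byte)
            if nxt == DEAD:
                del states[name]
            else:
                states[name] = nxt
    # Only automata that end in their accepting state (0) survive.
    return [name for name, s in states.items() if s == 0]
```

All state lives in the local states dictionary, so the function can be called
concurrently on different buffers, mirroring the reentrancy noted above.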
This package provides several functions to quickly search for substrings
in strict or lazy ByteStrings. It also provides functions for breaking or
splitting on substrings and replacing all occurrences of a substring (the
first in case of overlaps) with another. GHC versions before 6.10 are no
longer supported; other compilers are supported only if they support
BangPatterns. If you need it to work with other compilers, send a feature
request.
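The "first in case of overlaps" rule can be illustrated on Python bytes
objects. This is only a sketch of the semantics, not the package's Haskell
API: the indices() helper below is a hypothetical name, while bytes.find and
bytes.replace are standard Python.

```python
def indices(pattern, target, overlapping=True):
    """Return start offsets of pattern in target, left to right."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # After a match, resume one byte later (overlapping) or just past the
    # match (non-overlapping).
    step = 1 if overlapping else len(pattern)
    found = []
    i = target.find(pattern)
    while i != -1:
        found.append(i)
        i = target.find(pattern, i + step)
    return found
```

For the target b"ababa" and the pattern b"aba", the overlapping search finds
offsets 0 and 2, but a replacement can only use the first of the two, so
b"ababa".replace(b"aba", b"X") yields b"Xba".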
libstree is a generic suffix tree implementation, written in C.
It can handle arbitrary data structures as elements of a string.
Unlike most demo implementations, it is not limited to simple ASCII
character strings. Suffix tree generation in libstree is highly
efficient and implemented using the algorithm by Ukkonen, which
means that libstree builds suffix trees in time linear in the length
of the strings (assuming that string element comparisons can be done
in O(1)).
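To illustrate strings whose elements are arbitrary values rather than ASCII
characters, here is a naive suffix trie in Python. It is explicitly not
Ukkonen's algorithm and not libstree's API: it takes quadratic time and
space, and build_suffix_trie() and contains() are hypothetical names. The
elements only need to be hashable.

```python
def build_suffix_trie(seq):
    """Insert every suffix of seq into a nested-dict trie (naive, O(n^2))."""
    root = {}
    for start in range(len(seq)):
        node = root
        for elem in seq[start:]:
            node = node.setdefault(elem, {})
    return root

def contains(trie, pattern):
    """True if pattern occurs as a contiguous run of elements in the sequence."""
    node = trie
    for elem in pattern:
        if elem not in node:
            return False
        node = node[elem]
    return True

# A "string" of (method, path) tuples rather than characters.
requests = [("GET", "/"), ("POST", "/login"), ("GET", "/home")]
trie = build_suffix_trie(requests)
```

A linear-time construction such as Ukkonen's avoids re-inserting each suffix
from scratch, which is what makes the approach practical for long inputs.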
The XSL Cache extension is a modification of PHP's standard XSL extension
that caches the parsed XSL stylesheet representation between sessions for
a 2.5x boost in performance for sites that repeatedly apply the same
transform.
Although there is still some further work that could be done on
the extension, this code is already proving beneficial in production use for
a few applications on the New York Times' website.
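The caching idea can be sketched in Python as a parse-once table keyed by
file path. This is not the extension's PHP API: StylesheetCache and the
parse callable are hypothetical, and invalidating on a changed modification
time is an assumption of this sketch.

```python
import os

class StylesheetCache:
    """Keep parsed stylesheets in memory; reparse only when the file changes."""

    def __init__(self, parse):
        self._parse = parse    # hypothetical callable: path -> parsed object
        self._entries = {}     # path -> (mtime_ns, parsed object)

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        hit = self._entries.get(path)
        if hit is not None and hit[0] == mtime:
            return hit[1]      # cache hit: skip the expensive parse
        parsed = self._parse(path)
        self._entries[path] = (mtime, parsed)
        return parsed
```

The saving comes from skipping the parse step on every request after the
first, which matters most when the same transform is applied repeatedly.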
Localize is an application to aid in the translation of .strings files.
.strings files must be distributed in ASCII encoding, which generally
isn't a convenient encoding to do translation in. As an example, it's rather
difficult to enter Chinese characters into an ASCII-encoded text file.
Localize will, with any luck, help out with this. Currently it's just a
shell of an application, but sometime in the future I hope to complete it.
LICENSE: GPL2 or later
lttoolbox is a toolbox for lexical processing, morphological analysis
and generation of words. Analysis is the process of splitting a word
(e.g. cats) into its lemma 'cat' and the grammatical information
<n><pl>. Generation is the opposite process.
The package is split into three programs: lt-comp, the compiler;
lt-proc, the processor; and lt-expand, which generates all possible
mappings between surface forms and lexical forms in the dictionary.
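The relationship between analysis, generation and expansion can be shown
with a toy Python table. This is only an illustration of the mappings, not
how lttoolbox works internally or what its programs print: the entries, the
function names and the "surface:lexical" output format are assumptions.

```python
# Hypothetical dictionary entries: (surface form, lexical form).
ENTRIES = [
    ("cat", "cat<n><sg>"),
    ("cats", "cat<n><pl>"),
]

def analyse(surface):
    """Surface form -> lexical forms (lemma plus grammatical tags)."""
    return [lexical for s, lexical in ENTRIES if s == surface]

def generate(lexical):
    """Lexical form -> surface forms; the inverse of analyse()."""
    return [s for s, lex in ENTRIES if lex == lexical]

def expand():
    """List every surface/lexical pair in the dictionary."""
    return [f"{s}:{lex}" for s, lex in ENTRIES]
```

Here analyse("cats") gives ["cat<n><pl>"] and generate("cat<n><pl>") gives
["cats"], matching the example in the description.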
OpenToken is a facility for performing token analysis and parsing within
the Ada language. It is designed to provide all the functionality of a
traditional lexical analyzer/parser generator, such as lex/yacc. But due
to the magic of inheritance and runtime polymorphism it is implemented
entirely in Ada as withed-in code. No precompilation step is required, and
no messy tool-generated source code is created. The tradeoff is that the
grammar is generated at runtime.