This module implements a trie data structure. The term "trie" comes from the
word "retrieval", but is generally pronounced like "try". A trie is a tree
structure (or directed acyclic graph), the nodes of which represent letters
in a word. For example, the final lookup for the word 'bob' would look
something like $ref->{'b'}{'o'}{'b'}{'00'} (the 00 being an end marker).
Only nodes which would represent words in the trie exist, making the structure
slightly smaller than a hash of the same data set.
One advantage of the trie over other data storage methods is that lookup
times are O(1) with respect to the size of the index. For sparse data sets,
it is probably not as efficient as performing a binary search on a sorted
list, and for small files,
it has a lot of overhead. The main advantage (at least from my perspective) is
that it provides a relatively cheap method for finding a list of words in a
large, dense data set which begin with a certain string.
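The nested-hash layout and the prefix search described above can be sketched
in a few lines. This is a minimal illustration in Python rather than the
module's own Perl, and the function names (insert, contains,
words_with_prefix) are hypothetical, not the module's API. Only the '00' end
marker is taken from the example above.

```python
END = "00"  # end-of-word marker, mirroring the '00' key in the 'bob' example


def insert(trie, word):
    """Add a word by walking (and creating) one nested dict per letter."""
    node = trie
    for letter in word:
        node = node.setdefault(letter, {})
    node[END] = True


def contains(trie, word):
    """Exact lookup: the cost depends on len(word), not on the index size."""
    node = trie
    for letter in word:
        node = node.get(letter)
        if node is None:
            return False
    return END in node


def words_with_prefix(trie, prefix):
    """Return every stored word that begins with the given prefix."""
    node = trie
    for letter in prefix:
        node = node.get(letter)
        if node is None:
            return []
    found = []
    stack = [(node, prefix)]
    while stack:
        current, so_far = stack.pop()
        for key, child in current.items():
            if key == END:
                found.append(so_far)
            else:
                stack.append((child, so_far + key))
    return sorted(found)


trie = {}
for word in ["bob", "bobby", "bog", "cat"]:
    insert(trie, word)

print(contains(trie, "bob"))
print(words_with_prefix(trie, "bo"))
```

The two-character marker cannot collide with a letter key, because every
letter key is a single character.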
OSP Tourney DM is designed to facilitate competitive, yet flexible, match
play. This mod simply makes it easier and more convenient for players and
admins alike to enhance the Quake 3 experience. There have been absolutely
*NO* changes to the core gameplay or its dynamics -- it's all default
Quake 3 in this regard.
Simple tools for processing strings in Russian (choose proper form for plurals,
in-words representation of numerals, dates in Russian without locales,
transliteration, etc).
Ruby escape - HTML/URI/shell escaping utilities
Features:
- several escaping/composing functions
  * HTML text
  * HTML attribute value
  * URI path
  * shell command line
- dedicated classes for escaped strings
- escape and compose strongly related strings at once
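To make the four escaping contexts listed above concrete, here is a small
sketch using only the Python standard library. It does not show the Ruby
library's API. The function names are hypothetical, and the point is simply
that each context needs its own escaping rule.

```python
import html
import shlex
from urllib.parse import quote


def escape_html_text(text):
    # In element content only &, < and > need escaping.
    return html.escape(text, quote=False)


def escape_html_attr(value):
    # Inside an attribute value, quote characters must be escaped as well.
    return html.escape(value, quote=True)


def escape_uri_path(segment):
    # Percent-encode one path segment, including any '/' it contains.
    return quote(segment, safe="")


def escape_shell_arg(arg):
    # Quote a single argument so a POSIX shell treats it as one word.
    return shlex.quote(arg)


print(escape_html_text("a<b & c"))
print(escape_html_attr('say "hi"'))
print(escape_uri_path("a b/c"))
print(escape_shell_arg("two words"))
```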
Amberfish is general purpose text retrieval software, developed at Etymon
by Nassib Nassar and distributed as open source software under the terms
of version 2 of the GNU General Public License (GPL). Its distinguishing
features are indexing/search of semi-structured text (i.e. both free text
and multiply nested fields), built-in support for XML documents using the
Xerces library, structured queries allowing generalized field/tag paths,
hierarchical result sets (XML only), automatic searching across multiple
databases (allowing modular indexing), TREC format results, efficient
indexing, and relatively low memory requirements during indexing (and the
ability to index documents larger than available memory). Z39.50 support
is available. Other features include Boolean queries, right truncation,
phrase searching, relevance ranking, support for multiple documents per
file, incremental indexing, and easy integration with other UNIX tools.
The architecture is also designed to permit proximity queries; however,
they are not fully implemented at present.
This port also includes the Porter stemming algorithm for suffix
stripping, available at:
http://www.tartarus.org/~martin/PorterStemmer
F# is an open-source, strongly typed, multi-paradigm programming
language encompassing functional, imperative and object-oriented
programming techniques. F# is most often used as a cross-platform CLI
language, but can also be used to generate JavaScript and GPU code.
F# is developed by The F# Software Foundation and Microsoft. An open
source, cross-platform edition of F# is available from the F# Software
Foundation. F# is also a fully supported language in Visual Studio.
Other tools supporting F# development include Mono, MonoDevelop,
SharpDevelop and the WebSharper tools for JavaScript and HTML5 web
programming.
F# originated as a variant of ML and has been influenced by OCaml, C#,
Python, Haskell, Scala and Erlang.
This is a keyboard for input of the standardized Yi script of southwestern
China with Unicode Yi fonts. It is written in Keyman keyboard language and
developed by SIL Non-Roman Script Initiative (NRSI).
This port installs the keyboard so that it can be used through SCIM or
IBus KMFL IMEngine (textproc/scim-kmfl-imengine, textproc/ibus-kmfl).
To keyboard a Yi syllable, you should type the Pinyin romanization for that
syllable, followed by a space. For keyboarding punctuation, use the usual
punctuation keystrokes.
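The "romanization followed by a space" input model can be illustrated with a
short Python sketch. It assumes that the keyboard's romanization matches the
Yi pinyin used in the Unicode character names (for example "YI SYLLABLE IT"),
which the description above does not confirm. The function names are
hypothetical, and the real keyboard is written in the Keyman keyboard
language, not Python.

```python
import unicodedata


def yi_syllable(romanization):
    """Look up a Yi syllable by its Unicode character name, or return None."""
    try:
        return unicodedata.lookup("YI SYLLABLE " + romanization.upper())
    except KeyError:
        return None


def transliterate(keystrokes):
    """Convert space-terminated romanizations; pass other tokens through."""
    out = []
    for token in keystrokes.split(" "):
        if not token:
            continue
        out.append(yi_syllable(token) or token)
    return "".join(out)


print(transliterate("it ix "))
```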
The keyboard is compatible with the Yi range as defined in Unicode 3.0 and it
does not provide keystrokes for the Yi Radicals which were added in Unicode 3.2
(U+A4A2..U+A4A3, U+A4B4, U+A4C1, U+A4C5).
icon-slicer is a utility for generating icon themes and libXcursor cursor
themes.
The inputs to icon-slicer are conceptually:
A) a set of multi-layer images, one for each size
B) an XML theme description file
Each image contains all the cursors arranged in a grid; for cursors the
layers are:
- a layer with a dot for the hotspot of each cursor
- the main image or first animation frame for multi-frame animated cursors
- the second animation frame for multi-frame animated cursors
For icons, the layers are:
- a layer with the images
- an optional layer with attachment points for emblems
- an optional layer with boxes for embedding text into icons
In practice, since loading of multilayer images is not supported by standard
image libraries, each layer is input as a separate image file.
Libtextcat is a library with functions that implement the classification
technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization" [1].
It was primarily developed for language guessing, a task on which it is known to
perform with near-perfect accuracy.
The central idea of the Cavnar & Trenkle technique is to calculate a
"fingerprint" of a document with an unknown category, and compare this with the
fingerprints of a number of documents of which the categories are known. The
categories of the closest matches are output as the classification. A
fingerprint is a list of the most frequent n-grams occurring in a document,
ordered by frequency. Fingerprints are compared with a simple out-of-place
metric.
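The fingerprint and the out-of-place metric described above can be sketched
as follows. This is an illustrative Python sketch, not libtextcat's C API.
The function names, the n-gram length limit, the fingerprint size and the
padding character are all assumptions chosen for the example.

```python
from collections import Counter


def fingerprint(text, max_n=3, top=300):
    """Return the most frequent character n-grams, most frequent first."""
    counts = Counter()
    # Pad word boundaries with '_' so n-grams can capture word edges.
    padded = "_" + "_".join(text.lower().split()) + "_"
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Sort by descending frequency; break ties alphabetically for stability.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return ranked[:top]


def out_of_place(doc_fp, cat_fp):
    """Sum of rank differences; n-grams missing from cat_fp cost the maximum."""
    positions = {gram: rank for rank, gram in enumerate(cat_fp)}
    max_penalty = len(cat_fp)
    total = 0
    for rank, gram in enumerate(doc_fp):
        if gram in positions:
            total += abs(rank - positions[gram])
        else:
            total += max_penalty
    return total


def classify(text, category_fps):
    """Pick the category whose fingerprint is closest to the document's."""
    doc_fp = fingerprint(text)
    return min(category_fps,
               key=lambda name: out_of_place(doc_fp, category_fps[name]))


samples = {
    "english": "the quick brown fox jumps over the lazy dog",
    "dutch": "de snelle bruine vos springt over de luie hond",
}
category_fps = {name: fingerprint(text) for name, text in samples.items()}
print(classify("the quick brown fox jumps over the lazy dog", category_fps))
```

In practice the known-category fingerprints would be built from much larger
training texts than the one-line samples used here.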
[1] The document that started it all: William B. Cavnar & John M. Trenkle (1994)
N-Gram-Based Text Categorization, <http://citeseer.ist.psu.edu/68861.html>.
Super is a setuid-root program that offers:
o restricted setuid-root access to executables, adjustable
on a per-program and per-user basis;
o a relatively secure environment for scripts, so that well-written
scripts can be run as root (or some other uid/gid), without
unduly compromising security.
The design philosophy behind super is two-fold:
(a) some users can be trusted when executing certain commands;
(b) there are some commands, such as a script to mount CD-ROMs,
which you'd like to be safely executable even by users who
are NOT trusted. Although setuid-root scripts are insecure,
a good setuid-root wrapper around a sensible non-setuid script
can be hard to break, and super provides that wrapper so that
even a non-trusted user can use the scripts.