nokogiri-diff adds the ability to calculate the differences (added or removed
nodes) between two XML/HTML documents.
Features:
- Performs a breadth-first comparison between child nodes.
- Compares XML/HTML Elements, Attributes, Text nodes and DTD nodes.
- Allows calculating differences between documents, or just enumerating the
added or removed nodes.
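
The breadth-first comparison can be illustrated with a minimal Python
sketch. It is not nokogiri-diff's own implementation or API: the names
signature and tree_diff are hypothetical, it assumes two child elements
match when their tag, attributes and stripped text agree, and it only
looks at elements (not DTD nodes or standalone text nodes).

```python
from collections import deque
from difflib import SequenceMatcher
import xml.etree.ElementTree as ET


def signature(node):
    """Hypothetical match key: tag, sorted attributes, stripped text."""
    text = (node.text or "").strip()
    return (node.tag, tuple(sorted(node.attrib.items())), text)


def tree_diff(old_root, new_root):
    """Return ('-' or '+', tag, text) tuples, visiting level by level."""
    changes = []
    queue = deque([(old_root, new_root)])  # pairs assumed to match
    while queue:
        old, new = queue.popleft()
        old_kids, new_kids = list(old), list(new)
        matcher = SequenceMatcher(
            None,
            [signature(k) for k in old_kids],
            [signature(k) for k in new_kids],
            autojunk=False,
        )
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                # Matched children are queued, so their own children are
                # compared only after the current level is finished.
                queue.extend(zip(old_kids[i1:i2], new_kids[j1:j2]))
                continue
            for kid in old_kids[i1:i2]:  # present only in the old tree
                changes.append(("-", kid.tag, (kid.text or "").strip()))
            for kid in new_kids[j1:j2]:  # present only in the new tree
                changes.append(("+", kid.tag, (kid.text or "").strip()))
    return changes


old_doc = ET.fromstring("<div><p>one</p><p>two</p></div>")
new_doc = ET.fromstring("<div><p>one</p><p>three</p></div>")
for change in tree_diff(old_doc, new_doc):
    print(*change)
```

The sketch reports a changed node as one removal plus one addition, which
mirrors the "added or removed nodes" view described above.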
Agrep is a tool for fast text searching allowing errors.
The three most significant features of agrep that are not supported by
the grep family are
1) the ability to search for approximate patterns;
for example, "agrep -2 homogenos foo" will find homogeneous as well
as any other word that can be obtained from homogenos with at most
2 substitutions, insertions, or deletions.
"agrep -B homogenos foo" will generate a message of the form
best match has 2 errors, there are 5 matches, output them? (y/n)
2) agrep is record oriented rather than just line oriented; a record
is by default a line, but it can be user defined;
for example, "agrep -d '^From ' 'pizza' mbox"
outputs all mail messages that contain the keyword "pizza".
Another example: "agrep -d '$$' pattern foo" will output all
paragraphs (separated by an empty line) that contain pattern.
3) multiple patterns with AND (or OR) logic queries.
For example, "agrep -d '^From ' 'burger,pizza' mbox"
outputs all mail messages containing at least one of the
two keywords (, stands for OR).
"agrep -d '^From ' 'good;pizza' mbox" outputs all mail messages
containing both keywords.
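
The ideas above (matching within a number of errors, and OR/AND queries
over records) can be sketched in Python. This is only an illustration
under stated assumptions, not agrep's own algorithm or interface: it uses
a plain dynamic-programming edit distance, compares the pattern against
whole whitespace-separated words, and the names edit_distance,
approx_match and search are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions, or deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep a match
            ))
        prev = cur
    return prev[-1]


def approx_match(pattern, record, max_errors):
    """True if any word of the record is within max_errors of pattern."""
    return any(edit_distance(pattern, word) <= max_errors
               for word in record.split())


def search(records, query, max_errors=0):
    """Filter records; ',' means OR and ';' means AND (no mixing)."""
    if ";" in query:
        terms, combine = query.split(";"), all
    else:
        terms, combine = query.split(","), any
    return [r for r in records
            if combine(approx_match(t, r, max_errors) for t in terms)]


records = ["good pizza here", "a burger", "a homogeneous mix"]
print(edit_distance("homogenos", "homogeneous"))
print(search(records, "burger,pizza"))
print(search(records, "good;pizza"))
print(search(records, "homogenos", max_errors=2))
```

Here each list element stands in for one record; splitting input on a
user-defined delimiter such as '^From ' is left out of the sketch.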
LICENSE: no redistribution for profit
The Spreadsheet Library is designed to read and write Spreadsheet Documents.
As of version 0.6.0, only Microsoft Excel compatible spreadsheets are
supported. Spreadsheet is a combination/complete rewrite of the
Spreadsheet::Excel Library by Daniel J. Berger and the ParseExcel Library by
Hannes Wyss. Spreadsheet can read, write and modify Spreadsheet Documents.
wv is a library that allows access to Microsoft Word files.
It can load and parse the Word 6-9 formats (Word 6, 95, 97, 2000).
Several converter executables called wvWare are also provided:
wvHtml, wvLatex, wvCleanLatex, wvDVI, wvPS, wvPDF,
wvText, wvAbw, wvWml, wvMime
Note: wvHtml was previously known as MSWordView.
This is a SCIM IMEngine module which uses the m17n library as the backend. It
allows you to use keyboard layouts available via devel/m17n-db and
textproc/m17n-contrib through the standard SCIM interface. m17n-lib currently
supports input of more than 60 languages with more than 70 language
specific input methods.
`sgrep' (structured grep) is a tool for searching text files and
filtering text streams using structural criteria. Complex criteria
can be specified as macros using M4.
Sgrep was created by:
Jani Jaakkola, email: Jani.Jaakkola@helsinki.fi
Pekka Kilpelainen, email: Pekka.Kilpelainen@helsinki.fi
From the XP homepage:
XP is an XML 1.0 parser written in Java. It is fully conforming: it
detects all non well-formed documents.
XP has the following design goals: Conformance and correctness, high
performance and a layered structure. It is currently non-validating but can
parse all external entities.
For more details, please see the XP homepage:
The Yacc to LaTeX utility takes (hopefully) any yacc source file,
and derives an Extended Backus-Naur Form (EBNF) description from
it. This EBNF is written out as LaTeX source. The output is a LaTeX
"longtable" environment, that can be included in any LaTeX document,
typically using an \input{} statement.
Sphinx is an open source full text search server, designed from the
ground up with performance, relevance (aka search quality), and
integration simplicity in mind. It's written in C++ and works on Linux
(RedHat, Ubuntu, etc), Windows, MacOS, Solaris, FreeBSD, and a few
other systems.
Sphinx lets you either batch index and search data stored in an SQL
database, NoSQL storage, or just files quickly and easily, or index and
search data on the fly, working with Sphinx pretty much as with a
database server.
A variety of text processing features enable fine-tuning Sphinx for
your particular application requirements, and a number of relevance
functions ensure you can tweak search quality as well.
Searching via SphinxAPI is as simple as 3 lines of code, and querying
via SphinxQL is even simpler, with search queries expressed in good
old SQL.
Sphinx clusters scale up to billions of documents and tens of millions
of search queries per day, powering top websites such as Craigslist,
DailyMotion, NetLog, etc.
And last but not least, it's open-sourced under GPLv2, and the
community edition is free to use.