Parsec is designed from scratch as an industrial-strength parser
library. It is simple, safe, well documented (on the package homepage),
has extensive libraries and good error messages, and is also fast. It
is defined as a monad transformer that can be stacked on arbitrary
monads, and it is also parametric in the input stream type.
TagSoup is a library for parsing HTML/XML. It supports the HTML 5
specification, and can be used to parse either well-formed XML, or
unstructured and malformed HTML from the web. The library also provides
useful functions to extract information from an HTML document, making it
ideal for screen-scraping.
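The kind of extraction described above can be sketched as follows. This is not TagSoup's API: it is a minimal illustration using the tolerant parser in Python's standard library, and the names `LinkCollector` and `extract_links` are hypothetical helpers introduced only for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, tolerating malformed markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return every link target found in the (possibly broken) HTML text."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Because the parser reports tags as it meets them and never requires them to be balanced, unclosed elements and unquoted attribute values do not stop the scan.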
The texmath library provides functions to read and write TeX math,
presentation MathML, and OMML (Office Math Markup Language, used in
Microsoft Office). Support is also included for converting math formats to
pandoc's native format (allowing conversion, via pandoc, to a variety of
different markup formats). The TeX reader supports basic LaTeX and AMS
extensions, and it can parse and apply LaTeX macros.
Hunspell is a widely used spell checker.
Main features:
- Extended support for language peculiarities: Unicode character encoding,
compounding, and complex morphology.
- Improved suggestions using n-gram similarity and rule- and dictionary-based
pronunciation data.
- Morphological analysis, stemming and generation.
- Hunspell is based on MySpell and also works with MySpell dictionaries.
- C++ library under GPL/LGPL/MPL tri-license.
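To make the idea of n-gram similarity concrete, here is a minimal sketch that ranks candidate words by how many character bigrams they share with a misspelled word. It is not Hunspell's actual scoring: the Dice coefficient, the `^`/`$` padding, and the function names are assumptions chosen for illustration.

```python
def ngrams(word, n=2):
    """Return the set of character n-grams of word, padded with boundary marks."""
    padded = f"^{word.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=2):
    """Dice coefficient over the two words' n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def suggest(word, dictionary, limit=3):
    """Return up to limit dictionary words, most similar first."""
    # Ties are broken alphabetically so the ordering is deterministic.
    ranked = sorted(dictionary, key=lambda cand: (-similarity(word, cand), cand))
    return ranked[:limit]
```

A call such as `suggest("speling", ["sapling", "spelling", "spoiling"], limit=1)` ranks the candidates by shared bigrams; a real spell checker would combine a score like this with the rule-based and pronunciation data mentioned above.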
libodfgen is a library for generating documents in Open Document Format
(ODF). It provides generator implementations for the following libraries:
* libwpd (::WPXDocumentInterface): text documents
* libwpg (libwpg::WPGPaintInterface): vector drawings
* libetonyek (libetonyek::KEYPresentationInterface): presentations
As these APIs are used by multiple libraries, libodfgen can be used to
generate ODF from many sources.
Liblinebreak is an implementation of the line and word breaking algorithm
as described in Unicode 5.1.0 Standard Annex 14, Revision 22. It breaks
lines that contain Unicode characters. It is designed to be used in a
generic text renderer. FBReader is one real-world example.
By default, this module exports a single hash (`%RE') that stores or
generates commonly needed regular expressions. Patterns currently
provided include:
* balanced parentheses and brackets
* delimited text (with escapes)
* integers and floating-point numbers in any base (up to 36)
* comments in C, C++, Perl, and shell
* offensive language
* lists of any pattern
* IPv4 addresses
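The idea of a single table of ready-made patterns can be sketched as below. This is a Python analogue for illustration only, not the module's own interface: the dictionary name `RE`, its keys, and the helper `is_match` are assumptions, and only a few of the listed pattern families are covered.

```python
import re

# One decimal octet, 0 to 255, without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# Hypothetical pattern table, keyed by a short name.
RE = {
    "ipv4": rf"{OCTET}(?:\.{OCTET}){{3}}",
    "int": r"[+-]?[0-9]+",
    "real": r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?",
    # Double-quoted text in which a backslash escapes the next character.
    "quoted": r'"(?:[^"\\]|\\.)*"',
}


def is_match(name, text):
    """True if the whole of text matches the named pattern."""
    return re.fullmatch(RE[name], text) is not None
```

Keeping the patterns as plain strings means they can be embedded in larger expressions or passed to `re.findall` to pull every occurrence out of a longer text.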
Kibana is an open source (Apache-licensed), browser-based analytics and search
interface to Logstash and other timestamped data sets stored in ElasticSearch.
With those in place Kibana is a snap to set up and start using (seriously).
Kibana strives to be easy to get started with, while also being flexible and
powerful.
Loook is a simple Python tool that searches for text strings in
LibreOffice and OpenOffice.org files.
AND, OR and phrase searches are supported. It doesn't create an index,
but searching should be fast enough unless you have a very large number of files.
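An index-free search of this kind can be sketched as follows. This is not Loook's source code: it is a minimal illustration that relies on the fact that ODF files are ZIP archives whose body text lives in `content.xml`. The function names and the `mode` parameter are assumptions made for this example.

```python
import zipfile
from pathlib import Path
from xml.etree import ElementTree


def extract_text(path):
    """Return the text of an ODF file's content.xml, or "" if it is unreadable."""
    try:
        with zipfile.ZipFile(path) as archive:
            root = ElementTree.fromstring(archive.read("content.xml"))
    except (zipfile.BadZipFile, KeyError, OSError, ElementTree.ParseError):
        return ""
    # Text nodes are joined with single spaces; markup is discarded.
    return " ".join(chunk.strip() for chunk in root.itertext() if chunk.strip())


def matches(text, terms, mode="and"):
    """Case-insensitive AND, OR, or phrase match of terms against text."""
    haystack = text.lower()
    needles = [term.lower() for term in terms]
    if mode == "phrase":
        return " ".join(needles) in haystack
    found = [needle in haystack for needle in needles]
    return all(found) if mode == "and" else any(found)


def search(directory, terms, mode="and"):
    """Open every .odt/.ods/.odp file under directory and return the hits."""
    hits = []
    for path in sorted(Path(directory).rglob("*")):
        if path.suffix.lower() in {".odt", ".ods", ".odp"}:
            if matches(extract_text(path), terms, mode):
                hits.append(path)
    return hits
```

Every query reopens and reparses each file, which is why the cost grows with the number of files; an index would trade that per-query work for up-front processing and storage.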
What Is S5?
* It's a Simple Standards-based Slide Show System
* One XHTML document provides all of the slide show's content
* CSS handles the layout and look of the slides
* JavaScript handles the dynamic aspects of the show
* That's all there is to it!