html2xhtml converts HTML files into XHTML. It can fix many common
errors in HTML files (e.g. missing end tags, elements with an incorrect
content model, non-standard elements or attributes). It can
also take invalid or non-well-formed XHTML input and clean it up
to produce well-formed, valid XHTML output. The output document
type can be selected from several XHTML DTDs (1.0, 1.1, Basic, etc.).
An object-oriented SGML/XML parser toolkit and DSSSL engine.
Features summary:
* Includes nsgmls
* Provides access to all information about an SGML document
* Supports almost all optional SGML features
* Sophisticated entity manager
* Supports multi-byte character sets
* Object-oriented
* Written in C++ from scratch
* Fast
* Portable
* Production quality
* Free
Note: This port is a superset of the sp port. If you have sp
installed, it is recommended that you remove it before installing
jade.
John Fieber
jfieber@FreeBSD.org
This is a keyboard for entering complex Biblical Hebrew text (including
cantillation marks) with Unicode fonts. It is written in the Keyman keyboard
language and developed by the SIL Non-Roman Script Initiative (NRSI).
This port installs the keyboard so that it can be used through the SCIM or
IBus KMFL IMEngine (textproc/scim-kmfl-imengine, textproc/ibus-kmfl).
The keyboard is provided under the terms of the MIT/X11 License.
http://scripts.sil.org/SILHebrUni_Documentation
This library supports full W3C XML Schema regular expressions, including
all Unicode character sets and blocks. It is implemented using the
technique of derivatives of regular expressions. The W3C syntax is
extended to support not only union of regular sets, but also
intersection, set difference, and exclusive or (XOR). Matching of
subexpressions is also supported. The library can be used for
constructing lightweight scanners and tokenizers. It is a standalone
library; no external regex libraries are used.
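The derivative technique mentioned above can be illustrated with a minimal
Python sketch. This is not the library's code or API: the node classes and
the functions nullable, deriv, and matches are hypothetical names chosen for
this example, and no simplification of intermediate expressions is attempted.
It only shows why intersection, set difference, and exclusive or come almost
for free once matching is done by derivatives.

```python
from dataclasses import dataclass

# Hypothetical AST for regular expressions (names are illustrative only).
@dataclass(frozen=True)
class Empty:            # matches no string at all
    pass

@dataclass(frozen=True)
class Eps:              # matches only the empty string
    pass

@dataclass(frozen=True)
class Sym:              # matches one specific character
    ch: str

@dataclass(frozen=True)
class Seq:              # concatenation
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # zero or more repetitions
    inner: object

@dataclass(frozen=True)
class Bin:              # boolean combination: "or", "and", "diff", "xor"
    op: str
    left: object
    right: object

OPS = {
    "or": lambda a, b: a or b,
    "and": lambda a, b: a and b,
    "diff": lambda a, b: a and not b,
    "xor": lambda a, b: a != b,
}

def nullable(r):
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Sym)):
        return False
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    return OPS[r.op](nullable(r.left), nullable(r.right))

def deriv(r, c):
    """Derivative of r with respect to character c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Sym):
        return Eps() if r.ch == c else Empty()
    if isinstance(r, Star):
        return Seq(deriv(r.inner, c), r)
    if isinstance(r, Seq):
        first = Seq(deriv(r.left, c), r.right)
        if nullable(r.left):
            return Bin("or", first, deriv(r.right, c))
        return first
    # Boolean operators distribute over the derivative.
    return Bin(r.op, deriv(r.left, c), deriv(r.right, c))

def matches(r, s):
    """Take one derivative per input character, then test nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# (a|b)* minus a*: strings over {a, b} that contain at least one b.
a_star = Star(Sym("a"))
ab_star = Star(Bin("or", Sym("a"), Sym("b")))
has_b = Bin("diff", ab_star, a_star)
```

With these definitions, matches(has_b, "ab") holds while matches(has_b, "aa")
does not. A practical implementation would also normalize the derived
expressions to keep them small.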
Libxslt is the XSLT C library developed for the GNOME project. XSLT itself is
an XML language for defining transformations of XML documents. Libxslt is
based on libxml2, the XML C library developed for the GNOME project. It also
implements most of the EXSLT set of processor-portable extension functions and
some of Saxon's evaluate and expression extensions.
People can either embed the library in their application or use xsltproc, the
command-line processing tool.
SAC (Simple API for CSS) is an event-based API much like SAX for XML.
If you are familiar with the latter, you should have little trouble
getting used to SAC. More information on SAC can be found online at
http://www.w3.org/TR/SAC.
Because CSS has more constructs than XML, core SAC is more complex than
core SAX. However, if you need to parse a CSS style sheet, SAC probably
remains the easiest way to get it done.
DelimMatch allows you to match delimited substrings in a buffer. The
delimiters can be specified with any regular expression and the start
and end delimiters need not be the same. If the delimited text is
properly nested, entire nested groups are returned.
In addition, you may specify quoting and escaping characters that
contribute to the recognition of start and end delimiters.
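The idea of regex-specified, nested delimiters can be sketched in Python as
follows. This is not DelimMatch's interface: the function name
match_delimited and its (prefix, match, remainder) return shape are
assumptions made for this example, and quoting and escaping characters are
deliberately left out.

```python
import re

def match_delimited(text, start_pat, end_pat):
    """Return (prefix, matched, remainder) for the first balanced group.

    start_pat and end_pat are regular expressions for the opening and
    closing delimiters. Returns None if no balanced group is found.
    """
    token = re.compile(f"(?P<start>{start_pat})|(?P<end>{end_pat})")
    depth = 0
    begin = None
    for m in token.finditer(text):
        if m.group("start") is not None:
            if depth == 0:
                begin = m.start()      # outermost opening delimiter
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:             # outermost group just closed
                return text[:begin], text[begin:m.end()], text[m.end():]
        # A closing delimiter seen at depth 0 is ignored as stray text.
    return None

result = match_delimited("foo (a (b) c) bar", r"\(", r"\)")
```

Here result is ("foo ", "(a (b) c)", " bar"): the whole nested group is
returned as one match. The start and end patterns need not be the same kind
of token, so r"begin" and r"end" work equally well.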
-Anton
<tobez@FreeBSD.org>
You have two databases of person records that need to be synchronized
or matched up, but they use different keys--maybe one uses SSN and
the other uses employee ID. The only fields you have to match on
are first and last name.
That's what this module is for.
Just feed the first and last names to the name_eq() function, and
it returns undef for no possible match, and a percentage of certainty
(rank) otherwise.
Seamus Venasse <svenasse@polaris.ca>
This class knows how to read two treebank formats, the Penn format
and the Chomsky Normal Form (CNF) format. These formats differ in
how they handle terminal nodes. The Penn format places pre-terminal
part-of-speech tags in the left-hand position of a
parenthesis-delimited pair, just as it does for non-terminal nodes.
The CNF format attaches pre-terminal tags to the word with an
underscore.
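The difference in terminal handling can be shown with a small Python sketch.
It is not part of this module: the function name penn_to_underscore is
hypothetical, the word_TAG order is an assumption about the CNF notation, and
only the pre-terminal rewriting is illustrated (no binarization of the tree).

```python
import re

# A Penn-style pre-terminal: "(TAG word)" with no nested parentheses.
PENN_PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def penn_to_underscore(tree):
    """Rewrite each "(TAG word)" pair as "word_TAG" (assumed CNF style)."""
    return PENN_PRETERMINAL.sub(r"\2_\1", tree)

penn = "(S (NP (NNP Joe)) (VP (VB likes) (NP (NNP Bach))))"
cnf_style = penn_to_underscore(penn)
```

For the sample tree, cnf_style is
"(S (NP Joe_NNP) (VP likes_VB (NP Bach_NNP)))". Non-terminal nodes keep their
parenthesized form in both notations.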
OpenOffice::OODoc is an extensible Perl interface allowing direct
read/write operations on files which comply with the
OASIS Open Document Format for Office Applications (ODF),
i.e. the ISO/IEC 26300:2006 standard.
It provides a high-level, document-oriented language, and isolates
the programmer from the details of the file format. It can process
different document classes (texts, spreadsheets, presentations,
and drawings). It can retrieve or update styles and images,
document metadata, as well as text content.