SeqAn is an open source C++ library of efficient algorithms
and data structures for the analysis of sequences with the
focus on biological data.
This port includes only the library; the apps have been
moved to biology/seqan-apps. This is the last release of
the version 1 API; upgrading to biology/seqan (version 2)
is highly recommended.
The European Molecular Biology Open Software Suite (EMBOSS) is a
comprehensive set (about 100) of open source tools for genetic sequence
analysis. EMBOSS is produced by the European Molecular Biology Network
(EMBnet - http://www.embnet.org/).
All EMBOSS tools are built around the same set of core libraries - AJAX
and NUCLEUS - and therefore share a unified user interface, have a similar
"look and feel", and implement a uniform sequence addressing methodology.
The various components of EMBOSS are distributed under the GPL, except the
core libraries which are under the LGPL.
EMBASSY packages are third party applications which have been integrated with
the EMBOSS suite, but which are not included in the base EMBOSS distribution
for licensing or other reasons. The EMBASSY packages live in the
biology/embassy port.
Version 3 of the FASTA packages contains many programs for searching DNA and
protein databases and one program (prss3) for evaluating statistical
significance from randomly shuffled sequences. Several additional analysis
programs, including programs that produce local alignments, are available as
part of version 2 of the FASTA package, which is available as the port
biology/fasta.
FASTA is described in: W. R. Pearson and D. J. Lipman (1988), "Improved
Tools for Biological Sequence Analysis", PNAS 85:2444-2448; W. R. Pearson
(1996), "Effective protein sequence comparison", Meth. Enzymol. 266:227-258;
Pearson et al. (1997), Genomics 46:24-36; Pearson (1999), Meth. in
Molecular Biology 132:185-219.
The FASTA3 suite is distributed freely subject to the condition that it may
not be sold or incorporated into a commercial product.
From the website:
T-Coffee is a multiple sequence alignment package. Given a set of sequences
(Proteins or DNA), T-Coffee generates a multiple sequence alignment.
Related publications:
- 3DCoffee: Combining Protein Sequences and Structures within Multiple
Sequence Alignments.
O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame.
Journal of Molecular Biology, Vol 340, pp385-395, 2004
- T-Coffee: A novel method for multiple sequence alignments.
C. Notredame, D. Higgins, J. Heringa.
Journal of Molecular Biology, Vol 302, pp205-217, 2000
- COFFEE: A New Objective Function For Multiple Sequence Alignment.
C. Notredame, L. Holm and D.G. Higgins.
Bioinformatics, Vol 14 (5), pp407-422, 1998
PyCogent is a software library for genomic biology. It is a fully integrated
and thoroughly tested framework for: controlling third-party applications;
devising workflows; querying databases; conducting novel probabilistic
analyses of biological sequence evolution; and generating publication quality
graphics. It is distinguished by many unique built-in capabilities (such as
true codon alignment) and the frequent addition of entirely new methods for
the analysis of genomic data.
SeqAn is an open source C++ library of efficient algorithms
and data structures for the analysis of sequences with the
focus on biological data.
This port contains applications built on SeqAn and developed
within the SeqAn project. Among them are famous read mappers
like RazerS and Yara, as well as many other tools. Some
applications are packaged separately, and the library
can be found at biology/seqan.
The Biopython Project is an international association of developers who are
providing freely available Python tools for use in areas of computational
molecular biology such as bioinformatics and genomics.
Biopython is a collection of Python packages and modules created by the
Biopython Project, intended to provide the basis for building bioinformatics
applications in the Python language.
Note that the current release is alpha quality, and not yet deemed to be
stable.
This port includes optional support for Biopython-CORBA, a CORBA interface
built to the BioCorba standard (http://biocorba.org/).
The computer program avida is an auto-adaptive genetic system designed
primarily for use as a platform in Digital or Artificial Life research. The
avida system is based on concepts similar to those employed by the tierra
program developed by Tom Ray. It is a population of self-reproducing strings
with a Turing-complete genetic basis subjected to Poisson-random mutations.
The population adapts to the combination of an intrinsic fitness landscape
(self-reproduction) and an externally imposed (extrinsic) fitness function
provided by the researcher. By studying this system, one can examine
evolutionary adaptation, general traits of living systems (such as
self-organization), and other issues pertaining to theoretical or
evolutionary biology and dynamic systems.
Version 2 of the FASTA packages contains many programs for performing
sequence comparisons, producing local alignments, and other related tasks
for analysing DNA and proteins.
Currently, the FASTA2 suite is in maintenance mode. This package provides
the analysis tools from FASTA2. The searching programs are available in
version 3 of the FASTA packages, which may be found in the port
biology/fasta3.
FASTA is described in: W. R. Pearson and D. J. Lipman (1988), "Improved
Tools for Biological Sequence Analysis", PNAS 85:2444-2448, and W. R.
Pearson (1990), "Rapid and Sensitive Sequence Comparison with FASTP and FASTA",
Methods in Enzymology 183:63-98.
The FASTA2 suite is distributed freely subject to the condition that it may
not be sold or incorporated into a commercial product.
ARIADNE is a package of two programs, ariadne and prospero, that compare
protein sequences and profiles using the Smith-Waterman algorithm and
assess statistical significance using a new accurate formula,
described in Mott, 2000, "Accurate Formula for P-values of gapped local
sequence and profile alignments", J. Mol. Biol. 300:649-659.
The sequence/profile comparison algorithms used in ARIADNE are standard,
and are probably not the fastest implementations available. The novel
part is the method for determining statistical significance, which will
give thresholds of significance that are accurate to within 5%, 95% of
the time.
The package is written in ANSI C. You are free to incorporate the method
used for assessing statistical significance into third-party code,
provided you cite the above reference. The routines for assessing
significance are all in gaplib.c.
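
For readers unfamiliar with the Smith-Waterman algorithm mentioned above,
the following is a minimal Python sketch of local alignment by dynamic
programming. It is not ARIADNE's code: the function name smith_waterman,
the match/mismatch scores, and the linear gap penalty are assumptions chosen
for illustration. A real protein comparison would typically use a
substitution matrix and affine gap costs, and the significance calculation
from gaplib.c is not reproduced here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Illustrative scoring only: fixed match/mismatch scores and a linear
    gap penalty (assumptions, not ARIADNE's parameters).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment here
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling smith_waterman("XXACGTXX", "YYACGTYY") aligns only the shared ACGT
core, which is the defining property of a local (as opposed to global)
alignment: flanking regions that do not match are simply left out.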