GRAPPA: Genome Rearrangements Analysis under Parsimony and other
Phylogenetic Algorithms
This suite of programs implements the approach to phylogeny
reconstruction from gene orders described in the paper
Moret, B.M.E., Wyman, S., Bader, D.A., Warnow, T., and Yan, M.,
``A detailed study of breakpoint analysis,''
Proc. 6th Pacific Symp. Biocomputing PSB 2001, Hawaii (2001).
In its current state, "grappa" allows one to explore either the
space of all possible trees on n labelled leaves or the space of all
such trees that obey (are refinements of) a particular constraint tree.
"invdist" takes the first two genomes in the input file and returns
their inversion distance.
"distmat" prints the inversion and breakpoint distance matrices.
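As a rough illustration of the breakpoint distance that "distmat" reports,
the Python sketch below counts breakpoints between two signed gene orders.
It is not GRAPPA's code. The function name, the list-of-signed-integers
input, and the treatment of genomes as linear (framed by two sentinel
genes) are all assumptions made for this example.

```python
def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that are not preserved in g2.

    Genomes are signed linear gene orders over the same genes, written as
    lists of nonzero integers, e.g. [1, -3, -2, 4].
    """
    genes = sorted(abs(g) for g in g1)
    if genes != sorted(abs(g) for g in g2):
        raise ValueError("both genomes must contain the same genes")

    # Sentinels 0 and `end` frame each genome so that its two ends also
    # count as adjacencies (an assumption of this linear-genome sketch).
    end = (genes[-1] if genes else 0) + 1

    def framed(genome):
        return [0] + list(genome) + [end]

    # Every adjacency (a, b) of g2 can also be read on the other strand
    # as (-b, -a); both readings describe the same adjacency.
    conserved = set()
    f2 = framed(g2)
    for a, b in zip(f2, f2[1:]):
        conserved.add((a, b))
        conserved.add((-b, -a))

    f1 = framed(g1)
    return sum((a, b) not in conserved for a, b in zip(f1, f1[1:]))


if __name__ == "__main__":
    # Inverting the segment 2..3 creates two breakpoints.
    print(breakpoint_distance([1, 2, 3, 4], [1, -3, -2, 4]))
```

Computing the inversion distance returned by "invdist" is considerably
more involved and is not attempted here.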
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
A k-mer is a substring of length k, and counting the occurrences of all such
substrings is a central step in many analyses of DNA sequence. Jellyfish can
count k-mers quickly by using an efficient encoding of a hash table and by
exploiting the "compare-and-swap" CPU instruction to increase parallelism.
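To make the k-mer counting task concrete, here is a minimal single-threaded
Python sketch. It only shows what is being counted; it does not reproduce
Jellyfish's compact hash table or its use of compare-and-swap. The function
name and the decision to skip k-mers containing characters other than
A, C, G, T are assumptions of the example.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Return a Counter mapping each k-mer in `sequence` to its count."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts = Counter()
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows with ambiguous bases such as N (sketch assumption).
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers("ACGTACGT", 4).items()):
        print(kmer, n)
```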
MAFFT offers a range of multiple alignment strategies, including L-INS-i
(accurate; recommended for <200 sequences), FFT-NS-i (standard speed and
accuracy), and FFT-NS-2 (fast; recommended for >2,000 sequences).
According to BAliBASE and other benchmark tests, L-INS-i is one of the
most accurate methods currently available.
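The size-based recommendations above could be turned into a command line
roughly as follows. This Python sketch only builds the argument list; the
helper name is hypothetical, the choice of FFT-NS-i for inputs between 200
and 2,000 sequences is an assumption, and the option spellings are the ones
given in the MAFFT manual as far as the editor knows, so check them against
the installed version.

```python
def mafft_command(input_path, n_sequences):
    """Build a mafft argument list from the number of input sequences."""
    if n_sequences < 200:
        # L-INS-i: accurate, for small inputs.
        options = ["--localpair", "--maxiterate", "1000"]
    elif n_sequences > 2000:
        # FFT-NS-2: fast progressive method, for large inputs.
        options = ["--retree", "2", "--maxiterate", "0"]
    else:
        # FFT-NS-i: iterative refinement (middle range is an assumption).
        options = ["--retree", "2", "--maxiterate", "1000"]
    return ["mafft"] + options + [input_path]


if __name__ == "__main__":
    print(" ".join(mafft_command("input.fasta", 150)))
```

MAFFT writes the alignment to standard output, so a caller would normally
redirect it to a file when running the resulting command.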
MAFFT is described in:
K. Katoh and H. Toh (Briefings in Bioinformatics 9:286-298, 2008)
Recent developments in the MAFFT multiple sequence alignment program.
K. Katoh, K. Misawa, K. Kuma and T. Miyata (Nucleic Acids Res. 30:
3059-3066, 2002) MAFFT: a novel method for rapid multiple sequence
alignment based on fast Fourier transform.
The Basic Local Alignment Search Tool (BLAST) finds regions of local
similarity between sequences. The program compares nucleotide or protein
sequences to sequence databases and calculates the statistical
significance of matches. BLAST can be used to infer functional and
evolutionary relationships between sequences as well as help identify
members of gene families.
AcePerl is an object-oriented Perl interface for the ACEDB genome database
system. It provides functionality for connecting to remote ACEDB databases,
performing queries, fetching ACE objects, and updating databases.
Bio::GFF3 provides low-level, fast functions for parsing GFF version 3 files.
All they do is convert back and forth between low-level Perl data
structures and GFF3 text.
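The kind of round trip described above can be sketched in Python as
follows. This is not the Perl module's API: the function names and the
dictionary keys are assumptions chosen for the example, and only the basic
nine-column layout and percent-escaping of GFF3 are handled.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase")


def _escape(text, extra=""):
    # Percent-encode characters that would break the tab-separated layout.
    return "".join("%{:02X}".format(ord(c)) if c in "%\t\n\r" + extra else c
                   for c in text)


def parse_gff3_line(line):
    """Convert one GFF3 feature line into a plain dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GFF3 feature line has 9 tab-separated columns")
    # "." marks an undefined value; all other values are kept as strings.
    feature = {name: None if value == "." else unquote(value)
               for name, value in zip(GFF3_COLUMNS, fields)}
    attributes = {}
    if fields[8] != ".":
        for pair in fields[8].split(";"):
            if pair:
                tag, _, value = pair.partition("=")
                attributes[unquote(tag)] = [unquote(v)
                                            for v in value.split(",")]
    feature["attributes"] = attributes
    return feature


def format_gff3_line(feature):
    """Convert a dictionary from parse_gff3_line back into GFF3 text."""
    columns = ["." if feature[name] is None else _escape(feature[name])
               for name in GFF3_COLUMNS]
    pairs = [_escape(tag, ";=&,") + "=" +
             ",".join(_escape(v, ";=&,") for v in values)
             for tag, values in feature["attributes"].items()]
    columns.append(";".join(pairs) if pairs else ".")
    return "\t".join(columns)


if __name__ == "__main__":
    text = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
    print(parse_gff3_line(text)["attributes"])
```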
Phrap is a program for assembling shotgun DNA sequence data.
Among other features, it allows use of the entire read and not just the
trimmed high quality part, it uses a combination of user-supplied and
internally computed data quality information to improve assembly accuracy
in the presence of repeats, it constructs the contig sequence as a mosaic
of the highest quality read segments rather than a consensus, it provides
extensive assembly information to assist in trouble-shooting assembly
problems, and it handles large datasets.
This package also contains Swat and Cross_match.
Swat is a program for searching one or more DNA or protein query sequences
against a sequence database, using (an efficient implementation of) the
Smith-Waterman-Gotoh algorithm.
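For readers unfamiliar with the algorithm named above, the following
Python sketch computes the best local alignment score with affine gap
penalties (Gotoh's recurrences), using two rolling rows. It is not Swat's
implementation, and the function name, the simple match/mismatch scoring,
and the default penalty values are arbitrary assumptions for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1,
                          gap_open=3, gap_extend=1):
    """Best Smith-Waterman score of a vs. b with affine gap penalties.

    A gap of length n costs gap_open + (n - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    prev_h = [0] * (len(b) + 1)        # best score ending at (i-1, j)
    prev_f = [neg_inf] * (len(b) + 1)  # best score ending in a gap in b
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (len(b) + 1)
        cur_f = [neg_inf] * (len(b) + 1)
        e = neg_inf                    # best score ending in a gap in a
        for j in range(1, len(b) + 1):
            # Extend an existing gap or open a new one, in each direction.
            e = max(e - gap_extend, cur_h[j - 1] - gap_open)
            cur_f[j] = max(prev_f[j] - gap_extend, prev_h[j] - gap_open)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1]
                                    else mismatch)
            # The 0 lets a local alignment start afresh at any cell.
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


if __name__ == "__main__":
    print(local_alignment_score("ACGTACGT", "ACGTTACGT"))
```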
Cross_match is a general-purpose utility based on Swat for comparing any
two sets of DNA sequences. It can be used to:
* produce vector-masked versions of a set of reads
* compare a set of cDNA sequences to a set of cosmids
* compare contigs found by two alternative assembly procedures to each other
* compare phrap contigs to the final edited cosmid sequence
Phred reads DNA sequencer trace data, calls bases, assigns quality values
to the bases, and writes the base calls and quality values to output files.
Trace data is read from chromatogram files in the SCF, ABI, and EST formats,
even if the files were compressed using gzip, bzip2, or UNIX compress.
Quality values are written to FASTA format files or PHD files, which can be
used by the Phrap sequence assembly program in order to increase the accuracy
of the assembled sequence.
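Phred-style quality values are conventionally defined from the estimated
error probability P of a base call as Q = -10 * log10(P), so Q = 20
corresponds to a 1-in-100 chance of error. The short Python sketch below
converts in both directions; the function names are hypothetical and the
code is independent of the Phred program itself.

```python
import math


def error_probability_to_quality(p_error):
    """Return the quality value Q for an error probability 0 < P <= 1."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def quality_to_error_probability(quality):
    """Return the error probability P for a quality value Q."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, quality_to_error_probability(q))
```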
Base calling and quality value accuracies tested for:
ABI models 373, 377, and 3700
Molecular Dynamics MegaBACE
LI-COR 4000
Base calling accuracies tested for:
ABI model 3100
Beckman CEQ
It also contains a data evaluation program called 'daev'.
See DAEV.DOC for more information.
To build this package, you must first obtain the tarball by e-mail. See
the web site below.
p5-Bio-Graphics is a simple GD-based renderer (diagram drawer)
for DNA and protein sequences.
p5-Bio-MAGETAB contains the core MAGE-TAB Utilities Perl modules. This
is a beta release. All functions have now been implemented and most
have test suites; the exceptions include the modules involved in
export of MAGE-TAB documents, which are still a little experimental in
nature. The API is mostly finalised (and fully documented), but some
details may yet change where necessary to improve usability.