Comparative Genomic Analysis Tools (CGAT)
These programs can handle arbitrarily large DNA sequences. They were written to analyze genomic sequences. Sets of BAC or PAC analyses can be merged as the sequence of overlapping clones becomes available.
The programs run on a UNIX platform, and all use a text interface. They are written in Perl (with the exception of lineplot, which is written in C), and can thus be easily ported to the Mac or to Windows. Some of the programs have been ported to the Macintosh.
Instructions for downloading and installing the package of programs.
Instructions for using the package of programs
Getting started using the package of programs.
Instructions on using fplot and on the fplot file format.
DNA formats used by these programs.
Program descriptions
Follow the links to view the help page for each program.
dna_to_fplot
Runs the most common set of analyses automatically. Genomic DNA is masked, searched against the GenBank databases using BLASTN, and searched for exons and other features using GRAIL 2. cDNA sequence is also translated and searched against the nr database. For cDNA sequences, ORF analysis is performed, but GRAIL 2 isn't used. The results of these analyses are combined and displayed as an HTML page.
fplot
Displays sequence features graphed along the DNA. Produces output in PostScript or dynamic HTML format. The input file is a lightly structured text file that can easily be edited by hand or with an associated program, preplot.
dna_plot
Produces a text file showing the DNA sequence with features from an fplot format file, splice sites and polyA sites, and the conceptual translation graphed along the DNA.
preplot
Imports output from other sequence analysis programs into an fplot input file, and manipulates an fplot input file. Currently, it can read GRAIL services, lineplot, GRAIL repeats, and MZEF output files.
blast_off
Searches arbitrarily long sequences against the NCBI databases using the BLAST email server.
parse
Summarizes and filters BLAST output. Works with blast_off output, or with saved BLAST searches. Output can optionally be sent to preplot and fplot. Uninteresting database sequences or classes of sequences can be filtered out. BLAST results can also be filtered based on the strength of the matches (percent match, P value, or HSP score).
lineplot
Performs a filtered dotplot comparison of two DNA sequences.
grailer
Sends DNA off for GRAIL services: exon prediction, Pol II promoter prediction, polyA site prediction, and CpG island prediction.
mask
Masks repetitive sequences using the GRAIL server. The sequence is masked for both complex repeats (B1, Alu, LINE) and simple sequence repeats.
repeat_now
Sends DNA off to the RepeatMasker email server (results come by email).
cutdna
Gives the requested DNA subsequence. Can be used to filter line numbers or other garbage from DNA, and will also give the reverse complement, or format the DNA.
count_bp
Gives the count of bp in the DNA.
GC_content
Calculates the GC content of the DNA and makes an fplot graph of GC content along the sequence.
re
Lists restriction enzyme sites in the DNA.
orf
Generates a list of open reading frames (ORFs) in the DNA, and the corresponding AA sequences.
trans
Translates DNA to AA. The translations can be printed directly, or with the AAs printed along the DNA.
rnaspl_to_fplot
Formats the results of RNASPL as an fplot file. RNASPL is a program that finds exon-exon splice sites in cDNA sequence.
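
To give a feel for what the small sequence utilities listed above (cutdna, count_bp, and GC_content) compute, here is a minimal sketch in Python. It is not the package's own code, which is written in Perl. Every function name, the 1-based inclusive coordinates, the rule of keeping only alphabetic characters when cleaning, and the non-overlapping window scheme are assumptions made for illustration only.

```python
# Illustrative sketch only; not taken from the CGAT programs.
# All names and conventions here are hypothetical.

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def clean_dna(raw):
    """Drop line numbers, whitespace, and other non-letters; upper-case the rest."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def subsequence(dna, start, end):
    """Return bases start..end, using 1-based inclusive coordinates (an assumption)."""
    return dna[start - 1:end]


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def count_bp(dna):
    """Number of bases in an already cleaned sequence."""
    return len(dna)


def gc_content(dna):
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    if not dna:
        return 0.0
    gc = sum(1 for base in dna.upper() if base in "GC")
    return gc / len(dna)


def gc_windows(dna, window=100):
    """Yield (start, end, gc_fraction) for consecutive non-overlapping windows."""
    for offset in range(0, len(dna), window):
        chunk = dna[offset:offset + window]
        yield offset + 1, offset + len(chunk), gc_content(chunk)


if __name__ == "__main__":
    dna = clean_dna("1 ATGCGC GGATCC\n13 TTAACG")
    print(dna, count_bp(dna))
    print(reverse_complement(subsequence(dna, 1, 6)))
    for start, end, gc in gc_windows(dna, window=6):
        print(start, end, round(gc, 2))
```

The per-window values from gc_windows are the kind of data that could be graphed along the sequence. How GC_content actually chooses its windows and writes its fplot output is not described here, so that part is left out.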