Help on DNA formats

My scripts can use DNA formatted in the FASTA format, bare DNA (a file that
only contains the DNA sequence and nothing else), or DNA plus non-DNA
characters.  Some handle DNA with annotations, which always start with a ';'.
GenBank entry files are also supported.

A FASTA file contains a single description line that begins with a
greater-than sign ">", followed by the DNA sequence on the lines after it.

Example:

>DNA seq. 1
aaaggcgcgcgcgccggtgtgtgtgt
atatatatctctctttgagagagagag
ttttt

The FASTA format allows a file to contain multiple DNA sequences, each with
its own description line.  However, most of my scripts can handle only one DNA
sequence per file; count_bp can handle input containing multiple FASTA
sequences.
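
As an illustration of how multi-sequence FASTA input is laid out, here is a
minimal Python sketch that splits FASTA text into (description, sequence)
pairs.  It is not how count_bp or any of these scripts is implemented; the
function name read_fasta and the second record ("DNA seq. 2") are made up for
the example.

```python
def read_fasta(text):
    """Split FASTA text into a list of (description, sequence) pairs.

    Hypothetical helper for illustration only; not part of these scripts.
    """
    records = []
    description = None   # description of the record being read, if any
    chunks = []          # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            chunks.append(line)           # sequence line of the current record
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# The first record is the example above; the second is invented.
example = """>DNA seq. 1
aaaggcgcgcgcgccggtgtgtgtgt
atatatatctctctttgagagagagag
ttttt
>DNA seq. 2
ggggcccc
"""

for description, sequence in read_fasta(example):
    print(description, len(sequence))
```

Each ">" line starts a new record, and the lines up to the next ">" are joined
into one sequence.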

Also, my scripts expect that DNA is either in FASTA format or bare.  Non-DNA
characters (characters not part of the degenerate DNA code: acgtnrybdhvkmsw)
are ignored, so line numbers are ignored too.  Leave them in or take them
out; it won't affect these scripts.
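
The rule of ignoring non-DNA characters can be sketched in a few lines of
Python.  This is only an illustration, not the code these scripts use; the
names DNA_CODES and keep_dna are made up, and treating upper-case letters the
same as lower-case ones is an assumption.

```python
# The degenerate DNA code listed above.
DNA_CODES = set("acgtnrybdhvkmsw")


def keep_dna(text):
    """Return only the characters of text that belong to the DNA code.

    Hypothetical helper; the case-insensitive match is an assumption.
    """
    return "".join(ch for ch in text if ch.lower() in DNA_CODES)


# Sequence lines with line numbers and spaces mixed in.
numbered = "1 aaaggcgcgc 10\n11 ggtgtgtgtg 20\n"
print(keep_dna(numbered))
```

Note that this sketch is meant for sequence lines only.  A description line
such as ">DNA seq. 1" contains letters that are also DNA codes, so it would
have to be set aside before filtering.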

Because non-DNA characters are ignored, you can use the program cutdna to
filter line numbers or formatting characters out of a DNA sequence, like so:

cutdna 1 end dna.clean

Last updated 4/28/97
Written by Jim Lund in the lab of Roger Reeves, Johns Hopkins University