Help on DNA formats
Last updated 4/28/97

My scripts accept DNA in FASTA format, bare DNA (a file that contains only the DNA sequence and nothing else), or DNA mixed with non-DNA characters. Some of them also handle DNA with annotations, which always start with a ';'. GenBank entry files are also supported.

In FASTA format, a sequence starts with a single description line that begins with a greater-than sign (">"), followed by the DNA. Example:

    >DNA seq. 1
    aaaggcgcgcgcgccggtgtgtgtgt
    atatatatctctctttgagagagagag
    ttttt

The FASTA format allows a file to contain multiple DNA sequences, each with its own description line. However, most of my scripts can handle only one DNA sequence per file, but count_bp can handle input containing multiple FASTA sequences. Also, my scripts expect DNA to be either in FASTA format or bare.

Non-DNA characters (characters that are not part of the degenerate DNA code: acgtnrybdhvkmsw) are ignored, so line numbers are ignored too. Leave them in or take them out; it won't affect these scripts. You can use this to filter line numbers or formatting characters out of a DNA sequence with the program cutdna, like so:

    cutdna 1 end
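The input rules above (ignore characters outside the degenerate DNA code, treat '>' lines as FASTA descriptions, skip ';' annotations, allow several records per file) can be sketched in Python. This is a minimal illustration, not the code of the scripts themselves: the names clean_dna and read_sequences are hypothetical, and the sketch assumes that annotations occupy whole lines beginning with ';' (the comment-line convention of the original FASTA format) and that any text before the first '>' line is bare DNA. GenBank entries are not handled.

```python
# Degenerate DNA alphabet listed in the text above, lowercase.
DNA_CHARS = frozenset("acgtnrybdhvkmsw")


def clean_dna(text):
    """Return only the degenerate-DNA letters in text, lowercased.

    Digits, whitespace, punctuation and any other letters are dropped,
    so line numbers and formatting characters disappear.
    """
    return "".join(c for c in text.lower() if c in DNA_CHARS)


def _append_record(records, desc, chunks):
    """Clean the collected lines and store them as one record.

    The empty "record" that precedes the first '>' line is skipped,
    but bare DNA with no header is kept with an empty description.
    """
    seq = clean_dna("".join(chunks))
    if desc is not None or seq:
        records.append((desc if desc is not None else "", seq))


def read_sequences(text):
    """Split FASTA or bare DNA text into (description, sequence) pairs.

    Assumptions (hypothetical, not taken from the original scripts):
    - a line starting with '>' begins a new record; the rest of the
      line is its description;
    - a line starting with ';' is an annotation and is skipped;
    - text before the first '>' line (or a file with no '>' at all)
      is bare DNA and gets an empty description.
    """
    records = []
    desc = None   # stays None until the first '>' line is seen
    chunks = []   # raw lines belonging to the current record
    for line in text.splitlines():
        if line.startswith(">"):
            _append_record(records, desc, chunks)
            desc, chunks = line[1:].strip(), []
        elif line.startswith(";"):
            continue  # annotation line: not part of the sequence
        else:
            chunks.append(line)
    _append_record(records, desc, chunks)
    return records


if __name__ == "__main__":
    example = """>DNA seq. 1
aaaggcgcgcgcgccggtgtgtgtgt
atatatatctctctttgagagagagag
ttttt
"""
    # Report the length of each record, in the spirit of count_bp.
    for desc, seq in read_sequences(example):
        print(f"{desc or '(bare DNA)'}\t{len(seq)} bp")
```

Because clean_dna discards everything outside the degenerate alphabet, numbered or spaced-out sequences come out as plain DNA, which is the property that makes stripping line numbers possible. The trade-off is that stray letters outside the alphabet (for example 'u' or 'x') are dropped silently, so a typo shortens the sequence instead of raising an error.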
Written by Jim Lund in the lab of Roger Reeves, Johns Hopkins University