count_bp

This Perl program counts the number of base pairs (bp) in a DNA sequence. By default it reads a DNA sequence from standard input, counts the base pairs, and writes the count to standard output. If the input contains more than one DNA sequence, indicated by more than one FASTA description line, the program writes the FASTA tag followed by the bp count for each sequence in the input.

DNA sequences can be in FASTA format or raw. Line numbers, white space, and characters that are not part of the DNA code are ignored. The masking character is counted as part of the DNA.
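
The counting rule above can be sketched in a few lines. This is a Python illustration, not the Perl source of count_bp. The function name count_bases and its parameters are made up for the example, and two behaviors are assumptions: matching is case-insensitive, and any line starting with '>' is a FASTA description line.

```python
def count_bases(text, mask="x", alphabet="acgtn"):
    """Count DNA characters in raw or FASTA text.

    Assumptions (not confirmed by the count_bp source): matching is
    case-insensitive, and FASTA description lines start with '>'.
    """
    allowed = set(alphabet.lower()) | set(mask.lower())
    total = 0
    for line in text.splitlines():
        if line.startswith(">"):
            continue  # FASTA description line, not sequence data
        # Digits (line numbers), white space, and other characters are skipped.
        total += sum(1 for ch in line.lower() if ch in allowed)
    return total


example = ">seq1 demo\n1 acgt nnxx\n11 ACGT-*\n"
print(count_bases(example))  # 12: eight characters on the first line, four on the second
```

Passing mask="" corresponds to having no masking character, so 'x' is then ignored like any other non-DNA character.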

Usage: count_bp [-i] [-c] [-n] [-mX] [-q] [-d] <dna.seq.file

Options:

-c When the input file contains more than one FASTA DNA sequence, the counts are cumulative, i.e. the size reported for the second sequence is the number of bp in the first and second sequences added together. Without this option, the size of each individual sequence is reported.

-n This argument forces the input file to be considered as a single DNA sequence.

-mX Set the mask character to X. The default is 'x'. The mask character is counted as a DNA bp. To specify no masking character, use -m alone, without a following character.

-q Print out this help message.

-i Interactive mode. When called this way, the program will ask which options to use, and prompt for a file name.

-d Use the full degenerate DNA code plus the masking character. The default is to treat (a,c,g,t,n) and the masking character as part of the DNA.
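
How -c, -n, -d, and -mX might combine can be illustrated with a short Python sketch. It is not the Perl implementation: the function report_counts, its keyword arguments, and the tab-separated output are hypothetical. The degenerate alphabet shown is the standard IUPAC nucleotide code, which the -d option is assumed to follow.

```python
IUPAC_DNA = "acgtryswkmbdhvn"  # standard IUPAC nucleotide codes (assumed for -d)


def report_counts(text, cumulative=False, single=False, degenerate=False, mask="x"):
    """Return (tag, count) pairs, one per FASTA record.

    cumulative ~ -c, single ~ -n, degenerate ~ -d, mask ~ -mX ("" for bare -m).
    """
    allowed = set(IUPAC_DNA if degenerate else "acgtn") | set(mask.lower())
    records = []  # each entry is [tag, count]
    for line in text.splitlines():
        if line.startswith(">"):
            if not single or not records:
                # Start a new record unless -n folds everything into one.
                records.append([line[1:].strip(), 0])
            continue
        if not records:
            records.append(["", 0])  # raw input with no description line
        records[-1][1] += sum(1 for ch in line.lower() if ch in allowed)
    if cumulative:
        running = 0
        for rec in records:
            running += rec[1]  # each count becomes the total so far
            rec[1] = running
    return [tuple(rec) for rec in records]


two = ">a\nacgt\n>b\nacgtry\n"
for tag, n in report_counts(two, cumulative=True):
    print(f"{tag}\t{n}")
```

With the two-record input above, the default call reports 4 bp for each record, because 'r' and 'y' are outside (a,c,g,t,n). Setting cumulative=True reports 4 and then 8, degenerate=True reports 4 and 6, and single=True reports one record of 8.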

Updated 6/9/98

Written by Jim Lund in the lab of Roger Reeves, Johns Hopkins University