re

This Perl program finds the restriction enzyme (RE) sites in a DNA sequence, and outputs a list of the positions of the sites for each RE.
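
The core idea, scanning a sequence for each recognition site and recording where it occurs, can be sketched in a few lines. This is a Python illustration only, not the script's Perl source. It assumes that sites are plain A/C/G/T strings, that only the given strand is searched, and that a site's position is the 1-based index of its first base; the script itself may differ on any of these points. The function name find_sites is hypothetical.

```python
def find_sites(dna, site):
    """Return the 1-based start position of every occurrence of site in dna."""
    dna = dna.upper()
    site = site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + 1)        # convert 0-based index to 1-based
        start = dna.find(site, start + 1)  # advance by one base so overlapping sites are found
    return positions


# Made-up example sequence, used only for illustration.
sequence = "AAGGATCCTTAGCTGGATCC"
print("BamHI", find_sites(sequence, "GGATCC"))
print("AluI", find_sites(sequence, "AGCT"))
```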

Usage: re [-eRE.file] [-s#,#] [-w#] [-r#,#] <DNA_file >output_file

This script reads the DNA sequence from standard input. Usually you will want the script to read the sequence from a file. '<DNA_file' does this, where 'DNA_file' is the file containing the DNA sequence.

The script's output is sent to standard output. '>output_file' directs the program's output to the file 'output_file'. If no output file is specified, the program's output is printed on the screen.

-e Switch specifies the file containing the REs. RE.list has all the usual ones, but you can make your own file by adding or deleting REs. The file RE.list is the default RE file. An alternate RE file can be specified using '-e'.

-s#,# Switch is used to specify a subsequence for analysis. For example, if you input a 50000 bp sequence and specify '-s40000,45000', then the script will only output RE sites found within the subsequence, and will retain the numbering of the larger sequence, e.g. 'BamHI 41333, 44001'.

The first base is base number 1. The last bp can be specified by number or by 'end'. '-s500,end' directs the script to analyze the DNA from bp 500 to the end of the sequence.
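
One way to read the '-s' rules above is sketched here in Python (an illustration, not the script's Perl source). The helper names parse_subseq and sites_in_range are hypothetical, and rejecting an out-of-range request with an error is an assumption; the script may handle that case differently.

```python
def parse_subseq(arg, seq_length):
    """Turn '500,end' or '40000,45000' into a 1-based, inclusive (first, last) pair."""
    first_text, last_text = arg.split(",")
    first = int(first_text)
    # 'end' stands for the last base of the input sequence.
    last = seq_length if last_text.lower() == "end" else int(last_text)
    if not 1 <= first <= last <= seq_length:
        # Assumption: treat an out-of-range request as an error.
        raise ValueError(f"subsequence {arg!r} is outside 1..{seq_length}")
    return first, last


def sites_in_range(positions, first, last):
    """Keep only site positions inside the subsequence, numbered as in the full sequence."""
    return [p for p in positions if first <= p <= last]


first, last = parse_subseq("40000,45000", 50000)
# Hypothetical site positions in a 50000 bp sequence.
print(sites_in_range([120, 41333, 44001, 49000], first, last))
```

Because the positions are never shifted, the sites keep the numbering of the full sequence, which matches the 'BamHI 41333, 44001' example.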

-w# Switch tells the script to only examine the DNA for REs which have a recognition site equal to or bigger than # nucleotides. '-w6' will show BamHI and NotI sites, but will not use AluI, a 4-base cutter.
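
The '-w' filter amounts to a length test on each recognition site. A minimal Python sketch (not the script's Perl source) follows. The dictionary layout and the name filter_by_width are assumptions made for illustration; the format of RE.list is not described here, and how the script counts degenerate bases is not known.

```python
# Recognition sites for the three enzymes named above.
enzymes = {
    "BamHI": "GGATCC",    # 6-base cutter
    "NotI": "GCGGCCGC",   # 8-base cutter
    "AluI": "AGCT",       # 4-base cutter
}


def filter_by_width(enzymes, min_width):
    """Keep only enzymes whose recognition site has at least min_width bases."""
    return {name: site for name, site in enzymes.items() if len(site) >= min_width}


# Equivalent of '-w6': AluI is dropped, BamHI and NotI remain.
print(sorted(filter_by_width(enzymes, 6)))
```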

-r Switch governs whether to report the RE sites found for a given RE, based on how many times that RE cuts. The different formats that can be used with this switch are: '-r#,#', '-r#,' and '-r,#'. '-r1,1' outputs only the REs that cut once in the examined seq. '-r1,9' outputs the REs that cut between 1 and 9 times in the DNA seq. '-r0,0' gives a list of the REs that don't cut the DNA seq. '-r1,' gives the REs that cut one or more times. '-r,10' gives the REs that cut 10 or fewer times.
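
The three '-r' formats can be read as an optional lower and upper bound on the number of cuts. The Python sketch below (not the script's Perl source) shows one such reading. The names parse_cut_range and should_report are hypothetical, and treating a missing lower bound as 0 is an assumption drawn from the "10 or fewer" wording.

```python
def parse_cut_range(arg):
    """Parse '1,9', '1,' or ',10' into (low, high); high is None when unbounded."""
    low_text, high_text = arg.split(",")
    low = int(low_text) if low_text else 0        # assumption: missing lower bound means 0
    high = int(high_text) if high_text else None  # missing upper bound means no limit
    return low, high


def should_report(count, low, high):
    """True if an enzyme that cuts `count` times falls inside the requested range."""
    return count >= low and (high is None or count <= high)


# Hypothetical cut counts, used only for illustration.
cut_counts = {"BamHI": 2, "NotI": 0, "AluI": 37}
low, high = parse_cut_range("1,9")
print([name for name, n in cut_counts.items() if should_report(n, low, high)])
```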

Examples:

re <DNA1.seq >sites.seq

re -s1001,5000 -w6 <DNA1.seq >sites.seq

re -s2000,end -w6 -r1, <DNA1.seq >sites.seq

The first example finds the RE sites in the DNA sequence in the file DNA1.seq and writes them to the file sites.seq. The second example does the same, but only REs that have recognition sites of 6 bp or longer are used, and only the DNA from bp 1001-5000 is analyzed. The third example analyzes the DNA from bp 2000 to the end of the sequence, again using only REs with recognition sites of 6 bp or longer, and only prints a given RE in the output file if the DNA has at least one site for it.



Written 1/28/97, updated 3/99

Written by Jim Lund in the lab of Roger Reeves, Johns Hopkins University