mask

This Perl 5 program reads a DNA sequence from the file named as its argument. The sequence is formatted and sent to the GRAIL server, which finds repetitive elements and simple sequence repeats and returns a list of repeats. The repetitive parts of the DNA are then masked with the character given in $mask, by default 'N'. The masked DNA is written to standard output.
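The masking step described above can be sketched as follows. This is a Python illustration, not the program's own Perl code. The function name mask_repeats is hypothetical, and the repeat list is assumed to be (start, end) pairs that are 1-based and inclusive; the actual format returned by the GRAIL server is not described here.

```python
def mask_repeats(seq, repeats, mask="N"):
    """Return seq with every repeat interval overwritten by the mask character.

    repeats is an iterable of (start, end) pairs. They are ASSUMED to be
    1-based and inclusive; the real GRAIL output format may differ.
    """
    bases = list(seq)
    for start, end in repeats:
        # Clamp the interval to the sequence so an out-of-range
        # coordinate cannot raise an IndexError.
        lo = max(start, 1) - 1
        hi = min(end, len(bases))
        for i in range(lo, hi):
            bases[i] = mask
    return "".join(bases)


# Mask positions 3-5 and 9-10 of a ten-base sequence.
print(mask_repeats("ACGTACGTAC", [(3, 5), (9, 10)]))  # prints ACNNNCGTNN
```

Overlapping intervals are harmless in this sketch, since a base that is already masked is simply overwritten again.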

More extensive help with mask

Usage: mask [-d -gRepeat.file -h|r -i -mX -ttest.log] DNA.file >DNA.mask

Options:

-d Use full degenerate DNA code plus masking character. Default is to consider (a, c, g, t, n) and masking character to be part of the DNA.

-gRepeat_file The list of repetitive elements returned from the GRAIL server will be written to file Repeat_file.

-h DNA will be searched for human repetitive elements.

-r DNA will be searched for mouse repetitive elements. This is the default, so the -r switch is not really needed. :) Either -h or -r can be used, but not both.

-mX Change the masking character in the input sequence. The default masking character is 'x'. The masking character is treated as a DNA bp. To specify no masking character, use -m alone, without a following character.

-q Print out this usage information.

-tTest.file The raw output from the Grail server will be written to the file Test.file. This will include any error messages received.
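The switch style listed above, where a value is attached directly to the switch letter (as in -gSeq1.repeats), can be sketched as below. This is a Python illustration under stated assumptions, not the program's Perl option handling. The function name parse_mask_args and the dictionary keys are hypothetical, and the -i switch that appears in the usage line is left out because it is not described.

```python
def parse_mask_args(argv):
    """Parse mask-style switches whose values are attached to the letter."""
    opts = {
        "degenerate": False,   # -d
        "species": "mouse",    # -h selects human, -r (default) mouse
        "mask_char": "x",      # -mX; "-m" alone means no masking character
        "repeat_file": None,   # -gRepeat_file
        "test_file": None,     # -tTest.file
        "show_usage": False,   # -q
        "dna_file": None,      # the positional DNA file
    }
    species_seen = set()
    for arg in argv:
        if not arg.startswith("-"):
            opts["dna_file"] = arg
        elif arg == "-d":
            opts["degenerate"] = True
        elif arg == "-h":
            species_seen.add("human")
        elif arg == "-r":
            species_seen.add("mouse")
        elif arg == "-q":
            opts["show_usage"] = True
        elif arg.startswith("-g"):
            opts["repeat_file"] = arg[2:]
        elif arg.startswith("-m"):
            opts["mask_char"] = arg[2:]  # empty string when -m is used alone
        elif arg.startswith("-t"):
            opts["test_file"] = arg[2:]
        else:
            raise ValueError("unknown switch: " + arg)
    # -h and -r are mutually exclusive.
    if len(species_seen) > 1:
        raise ValueError("use either -h or -r, not both")
    if species_seen:
        opts["species"] = species_seen.pop()
    return opts


print(parse_mask_args(["-gSeq1.repeats", "Seq1"])["repeat_file"])  # prints Seq1.repeats
```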

Example 1: mask -gSeq1.repeats Seq1 >Seq1.mask

Example 2: mask -h Seq2 >Seq2.mask

The DNA sequence can be raw, FASTA format, or FASTA format plus feature lines. Line numbers and whitespace are ignored, as is any character not part of the DNA code. By default, the sequence must use the strict DNA code, ACGTN + masking character. If the degenerate code option (-d) is selected, the input sequence can include bases in the full degenerate code, acgtn + rykmswbdhv + masking character. (The masking character is accepted as input so that mask can be run several times on a sequence.)
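The input filtering described above can be sketched as follows. This is a Python illustration, not the program's Perl code. The name clean_dna is hypothetical, and the sketch assumes that letters are matched case-insensitively and that only FASTA header lines beginning with '>' need to be skipped. Feature lines are not handled because their layout is not described here.

```python
STRICT = set("acgtn")
DEGENERATE = STRICT | set("rykmswbdhv")


def clean_dna(text, degenerate=False, mask_char="x"):
    """Extract the DNA characters from raw or FASTA-style text."""
    allowed = set(DEGENERATE if degenerate else STRICT)
    if mask_char:
        # The masking character counts as part of the DNA alphabet.
        allowed.add(mask_char.lower())
    kept = []
    for line in text.splitlines():
        if line.startswith(">"):
            # FASTA header line, not part of the sequence.
            continue
        # Line numbers, whitespace and any other character outside the
        # alphabet are dropped by this filter.
        kept.extend(ch for ch in line if ch.lower() in allowed)
    return "".join(kept)


sample = ">seq1 test\n1 ACGT NNxx\n11 acgr-yt\n"
print(clean_dna(sample))                   # prints ACGTNNxxacgt
print(clean_dna(sample, degenerate=True))  # prints ACGTNNxxacgryt
```

Passing mask_char="" corresponds to using -m alone, so that no masking character is accepted in the input.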

Last modified: 11/98

Written by Jim Lund in the lab of Roger Reeves, Johns Hopkins University