LAB 9:  Gene Finding

Artemis DNA Sequence Viewer and Annotation Tool
GeneMark
GeneMark home page
Genscan
E. coli K12 DNA sequence
glyceraldehyde-3-phosphate dehydrogenase Ptroglodytes2.fasta

The primary objectives are:

  • Learn how to use GeneMark and GENSCAN to find genes in DNA.

  • Understand gene prediction data.

  • Learn how to integrate different types of gene prediction data.

 

1. Let's try a simple gene-finding exercise. Use predictive gene finding to find the best gene model for the linked segment of E. coli DNA, Ecoli3.fasta. Use Artemis to annotate the following features on this molecule:
  1. cistrons (as genes or CDS features).
  2. RBS
  • a. Open the sequence in Artemis. Search for ORFs (menu item Create->"Mark Open Reading Frames"). How many ORFs > 50 aa with start and stop codons are found in this sequence?

  • b. Now use GeneMark to search for likely genes in this sequence. Select the PDF output and predicted gene options. How many genes does GeneMark find?

  • c. Use this information to annotate genes in the DNA sequence. Use BLAST to search for proteins with homology to the GeneMark-predicted proteins and characterize the proteins. You can either BLAST the GeneMark predictions yourself or annotate them in Artemis and use the Run->"NCBI Searches" menu tool. A strong BLAST match provides additional support for a gene prediction and may identify the gene. For each predicted protein, indicate a database match (if one is found). Answer in this format: Gene1: 123-234bp, Glucose dehydrogenase, Gene2...
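
As a cross-check on the ORF count from part a, the scan can be sketched in a few lines of Python. This is a minimal sketch, not a reimplementation of Artemis: it assumes ATG is the only start codon, uses the standard stop codons, and reports the longest ORF per stop codon (from the first in-frame ATG), so its count may differ from what Artemis reports. The function names `reverse_complement` and `find_orfs` are illustrative only.

```python
STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"  # assumption: only ATG is treated as a start codon


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_orfs(seq, min_aa=50):
    """Scan all six reading frames for ORFs longer than min_aa codons.

    Returns tuples (strand, frame, start, end, aa_length). start and end are
    0-based, end-exclusive offsets on the strand that was scanned, and end
    includes the stop codon. aa_length counts codons from the start codon up
    to, but not including, the stop codon.
    """
    orfs = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == START:
                    orf_start = i  # first in-frame ATG after the last stop
                elif orf_start is not None and codon in STOPS:
                    aa_len = (i - orf_start) // 3
                    if aa_len > min_aa:
                        orfs.append((strand, frame, orf_start, i + 3, aa_len))
                    orf_start = None
    return orfs
```

To use it on the lab sequence, read Ecoli3.fasta, drop the header line, join the remaining lines into one string, and pass that string to `find_orfs`. Coordinates on the "-" strand refer to the reverse-complemented sequence and need converting before comparing with Artemis.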


2. Please use a human DNA sequence, the gene for human glyceraldehyde-3-phosphate dehydrogenase (a highly expressed gene central to metabolism), for the following questions:
  • a. Examine the gene features (using Artemis or a text editor). What effect would you expect from deleting the translation initiation site (also called the Kozak sequence, consensus [A/G]CCATGG)? Briefly describe the effect on gene expression.

  • b. Would the mRNA be significantly shorter? Yes/No, a sentence or two.

  • c. Would the encoded polypeptide likely be different? Yes/No, a sentence or two, AND paste the TRANSLATED product that you expect in one-letter code.

  • d. Use the full gene for d. and e. BLASTN can be used to search the human EST database. (EST stands for expressed sequence tag. ESTs are short DNA sequences read from the ends of randomly picked cDNA clones. There are millions of these in the databases.) How could you use this information to test the exon/intron structure? Does this information support the exon/intron structure annotated for this gene? Do the ESTs show evidence of alternative splicing in this gene? Yes/No, and a few sentences of explanation.

  • e. Please predict the exon structure of this gene using Genscan, a very effective gene finder based on a hidden Markov model. Very briefly indicate how well the intron/exon structure that Genscan predicts matches the annotation.

  • f. Use the Genscan help pages to answer this question. Does Genscan work better on low or high GC-content genomic DNA?
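
Two quantities come up in the questions above: where the Kozak consensus occurs in the sequence (part a) and the GC content of the DNA (part f). The sketch below shows one way to compute both. It assumes the consensus exactly as written in part a, [A/G]CCATGG, with no mismatches allowed, and the helper names `kozak_sites` and `gc_content` are illustrative only. Real initiation sites often deviate from the consensus, so an empty result does not mean there is no start site.

```python
import re

# Kozak consensus as given in the lab text: [A/G]CCATGG.
# The lookahead keeps overlapping matches.
KOZAK = re.compile(r"(?=([AG]CCATGG))")


def kozak_sites(seq):
    """Return 0-based positions of the A in ATG for each exact consensus match."""
    return [m.start() + 3 for m in KOZAK.finditer(seq.upper())]


def gc_content(seq):
    """Return the fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt
```

Comparing `gc_content` for the human and chimpanzee sequences against the ranges described in the Genscan help pages gives some context for the answer to part f.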


3. Please use a chimpanzee DNA sequence, Ptroglodytes2.fasta, for the following questions:
  • a. Please predict genes in this segment of genomic DNA using Genscan. Describe the genes predicted, giving a name (e.g., gene 1), # of exons, and a brief description of the predicted protein for each.

  • b. This genomic DNA is from Chr 21, bp 18,340K - 18,520K. Find this region in the NCBI Map Viewer for chimpanzee and compare the predictions. To find the Map Viewer, follow the links 'All Resources', then 'Map Viewer', then click on 'Pan troglodytes (chimpanzee) Build 2.1', then on the chromosome ideogram. At the left of the map view, you can enter bp coordinates. Indicate genes found correctly (or partially correctly) by Genscan, genes predicted by Genscan that are not present in the reference annotation, and genes found only in the reference annotation. To see the Map Viewer exon bp locations, click on 'Download/View Sequence/Evidence', then 'Display' in GenBank format.

  • c. What other information could you use to refine the Genscan predictions? Give three additional sources of information and describe which aspect(s) of the prediction each would help with.
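
The comparison in part b comes down to checking which predicted gene spans overlap which annotated ones. A minimal sketch of that bookkeeping follows. It assumes each gene is reduced to a closed (start, end) interval in the same coordinate system, and the names `overlaps` and `compare_gene_sets` are illustrative only. Note that Genscan reports positions relative to the submitted segment, so an offset has to be added before comparing with chromosome coordinates.

```python
def overlaps(a, b):
    """True if two closed intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_gene_sets(predicted, reference):
    """Sort gene intervals into overlapping, predicted-only and reference-only."""
    shared = [(p, r) for p in predicted for r in reference if overlaps(p, r)]
    predicted_hit = {p for p, _ in shared}
    reference_hit = {r for _, r in shared}
    return {
        "overlapping": shared,
        "predicted_only": [p for p in predicted if p not in predicted_hit],
        "reference_only": [r for r in reference if r not in reference_hit],
    }


# Made-up coordinates for illustration only; these are not real gene positions.
example = compare_gene_sets(
    predicted=[(100, 500), (900, 1200)],
    reference=[(450, 800), (2000, 2500)],
)
```

An overlap only shows that two gene models cover some of the same bases. Deciding whether a gene is found correctly or only partially correctly still requires comparing the individual exon boundaries.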


    University of Kentucky  BIO520
    Site maintained by Jim Lund