1 a. Let's look at the COGs site, specifically at the eukaryotic clusters of orthologous groups (KOGs). How many KOGs are there in total? How many human, fly, and worm genes are assigned to a KOG?
1 b. We can use the KOGs to measure how many genes are conserved between pairs of species. The KOGs are grouped by species conservation pattern. How many genes belong to KOGs that contain both human and worm genes? To get the totals, add up the numbers of worm genes and of human genes across every class of KOG that includes both species. What percent of all worm genes is the worm total? What percent of all human genes is the human total?
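The arithmetic in 1b is a sum followed by a ratio. A minimal Python sketch, in which the class names and every count are hypothetical placeholders for the numbers you read off the KOG site:

```python
# Minimal sketch: turn per-class KOG gene counts into a "percent of genome" figure.
# The class labels and all numbers below are HYPOTHETICAL placeholders.

def percent_conserved(counts_by_class, total_genes):
    """Sum gene counts over KOG classes and express the sum as a percent of total."""
    conserved = sum(counts_by_class.values())
    return conserved, 100.0 * conserved / total_genes


# Hypothetical worm gene counts in each KOG class that contains both human and worm.
worm_counts = {"class_1": 1200, "class_2": 500, "class_3": 300}
worm_total = 20000  # hypothetical total number of worm genes

n, pct = percent_conserved(worm_counts, worm_total)
print(f"{n} worm genes in human+worm KOGs = {pct:.1f}% of worm genes")
```

Run it again with the human counts and the total number of human genes to get the human percentage.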
1 c. The number of conserved genes depends on the criteria used: whether orthologs or homologs are counted, and what cutoffs are used to call homology. The Eugenes site also describes homologs between pairs of species (in the 'All Genes summaries' links). Eugenes uses BLASTP searches to compare each protein in one species against a library of proteins from other species, and calls a match if the E-value of the alignment is less than 1e-30. How many human genes have worm homologs according to Eugenes? How many worm genes have human homologs?
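The Eugenes homology call can be sketched as a filter over BLASTP hits. This sketch is not Eugenes' own pipeline; it assumes the results are in BLAST+ tabular format (`-outfmt 6`), where the 11th column is the E-value, and the gene names and rows are hypothetical.

```python
# Minimal sketch of an E-value homology call, ASSUMING BLAST+ tabular output
# (-outfmt 6, 12 tab-separated columns, E-value in column 11).
# The example rows and gene names are HYPOTHETICAL.

E_CUTOFF = 1e-30  # a match is called if the E-value is below this


def genes_with_homologs(tabular_lines, cutoff=E_CUTOFF):
    """Return the set of query genes with at least one hit below the cutoff."""
    hits = set()
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        query, evalue = fields[0], float(fields[10])
        if evalue < cutoff:
            hits.add(query)
    return hits


# Hypothetical human-vs-worm BLASTP rows.
example = [
    "humanA\twormX\t45.0\t300\t160\t3\t1\t300\t1\t300\t1e-80\t290",
    "humanB\twormY\t30.0\t120\t84\t4\t1\t120\t5\t124\t1e-10\t60",
    "humanB\twormZ\t35.0\t200\t130\t6\t1\t200\t1\t200\t2e-31\t120",
    "humanC\twormX\t25.0\t90\t67\t3\t1\t90\t1\t90\t0.5\t30",
]
print(sorted(genes_with_homologs(example)))
```

The direction of the search matters: human queries against a worm library count human genes with worm homologs, and the reverse search counts worm genes with human homologs. Because one protein can match several proteins in the other species (for example, members of a gene family), the two counts usually differ.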
2 a. We saw in one of the early labs that some genes are very ancient; the globin protein domain, for example, is conserved back to prokaryotes. Other genes have a more restricted phylogenetic distribution. Use the KOG database to look for genes found in invertebrate metazoans but not in mammals or plants (fungi are OK). Give two such KOGs, a gene from each KOG, and a description of the gene.
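The phylogenetic-pattern search in 2a is set logic over the species represented in each KOG. A minimal sketch, assuming each KOG is reduced to the set of KOG species codes it contains (hsa for human, cel for worm, dme for fly, ath for Arabidopsis, sce and spo for the yeasts); the KOG names are hypothetical:

```python
# Minimal sketch of a phylogenetic-pattern filter.
# ASSUMPTION: each KOG is represented by the set of species codes it contains.
# The KOG names below are HYPOTHETICAL.

INVERTEBRATES = {"cel", "dme"}  # worm, fly
EXCLUDED = {"hsa", "ath"}       # no human (mammal) and no plant genes allowed


def invertebrate_not_mammal_or_plant(kogs):
    """Keep KOGs with at least one invertebrate and no human or plant genes.

    Fungal genes are allowed, so yeast codes are simply ignored.
    """
    return sorted(
        name for name, species in kogs.items()
        if (species & INVERTEBRATES) and not (species & EXCLUDED)
    )


kogs = {
    "KOG_hypothetical_1": {"cel", "dme"},
    "KOG_hypothetical_2": {"cel", "dme", "sce"},
    "KOG_hypothetical_3": {"cel", "dme", "hsa"},
    "KOG_hypothetical_4": {"ath", "cel"},
}
print(invertebrate_not_mammal_or_plant(kogs))
```

This version accepts a KOG with either worm or fly genes; to require both, replace the first test with `INVERTEBRATES <= species`. The same logic carries over to the HomoloGene search in 2b with a different species list.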
2 b. Humans are the only mammal in the KOG analysis, so we can't find mammal-specific genes using KOGs. NCBI does have a more extensive database of homologs, HomoloGene. Search HomoloGene for genes found in mammals but not in C. elegans or D. melanogaster. How many HomoloGene groups with this conservation pattern are in the database? Give two of these genes (pick ones with an informative annotation).
3. Now let's look at conservation of genes and non-coding sequence. Conserved non-coding sequences near genes are often regulatory elements. These conserved sequences can be identified by aligning genes and comparing the genomic sequence between pairs of species. Having genomic sequences for several organisms in a group allows a more detailed analysis: you can ask when a block of conserved sequence arose, and whether a particular block is present in all the organisms in the group or has been lost in some species. The rhesus monkey sequence is not useful for this because it is too similar to the human sequence.
3 a. Use VISTA to look at genomic sequence conservation of AKT1. Select Human as the 'Reference' (base) genome, and use the 'Position' box to search for AKT1. There are two matches for AKT1, corresponding to different splice forms of the gene; select the larger splice variant. By default, the human genomic sequence is compared with a group of organisms ranging from rhesus monkey to chicken. There are many tracks in this browser. The top track shows exons, and the bottom tracks are graphs of sequence conservation. The sequence track colors exon sequence blue and conserved non-coding sequence (peaks above 70% identity, an arbitrary cutoff) red. Which exons of AKT1 are conserved between human and mouse at above 70% sequence identity?
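The conservation graphs are percent-identity curves computed along a pairwise alignment. The sketch below shows one way such a curve can be computed; it is not VISTA's own code. It assumes the human and mouse sequences are already aligned with '-' for gaps, and the 100 bp default window is an assumption rather than a value given in this lab.

```python
# Minimal sketch of a sliding-window percent-identity curve.
# ASSUMPTIONS: sequences are pre-aligned with '-' for gaps; the window size is
# an assumed parameter. The toy sequences are HYPOTHETICAL.

def window_identity(ref_aln, other_aln, window=100):
    """Percent identity for each window start, in reference (human) coordinates.

    Alignment columns where the reference has a gap are dropped, so positions
    line up with the human sequence.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the bases are identical, 0 for a mismatch or a gap in the other species.
    matches = [
        1 if r == o else 0
        for r, o in zip(ref_aln.upper(), other_aln.upper())
        if r != "-"
    ]
    if len(matches) < window:
        return []
    current = sum(matches[:window])
    scores = [100.0 * current / window]
    # Slide the window one base at a time, updating the running match count.
    for i in range(window, len(matches)):
        current += matches[i] - matches[i - window]
        scores.append(100.0 * current / window)
    return scores


# Toy example: one mismatch at position 4, 5 bp window.
curve = window_identity("ACGTACGTAC", "ACGTTCGTAC", window=5)
print(curve)  # [80.0, 80.0, 80.0, 80.0, 80.0, 100.0]
```

A longer window smooths the curve; a shorter one can pick out small elements but is noisier.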
3 b. Examine the AKT1 genomic region from 2 kb 5' of AKT1 to 2 kb 3' of AKT1. Describe the conserved non-coding sequences between human and mouse that you find near the 5' or 3' end of the gene and in introns. This sort of analysis is not always simple. Some sequences are not conserved at all, perhaps indicating that the element is not present in a particular species. Other elements show some conservation but not a lot, perhaps indicating that the element is present but is diverging in sequence and perhaps in function. List the regions of conserved sequence you find, and give the % conservation, approximate size, and location (5', intron #, or 3') of each.
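Reading regions off the graph amounts to finding runs above the cutoff and labeling where each run falls relative to the exons. A minimal sketch, assuming 0-based human coordinates running 5' to 3' along the gene (reverse them first for a minus-strand gene) and a minimum run length as a tunable parameter; all coordinates and identity values are hypothetical toy data:

```python
# Minimal sketch: call conserved regions from a per-position identity curve
# and label each one by location relative to the gene.
# ASSUMPTIONS: 0-based coordinates running 5' to 3' along the gene; a region
# must stay at or above the cutoff for at least `min_len` bp. All data HYPOTHETICAL.

def conserved_regions(identity, cutoff=70.0, min_len=100):
    """Return (start, end, mean_identity) for each qualifying run; end is exclusive."""
    regions = []
    start = None
    # A trailing sentinel below any cutoff closes a run that reaches the end.
    for pos, value in enumerate(list(identity) + [float("-inf")]):
        if value >= cutoff and start is None:
            start = pos
        elif value < cutoff and start is not None:
            if pos - start >= min_len:
                run = identity[start:pos]
                regions.append((start, pos, sum(run) / len(run)))
            start = None
    return regions


def classify(start, end, exons):
    """Label a region as 5', exon N, intron N, or 3' given sorted exons [(s, e), ...]."""
    if end <= exons[0][0]:
        return "5'"
    if start >= exons[-1][1]:
        return "3'"
    for i, (s, e) in enumerate(exons, start=1):
        if start < e and end > s:
            return f"exon {i}"
    for i in range(len(exons) - 1):
        if start >= exons[i][1] and end <= exons[i + 1][0]:
            return f"intron {i + 1}"
    return "unclassified"


# Hypothetical identity curve and exon coordinates.
identity_profile = (
    [80.0] * 6 + [50.0] * 14 + [90.0] * 10 + [50.0] * 10
    + [75.0] * 8 + [50.0] * 12 + [85.0] * 6 + [40.0] * 4
)
exons = [(20, 30), (50, 58)]

for start, end, mean in conserved_regions(identity_profile, min_len=5):
    label = classify(start, end, exons)
    print(f"{label}: {start}-{end} ({end - start} bp, {mean:.0f}% identity)")
```

The `min_len=5` in the toy call only suits the tiny example; for a real curve, a threshold on the order of 100 bp is an assumed, commonly used choice rather than a rule stated in this lab.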
3 c. Now examine the dog and chicken genome alignments. Are all the exons of human AKT1 conserved with dog and chicken?
3 d. Are the conserved non-coding sequences you describe in 3b conserved in dog and chicken? Are there regions of the sequence conserved between human-dog or human-chicken that are not conserved between human-mouse? List any such regions with location, % conservation, and approximate size.
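Question 3d compares two lists of conserved regions. A minimal sketch, assuming each comparison has been reduced to half-open intervals in human coordinates; the intervals are hypothetical:

```python
# Minimal sketch: find conserved regions present in one pairwise comparison
# but with no overlapping region in another. Intervals are HYPOTHETICAL
# half-open (start, end) pairs in human coordinates.

def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def unique_to(query_regions, other_regions):
    """Regions in query_regions that overlap nothing in other_regions."""
    return [q for q in query_regions if not any(overlaps(q, o) for o in other_regions)]


human_mouse = [(0, 120), (900, 1050)]
human_dog = [(10, 110), (500, 640), (950, 1000)]
print(unique_to(human_dog, human_mouse))  # conserved with dog but not with mouse
```

Swapping the arguments lists elements conserved with mouse but not with dog. As noted in 3b, absence from one comparison may mean the element was lost or only that it has diverged below the cutoff.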
4 a. Expand the region you are examining (or pan toward lower base-pair coordinates) and examine SIVA1, the gene 3' of AKT1. Two of the exons of SIVA1 are not conserved (or are very poorly conserved) between human and mouse. Which exons?
4 b. Are the exons you found in 4a present in all SIVA1 transcripts, or is one or both of them alternatively spliced? How did you determine this?
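One way to answer 4b is to compare exon lists across transcripts: an exon present in every transcript is constitutive, and one missing from at least one transcript is alternatively spliced. A minimal sketch, assuming exons are compared by exact coordinates; the transcript names and coordinates are hypothetical, not real SIVA1 data:

```python
# Minimal sketch: split exons into constitutive and alternative sets.
# ASSUMPTION: exons are matched by exact (start, end) coordinates.
# Transcript names and coordinates are HYPOTHETICAL.

def split_exons(transcripts):
    """Return (constitutive, alternative) sets of exon coordinate pairs."""
    exon_sets = [set(exons) for exons in transcripts.values()]
    every_exon = set().union(*exon_sets)
    constitutive = set.intersection(*exon_sets)
    return constitutive, every_exon - constitutive


transcripts = {
    "transcript_1": [(100, 200), (300, 380), (500, 620)],
    "transcript_2": [(100, 200), (500, 620)],
}
constitutive, alternative = split_exons(transcripts)
print("constitutive:", sorted(constitutive))
print("alternative:", sorted(alternative))
```

Exact matching treats an exon with an alternative 5' or 3' splice site as a different exon; a looser overlap test would be needed to separate those cases.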