BIO520 Exam 1 Spring 2009


Please email this exam to Yeshi (tgyeshi@uky.edu) with the subject line "BIO520 Exam 1" and name the document like so: "LundJ_exam1", or hand in written answers. Fill in your name on the exam!

You may use any books, notes, web pages, software programs, or related materials to complete this exam. You MAY NOT consult with any person regarding the exam's intellectual content.

1. The forkhead box P2 gene has attracted interest because it is a highly conserved gene involved in the development of speech and language regions in the brain and has been under selection in the human lineage.

  • a. (1 pt) Find the RefSeq protein entry for isoform I of this gene. ANSWER=Genbank accession.
  • b. (1 pt) Give the name of a protein domain found in this protein.
  • c. (1 pt) What is the official gene name or gene symbol for the forkhead box P2 gene?
  • d. (1 pt) Which human chromosome is the forkhead box P2 gene located on?
  • e. (1 pt) How large is this gene? Approximately how many bps does it span on the chromosome? (Answers accurate to within 10% of the correct value are sufficient.)
  • f. (1 pt) What phenotypes are associated with mutations in this gene in humans?
  • g. (1 pt) How does a derivative Genbank database like RefSeq differ from a primary Genbank database?
  • h. (1 pt) What is most unusual about the forkhead box P2 protein sequence?
2. Scoring matrices
  • a. (2 pts) Use the linked BLOSUM80 scoring matrix with gap creation and extension costs of -11, -1 to score the alignment shown below:
    LDAGS-R
    LECGSLR
    
  • b. (1 pt) Which scoring matrix would you use to search for proteins in Archaea related to human hemoglobin?
  • c. (1 pt) Which family of scoring matrices is constructed from ungapped protein alignments?
  • d. (1 pt) PAM stands for 'point accepted mutation', and the number (e.g. '40' in PAM40) indicates that the matrix was made from aligned proteins with 40 out of 100 amino acids mutated. How can the PAM250 matrix then have 250 mutations in 100 amino acids?
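
The mechanics of scoring a fixed alignment with a substitution matrix and affine gap costs can be sketched as below. This is only an illustration under stated assumptions: `toy_score` is a hypothetical placeholder (+5 match, -2 mismatch) and does NOT contain BLOSUM80 values, the example sequences are invented, and a gap of length k is assumed to cost creation + k × extension. Some programs charge creation + (k - 1) × extension instead, so check which convention applies.

```python
def score_alignment(row_a, row_b, score_pair, gap_open=-11, gap_extend=-1):
    """Score a pairwise alignment given as two equal-length gapped strings.

    Assumed convention: a gap of length k costs gap_open + k * gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    gap_row = None  # which row the currently open gap is in, if any
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            current = "a" if a == "-" else "b"
            if current != gap_row:
                total += gap_open      # charged once, when a gap starts
            total += gap_extend        # charged for every gapped column
            gap_row = current
        else:
            total += score_pair(a, b)  # substitution-matrix lookup
            gap_row = None
    return total


def toy_score(a, b):
    """Hypothetical placeholder scores; NOT taken from BLOSUM80."""
    return 5 if a == b else -2


print(score_alignment("ACD-F", "ACEGF", toy_score))  # 1
print(score_alignment("AC--F", "ACEGF", toy_score))  # 2
```

To score with a real matrix, `toy_score` would be replaced by a lookup into that matrix's values.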
3. BLAST
  • a. (2 pts) The BLAST algorithm initiates matches at regions with word matches above a threshold. BLASTP allows word sizes of two or three aa's. How does changing the word size from the default of three aa to two affect the running and results of the BLAST program?
  • b. (2 pts) After finding a word match, BLAST begins building a local alignment around the word match. What determines if this local alignment appears in the BLAST results?
  • c. (1 pt) The BLAST E-value depends on several factors. Let's say you did a BLASTN search against the nr database in 2002 and then performed the same search this year when the nr database contains 100 times more nucleotides. Your results both times include a match between your query and a particular horse protein. If the E-value was E=1e-12 for an HSP in 2002 what would it be today?
  • d. (1 pt) To find the corresponding gene in other vertebrates is it better to use the genomic sequence, mRNA, or protein as BLAST input? Explain.
  • e. (3 pts) You identify a fly gene in a mutant screen. In preparing a grant application you do a BLAST search for the corresponding human gene so you can discuss how your proposal may impact human health and welfare but are unable to find a matching gene. Give three possible reasons for this result.
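
As background for the E-value question above, the Karlin-Altschul relation E = K·m·n·exp(-λS) can be sketched in a few lines. The values of `k` and `lam` below are hypothetical placeholders (real values depend on the scoring system), the lengths and score are invented, and the sketch ignores the effective-length corrections that BLAST applies, so treat it as an illustration of how E depends on search-space size rather than a reproduction of BLAST's output.

```python
import math


def expected_hits(score, query_len, db_len, k, lam):
    """Karlin-Altschul expected number of chance HSPs scoring >= score."""
    return k * query_len * db_len * math.exp(-lam * score)


# Hypothetical placeholder parameters, chosen only for illustration.
K_PARAM = 0.1
LAMBDA = 0.3

e_small_db = expected_hits(score=120, query_len=500, db_len=1e9,
                           k=K_PARAM, lam=LAMBDA)
e_large_db = expected_hits(score=120, query_len=500, db_len=1e11,
                           k=K_PARAM, lam=LAMBDA)

# With the score and query held fixed, E changes in proportion to db_len.
print(e_large_db / e_small_db)
```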
4. BLAST search. Refer to the linked BLAST search of a mouse chemokine to answer this question. link
  • a. (1 pt) Which BLAST program was used?
  • b. (1 pt) Which database was searched?
  • c. (1 pt) What organisms is this gene found in, i.e. what is its phylogenetic range?
  • d. (2 pts) Examine the match to the dog gene with Genbank accession AB164433.1. What percent identity and percent positives are found in this alignment?
  • e. (1 pt) Give the E-value for this alignment to AB164433.1 and state whether it indicates a strong, moderate, or weak match.
  • f. (1 pt) Do the results for this search show all the matches to this mouse chemokine in this database? Give a yes or no answer and a brief explanation.
  • g. (1 pt) Give the Genbank accession of the weakest significant match shown in these results.

5. (2 pts) Give two things that are easily seen when comparing two sequences with a dotplot analysis but are difficult or impossible to find using BLAST to align the sequences.

6. PSSMs.
  • a. (2 pts) Why are pseudocounts in a PSSM given less weight when more input sequences are present in the alignment?
  • b. (2 pts) Why is it better to weight pseudocounts by their similarity to the consensus rather than give them each the same value?
  • c. (1 pt) What determines how many columns a PSSM has (how long it is)?
  • d. (2 pts) If you wanted to find all the members of the SIR2 protein family, why would searching databases using a PSSM be more sensitive than using one or two members of the family (for example, human SIRT1 and yeast SIR2)? Are there family members that might be found using a well-chosen (or lucky) simple search that a search using a PSSM would not discover?
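
One common way to fold pseudocounts into a single PSSM column is sketched below, as a reference point for the questions above. It assumes the simple scheme f(a) = (count(a) + B·p(a)) / (N + B), where N is the number of sequences, p(a) is a background frequency, and B is a total pseudocount weight; the uniform background of 0.05 and B = 1 are placeholder assumptions, and real tools may use different schemes (for example, pseudocounts weighted by a substitution matrix).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed uniform background; real backgrounds differ per residue.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_frequencies(column, background=BACKGROUND, pseudo_weight=1.0):
    """Estimate residue frequencies for one alignment column."""
    counts = Counter(column)
    n = len(column)
    return {
        aa: (counts.get(aa, 0) + pseudo_weight * background[aa])
            / (n + pseudo_weight)
        for aa in background
    }


def column_scores(column, background=BACKGROUND, pseudo_weight=1.0):
    """Log-odds (base 2) score for each residue in one PSSM column."""
    freqs = column_frequencies(column, background, pseudo_weight)
    return {aa: math.log2(freqs[aa] / background[aa]) for aa in freqs}


def pseudocount_share(n_sequences, pseudo_weight=1.0):
    """Fraction of each estimate contributed by the pseudocounts."""
    return pseudo_weight / (n_sequences + pseudo_weight)


print(column_frequencies("AAAA")["A"])  # estimate for A from 4 sequences
print(pseudocount_share(4), pseudocount_share(99))
```

Under this scheme the pseudocount contribution is B / (N + B), so it shrinks automatically as more sequences are added to the alignment.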
7. Refer to the linked CLUSTALW multiple alignment of a set of genes containing the ancient T-BOX domain for the questions below. link, CLUSTAL input sequences
  • a. (2 pts) Examine the guide tree for this CLUSTAL alignment and list the order in which the sequences (or sub-alignments) will be aligned to build the final multiple sequence alignment.
  • b. (1 pt) The guide tree shows anemone and horse in a sub-cluster rather than horse more tightly clustered with mouse and human. Give a reason why this wrong guide tree cluster could have occurred.
  • c. (1 pt) The worm protein shows several small indels compared to the other sequences. In what region or secondary structure of the T-BOX protein are these most likely to be found?
  • d. (1 pt) In position 43 of the alignment (140T in the mouse sequence) both Ser and Thr are found. How would you characterize the substitutions at this position?
  • e. (1 pt) Describe a position in the alignment where a substitution changes the charge of an amino acid at an otherwise conserved position in the alignment. Substitutions that change charge can be disruptive but are sometimes seen in salt bridges where a balanced pair of charged aa's that interact both switch charge.
University of Kentucky  BIO520
Site maintained by Jim Lund