Primer name update
4-10-02 update
File containing updated gene names for the primer pairs: pcr.match.4-25-02.plus_gels.txt
Primer pair sequences: Primers.all.8-05.txt
Changes:
- The primer pairs with names that change have been split into different classes.
The previous version of the update
The third version of the update
The second version of the update
The first version of the update
Description of the Kim lab chip primer set
1323 Primers were poor and were re-made
1323 Re-made poor primers
17887 Primers made once
---------
20533 total primer pairs
During the following discussion, I break up the primers into 2 lists:
1) Poor primers (remade)
2) Primers made once and the re-made primers (all_primers)
Approach to updating the names
1) Download AceDB. The feature file (.gff) for each chromosome contains the bp positions of each exon and repetitive sequence feature.
2) Compare the Kim chip primers with the chromosome sequence. Find where the primers match the genomic sequence. Assemble primer matches into PCR products with their location in the genomic sequence.
3) Find what AceDB features (exons and repetitive features) overlap the PCR products.
4) Filter and summarize the results.
5) Correct for some irregularities.
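Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the scripts actually used for the update: the function names (`predict_products`, `overlapping_features`), exact string matching, and 0-based half-open coordinates are all assumptions made for the example (a .gff file uses 1-based inclusive coordinates, so positions would need converting first).

```python
from typing import List, NamedTuple, Tuple


class Product(NamedTuple):
    start: int  # 0-based start of the product on the forward strand
    end: int    # 0-based end, exclusive


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(genome: str, primer: str) -> List[int]:
    """Start positions of every exact (possibly overlapping) match."""
    hits = []
    i = genome.find(primer)
    while i != -1:
        hits.append(i)
        i = genome.find(primer, i + 1)
    return hits


def predict_products(genome: str, left: str, right: str,
                     max_size: int = 5000) -> List[Product]:
    """Pair primer matches into predicted PCR products (hypothetical helper).

    max_size is an assumed upper bound on product length.
    """
    products = []
    # Try both orientations of the pair relative to the forward strand.
    for fwd, rev in ((left, right), (right, left)):
        rev_rc = revcomp(rev)
        for s in find_all(genome, fwd):
            for e in find_all(genome, rev_rc):
                end = e + len(rev_rc)
                if e >= s and end - s <= max_size:
                    products.append(Product(s, end))
    return products


def overlapping_features(product: Product,
                         features: List[Tuple[str, int, int]]
                         ) -> List[Tuple[str, int]]:
    """(name, overlap in bp) for every feature that overlaps the product."""
    out = []
    for name, f_start, f_end in features:
        bp = min(product.end, f_end) - max(product.start, f_start)
        if bp > 0:
            out.append((name, bp))
    return out
```

Here `features` stands in for the exon and repetitive sequence records read from a chromosome's feature file, given as (name, start, end) tuples.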
Decisions to be made in the analysis
1) 5 kb cutoff used.
2) 50 bp overlap cutoff between a PCR product and a gene.
3) Where the PCR product matches in the gene is not taken into account.
4) Repetitive sequence filtering is not done. See histogram below.
5) Some primer pairs amplify multiple PCR products. Additional PCR products are included up to 1.5 times the size of the smallest predicted product.
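A rough sketch of how cutoffs 1, 2 and 5 could be applied is given here. The helper names are hypothetical, and two readings are assumptions on my part: that the 5 kb cutoff is a maximum PCR product size, and that "50 bp overlap" means at least 50 bp shared between the product and the gene.

```python
from typing import List, Tuple

MAX_PRODUCT_BP = 5000       # assumed meaning of the 5 kb cutoff
MIN_GENE_OVERLAP_BP = 50    # assumed to mean "at least 50 bp"
SIZE_RATIO_LIMIT = 1.5      # extra products kept up to 1.5x the smallest


def keep_products(sizes: List[int]) -> List[int]:
    """Product sizes that survive the size cutoff and the 1.5x rule."""
    ok = sorted(s for s in sizes if 0 < s <= MAX_PRODUCT_BP)
    if not ok:
        return []
    limit = SIZE_RATIO_LIMIT * ok[0]
    return [s for s in ok if s <= limit]


def genes_hit(product: Tuple[int, int],
              genes: List[Tuple[str, int, int]]) -> List[str]:
    """Names of genes sharing enough bp with the product.

    Coordinates are 0-based, end-exclusive (an assumption of this sketch).
    Where in the gene the product lies is ignored, as in decision 3.
    """
    start, end = product
    return [name for name, g_start, g_end in genes
            if min(end, g_end) - max(start, g_start) >= MIN_GENE_OVERLAP_BP]
```

For example, a primer pair with predicted products of 1000, 1400 and 1600 bp would keep the first two, since 1600 bp is more than 1.5 times the smallest product.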
Updated gene names
Updated! 4-20-02 update
all_primers file
Primer pairs which change names
Primer pairs which match multiple genes (648 primer pairs)
Primer pairs which match multiple genes, version 1
119 of these primer pairs match one gene, but also match tRNA(s).
Primer pairs which no longer match a gene (67 primer pairs)
Overall summary
Which PCR products changed names?
all_primers
Changed 3125
Same 15803
None 203
No predicted PCR product 80
remade
Changed 114
Same 1204
None 4
No predicted PCR product 3
Overall assessment of the primer pair set predicted names:
all_primers
Good prediction 17847
Multiple genes 904
Not a gene 352
No predicted PCR product 80
remade
Good prediction 1235
Multiple genes 82
Not a gene 4
No predicted PCR product 3
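The four assessment categories above can be thought of as a simple rule on the number of predicted PCR products and the number of genes they overlap. The sketch below is one plausible reading, not the rule actually used to produce the counts; in particular it ignores special cases such as primer pairs that match one gene plus tRNA(s). The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def assess(n_products: int, n_genes: int) -> str:
    """Assign an assessment category to one primer pair (assumed rule)."""
    if n_products == 0:
        return "No predicted PCR product"
    if n_genes == 0:
        return "Not a gene"
    if n_genes == 1:
        return "Good prediction"
    return "Multiple genes"


def tally(pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Count categories over (n_products, n_genes) tuples."""
    return Counter(assess(p, g) for p, g in pairs)
```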
Overall assessment of the primer pair set predicted taking into consideration whether the PCR worked:
all_primers
Good array spot 16543
remade
Good array spot 932
Compare the full set of genes in AceDB with the list of genes that the Kim primer set amplifies. Discarding PCR products that contain more than 1 gene and making the comparison again gives:
Kim primers amplify single genes: 19109 (93% of primer pairs)
Acedb genes: 19733
Acedb genes amplified by Kim primers: 17175 (86% of Acedb genes)
Acedb genes not amplified: 2675 (13.5% of Acedb genes)
(a few dozen gene names don't match).
Now use stricter criteria. Require that the 2000_pcr result be good, faint, or not_run_on_a_gel in addition to the above criteria.
Kim primers, good PCR amplifies single gene: 17475 (85% of primer pairs)
Acedb genes: 19733
Acedb genes amplified by good PCR, single gene: 16107 (82% of Acedb genes)
Acedb genes not amplified: 3731 (19% of Acedb genes)
(a few dozen gene names don't match).
Summary histograms
Number of genes matching the PCR products
=0     434
1      17478
2      1100
3      107
4      23
5      13
6      11
7      7
8      4
9      1
10     0
>10    33
Number of PCR products per primer pair
=0     200
1      18412
2      377
3      45
4      15
5      17
6      16
7      6
8      5
9      1
10     2
>10    117
PCR product size histogram:
=0              307
1 - <101        3
101 - <201      7
201 - <301      4
301 - <401      17
401 - <501      29
501 - <601      30
601 - <701      146
701 - <801      159
801 - <901      860
901 - <1001     1553
1001 - <1101    2568
1101 - <1201    6049
1201 - <1301    16
1301 - <1401    369
1401 - <1501    641
1501 - <1601    1109
1601 - <1701    2091
1701 - <1801    84
1801 - <1901    162
1901 - <2001    295
2001 - <2101    339
2101 - <2201    482
2201 - <2301    813
2301 - <2401    224
2401 - <2501    435
2501 - <2601    74
2601 - <2701    129
2701 - <2801    210
2801 - <2901    0
2901 - <3001    1
>3000           5
PCR product exon content (bp) histogram:
=0              447
1 - <101        15
101 - <201      280
201 - <301      822
301 - <401      1170
401 - <501      1278
501 - <601      1333
601 - <701      1421
701 - <801      5085
801 - <901      3402
901 - <1001     2068
1001 - <1101    1031
1101 - <1201    275
1201 - <1301    123
1301 - <1401    88
1401 - <1501    76
1501 - <1601    47
1601 - <1701    41
1701 - <1801    36
1801 - <1901    25
1901 - <2001    29
2001 - <2101    23
2101 - <2201    20
2201 - <2301    18
2301 - <2401    19
2401 - <2501    3
2501 - <2601    6
2601 - <2701    7
2701 - <2801    1
2801 - <2901    3
2901 - <3001    0
>3000           21
Simple sequence repeats up to 9mers were tracked.
PCR product repetitive content (bp) histogram:
=0              17726
1 - <101        571
101 - <201      343
201 - <301      136
301 - <401      97
401 - <501      62
501 - <601      73
601 - <701      15
701 - <801      11
801 - <901      29
901 - <1001     19
1001 - <1101    20
1101 - <1201    37
1201 - <1301    2
1301 - <1401    9
1401 - <1501    6
1501 - <1601    5
1601 - <1701    8
1701 - <1801    2
1801 - <1901    1
1901 - <2001    10
>2000           26
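For reference, the bin labels used in the bp histograms above (a separate =0 bin, 100 bp bins starting at 1, and a single overflow bin) can be reproduced with a small helper like the one below. This is only a sketch of the binning scheme; `bin_label` and `histogram` are hypothetical names, and values are assumed to be non-negative integers.

```python
from collections import Counter
from typing import Iterable


def bin_label(value: int, width: int = 100, top: int = 3000) -> str:
    """Label of the histogram bin that a bp count falls into."""
    if value == 0:
        return "=0"
    if value > top:
        return f">{top}"
    low = ((value - 1) // width) * width + 1
    return f"{low} - <{low + width}"


def histogram(values: Iterable[int], width: int = 100,
              top: int = 3000) -> Counter:
    """Count how many values fall into each labelled bin."""
    return Counter(bin_label(v, width, top) for v in values)
```

The repetitive content histogram would use top=2000 rather than the default of 3000.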