8-3-01 update! Added genes found on 3 cosmids that aren't assembled on a chromosome.
Changes:
The first version of the update
During the following discussion, I break up the primers into 2 lists:
1) Poor primers (remade)
2) Primers made once and the re-made primers (all_primers)
2) Compare the Kim chip primers with the chromosome sequence. Find where the primers match the genomic sequence. Assemble primer matches into PCR products with their location in the genomic sequence.
3) Find what AceDB features (exons and repetitive features) overlap the PCR products.
4) Filter and summarize the results.
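The primer matching and product assembly steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for this analysis: it assumes exact primer matches, looks only at one strand of the sequence, and uses half-open (start, end) coordinates. The function names and the max_size cutoff are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_all(text, pattern):
    """Return every start position of pattern in text (overlaps allowed)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def predict_products(genome, left_primer, right_primer, max_size=3000):
    """Pair left-primer hits with downstream right-primer sites.

    The right primer anneals to the opposite strand, so on the given
    strand we search for its reverse complement.
    max_size is a hypothetical cutoff, not a value from the analysis.
    """
    genome = genome.upper()
    right_site = reverse_complement(right_primer)
    left_hits = find_all(genome, left_primer.upper())
    right_hits = find_all(genome, right_site)
    products = []
    for left in left_hits:
        for right in right_hits:
            end = right + len(right_site)
            if right >= left and end - left <= max_size:
                products.append((left, end))
    return products
```

A primer pair that matches in several places yields several (start, end) products, which is how one primer pair can end up with more than one predicted PCR product.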
2) How large does the overlap between a PCR product and gene need to be to be counted? Currently use 150 bp.
3) Where does the PCR product match the gene? PCR products containing the 5' ends of genes may not hybridize, as the 5' ends of long genes don't get labelled. Currently ignored.
4) Repetitive sequence filtering. Currently ignore.
5) Some genes amplify multiple PCR products. This information is now included. Additional PCR products are included up to 1.5 times the size of the smallest predicted product.
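The 150 bp overlap cutoff and the 1.5 times size rule can be sketched as below. This is an illustration under assumptions: overlap is counted as the summed exon bp of a gene inside the product, the exons of one gene do not overlap each other, coordinates are half-open, and the 1.5 times limit is inclusive. None of these details are stated above, and the function names are hypothetical.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bp shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def genes_hit(product, gene_exons, min_overlap=150):
    """Return genes whose exons overlap the product by >= min_overlap bp.

    product    -- (start, end) of the PCR product
    gene_exons -- dict mapping gene name to a list of (start, end) exons
    """
    start, end = product
    hits = []
    for gene, exons in gene_exons.items():
        total = sum(overlap_bp(start, end, e_start, e_end)
                    for e_start, e_end in exons)
        if total >= min_overlap:
            hits.append(gene)
    return sorted(hits)


def keep_products(sizes, factor=1.5):
    """Keep products no larger than factor times the smallest product."""
    if not sizes:
        return []
    limit = factor * min(sizes)
    return [size for size in sizes if size <= limit]
```

For example, with predicted sizes of 1000, 1400, 1600 and 3000 bp, only the 1000 and 1400 bp products would be kept.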
NEW!
8-3-01 update all_primers file
8-3-01 good PCR product, unpredicted (all_primers file)
8-2-01 good PCR product, unpredicted (remade file)
all_primers
Changed 3866
Same 14132
None 1132
No predicted PCR product 76
remade
Changed 163
Same 1107
None 50
No predicted PCR product 3
Compare the full set of genes in AceDB with the list of genes that the Kim primer set amplifies. Discarding PCR products that contain more than 1 gene and making the same comparison gives:
Kim primers amplify single genes 16049 (81%)
AceDB genes 19733
AceDB genes not amplified 3692 (19%)
Now use stricter criteria. Require that the 2000_pcr result be good, faint, or not_run_on_a_gel in addition to the above criteria.
Kim primers amplify single genes 14982 (76%)
AceDB genes 19733
AceDB genes not amplified 4758 (24%)
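The two comparisons above amount to set operations. The sketch below uses a hypothetical record layout (a list of dicts with "genes" and "gel" keys) and toy data; it is not the script that produced the numbers. The percentages appear to be relative to the AceDB gene count (for example, 16049 / 19733 is about 81%).

```python
# Gel results accepted under the stricter criteria described above.
ACCEPTED_GEL = {"good", "faint", "not_run_on_a_gel"}


def amplified_single_genes(products, require_gel=False):
    """Collect genes amplified by products that contain exactly one gene.

    products -- list of dicts with keys "genes" (list of gene names)
                and "gel" (2000_pcr result); this layout is hypothetical.
    """
    amplified = set()
    for product in products:
        if len(product["genes"]) != 1:
            continue  # discard products containing 0 or several genes
        if require_gel and product["gel"] not in ACCEPTED_GEL:
            continue
        amplified.add(product["genes"][0])
    return amplified


def coverage(amplified, acedb_genes):
    """Return (hit, missed, hit %, missed %) relative to the AceDB set."""
    hit = len(amplified & acedb_genes)
    missed = len(acedb_genes - amplified)
    total = len(acedb_genes)
    return hit, missed, round(100 * hit / total), round(100 * missed / total)
```

Calling coverage() once with require_gel=False and once with require_gel=True reproduces the shape of the two summaries above.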
Number of genes matching the PCR products
Genes   Count
0        1127
1       17089
2         774
3          43
4          14
5          10
6          10
7           9
9           2
10          1
11          3
17          9
23          6
25          6
26          4
51          1
52          1
53          2
55          1
Number of PCR products per primer pair
Products   Count
=0            95
1          18574
2            382
3             50
4             19
5             20
6             16
7              6
8              5
9              1
10             2
>10          132
PCR product size histogram:
Size (bp)       Count
=0                  0
1 - <101            3
101 - <201          7
201 - <301          6
301 - <401         21
401 - <501         34
501 - <601         37
601 - <701        158
701 - <801        163
801 - <901        879
901 - <1001      1568
1001 - <1101     2588
1101 - <1201     6084
1201 - <1301       18
1301 - <1401      379
1401 - <1501      649
1501 - <1601     1108
1601 - <1701     2099
1701 - <1801       86
1801 - <1901      163
1901 - <2001      306
2001 - <2101      342
2101 - <2201      486
2201 - <2301      823
2301 - <2401      231
2401 - <2501      446
2501 - <2601       74
2601 - <2701      129
2701 - <2801      216
2801 - <2901        0
2901 - <3001        1
>3000               6
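The bin labels used in these histograms (=0, then 100 bp bins starting at 1, then an overflow bin) can be generated as in this sketch. It assumes non-negative integer sizes; the function names are hypothetical and this is not the original tallying code.

```python
from collections import Counter


def bin_label(value, width=100, top=3000):
    """Return the histogram bin label for a non-negative integer value."""
    if value == 0:
        return "=0"
    if value > top:
        return f">{top}"
    low = ((value - 1) // width) * width + 1  # bins start at 1, 101, 201, ...
    return f"{low} - <{low + width}"


def histogram(values, width=100, top=3000):
    """Count how many values fall in each labelled bin."""
    return Counter(bin_label(v, width, top) for v in values)
```

Passing top=2000 gives the labelling used for the repetitive content histogram, whose last bin is >2000.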
PCR product exon content (bp) histogram:
Exon content (bp)   Count
=0                   1164
1 - <101                0
101 - <201            552
201 - <301           1318
301 - <401           1578
401 - <501           1608
501 - <601           1506
601 - <701           1481
701 - <801           4534
801 - <901           3058
901 - <1001          1871
1001 - <1101          925
1101 - <1201          243
1201 - <1301          121
1301 - <1401           81
1401 - <1501           69
1501 - <1601           47
1601 - <1701           30
1701 - <1801           41
1801 - <1901           20
1901 - <2001           24
2001 - <2101           17
2101 - <2201           15
2201 - <2301           10
2301 - <2401           15
2401 - <2501            2
2501 - <2601            5
2601 - <2701            6
2701 - <2801            1
2801 - <2901            2
2901 - <3001            1
>3000                  20
Simple sequence repeats up to 9mers were tracked.
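How the simple sequence repeats were detected is not described here. As one possible approach, the sketch below finds tandem repeats whose repeat unit is 1 to 9 bases long using a regular expression. The min_copies threshold and the function names are assumptions made for illustration only.

```python
import re


def find_ssrs(seq, max_unit=9, min_copies=3):
    """Find tandem repeats with a unit of 1..max_unit bases.

    Returns a list of (start, end, unit) with half-open coordinates.
    min_copies is a hypothetical threshold, not taken from the analysis.
    """
    # Lazy unit match followed by at least (min_copies - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


def repeat_bp(seq, max_unit=9, min_copies=3):
    """Total number of bp covered by the repeats that were found."""
    return sum(end - start
               for start, end, _ in find_ssrs(seq, max_unit, min_copies))
```

The per-product total from repeat_bp() is the kind of value that a repetitive content histogram would tally.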
PCR product repetitive content (bp) histogram:
Repetitive content (bp)   Count
=0                        17777
1 - <101                    601
101 - <201                  365
201 - <301                  124
301 - <401                   92
401 - <501                   64
501 - <601                   58
601 - <701                    6
701 - <801                    9
801 - <901                    6
901 - <1001                   2
1001 - <1101                  1
1101 - <1201                  3
1201 - <1301                  0
1301 - <1401                  1
1401 - <1501                  0
1501 - <1601                  0
1601 - <1701                  0
1701 - <1801                  1
1801 - <1901                  0
1901 - <2001                  1
>2000                         1