In the following discussion, I split the primers into two lists:
1) Poor primers which were re-made (remade)
2) Primers made once and the re-made primers (all_primers)
2) Compare the Kim chip primers with the chromosome sequence. Find where the primers match the genomic sequence. Assemble primer matches into PCR products with their location in the genomic sequence.
3) Find what AceDB features (exons and repetitive features) overlap the PCR products.
4) Filter and summarize the results.
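The primer-matching step above can be sketched roughly as follows. This is a minimal illustration, not the code actually used: it assumes exact string matches only, searches one strand only, and caps the product size with a hypothetical max_size parameter. The function names (reverse_complement, find_all, pcr_products) are made up for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_all(genome, pattern):
    """Return the 0-based start of every exact match, overlapping ones included."""
    hits = []
    pos = genome.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(pattern, pos + 1)
    return hits


def pcr_products(genome, left_primer, right_primer, max_size=3000):
    """Pair left-primer hits with downstream right-primer sites.

    Returns half-open, 0-based (start, end) intervals on the given strand.
    max_size is an assumed cutoff, not a value taken from the analysis.
    """
    genome = genome.upper()
    left = left_primer.upper()
    # The right primer anneals to the opposite strand, so on this strand
    # we look for its reverse complement.
    right_site = reverse_complement(right_primer)
    products = []
    for start in find_all(genome, left):
        for site in find_all(genome, right_site):
            end = site + len(right_site)
            if site >= start + len(left) and end - start <= max_size:
                products.append((start, end))
    return products


genome = "AAAACGTACGGGGGTTTTCCATGAAAA"
print(pcr_products(genome, "ACGTACG", "CATGG"))
```

A primer pair with no product returns an empty list, which would correspond to the "No PCR product" category. A real run would also have to search the reverse strand and decide how to treat primers that match in several places.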
2) How large does the overlap between a PCR product and a gene need to be for the gene to be counted? Currently 150 bp is used.
3) Where does the PCR product match the gene? PCR products containing the 5' ends of genes may not hybridize, as the 5' ends of long genes don't get labelled. Currently ignored.
4) Repetitive sequence filtering. Currently ignored.
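The 150 bp overlap rule could be applied along these lines. This is only a sketch under assumptions: overlap is taken to mean exon bases inside the PCR product, intervals are half-open (start, end) pairs, and a gene counts when the overlap is at least (not strictly more than) the threshold. The names exon_overlap_bp and genes_hit are hypothetical.

```python
MIN_OVERLAP_BP = 150  # threshold stated in the text


def exon_overlap_bp(product, exons):
    """Count product bases covered by exons, merging overlapping exons
    so that no base is counted twice."""
    p_start, p_end = product
    clipped = sorted(
        (max(s, p_start), min(e, p_end))
        for s, e in exons
        if s < p_end and e > p_start
    )
    total = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def genes_hit(product, gene_exons, min_overlap=MIN_OVERLAP_BP):
    """Return the genes whose exons overlap the product by >= min_overlap bp."""
    return sorted(
        gene
        for gene, exons in gene_exons.items()
        if exon_overlap_bp(product, exons) >= min_overlap
    )


# Made-up coordinates, for illustration only.
gene_exons = {
    "geneA": [(900, 1100), (1050, 1200)],
    "geneB": [(2100, 2300)],
    "geneC": [(1500, 1650)],
}
print(genes_hit((1000, 2200), gene_exons))
```

A product for which genes_hit returns more than one gene would be the kind of multi-gene product that can be discarded in a stricter summary.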
all_primers
Changed 3780
Same 13702
None 1071
No PCR product 660
remade
Changed 8
Same 39
None 6
No PCR product 1270
Compare the full set of genes in AceDB with the list of genes that the Kim primer set amplifies.
Kim primers amplify genes 16621 (81%)
AceDB genes 20463
AceDB genes not amplified 3842 (19%)
If we discard PCR products that contain more than 1 gene and repeat the comparison between the full set of genes in AceDB and the list of genes that the Kim primer set amplifies:
Kim primers amplify single genes 15470 (76%)
AceDB genes 20463
AceDB genes not amplified 4993 (24%)
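The arithmetic behind the two coverage summaries can be rechecked with a few lines. The function name coverage is hypothetical; the gene counts are the ones reported above.

```python
def coverage(amplified, total):
    """Return (not_amplified, percent_amplified, percent_not_amplified)."""
    missed = total - amplified
    return missed, 100.0 * amplified / total, 100.0 * missed / total


ACEDB_GENES = 20463

for label, amplified in [("all products", 16621), ("single-gene products", 15470)]:
    missed, pct_hit, pct_missed = coverage(amplified, ACEDB_GENES)
    print(f"{label}: {amplified} amplified ({pct_hit:.1f}%), "
          f"{missed} not amplified ({pct_missed:.1f}%)")
```

Printing one decimal place makes it easier to see whether a reported whole-number percentage was rounded or truncated.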
PCR product size histogram:
Size (bp)       Count
=0                  0
1 - <101            1
101 - <201          2
201 - <301          3
301 - <401         14
401 - <501         25
501 - <601         26
601 - <701        141
701 - <801        146
801 - <901        821
901 - <1001      1485
1001 - <1101     2522
1101 - <1201     5928
1201 - <1301       24
1301 - <1401      370
1401 - <1501      642
1501 - <1601     1087
1601 - <1701     2078
1701 - <1801       84
1801 - <1901      161
1901 - <2001      292
2001 - <2101      333
2101 - <2201      482
2201 - <2301      808
2301 - <2401      219
2401 - <2501      430
2501 - <2601       77
2601 - <2701      129
2701 - <2801      213
2801 - <2901        6
2901 - <3001        4
>3000               0
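For reference, the binning scheme of the size histogram (100 bp bins starting at 1, a separate zero bin, and an overflow bin above 3000) can be reproduced with a short sketch like the one below. The function names bin_label and histogram are hypothetical, and integer sizes are assumed.

```python
from collections import Counter


def bin_label(value, width=100, top=3000):
    """Map an integer size to a label such as '1 - <101' or '>3000'."""
    if value <= 0:
        return "=0"
    if value > top:
        return f">{top}"
    low = ((value - 1) // width) * width + 1
    return f"{low} - <{low + width}"


def histogram(values, width=100, top=3000):
    """Return (label, count) pairs in bin order, including empty bins."""
    counts = Counter(bin_label(v, width, top) for v in values)
    labels = ["=0"]
    labels += [f"{low} - <{low + width}" for low in range(1, top + 1, width)]
    labels.append(f">{top}")
    return [(label, counts.get(label, 0)) for label in labels]


# Made-up sizes, for illustration only.
for label, count in histogram([0, 1, 100, 101, 1250, 3000, 3001]):
    if count:
        print(f"{label:<16}{count:>6}")
```

The exon and repeat content histograms use the same bin width but label the first bin "<1", and the repeat histogram stops at ">1000", so the labels and the top argument would need adjusting for those.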
PCR product exon content (bp) histogram:
Exon content (bp)   Count
<1                   1071
1 - <101                0
101 - <201            316
201 - <301            941
301 - <401           1320
401 - <501           1445
501 - <601           1400
601 - <701           1390
701 - <801           4375
801 - <901           2934
901 - <1001          1774
1001 - <1101          877
1101 - <1201          223
1201 - <1301          111
1301 - <1401           76
1401 - <1501           65
1501 - <1601           40
1601 - <1701           30
1701 - <1801           35
1801 - <1901           17
1901 - <2001           22
2001 - <2101           17
2101 - <2201           14
2201 - <2301            9
2301 - <2401           14
2401 - <2501            2
2501 - <2601            5
2601 - <2701            5
2701 - <2801            2
2801 - <2901            2
2901 - <3001            1
>3000                  20
Simple sequence repeats up to 9mers were tracked.
PCR product repetitive content (bp) histogram:
Repetitive content (bp)   Count
<1                        17255
1 - <101                    586
101 - <201                  358
201 - <301                  122
301 - <401                   89
401 - <501                   60
501 - <601                   56
601 - <701                    6
701 - <801                    8
801 - <901                    4
901 - <1001                   2
>1000                         6
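One way to measure simple sequence repeat content with repeat units of up to 9 bases is sketched below. This is an illustration, not the method actually used: the minimum copy number (min_copies) and minimum tract length (min_len) are assumptions, since the notes only say that units up to 9-mers were tracked. The names ssr_spans and repetitive_bp are hypothetical.

```python
import re


def ssr_spans(seq, max_unit=9, min_copies=3, min_len=12):
    """Find tandem repeats with unit lengths 1..max_unit.

    Returns half-open (start, end) spans. min_copies and min_len are
    assumed thresholds, not values taken from the analysis.
    """
    seq = seq.upper()
    spans = []
    for k in range(1, max_unit + 1):
        # A k-mer followed by at least (min_copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            if match.end() - match.start() >= min_len:
                spans.append((match.start(), match.end()))
    return spans


def repetitive_bp(seq, **kwargs):
    """Count bases covered by at least one repeat span (no double counting)."""
    covered = set()
    for start, end in ssr_spans(seq, **kwargs):
        covered.update(range(start, end))
    return len(covered)


# Made-up sequence: 4 flanking bases, (AT) x 8, 4 flanking bases.
seq = "GGTC" + "AT" * 8 + "CGGA"
print(ssr_spans(seq))
print(repetitive_bp(seq))
```

The same tract can be reported under more than one unit length (an AT repeat is also an ATAT repeat), which is why repetitive_bp counts covered positions rather than summing span lengths.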