Preliminary primer name update

These are preliminary results! They need double-checking, and the filter criteria need some tuning.

Description of the Kim lab chip primer set

1323 Primers were poor and were re-made
1323 Re-made poor primers
17890 Primers made once
---------
20536 total primer pairs

In the following discussion, I break the primers up into 2 lists:

1) Poor primers which were re-made (remade)
2) Primers made once and the re-made primers (all_primers)


Plan to update names

1) Download AceDB. The feature file (.gff) for each chromosome contains the bp positions of each exon and each repetitive sequence feature.

2) Compare the Kim chip primers with the chromosome sequence. Find where the primers match the genomic sequence. Assemble primer matches into PCR products with their location in the genomic sequence.

3) Find which AceDB features (exons and repetitive features) overlap the PCR products.

4) Filter and summarize the results.
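
Steps 2 and 3 can be sketched in Python as below. This is a minimal illustration, not the scripts used for these results. It assumes exact primer matches, searches only the plus strand, and takes features as (start, end, name) tuples with 1-based inclusive coordinates as in GFF. The function names and the toy sequence are made up for the example.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_all(genome, pattern):
    """Return the 0-based start of every exact match of pattern."""
    hits, pos = [], genome.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(pattern, pos + 1)
    return hits


def pcr_products(genome, fwd, rev, max_size=3000):
    """Pair forward and reverse primer matches into PCR products.

    Returns (start, end) pairs, 1-based inclusive.
    Assumption: only the plus-strand orientation is searched.
    """
    rev_rc = revcomp(rev)  # how the reverse primer site reads on the plus strand
    products = []
    for f in find_all(genome, fwd):
        for r in find_all(genome, rev_rc):
            end = r + len(rev_rc)  # 0-based exclusive end == 1-based inclusive end
            if r >= f and end - f <= max_size:
                products.append((f + 1, end))
    return products


def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bp shared by two 1-based inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def overlapping_features(product, features):
    """Return (name, overlap_bp) for every feature that overlaps the product."""
    start, end = product
    hits = []
    for f_start, f_end, name in features:
        bp = overlap_bp(start, end, f_start, f_end)
        if bp > 0:
            hits.append((name, bp))
    return hits


# Toy example (made-up sequence and feature names)
genome = "TTTT" + "ACGTAC" + "GGGGGGGGGG" + "CATTGA" + "TTTT"
products = pcr_products(genome, "ACGTAC", "TCAATG")
print(products)  # [(5, 26)]
features = [(10, 15, "geneA"), (100, 200, "geneB")]
print(overlapping_features(products[0], features))  # [('geneA', 6)]
```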


Decisions to be made in the analysis

1) PCR product size cutoff? Currently use 3000 bp.

2) How large does the overlap between a PCR product and a gene need to be for the gene to be counted? Currently use 150 bp.

3) Where does the PCR product match the gene? PCR products containing the 5' ends of genes may not hybridize, as the 5' ends of long genes don't get labelled. Currently ignore.

4) Repetitive sequence filtering. Currently ignore.
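
The two cutoffs currently in use could be applied roughly as follows. This is a sketch only: the constant and function names are hypothetical, and treating an overlap of exactly 150 bp as passing is my assumption, since the notes above do not say whether the cutoff is inclusive.

```python
MAX_PCR_SIZE_BP = 3000     # decision 1: PCR product size cutoff
MIN_GENE_OVERLAP_BP = 150  # decision 2: minimum overlap for a gene to count


def genes_for_product(product_size_bp, gene_overlaps):
    """Apply the size and overlap cutoffs to one PCR product.

    gene_overlaps maps gene name -> bp of overlap with the product.
    Returns None if the product is over the size cutoff, otherwise a
    sorted list of the genes whose overlap passes (assumed inclusive).
    """
    if product_size_bp > MAX_PCR_SIZE_BP:
        return None
    return sorted(
        gene for gene, bp in gene_overlaps.items()
        if bp >= MIN_GENE_OVERLAP_BP
    )


# Made-up example values
print(genes_for_product(1200, {"geneA": 700, "geneB": 120}))  # ['geneA']
print(genes_for_product(3500, {"geneA": 700}))                # None
```

Decisions 3 and 4 (position of the match within the gene, repetitive sequence filtering) are not applied in this sketch, matching the "currently ignore" status above.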


Updated gene names

Updated all_primers file

Updated remade file


Overall summary

Which PCR products changed names?

all_primers

Changed 3780
Same 13702
None 1071
No PCR product 660

remade

Changed 8
Same 39
None 6
No PCR product 1270
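
The four categories can be illustrated with a short Python sketch. The meanings used here are my reading of the tables and should be treated as assumptions: "No PCR product" means the primer pair gave no product in the genomic sequence, "None" means a product was found but no gene passed the filters, and "Same" means the old name is among the genes assigned to the product. The function name is hypothetical. The final checks confirm that the counts above add up to the list sizes given in the primer set description.

```python
def name_status(old_name, product_found, new_genes):
    """Classify one primer pair into the four summary categories."""
    if not product_found:
        return "No PCR product"
    if not new_genes:
        return "None"
    if old_name in new_genes:  # assumption: any match to the old name counts
        return "Same"
    return "Changed"


# Made-up gene names
print(name_status("geneA", True, ["geneA"]))  # Same
print(name_status("geneA", True, ["geneB"]))  # Changed
print(name_status("geneA", True, []))         # None
print(name_status("geneA", False, []))        # No PCR product

# Consistency check of the counts reported above
all_primers = {"Changed": 3780, "Same": 13702, "None": 1071, "No PCR product": 660}
remade = {"Changed": 8, "Same": 39, "None": 6, "No PCR product": 1270}
assert sum(all_primers.values()) == 17890 + 1323  # 19213 primer pairs
assert sum(remade.values()) == 1323
```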

Compare the full set of genes in AceDB with the list of genes that the Kim primer set amplifies.

Kim primers amplify genes 16621 (81%)
AceDB genes 20463
AceDB genes not amplified 3842 (19%)

If we discard PCR products that contain more than 1 gene and repeat the comparison between the full set of genes in AceDB and the genes that the Kim primer set amplifies:

Kim primers amplify single genes 15470 (76%)
AceDB genes 20463
AceDB genes not amplified 4993 (24%)
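
The coverage figures follow directly from the gene counts. A small Python check, with percentages rounded to the nearest whole percent (the function name is just for illustration):

```python
def coverage(amplified, total):
    """Return (not_amplified, percent_amplified, percent_not_amplified)."""
    missed = total - amplified
    return missed, round(100 * amplified / total), round(100 * missed / total)


ACEDB_GENES = 20463

print(coverage(16621, ACEDB_GENES))  # (3842, 81, 19)  all PCR products
print(coverage(15470, ACEDB_GENES))  # (4993, 76, 24)  single-gene PCR products only
```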


Summary histograms

PCR product size histogram:


=0              0
1 - <101        1
101 - <201      2
201 - <301      3
301 - <401      14
401 - <501      25
501 - <601      26
601 - <701      141
701 - <801      146
801 - <901      821
901 - <1001     1485
1001 - <1101    2522
1101 - <1201    5928
1201 - <1301    24
1301 - <1401    370
1401 - <1501    642
1501 - <1601    1087
1601 - <1701    2078
1701 - <1801    84
1801 - <1901    161
1901 - <2001    292
2001 - <2101    333
2101 - <2201    482
2201 - <2301    808
2301 - <2401    219
2401 - <2501    430
2501 - <2601    77
2601 - <2701    129
2701 - <2801    213
2801 - <2901    6
2901 - <3001    4
>3000           0

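The 100 bp bins used in these histograms can be reproduced with a sketch like the one below. The bin labels follow the size histogram above ("=0" for zero, ">3000" for anything over the cutoff); the function name and the example sizes are made up.

```python
from collections import Counter


def size_bin(size_bp, width=100, max_bp=3000):
    """Return the histogram bin label for a non-negative size in bp."""
    if size_bp == 0:
        return "=0"
    if size_bp > max_bp:
        return f">{max_bp}"
    low = ((size_bp - 1) // width) * width + 1  # 1, 101, 201, ...
    return f"{low} - <{low + width}"


# Made-up sizes, not taken from the results
sizes = [0, 100, 101, 1150, 1199, 3000, 3001]
histogram = Counter(size_bin(s) for s in sizes)
for label, count in histogram.items():
    print(label, count)
```

With these example sizes, 100 falls in "1 - <101", 101 in "101 - <201", both 1150 and 1199 in "1101 - <1201", and 3000 in "2901 - <3001".
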
PCR product exon content (bp) histogram:


<1              1071
1 - <101        0
101 - <201      316
201 - <301      941
301 - <401      1320
401 - <501      1445
501 - <601      1400
601 - <701      1390
701 - <801      4375
801 - <901      2934
901 - <1001     1774
1001 - <1101    877
1101 - <1201    223
1201 - <1301    111
1301 - <1401    76
1401 - <1501    65
1501 - <1601    40
1601 - <1701    30
1701 - <1801    35
1801 - <1901    17
1901 - <2001    22
2001 - <2101    17
2101 - <2201    14
2201 - <2301    9
2301 - <2401    14
2401 - <2501    2
2501 - <2601    5
2601 - <2701    5
2701 - <2801    2
2801 - <2901    2
2901 - <3001    1
>3000           20

Simple sequence repeats up to 9mers were tracked.

PCR product repetitive content (bp) histogram:


<1              17255
1 - <101        586
101 - <201      358
201 - <301      122
301 - <401      89
401 - <501      60
501 - <601      56
601 - <701      6
701 - <801      8
801 - <901      4
901 - <1001     2
>1000           6


Jim Lund
Beckman Center, B365
279 Campus Dr.
Stanford University
Stanford, CA 94305
Phone: (650) 723-5996
FAX: (650) 725-7739
E-Mail: jiml@stanford.edu