Primer name update

4-10-02 update

File containing updated gene names for the primer pairs: pcr.match.4-25-02.plus_gels.txt

Primer pair sequences: Primers.all.8-05.txt

Changes:

The previous version of the update
The third version of the update
The second version of the update
The first version of the update

Description of the Kim lab chip primer set

1323 Primers were poor and were re-made
1323 Re-made poor primers
17887 Primers made once
---------
20533 total primer pairs

In the following discussion, I break the primers into 2 lists:

1) Poor primers (remade)
2) Primers made once and the re-made primers (all_primers)


Approach to updating the names

1) Download AceDB. The feature file (.gff) for each chromosome contains the bp positions of each exon and repetitive sequence feature.

2) Compare the Kim chip primers with the chromosome sequence. Find where the primers match the genomic sequence. Assemble primer matches into PCR products with their location in the genomic sequence.

3) Find what AceDB features (exons and repetitive features) overlap the PCR products.

4) Filter and summarize the results.

5) Correct for some irregularities.
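
Step 2 above can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis: it assumes exact string matching of primers, assumes the 5 kb cutoff limits predicted product size, and uses hypothetical function names (reverse_complement, find_all, predict_products).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_all(haystack, needle):
    """Yield every 0-based start index of needle in haystack."""
    start = haystack.find(needle)
    while start != -1:
        yield start
        start = haystack.find(needle, start + 1)


def predict_products(chrom_seq, left, right, max_size=5000):
    """Return sorted (start, end) pairs, 1-based inclusive, of predicted products.

    Assumption: primers must match exactly, and max_size is the product-size
    cutoff. Both orientations of the primer pair are tried.
    """
    chrom_seq = chrom_seq.upper()
    products = set()
    for fwd, rev in ((left, right), (right, left)):
        fwd = fwd.upper()
        # The reverse primer anneals to the opposite strand, so on this
        # strand we look for its reverse complement downstream.
        rev_site = reverse_complement(rev)
        rev_starts = list(find_all(chrom_seq, rev_site))
        for f in find_all(chrom_seq, fwd):
            for r in rev_starts:
                end = r + len(rev_site)
                if r > f and end - f <= max_size:
                    products.add((f + 1, end))
    return sorted(products)
```

For a toy sequence, predict_products("GGACGTACTTTTTTTTTTCCGATAAA", "ACGTAC", "TATCGG") reports a single 22 bp product at positions 3 to 24.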


Decisions to be made in the analysis

1) 5 kb cutoff used.

2) 50 bp overlap between a PCR product and gene cutoff.

3) Where the PCR product matches in the gene is not taken into account.

4) Repetitive sequence filtering is not done. See histogram below.

5) Some genes amplify multiple PCR products. Additional PCR products are included up to 1.5 times the size of the smallest predicted product.
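
The 50 bp overlap cutoff and the 1.5x size rule can be sketched in Python as below. This is an illustration under stated assumptions, not the original code: it assumes the overlap is summed over all exons of a gene, that coordinates are 1-based inclusive, and that exons of one gene do not overlap each other. The function names (overlap_bp, genes_hit, keep_products) are hypothetical.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two 1-based inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def genes_hit(product, exons, min_overlap=50):
    """Return the sorted gene names whose exons overlap the product enough.

    product: (start, end); exons: list of (gene, start, end).
    Assumption: overlap is summed per gene across its exons.
    """
    totals = {}
    for gene, start, end in exons:
        bp = overlap_bp(product[0], product[1], start, end)
        if bp:
            totals[gene] = totals.get(gene, 0) + bp
    return sorted(gene for gene, bp in totals.items() if bp >= min_overlap)


def keep_products(products, size_factor=1.5):
    """Keep products no larger than size_factor times the smallest one."""
    if not products:
        return []
    smallest = min(end - start + 1 for start, end in products)
    limit = size_factor * smallest
    return [(s, e) for s, e in products if e - s + 1 <= limit]
```

With products of 1000, 1401 and 2001 bp, keep_products retains the first two, since only those are within 1.5 times the smallest size.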


Updated gene names

Updated!

4-20-02 update all_primers file

4-10-02 update remade file


Primer pairs which change names

Primer pairs which match multiple genes (648 primer pairs)
Primer pairs which match multiple genes, version 1
119 of these primer pairs match one gene, but also match tRNA(s).

Primer pairs which no longer match a gene (67 primer pairs)

Primer pairs for which the updated name is new, i.e. not found in the original primer name list (1448 primer pairs)

Primer pairs for which the updated name is the same as a different primer pair in the original primer name list (347 primer pairs)


Overall summary

Which PCR products changed names?

all_primers

Changed 3125
Same 15803
None 203
No predicted PCR product 80

remade

Changed 114
Same 1204
None 4
No predicted PCR product 3

Overall assessment of the primer pair set predicted names:

all_primers

Good prediction 17847
Multiple genes 904
Not a gene 352
No predicted PCR product 80

remade

Good prediction 1235
Multiple genes 82
Not a gene 4
No predicted PCR product 3

Overall assessment of the primer pair set predicted names, taking into consideration whether the PCR worked:

all_primers

Good array spot 16543

remade

Good array spot 932

Compare the full set of genes in AceDB with the list of genes that the Kim primer set amplifies. If we discard PCR products that contain more than 1 gene and make the same comparison again:

Kim primers amplify single genes: 19109 (93% of primer pairs)
AceDB genes: 19733
AceDB genes amplified by Kim primers: 17175 (86% of AceDB genes)
AceDB genes not amplified: 2675 (13.5% of AceDB genes)
(a few dozen gene names don't match).

Now use stricter criteria. Require that the 2000_pcr result be good, faint, or not_run_on_a_gel in addition to the above criteria.

Kim primers, good PCR amplifies single gene: 17475 (85% of primer pairs)
AceDB genes: 19733
AceDB genes amplified by good PCR, single gene: 16107 (82% of AceDB genes)
AceDB genes not amplified: 3731 (19% of AceDB genes)
(a few dozen gene names don't match).
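
A comparison of this kind can be sketched in Python with set operations. This is only an illustration: the function name coverage_summary and the input layout (a mapping from primer pair name to the list of genes its PCR product contains) are assumptions, not the format of the files above. Names reported by the primer set but absent from AceDB are listed separately, which is one way the "gene names don't match" cases could be surfaced.

```python
def coverage_summary(acedb_genes, primer_to_genes):
    """Summarize which AceDB genes are amplified by single-gene primer pairs.

    acedb_genes: iterable of gene names.
    primer_to_genes: dict mapping primer pair name -> list of gene names.
    Primer pairs whose product contains zero or several genes are discarded.
    """
    acedb = set(acedb_genes)
    single = {
        pair: genes[0]
        for pair, genes in primer_to_genes.items()
        if len(genes) == 1
    }
    amplified_names = set(single.values())
    amplified = amplified_names & acedb
    return {
        "single_gene_pairs": len(single),
        "acedb_genes": len(acedb),
        "amplified": len(amplified),
        "not_amplified": len(acedb - amplified),
        # Names the primer set reports that AceDB does not contain.
        "unmatched_names": sorted(amplified_names - acedb),
    }
```

Several primer pairs can amplify the same gene, so the count of single-gene primer pairs can exceed the count of distinct amplified genes.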


Summary histograms

Number of genes matching the PCR products

=0	434
1	17478
2	1100
3	107
4	23
5	13
6	11
7	7
8	4
9	1
10	0
>10	33

Number of PCR products per primer pair

=0	200
1	18412
2	377
3	45
4	15
5	17
6	16
7	6
8	5
9	1
10	2
>10	117

PCR product size histogram:

=0	307
1 - <101	3
101 - <201	7
201 - <301	4
301 - <401	17
401 - <501	29
501 - <601	30
601 - <701	146
701 - <801	159
801 - <901	860
901 - <1001	1553
1001 - <1101	2568
1101 - <1201	6049
1201 - <1301	16
1301 - <1401	369
1401 - <1501	641
1501 - <1601	1109
1601 - <1701	2091
1701 - <1801	84
1801 - <1901	162
1901 - <2001	295
2001 - <2101	339
2101 - <2201	482
2201 - <2301	813
2301 - <2401	224
2401 - <2501	435
2501 - <2601	74
2601 - <2701	129
2701 - <2801	210
2801 - <2901	0
2901 - <3001	1
>3000	5

PCR product exon content (bp) histogram:

=0	447
1 - <101	15
101 - <201	280
201 - <301	822
301 - <401	1170
401 - <501	1278
501 - <601	1333
601 - <701	1421
701 - <801	5085
801 - <901	3402
901 - <1001	2068
1001 - <1101	1031
1101 - <1201	275
1201 - <1301	123
1301 - <1401	88
1401 - <1501	76
1501 - <1601	47
1601 - <1701	41
1701 - <1801	36
1801 - <1901	25
1901 - <2001	29
2001 - <2101	23
2101 - <2201	20
2201 - <2301	18
2301 - <2401	19
2401 - <2501	3
2501 - <2601	6
2601 - <2701	7
2701 - <2801	1
2801 - <2901	3
2901 - <3001	0
>3000	21

Simple sequence repeats up to 9mers were tracked.

PCR product repetitive content (bp) histogram:

=0	17726
1 - <101	571
101 - <201	343
201 - <301	136
301 - <401	97
401 - <501	62
501 - <601	73
601 - <701	15
701 - <801	11
801 - <901	29
901 - <1001	19
1001 - <1101	20
1101 - <1201	37
1201 - <1301	2
1301 - <1401	9
1401 - <1501	6
1501 - <1601	5
1601 - <1701	8
1701 - <1801	2
1801 - <1901	1
1901 - <2001	10
>2000	26
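
The bp histograms above use 100 bp bins with separate "=0" and overflow rows. A minimal Python sketch of that binning follows. It is an illustration only: the function names bin_label and histogram are hypothetical, and it assumes non-negative integer values.

```python
from collections import Counter


def bin_label(value, width=100, top=3000):
    """Return the histogram row label for a non-negative integer value.

    Rows follow the layout used above: "=0", then "1 - <101",
    "101 - <201", ..., and a final ">top" overflow row.
    """
    if value == 0:
        return "=0"
    if value > top:
        return f">{top}"
    low = ((value - 1) // width) * width + 1
    return f"{low} - <{low + width}"


def histogram(values, width=100, top=3000):
    """Count how many values fall into each labelled bin."""
    return Counter(bin_label(v, width, top) for v in values)
```

Passing top=2000 gives the layout of the repetitive content histogram, whose overflow row is ">2000".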