Supplementary Material:

Single feature polymorphisms between two rice cultivars detected using a median polish method

Weibo Xie, Ying Chen, Gang Zhou, Lei Wang, Chengjun Zhang, Jianwei Zhang, Jinghua Xiao, Tong Zhu, Qifa Zhang*

Address: National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, 430070 Wuhan, China

* Correspondence: Qifa Zhang. Email: qifazh [at] mail.hzau.edu.cn

Abstract

Background

Expression levels measured with oligonucleotide-probe microarrays have been adapted as a high-throughput approach for identifying DNA sequence variations between genotypes, referred to as single feature polymorphisms (SFPs). Although interest in this method is increasing, the algorithm still needs improvement to achieve high sensitivity and specificity, especially with complex genomes and large datasets, while maintaining good computational performance. Moreover, it is generally accepted that a sequence mismatch between a target and a probe on the chip reduces binding affinity, and this reduction provides the basis for detecting sequence polymorphisms as SFPs. However, SFPs have frequently been detected between probes and targets with perfectly matched sequences. Such observations, although they merit detailed investigation, have frequently been ignored in the analyses.

Results

We adapted a median polish method to evaluate the contribution of probe-flanking SNPs to SFP detection in multiple transcriptome datasets. We showed that the median polish method avoids fitting complex linear models and can thus be used to analyze complex transcriptome datasets. Compared with a previously used method, it is also superior in sensitivity, accuracy and computing time on data from multiple species with different genome complexity. Using this method, we identified 6,655 SFPs between two rice varieties and 3,387 SFPs between two yeast strains. 76% of the rice SFPs and 89% of the yeast SFPs detected from the examined transcriptomes could be validated by the presence of SNPs in the probe regions. Further comparison in both the rice and yeast genomes revealed that SNPs in sequences immediately flanking the probes contributed to the detection of SFPs in cases where the probes and the targets had perfectly matched sequences, as over 15% of such non-polymorphic SFPs were associated with flanking SNPs. Differences in minimum free energy caused by flanking SNPs, which may change the stability of RNA secondary structure, may partly explain these SFPs.
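To illustrate the kind of computation a median polish performs, the following is a minimal sketch of Tukey's median polish applied to one probe set, written in Python rather than the R used for the supplementary scripts. It is not the authors' implementation. The layout (rows are probes, columns are arrays, values are log2 intensities), the toy numbers, and the helper names median_polish and genotype_contrast are all assumptions made for illustration. The idea is that row effects absorb probe affinity, column effects absorb expression level, and a probe whose residuals differ between the two genotypes is an SFP candidate.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-8):
    """Decompose table[i][j] into overall + row[i] + col[j] + resid[i][j]
    by alternately sweeping out row and column medians (Tukey)."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [[float(v) for v in r] for r in table]
    overall = 0.0
    row = [0.0] * n_rows
    col = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_delta = [median(r) for r in resid]
        for i in range(n_rows):
            row[i] += row_delta[i]
            for j in range(n_cols):
                resid[i][j] -= row_delta[i]
        # Move the median of the column effects into the overall term.
        shift = median(col)
        overall += shift
        col = [c - shift for c in col]
        # Sweep column medians out of the residuals into the column effects.
        col_delta = [median(resid[i][j] for i in range(n_rows))
                     for j in range(n_cols)]
        for j in range(n_cols):
            col[j] += col_delta[j]
            for i in range(n_rows):
                resid[i][j] -= col_delta[j]
        # Move the median of the row effects into the overall term.
        shift = median(row)
        overall += shift
        row = [r - shift for r in row]
        if max(abs(d) for d in row_delta + col_delta) < tol:
            break
    return overall, row, col, resid


def genotype_contrast(resid, a_cols, b_cols):
    """Per-probe difference of mean residuals: genotype B minus genotype A."""
    out = []
    for r in resid:
        mean_a = sum(r[j] for j in a_cols) / len(a_cols)
        mean_b = sum(r[j] for j in b_cols) / len(b_cols)
        out.append(mean_b - mean_a)
    return out


# Toy probe set (hypothetical values): 3 probes x 4 arrays.
# Columns 0-1 are genotype A, columns 2-3 are genotype B.
# Probe 3 hybridizes poorly in genotype B only.
probe_set = [
    [10.0, 10.2, 11.0, 11.1],
    [9.0, 9.2, 10.0, 10.1],
    [8.0, 8.2, 7.5, 7.6],
]
overall, row_eff, col_eff, resid = median_polish(probe_set)
contrast = genotype_contrast(resid, a_cols=[0, 1], b_cols=[2, 3])
print("per-probe residual contrast (B - A):", contrast)
```

In this sketch the first two probes follow the additive probe-plus-array pattern, so their residuals vanish, while the third probe keeps a clearly negative contrast. In a real analysis the residuals would go into a statistical test rather than being read off directly.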

Conclusion

The median polish method has superior performance in SFP detection with respect to sensitivity and accuracy, while significantly reducing the computing time required. Polymorphisms in sequences immediately flanking the probes can frequently cause SFPs in microarray analysis, demonstrating the possible influence of probe-flanking SNPs on comparative transcriptome analyses using oligonucleotide microarrays. The SFPs between the two rice cultivars, which are the parents of the most widely cultivated rice hybrid, may greatly facilitate gene discovery in future studies.

Additional files

R 2.5.0 scripts for SFP analysis:
1. Rice: data preprocessing, SFP analysis;
2. Yeast: SFP analysis;
3. Barley: SFP analysis;
4. Common functions used for SFP analysis;
Microarray data:
1. Rice microarray raw data (36 .CEL files) and their names;
2. Yeast data from NCBI GEO (acc: GSE1975, reference: Ronald J, Akey JM, Whittle J, Smith EN, Yvert G, Kruglyak L: Simultaneous genotyping, gene-expression measurement, and detection of allele-specific expression with oligonucleotide arrays. Genome Res 2005, 15(2):284-291);
3. Barley data from naturalvariation.org (reference: Rostoks N, Borevitz JO, Hedley PE, Russell J, Mudie S, Morris J, Cardle L, Marshall DF, Waugh R: Single-feature polymorphism discovery in the barley transcriptome. Genome Biol 2005, 6(6):R54);
Rice Zhenshan 97 & Minghui 63 SFP probe list (p.value, t and lfc were derived from limma results; pq.value is the Benjamini & Hochberg (BH) adjusted p value);
Yeast RM11-1a (RM) & BY4716 (BY) SFP probe list;
Rice sequence confirmation table from the OryzaSNP project. The OryzaSNP data comprise two parts: a SNP table listing high-quality SNPs, and pseudo-sequences containing all base calls from Perlegen resequencing. In the pseudo-sequences, uppercase bases are Perlegen 'high quality' base calls and lowercase bases are Perlegen 'low quality' base calls; undetermined bases are shown as N's. To keep the results stringent, we used only SNPs from the OryzaSNP SNP table to determine the list of polymorphic probes. However, we considered the pseudo-sequences sufficient to establish the list of monomorphic probes. The columns of the polymorphic probe list are defined here;
Yeast RM11-1a sequence confirmation table from the NCBI Nucleotide database: polymorphism information in the probe region, and probe-flanking polymorphism information. The columns of the probe-flanking polymorphism information are defined here;
Non-polymorphic SFP probes with SNPs within the 1-20 flanking bases: rice, yeast;
More intermediate data can be found in data.
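
The pq.value column in the rice SFP probe list is described above as a Benjamini & Hochberg (BH) adjusted p value. As a minimal sketch of what that adjustment computes, the Python function below reproduces the standard BH step-up procedure, which is also what R's p.adjust returns with method "BH". The function name bh_adjust and the example p values are illustrative assumptions and are not taken from the supplementary scripts.

```python
def bh_adjust(pvalues):
    """Benjamini & Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    # Visit p values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank of this p value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(raw))
```

Probes whose adjusted value falls below a chosen false discovery rate threshold would then be reported. The threshold itself is not stated on this page, so none is assumed here.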