Non-random distribution of T-DNA insertions at various levels of the genome hierarchy as revealed by analyzing 13 804 T-DNA flanking sequences from an enhancer-trap mutant library
Jian Zhang1,+1, Dong Guo1,+1, Yuxiao Chang1, Changjun You1, Xingwang Li1, Xiaoxia Dai1, Qijun Weng2, Jianwei Zhang1, Guoxing Chen1, Xianghua Li1, Huifang Liu1, Bin Han2, Qifa Zhang1, Changyin Wu1*
1National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan 430070, China, and
2National Center for Gene Research, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai 200233, China
We isolated 13 804 T-DNA flanking sequence tags (FSTs) from a T-DNA insertion library of rice. A comprehensive analysis of these FSTs revealed several features demonstrating a highly non-random distribution of T-DNA insertions in the rice genome. T-DNA insertions were biased towards large chromosomes, not only in the absolute number of insertions but also in relative density. Within chromosomes, insertions occurred more densely at the distal ends and less densely in the centromeric regions. The distribution of T-DNA insertions was highly correlated with that of full-length cDNAs, but the strength of this correlation varied considerably among chromosomes. T-DNA insertions strongly disfavored transposable element (TE)-related sequences but favored genic sequences, with a strong bias toward the 5' upstream and 3' downstream regions of genes. T-DNA insertions were also unevenly distributed among the functional classes of genes, with an excess of insertions in certain functional categories and a deficit in others. Analysis of the DNA sequence composition around the T-DNA insertion sites revealed several further prominent features: elevated bendability from −200 to +200 bp relative to the insertion sites; an inverse relationship between the GC and TA skews; and GC and TA skews of opposite sign upstream and downstream of the insertion sites, with both skews equal to zero at the insertion sites themselves. It was estimated that 365 380 insertions are needed to saturate the genome with P = 0.95, and that the 45 441 FSTs isolated so far by various groups have tagged 14 287 of the 42 653 non-TE-related genes.
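The GC and TA skews mentioned above measure strand asymmetry in base composition. A minimal sketch, assuming the usual definitions GC skew = (G − C)/(G + C) and TA skew = (T − A)/(T + A) (the abstract does not spell out the sign convention, so the TA form is an assumption), computed over sliding windows of a flanking sequence. The function names and window parameters are hypothetical illustrations, not the authors' pipeline.

```python
def skew(seq: str, x: str, y: str) -> float:
    """Return (count(x) - count(y)) / (count(x) + count(y)); 0.0 if neither base occurs."""
    s = seq.upper()
    nx, ny = s.count(x), s.count(y)
    total = nx + ny
    return 0.0 if total == 0 else (nx - ny) / total


def gc_skew(seq: str) -> float:
    """GC skew, assumed here to be (G - C) / (G + C)."""
    return skew(seq, "G", "C")


def ta_skew(seq: str) -> float:
    """TA skew, assumed here to be (T - A) / (T + A)."""
    return skew(seq, "T", "A")


def windowed_skews(seq: str, window: int, step: int) -> list[tuple[int, float, float]]:
    """Slide a window of `window` bp along `seq` in steps of `step` bp and
    return (window_start, gc_skew, ta_skew) for each full window."""
    results = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        results.append((start, gc_skew(w), ta_skew(w)))
    return results


if __name__ == "__main__":
    # Toy sequence: first window is G-rich, second window is T-rich.
    for row in windowed_skews("GGGCTTTA", window=4, step=4):
        print(row)
```

To profile skews around insertion sites, one could extract equal-length flanks centred on each site (for example −200 to +200 bp) and average the per-window values across all FSTs; under the pattern described above, the averaged skews would change sign across the centre.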
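The abstract does not state how the saturation figure of 365 380 insertions was derived. A common way to obtain such an estimate is the Clarke–Carbon formula, N = ln(1 − P) / ln(1 − f), where f is the fraction of the genome occupied by a single target and P is the desired probability that the target is hit at least once. The sketch below assumes that formula; the genome and target sizes in the example are hypothetical and are not the values used in the study.

```python
import math


def insertions_to_saturate(p: float, target_fraction: float) -> float:
    """Clarke-Carbon estimate: number of random insertions N such that a target
    covering `target_fraction` of the genome is hit at least once with probability p.

    N = ln(1 - p) / ln(1 - f)
    Assumes insertions are independent and uniformly random across the genome.
    """
    if not (0.0 < p < 1.0) or not (0.0 < target_fraction < 1.0):
        raise ValueError("p and target_fraction must lie strictly between 0 and 1")
    return math.log(1.0 - p) / math.log(1.0 - target_fraction)


if __name__ == "__main__":
    # Hypothetical inputs: a 400 Mb genome and a 4 kb target, so f = 4e3 / 4e8 = 1e-5.
    genome_bp = 400_000_000
    target_bp = 4_000
    n = insertions_to_saturate(0.95, target_bp / genome_bp)
    print(f"~{math.ceil(n):,} insertions needed for P = 0.95")
```

Because the formula assumes uniformly random insertion, while the results above show that T-DNA insertions are strongly non-random, any estimate obtained this way should be read as an approximation.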