
Plant Diversity, 2012, Vol. 34, Issue (5): 487-501. DOI: 10.3724/SP.J.1143.2012.12084

• Research Article •

Comparison and Optimization of De Novo Transcriptome Assemblers Based on Illumina RNA-Seq Short Reads

ZHAO Lei, Zachary LARSON-RABIN, CHEN Si-Yun, GUO Zhen-Hua

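  1. Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201
  • Received: 2012-06-01 Publication date: 2012-10-25 Release date: 2012-06-19
  • Funding: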

    The National Natural Science Foundation of China (30990244); the Knowledge Innovation Project of the Chinese Academy of Sciences (KSCX2YWN067); the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry; the Young Academic and Technical Leader Raising Foundation of Yunnan Province (2008PY065); a Chinese Academy of Sciences Young International Scientist Fellowship (awarded to Zachary Larson-Rabin); and the Yunnan Provincial Government through an innovation team program

Comparing De Novo Transcriptome Assemblers Using Illumina RNA-Seq Reads

ZHAO Lei, Zachary LARSON-RABIN, CHEN Si-Yun, GUO Zhen-Hua

  1. Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany,
    Chinese Academy of Sciences, Kunming  650201, China
  • Received: 2012-06-01 Online: 2012-10-25 Published: 2012-06-19

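Abstract (translated from Chinese):

Using publicly available Illumina high-throughput RNA-Seq short reads from Drosophila F1 hybrids and cultivated rice, we compared eight de novo transcriptome assemblers (ABySS, Velvet, SOAPdenovo, Oases, Trinity, Multiple-k, T-IDBA and Trans-ABySS). Among the single-k-mer and multiple-k-mer assemblers, Trinity and Trans-ABySS respectively showed the best assembly performance, while the remaining assemblers performed similarly to one another. We also found that multiple-k-mer methods assembled more total bases than single-k-mer methods, but even with the best multiple-k-mer assembler the quality of the output was lower than researchers would hope. We therefore propose the "ETM" optimization method, which incorporates the multiple-k-mer approach into Trinity so that it combines Trinity's leading assembly performance with the advantages of multiple k-mers; tests showed that the method offers some advantages. Our results give users a basis for choosing suitable software and are significant for advancing transcriptome research based on high-throughput Illumina sequencing.

Key words: High-throughput, Next-generation sequencing, Transcriptome, De novo assembly, Optimization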

Abstract:

In this study, we systematically compared the de novo transcriptome assembly performance of eight assemblers (ABySS, Velvet, SOAPdenovo, Oases, Trinity, Multiple-k, T-IDBA and Trans-ABySS) on Illumina RNA-Seq reads from two sources: F1 hybrids of Drosophila melanogaster and Drosophila sechellia (Drosophila MS), and cultivated rice. On our trial datasets, Trinity was the most effective of the single-k-mer assemblers and Trans-ABySS the most effective of the multiple-k-mer assemblers, although the other tested assemblers performed at comparable levels. Single-k-mer assemblers generally produced fewer total bases than multiple-k-mer assemblers, yet even the best assembler's results were of lower quality than some researchers may desire. We therefore developed and tested a novel de novo transcriptome assembly method, ETM, which combines multiple-k-mer tools with the Trinity assembler. The ETM method yielded superior results on our trial datasets. Our results will assist the growing number of transcriptome projects that use Illumina RNA-Seq reads and provide guidelines for choosing appropriate software.

Key words: Highthroughput, NGS, Transcriptome, de novo assembly, Optimized
