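Continuing from the previous post: once quality control and host removal are done (soil samples need no host removal), the next step is to assemble the sample reads.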
1. Assembly tools
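Metagenomics has quite a few commonly used assemblers, such as SOAPdenovo2, MEGAHIT, SPAdes, metaSPAdes, MOCAT2 and IDBA-UD, each with its own strengths and weaknesses. The following two are among the most widely used in practice: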
- spades:https://github.com/ablab/spades
- megahit:https://github.com/voutcn/megahit
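The original paper assembled with MEGAHIT, using k-mer sizes from 21 to 101 in steps of 20 and keeping only contigs of at least 1 kb (1,000 bp). Here we try both tools and see which gives the better result.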
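Both assemblers take the k-mer sizes as a comma-separated list. As a small illustration, assuming a shell with the standard seq and paste utilities, the "21 to 101 in steps of 20" list from the paper can be generated once and reused; KLIST is only an illustrative variable name:

```shell
#!/usr/bin/env bash
# k-mer sizes from 21 to 101 in steps of 20 (seq FIRST INCREMENT LAST),
# joined into one comma-separated line with paste -s.
KLIST=$(seq 21 20 101 | paste -sd, -)
echo "$KLIST"   # prints 21,41,61,81,101

# The same string could then be passed to both assemblers (not run here):
# megahit ... --k-list "$KLIST" ...
# spades.py --meta ... -k "$KLIST" ...
```

All five values are odd, which is what both MEGAHIT and SPAdes expect for k-mer sizes.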
MEGAHIT assembly command:
# k-mer sizes 21 to 101 in steps of 20; discard contigs shorter than 1,000 bp
megahit -1 sample_paired_1.fastq \
    -2 sample_paired_2.fastq \
    --k-list 21,41,61,81,101 \
    --min-contig-len 1000 \
    -o res_megahit
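SPAdes assembly command:
# --meta runs SPAdes in metagenomic (metaSPAdes) mode; same k-mer list as MEGAHIT
spades.py -1 sample_paired_1.fastq \
    -2 sample_paired_2.fastq \
    --meta \
    -k 21,41,61,81,101 \
    -o res_spades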
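SPAdes has no parameter for filtering out short sequences, and I couldn't be bothered to filter them afterwards, haha. The final results were summarized with QUAST.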
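Since the MEGAHIT run already dropped contigs shorter than 1,000 bp while the SPAdes scaffolds were left unfiltered, the two reports below are not strictly like-for-like. A minimal awk sketch for applying the same cutoff to the SPAdes output after the fact; filter_fasta_by_len and the output file names are hypothetical, and the filtered file is written with each sequence on a single line:

```shell
#!/usr/bin/env bash
# Keep only FASTA records whose sequence is at least MIN_LEN bp long.
# Handles multi-line FASTA; each kept sequence is written on one line.
filter_fasta_by_len() {
    local min_len=$1 in_fa=$2
    awk -v min="$min_len" '
        # Print the buffered record if it passes the length cutoff
        function flush() { if (hdr != "" && length(seq) >= min) print hdr "\n" seq }
        /^>/ { flush(); hdr = $0; seq = ""; next }
        { seq = seq $0 }
        END { flush() }
    ' "$in_fa"
}

# Hypothetical usage (file names are assumptions; not run here):
# filter_fasta_by_len 1000 res_spades/scaffolds.fasta > res_spades/scaffolds.min1000.fasta
# quast.py res_megahit/final.contigs.fa res_spades/scaffolds.min1000.fasta -o res_quast
```

Filtering on the actual sequence length, rather than parsing the length field in SPAdes headers, keeps the helper usable for any FASTA file.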
MEGAHIT results:
Statistics without reference  final.contigs
# contigs                     378
# contigs (>= 0 bp)           378
# contigs (>= 1000 bp)        378
# contigs (>= 5000 bp)        60
# contigs (>= 10000 bp)       27
# contigs (>= 25000 bp)       7
# contigs (>= 50000 bp)       2
Largest contig                94957
Total length                  1479335
Total length (>= 0 bp)        1479335
Total length (>= 1000 bp)     1479335
Total length (>= 5000 bp)     905258
Total length (>= 10000 bp)    679542
Total length (>= 25000 bp)    359522
Total length (>= 50000 bp)    180430
N50                           8289
N90                           1360
auN                           21037
L50                           34
L90                           250
GC (%)                        48.31
Mismatches
# N's per 100 kbp             0
# N's                         0
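For reading these reports: with contigs sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches 50% of the assembly length, and L50 is the number of contigs needed to get there (N90/L90 likewise at 90%); auN is the sum of squared contig lengths divided by the total length. A minimal sketch that recomputes these from a FASTA file under those standard definitions; assembly_stats is a hypothetical helper, not QUAST's own code, so small differences in edge cases are possible:

```shell
#!/usr/bin/env bash
# Print contig count, total length, N50, L50, N90, L90 and auN for a FASTA file.
# Optional 2nd argument: ignore records shorter than this many bp.
# Pipeline: (1) one length per record, (2) sort longest first,
# (3) walk the cumulative sum until it reaches 50% / 90% of the total.
assembly_stats() {
    local in_fa=$1 min_len=${2:-0}
    awk -v min="$min_len" '
        /^>/ { if (have && len >= min) print len; have = 1; len = 0; next }
        { len += length($0) }
        END { if (have && len >= min) print len }
    ' "$in_fa" | sort -rn | awk '
        { L[NR] = $1; total += $1; sq += $1 * $1 }
        END {
            if (NR == 0) { print "no sequences"; exit 1 }
            for (i = 1; i <= NR; i++) {
                cum += L[i]
                if (!l50 && cum >= 0.5 * total) { n50 = L[i]; l50 = i }
                if (!l90 && cum >= 0.9 * total) { n90 = L[i]; l90 = i }
            }
            printf "contigs\t%d\ntotal\t%.0f\nN50\t%d\nL50\t%d\nN90\t%d\nL90\t%d\nauN\t%.0f\n",
                NR, total, n50, l50, n90, l90, sq / total
        }'
}

# Hypothetical usage (not run here):
# assembly_stats res_megahit/final.contigs.fa 500
```

Passing 500 as the second argument approximates QUAST's default minimum contig length.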
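SPAdes results (unfiltered scaffolds; rows without an explicit threshold only count sequences of at least 500 bp, QUAST's default minimum contig length, which is why "# contigs" is 1122 while "# contigs (>= 0 bp)" is 4780):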
Statistics without reference  scaffolds
# contigs                     1122
# contigs (>= 0 bp)           4780
# contigs (>= 1000 bp)        402
# contigs (>= 5000 bp)        59
# contigs (>= 10000 bp)       28
# contigs (>= 25000 bp)       8
# contigs (>= 50000 bp)       2
Largest contig                94957
Total length                  2015934
Total length (>= 0 bp)        3191156
Total length (>= 1000 bp)     1532089
Total length (>= 5000 bp)     897593
Total length (>= 10000 bp)    683582
Total length (>= 25000 bp)    382605
Total length (>= 50000 bp)    180502
N50                           3521
N90                           633
auN                           15526
L50                           86
L90                           763
GC (%)                        49.69
Mismatches
# N's per 100 kbp             26.29
# N's                         530
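The "# N's per 100 kbp" row is the N count scaled to the total length QUAST counts: 530 × 100,000 / 2,015,934 ≈ 26.29 for the SPAdes scaffolds, whereas the MEGAHIT contigs contain no N's at all. A small sketch that checks that arithmetic and computes the same figure for any FASTA file; ns_per_100kbp is a hypothetical helper, and passing 500 as the cutoff mimics QUAST's default minimum contig length:

```shell
#!/usr/bin/env bash
# Count N's and N's per 100 kbp over FASTA records of at least MIN_LEN bp.
ns_per_100kbp() {
    local in_fa=$1 min_len=${2:-0}
    awk -v min="$min_len" '
        # Add the buffered record to the totals if it passes the cutoff
        function flush() {
            if (have && length(seq) >= min) {
                total += length(seq)
                n += gsub(/[Nn]/, "", seq)   # gsub returns how many N/n it removed
            }
        }
        /^>/ { flush(); have = 1; seq = ""; next }
        { seq = seq $0 }
        END {
            flush()
            if (total > 0) printf "N_count\t%d\nN_per_100kbp\t%.2f\n", n, n * 100000 / total
        }' "$in_fa"
}

# Check the SPAdes row above: 530 N's over 2,015,934 bp
awk 'BEGIN { printf "%.2f\n", 530 * 100000 / 2015934 }'   # prints 26.29
```

Note that the denominator is the length-filtered total (2,015,934 bp), not the total over all sequences (3,191,156 bp); using the latter would give about 16.6 instead.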
From: https://www.cnblogs.com/mmtinfo/p/18355061