目前对peak的数据处理上,发现测序深度对peak的数量有很大影响,即使做了normalization也没办法,所以这里希望从原始的bam文件开始做downsampling。
参考一:Downsample BAM file to specific amount of reads
input_dir=/home/da528/ATAC_Analysis/cut_run/zhixin_analysis/processed_bam/Exp1/ output_dir=/home/da528/ATAC_Analysis/cut_run/zhixin_analysis/downsampling_bam/Exp1 cd $input_dir for bam in `ls *.bam` do echo $bam reads=5000000 fraction=$(samtools idxstats $bam | cut -f3 | awk -v ct=$reads 'BEGIN {total=0} {total += $1} END {print ct/total}') samtools view -b -s ${fraction} $bam > $output_dir/$bam done
以上脚本的不完善的地方就是当测序read低于5000000,它还是执行downsampling,这不是我们想看到的。
input_dir=/home/da528/ATAC_Analysis/cut_run/zhixin_analysis/processed_bam/Exp1/ output_dir=/home/da528/ATAC_Analysis/cut_run/zhixin_analysis/downsampling_bam2/Exp1 cd $input_dir for bam in `ls *.bam` do echo $bam reads=5000000 # fraction=$(samtools idxstats $bam | cut -f3 | awk -v ct=$reads 'BEGIN {total=0} {total += $1} END {print ct/total}') frac=$( samtools idxstats $bam | cut -f3 | awk -v ct=$reads 'BEGIN {total=0} {total += $1} END {frac=ct/total; if (frac > 1) {print 0.99} else {print frac}}' ) echo $frac samtools view -b -s ${frac} $bam > $output_dir/$bam done
samtools view: Incorrect sampling argument "1"
标签:samtools,bam,Downsampling,测序,reads,cut,file,total,dir From: https://www.cnblogs.com/leezx/p/16949289.html