
gatk UnifiedGenotyper

Posted: 2023-10-10 11:01:59
Tags: wd UnifiedGenotyper zhouzhy recalibration pig gatk home bam

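Notes on using UnifiedGenotyper:

(1) Input: the recalibrated BAM file
.recalibration.bam
(2) Input: its BAM index
.recalibration.bai
(3) dbSNP: a VCF file
The dbSNP file must have a header, must list chromosomes in the same order as the reference DNA, and must have an .idx file.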
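
The three dbSNP requirements above can be checked before launching GATK. The following is only a minimal sketch: the function name `check_dbsnp` is hypothetical, it assumes the reference has a samtools-style `.fai` index whose first column lists contigs in reference order, and it compares only the order in which contigs first appear in the VCF.

```shell
# check_dbsnp VCF FAI   (hypothetical helper, not part of GATK)
# Prints one line per failed check and returns non-zero if any check fails.
check_dbsnp() {
    vcf=$1
    fai=$2
    status=0

    # 1. The first line must be the VCF fileformat header.
    if ! head -n 1 "$vcf" | grep -q '^##fileformat=VCF'; then
        echo "missing ##fileformat header: $vcf"
        status=1
    fi

    # 2. An index file must sit next to the VCF.
    if [ ! -r "$vcf.idx" ]; then
        echo "missing index: $vcf.idx"
        status=1
    fi

    # 3. Contigs must first appear in the same order as in the reference.
    vcf_order=$(grep -v '^#' "$vcf" | cut -f 1 | awk '!seen[$0]++')
    ref_order=$(printf '%s\n' "$vcf_order" |
        awk 'NR == FNR { in_vcf[$1] = 1; next } $1 in in_vcf { print $1 }' - "$fai")
    if [ "$vcf_order" != "$ref_order" ]; then
        echo "contig order differs from reference: $vcf"
        status=1
    fi

    return $status
}

# Usage (paths are placeholders):
#   check_dbsnp Sus_scrofa.vcf reference.fa.fai
```

A contig that is present in the VCF but absent from the `.fai` file is also reported by the third check, because it drops out of the reference-ordered list.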

UnifiedGenotyper Unable to read index file, for input source: vcf.idx

Sorry for the delayed response. It turns out that this is a problem with OS-level file locking support in some environments. We ran into this at the Broad, which is why the devs added the check. There is a hidden argument called --disableAutoIndexCreationAndLockingWhenReadingRods that disables index auto-creation and related file locking when reading vcfs. If all index files are pre-existing, and no concurrent processes will ever update any of the indices, it should be safe to use this argument.

Ack, sorry @albertoap, I gave you the variable name instead of the argument name. Please try again using --disable_auto_index_creation_and_locking_when_reading_rods, that should do the trick.
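
The reply above says the flag is safe only when every index file already exists. One way to respect that condition is to emit the flag only after checking the index. This is a sketch under assumptions: the helper name `rod_index_flag` is hypothetical, and treating an index older than its VCF as unusable is an extra precaution of this example, not something stated in the reply.

```shell
# rod_index_flag VCF   (hypothetical helper, not part of GATK)
# Prints the flag only if VCF.idx is readable and not older than the VCF;
# prints nothing otherwise, so GATK keeps its default index handling.
rod_index_flag() {
    vcf=$1
    idx=$vcf.idx
    if [ -r "$idx" ] && [ ! "$vcf" -nt "$idx" ]; then
        echo "--disable_auto_index_creation_and_locking_when_reading_rods"
    fi
}

# Usage: append the result to the UnifiedGenotyper command line, e.g.
#   --dbsnp "$DBSNP" $(rod_index_flag "$DBSNP")
# where DBSNP is a placeholder variable holding the path to the dbSNP VCF.
```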

I have a feeling I just need to import something else at the top of the .scala file, but I have no idea what it would be.

EDIT: Nevermind, realized I have to use the shortName, not the variable name when working with CommonArguments. So it's
this.disable_auto_index_creation_and_locking_when_reading_rods = true


 

Hi I found IGVTools, best for VCF indexing.

igvtools can be run from the command line or IGV itself (File>Run igvtools...)  After launching, choose the Index command and browse to your .vcf file. The index file (.idx) will be created in the same directory as the .vcf file.

 

 

java -Xmx50g -jar /share_bio/unisvx4/xiehb_kiz/zhouzhy/software/GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar \
  -R /share_bio/unisvx4/xiehb_kiz/zhouzhy/reference_genome/pig/Sus_scrofa.Sscrofa10.2.dna.toplevel.fa \
  --genotype_likelihoods_model BOTH \
  --num_threads 5 -T UnifiedGenotyper \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DKF1.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DKF2.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DKF3.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DKM1.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DKM2.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DKU1.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DKU2.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNF1.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNF2.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNF3.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNF4.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNF5.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNF6.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNF7.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNF8.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNM1.recalibration.bam \
  -I /home/zhouzhy/wd/pig_imprinting/pig_clean_data/DNM2.recalibration.bam \
  -mbq 20 \
  --dbsnp /home/zhouzhy/wd/pighisat2/bwa_align/dbsnp/Sus_scrofa.vcf \
  -stand_call_conf 10 -stand_emit_conf 10 -L 11 \
  -o F0.gatk_chrCHROM_UnifiedGenotyper.vcf
   
MESSAGE: Unable to read index file, for input source: /home/zhouzhy/wd/pighisat2/bwa_align/dbsnp/Sus_scrofa.vcf.idx

  1. Compress your .vcf file using the bgzip program:
     bgzip my.vcf
     For more information about the bgzip command, run it with no arguments to display the usage message.
  2. Create a tabix index file for the bgzip-compressed VCF (.vcf.gz):
     tabix -p vcf my.vcf.gz


http://data.broadinstitute.org/igv/projects/downloads/igv (note: IGV and igvtools are different programs).


 

##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:

##### ERROR Name      FeatureType   Documentation

##### ERROR BCF2   VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_variant_bcf2_BCF2Codec.html

##### ERROR  VCF   VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_variant_vcf_VCFCodec.html

##### ERROR VCF3   VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_variant_vcf_VCF3Codec.html

##### ERROR ------------------------------------------------------------------------------------------



Cause:
The dbSNP file is missing its header:
##fileformat=VCFv4.0
##source=dbSNP
##dbSNP_BUILD_ID=145
##reference=GCF_000003025.5
##variationPropertyDocumentationUrl=ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf
##INFO=<ID=RSPOS,Number=1,Type=Integer,Description="Chr position reported in dbSNP">
##INFO=<ID=RV,Number=0,Type=Flag,Description="RS orientation is reversed">
##INFO=<ID=VP,Number=1,Type=String,Description="Variation Property.  Documentation is at ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf">
##INFO=<ID=GENEINFO,Number=.,Type=String,Description="Pairs each of gene symbol:gene id.  The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=dbSNPBuildID,Number=1,Type=Integer,Description="First dbSNP Build for RS">
##INFO=<ID=SAO,Number=1,Type=Integer,Description="Variant Allele Origin: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both">
##INFO=<ID=VC,Number=1,Type=String,Description="Variation Class">
##INFO=<ID=VLD,Number=0,Type=Flag,Description="Is Validated.  This bit is set if the variant has 2+ minor allele count based on frequency or genotype data.">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO

Adding the header fixes the problem.
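
One way to add the header is to keep it in its own text file and concatenate it with the headerless records. This is only a sketch: the function name `add_vcf_header` and the file names in the usage comment are hypothetical, and it assumes the header file ends with the `#CHROM` column line, with tab-separated columns as the VCF format requires.

```shell
# add_vcf_header HEADER BODY OUT   (hypothetical helper)
# Writes HEADER followed by BODY to OUT. Refuses to run if BODY already
# starts with a ##fileformat line, to avoid producing a double header.
add_vcf_header() {
    header=$1
    body=$2
    out=$3
    if head -n 1 "$body" | grep -q '^##fileformat='; then
        echo "already has a header: $body" >&2
        return 1
    fi
    cat "$header" "$body" > "$out"
}

# Usage (file names are placeholders):
#   add_vcf_header dbsnp_header.txt Sus_scrofa.noheader.vcf Sus_scrofa.vcf
```

Because the VCF content changes, any existing .idx file for it should be recreated afterwards, for example with igvtools as described earlier.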



 



From: https://blog.51cto.com/emanlee/7789373
