R, Bioconductor
filterVcf: Extract Variants of Interest from a Large VCF File (Paul Shannon)
We demonstrate three methods: filtering by genomic region, filtering on attributes of
each specific variant call, and intersecting with known regions of interest (exons, splice
sites, regulatory regions, etc.).
http://www.bioconductor.org/packages/release/bioc/vignettes/VariantAnnotation/inst/doc/filterVcf.pdf
Java
SelectVariants -- Select a subset of variants from a larger callset ( GATK SelectVariants )
Often, a VCF containing many samples and/or variants will need to be subset in order to facilitate certain analyses (e.g. comparing and contrasting cases vs. controls; extracting variant or non-variant loci that meet certain requirements, displaying just a few samples in a browser like IGV, etc.). SelectVariants can be used for this purpose.
Biostars
Question: How To Split Multiple Samples In Vcf File Generated By Gatk?
I did variant calling using BWA + PiCard + GATK and have just got the filtered VCF files from GATK. In the process of running GATK, I used list of inputs (11 samples) and for most steps, I had only one output file for each step. Now, I got two VCF files (one for SNPs and the other is for indels), each of which contains 11 samples. I can see the names of the 11 samples in the header of vcf files, and each sample seems to have one column of data. So I am wondering how to split each VCF files into individual sample vcf files?
https://www.biostars.org/p/78929/
bcftools
for file in *.vcf*; do
for sample in `bcftools view -h $file | grep "^#CHROM" | cut -f10-`; do
bcftools view -c1 -Oz -s $sample -o ${file/.vcf*/.$sample.vcf.gz} $file
done
done
https://www.biostars.org/p/12535/#115691
vcf-subset
vcf-subset -c S1 bigfile.vcf > S1.vcf
https://www.biostars.org/p/78929/
http://campagnelab.org/software/goby/reference-documentation/modes/vcf-subset/
REF:
http://samtools.github.io/hts-specs/VCFv4.2.pdf