Usage is as follows:

gatk GenomicsDBImport \
    -V data/gvcfs/mother.g.vcf \
    -V data/gvcfs/father.g.vcf \
    -V data/gvcfs/son.g.vcf \
    --genomicsdb-workspace-path my_database \
    --intervals chr20,chr21

The --intervals parameter (short form -L) specifies a region or an entire chromosome. The syntax for using -L is as follows; it applies equally to -XL:
-L chr20 for contig chr20.
-L chr20:1-100 for contig chr20, positions 1-100.
-L intervals.list (or intervals.interval_list, or intervals.bed) when specifying a text file containing intervals (see supported formats below).
-L variants.vcf when specifying a VCF file containing variant records; their genomic coordinates will be used as intervals.

In a .list file, coordinates are 1-based:
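chr1:1-248956422
chr2:1-242193529
chr3:1-198295559
chr4:1-190214555
chr5:1-181538259
chr6:1-170805979

In a BED file, the start coordinate is 0-based and the end coordinate is exclusive. To convert a 1-based list, subtract 1 from the start only; the end stays the same:

chr1	0	248956422
chr2	0	242193529
chr3	0	198295559
chr4	0	190214555

In my experience, it is best to pass fewer than 100 contigs; otherwise the import may become very slow.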
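The list-to-BED conversion is mechanical enough to script. The following is a minimal sketch, assuming each interval is written as contig:start-end with 1-based inclusive coordinates. The function name list_to_bed and the file names in the usage line are hypothetical. Because BED ends are exclusive, only the start is shifted.

```shell
#!/usr/bin/env bash
# list_to_bed (hypothetical helper): convert 1-based inclusive intervals
# of the form contig:start-end into tab-separated BED records
# (0-based start, exclusive end). Reads the named files, or stdin if none.
list_to_bed() {
    awk '{
        # Intervals may be one per line or separated by whitespace.
        for (i = 1; i <= NF; i++) {
            # Match the trailing ":start-end"; entries without it
            # (for example a bare contig name) are skipped.
            if (match($i, /:[0-9]+-[0-9]+$/)) {
                chrom = substr($i, 1, RSTART - 1)
                split(substr($i, RSTART + 1), pos, "-")
                # Shift the start by 1; leave the end unchanged.
                printf "%s\t%d\t%s\n", chrom, pos[1] - 1, pos[2]
            }
        }
    }' "$@"
}
```

For example, `list_to_bed intervals.list > intervals.bed` would turn chr1:1-248956422 into a record with the fields chr1, 0, 248956422.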
To genotype directly from the database:

gatk GenotypeGVCFs \
    -R data/ref/ref.fasta \
    -V gendb://my_database \
    -newQual \
    -O test_output.vcf

To extract a combined gVCF from the database:

gatk SelectVariants \
    -R data/ref/ref.fasta \
    -V gendb://my_database \
    -O combined.g.vcf

Note that GATK3's CombineGVCFs is fast, but merging gVCF files produced by GATK4 with GATK3 generates a large number of warnings. GATK4's GenotypeGVCFs now accepts only a single input, so multiple per-sample gVCFs must be consolidated first, for example into a GenomicsDB workspace as shown above.
Reposted from: https://www.cnblogs.com/raisok/p/11282190.html