Illumina (HiSeq) Exome Variant Detection Pipeline

The data processing pipeline for detecting variants in Illumina HiSeq data is as follows. First, the FASTQ files are processed with cutadapt (v1.6) to remove any adapter sequences from the ends of the reads. The trimmed reads are then mapped with BWA (bwa mem, v0.7.12). After mapping, the SAM files are sorted in coordinate order and read group tags are added using the Picard tools; the resulting BAM files are then processed with Picard MarkDuplicates. The duplicate-marked BAM files are processed according to the GATK (v3.2) best practices for tumor-normal pairs: they are first realigned using ABRA (v0.92), and the base quality values are then recalibrated with the GATK BaseRecalibrator. Somatic variants are called in the processed BAMs using MuTect (v1.1.7) for SNVs, and using the GATK HaplotypeCaller with a custom post-processing script for somatic indels. The full pipeline is available at https://github.com/soccin/BIC-variants_pipeline and the post-processing code is at https://github.com/soccin/Variant-PostProcess
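
The order of the steps described above can be summarized as a small Python sketch. It is only an illustration of the stage sequence and tool versions given in the text, not code from the pipeline repository. The names Stage, STAGES and describe are hypothetical, no command-line options are shown, and the version attached to HaplotypeCaller is assumed to be the same GATK release (3.2) named earlier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One step of the pipeline (hypothetical helper type, for illustration)."""
    step: str
    tool: str
    version: Optional[str]  # None where the text does not state a version


# Stages in the order given in the description above.
STAGES: List[Stage] = [
    Stage("adapter trimming", "cutadapt", "1.6"),
    Stage("read mapping", "bwa mem", "0.7.12"),
    Stage("coordinate sort and read group tags", "Picard", None),
    Stage("duplicate marking", "Picard MarkDuplicates", None),
    Stage("realignment", "ABRA", "0.92"),
    Stage("base quality recalibration", "GATK BaseRecalibrator", "3.2"),
    Stage("somatic SNV calling", "MuTect", "1.1.7"),
    # Assumption: HaplotypeCaller comes from the same GATK 3.2 release.
    Stage("somatic indel calling", "GATK HaplotypeCaller + post-processing script", "3.2"),
]


def describe(stages: List[Stage]) -> List[str]:
    """Return one numbered, human-readable line per stage."""
    lines = []
    for number, stage in enumerate(stages, start=1):
        version = f"v{stage.version}" if stage.version else "version not stated"
        lines.append(f"{number}. {stage.step}: {stage.tool} ({version})")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(STAGES)))
```

Keeping the stages in a single ordered list makes the dependency between steps explicit: each stage consumes the output of the one before it, and the two calling stages at the end both run on the same recalibrated BAM files.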