GATK operates through a series of steps that transform raw sequencing data into a set of high-confidence variants. The key steps include:
Data Preprocessing: Raw reads go through quality control and alignment to a reference genome (alignment is done with an external aligner such as BWA, since GATK does not align reads itself). Duplicate reads are then marked and base quality scores are recalibrated.
Variant Calling: HaplotypeCaller (for germline variants) or Mutect2 (for somatic variants) identifies single nucleotide polymorphisms (SNPs) and insertions/deletions (indels).
Variant Filtering: Filters are applied to separate true variants from sequencing artifacts.
Annotation: Functional information is added to the identified variants to help interpret their biological significance.
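
To make the order of these steps concrete, here is a minimal sketch in Python that assembles GATK4 command lines for a germline run without executing them. It assumes the reads are already aligned and coordinate-sorted, skips base quality score recalibration for brevity, and uses a single illustrative hard filter (QD < 2.0). The function name, file-naming scheme, and all paths are hypothetical, and the flags should be checked against the documentation for the installed GATK version.

```python
import shlex


def build_germline_commands(reference, aligned_bam, prefix,
                            data_sources_dir, ref_version="hg38"):
    """Return GATK4 command lines (as argv lists) for a minimal germline run.

    Illustrative helper, not part of GATK. Each step's output file is the
    next step's input, mirroring the order of the steps described above.
    """
    marked_bam = f"{prefix}.markdup.bam"
    raw_vcf = f"{prefix}.raw.vcf.gz"
    filtered_vcf = f"{prefix}.filtered.vcf.gz"
    annotated_vcf = f"{prefix}.annotated.vcf"

    return [
        # Preprocessing: flag duplicate reads and index the resulting BAM.
        ["gatk", "MarkDuplicates",
         "-I", aligned_bam, "-O", marked_bam,
         "-M", f"{prefix}.dup_metrics.txt",
         "--CREATE_INDEX", "true"],
        # Variant calling: germline SNPs and indels.
        ["gatk", "HaplotypeCaller",
         "-R", reference, "-I", marked_bam, "-O", raw_vcf],
        # Variant filtering: tag low-confidence calls; the threshold is an example.
        ["gatk", "VariantFiltration",
         "-R", reference, "-V", raw_vcf, "-O", filtered_vcf,
         "--filter-name", "QD2", "--filter-expression", "QD < 2.0"],
        # Annotation: add functional information from local data sources.
        ["gatk", "Funcotator",
         "-R", reference, "-V", filtered_vcf, "-O", annotated_vcf,
         "--output-file-format", "VCF",
         "--data-sources-path", data_sources_dir,
         "--ref-version", ref_version],
    ]


if __name__ == "__main__":
    # Placeholder paths; print the commands so they can be reviewed before running.
    for argv in build_germline_commands(
            "ref.fasta", "sample.sorted.bam", "sample", "funcotator_data/"):
        print(shlex.join(argv))
```

Building the commands as argument lists keeps the sketch free of shell-quoting problems (the filter expression contains spaces and a `<`), and each list can be passed to `subprocess.run(argv, check=True)` once the paths point at real files.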