03/14/2026

Scalable Mosaic Variant Detection at Low Allele Fractions with DRAGEN

Low-VAF Mosaic Variant Calling, Solved: DRAGEN Delivers Genome-Wide Detection at Scale Without Matched Controls.

Detecting mosaic variants below 5% variant allele fraction (VAF) at genome scale has remained largely impractical until now. A recent medRxiv preprint [1] introduces the DRAGEN mosaic variant caller, a hardware‑accelerated approach that enables sensitive, genome‑wide detection of mosaic single nucleotide variants (SNVs) and indels down to ~1–2% VAF without requiring matched controls. This capability opens the door to population‑scale and tissue‑resolved mosaic analyses that were previously out of reach.

Unlike cancer-associated somatic mutations, mosaic variants often arise post‑zygotically in otherwise healthy tissues, so they are present in only a small fraction of cells and therefore in only a small fraction of sequencing reads. Because the cells carrying them can clonally expand over time, mosaic variants have been implicated in a wide range of conditions, including cancer initiation, neuropsychiatric disorders, and age‑related pathologies, often emerging years or even decades before clinical symptoms appear.

At VAFs below 5%, true biological signal is easily obscured by technical noise, alignment artifacts, and sequencing errors. Progress has been further constrained by the lack of standardized benchmarks and the computational burden of scaling analyses across large datasets. As a result, low‑VAF mosaic variation has largely escaped systematic study, despite its growing biological relevance across health, aging, and disease. 

The preprint, "Scalable and comprehensive mosaic variant calling using DRAGEN," directly addresses these challenges by introducing a low‑VAF–optimized mosaic variant caller within the DRAGEN framework, alongside a scalable, publicly available genome‑wide benchmark for evaluating mosaic detection across the 1–10% VAF range.

Detecting true low‑VAF signal in noisy bulk data

Low allelic fraction makes mosaic variants particularly difficult to detect. Variants at 1–5% VAF fall within the error regime of short‑read sequencing, where systematic noise and alignment artifacts can obscure true biological signal. Most small‑variant callers are optimized for germline variation, where allele fractions cluster near 50% or 100%, and are not designed to reliably resolve low‑frequency events. High‑coverage bulk sequencing is more scalable and broadly applicable than single‑cell sequencing for mosaic discovery, but existing methods rely on trade-offs such as ultra‑deep coverage, matched controls, or aggressive post‑filtering that make low‑VAF mosaic detection slow and difficult to scale.

To address these challenges, the preprint introduces mosaic variant detection as an extension of the DRAGEN unified analysis framework. Central to the approach is a dedicated machine‑learning model optimized for low‑VAF signal and integrated directly into DRAGEN’s hardware‑accelerated pipeline. Across multiple evaluations, the method demonstrates strong sensitivity for mosaic SNVs and indels at approximately 2–3% VAF, with partial sensitivity extending toward 1% given sufficient coverage, while maintaining low false‑positive rates without requiring matched controls.
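The dependence on coverage follows from simple sampling arithmetic. As a minimal sketch, assume the number of variant-supporting reads at a site follows a binomial distribution with success probability equal to the VAF, and that a caller needs at least a fixed number of supporting reads before it can consider the site. Both the binomial model and the threshold of 4 reads (`min_alt_reads`) are illustrative assumptions, not a description of DRAGEN's machine‑learning model.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 4) -> float:
    """Chance of sampling at least `min_alt_reads` variant reads at a site.

    `min_alt_reads` is a hypothetical evidence threshold chosen for
    illustration only; real callers weigh many more features.
    """
    return prob_at_least(min_alt_reads, depth, vaf)


if __name__ == "__main__":
    for depth in (30, 100, 300):
        for vaf in (0.01, 0.02, 0.05):
            p = detection_probability(depth, vaf)
            print(f"depth={depth:>3}x  VAF={vaf:.0%}  P(>= 4 alt reads) = {p:.3f}")
```

At 30x coverage a 1% VAF variant is expected to appear in only 0.3 reads on average, while at 300x the expectation rises to 3 reads. Under this simplified model, that is why sensitivity near 1% VAF depends so strongly on depth.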

In direct comparisons with established mosaic and somatic variant callers, including DeepMosaic, MosaicForecast, and DeepSomatic, the DRAGEN mosaic variant caller consistently identified more true mosaic variants while reporting fewer false positives across the low‑VAF regime. Using genome‑wide benchmarks spanning approximately 1–10% VAF, DRAGEN achieved the strongest balance between sensitivity and precision, avoiding the common trade‑off in which increased sensitivity is accompanied by a disproportionate rise in spurious calls.
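The balance between sensitivity and precision is usually quantified from counts of true positives, false positives, and false negatives against a truth set. The sketch below shows that arithmetic. The caller names and counts are made-up placeholders chosen to illustrate the trade‑off, not results from the preprint.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # calls that match the truth set
    fp: int  # calls absent from the truth set
    fn: int  # truth-set variants that were missed


def recall(c: Counts) -> float:
    """Sensitivity: fraction of true variants that were called."""
    total = c.tp + c.fn
    return c.tp / total if total else 0.0


def precision(c: Counts) -> float:
    """Fraction of calls that are true variants."""
    total = c.tp + c.fp
    return c.tp / total if total else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: caller_B finds more true variants than caller_A,
# but at the cost of many more false positives.
callers = {
    "caller_A": Counts(tp=80, fp=10, fn=20),
    "caller_B": Counts(tp=90, fp=60, fn=10),
}

if __name__ == "__main__":
    for name, c in callers.items():
        print(f"{name}: recall={recall(c):.2f} precision={precision(c):.2f} F1={f1(c):.2f}")
```

A single summary such as F1 makes the trade‑off visible: a gain in recall that is bought with a large number of spurious calls lowers the combined score.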

Beyond synthetic and reference datasets, the study applies the DRAGEN mosaic caller to real biological samples, including blood, brain, and sperm, demonstrating applicability across diverse tissue contexts.

Addressing evaluation gaps and genomic complexity in low‑VAF mosaic detection

Progress in low‑VAF mosaic variant detection has also been limited by the lack of comprehensive genome‑wide benchmarks. While germline variant calling has benefited from well-established reference datasets, mosaic benchmarking resources remain sparse and limited in scope. Existing resources capture only partial aspects of mosaic variation: the Yonsei University College of Medicine (YUCM) benchmark is restricted to deeply sequenced exomes and excludes complex genomic regions, while Genome in a Bottle HG002 is genome‑wide but reports only a small number of SNVs at VAFs ≥5%, omitting indels and lower‑frequency variants.

To address this gap, the preprint introduces a scalable benchmark derived from mixtures of well‑characterized reference samples, spanning variants from approximately 1–10% VAF across the genome. This HapMap low‑VAF benchmark provides the first genome‑wide resource for evaluating variant callers in a VAF range that has remained largely inaccessible due to the absence of orthogonal truth sets. By bridging germline and mosaic analyses, it enables standardized, quantitative comparison of low‑frequency variant detection across methods and technologies.
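One way to see how sample mixtures can yield a low‑VAF truth set: if a sample contributes a fraction f of the DNA in a mixture, a germline variant private to that sample is expected at a VAF of f/2 when heterozygous and f when homozygous. The sketch below works through that arithmetic. It assumes diploid sites and variants absent from every other sample in the mixture. The mixing fractions are illustrative, and the preprint's actual mixture design is not described here.

```python
def expected_vaf(mix_fraction: float, alt_copies: int, ploidy: int = 2) -> float:
    """Expected VAF in a mixture for a variant private to one sample.

    mix_fraction: share of total DNA contributed by the carrier sample.
    alt_copies:   alternate alleles the carrier has (1 = het, 2 = hom).
    Assumes equal ploidy across samples and no carrier among the others.
    """
    if not 0.0 <= mix_fraction <= 1.0:
        raise ValueError("mix_fraction must be between 0 and 1")
    if not 0 <= alt_copies <= ploidy:
        raise ValueError("alt_copies must be between 0 and ploidy")
    return mix_fraction * alt_copies / ploidy


if __name__ == "__main__":
    # Illustrative mixing fractions, not the benchmark's actual design.
    for f in (0.02, 0.05, 0.10, 0.20):
        het = expected_vaf(f, alt_copies=1)
        hom = expected_vaf(f, alt_copies=2)
        print(f"mix fraction {f:.0%}: het -> {het:.1%} VAF, hom -> {hom:.1%} VAF")
```

Because the germline genotypes of the reference samples are already well characterized, the expected VAF of each variant in the mixture is known in advance. This lets known germline calls serve as truth for low‑frequency detection.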

In parallel, the work introduces personalized assembly pangenome references (PAPR) as a complementary advance aimed at improving accuracy in the same challenging genomic contexts highlighted by these benchmarks. By incorporating phased diploid assemblies into DRAGEN’s pangenome framework, this approach improves variant calling in structurally complex and polymorphic regions that are poorly represented by a single linear reference, while preserving GRCh38 coordinates for downstream compatibility. Benchmarking demonstrates consistent gains in germline precision and recall, and application to real tissues helps clarify tissue‑specific mosaic patterns while distinguishing in vivo mosaic variation from cell‑culture artifacts.

Looking ahead: from background noise to biological signal

Together, these advances mark a turning point in mosaic variant analysis, establishing DRAGEN as a powerful platform for making low‑VAF mosaic variation measurable at scale. By combining low‑VAF‑optimized modeling, realistic benchmarking, hardware‑accelerated performance, and pangenome‑aware alignment, DRAGEN enables population‑scale and tissue‑resolved mosaic studies that were previously impractical.

As sequencing datasets continue to grow, the ability to routinely interrogate low‑VAF mosaic variation will enable deeper insights into clonal dynamics, tissue heterogeneity, and early molecular signatures of disease. Mosaic variation is no longer an edge case; it is a fundamental layer of genome biology that can now be studied systematically across health, aging, and disease.

For the technical specifics, read the full preprint on medRxiv.

M-GL-04186

 

References

  1. Behera et al. (2026). "Scalable and comprehensive mosaic variant calling using DRAGEN." medRxiv. DOI: 10.64898/2026.02.03.26345450