Dr. Touati Benoukraf

Decoding Uncharted Genomic Variations in Acute Myeloid Leukemia Using Long-read Sequencing Technology

Dr. Touati Benoukraf
CRC in Bioinformatics for Personalized Medicine
Faculty of Medicine

Date: October 3, 2022
Time: 1:00pm – 2:00pm
Room: CSF 1302

The advent of third-generation long-read sequencing has enabled the characterization of large genomic variations, or structural variations (SVs), in humans that were previously missed by second-generation short-read sequencing. However, owing to the high error rate, high cost and low throughput of third-generation sequencing, its use in genetic studies and routine clinical investigations remains a challenge. This thesis presents a novel workflow that overcomes these limitations and allows cost-efficient, comprehensive discovery of uncharted genomic variants in diseases, notably acute myeloid leukemia (AML). Specifically, we developed a long-read variant-calling bioinformatics tool, NanoVar, which characterizes SVs accurately at low sequencing depths, thereby reducing cost and labour. When benchmarked alongside existing tools on simulated and real datasets at 4-8X depth, NanoVar demonstrates high variant-calling sensitivity and precision (F1 > 0.92).

Leveraging NanoVar’s capabilities, we sought to identify novel variants in AML, a disease highly associated with large genomic abnormalities. We sequenced the bone marrow mononuclear cells of 11 Asian AML patients and seven healthy donors using the Oxford Nanopore MinION platform (7-11X) and called variants using NanoVar. Using our in-house variant integration and visualization tools, we identified 80 variants that are potentially associated with AML. These variants are present in most AML samples (at least 8 of 11) and in at most one of the seven normal samples. Among these variants, we discovered a retrotransposon (AluYb8) insertion with a 13 bp target-site duplication located in intron 24 of the DCC gene. This variant is present in 10 of the 11 AML samples but absent in all normal samples. Further research is required to investigate the potential molecular impacts of this insertion. Overall, our work enhances the discovery and analysis of large genomic variants through the development of three open-source bioinformatics tools. Our AML variant database may serve as a platform for future research into new biomarkers or druggable targets in AML.
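
The recurrence criterion described above (present in at least 8 AML samples, present in at most 1 normal sample) can be illustrated with a minimal Python sketch. This is not the in-house integration tool mentioned in the abstract: the function name, the variant identifiers, the sample labels and the dictionary layout are all hypothetical, and only the two thresholds and the cohort sizes come from the abstract.

```python
from typing import Dict, List, Set


def recurrent_variants(
    carriers_by_variant: Dict[str, Set[str]],
    aml_samples: Set[str],
    normal_samples: Set[str],
    min_aml: int = 8,      # threshold taken from the abstract
    max_normal: int = 1,   # threshold taken from the abstract
) -> List[str]:
    """Return variant IDs seen in >= min_aml AML samples and <= max_normal normals.

    carriers_by_variant maps a (hypothetical) variant ID to the set of
    sample IDs in which that variant was called.
    """
    selected = []
    for variant_id, carriers in carriers_by_variant.items():
        n_aml = len(carriers & aml_samples)
        n_normal = len(carriers & normal_samples)
        if n_aml >= min_aml and n_normal <= max_normal:
            selected.append(variant_id)
    return sorted(selected)


# Toy cohort mirroring the study design: 11 AML samples, 7 normal samples.
aml = {f"AML{i:02d}" for i in range(1, 12)}
normal = {f"N{i}" for i in range(1, 8)}

# Invented calls, for illustration only.
calls = {
    "ins_A": {f"AML{i:02d}" for i in range(1, 11)},          # 10 AML, 0 normal
    "del_B": {f"AML{i:02d}" for i in range(1, 6)} | {"N1"},  # 5 AML, 1 normal
    "dup_C": aml | {"N1", "N2", "N3"},                       # 11 AML, 3 normal
    "inv_D": {f"AML{i:02d}" for i in range(1, 9)} | {"N1"},  # 8 AML, 1 normal
}

candidates = recurrent_variants(calls, aml, normal)
print(candidates)
```

In this toy data, `ins_A` and `inv_D` satisfy both thresholds, while `del_B` is too rare among the AML samples and `dup_C` is too common among the normal samples.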