Event

QLS Seminar Series - Alex Diaz-Papkovich

Tuesday, February 6, 2024 12:00 to 13:00

Topological analysis of high-dimensional human genetic data in biobanks

Alex Diaz-Papkovich, Brown University 
Tuesday, February 6, 12-1 pm
Zoom Link: https://mcgill.zoom.us/j/86855481591
In Person: 550 Sherbrooke, Room 189

Abstract: Now storing the genetic data of millions of individuals, biobanks have become rich repositories regularly used for scientific study and discovery. With the human genome spanning some three billion base pairs, any statistical analysis of a biobank is inherently a high-dimensional problem. To say nothing of the complexity of human genetics, we encounter challenges both in the scale of the data and in their composition.

We develop a tractable approach to study biobanks using uniform manifold approximation and projection (UMAP), a form of non-linear dimensionality reduction based in topological data analysis, and HDBSCAN, a density-based clustering algorithm. Using these tools, we visualize the data contained in biobanks and illustrate the relationships between population structure—the phenomenon of non-random genetic variation—and variables like geography, demographic history, migration, social structure, and environmental measures. We identify population structure at a variety of scales, ranging from a handful to hundreds of thousands of individuals, uncover subtle relationships between our data, and discuss applications to exploratory data analysis, data QC, and polygenic scoring.
