The quest to understand the interplay between evolution, genes and traits has been revolutionized by the collection of rich phenotypic and genetic data across millions of individuals in diverse populations. However, analyses of these Biobank-scale datasets present substantial statistical and computational challenges.
I will describe how we bring together statistical and computational insights to design accurate and highly scalable algorithms for a suite of problems that arise in the analysis of Biobank data: randomized inference algorithms to dissect the genetic architecture of complex traits, and deep-learning-based phenotype imputation to deal with complex patterns of missingness. By applying these methods to about half a million individuals from the UK Biobank, we obtain novel insights into how genetic effects are distributed across the genome and into the relative contributions of additive, dominance and gene-environment interaction effects to trait variation, and we identify new genes that confer risk for hard-to-measure diseases.