Broadly, the Schrider Lab is interested in a number of problems in population and evolutionary genomics, particularly in humans, the fruit fly Drosophila melanogaster, and the malaria vector mosquito Anopheles gambiae. Our main research areas are as follows:

The impact of natural selection on genetic variation

The patterns of genetic variation that we observe among individuals are shaped by several evolutionary forces. First, all variation is the result of mutational mechanisms (e.g. DNA replication error) that introduce new alternative versions of a gene (i.e. alleles). The fate of many of these alleles is determined by random chance, but those with strong enough effects on fitness will be subject to natural selection. Beneficial mutations will rapidly increase in frequency, thereby facilitating adaptation, while harmful mutations will quickly be eliminated. These forces shape patterns of variation in and around the selected region of the genome in characteristic ways. We work to uncover these signatures of selection, and to elucidate their impact on genetic diversity across the genomes of humans and other species. Our work in this area has produced evidence that adaptation has had a larger impact on genomic diversity in humans than previously appreciated.

Using machine learning in order to perform more powerful inference from population genetic data

One of the overarching goals in population genetics is to be able to examine an alignment of gene sequences from multiple individuals and infer the evolutionary forces shaping the diversity across sequences. These forces include population size changes, migration across populations, and natural selection. Directly drawing inferences about these forces from a sequence alignment, which is simply a large matrix of As, Cs, Gs, and Ts, is far from straightforward. Often, the alignment is summarized by a statistic (e.g. Tajima's D), and extreme values are taken as evidence of natural selection or demographic changes. One problem with this approach is that by boiling the data down to a single number one may lose a fair amount of potentially useful information. We have experimented with applying machine learning tools, which are well suited for large multidimensional data sets (e.g. a large vector of summary statistics rather than a single one), in order to make population genetic inferences while retaining as much information from the original data set as possible. These efforts generally yield far more accurate inferences than those from more traditional methods, and have enormous potential to drive new biological discoveries going forward as data sets continue to grow larger in both size and dimensionality.
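To make the idea of summarizing an alignment concrete, here is a minimal sketch in Python that reduces a small alignment to a short vector of statistics: the number of segregating sites, nucleotide diversity, Watterson's estimator, and Tajima's D. It uses the standard textbook formula for Tajima's D and is only an illustration, not the lab's own software. The function name summary_stats and the toy alignment are hypothetical, and the sketch assumes equal-length sequences with no missing data (any gap or ambiguity character would simply be counted as another allele).

```python
import math
from collections import Counter


def summary_stats(alignment):
    """Return (S, pi, theta_w, tajimas_d) for a list of equal-length sequences.

    Hypothetical helper for illustration; assumes no missing data.
    """
    n = len(alignment)
    if n < 4:
        # The variance term of Tajima's D is zero for n < 4.
        raise ValueError("need at least 4 sequences")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")

    num_pairs = n * (n - 1) / 2
    seg_sites = 0
    total_pairwise_diffs = 0.0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Number of sequence pairs that differ at this site.
            total_pairwise_diffs += (n * n - sum(c * c for c in counts.values())) / 2

    pi = total_pairwise_diffs / num_pairs      # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg_sites / a1                   # Watterson's estimator

    if seg_sites == 0:
        # D is undefined without any variation.
        return seg_sites, pi, theta_w, float("nan")

    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return seg_sites, pi, theta_w, (pi - theta_w) / math.sqrt(variance)


# Toy alignment of four sequences (made up for this example).
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
print(summary_stats(toy))
```

In a machine learning setting, a vector like this (typically much longer, and often computed in windows along the genome) would serve as the input features for a classifier or regression model, rather than interpreting any single value on its own.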

Surveying genomic structural variants and investigating the evolutionary forces acting on them

In common usage, the word "mutation" is often synonymous with a single base change, such as a replacement of an adenine nucleotide with a guanine. However, many mutations affect more than a single base pair. For example, large genomic duplications or deletions, which can add or remove up to millions of base pairs at a time, occur quite frequently. Inversions, which can cause a large chunk of a chromosome to flip its orientation, are also common. These mutations, referred to as structural variants, are more difficult to detect than simple base changes even with modern DNA sequencing techniques, but often have greater consequences for both disease risk and evolution. We develop tools for detecting structural variants from DNA sequencing data, and work to uncover their evolutionary consequences: What fraction of these mutations are harmful? How often are they adaptive? In addition to structural variants, we also examine some other intriguing yet understudied mutational oddities such as multinucleotide mutations.