Erik Garrison

erik.garrison@gmail.com
US: +1 502 422 6456 / 1289 Island Pl E, Memphis, TN 38103, USA
EU: +39 320 244 2758 / Via F. S. Nitti, 3, 85024 Lavello PZ, Italy

I am a computational biologist working as an assistant professor at the University of Tennessee Health Science Center in Memphis. Although my research journey began in the social sciences, for most of my career I’ve worked in bioinformatics, a field where clear ideas and their implementation in open code can move the world. I study genome evolution, or how nature implements its learning algorithm in code. I create methods to read genomes and understand their variation. These approaches are pangenomic: they look at all scales, from small point mutations to chromosome-scale rearrangements, and work both on single genomes and on collections of tens to thousands of genomes. As we have developed improved methods to sequence DNA, the importance of bioinformatics has expanded, and now most areas of biology can be interpreted digitally. Inspired by this progression, my ongoing research envisions a future in which we can synthesize DNA as easily as we can now read it.

Education

PhD in Genomics
Cambridge University
October 2014 – January 2019
Student at the Wellcome Sanger Institute. Advised by Richard Durbin. Thesis “Graphical pangenomics” put forward methods for using pangenomes encoded in sequence variation graphs in alignment and genome inference. Led the development of vg, an open source toolkit enabling the use of genome graphs in bioinformatic analysis. Visiting researcher at Stazione Zoologica Anton Dohrn and visiting student at Cambridge Genetics. Explored applications of variation graphs to analyses in population genetics, ancient DNA, marine biology, metagenomics, and genome assembly.

Bachelor of Arts in Social Studies
Harvard University
Fall 2002 – Spring 2006
Undergraduate Fellow, Harvard Institute for Quantitative Social Science. Senior thesis focused on the relationship between social structure and communication technologies. Electives included classes in functional programming, theoretical computer science, peer-to-peer networks, and linear algebra. Language citation in Spanish. Rower from 2002 to 2005.

Work

University of Tennessee Health Science Center, Memphis
December 2020 – present
Assistant Professor, Department of Genetics, Genomics, and Informatics. Design and implementation of a unified process to build a pangenome graph from hundreds of eukaryotic genome assemblies. Application of the approach to human and mouse pangenomes. Co-chair of the Human Pangenome Reference Consortium (HPRC) Pangenomes Working Group.

University of California, Santa Cruz
February 2019 – November 2020
Postdoctoral Fellow. Developed scalable methods for pangenomic analysis based on genome variation graphs. Participant in the HPRC assembly group and the Telomere-to-Telomere (T2T) consortium.

DNAnexus
September 2015 – March 2018
Part-time contractor with the research and development team. Explored machine-learning-based approaches to variant calling as part of the PrecisionFDA Challenge, producing the hhga variant caller. Maintenance, development, and continued support of vcflib and freebayes.

Boston College
February 2010 – September 2014
Research associate in the laboratory of Gabor Marth. Designed and implemented freebayes, a genetic variant detector for short-read sequencing data. Developed tools to manipulate sequencing data and descriptions of genetic variation. Wrote the first haplotype- and graph-based variant detection methods for short-read sequencing data. Generated the final 1000 Genomes Project release, and helped to produce its paper as part of the project writing group.

The Echo Nest
January 2009 – May 2009
Contractor. Designed and implemented control and monitoring systems to manage a compute cluster deployed in the Amazon EC2 cloud.

One Laptop Per Child
May 2008 – January 2009
Software engineer. Focused on operating system build processes, customer support, maintenance, software design planning, and communication among a globally dispersed group of volunteers and educators.

Harvard Medical School
August 2006 – April 2008
Contractor in the laboratory of George Church. Designed, wrote, and tested data acquisition and system control software for the “Polonator” open-source DNA sequencing device.

National Bureau of Economic Research
May 2006 – May 2007
Research assistant. Wrote software to efficiently process Wikipedia’s XML-based data dumps (wikiq), and evaluated metrics of user contribution. Analyzed data related to the internationalization of clinical trials.

Harvard Kennedy School of Government
January 2005 – September 2005
Research assistant. Obtained and processed data for country-level quantitative studies of terrorism and violent extremism.

Research

I build methods that let us understand the precise relationships between thousands of genomes. My work on this topic began with the development of Bayesian methods to detect and genotype genomic variants (Garrison and Marth, 2012, arXiv), and with the application of these methods to the thousands of human genomes cataloged in the 1000 Genomes Project (1000 Genomes Project Consortium et al., 2015, Nature). Lessons learned in that effort guided me to work on unbiased methods for genome inference based on graphical models of pangenomes. In these, the genome is encoded in a graph that may represent a population sample of individuals from the same species, a metagenome, the diploid genome of a single individual, or any other useful collection of genomic sequence information. I have shown that this approach provides more accurate read alignment when a high-quality pangenome can be constructed (Garrison et al., 2018, Nature Biotechnology). We are currently using this model to build and use pangenome graphs in the Human Pangenome Reference Consortium.

Our first efforts have produced a draft human pangenome (Liao et al., 2023, Nature). By applying unbiased analysis methods to the pangenome, we confirmed that heterologous acrocentric chromosomes recombine (Guarracino et al., 2023, Nature), a hypothesis that had gone unresolved for fifty years. On the way to these results, I have participated in projects to create the first complete human genome assembly (Nurk et al., 2022, Science) and the first collection of complete assemblies for vertebrates (Rhie et al., 2021, Nature).

In my current work, I focus on:

Implementing basic bioinformatic tools, such as sequence aligners and pangenome graph builders. These are targeted at the unique context of analyzing many complete genome assemblies, a process that I believe will become standard in biology in the coming years.

Applying these methods to study genome evolution, with a specific focus on the role of recombination in genome homogenization and variation. This work has begun with studies of the human pangenome, and is now leading me into the laboratory, where we will implement an automated platform for yeast evolutionary experiments.

Theory and practice of DNA computing. This poorly explored problem domain sits at the intersection of my research interests and experience. My theoretical work aims to construct an instruction set architecture for DNA that will serve as a target for the compilation of arbitrary algorithms. My practical work addresses the key bottleneck in DNA computing and data storage: DNA synthesis, which I am tackling through new bioengineering approaches.

I am a firm believer in open science. I develop and share my source code publicly (https://github.com/ekg) under permissive open licenses. I have reviewed for Bioinformatics, Genetics, Genome Research, Nature, Nature Biotechnology, Nature Communications, and Nucleic Acids Research. I maintain an overview of my contributions at https://scholar.google.com/citations?user=d5TKoncAAAAJ, and include a list of works I have supported at the end of this curriculum vitae.

Selected Talks

Biological revelations at the frontiers of a draft human pangenome reference (keynote), Nextflow Summit, Barcelona, October 18, 2023.

Learning on Pangenomes, Workshop on Epistasis @Lorentz Center, Leiden, Netherlands, July 17, 2023.

Building and Understanding the Human Pangenome, Stowers Institute for Medical Research, April 26, 2023. https://youtu.be/ukopTzkfPVk

Building and Understanding the Human Pangenome, Università di Pisa, July 27, 2023.

Understanding all variation in telomere-to-telomere assemblies. ALPACA conference, 2021.

The pluralistic promise of pangenome graphs. Workshop on Algorithms in Bioinformatics, 2020.

Untangling the pangenome. Cambridge Computational Biology Institute Symposium, 2019.

Variation graphs for efficient unbiased pangenomic sequence interpretation. Biology of Genomes. Cold Spring Harbor, 2018. https://www.youtube.com/watch?v=WWVl1XPpENE

Resequencing against a pangenome. NBDC/DBCLS BioHackathon. Keio University. Tsuruoka, Japan, 2016. https://www.youtube.com/watch?v=kgwBMiMs4pA

Variant detection using a graph of genomic variation. Advances in Genome Biology and Technology, 2014.

From short reads to genotypes, haplotypes, and frequencies. Penn State, 2014.

A generalized human reference as a graph of genomic variation. American Society of Human Genetics, 2013.

Simultaneous assembly of thousands of human genomes. Biology of Genomes, 2013. https://vimeo.com/95222169

Haplotype-based variant detection and interpretation enables the population-scale analysis of multi-nucleotide sequence variants. American Society of Human Genetics, 2012.

Haplotype-based variant detection from short-read sequencing. Biology of Genomes, 2012.

Teaching

Course lead and instructor. MemPang23, Memphis, Tennessee, May 2023.

Instructor. Advanced Bioinformatics course. Utrecht Bioinformatics Center. May 2021.

Course lead and instructor. Computational Pangenomics. Instituto Gulbenkian de Ciência. Oeiras, Portugal. March 2018, September 2019, May 2022.

Instructor. NGS alignment and variant calling practical. OBiLab, Consiglio Nazionale delle Ricerche. Napoli, Italy. April 2015.

Instructor. Biology for Adaptation Genomics. Weggis, Switzerland. Winters 2015-2018.

Instructor. Wellcome Genome Campus Advanced Course on Next Generation Sequencing Bioinformatics. Hinxton, UK. November 2015.

Guest lecturer. Iowa Bioinformatics Summer. Iowa City, Iowa, USA. May 2015.

Instructor. SeqShop. University of Michigan. Ann Arbor, Michigan, USA. June 2014 and May 2015.

Trainer. Galaxy Community Conference 2013. Oslo, Norway. June 2013.

Funding

Undergraduate Fellow. Harvard Institute for Quantitative Social Science. 2005-2006. (student)

PhD fellowship. Wellcome Trust. 2014-2018. (student)

Discovery Project Grant DP190103705. Australian Research Council. 2019-2021. (CI)

NLnet Foundation NGI0 Discovery Fund. Privacy-preserving variation graphs. 2020. (PI)

NIH U01HG010961: The construction and utility of reference pan-genome graphs. 2020-2024. (Co-I)

NSF #2118709 PPoSS: LARGE: Panorama: Integrated Rack-Scale Acceleration for Computational Pangenomics. 2021-2026. (PI)

NIH R01HG013017: Complete T2T primate genomes to understand the evolution of genome structure and turnover at rapidly evolving hotspots of human disease. 2023-2027. (PI)

NIH U01DA057530: Understanding nicotine addiction in hybrid rats using pangenome methods. 2023-2026. (Co-I)

Qatar Research, Development and Innovation Council ARG01-0426-230012. Building a Pangenome Reference for Middle Eastern Populations. 2024-2027. (PI)

Languages

English, Italian, Spanish

Publications


[1]   Simon Heumos, Andrea Guarracino, Jan-Niklas M. Schmelzle, Jiajie Li, Zhiru Zhang, Jörg Hagmann, Sven Nahnsen, Pjotr Prins, and Erik Garrison. Pangenome graph layout by path-guided stochastic gradient descent. bioRxiv, Sep 2023.

[2]   Alexis L. Sperling, Daniel K. Fabian, Erik Garrison, and David M. Glover. A genetic basis for facultative parthenogenesis in Drosophila. Current Biology, 33(17):3545–3560.e13, Sep 2023.

[3]   Arang Rhie, Sergey Nurk, Monika Cechova, Savannah J. Hoyt, Dylan J. Taylor, Nicolas Altemose, ..., Erik Garrison, ..., Evan E. Eichler, Rachel J. O’Neill, Michael C. Schatz, Karen H. Miga, Kateryna D. Makova, and Adam M. Phillippy. The complete sequence of a human Y chromosome. Nature, 621(7978):344–354, Aug 2023.

[4]   Andrea Guarracino, Silvia Buonaiuto, Leonardo Gomes de Lima, Tamara Potapova, Arang Rhie, Sergey Koren, Boris Rubinstein, Christian Fischer, Jennifer L Gerton, Adam M Phillippy, Vincenza Colonna, and Erik Garrison. Recombination between heterologous human acrocentric chromosomes. Nature, 617(7960):335–343, 2023.

[5]   Wen-Wei Liao, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Glenn Hickey, Shuangjia Lu, Julian K Lucas, Jean Monlong, Haley J Abel, …, Erik Garrison, Tobias Marschall, Ira M Hall, Heng Li, and Benedict Paten. A draft human pangenome reference. Nature, 617(7960):312–324, 2023.

[6]   Erik Garrison, Andrea Guarracino, Simon Heumos, Flavia Villani, Zhigui Bao, Lorenzo Tattini, Jörg Hagmann, Sebastian Vorbrugg, Santiago Marco-Sola, Christian Kubica, et al. Building pangenome graphs. bioRxiv, Apr 2023.

[7]   Erik Garrison and Andrea Guarracino. Unbiased pangenome graphs. Bioinformatics, 39(1):btac743, 2023.

[8]   David Porubsky, Mitchell R Vollger, William T Harvey, Allison N Rozanski, Peter Ebert, Glenn Hickey, Patrick Hasenfeld, Ashley D Sanders, Catherine Stober, Jan O Korbel, et al. Gaps and complex structurally variant loci in phased genome assemblies. Genome Research, 33(4):496–510, 2023.

[9]   Bryce Kille, Erik Garrison, Todd Treangen, and Adam M Phillippy. Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation. bioRxiv, May 2023.

[10]   Jonas A Sibbesen, Jordan M Eizenga, Adam M Novak, Jouni Sirén, Xian Chang, Erik Garrison, and Benedict Paten. Haplotype-aware pantranscriptome analyses using spliced pangenome graphs. Nature Methods, pages 1–9, 2023.

[11]   Santiago Marco-Sola, Jordan M Eizenga, Andrea Guarracino, Benedict Paten, Erik Garrison, and Miquel Moreto. Optimal gap-affine alignment in O(s) space. Bioinformatics, 39(2):btad074, 2023.

[12]   Tristan V de Jong, Yanchao Pan, Pasi Rastas, Daniel Munro, Monika Tutaj, Huda Akil, Chris Benner, Apurva S Chitre, William Chow, Vincenza Colonna, et al. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. bioRxiv, Apr 2023.

[13]   Erik Garrison, Zev N Kronenberg, Eric T Dawson, Brent S Pedersen, and Pjotr Prins. A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLOS Computational Biology, 18(5):e1009123, 2022.

[14]   Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V Bzikadze, Alla Mikheenko, Mitchell R Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, et al. The complete sequence of a human genome. Science, 376(6588):44–53, 2022.

[15]   Andrea Guarracino, Simon Heumos, Sven Nahnsen, Pjotr Prins, and Erik Garrison. ODGI: understanding pangenome graphs. Bioinformatics, 38(13):3319–3326, 2022.

[16]   Erich D Jarvis, Giulio Formenti, Arang Rhie, Andrea Guarracino, Chentao Yang, Jonathan Wood, Alan Tracey, Francoise Thibaud-Nissen, Mitchell R Vollger, David Porubsky, et al. Automated assembly of high-quality diploid human reference genomes. bioRxiv, Mar 2022.

[17]   Alexis L Braun, Daniel K Fabian, Erik Garrison, and David M Glover. Virgin birth: A genetic basis for facultative parthenogenesis. bioRxiv, Mar 2022.

[18]   Ting Wang, Lucinda Antonacci-Fulton, Kerstin Howe, Heather A Lawson, Julian K Lucas, Adam M Phillippy, Alice B Popejoy, Mobin Asri, Caryn Carson, Mark JP Chaisson, et al. The human pangenome project: a global resource to map genomic diversity. Nature, 604(7906):437–446, 2022.

[19]   Njagi Moses Mwaniki, Erik Garrison, and Nadia Pisanti. Fast exact string to d-texts alignments. arXiv preprint arXiv:2206.03242, 2022.

[20]   Erich D Jarvis, Giulio Formenti, Arang Rhie, Andrea Guarracino, Chentao Yang, Jonathan Wood, Alan Tracey, Francoise Thibaud-Nissen, Mitchell R Vollger, David Porubsky, et al. Semi-automated assembly of high-quality diploid human reference genomes. Nature, pages 1–13, 2022.

[21]   Jouni Sirén, Jean Monlong, Xian Chang, Adam M. Novak, Jordan M. Eizenga, Charles Markello, Jonas A. Sibbesen, Glenn Hickey, Pi-Chuan Chang, Andrew Carroll, et al. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science, 374(6574), Dec 2021.

[22]   Andrea Guarracino, Simon Heumos, Sven Nahnsen, Pjotr Prins, and Erik Garrison. ODGI: understanding pangenome graphs. bioRxiv, Nov 2021.

[23]   Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, et al. The complete sequence of a human genome. bioRxiv, May 2021.

[24]   Arang Rhie, Shane A. McCarthy, Olivier Fedrigo, Joana Damas, Giulio Formenti, Sergey Koren, Marcela Uliano-Silva, William Chow, Arkarachai Fungtammasan, Juwan Kim, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature, 592(7856):737–746, Apr 2021.

[25]   Jonas A. Sibbesen, Jordan M. Eizenga, Adam M. Novak, Jouni Sirén, Xian Chang, Erik Garrison, and Benedict Paten. Haplotype-aware pantranscriptome analyses using spliced pangenome graphs. bioRxiv, Mar 2021.

[26]   Manuel Tognon, Vincenzo Bonnici, Erik Garrison, Rosalba Giugno, and Luca Pinello. GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs. PLOS Computational Biology, 17(9):e1009444, Sep 2021.

[27]   Christof C. Smith, Kelly S. Olsen, Kaylee M. Gentry, Maria Sambade, Wolfgang Beck, Jason Garness, Sarah Entwistle, Caryn Willis, Steven Vensko, Allison Woods, et al. Landscape and selection of vaccine epitopes in SARS-CoV-2. Genome Medicine, 13(1), Jun 2021.

[28]   Chen-Shan Chin, Justin Wagner, Qiandong Zeng, Erik Garrison, Shilpa Garg, Arkarachai Fungtammasan, Mikko Rautiainen, Sergey Aganezov, Melanie Kirsche, Samantha Zarate, et al. A diploid assembly-based benchmark for variants in the major histocompatibility complex. Nature Communications, 11(1), Sep 2020.

[29]   Rui Martiniano, Erik Garrison, Eppie R. Jones, Andrea Manica, and Richard Durbin. Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph. Genome Biology, 21(1), Sep 2020.

[30]   Jordan M. Eizenga, Adam M. Novak, Jonas A. Sibbesen, Simon Heumos, Ali Ghaffaari, Glenn Hickey, Xian Chang, Josiah D. Seaman, Robin Rounthwaite, Jana Ebler, et al. Pangenome graphs. Annual Review of Genomics and Human Genetics, 21(1):139–162, Aug 2020.

[31]   Jordan M Eizenga, Adam M Novak, Emily Kobayashi, Flavia Villani, Cecilia Cisar, Simon Heumos, Glenn Hickey, Vincenza Colonna, Benedict Paten, and Erik Garrison. Efficient dynamic variation graphs. Bioinformatics, Jul 2020.

[32]   Christof Smith, Sarah Entwistle, Caryn Willis, Steven Vensko, Wolfgang Beck, Jason Garness, Maria Sambade, Eric Routh, Kelly Olsen, Brandon Carpenter, et al. Translation of a therapeutic neoantigen vaccine workflow to SARS-CoV-2 vaccine development. Journal for ImmunoTherapy of Cancer, 8(Suppl 3):A510–A512, Nov 2020.

[33]   International Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature, 578(7793):82–93, Feb 2020.

[34]   Kishwar Shafin, Trevor Pesout, Ryan Lorig-Roach, Marina Haukness, Hugh E. Olsen, Colleen Bosworth, Joel Armstrong, Kristof Tigyi, Nicholas Maurer, Sergey Koren, et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nature Biotechnology, 38(9):1044–1053, May 2020.

[35]   Glenn Hickey, David Heller, Jean Monlong, Jonas A. Sibbesen, Jouni Sirén, Jordan Eizenga, Eric T. Dawson, Erik Garrison, Adam M. Novak, and Benedict Paten. Genotyping structural variants in pangenome graphs using the vg toolkit. Genome Biology, 21(1), Feb 2020.

[36]   J. Siren, E. Garrison, A. M. Novak, B. Paten, and R. Durbin. Haplotype-aware graph indexes. Bioinformatics, 36(2):400–407, 2020.

[37]   Deepti Gurdasani, Tommy Carstensen, Segun Fatumo, Guanjie Chen, Chris S Franklin, Javier Prado-Martinez, Heleen Bouman, Federico Abascal, Marc Haber, Ioanna Tachmazidou, et al. Uganda genome resource enables insights into population history and genomic discovery in Africa. Cell, 179(4):984–1002, 2019.

[38]   Bastien Llamas, Giuseppe Narzisi, Valerie Schneider, Peter A Audano, Evan Biederstedt, Lon Blauvelt, Peter Bradbury, Xian Chang, Chen-Shan Chin, Arkarachai Fungtammasan, et al. A strategy for building and using a human reference pangenome. F1000Research, 8(1751):1751, 2019.

[39]   Eric T Dawson, Sarah Wagner, David Roberson, Meredith Yeager, Joseph Boland, Erik Garrison, Stephen Chanock, Mark Schiffman, Tina Raine-Bennett, Thomas Lorey, et al. Viral coinfection analysis using a MinHash toolkit. BMC Bioinformatics, 20(1):389, 2019.

[40]   Vincenza Colonna, Nunzio D’Agostino, Erik Garrison, Anders Albrechtsen, Jonas Meisner, Angelo Facchiano, Teodoro Cardi, and Pasquale Tripodi. Genomic diversity and novel genome-wide association with fruit morphology in Capsicum, from 746k polymorphic sites. Scientific Reports, 9(1):1–14, 2019.

[41]   E. Garrison, J. Siren, A. M. Novak, G. Hickey, J. M. Eizenga, E. T. Dawson, W. Jones, S. Garg, C. Markello, M. F. Lin, B. Paten, and R. Durbin. Variation Graph Toolkit Improves Read Mapping by Representing Genetic Variation in the Reference. Nature Biotechnology, 36(9):875–879, October 2018.

[42]   Benedict Paten, Jordan M Eizenga, Yohei M Rosen, Adam M Novak, Erik Garrison, and Glenn Hickey. Superbubbles, ultrabubbles, and cacti. Journal of Computational Biology, 25(7):649–663, 2018.

[43]   Shilpa Garg, Mikko Rautiainen, Adam M Novak, Erik Garrison, Richard Durbin, and Tobias Marschall. A graph-based approach to diploid genome assembly. Bioinformatics, 34(13):i105–i114, 2018.

[44]   Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, 19(1):118–135, 2018.

[45]   Adam M Novak, Glenn Hickey, Erik Garrison, Sean Blum, Abram Connelly, Alexander Dilthey, Jordan Eizenga, MA Saleh Elmohamed, Sally Guthrie, André Kahles, et al. Genome graphs. bioRxiv:101378, 2017.

[46]   Benedict Paten, Adam M Novak, Jordan M Eizenga, and Erik Garrison. Genome graphs and the evolution of genome inference. Genome Research, 27(5):665–676, 2017.

[47]   Eric T Dawson, Erik Garrison, Adam Novak, Benedict Paten, Jordan Eizenga, Glenn Hickey, Stephen Chanock, and Richard Durbin. Germline structural variant detection with variation graphs. American Association for Cancer Research, 2017.

[48]   Adam M Novak, Erik Garrison, and Benedict Paten. A graph extension of the positional Burrows–Wheeler transform and its applications. Algorithms for Molecular Biology, 12(1):18, 2017.

[49]   Sebastian M Waszak, Grace Tiao, Bin Zhu, Tobias Rausch, Francesc Muyas, Bernardo Rodriguez-Martin, Raquel Rabionet, Sergei Yakneen, Georgia Escaramis, Yilong Li, et al. Germline determinants of the somatic mutation landscape in 2,642 cancer genomes. bioRxiv:208330, 2017.

[50]   G David Poznik, Yali Xue, Fernando L Mendez, Thomas F Willems, Andrea Massaia, Melissa A Wilson Sayres, Qasim Ayub, Shane A McCarthy, Apurva Narechania, Seva Kashin, et al. Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nature Genetics, 48(6):593, 2016.

[51]   1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature, 526(7571):68, 2015.

[52]   Colby Chiang, Ryan M Layer, Gregory G Faust, Michael R Lindberg, David B Rose, Erik P Garrison, Gabor T Marth, Aaron R Quinlan, and Ira M Hall. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nature Methods, 12(10):966, 2015.

[53]   Danny Challis, Lilian Antunes, Erik Garrison, Eric Banks, Uday S Evani, Donna Muzny, Ryan Poplin, Richard A Gibbs, Gabor Marth, and Fuli Yu. The distribution and mutagenesis of short coding indels from 1,128 whole exomes. BMC Genomics, 16(1):143, 2015.

[54]   Massimiliano Cocca, Marc Pybus, Pier Francesco Palamara, Erik Garrison, Michela Traglia, Cinzia F Sala, Sheila Ulivi, Yasin Memari, Anja Kolb-Kokocinski, Richard Durbin, et al. Purging of deleterious variants in Italian founder populations with extended autozygosity. bioRxiv:022947, 2015.

[55]   Peter H Sudmant, Tobias Rausch, Eugene J Gardner, Robert E Handsaker, Alexej Abyzov, John Huddleston, Yan Zhang, Kai Ye, Goo Jun, Markus Hsi-Yang Fritz, et al. An integrated map of structural variation in 2,504 human genomes. Nature, 526(7571):75, 2015.

[56]   Olivier Delaneau, Jonathan Marchini, Gil A McVean, Peter Donnelly, Gerton Lunter, Jonathan L Marchini, Simon Myers, Anjali Gupta-Hinch, Zamin Iqbal, Iain Mathieson, et al. Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nature Communications, 5:3934, 2014.

[57]   Wan-Ping Lee, Michael P Stromberg, Alistair Ward, Chip Stewart, Erik P Garrison, and Gabor T Marth. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PloS One, 9(3):e90581, 2014.

[58]   Vincenza Colonna, Qasim Ayub, Yuan Chen, Luca Pagani, Pierre Luisi, Marc Pybus, Erik Garrison, Yali Xue, and Chris Tyler-Smith. Human genomic regions with exceptionally high or low levels of population differentiation identified from 911 whole-genome sequences. bioRxiv:005462, 2014.

[59]   Ekta Khurana, Yao Fu, Vincenza Colonna, Xinmeng Jasmine Mu, Hyun Min Kang, Tuuli Lappalainen, Andrea Sboner, Lucas Lochovsky, Jieming Chen, Arif Harmanci, et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342(6154):1235587, 2013.

[60]   Mengyao Zhao, Wan-Ping Lee, Erik P Garrison, and Gabor T Marth. SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PloS One, 8(12):e82138, 2013.

[61]   Erik Garrison and Gabor Marth. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907, 2012.

[62]   1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422):56, 2012.

[63]   Simon Gravel, Brenna M Henn, Ryan N Gutenkunst, Amit R Indap, Gabor T Marth, Andrew G Clark, Fuli Yu, Richard A Gibbs, Carlos D Bustamante, David L Altshuler, et al. Demographic history and rare allele sharing among human populations. Proceedings of the National Academy of Sciences, 108(29):11983–11988, 2011.

[64]   Chip Stewart, Deniz Kural, Michael P Strömberg, Jerilyn A Walker, Miriam K Konkel, Adrian M Stütz, Alexander E Urban, Fabian Grubert, Hugo YK Lam, Wan-Ping Lee, et al. A comprehensive map of mobile element insertion polymorphisms in humans. PLoS Genetics, 7(8):e1002236, 2011.

[65]   Derek W Barnett, Erik P Garrison, Aaron R Quinlan, Michael P Strömberg, and Gabor T Marth. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics, 27(12):1691–1692, 2011.

[66]   1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature, 467(7319):1061, 2010.