Editorial Article: What’s behind the world’s largest crowd-sourced microbiome project?

A sneak peek into one of the busiest microbiome laboratories: the technologies that support large projects, and what happens to the data from thousands of samples

26 Sep 2019

3D principal coordinates plot of all public microbiome samples in Qiita, a microbiome reference resource used by the Knight lab, UCSD [1]. Each point in the plot represents an entire microbiome sample. Points that are close to each other are more similar to one another, and the coloring depicts high-level environment information (for example, distal gut or marine sediment). Image courtesy of Dr. Daniel McDonald.
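The layout of a plot like this comes from principal coordinates analysis (PCoA): it takes a matrix of pairwise dissimilarities between samples and finds coordinates whose distances approximate that matrix, so similar samples land near each other. The cited reference concerns computing UniFrac distances at scale; because UniFrac needs a phylogenetic tree, the minimal sketch below swaps in the simpler Bray-Curtis dissimilarity and uses hypothetical toy counts. It illustrates classical PCoA in plain Python under those assumptions and is not the code behind the figure.

```python
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical profiles)."""
    total = sum(x) + sum(y)
    return sum(abs(a - b) for a, b in zip(x, y)) / total if total else 0.0


def jacobi_eigen(m, sweeps=50):
    """Eigenvalues and eigenvectors (as columns) of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break  # off-diagonal mass is negligible: a is now (numerically) diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p, q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- P^T A (rows p, q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def pcoa(dist, k=2):
    """Classical principal coordinates analysis: k coordinates per sample."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    means = [sum(row) / n for row in a]  # row means (= column means, since a is symmetric)
    grand = sum(means) / n
    # Gower double-centering: B = A - row means - column means + grand mean
    b = [[a[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    top = sorted(range(n), key=lambda j: vals[j], reverse=True)[:k]
    # Scale each eigenvector by sqrt(eigenvalue); negative eigenvalues are clipped to 0
    return [[vecs[i][j] * math.sqrt(max(vals[j], 0.0)) for j in top] for i in range(n)]


# Hypothetical toy counts: 4 samples x 5 taxa (illustrative, not real data)
samples = [
    [10, 0, 5, 0, 1],
    [9, 1, 6, 0, 0],
    [0, 8, 0, 7, 3],
    [1, 7, 0, 9, 2],
]
dist = [[bray_curtis(x, y) for y in samples] for x in samples]
coords = pcoa(dist, k=2)
for label, (pc1, pc2) in zip("ABCD", coords):
    print(f"sample {label}: PC1={pc1:+.3f}  PC2={pc2:+.3f}")
```

Samples A and B, which share most of their taxa, land close together, as do C and D. Real analyses would rely on a dedicated library (for example, scikit-bio, which provides both PCoA and UniFrac) rather than a hand-rolled eigensolver.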

In 2012, two scientists co-founded what went on to become the world’s largest crowd-sourced, citizen science microbiome research project: the American Gut Project.

Co-founder Dr. Rob Knight’s lab at the University of California, San Diego, processes over 100,000 samples per year as part of several microbiome projects. Today we speak with two experienced members of the Knight lab: Dr. Daniel McDonald, scientific director and American Gut Project manager, and Greg Humphrey, wet lab research supervisor. They give us a sneak peek into what goes on inside one of the busiest microbiome labs in the world. We uncover the technologies that enable this high-throughput research and find out what happens to all the data collected from participating citizens.

Why crowd-source the microbiome?

Greg Humphrey (left) and Dr. Daniel McDonald (right), from the Knight lab, UCSD, focus on understanding the complex microbial ecosystems of the human body using advanced computational and experimental techniques.

“With the American Gut Project, virtually anybody can participate in microbiome research. We're crowd-sourcing and crowdfunding samples,” explains McDonald. “As a purely exploratory project, we want to get an idea of what types of microbes are out there and how these microbial communities are associated with different health and lifestyle characteristics of individuals.” Characterizing the gut microbes associated with health versus disease is one of the big goals of the American Gut Project.

“We have all kinds of microorganisms that live on and inside us. The microbes that live inside us produce compounds absolutely essential for your health and well-being,” says McDonald. “10-40% of small molecules in your blood have some type of association with the microbiome. What you consume has a large impact on your body, as a whole.”

More than just the American Gut

The Microsetta Initiative expands the American Gut Project internationally and also includes the British Gut Project. It is now underway with the goal of replicating findings in another population and then comparing data across populations for associations with disease. McDonald explains: “One of our post-docs noticed an association of particular microbes with people who reported that they had a medical diagnosis for depression when we published the American Gut manuscript. He was then able to replicate this finding within the U.K. cohort, which is truly phenomenal.”

Dealing with large sample sizes


Automated DNA purification


High-throughput DNA purification from difficult sample types, such as human gut samples, is challenging because these samples contain high levels of PCR-inhibiting compounds.

Eppendorf’s epMotion 5075 TMX Workstation provides a rapid, high-throughput solution to this issue through a hands-free, automated liquid-handling system that enables high-quality nucleic acid purification.

The world’s largest microbiome project isn’t without its challenges: great sample sizes demand streamlined protocols.

“My name has been on way too many packages with poop in it,” quips Humphrey. “We get a handful of the American Gut samples every day. Then we also get larger shipments from some of our collection sites either in the UK or Australia.”

The Knight lab staff confirm participant consent for every sample, after which the scientific process begins. “We do a bit of sample organization, and then it enters into DNA extraction,” explains Humphrey. “Right now, for American Gut, the main assay we’re using is our 16S amplicon assay. We put the extracted genomic DNA into our pipeline to produce 16S amplicons, which then go into our in-house Illumina MiSeq to produce the sequence data.”

A high-throughput project needs automation

Recent years have seen increased interest in the gut microbiome, driven in part by technological advances that make it possible to scale up experiments. Speaking to the massive expansion of microbiome research in the Knight lab, McDonald says, “Improvements in the automation and efficiency of sample preparation, for example, the ability to run a very large number of samples on a single sequencing run, have helped enable some massive study designs that are necessary to assess potentially subtle effects.”

From a protocol perspective, Humphrey appreciates liquid-handling automation. “We set up the protocol, it moves liquids, and we can walk away and work on something else,” says Humphrey. “There are a few hands-on steps in the pipeline. While we're working on those hands-on steps, the robots are doing some of the liquid handling without us having to intervene.”

Humphrey continues: “That's really been hugely instrumental in making us a high-throughput facility. The pipeline becomes more and more efficient as the technology for robots becomes better.”

The current focus at the Knight lab is on miniaturizing as many assays as possible, not only to reduce cost but also to make the workflow faster and more efficient. “We're now able to do a 1:10 or a 1:8 miniaturization on the sample prep. That allows us to greatly reduce cost and also increase the number of samples that we can multiplex or sequence on a lane,” says Humphrey.

What happens to all the data?

“One of the central tenets of biology is cataloging of life,” says McDonald. “One of our goals is to simply amass a reference database to catalogue microbial communities.”

All of the de-identified American Gut data is placed in the public domain so that anybody can reuse it. McDonald explains: “If a researcher wanted to take their data and put it in the context of a particular population, they can go and grab the American Gut data and put their data in the context of, say, the United States or the United Kingdom.”

“All of the data that we generate, whether it's 16S sample data or shotgun data from American Gut or other studies… all of this stuff ends up getting deposited into Qiita, which is a microbial database that we run here at UC San Diego,” explains McDonald.

Qiita is an open-source microbial study management platform as well as a resource database that lets scientists worldwide, particularly non-bioinformaticians, easily share and use existing datasets. Qiita also lets researchers aggregate data for meta-analyses, so that associations can be examined across multiple studies or samples.
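A hypothetical sketch of the aggregation step: before samples from different studies can be compared, their feature tables have to be merged so that every sample is described over the same set of features. Qiita itself works with BIOM-format tables; the nested-dictionary layout, the `merge_tables` name, and the sample and feature IDs below are assumptions for illustration only. Shared feature IDs are also only meaningful if the studies were processed comparably (for example, the same amplicon region and read length).

```python
def merge_tables(*tables):
    """Merge hypothetical per-study tables of the form {sample_id: {feature_id: count}}.

    Every sample in the result is described over the same, sorted set of
    features; a feature not observed in a sample is filled in with 0.
    """
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            if sample_id in merged:
                raise ValueError(f"duplicate sample id across studies: {sample_id}")
            merged[sample_id] = dict(counts)
    features = sorted({f for counts in merged.values() for f in counts})
    aligned = {s: {f: counts.get(f, 0) for f in features} for s, counts in merged.items()}
    return aligned, features


# Hypothetical studies with made-up sample and feature IDs
study_a = {"A.s1": {"feat1": 12, "feat2": 3}}
study_b = {"B.s1": {"feat2": 7, "feat3": 5}}

table, features = merge_tables(study_a, study_b)
for sample_id, row in table.items():
    print(sample_id, [row[f] for f in features])
# prints:
# A.s1 [12, 3, 0]
# B.s1 [0, 7, 5]
```

Raising on duplicate sample IDs is a deliberate choice in this sketch: silently overwriting a sample from one study with another would corrupt a meta-analysis without any visible error.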

“At this time, we've deposited quite a few samples into Qiita, but the collective number of samples from us as well as other researchers is well in excess of 500,000 at this point,” McDonald says.

An open-source microbiome pipeline for all

The Knight lab, along with collaborators worldwide, has been instrumental in developing an open-source bioinformatics pipeline for microbiome analysis. Named QIIME (Quantitative Insights Into Microbial Ecology), this pipeline supports scientists throughout the entire microbiome research workflow, from raw sequencing data through to the statistics used in publications.

McDonald explains how QIIME has become an indispensable part of the team’s research: “All the American Gut data gets routed through QIIME for initial processing such as sequence quality control, taking the sequences and grouping them into discrete elements, and performing taxonomic assignment.”
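The three processing steps McDonald lists can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not QIIME's implementation: QIIME 2 performs these steps with far more sophisticated tools (for example, denoising with Deblur or DADA2 and trained taxonomic classifiers). Here, quality control is a simple mean Phred score cutoff, "grouping into discrete elements" is exact-sequence dereplication, and taxonomy is assigned by k-mer overlap against a two-entry reference. All sequences, thresholds, taxon names, and function names are hypothetical.

```python
from collections import Counter


def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def quality_filter(reads, min_mean_q=30.0):
    """Step 1, quality control: keep sequences whose mean Phred score meets the cutoff."""
    return [seq for seq, qual in reads if mean_phred(qual) >= min_mean_q]


def dereplicate(seqs):
    """Step 2, grouping: collapse identical sequences into features with counts."""
    return Counter(seqs)


def kmers(seq, k=4):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_taxonomy(feature, reference, k=4):
    """Step 3, taxonomy: pick the reference taxon whose k-mer set is most similar (Jaccard)."""
    best_taxon, best_score = "Unassigned", 0.0
    fk = kmers(feature, k)
    for taxon, ref_seq in reference.items():
        rk = kmers(ref_seq, k)
        union = fk | rk
        score = len(fk & rk) / len(union) if union else 0.0
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Hypothetical toy reads as (sequence, quality) pairs; 'I' = Q40, '#' = Q2 in Phred+33
reads = [
    ("ACGTACGTTGCA", "IIIIIIIIIIII"),
    ("ACGTACGTTGCA", "IIIIIIIIIIII"),
    ("TTGACCGGATCA", "IIIIIIIIIIII"),
    ("ACGTACGTTGCA", "############"),  # low quality: removed by the filter
]
# Hypothetical two-entry reference database
reference = {
    "Hypothetical_taxon_A": "ACGTACGTTGCAAA",
    "Hypothetical_taxon_B": "TTGACCGGATCAGG",
}

features = dereplicate(quality_filter(reads))
for seq, count in features.most_common():
    print(seq, count, assign_taxonomy(seq, reference))
```

The output is a small feature table (each unique sequence with its count and a taxonomic label), which is the kind of per-sample summary that later steps compare across communities.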

Used together, the QIIME package and the Qiita database make it possible to compare entire microbial communities in one study with those in others, and to create short summary reports that are returned to participants for educational purposes.

From exploratory to translational research

Going from a project like the American Gut to developing a therapy for disorders isn’t as simple as it seems, according to McDonald. 

“I suspect that going forward in the future, with different types of diseases there’s going to be the identification of types of microbes that may be beneficial or detrimental for particular foods or drugs. It may be possible to use microbiome data to help improve the particular pharmaceutical therapy,” says McDonald. “Even if you’re not able to modify the microbiome directly to benefit somebody’s health, you might be able to modify the microbiome to better take advantage of the different compounds presented to the microbiome that in turn may help the individual.”



  1. McDonald D, Vázquez-Baeza Y, Koslicki D, McClelland J, Reeve N, Xu Z, Gonzalez A, Knight R. Striped UniFrac: enabling microbiome analysis at unprecedented scale. Nat Methods. 2018 Nov;15(11):847-848.
