Commit 3bde34f

Make edits from AW and BC
1 parent b1071ea commit 3bde34f

7 files changed

+60
-52
lines changed

abundance-measurement.Rmd

Lines changed: 16 additions & 17 deletions
@@ -1,9 +1,9 @@
11
# How bias affects abundance measurements {#abundance-measurement}
22

3-
This section extends the theoretical results of @mclaren2019cons to describe the effect that consistent taxonomic bias within an MGS experiment affects the relative and absolute abundances measured for various microbial species.
3+
This section extends the theoretical results of @mclaren2019cons to describe how taxonomic bias in an MGS experiment affects the relative and absolute abundances measured for various microbial species.
44
We show that some approaches to quantifying species abundance yield constant fold errors (FEs), while others yield FEs that depend on overall community composition and thus can vary across samples.
55

6-
## Model of MGS measurement
6+
## A model of MGS measurements
77

88
Our primary tool for understanding the impact of taxonomic bias on MGS measurement is the theoretical model of MGS measurement developed and empirically validated by @mclaren2019cons.
99
This model describes the mathematical relationship between the read counts obtained by MGS and the (actual) abundances of the various species in a sample.
@@ -49,7 +49,7 @@ is the _sample mean efficiency_, defined as the mean efficiency of all species w
4949
## Relative abundance {#relative-abundance}
5050

5151
We distinguish between two types of species-level *relative abundances* within a sample.
52-
The *proportion* $P_{i}^{(a)}$ of species $i$ in sample $a$ equals its abundance relative to the total abundance of all species in $S$,
52+
The *proportion* $P_{i}^{(a)}$ of species $i$ in sample $a$ equals its abundance divided by the total abundance of all species in $S$,
5353
\begin{align}
5454
(\#eq:prop)
5555
P_{i}^{(a)} &\equiv \frac{A_i^{(a)}}{A_\tot^{(a)}}.
@@ -71,8 +71,7 @@ From Equations \@ref(eq:mgs-model), \@ref(eq:total-reads), and \@ref(eq:prop-mea
7171
(\#eq:prop-error)
7272
\tilde P_{i}^{(a)} &= P_{i}^{(a)} \cdot \frac{B_i}{\bar B^{(a)}}.
7373
\end{align}
74-
<!-- Taxonomic bias thus creates a fold-error (FE) in the measured proportion $\tilde P_{i}^{(a)}$ of species $i$ equal to the efficiency $B_i$ of species $i$ $divided by the mean efficiency $\bar B^{(a)}$ in the sample. -->
75-
Taxonomic bias thus creates a fold-error (FE) in the measured proportion of a species that is equal to its efficiency divided by the mean efficiency in the sample.
74+
Taxonomic bias creates a fold-error (FE) in the measured proportion of a species that is equal to its efficiency divided by the mean efficiency in the sample.
7675
Since the mean efficiency varies across samples, so does the FE.
7776
This phenomenon can be seen for Species 3 in the two hypothetical communities in Figure \@ref(fig:error-proportions).
7877
Species 3, which has an efficiency of 6, is under-measured in Sample 1 (FE < 1) but over-measured (FE > 1) in Sample 2.
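
The sample dependence of the FE in proportions can be checked numerically. The following is a minimal sketch, not code from the manuscript: the efficiencies (1, 18, 6) come from the text, but the actual abundances in the two samples are hypothetical values chosen only to be consistent with the description above, and the function names are illustrative.

```python
import math

# Efficiencies of Species 1-3 as stated in the text.
efficiencies = [1.0, 18.0, 6.0]

# Hypothetical actual abundances (assumed for illustration only).
samples = {
    "Sample 1": [1.0, 1.0, 1.0],
    "Sample 2": [15.0, 1.0, 4.0],
}


def proportions(values):
    """Normalize a list of non-negative values to sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def measured_proportions(abundances, efficiencies):
    """Proportions computed from reads, where reads are proportional to
    abundance times efficiency. The sample-specific factor F cancels."""
    reads = [a * b for a, b in zip(abundances, efficiencies)]
    return proportions(reads)


fold_errors = {}
for name, abundances in samples.items():
    actual = proportions(abundances)
    measured = measured_proportions(abundances, efficiencies)
    # Sample mean efficiency: efficiencies weighted by actual proportions.
    mean_efficiency = sum(p * b for p, b in zip(actual, efficiencies))
    fold_errors[name] = [m / p for m, p in zip(measured, actual)]
    # Each FE should equal B_i divided by the sample mean efficiency.
    for fe, b in zip(fold_errors[name], efficiencies):
        assert math.isclose(fe, b / mean_efficiency)
    print(name, "mean efficiency:", round(mean_efficiency, 3),
          "FE of Species 3:", round(fold_errors[name][2], 3))
```

Under these assumed compositions, the FE of Species 3 is below 1 in Sample 1 and above 1 in Sample 2, even though its efficiency is the same in both.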
@@ -89,8 +88,8 @@ From Equations \@ref(eq:mgs-model) and \@ref(eq:ratio-meas), it follows that the
8988
(\#eq:ratio-error)
9089
\tilde R_{i/j}^{(a)} = R_{i/j}^{(a)} \cdot \frac{B_i}{B_j}.
9190
\end{align}
92-
Taxonomic bias thus creates a FE in the measured ratio that is equal to the ratio in the species' efficiencies; the FE is therefore constant across samples.
93-
For instance, in Figure \@ref(fig:error-proportions), the ratio of Species 3 (with an efficiency of 6) to Species 1 (with an efficiency of 1) is over-estimated by a factor of 6 in both communities despite their varying compositions.
91+
Taxonomic bias creates an FE in the measured ratio that is equal to the ratio of the species' efficiencies; the FE is therefore constant across samples.
92+
For instance, in Figure \@ref(fig:error-proportions), the ratio of Species 3 (with an efficiency of 6) to Species 1 (with an efficiency of 1) is over-measured by a factor of 6 in both communities despite their varying compositions.
9493
A demonstration in bacterial mock communities is shown in [Figure 3D](https://doi.org/10.7554/eLife.46923.004) of @mclaren2019cons.
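
The constancy of the FE in ratios can be illustrated in the same way. This sketch again uses the efficiencies stated in the text together with hypothetical abundances (assumed for illustration); `ratio_fold_error` is an illustrative helper, not a function from the manuscript.

```python
import math

# Efficiencies of Species 1 and Species 3 as stated in the text.
efficiencies = {"Species 1": 1.0, "Species 3": 6.0}

# Hypothetical actual abundances (assumed for illustration only).
samples = {
    "Sample 1": {"Species 1": 1.0, "Species 3": 1.0},
    "Sample 2": {"Species 1": 15.0, "Species 3": 4.0},
}


def ratio_fold_error(abundances, efficiencies, i, j):
    """Fold error in the measured ratio of species i to species j.
    Reads are proportional to abundance times efficiency; the
    sample-specific factor cancels in the ratio."""
    measured_ratio = (abundances[i] * efficiencies[i]) / (
        abundances[j] * efficiencies[j]
    )
    actual_ratio = abundances[i] / abundances[j]
    return measured_ratio / actual_ratio


ratio_fes = {
    name: ratio_fold_error(abundances, efficiencies, "Species 3", "Species 1")
    for name, abundances in samples.items()
}
for name, fe in ratio_fes.items():
    print(name, "FE of Species 3 / Species 1:", round(fe, 3))
```

Whatever abundances are assumed, the FE equals $B_3 / B_1 = 6$, because the abundances cancel when the measured ratio is divided by the actual ratio.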
9594

9695
<!-- begin figure -->
@@ -113,10 +112,10 @@ We further define the efficiency of taxon $I$ as the abundance-weighted average
113112
(\#eq:efficiency-general)
114113
B_I^{(a)} \equiv \frac{\sum_{i\in I} A_{i}^{(a)} B_{i}}{\sum_{i\in I} A_{i}^{(a)}}.
115114
\end{align}
116-
With these definitions, the read count for taxon $I$ can be expressed as
115+
With these definitions, the read count for higher-order taxon $I$ can be expressed as
117116
$M_{I}^{(a)} = A_{I}^{(a)} B_I^{(a)} F^{(a)}$.
118-
Thus $B_I^{(a)}$ plays a role analogous to the efficiency of an individual species, but differs in that it need not be constant across samples:
119-
If the constituent species have different efficiencies, then the efficiency of the higher-order taxon $I$ depends on the relative abundances of its constituents and so will tend to vary across samples (@mclaren2019cons).
117+
Thus $B_I^{(a)}$ plays a role analogous to the efficiency of an individual species, but differs in that it is not constant across samples:
118+
If the constituent species have different efficiencies, then the efficiency of the higher-order taxon $I$ depends on the relative abundances of its constituents and so will vary across samples (@mclaren2019cons).
120119
As an example, suppose that Species 1 and Species 2 in Figure \@ref(fig:error-proportions) were in the same phylum.
121120
The efficiency of the phylum would then be $\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 18 = 9.5$ in Sample 1 and $\tfrac{15}{16} \cdot 1 + \tfrac{1}{16} \cdot 18 \approx 2.1$ in Sample 2.
122121
Equations \@ref(eq:prop-error) and \@ref(eq:ratio-error) continue to describe the measurement error in proportions and ratios involving higher-order taxa, so long as the sample-dependent, higher-order taxa efficiencies $B_I^{(a)}$ and $B_J^{(a)}$ are used.
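
The phylum example can be reproduced with a short sketch of the abundance-weighted average in Equation \@ref(eq:efficiency-general). The within-phylum abundances below (1:1 and 15:1) correspond to the relative abundances quoted in the example; `taxon_efficiency` is an illustrative name.

```python
import math


def taxon_efficiency(abundances, efficiencies):
    """Abundance-weighted average efficiency of the species in a taxon."""
    weighted_sum = sum(a * b for a, b in zip(abundances, efficiencies))
    return weighted_sum / sum(abundances)


# Efficiencies of Species 1 and Species 2 as stated in the text.
phylum_efficiencies = [1.0, 18.0]

# Abundances of Species 1 and 2 in each sample, up to a common scale:
# equal in Sample 1, and 15:1 in Sample 2.
phylum_sample_1 = taxon_efficiency([1.0, 1.0], phylum_efficiencies)
phylum_sample_2 = taxon_efficiency([15.0, 1.0], phylum_efficiencies)

print("Phylum efficiency, Sample 1:", phylum_sample_1)
print("Phylum efficiency, Sample 2:", round(phylum_sample_2, 2))
```

Only the relative abundances within the taxon matter here, since any common scale factor cancels between the numerator and denominator.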
@@ -126,22 +125,22 @@ In this way, we see that both proportions and ratios among higher-order taxa may
126125

127126
Several extensions of the standard MGS experiment make it possible to measure absolute species abundances.
128127
These extensions fall into two general approaches.
129-
The first approach leverages information about the abundance of the total community; for example, @vandeputte2017quan measured total-community abundance using flow cytometry and multiplied this number by MGS genus proportions to obtain the absolute abundances of individual genera (@vandeputte2017quan).
128+
The first approach leverages information about the abundance of the total community; for example, @vandeputte2017quan measured total-community abundance using flow cytometry and multiplied this number by genus proportions measured by MGS to quantify the absolute abundances of individual genera (@vandeputte2017quan).
130129
A second approach leverages information about the abundance of one or more individual species; for example, a researcher might 'spike in' a known, fixed amount of an extraneous species to all samples prior to MGS, and normalize the read counts of all species to the spike-in species (@harrison2021theq).
131130
We consider each approach in detail to determine how taxonomic bias affects the resulting absolute-abundance measurements.
132131

133132
### Leveraging information about total-community abundance
134133

135134
Suppose that the total abundance of all species in the sample, $A_{\tot}^{(a)}$, has been measured by a non-MGS method, yielding a measurement $\tilde A_\tot^{(a)}$.
136-
The absolute abundance of an individual species can be measured by multiplying the species' proportion from MGS by this total-abundance measurement,
135+
The absolute abundance of an individual species can be quantified by multiplying the species' proportion from MGS by this total-abundance measurement,
137136
\begin{align}
138137
(\#eq:density-prop-meas)
139138
\tilde A_i^{(a)} &= \tilde P_i^{(a)} \tilde A_\tot^{(a)}.
140139
\end{align}
141140
Total-abundance measurements recently used for this purpose include counting cells with microscopy (@lloyd2020evid) or flow cytometry (@props2017abso, @vandeputte2017quan, @galazzo2020howt), measuring the concentration of a marker-gene with qPCR or ddPCR (@zhang2017soil, @barlow2020aqau, @galazzo2020howt, @tettamantiboshier2020comp), and measuring bulk DNA concentration with a fluorescence-based DNA quantification method (@contijoch2019gutm).
142141

143-
Importantly, these methods of measuring total abundance are themselves subject to taxonomic bias.
144-
Flow cytometry may, for example, yield lower cell counts for species whose cells tend to clump together or are prone to lysis during steps involved in sample collection, storage, and preparation.
142+
Importantly, these methods of measuring total abundance are themselves subject to taxonomic bias that is analogous to, but quantitatively different from, the bias in the MGS relative abundance measurements.
143+
Flow cytometry may yield lower cell counts for species whose cells tend to clump together or are prone to lysis during steps involved in sample collection, storage, and preparation.
145144
Marker-gene concentrations measured by qPCR are affected by variation among species in extraction efficiency, marker-gene copy number, and PCR binding and amplification efficiency (@lloyd2013meta).
146145
We can easily understand the impact of taxonomic bias on total-abundance measurement under simplifying assumptions analogous to those in our MGS model.
147146
Suppose that each species $i$ has an _absolute efficiency_ $B_{i}^{\mtot}$ for the total-abundance measurement that is constant across samples.
@@ -155,15 +154,15 @@ Neglecting other error sources, the total-abundance measurement equals
155154
\end{align}
156155
<!-- Note: We have assumed that only species in S contribute to the total abundance measurement. -->
157156

158-
Species abundance measurements derived by this method are affected by taxonomic bias in both the MGS and total-abundance measurement.
159-
We can determine the resulting fold error (FE) by substituting Equations \@ref(eq:prop-error) and \@ref(eq:total-density-error) into Equation \@ref(eq:density-prop-meas), yielding
157+
Species abundance measurements derived by this method (Equation \@ref(eq:density-prop-meas)) are affected by taxonomic bias in both the MGS and total-abundance measurement.
158+
We can determine the resulting fold error (FE) in the estimate $\tilde A_i^{(a)}$ by substituting Equations \@ref(eq:prop-error) and \@ref(eq:total-density-error) into Equation \@ref(eq:density-prop-meas), yielding
160159
\begin{align}
161160
(\#eq:density-prop-error)
162161
\tilde A_i^{(a)}
163162
= A_i^{(a)} \cdot \frac{B_i \bar B^{\mtot (a)}}{\bar B^{(a)}}.
164163
\end{align}
165164
Equation \@ref(eq:density-prop-error) indicates that the FE in the measured absolute abundance of a species equals its MGS efficiency relative to the mean MGS efficiency in the sample, multiplied by the mean efficiency of the total measurement.
166-
As in the case of proportions (Equation \@ref(eq:prop-error)), the FE depends on sample composition through the two mean efficiency terms and so will vary across samples unless the two perfectly covary.
165+
As in the case of proportions (Equation \@ref(eq:prop-error)), the FE depends on sample composition through the two mean efficiency terms and so will, in general, vary across samples.
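
The combined effect of the two biases can be sketched numerically. This is an illustration under stated assumptions, not the manuscript's code: the MGS efficiencies are those used in the earlier examples, while the sample compositions and the absolute efficiencies of the total-abundance measurement are hypothetical. The sketch also assumes that the total-abundance measurement equals the sum of each species' abundance times its absolute efficiency.

```python
import math

# MGS efficiencies of Species 1-3 as stated in the text.
mgs_efficiencies = [1.0, 18.0, 6.0]

# Hypothetical absolute efficiencies of the total-abundance measurement.
total_efficiencies = [0.5, 1.0, 2.0]

# Hypothetical actual abundances (assumed for illustration only).
samples = {
    "Sample 1": [1.0, 1.0, 1.0],
    "Sample 2": [15.0, 1.0, 4.0],
}


def weighted_mean(values, weights):
    """Mean of values weighted by the given non-negative weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def absolute_abundance_fold_errors(abundances, mgs_eff, tot_eff):
    """FE in each species' absolute abundance when the measured MGS
    proportion is multiplied by the measured total abundance."""
    reads = [a * b for a, b in zip(abundances, mgs_eff)]
    measured_props = [r / sum(reads) for r in reads]
    # Assumed model: each species contributes abundance x absolute efficiency.
    measured_total = sum(a * b for a, b in zip(abundances, tot_eff))
    measured_abundances = [p * measured_total for p in measured_props]
    return [m / a for m, a in zip(measured_abundances, abundances)]


abs_fold_errors = {}
for name, abundances in samples.items():
    fes = absolute_abundance_fold_errors(
        abundances, mgs_efficiencies, total_efficiencies
    )
    mean_mgs = weighted_mean(mgs_efficiencies, abundances)
    mean_tot = weighted_mean(total_efficiencies, abundances)
    # Each FE should equal B_i times the ratio of the two mean efficiencies.
    for fe, b in zip(fes, mgs_efficiencies):
        assert math.isclose(fe, b * mean_tot / mean_mgs)
    abs_fold_errors[name] = fes
    print(name, [round(fe, 3) for fe in fes])
```

With these assumed values, the ratio of the two mean efficiencies differs between the samples, so the FE of every species differs as well.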
167166

168167
### Leveraging information about a reference species
169168
