Hi,
I'm following the "Workflow for Microbiome Data Analysis" to analyse some 16S and ITS datasets. I'm unsure what effect prevalence filtering could have on the downstream analyses I'm planning (a differential abundance analysis with DESeq2, for example).
This is how I'm performing prevalence filtering right now:
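# Convert counts to relative abundances within each sample
ps_16s_relative <- transform_sample_counts(ps_16s, function(x) x / sum(x))
# Prevalence: number of samples in which an ASV reaches at least 0.01% relative abundance
# (MARGIN = 1 assumes taxa are stored as rows in the OTU table)
prevalence_16s <- apply(X = otu_table(ps_16s_relative),
                        MARGIN = 1,
                        FUN = function(x) sum(x >= 0.0001))
prevalence_df_16s <- data.frame(prevalence = prevalence_16s,
                                relative_prevalence = prevalence_16s / nsamples(ps_16s_relative),
                                total_abundance = taxa_sums(ps_16s_relative),
                                tax_table(ps_16s_relative))
# Keep only ASVs that pass the abundance threshold in at least 5% of samples
keep_16s <- rownames(prevalence_df_16s)[prevalence_df_16s$relative_prevalence >= 0.05]
ps_16s <- prune_taxa(keep_16s, ps_16s)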
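To make the filtering logic easier to check, here is a minimal base R sketch of the same idea on a made-up count matrix. It does not use phyloseq. The matrix values, the object names, and both thresholds (`abundance_threshold`, the 40% prevalence cutoff) are hypothetical and chosen only to suit the toy data. It also computes the share of reads retained, which may be a more informative measure of loss than the number of ASVs dropped.

```r
# Toy count matrix: 4 ASVs (rows) x 5 samples (columns); all values are made up
counts <- matrix(c(500, 400, 300, 600, 450,
                     0,   0,   1,   0,   0,
                    20,   0,  30,   0,  25,
                     0,   0,   0,   0,  10),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("ASV", 1:4), paste0("S", 1:5)))

# Relative abundance within each sample (each column sums to 1)
rel <- sweep(counts, 2, colSums(counts), "/")

# Prevalence: number of samples in which an ASV reaches the abundance threshold
abundance_threshold <- 0.01  # hypothetical value for the toy data
prevalence <- rowSums(rel >= abundance_threshold)
relative_prevalence <- prevalence / ncol(rel)

# Keep ASVs that pass the threshold in at least 40% of samples (hypothetical cutoff)
keep <- names(relative_prevalence)[relative_prevalence >= 0.4]
filtered <- counts[keep, , drop = FALSE]

# Fraction of all reads that survive the filter
reads_retained <- sum(filtered) / sum(counts)
```

In this toy matrix, two of the four ASVs are dropped while more than 99% of the reads are kept, so a large drop in ASV count does not necessarily mean a large loss of sequence data.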
And these are the numbers I'm getting (with the 16S data):
'Number of ASVs before the prevalence filtering: 28582'
'Number of phyla before the prevalence filtering: 40'
'Number of ASVs after the prevalence filtering: 5762'
'Number of phyla after the prevalence filtering: 23'
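For reference, the retained fractions implied by these counts can be worked out directly. This is a small base R sketch that uses only the four numbers reported above; the variable names are just for illustration.

```r
# Counts reported before and after prevalence filtering (16S data)
asvs_before  <- 28582
asvs_after   <- 5762
phyla_before <- 40
phyla_after  <- 23

# Percentage of ASVs and phyla retained after filtering
pct_asvs_retained  <- round(100 * asvs_after / asvs_before, 1)
pct_phyla_retained <- round(100 * phyla_after / phyla_before, 1)

cat("ASVs retained: ", pct_asvs_retained, "%\n", sep = "")
cat("Phyla retained: ", pct_phyla_retained, "%\n", sep = "")
```

That is roughly 20% of ASVs and 57.5% of phyla retained.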
My feeling is that, even though the filtered data may be a better representation of the microbial community, I'm losing too much data.
Is prevalence-filtered data recommended for downstream analysis?