Effects of prevalence filtering on downstream analysis #28

Hi,

I'm following the ["Workflow for Microbiome Data Analysis"](http://web.stanford.edu/class/bios221/MicrobiomeWorkflowII.html) to analyse some 16S and ITS datasets. I'm unsure about the effects that prevalence filtering could have on the downstream analyses I'm planning (a differential abundance analysis with DESeq2, for example).

This is how I'm performing prevalence filtering right now:
```
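# Convert counts to per-sample relative abundances
ps_16s_relative <- transform_sample_counts(ps_16s, function(x) x / sum(x))

# Prevalence = number of samples in which a taxon reaches 0.01% relative abundance.
# Pick the margin from the table orientation so the count is always per taxon.
prevalence_16s <- apply(X = otu_table(ps_16s_relative),
                        MARGIN = ifelse(taxa_are_rows(ps_16s_relative), yes = 1, no = 2),
                        FUN = function(x) sum(x >= 0.0001))

prevalence_df_16s <- data.frame(prevalence = prevalence_16s,
                                relative_prevalence = prevalence_16s / nsamples(ps_16s_relative),
                                total_abundance = taxa_sums(ps_16s_relative),
                                tax_table(ps_16s_relative))

# Keep taxa detected in at least 5% of samples; prune the original count data
keep_taxa_16s <- rownames(prevalence_df_16s)[prevalence_df_16s$relative_prevalence >= 0.05]
ps_16s <- prune_taxa(keep_taxa_16s, ps_16s)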
```
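
For comparison, the same two thresholds can probably be expressed more compactly with phyloseq's `filter_taxa()`. The sketch below is only an illustration: it uses the `GlobalPatterns` example dataset that ships with phyloseq as a stand-in for the 16S data, and the object names (`gp_relative`, `keep`, `gp_filtered`) are placeholders.

```r
library(phyloseq)

# Example dataset bundled with phyloseq (a stand-in for ps_16s, assumed for illustration)
data(GlobalPatterns)

# Per-sample relative abundances
gp_relative <- transform_sample_counts(GlobalPatterns, function(x) x / sum(x))

# filter_taxa() applies the function to each taxon's abundances across samples,
# whatever the table orientation, and returns one TRUE/FALSE per taxon.
# Here: keep taxa reaching 0.01% relative abundance in at least 5% of samples.
keep <- filter_taxa(gp_relative, function(x) mean(x >= 0.0001) >= 0.05)

# Prune the original counts (not the relative abundances), since DESeq2 expects raw counts
gp_filtered <- prune_taxa(keep, GlobalPatterns)

ntaxa(GlobalPatterns)  # taxa before filtering
ntaxa(gp_filtered)     # taxa after filtering
```

Because `filter_taxa()` works per taxon whether taxa are stored as rows or columns, this form avoids having to choose the `MARGIN` by hand.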

And these are the numbers I'm getting (with the 16S data):
```
'Number of ASVs before the prevalence filtering: 28582'
'Number of phyla before the prevalence filtering: 40'

'Number of ASVs after the prevalence filtering: 5762'
'Number of phyla after the prevalence filtering: 23'
```

I feel that even though the filtered data may be a better representation of the microbial community, I'm losing too much data: about 80% of the ASVs (28582 down to 5762) and 17 of the 40 phyla are removed.
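
One way to put "too much data" in perspective might be to compare the fraction of ASVs removed with the fraction of reads removed, since low-prevalence ASVs often carry few reads. The base R sketch below uses a made-up toy count matrix (not the real data) and a toy prevalence threshold of 50%, chosen only because the example has four samples.

```r
# Toy count matrix (taxa as rows, samples as columns); all values are made up
counts <- matrix(
  c(500, 450, 480, 520,   # abundant, present in every sample
    300,   0, 250, 310,   # abundant, missing from one sample
      2,   0,   0,   0,   # rare, seen in a single sample
      0,   1,   0,   0),  # rare, seen in a single sample
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ASV", 1:4), paste0("S", 1:4))
)

# Relative abundance per sample (each column sums to 1)
rel <- sweep(counts, 2, colSums(counts), "/")

# Relative prevalence: fraction of samples where the taxon reaches 0.01%
rel_prevalence <- rowMeans(rel >= 0.0001)

# Toy threshold: keep taxa present in at least half of the samples
keep <- rel_prevalence >= 0.5

# Fraction of taxa kept versus fraction of reads kept
frac_taxa_kept  <- mean(keep)
frac_reads_kept <- sum(counts[keep, , drop = FALSE]) / sum(counts)

# Reads retained per sample, to spot samples hit harder than others
reads_kept_per_sample <- colSums(counts[keep, , drop = FALSE]) / colSums(counts)

frac_taxa_kept
frac_reads_kept
reads_kept_per_sample
```

In this toy case half of the taxa are dropped while nearly all reads are kept. With phyloseq objects, the analogous read-level figure would be `sum(taxa_sums(filtered)) / sum(taxa_sums(unfiltered))`, which requires keeping the unfiltered object under a separate name instead of overwriting `ps_16s`.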

Is prevalence-filtered data recommended for downstream analysis?
