Effects of prevalence filtering on downstream analysis #28

@zefrieira

Hi,

I'm following the "Workflow for Microbiome Data Analysis" to analyse some 16S and ITS datasets. I'm unsure what effect prevalence filtering could have on the downstream analyses I'm planning (a differential abundance analysis with DESeq2, for example).

This is how I'm performing prevalence filtering right now:

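library(phyloseq)

# Convert counts to per-sample relative abundances
ps_16s_relative <- transform_sample_counts(ps_16s, function(x) x / sum(x))

# Prevalence: number of samples in which each ASV reaches >= 0.01% relative abundance.
# Taxa are on rows (MARGIN = 1) only if taxa_are_rows() is TRUE; otherwise use columns.
prevalence_16s <- apply(X = otu_table(ps_16s_relative),
                        MARGIN = ifelse(taxa_are_rows(ps_16s_relative), yes = 1, no = 2),
                        FUN = function(x) sum(x >= 0.0001))

prevalence_df_16s <- data.frame(prevalence = prevalence_16s,
                                relative_prevalence = prevalence_16s / nsamples(ps_16s_relative),
                                total_abundance = taxa_sums(ps_16s_relative),
                                tax_table(ps_16s_relative))

# Keep ASVs that pass the abundance threshold in at least 5% of samples
keep_taxa <- rownames(prevalence_df_16s)[prevalence_df_16s$relative_prevalence >= 0.05]
ps_16s <- prune_taxa(keep_taxa, ps_16s)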

And these are the numbers I'm getting (with the 16S data):

'Number of ASVs before the prevalence filtering: 28582'
'Number of phyla before the prevalence filtering: 40'

'Number of ASVs after the prevalence filtering: 5762'
'Number of phyla after the prevalence filtering: 23'

I feel that even though the filtered data may be a better representation of the microbial community, I'm losing too much data: roughly 80% of the ASVs and 17 of the 40 phyla are removed.
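
One way to judge whether too much is being lost might be to compare the fraction of reads retained with the fraction of ASVs retained, since rare ASVs can be numerous while carrying very few reads. Below is a minimal base-R sketch on a made-up count matrix. The values and the object names `counts`, `rel` and `keep` are illustrative assumptions, not taken from the 16S dataset. It applies the same two thresholds as the filtering code above (0.01% relative abundance, 5% of samples):

```r
# Toy count matrix (made-up numbers): 4 ASVs (rows) x 40 samples (columns)
n_samples <- 40
counts <- rbind(
  ASV1 = rep(12000, n_samples),          # abundant in every sample
  ASV2 = rep(8000, n_samples),           # abundant in every sample
  ASV3 = c(50, rep(0, n_samples - 1)),   # present in a single sample only
  ASV4 = rep(1, n_samples)               # one read per sample (below 0.01%)
)
colnames(counts) <- paste0("S", seq_len(n_samples))

# Per-sample relative abundances (each column sums to 1)
rel <- sweep(counts, 2, colSums(counts), "/")

# Fraction of samples in which each ASV reaches >= 0.01% relative abundance
relative_prevalence <- rowSums(rel >= 0.0001) / ncol(rel)

# Same cut-off as above: keep ASVs passing in at least 5% of samples
keep <- relative_prevalence >= 0.05

frac_asvs_kept  <- mean(keep)                          # 0.5 for this toy matrix
frac_reads_kept <- sum(counts[keep, ]) / sum(counts)   # above 0.999 here

print(c(asvs_kept = frac_asvs_kept, reads_kept = frac_reads_kept))
```

In this toy case half of the ASVs are dropped but almost all reads are kept. For a phyloseq object, the analogous read-level comparison would presumably be `sum(taxa_sums(ps_16s_filtered)) / sum(taxa_sums(ps_16s))`, where `ps_16s_filtered` is a hypothetical name for the pruned object stored separately instead of overwriting `ps_16s`.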

Is prevalence-filtered data recommended for downstream analysis?
