Does phastCons on windows of the MSA identify more conserved elements vs. phastCons on whole MSA? #6

@sosie101

I built a multiple sequence alignment (MSA) of ~30 species from two closely related genera with Progressive Cactus, converted the .hal to a .maf file (using --noAncestors, --dupeMode single, --filterGapCausingDupes), and split the .maf file into 1 Mb windows along the reference genome with PHAST msa_split (output in FASTA format). I then ran phyloFit (specifying --EM) on each 1 Mb FASTA window and ran phastCons on the output (--msa-format FASTA --target-coverage 0.3 --expected-length 45 --rho 0.3 --viterbi). The most-conserved BED file produced by the --viterbi flag covers most of the reference, which is unexpected. Since the most-conserved elements are predicted by a hidden Markov model and therefore depend on the surrounding sequence context, would the most-conserved BED file contain fewer elements if I ran phastCons on the full alignment? Can I rely on the results I get from running it on 1 Mb windows?
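
As a sanity check on the settings above, here is a minimal sketch of how --target-coverage and --expected-length translate into the two-state HMM transition probabilities, going by the phastCons help text: mu (conserved to non-conserved) is set to 1/expected-length, and the ratio mu/nu is fixed at (1 - gamma)/gamma. The function name `phastcons_transitions` is hypothetical, written only for this illustration, and is not part of PHAST.

```python
def phastcons_transitions(target_coverage, expected_length):
    """Return (mu, nu) implied by the two phastCons options.

    mu: probability of leaving the conserved state, set to 1 / expected_length.
    nu: probability of entering the conserved state, chosen so that
        mu / nu = (1 - target_coverage) / target_coverage.
    """
    mu = 1.0 / expected_length
    nu = mu * target_coverage / (1.0 - target_coverage)
    return mu, nu


# Values taken from the command line quoted in this issue.
mu, nu = phastcons_transitions(target_coverage=0.3, expected_length=45)

# Stationary fraction of sites in the conserved state; equals the target coverage.
prior_conserved_fraction = nu / (mu + nu)
print(f"mu = {mu:.4f}, nu = {nu:.4f}, prior conserved fraction = {prior_conserved_fraction:.2f}")
```

The target coverage is a prior expectation rather than a posterior one, so the predicted elements can cover noticeably more or less than 30% of the reference, depending on the alignment and on the models fitted per window.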

My most-conserved .bed files identify short regions (a couple hundred kb) that overlap.
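
To put numbers on "covers most of the reference" and on the overlaps, one option is to merge the intervals from all per-window BED files and count the covered bases. The sketch below is only an illustration under stated assumptions: the helper names (`read_bed`, `merge_intervals`, `covered_bases`) are hypothetical, the input is assumed to be plain BED with chrom, start, and end in the first three tab-separated columns, and the toy records stand in for real output.

```python
from collections import defaultdict


def read_bed(path):
    """Yield (chrom, start, end) from a BED file, skipping header lines."""
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            yield chrom, int(start), int(end)


def merge_intervals(records):
    """Merge overlapping or directly adjacent intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps (or touches) the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def covered_bases(merged):
    """Total number of bases covered by the merged intervals."""
    return sum(end - start for ivs in merged.values() for start, end in ivs)


# Toy records standing in for concatenated per-window BED output.
toy = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 200, 250), ("chr2", 10, 20)]
toy_merged = merge_intervals(toy)
toy_covered = covered_bases(toy_merged)
print(toy_merged, toy_covered)
```

Dividing the covered base count by the reference length gives the fraction of the reference inside predicted elements, which can then be compared between the windowed run and a whole-alignment run.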
