Does phastCons on windows of the MSA identify more conserved elements vs. phastCons on whole MSA?

I built a multiple sequence alignment (MSA) of ~30 species in two closely related genera with Progressive Cactus, converted the .hal to a .maf file (using --noAncestors, --dupeMode single, --filterGapCausingDupes), and split the .maf into 1 Mb windows along the reference genome with PHAST msa_split (output in .fasta format). I then ran phyloFit (with --EM) on each 1 Mb .fasta window and ran phastCons on the output (--msa-format FASTA --target-coverage 0.3 --expected-length 45 --rho 0.3 --viterbi).

The most-conserved .bed file written by --viterbi covers most of the reference, which is unexpected. Because the most-conserved regions are called by a hidden Markov model, each call depends on its sequence context. Would the most-conserved .bed file contain fewer elements if I ran phastCons on the full alignment? Can I rely on the results from running it on 1 Mb windows?
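For context on the settings above, here is a minimal sketch of how --target-coverage and --expected-length map onto the two-state HMM's transition probabilities. It assumes the parameterization described in the phastCons help text as I understand it (expected length omega = 1/mu, target coverage gamma = nu/(mu + nu)); the function name is made up for illustration. These values set the HMM's transition probabilities, so they do not by themselves guarantee that 30% of the reference ends up in the .bed file.

```python
def phastcons_transitions(target_coverage, expected_length):
    """Return (mu, nu) implied by the two options.

    Assumed parameterization (not checked against the source code):
      mu = P(conserved -> nonconserved) = 1 / expected_length
      nu = P(nonconserved -> conserved), chosen so that
           nu / (mu + nu) == target_coverage
    """
    mu = 1.0 / expected_length
    nu = target_coverage * mu / (1.0 - target_coverage)
    return mu, nu


# Values from the command line in this post.
mu, nu = phastcons_transitions(target_coverage=0.3, expected_length=45)
print(f"mu = {mu:.5f}, nu = {nu:.5f}")
print(f"implied stationary coverage = {nu / (mu + nu):.2f}")
```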

My most-conserved .bed files identify short regions (a couple hundred kb) that overlap.
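One way to quantify "covers most of the reference" without double-counting overlapping elements is to merge the intervals before summing their lengths. The sketch below is illustrative only: the function names, the toy intervals, and the chromosome size are hypothetical, and it assumes the BED records have already been parsed into (chrom, start, end) tuples with half-open coordinates.

```python
from collections import defaultdict


def merge_bed_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                # Overlaps (or abuts) the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(span) for span in out]
    return merged


def covered_fraction(merged, chrom_sizes):
    """Fraction of the reference covered by the merged intervals."""
    covered = sum(end - start for spans in merged.values() for start, end in spans)
    return covered / sum(chrom_sizes.values())


# Toy data, not real phastCons output.
example = [("chr1", 100, 300), ("chr1", 250, 400), ("chr1", 900, 1000)]
merged = merge_bed_intervals(example)
frac = covered_fraction(merged, {"chr1": 2000})
print(merged, frac)
```

Running the same calculation on the concatenated per-window files and on a whole-alignment run would give two directly comparable coverage figures.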
