I ran a multiple sequence alignment (MSA) of ~30 species in two closely related genera using Progressive Cactus, converted the .hal to a .maf file (using --noAncestors, --dupeMode single, --filterGapCausingDupes), and split the .maf file into 1 Mb windows along the reference genome using PHAST msa_split (the output was .fasta files). Then I ran phyloFit (specifying --EM) on each 1 Mb .fasta file of the alignment, and phastCons on the output (--msa-format FASTA --target-coverage 0.3 --expected-length 45 --rho 0.3 --viterbi). The most-conserved .bed file created by the --viterbi flag covers most of the reference, which is unexpected. Since the most conserved regions are context-dependent, because they are called using the hidden Markov model, would the most-conserved .bed file contain fewer regions if I ran phastCons on the full alignment? Can I rely on the results I get from running it on 1 Mb windows?
My most-conserved .bed files identify short regions (a couple hundred kb) that overlap.
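
To put a number on "covers most of the reference" and to see how much the overlapping regions inflate it, something like the following might help. It is only a rough sketch, not part of PHAST: the function names `merge_intervals` and `covered_fraction` are hypothetical, and it assumes the per-window .bed files use 0-based, half-open coordinates in the reference frame and that you have a dictionary of reference sequence lengths.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or abutting half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_fraction(bed_lines, chrom_sizes):
    """Fraction of the reference covered by BED intervals, counting overlaps once.

    bed_lines   -- iterable of BED lines (chrom, start, end, ...), e.g. the
                   concatenated most-conserved files from all windows
    chrom_sizes -- dict mapping reference sequence name to its length (assumed input)
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    covered = sum(
        end - start
        for intervals in by_chrom.values()
        for start, end in merge_intervals(intervals)
    )
    return covered / sum(chrom_sizes.values())


if __name__ == "__main__":
    # Toy data, not real results: two overlapping regions and one separate region.
    toy_bed = ["chr1\t0\t100", "chr1\t50\t150", "chr1\t300\t400"]
    print(covered_fraction(toy_bed, {"chr1": 1000}))
```

Comparing this merged fraction with the raw sum of region lengths would show how much of the apparent coverage comes from regions being reported more than once.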