Description
I am currently analyzing approximately 3500 samples from a specific country to determine the prevalence of various mutations within a larger population and to visualize their distribution. However, I do not understand which specific data need to be loaded for this task.
I have FASTA sequences aligned to the reference genome, a MAT file (.pb) containing annotated mutations for each sequence, and a JSONL file describing the phylogenetic tree. Given these datasets, I am uncertain what other data are required to effectively plot and identify the spread of mutations.
Could you advise which specific data elements I should focus on to create an accurate representation of mutation prevalence within the sampled population?
Thank you