inconsistent estimations in heterozygous plant genomes #8

@dcopetti

Hello,
I am looking for some advice on how to run findGSE and how to interpret the results when dealing with heterozygous plant genomes.
I summarized the runs on 4 species in this file:
FindGSE_tests_220124.pdf
The results show inaccurate genome size estimates, and the estimates are inconsistent when the parameters change.

Briefly:

  • the estimates differ depending on whether exp_hom=NN is set or not (all 4 cases)
  • as expected, setting exp_hom=NN at the mode of the homo peak gives no estimate (except for Cgil, perhaps because the het peak is buried?)
  • if exp_hom=NN is larger than the mode of the homo peak, the estimate does not change
  • across different exp_hom=NN values, some estimates vary a lot, others very little.

by species:

  • Caus: findGSE seems to be working well, concordant with the HiFi assembly
  • Lmul: the estimate varies a lot, with the correct value obtained when exp_hom=NN is LOWER than the homo peak
  • Cgig: the estimate is always below 1 Gb (expected: 1.4 Gb), with some runs failing
  • Cgil: a 4-fold variation in size, although with 38 Gb of raw HiFi data and a homo peak at 87, the genome could be ~438 Mb.
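
The back-of-the-envelope Cgil figure above can be reproduced with a minimal sketch like the one below. It assumes the simplest possible model: genome size is roughly the total sequenced bases divided by the depth at the homozygous peak. The function name `estimate_genome_size` is hypothetical and is not part of findGSE. Strictly, a k-mer depth is lower than the base depth by a factor of (L - k + 1)/L for reads of length L, but for long HiFi reads that factor is close to 1, so it is ignored here.

```python
def estimate_genome_size(total_bases: float, hom_peak_depth: float) -> float:
    """Rough genome size: sequenced bases divided by the homozygous-peak depth.

    This is a sanity check only, not the model findGSE fits.
    """
    if hom_peak_depth <= 0:
        raise ValueError("hom_peak_depth must be positive")
    return total_bases / hom_peak_depth


# Values quoted for Cgil: 38 Gb of raw HiFi data, homo peak at 87.
cgil_size = estimate_genome_size(38e9, 87)
print(f"Cgil rough estimate: {cgil_size / 1e6:.0f} Mb")
```

With the rounded 38 Gb input this gives roughly 437 Mb, in line with the ~438 Mb quoted above; the small difference presumably comes from rounding of the data volume.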

The documentation says that exp_hom=NN should lie between the homo peak and twice its value (hom_peak < x < 2*hom_peak), and I do see consistent estimates within that range. The only problem is that sometimes the values are correct (Caus) and other times they are off (Cgig, Lmul).
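
To make the documented range concrete, here is a small sketch of a check for whether a candidate exp_hom value falls strictly between the homo peak and twice the homo peak. The helper `exp_hom_in_documented_window` is hypothetical (it is not a findGSE function), and the candidate values are only illustrative, using the Cgil homo peak of 87.

```python
def exp_hom_in_documented_window(exp_hom: float, hom_peak: float) -> bool:
    """True if hom_peak < exp_hom < 2 * hom_peak (both bounds exclusive)."""
    return hom_peak < exp_hom < 2 * hom_peak


hom_peak = 87  # mode of the homo peak reported for Cgil
for candidate in (60, 87, 100, 174):
    ok = exp_hom_in_documented_window(candidate, hom_peak)
    print(f"exp_hom={candidate}: {'inside' if ok else 'outside'} the documented range")
```

Under this reading, a value exactly at the mode (87) or below it falls outside the range, which matches the observation that setting exp_hom at the mode gives no estimate.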
Can you please give some guidelines on how to use the tool to get a reliable and consistent estimate when no flow cytometry value is available?
Thanks!
