
Consolidate issues into one dataframe #112

@apriha

Building on #107, consolidate the separate issues (e.g., duplicate_rsid, discrepant_XY) into a single dataframe with the following columns and dtypes:

| Column | pandas dtype |
| --- | --- |
| `rsid` | `pd.StringDtype()` |
| `chrom` | `pd.CategoricalDtype()` |
| `pos` | `pd.UInt32Dtype()` |
| `genotype` | `pd.CategoricalDtype()` |
| `duplicate_rsid` | `pd.BooleanDtype()` |
| `discrepant_loci` | `pd.BooleanDtype()` |
| `discrepant_XY` | `pd.BooleanDtype()` |
| `heterozygous_MT` | `pd.BooleanDtype()` |
| `discrepant_vcf_position` | `pd.BooleanDtype()` |
| `discrepant_merge_position` | `pd.BooleanDtype()` |
| `discrepant_merge_genotype` | `pd.BooleanDtype()` |
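A minimal sketch of this schema in pandas, assuming the column names and dtypes listed above. `ISSUE_DTYPES`, `KEY_COLUMNS`, `ISSUE_COLUMNS` and `empty_issues_df` are hypothetical names chosen for illustration, not existing API:

```python
import pandas as pd

# Hypothetical schema for the consolidated issues dataframe (from the table above).
ISSUE_DTYPES = {
    "rsid": pd.StringDtype(),
    "chrom": pd.CategoricalDtype(),
    "pos": pd.UInt32Dtype(),
    "genotype": pd.CategoricalDtype(),
    "duplicate_rsid": pd.BooleanDtype(),
    "discrepant_loci": pd.BooleanDtype(),
    "discrepant_XY": pd.BooleanDtype(),
    "heterozygous_MT": pd.BooleanDtype(),
    "discrepant_vcf_position": pd.BooleanDtype(),
    "discrepant_merge_position": pd.BooleanDtype(),
    "discrepant_merge_genotype": pd.BooleanDtype(),
}

# Columns that identify a SNP call vs. the boolean issue flags.
KEY_COLUMNS = ["rsid", "chrom", "pos", "genotype"]
ISSUE_COLUMNS = [col for col in ISSUE_DTYPES if col not in KEY_COLUMNS]


def empty_issues_df() -> pd.DataFrame:
    """Return an empty issues dataframe with every column set to its schema dtype."""
    return pd.DataFrame(
        {col: pd.Series(dtype=dtype) for col, dtype in ISSUE_DTYPES.items()}
    )
```

Keeping the identifying columns separate from the issue flags makes it easy to select only the flags, e.g. `df[ISSUE_COLUMNS]`.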

More than one issue column can be True in the same row. SNPs with a given issue (e.g., discrepant_XY) can then be retrieved by filtering the issues dataframe on that column.
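For example, with a small hand-made issues dataframe (hypothetical values, and only two of the issue columns for brevity), filtering might look like this:

```python
import pandas as pd

# Toy issues dataframe (hypothetical data; only two issue columns shown).
issues = pd.DataFrame(
    {
        "rsid": pd.array(["rs1", "rs2", "rs3"], dtype="string"),
        "chrom": pd.Categorical(["X", "1", "Y"]),
        "pos": pd.array([100, 200, 300], dtype="UInt32"),
        "genotype": pd.Categorical(["AG", "CC", "T"]),
        "duplicate_rsid": pd.array([False, True, True], dtype="boolean"),
        "discrepant_XY": pd.array([True, False, True], dtype="boolean"),
    }
)

# SNPs flagged as discrepant_XY; other issue columns may also be True.
discrepant_xy = issues[issues["discrepant_XY"]]

# SNPs flagged with both issues at once.
both = issues[issues["duplicate_rsid"] & issues["discrepant_XY"]]

# SNPs with at least one of the listed issues flagged.
any_issue = issues[issues[["duplicate_rsid", "discrepant_XY"]].any(axis=1)]
```

Because each issue is its own boolean column, combining issues is just boolean logic on the masks.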

An rsid can appear more than once in this dataframe (e.g., when its rows differ in chrom, pos, or genotype). However, if an rsid has two or more equivalent rows (same values for chrom, pos, and genotype), those rows should be consolidated into one row whose issue columns flag every issue found across them.
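One possible approach is a `groupby` on the four identifying columns followed by `any()` over the issue flags, so a flag is True if any of the merged rows had it. This is a sketch, not existing code: `consolidate_issues` is a hypothetical helper, and it assumes the key columns contain no missing values (pandas' `groupby` drops rows with NA keys by default).

```python
import pandas as pd

KEY_COLUMNS = ["rsid", "chrom", "pos", "genotype"]


def consolidate_issues(issues: pd.DataFrame) -> pd.DataFrame:
    """Merge rows with identical rsid/chrom/pos/genotype, OR-ing their issue flags."""
    issue_cols = [col for col in issues.columns if col not in KEY_COLUMNS]
    return (
        # observed=True keeps only key combinations that actually occur;
        # otherwise categorical keys expand to every category combination.
        issues.groupby(KEY_COLUMNS, observed=True, sort=False)[issue_cols]
        .any()
        .astype("boolean")  # keep the nullable boolean dtype from the schema
        .reset_index()
    )


# Hypothetical example: the first two rows describe the same SNP call.
issues = pd.DataFrame(
    {
        "rsid": pd.array(["rs1", "rs1", "rs1", "rs2"], dtype="string"),
        "chrom": pd.Categorical(["1", "1", "1", "X"]),
        "pos": pd.array([100, 100, 150, 200], dtype="UInt32"),
        "genotype": pd.Categorical(["AG", "AG", "AG", "A"]),
        "duplicate_rsid": pd.array([True, False, True, False], dtype="boolean"),
        "discrepant_loci": pd.array([False, True, False, False], dtype="boolean"),
        "discrepant_XY": pd.array([False, False, False, True], dtype="boolean"),
    }
)

consolidated = consolidate_issues(issues)
# rs1 at pos 100 is now a single row with duplicate_rsid and discrepant_loci True;
# rs1 at pos 150 stays a separate row because its pos differs.
```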
