Building on #107, consolidate several issues (e.g., `duplicate_rsid`, `discrepant_XY`) into one dataframe with the following columns / dtypes:
| Column | pandas dtype |
|---|---|
| `rsid` | `pd.StringDtype()` |
| `chrom` | `pd.CategoricalDtype()` |
| `pos` | `pd.UInt32Dtype()` |
| `genotype` | `pd.CategoricalDtype()` |
| `duplicate_rsid` | `pd.BooleanDtype()` |
| `discrepant_loci` | `pd.BooleanDtype()` |
| `discrepant_XY` | `pd.BooleanDtype()` |
| `heterozygous_MT` | `pd.BooleanDtype()` |
| `discrepant_vcf_position` | `pd.BooleanDtype()` |
| `discrepant_merge_position` | `pd.BooleanDtype()` |
| `discrepant_merge_genotype` | `pd.BooleanDtype()` |
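
As a rough sketch of how a dataframe with these columns and dtypes might be built, the snippet below defines the schema once and applies it to a list of records. The names `ISSUE_COLUMNS`, `ISSUES_DTYPES`, and `make_issues_frame` are hypothetical and not part of the existing codebase, and defaulting unset flags to `False` is an assumption.

```python
import pandas as pd

# Hypothetical names for illustration; not an existing API.
ISSUE_COLUMNS = [
    "duplicate_rsid",
    "discrepant_loci",
    "discrepant_XY",
    "heterozygous_MT",
    "discrepant_vcf_position",
    "discrepant_merge_position",
    "discrepant_merge_genotype",
]

ISSUES_DTYPES = {
    "rsid": pd.StringDtype(),
    "chrom": pd.CategoricalDtype(),
    "pos": pd.UInt32Dtype(),
    "genotype": pd.CategoricalDtype(),
    **{col: pd.BooleanDtype() for col in ISSUE_COLUMNS},
}


def make_issues_frame(records):
    """Build an issues dataframe from a list of dicts using the proposed dtypes."""
    df = pd.DataFrame(records, columns=list(ISSUES_DTYPES))
    for col in ISSUE_COLUMNS:
        # Assumption: a flag that was not supplied means "no issue" (False).
        df[col] = df[col].astype(pd.BooleanDtype()).fillna(False)
    return df.astype(ISSUES_DTYPES)


issues = make_issues_frame(
    [{"rsid": "rs1", "chrom": "1", "pos": 100, "genotype": "AA", "duplicate_rsid": True}]
)
```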
Multiple issue columns could be `True` for the same row, and SNPs with a given issue (e.g., `discrepant_XY`) could be retrieved by filtering the issues dataframe.
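
For illustration, filtering might look like the following minimal sketch. It uses a small made-up frame with only two of the issue columns; the sample values are invented, and the masks rely on the nullable boolean columns containing no missing values.

```python
import pandas as pd

# Made-up sample data; only a subset of the proposed columns is shown.
issues = pd.DataFrame(
    {
        "rsid": pd.array(["rs1", "rs2", "rs3"], dtype="string"),
        "discrepant_XY": pd.array([True, False, True], dtype="boolean"),
        "duplicate_rsid": pd.array([False, False, True], dtype="boolean"),
    }
)

# SNPs flagged with a single issue.
discrepant_xy = issues.loc[issues["discrepant_XY"]]

# SNPs flagged with both issues at once.
both = issues.loc[issues["discrepant_XY"] & issues["duplicate_rsid"]]

# SNPs flagged with at least one of the two issues.
either = issues.loc[issues["discrepant_XY"] | issues["duplicate_rsid"]]
```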
An `rsid` could appear more than once in this dataframe. However, if an `rsid` has two or more rows that are equivalent (same values for `chrom`, `pos`, and `genotype`), their issues should be consolidated into one row, with the issue columns flagging the issue(s).
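
One possible way to do this consolidation, sketched under assumptions, is to group on the identifying columns and take the logical OR of each issue column. `KEY_COLUMNS` and `consolidate_issues` are hypothetical names, the sample rows are invented, and only two issue columns are shown.

```python
import pandas as pd

# Hypothetical helper for illustration; not an existing API.
KEY_COLUMNS = ["rsid", "chrom", "pos", "genotype"]


def consolidate_issues(issues, issue_columns):
    """Collapse equivalent rows, OR-ing their issue flags together."""
    # observed=True avoids emitting unused category combinations for the
    # categorical keys. Note that groupby drops rows whose key contains a
    # missing value by default; how to treat those is left open here.
    grouped = issues.groupby(KEY_COLUMNS, as_index=False, sort=False, observed=True)
    return grouped[issue_columns].any()


issue_columns = ["duplicate_rsid", "discrepant_merge_position"]
issues = pd.DataFrame(
    {
        "rsid": pd.array(["rs1", "rs1", "rs1", "rs2"], dtype="string"),
        "chrom": pd.Categorical(["1", "1", "1", "X"]),
        "pos": pd.array([100, 100, 200, 300], dtype="UInt32"),
        "genotype": pd.Categorical(["AA", "AA", "AG", "CC"]),
        "duplicate_rsid": pd.array([True, False, True, False], dtype="boolean"),
        "discrepant_merge_position": pd.array([False, True, False, False], dtype="boolean"),
    }
)

consolidated = consolidate_issues(issues, issue_columns)
```

In this sample, the two equivalent `rs1` rows at position 100 collapse into one row with both flags set, while the `rs1` row at position 200 stays separate because its `pos` and `genotype` differ.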