Convert VCF files into 23andMe-style genotype files using the OpenVar pipeline.
This project allows you to process VCF files through filtering, rsID checking, and conversion into a 23andMe-compatible format for personal research and genealogical use.
- Filter VCF variants based on sample-level FT values and chromosome.
- Remove variants that do not pass quality filters.
- Add missing rsIDs using a dbSNP VCF or remove variants without rsIDs.
- Convert the final VCF into a 23andMe-style genotype file (biallelic variants only).
- Exclude indels so that only SNPs are included in the 23andMe-style output.
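The FT and SNP-only filters listed above can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual code: the function names `sample_ft_passes` and `is_biallelic_snp` are hypothetical, and treating an absent FT key or a missing FT value (`.`) as passing is an assumption.

```python
BASES = {"A", "C", "G", "T"}


def sample_ft_passes(format_field: str, sample_field: str) -> bool:
    """Return True if the sample-level FT value indicates a passing genotype.

    format_field is the VCF FORMAT column (e.g. "GT:FT") and sample_field is
    the matching sample column (e.g. "0/1:PASS").
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "FT" not in keys:
        # Assumption: with no FT key there is nothing to filter on.
        return True
    idx = keys.index("FT")
    # VCF allows trailing sample fields to be dropped; treat those as missing.
    ft = values[idx] if idx < len(values) else "."
    # Assumption: "." (filters not applied) is treated the same as PASS.
    return ft in ("PASS", ".")


def is_biallelic_snp(ref: str, alt: str) -> bool:
    """Return True for a single-base REF with exactly one single-base ALT."""
    # A multi-allelic ALT such as "G,T" or an indel such as "AT" is not a
    # single base, so it fails the membership test.
    return ref.upper() in BASES and alt.upper() in BASES
```

For example, `sample_ft_passes("GT:FT", "0/1:LowQual")` is rejected, and `is_biallelic_snp("A", "G,T")` is rejected because the site has two alternate alleles.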
- Create the environment from environment.yml:
conda env create -f environment.yml
conda activate vcfto23me
- Install the pipeline:
pip install .
Direct script execution
You can run the program directly with Python, for example:
python vcf_to_23andme/main.py <input_dir> <output_dir> <genome_build> <sample.vcf>
Alternatively, run as a module using python -m:
python -m vcf_to_23andme.main <input_dir> <output_dir> <genome_build> <sample.vcf>
- <input_dir>: Directory containing the original VCF file.
- <output_dir>: Directory where output files will be written.
- <genome_build>: Either GRCh37 or GRCh38.
- <sample.vcf>: Name of the input VCF file.
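To show how the four positional arguments fit together, here is a minimal `argparse` sketch. It is hypothetical and may differ from how vcf_to_23andme/main.py actually reads its arguments; `build_parser` and the `sample_vcf` attribute name are illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four positional arguments described above."""
    parser = argparse.ArgumentParser(
        description="Convert a VCF file into a 23andMe-style genotype file."
    )
    parser.add_argument("input_dir", help="Directory containing the original VCF file")
    parser.add_argument("output_dir", help="Directory where output files will be written")
    # Restricting the choices makes a mistyped build fail before any processing.
    parser.add_argument("genome_build", choices=["GRCh37", "GRCh38"])
    parser.add_argument("sample_vcf", help="Name of the input VCF file")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["data", "out", "GRCh38", "sample.vcf"])
print(args.input_dir, args.output_dir, args.genome_build, args.sample_vcf)
```

With `choices` set, passing a build other than GRCh37 or GRCh38 makes the parser exit with a usage error.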
The pipeline produces:
- Filtered VCF files at each processing stage.
- Final rsID-checked VCF.
- 23andMe-style genotype file: .23andme.txt.
  - Columns: rsid, chromosome, position, allele1, allele2.
  - Only biallelic SNPs with valid rsIDs are included.
  - Missing genotypes are represented as N.
  - Includes a header indicating it was generated by OpenVar/vcf_to_23-me and is for personal research only.
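The mapping from one VCF record to one output row can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: the function name `vcf_line_to_row` is hypothetical, a single-sample VCF is assumed, the output is assumed to be tab-separated, and a haploid call is padded with N for the second allele.

```python
from typing import Optional

BASES = {"A", "C", "G", "T"}


def vcf_line_to_row(line: str) -> Optional[str]:
    """Convert one single-sample VCF data line into an output row.

    Returns None when the record should be skipped (no rsID, or not a
    biallelic SNP).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, rsid, ref, alt = fields[:5]
    ref, alt = ref.upper(), alt.upper()
    if not rsid.startswith("rs"):
        return None  # variants without a valid rsID are excluded
    if ref not in BASES or alt not in BASES:
        return None  # indels and multi-allelic sites are excluded
    keys = fields[8].split(":")
    sample = fields[9].split(":")
    gt = sample[keys.index("GT")] if "GT" in keys else "./."
    # Treat phased ("|") and unphased ("/") genotypes the same way.
    codes = gt.replace("|", "/").split("/")
    lookup = {"0": ref, "1": alt}
    # Any code other than 0 or 1 (for example ".") becomes N.
    calls = [lookup.get(code, "N") for code in codes[:2]]
    # Assumption: pad haploid calls so that two alleles are always written.
    calls += ["N"] * (2 - len(calls))
    return "\t".join([rsid, chrom, pos, calls[0], calls[1]])


row = vcf_line_to_row("1\t12345\trs123\tA\tG\t50\tPASS\t.\tGT:FT\t0/1:PASS")
print(row)
```

A record with genotype `./.` yields N for both alleles, and a record whose ID column is `.` is skipped.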
This program is licensed under the GNU Affero General Public License v3. See LICENSE.txt for details.