GitHub - VerisimilitudeX/DNAnalyzer: Precision genomics for everyone, everywhere. Powered by private AI.

Next-Generation On-Device DNA Insights

Private. Precise. Powered by AI.

About DNAnalyzer

DNAnalyzer is a biotechnology research and deployment company revolutionizing genomic analysis through AI-powered, privacy-first technology. Supported by Anthropic for Startups, our mission is to democratize DNA analysis by delivering enterprise-grade genomic insights through secure on-device computation.

Founded by Piyush Acharya, DNAnalyzer brings together 46 leading computational biologists and computer scientists from Microsoft Research, the University of Macedonia, and Northeastern University.

Our work has been presented at Y Combinator's Mini YC and reached the semifinals (top 10%) for a16z's $1M investment.

Why DNAnalyzer Matters

Industry Standard	DNAnalyzer's Innovation
$100 average cost for DNA sequencing	Completely Free analysis
Up to $600 for basic health insights	Universally accessible, empowering underserved communities worldwide*
78% of companies share genetic data with third parties	100% Private: All computation happens locally on your device
Data breaches compromise millions (23andMe: 6.9M users in 2023)	Zero central storage: Your genetic data never leaves your device

"Unlike a password, compromised genetic data is permanently exposed. You cannot change it."

*Excluding testing costs. We're developing an affordable in-house testing kit to eliminate this final barrier.

Core Capabilities

Codon & Protein Detection: Rapidly identifies protein-coding regions, amino acid chains, and critical genomic indicators with unprecedented accuracy.
GC-rich Region Analysis: Precisely pinpoints genomic promoter areas with significant biological implications (45-60% GC-content).
Neurological Genomics: Detects genetic markers associated with neurological conditions including autism, ADHD, and schizophrenia.
Promoter Element Identification: Locates key transcription initiation sequences (BRE, TATA, INR, DPE) with surgical precision.
Multi-format FASTA Integration: Seamlessly supports comprehensive DNA database analysis from uploads or external sources.
CLI Automation: Provides a powerful CLI for scripting, automation, and enterprise-scale analysis tasks.
Privacy-First Ancestry Insights: Estimates continental origin using on-device reference panels without compromising privacy.

See the Ancestry Snapshot guide for detailed usage instructions.
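
The GC-rich region analysis listed under Core Capabilities can be illustrated with a sliding-window scan. This is a minimal sketch, not DNAnalyzer's actual implementation: the function names, window size, and step are assumptions chosen for illustration, and only the 45-60% GC band comes from the capability description.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window=10, step=1, low=0.45, high=0.60):
    """Return (start, end, gc) for each window whose GC fraction is in [low, high].

    window and step are hypothetical defaults; the 45-60% band is from the README.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if low <= gc <= high:
            hits.append((start, start + window, gc))
    return hits
```

Calling `gc_rich_windows(sequence, window=200)` on a longer sequence would return the coordinates of every 200-base window inside the band; overlapping hits could then be merged into regions if desired.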

New: Interactive web dashboard for real-time visualization is now available under web/dashboard, seamlessly communicating with the local REST API at /api.

Intelligent Natural Language Reports

When an OpenAI key is available, each CLI run now produces two narratives immediately after the metrics:

Researcher Report – Technical interpretation with references to the detected GC windows, ORFs, PRS coverage, etc.
Layperson Report – Plain-language explainer that highlights the most actionable findings.

Set OPENAI_API_KEY (or AI_PROVIDER=openai plus the corresponding key) before launching DNAnalyzer and the summaries will appear in the terminal and be saved to analysis_output/.../reports/. Use --no-ai if you want to skip the call for a particular run.

Quickstart Guide

Ready to unlock your genomic insights? Begin precision DNA analysis in seconds:

# Clone the repository
git clone https://github.com/VerisimilitudeX/DNAnalyzer.git

# Navigate to project directory
cd DNAnalyzer

# Build the project (Gradle fetches dependencies on the first run)
./gradlew build

Enable AI summaries (optional): export your API key before running the CLI, e.g.

export OPENAI_API_KEY=sk-...
# optionally pick a model
export OPENAI_MODEL=gpt-4o-mini

Pass --no-ai if you prefer a run without contacting the API; all analytic outputs still generate.

🚀 NEW: Intuitive Launch Script

We've transformed DNAnalyzer's user experience! Say goodbye to complex command-line options:

# Simple preset modes
./easy_dna.sh your_file.fa basic      # Standard analysis
./easy_dna.sh your_file.fa detailed   # Comprehensive analysis  
./easy_dna.sh your_file.fa mutations  # Generate mutations
./easy_dna.sh your_file.fa all        # Complete suite
./easy_dna.sh your_file.fa custom     # Interactive mode

# Or use the traditional Java method
java -jar build/libs/DNAnalyzer-1.2.1.jar your_file.fa

easy_dna.sh automatically uses a feature-complete jar from build/libs/ when available, falls back to ./gradlew run, and warns if only the legacy basic jar is present. Override the jar path with DNANALYZER_JAR=/path/to/dnanalyzer.jar if needed.

📁 NEW: Intelligent Output Organization

All generated files are automatically organized in a clean, intuitive directory structure:

output/dnanalyzer_output_{filename}_{timestamp}/
├── charts/          # Quality control and analysis visualizations (PNG)
├── sequences/       # Generated mutations and processed sequences (FASTA)
└── reports/         # Comprehensive analysis reports and summaries (HTML)
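
For scripts that need to reproduce or locate this layout, the following sketch creates a run directory with the same three subfolders. The helper name `make_run_dir`, the use of the file stem for `{filename}`, and the timestamp format are assumptions; only the `output/dnanalyzer_output_{filename}_{timestamp}/` pattern and the subfolder names come from the tree above.

```python
from datetime import datetime
from pathlib import Path


def make_run_dir(fasta_path, root="output", now=None):
    """Create <root>/dnanalyzer_output_{filename}_{timestamp}/ with its subfolders."""
    now = now or datetime.now()
    stem = Path(fasta_path).stem  # assumption: "your_file.fa" -> "your_file"
    # Assumed timestamp format; the real tool may use a different one.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    run_dir = Path(root) / f"dnanalyzer_output_{stem}_{stamp}"
    for sub in ("charts", "sequences", "reports"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir
```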

🎯 NEW: Smart Analysis Profiles

Leverage predefined profiles tailored to your workflow:

# Select analysis profiles optimized for common use cases
java -jar build/libs/DNAnalyzer-1.2.1.jar --profile research your_file.fa
java -jar build/libs/DNAnalyzer-1.2.1.jar --profile clinical your_file.fa
java -jar build/libs/DNAnalyzer-1.2.1.jar --profile mutation your_file.fa

# Available profiles: basic, detailed, quick, research, mutation, clinical
# List available presets directly from the CLI
java -jar build/libs/DNAnalyzer-1.2.1.jar --profile list

📚 Documentation

Getting Started Guide - Essential setup and configuration
Enhanced Features Guide - NEW! Comprehensive guide to all user experience improvements
Command Reference - Complete command-line options and examples
Changelog - NEW! Detailed release notes and version history

Polygenic Health-Risk Scores

DNAnalyzer now includes a polygenic risk score calculator alongside trait predictions. Provide your 23andMe data file together with a CSV of SNP weights to compute personalized scores:

./gradlew run --args='--23andme my_data.txt --prs assets/risk/heart_disease_prs.csv sample.fa'
java -jar build/libs/DNAnalyzer-1.2.1.jar --23andme my_data.txt --prs assets/risk/heart_disease_prs.csv sample.fa

The CLI now parses the standard tab-delimited 23andMe export, aligns it with each provided weight table, and reports the raw and normalized contribution of every SNP in the trait. Missing or uncallable variants are clearly identified so you can assess coverage before acting on a score. See the Polygenic Risk Scoring guide for a detailed walkthrough and example outputs.
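
To show the idea behind the scoring step, here is a minimal sketch of a raw polygenic score. It is not the CLI's code. The weight-table columns (`rsid`, `effect_allele`, `weight`) are an assumed layout, and the sketch covers only the raw per-SNP contribution and the missing-variant list, not the normalized values. The 23andMe side follows the standard export: `#` comment lines, then tab-separated rsid, chromosome, position, and genotype, with `--` marking a no-call.

```python
import csv
import io


def parse_23andme(text):
    """Map rsid -> genotype from a tab-delimited 23andMe export."""
    genotypes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        genotypes[fields[0]] = fields[3].strip()
    return genotypes


def raw_score(genotypes, weights_csv):
    """Return (raw_score, per-SNP contributions, missing rsids).

    weights_csv uses hypothetical columns: rsid, effect_allele, weight.
    """
    contributions, missing = {}, []
    for row in csv.DictReader(io.StringIO(weights_csv)):
        genotype = genotypes.get(row["rsid"])
        if genotype is None or "-" in genotype:  # absent or no-call
            missing.append(row["rsid"])
            continue
        dosage = genotype.count(row["effect_allele"])  # 0, 1, or 2 copies
        contributions[row["rsid"]] = dosage * float(row["weight"])
    return sum(contributions.values()), contributions, missing
```

The `missing` list mirrors the coverage check described above: a score built from only a few of the weighted SNPs deserves less trust than one with full coverage.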

Trait predictions and risk scores are displayed following standard DNA analysis. Disclaimer: Trait predictions are provided for educational purposes only and should not be used for medical or health decisions.

REST API

The project now ships with a production-ready Spring Boot service that powers the web dashboard and external integrations. Start it locally with:

./gradlew bootRun

The API is exposed under /api/v1 and returns JSON. Key endpoints include:

Endpoint	Method	Description
`/api/v1/status`	GET	Health check and version metadata
`/api/v1/analyze`	POST (multipart)	Run the full analysis pipeline on an uploaded FASTA/FASTQ/plain-text sequence
`/api/v1/base-pairs`	POST (JSON)	Return base pair counts, percentages, and GC content for a sequence
`/api/v1/reading-frames`	POST (JSON)	Identify open reading frames in forward and reverse directions
`/api/v1/find-proteins`	POST (JSON)	Predict candidate proteins (top 10 by length)
`/api/v1/manipulate`	POST (JSON)	Reverse, complement, or reverse-complement a sequence
`/api/v1/parse`	POST (multipart)	Extract the first sequence record from FASTA, FASTQ, or plain text uploads
`/api/v1/analyze-genetic`	POST (multipart)	Score 23andMe/Ancestry genotype files for ancestry matches and bundled polygenic risk panels
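
As a rough local illustration of what the `/reading-frames` and `/manipulate` rows describe, the sketch below computes a reverse complement and scans both strands for open reading frames. It assumes the simplest definition (an `ATG` start followed in-frame by `TAA`, `TAG`, or `TGA`) and a hypothetical `min_codons` threshold; the service's own rules and response format may differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, frame, orf) tuples for ATG...stop ORFs on both strands."""
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None  # close it and keep scanning this frame
    return orfs
```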

Example: analyze a sequence file and receive the same payload used by the dashboard visualizations.

curl -F dnaFile=@sample.fa http://localhost:8080/api/v1/analyze

Need just the base-pair breakdown? Send the raw sequence as JSON:

curl -X POST http://localhost:8080/api/v1/base-pairs \
     -H 'Content-Type: application/json' \
     -d '{"sequence": "ATGCGCATTA"}'

Genotype data uploads are also supported. Include snpAnalysis=true to stream per-SNP contributions in the response.

curl -F geneticFile=@my_23andme.txt -F snpAnalysis=true \
     http://localhost:8080/api/v1/analyze-genetic

All responses include descriptive error messages for invalid payloads, making the API easy to drive from scripts in Python, R, or any other environment.
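
For Python scripts, the base-pairs call above can be reproduced with the standard library alone. This is a sketch under a few assumptions: the server is running at the `localhost:8080` address used in the curl examples, the helper names are hypothetical, and the response is treated as an opaque JSON object because its fields are not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # address taken from the curl examples


def build_base_pairs_request(sequence, base_url=BASE_URL):
    """Build the POST request equivalent to the base-pairs curl example."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/base-pairs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def base_pairs(sequence, base_url=BASE_URL, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_base_pairs_request(sequence, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the service started via `./gradlew bootRun`, `base_pairs("ATGCGCATTA")` should return the same payload as the curl command. Invalid payloads surface as `urllib.error.HTTPError`, whose body carries the error message.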

GPU-Accelerated Smith-Waterman

Our optional PyOpenCL module delivers GPU acceleration for local sequence alignment. When no compatible GPU is detected, the implementation gracefully falls back to optimized Python execution.

Execute the module directly or via CLI:

python -m src.python.gpu_smith_waterman SEQ1 SEQ2

From the DNAnalyzer CLI, request Smith-Waterman alignment by combining --sw-align with --align. If you already provided a primary FASTA file, pass the reference sequence to --align:

java -jar dnanalyzer.jar sample.fa --align reference.fa --sw-align

Alternatively, supply both query and reference sequences directly to --align:

java -jar dnanalyzer.jar --align query.fa reference.fa --sw-align

See GPU_Smith_Waterman.md for comprehensive technical details.
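
For readers unfamiliar with the algorithm, here is a compact pure-Python Smith-Waterman sketch. It is not the code in `gpu_smith_waterman`; the scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration. It fills the scoring matrix, tracks the best cell, and traces back until a zero cell is reached.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix; local alignment clamps every cell at zero.
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The matrix fill is quadratic in the sequence lengths, which is the part a GPU implementation typically parallelizes along anti-diagonals.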

Packaging Analysis Sessions

After completing your DNAnalyzer run, archive inputs, logs, and interactive HTML reports using package-session.sh:

./scripts/package-session.sh sample.fa

The generated time-stamped ZIP archive includes the original FASTA, the captured analysis.log, a self-contained report.html summarizing base counts and percentages, plus any QC images found in assets/reports.
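
The same bundling step can be sketched in Python for environments where the shell script is inconvenient. This is an illustrative equivalent, not a port of `package-session.sh`: the function name, archive naming scheme, and timestamp format are assumptions, and QC images are left out for brevity.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def package_session(fasta, log, report, out_dir=".", now=None):
    """Bundle the FASTA, log, and HTML report into a time-stamped ZIP archive."""
    now = now or datetime.now()
    # Assumed archive name; the real script may name its output differently.
    archive = Path(out_dir) / f"session_{now.strftime('%Y%m%d_%H%M%S')}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, (fasta, log, report)):
            if path.exists():  # skip files that were not produced
                zf.write(path, arcname=path.name)
    return archive
```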

Development Roadmap

Upcoming Innovation	Description
Optimized SQL Database	Scalable architecture supporting genomic datasets across diverse species
Enhanced Neural Network	Seamless integration with third-party genotype datasets (23andMe, AncestryDNA)
DIAMOND Implementation	Harmonizing DIAMOND's speed with BLAST's accuracy for next-generation analyses
AI Trait Predictor Suite	Engaging, shareable predictions—from cilantro taste to chronotype—validated by peer-reviewed SNP studies
Secure Share & Compare	Offline-generated QR summaries enable selective insight sharing with healthcare providers—raw genome remains private

Contribute to DNAnalyzer

We enthusiastically welcome contributions from all experience levels.

Academic Citations

When referencing DNAnalyzer in academic work, please cite:

@software{Acharya_DNAnalyzer_ML-Powered_DNA_2022,
  author = {Acharya, Piyush},
  doi = {10.5281/zenodo.14556577},
  month = oct,
  title = {{DNAnalyzer: ML-Powered DNA Analysis Platform}},
  url = {https://github.com/VerisimilitudeX/DNAnalyzer},
  version = {3.5.0-beta.0},
  year = {2022}
}

⚖ Terms of Use

DNAnalyzer is provided "as-is." Usage of this software implies acceptance of all associated risks and liabilities. DNAnalyzer disclaims responsibility for any loss or damage arising from its use.

For assistance or inquiries: help@dnanalyzer.org

DNAnalyzer, © Piyush Acharya 2025. A fiscally sponsored 501(c)(3) nonprofit (EIN: 81-2908499), licensed under MIT License.

Impact Metrics

Metric	Current Value
GitHub Stars	147
Forks	62
Contributors	46
Monthly FASTA files analyzed*	5,000+
Total downloads (Gradle/CLI)	4,042
Deployments via GitHub Pages	485

Community Engagement

Discord · Active #genomics-ai channel (80+ members)
Open Issues for First-Timers · Labeled good-first-issue to mentor newcomers
Monthly Release Notes · Transparent changelogs with contributor recognition

*Monthly FASTA throughput calculated from anonymized CLI telemetry and public workflow logs.

Support DNAnalyzer

23andMe

Get 10% off your order
DNAnalyzer earns $20 per referral

Ancestry® Membership

Get up to 24% off membership
DNAnalyzer earns $10 per referral
