- An agent-based pipeline and web UI for variant prioritization. The code orchestrates multiple specialized agents to gather data, score variants, and produce reports.
- A Flask-based frontend (in `app/`) lets users upload VCFs or submit phenotype queries and streams run logs in real time.

This project is intended for research and experimentation. It relies on a number of scientific and machine-learning libraries and may require a dedicated conda environment.
- CLI entrypoint: `main.py` runs the orchestrator pipeline on a VCF and phenotype.
- Web UI: `app/app.py` provides a small Flask app (templates in `app/templates/`) to upload VCFs and view streaming logs.
- Core modules: `agents.py` (agent classes), `utils.py`, `llm_utils.py`, `Evo2score.py`, `prompts.py`, `config.py`.
- Example and sample data: `data/` and `uploads/` (uploads are created at runtime).
- Python 3.12 (the code and its compiled `.pyc` files target 3.12).
- A conda environment is recommended. The project contains a pinned `requirements.txt` with many heavy packages (Torch, Transformers, scientific stack).
- The following API keys and endpoints must be set as environment variables: `ALPGENOME_API_KEY`, `PCAI_EVO2_ENDPOINT`, `PCAI_EVO2_TOKEN`.
- Ollama should be running and serving the models specified in `config.py`.
- Evo2 currently runs on St. Jude's internal network, as specified in `agents.runEvo2()`.
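Before a long run it can help to confirm these requirements up front. The sketch below is a hypothetical pre-flight check, not part of the project: it assumes Ollama listens on its default address (`http://localhost:11434`) and answers `GET /api/tags`, and `REQUIRED_MODELS` is a placeholder for whatever `config.py` actually names.

```python
"""Hypothetical pre-flight check for the requirements listed above.

Assumptions: Ollama runs at its default address and exposes GET /api/tags;
REQUIRED_MODELS is a placeholder, not the project's real model list.
"""
import json
import os
import urllib.request

# Environment variables named in this README.
REQUIRED_ENV_VARS = ("ALPGENOME_API_KEY", "PCAI_EVO2_ENDPOINT", "PCAI_EVO2_TOKEN")

# Placeholder model names; replace with the ones configured in config.py.
REQUIRED_MODELS = ("llama3.1:8b",)


def missing_env_vars(env=None, names=REQUIRED_ENV_VARS):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def _with_tag(name):
    """Ollama treats an untagged model name as '<name>:latest'."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Given the parsed JSON body of /api/tags, return required models that are absent."""
    available = {_with_tag(m.get("name", "")) for m in tags_payload.get("models", [])}
    return [m for m in required if _with_tag(m) not in available]


def fetch_ollama_tags(host="http://localhost:11434", timeout=5.0):
    """Ask a running Ollama server which models it has locally."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    problems = [f"missing env var: {n}" for n in missing_env_vars()]
    try:
        problems += [f"missing Ollama model: {m}" for m in missing_models(fetch_ollama_tags())]
    except (OSError, ValueError) as exc:  # refused connection, timeout, or non-JSON reply
        problems.append(f"Ollama not reachable: {exc}")
    print("\n".join(problems) or "pre-flight checks passed")
```

The Evo2 endpoint is not probed here because it sits on an internal network; the check only confirms that its variables are set.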
- Create and activate a conda environment (example):

  ```shell
  conda create -n kids25 python=3.12 -y
  conda activate kids25
  ```

- Install Python dependencies:
  ```shell
  pip install -r requirements.txt
  ```

Process a VCF with a phenotype:

```shell
python main.py --vcf_file ./data/clinvar.hg19.chr17.test.vcf --phenotype "breast cancer"
```

`main.py` exposes additional arguments (e.g., `--conda_env`); run `python main.py -h` for details.
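To run the same VCF against several phenotypes, a small driver can call the CLI repeatedly. This is a hypothetical sketch, not part of the project: it uses only the `--vcf_file` and `--phenotype` flags documented above, and the helper names are illustrative.

```python
"""Hypothetical batch driver for main.py (illustrative helper names)."""
import shlex
import subprocess
import sys


def build_command(vcf_file, phenotype, python=sys.executable, script="main.py"):
    """Assemble argv for one pipeline run; a list avoids shell quoting issues."""
    return [python, script, "--vcf_file", str(vcf_file), "--phenotype", phenotype]


def run_batch(vcf_file, phenotypes):
    """Run the pipeline once per phenotype, stopping at the first failure."""
    for phenotype in phenotypes:
        cmd = build_command(vcf_file, phenotype)
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
```

From the repository root (so the relative `main.py` path resolves), `run_batch("./data/clinvar.hg19.chr17.test.vcf", ["breast cancer"])` reproduces the single command above. Passing an argument list rather than a shell string keeps multi-word phenotypes intact.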
Start the Flask server (development mode):

```shell
python app/app.py
```

Open http://localhost:5001 in your browser. The UI supports uploading `.vcf` / `.vcf.gz` files and submitting phenotype text. Logs stream to the UI while the pipeline runs.
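Because the UI accepts both plain and compressed VCFs, it can be useful to sanity-check a file locally before uploading it. The helpers below are hypothetical and only mirror the two accepted extensions; the real validation lives in `app/app.py` and may differ.

```python
"""Hypothetical local checks mirroring the upload formats the UI accepts."""
import gzip
from pathlib import Path

ALLOWED_SUFFIXES = (".vcf", ".vcf.gz")


def is_allowed_upload(filename):
    """Accept plain or gzip-compressed VCF names, case-insensitively."""
    return filename.lower().endswith(ALLOWED_SUFFIXES)


def open_vcf(path):
    """Open a .vcf or .vcf.gz file in text mode."""
    path = Path(path)
    if path.name.lower().endswith(".gz"):
        # gzip reads multi-member streams, so bgzip-compressed files also work.
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_records(path):
    """Count variant lines: non-empty lines that are not '#' headers."""
    with open_vcf(path) as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))
```

A record count of zero usually means the file contains only headers, which is worth catching before starting a pipeline run.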
- `main.py`: CLI entrypoint and orchestrator setup
- `agents.py`: agent implementations and coordination
- `app/`: Flask application, templates, static assets
- `data/`: example VCFs, reference FASTA and example outputs
- `uploads/`: runtime uploads (created automatically)
- `requirements.txt`: pinned Python packages
- helper modules: `utils.py`, `llm_utils.py`, `prompts.py`, `config.py`
- `LICENSE`: project license
- The codebase references local model servers and services (comments in `main.py` mention Ollama and model endpoints). Ensure any required external services and credentials are available before running those parts.
- Pickled artifacts live in `./data` (e.g., `aggregated_variants_object.pkl`). These are data artifacts used by the agents.
- This README is a concise starting guide. I can expand it with development notes, testing instructions, architecture diagrams, or example outputs on request.
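To peek inside one of these artifacts, a minimal sketch is shown below. The helper names are hypothetical, and nothing is assumed about the object's type. Unpickling rebuilds objects by importing the classes they were created from, so run it from the repository root (where project modules such as `agents.py` are importable) and only load files you trust, since unpickling can execute arbitrary code.

```python
"""Hypothetical inspection of a pickled artifact in ./data."""
import pickle
from pathlib import Path


def load_artifact(path):
    """Load and return a single pickled object (trusted files only)."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Short summary: the type name, plus the length when the object has one."""
    name = type(obj).__name__
    try:
        return f"{name} (len={len(obj)})"
    except TypeError:
        return name
```

For example, `print(describe(load_artifact("./data/aggregated_variants_object.pkl")))` reports what kind of object the agents stored without assuming its structure.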
See the LICENSE file in the repository root.
