uzh-dqbm-cmi/BEDICT-V2

BEDICT-V2: Predicting base editing outcomes with an attention-based deep learning algorithm


Overview

BEDICT-V2 is an attention-based deep learning model that predicts base editing outcomes. This repository provides the source code and instructions for using the model. A web app is also available at https://go.bedict.app/

---


The folder structure:

BEDICT-V2
├── absolute_efficiency_model
│   ├── models
│   ├── output
│   └── src
├── dataset
├── main_py_files
│   ├── train.py
│   ├── ....
│   └── inference.py
├── notebooks
├── proportion_model
│   ├── output
│   └── src
├── utils
├── web_application
│   ├── templates
│   ├── static
│   └── app.py
├── README.md
└── requirements.txt

Environment Setup

Set up the environment

Create a virtual environment and install the required dependencies using Conda:

# Create a virtual environment
conda create --name bedict_v2

# Activate the virtual environment
conda activate bedict_v2

# Install Python
conda install -c anaconda python=3.10

# Install PyTorch (adjust the cudatoolkit version to match your CUDA driver)
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

# Install dependencies
pip install -r requirements.txt
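
After installation, a quick check that the key packages can be found may save a failed notebook run later. The sketch below uses only the standard library. The package list is an assumption (only torch and torchvision are named in the commands above), so extend it to match requirements.txt.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed list: only torch and torchvision are named in the setup commands.
    status = check_packages(["torch", "torchvision"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated bedict_v2 environment so that it inspects the same interpreter the notebook will use.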

Usage

Run Inference on Custom Sequences

You can use the pre-trained BEDICT-V2 models to run inference on your own DNA sequences. Choose between running locally via a notebook or using our web app.


🧪 Option 1: Local Inference Using the Notebook

  1. Prepare your input file:

    • Create an Excel file with:
      • Target sequences (20 bases long)
      • PAM sequences (4 bases long)
    • Place this file in the dataset/ directory.
  2. Open the notebook:

    Navigate to the notebooks/ folder and open Inference_user_defined_sequence.ipynb.

  3. Configure your run:

    In the notebook, specify:

    • The input Excel file name
    • The editor name (e.g., ABE8e-NG)
    • Whether you're predicting in vivo or in vitro
  4. Run inference:

    The notebook will automatically run:

    • The absolute efficiency model
    • The proportional model

    It will then merge the predictions into a final result table.
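
The merge step itself is not described in this README. One plausible way to combine a two-stage model, shown here purely as an assumption, is to scale each outcome's predicted proportion by the predicted overall editing efficiency of its target. The function name and field names below are hypothetical and are not taken from the repository.

```python
def merge_predictions(efficiency_by_target, proportions_by_target):
    """Combine per-target efficiency with per-outcome proportions.

    efficiency_by_target: {target_seq: overall edited fraction in [0, 1]}
    proportions_by_target: {target_seq: {outcome_seq: proportion among edited reads}}
    Returns a list of row dicts, one per (target, outcome) pair.
    """
    rows = []
    for target, outcomes in proportions_by_target.items():
        if target not in efficiency_by_target:
            raise KeyError(f"no efficiency prediction for target {target}")
        efficiency = efficiency_by_target[target]
        for outcome, proportion in outcomes.items():
            rows.append({
                "target": target,
                "outcome": outcome,
                "efficiency": efficiency,
                "proportion": proportion,
                # Assumed merge rule: absolute frequency = efficiency * proportion.
                "absolute_frequency": efficiency * proportion,
            })
    return rows


# Illustrative values only, not model output.
example_rows = merge_predictions(
    {"ACGTACGTACGTACGTACGT": 0.5},
    {"ACGTACGTACGTACGTACGT": {"ACGTGCGTACGTACGTACGT": 0.5,
                              "GCGTACGTACGTACGTACGT": 0.5}},
)
```

Check the notebook's final cells for the rule it actually applies before relying on this sketch.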

🌐 Option 2: Use the Web App (Easiest)

The simplest way to use BEDICT-V2 is through the web app at https://go.bedict.app/. Upload your sequences and get predictions with no local setup.

📦 Note on Pre-trained Models

Pre-trained models are included in the repository under the corresponding model folders, such as BEDICT-V2/absolute_efficiency_model/output/...

Train the Model on Your Own Dataset

To train BEDICT-V2 on your own dataset (e.g., screening data), follow the steps below:


1. Prepare the Data

An example dataset is provided in the dataset/ folder. Your dataset should be in Excel format and include the following columns:

  • Target protospacer (20 bases)
  • PAM sequence (4 bases)
  • Outcome sequence (20 bases)
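
Malformed rows are easier to catch before preprocessing than after. The sketch below checks the lengths listed above and that each sequence contains only A, C, G, and T. The restriction to those four letters, the function name, and the argument order are assumptions rather than repository requirements.

```python
VALID_BASES = set("ACGT")  # assumption: no ambiguity codes such as N


def validate_record(target, pam, outcome):
    """Return a list of problems found in one dataset row (empty if none)."""
    problems = []
    for label, seq, expected_len in (
        ("target", target, 20),
        ("PAM", pam, 4),
        ("outcome", outcome, 20),
    ):
        seq = seq.strip().upper()
        if len(seq) != expected_len:
            problems.append(f"{label}: expected {expected_len} bases, got {len(seq)}")
        if not set(seq) <= VALID_BASES:
            problems.append(f"{label}: contains characters other than A, C, G, T")
    return problems
```

Rows loaded from the Excel file (for example with pandas.read_excel) can be passed through this function one at a time, using whatever column names your file actually has.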

2. Pre-process the Data

Use the preprocessing script to convert your Excel input into model-ready formats:

python main_py_files/generate_two_stage_model_data.py

3. Train the Model

Navigate to the main_py_files/ directory and run:

python train.py

This trains both the absolute efficiency model and the proportional model.
To train them separately, run:

  • Absolute_efficiency_main.py for the absolute efficiency model
  • trainval_test_proportions_main.py for the proportional model

Note:
Before training, be sure to specify the appropriate editor (e.g., ABE8e-NG) and whether you're working with in vivo or in vitro conditions in the configuration file.

4. Run Inference

Once the model is trained, navigate to the main_py_files/ directory and run the inference script:

python inference.py
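
The preprocessing, training, and inference scripts can be chained from a small driver. This sketch assumes the working directories implied by the commands in this section (preprocessing from the repository root, training and inference from main_py_files/) and that python on the PATH is the interpreter of the activated environment. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pipeline(repo_root):
    """Return (command, working_directory) pairs in execution order."""
    root = Path(repo_root)
    scripts_dir = root / "main_py_files"
    return [
        (["python", "main_py_files/generate_two_stage_model_data.py"], root),
        (["python", "train.py"], scripts_dir),
        (["python", "inference.py"], scripts_dir),
    ]


def run_pipeline(repo_root):
    """Run each step in order, stopping at the first failure."""
    for command, cwd in build_pipeline(repo_root):
        print(f"[{cwd}] {' '.join(command)}")
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(command, cwd=cwd, check=True)
```

Calling run_pipeline("path/to/BEDICT-V2") executes the three steps in sequence. Set the editor and the in vivo or in vitro condition in the configuration file first, as noted above.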

License
