BEDICT-V2 is a deep learning model designed to predict base editing outcomes using an attention-based algorithm. This repository provides the source code and instructions for using the model. A web app is also available at https://go.bedict.app/
BEDICT-V2
├── absolute_efficiency_model
│ ├── models
│ ├── output
│ └── src
├── dataset
├── main_py_files
│ ├── train.py
│ ├── ....
│ └── inference.py
├── notebooks
├── proportion_model
│ ├── output
│ └── src
├── utils
├── web_application
│ ├── templates
│ ├── static
│ └── app.py
├── README.md
└── requirements.txt
Create a virtual environment and install the required dependencies using Conda:
# Create a virtual environment
conda create --name bedict_v2
# Activate the virtual environment
conda activate bedict_v2
# Install Python
conda install -c anaconda python=3.10
# Install PyTorch
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
# Install dependencies
pip install -r requirements.txt
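After installation, a quick check can confirm that the activated environment can locate the packages installed above. The following is a minimal sketch that uses only the Python standard library; the helper name `check_environment` is hypothetical and is not part of this repository.

```python
import importlib.util
import sys


def check_environment(required=("torch", "torchvision")):
    """Report the Python version and whether each required package can be found."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in required:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch` or `torchvision` is reported as `False`, the Conda environment is probably not activated or the PyTorch install step did not complete.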
You can use the pre-trained BEDICT-V2 models to run inference on your own DNA sequences. Choose between running locally via a notebook or using our web app.
- Prepare your input file:
  - Create an Excel file with:
    - Target sequences (20 bases long)
    - PAM sequences (4 bases long)
  - Place this file in the dataset/ directory.
- Open the notebook: navigate to the notebooks/ folder and open Inference_user_defined_sequence.ipynb.
- Configure your run. In the notebook, specify:
  - The input Excel file name
  - The editor name (e.g., ABE8e-NG)
  - Whether you're predicting in vivo or in vitro
- Run inference. The notebook will automatically run:
  - The absolute efficiency model
  - The proportional model

  It will then merge the predictions into a final result table.
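The notebook's actual merge logic is not described in this README. As an illustration only, the sketch below shows one way the outputs of a two-stage model could be combined. It assumes the absolute efficiency model yields the overall edited fraction for a target, and that the proportional model yields each outcome's share among edited reads. The function name `merge_predictions` and the dictionary keys are hypothetical.

```python
def merge_predictions(absolute_efficiency, outcome_proportions):
    """Combine a per-target edited fraction with per-outcome proportions.

    ASSUMPTION (not confirmed by the repository): absolute_efficiency is the
    predicted fraction of edited reads (0..1), and outcome_proportions maps
    each edited outcome sequence to its share among edited reads, with the
    shares summing to 1.
    """
    if not 0.0 <= absolute_efficiency <= 1.0:
        raise ValueError("absolute_efficiency must be between 0 and 1")
    total = sum(outcome_proportions.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("outcome proportions must sum to 1")
    rows = [
        {"outcome": seq, "proportion": share, "absolute": absolute_efficiency * share}
        for seq, share in outcome_proportions.items()
    ]
    # Sort so the outcome with the highest combined value comes first.
    return sorted(rows, key=lambda row: row["absolute"], reverse=True)
```

Under these assumptions, an edited fraction of 0.5 and two outcomes with shares 0.75 and 0.25 would give combined values of 0.375 and 0.125.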
The easiest way to use BEDICT-V2 is through our web app: upload your sequences and get predictions with no local setup required.
Pre-trained models are already included in the repository under corresponding folders, such as BEDICT-V2/absolute_efficiency_model/output/...
To train and run BEDICT-V2 on your own dataset (e.g., screening data), follow the steps below:
An example dataset is provided in the dataset/ folder. Your dataset should be in Excel format and include the following columns:
- Target protospacer (20 bases)
- PAM sequence (4 bases)
- Outcome sequence (20 bases)
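The length requirements above can be checked before preprocessing. The sketch below validates a single row. The function name `validate_row` is hypothetical, and restricting sequences to the characters A, C, G, and T is an assumption; the repository may accept other symbols.

```python
VALID_BASES = set("ACGT")  # ASSUMPTION: only unambiguous DNA bases are allowed


def validate_row(protospacer, pam, outcome):
    """Return a list of problems found in one dataset row (empty if none)."""
    problems = []
    expected = (
        ("protospacer", protospacer, 20),
        ("PAM", pam, 4),
        ("outcome", outcome, 20),
    )
    for label, seq, length in expected:
        seq = seq.strip().upper()
        if len(seq) != length:
            problems.append(f"{label}: expected {length} bases, got {len(seq)}")
        if not set(seq) <= VALID_BASES:
            problems.append(f"{label}: contains characters other than A, C, G, T")
    return problems
```

Running this over every row after reading the Excel file is one way to catch truncated sequences early.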
Use the preprocessing script to convert your Excel input into model-ready formats:
python main_py_files/generate_two_stage_model_data.py
Navigate to the main_py_files/ directory and run:
python train.py
This will run both the absolute efficiency model and the proportional model.
If needed, you can also run them separately using:
- Absolute_efficiency_main.py for the absolute efficiency model
- trainval_test_proportions_main.py for the proportional model
Note: Before training, be sure to specify the appropriate editor (e.g., ABE8e-NG) and whether you're working with in vivo or in vitro conditions in the configuration file.
Once the model is trained, navigate to the main_py_files/ directory and run the inference script:
python inference.py