
cellethology/GLM-Nullsette-Benchmark


GLM-Nullsette-Benchmark

This repository contains scripts for reproducing the benchmarking of genomic language models (GLMs) on Nullsettes.

Figure1

Installation

git clone https://github.com/cellethology/GLM-Nullsette-Benchmark.git
cd GLM-Nullsette-Benchmark
conda env create -f environment.yml
conda activate glm_eval

Inference data

Example data is in the data/ directory. To obtain the inference data used in the paper, unzip data/processed_data.zip.
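If you prefer to unpack the archive from Python rather than with a command-line tool, a minimal sketch using only the standard library could look like the following. The helper name extract_archive and the choice of data/ as the destination are assumptions made for illustration; the repository does not prescribe where the extracted files should go.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    # Create the destination directory if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Example, run from the repository root (the destination "data" is an assumption):
# extract_archive("data/processed_data.zip", "data")
```

The function returns the list of archive members, which is a quick way to check what was unpacked.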

Expression cassette data

Expression cassette data is stored in the database directory. It can be imported with the following statement.

from database import deboer_database, zahm_database, kosuri_database, lagator_database
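The structure of the imported objects is not described here, so one way to explore them is to list their public attributes. The sketch below assumes only that they are ordinary Python objects; public_names is a hypothetical helper, not part of the repository.

```python
def public_names(obj):
    """Return the sorted attribute names of obj that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


# Example, run from the repository root so that `database` is importable:
# from database import deboer_database
# print(type(deboer_database).__name__, public_names(deboer_database))
```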

Model inference script

Inference scripts for several representation models are in the model directory.

Acknowledgements

We acknowledge the valuable contributions to genomic language modeling made by the authors of the following repositories: Evo1, Evo2, Nucleotide Transformer, DNABERT-2, GENERator, METAGENE-1, Caduceus, GPN, GENA-LM, gLM2, PDLLM.
