GitHub - SeedLLM/molly: molly, an LLM designed to understand multi-omics data.

Molly

Molly is a Large Language Model composed of multiple encoders, capable of understanding multi-omics data (DNA, RNA, and protein).

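Molly is a large language model that integrates multiple encoders and can understand DNA, RNA, and protein sequence information. Omics-Specific Models (OSMs) are the specialized models that lead performance in their respective omics tracks. Enc-Head is a simple "omics encoder + classification head" architecture that connects a pretrained encoder directly to a task-specific classification head.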
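
To make the Enc-Head idea concrete, here is a minimal pure-Python sketch of "encoder + classification head". It is not the code from this repository: `ToyEncoder`, `ClassificationHead`, and `EncHead` are hypothetical names, the encoder is a random lookup table standing in for a pretrained omics encoder, and mean pooling is an assumed choice.

```python
import math
import random


class ToyEncoder:
    """Hypothetical stand-in for a pretrained omics encoder."""

    def __init__(self, alphabet, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One random embedding vector per symbol (a real encoder is pretrained).
        self.table = {ch: [rng.uniform(-1, 1) for _ in range(dim)] for ch in alphabet}

    def encode(self, sequence):
        if not sequence:
            raise ValueError("sequence must not be empty")
        vectors = [self.table[ch] for ch in sequence]
        # Mean-pool per-symbol embeddings into one fixed-size feature vector.
        return [sum(column) / len(vectors) for column in zip(*vectors)]


class ClassificationHead:
    """Single linear layer followed by softmax (assumed head design)."""

    def __init__(self, dim, num_classes, seed=1):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def forward(self, features):
        logits = [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        # Numerically stable softmax over the class logits.
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        return [value / total for value in exps]


class EncHead:
    """Connects the encoder output directly to the task head."""

    def __init__(self, encoder, head):
        self.encoder = encoder
        self.head = head

    def predict_proba(self, sequence):
        return self.head.forward(self.encoder.encode(sequence))


model = EncHead(ToyEncoder("ACGT", dim=8), ClassificationHead(dim=8, num_classes=2))
probs = model.predict_proba("ACGTTGCA")
print(probs)
```

The point of the design is that the encoder is swappable per omics type while the head stays small and task-specific.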

🌟 Features

Base Model: Qwen3 enhanced with nucleotide-transformer and ESM-2 encoders
Optimization: Supports Liger-Kernel and FlashAttention for a 100% training speedup; see the example script

🤗 Download trained models

molly-1.7B molly-4B molly-8B

⚡ How to run inference

```bash
./scripts/infer/inference_nt_lora.sh
```

🔥 How to train

Hotfix the transformers source code

    # File: transformers/modeling_utils.py
    # Add these 4 lines: fall back to an empty plan when model._tp_plan is unset
    if not model._tp_plan:
        model_tp_plan = {}
    else:
        model_tp_plan = model._tp_plan

    # Then replace model._tp_plan with model_tp_plan in the expression below
    tp_plan_regex = (
        re.compile("|".join([re.escape(plan) for plan in model_tp_plan]))
        if _torch_distributed_available and torch.distributed.is_initialized()
        else None
    )
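
The following standalone sketch illustrates what the hotfix appears to guard against, assuming `model._tp_plan` can be `None`: iterating over `None` raises a `TypeError`, while the fallback to an empty dict does not. `build_tp_plan_regex` and its `distributed_ready` flag are hypothetical helpers for illustration, not part of transformers, and the plan key is a made-up example.

```python
import re


def build_tp_plan_regex(tp_plan, distributed_ready=True):
    """Illustrative mirror of the patched logic (not the transformers API)."""
    # The hotfix: treat a missing or empty plan as an empty dict.
    model_tp_plan = tp_plan if tp_plan else {}
    if not distributed_ready:
        # Stands in for the torch.distributed availability check.
        return None
    return re.compile("|".join(re.escape(plan) for plan in model_tp_plan))


# Unpatched behaviour: iterating over None fails.
try:
    "|".join(re.escape(plan) for plan in None)
    unpatched_error = None
except TypeError as exc:
    unpatched_error = exc

# Patched behaviour: a None plan yields an empty pattern instead of an error.
patched = build_tp_plan_regex(None)

# A non-empty plan still produces a regex over the escaped plan keys.
with_plan = build_tp_plan_regex({"layers.*.self_attn.q_proj": "colwise"})

print(type(unpatched_error).__name__, repr(patched.pattern), with_plan.pattern)
```

Note that an empty pattern matches at the start of any string, so code that consumes the regex should still check whether the plan itself is empty.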

Run the training script

    # Log in to SwanLab first
    swanlab login

    # Full training run
    ./scripts/train/run_train.sh

    # Or, for a quick test run
    ./scripts/train/run_train_mini.sh

📌 LICENSE

This project is released under the Apache License.
