Molly is a large language model (LLM) that integrates multiple encoders, enabling it to understand multi-omics data (DNA, RNA, and protein sequences).
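This README does not say how the encoder outputs reach Qwen3. A common pattern in encoder-augmented LLMs is to send each input to the encoder for its modality, then map the per-token encoder outputs into the LLM's hidden size with a learned projector so they can sit alongside ordinary token embeddings. The plain-Python sketch below illustrates only that general pattern, under stated assumptions. The toy encoders stand in for nucleotide-transformer and ESM-2. DNA and RNA are assumed to share the nucleotide encoder. `embed_omics`, `ENCODER_FOR_KIND` and the projector weights are hypothetical names, not Molly's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[Vector]  # row-major: Matrix[row][col]


@dataclass
class OmicsInput:
    kind: str      # "dna", "rna" or "protein"
    sequence: str


def toy_nucleotide_encoder(seq: str) -> Matrix:
    # Hypothetical stand-in for nucleotide-transformer: one 2-d vector per base.
    table = {"A": [1.0, 0.0], "C": [0.0, 1.0], "G": [1.0, 1.0],
             "T": [0.0, 0.0], "U": [0.0, 0.0]}
    return [table[base] for base in seq.upper()]


def toy_protein_encoder(seq: str) -> Matrix:
    # Hypothetical stand-in for ESM-2: one 2-d vector per residue.
    return [[float(ord(res) % 7), 1.0] for res in seq.upper()]


# Assumption: DNA and RNA share the nucleotide encoder; proteins use the protein encoder.
ENCODER_FOR_KIND: Dict[str, str] = {"dna": "nt", "rna": "nt", "protein": "esm2"}
ENCODERS: Dict[str, Callable[[str], Matrix]] = {
    "nt": toy_nucleotide_encoder,
    "esm2": toy_protein_encoder,
}


def project(vectors: Matrix, weight: Matrix) -> Matrix:
    # Linear projector: weight is [d_llm][d_enc]; maps each encoder vector
    # to the LLM hidden size d_llm.
    return [[sum(w * x for w, x in zip(row, v)) for row in weight] for v in vectors]


def embed_omics(item: OmicsInput, projectors: Dict[str, Matrix]) -> Matrix:
    # Route the sequence to its encoder, then project into the LLM embedding space.
    encoder_name = ENCODER_FOR_KIND[item.kind]
    token_vectors = ENCODERS[encoder_name](item.sequence)
    return project(token_vectors, projectors[encoder_name])
```

In a real model the projector would typically be a trained `torch.nn.Linear` layer or a small MLP, and each encoder would return hidden states of its own width. The dictionary lookup here only makes the routing step explicit.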
Omics-Specific Models (OSMs) are the top-performing specialized models in each omics track. Enc-Head is a minimal "omics encoder + classification head" architecture that connects a pretrained encoder directly to a task-specific classification head.
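The Enc-Head baseline described above can be sketched as follows. We assume the encoder has already produced one embedding per token. These embeddings are mean-pooled into a single sequence vector; the pooling strategy is our assumption, and a CLS-token readout is equally plausible. The head is assumed to be one linear layer followed by softmax. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(token_embeddings: List[Vector]) -> Vector:
    # Average the per-token encoder outputs into one sequence-level vector.
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


def linear_head(features: Vector, weight: List[Vector], bias: Vector) -> Vector:
    # Task-specific classification head: one logit per class (weight is [n_classes][dim]).
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weight, bias)]


def softmax(logits: Vector) -> Vector:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def enc_head_predict(token_embeddings: List[Vector], weight: List[Vector], bias: Vector) -> int:
    # Encoder output -> pooled vector -> logits -> probabilities -> predicted class index.
    probs = softmax(linear_head(mean_pool(token_embeddings), weight, bias))
    return max(range(len(probs)), key=probs.__getitem__)
```

Because only the head is task-specific, the same pretrained encoder can be reused across tasks by swapping `weight` and `bias`.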
- Base Model: Enhanced Qwen3 with nucleotide-transformer and ESM-2 encoders
- Optimization: Supports Liger-Kernel and FlashAttention for a 100% training speedup; see the example script:
```bash
./scripts/infer/inference_nt_lora.sh
```
- Hotfix the transformers source code:
```python
# transformers/modeling_utils.py
# add 4 lines
if not model._tp_plan:
    model_tp_plan = {}
else:
    model_tp_plan = model._tp_plan
# model._tp_plan -> model_tp_plan
tp_plan_regex = (
    re.compile("|".join([re.escape(plan) for plan in model_tp_plan]))
    if _torch_distributed_available and torch.distributed.is_initialized()
    else None
)
```
- Run the training script:
```bash
swanlab login
./scripts/train/run_train.sh
# or, for a quick test
./scripts/train/run_train_mini.sh
```
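Our reading of the `transformers/modeling_utils.py` hotfix is that it guards against `model._tp_plan` being `None`. In a distributed run, the unpatched expression iterates over `None` and raises `TypeError`. The patched version instead falls back to an empty plan. The sketch below reproduces both expressions outside transformers. The `_torch_distributed_available and torch.distributed.is_initialized()` check is replaced by a boolean flag, and the two function names are hypothetical. Whether `_tp_plan` is actually `None` for a given model depends on your transformers version.

```python
import re
from typing import Dict, Optional


def tp_plan_regex_original(tp_plan, distributed: bool) -> Optional[re.Pattern]:
    # Mirrors the unpatched code: iterates tp_plan directly, so None raises TypeError.
    return (
        re.compile("|".join([re.escape(plan) for plan in tp_plan]))
        if distributed
        else None
    )


def tp_plan_regex_patched(tp_plan: Optional[Dict[str, str]], distributed: bool) -> Optional[re.Pattern]:
    # Mirrors the hotfix: a missing or empty plan is normalized to {} before iterating.
    model_tp_plan = tp_plan if tp_plan else {}
    return (
        re.compile("|".join([re.escape(plan) for plan in model_tp_plan]))
        if distributed
        else None
    )
```

Outside distributed training both versions short-circuit to `None`, which is why the crash only surfaces in multi-GPU runs.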
This project is licensed under the Apache License.
