SeedLLM/molly

Molly

Molly is a Large Language Model composed of multiple encoders, capable of understanding multi-omics data (DNA, RNA, and protein).

Omics-Specific Models (OSMs) are the best-performing specialized models in their respective omics domains. Enc-Head is a simple "omics encoder + classification head" architecture that connects a pretrained encoder directly to a task-specific classification head.
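
To make the Enc-Head idea concrete, here is a toy, standard-library-only sketch of "encoder + classification head". Everything in it is an assumption for illustration: `toy_encoder` (a one-hot stand-in for a real pretrained omics encoder), the mean pooling step, the hand-picked `weights`, and the function names are hypothetical and are not taken from the Molly code base.

```python
import math

def toy_encoder(sequence, alphabet="ACGT"):
    """Hypothetical stand-in for a pretrained encoder: one one-hot vector per token."""
    return [[1.0 if ch == sym else 0.0 for sym in alphabet] for ch in sequence]

def mean_pool(token_vectors):
    """Average token vectors into one fixed-size sequence representation (assumed pooling)."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]

def classification_head(features, weights, bias):
    """Single linear layer followed by a numerically stable softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

def enc_head(sequence, weights, bias):
    """Enc-Head: encoder output goes straight into the task-specific head."""
    return classification_head(mean_pool(toy_encoder(sequence)), weights, bias)

# Hand-picked weights for illustration: class 0 scores A/T content, class 1 scores C/G content.
weights = [[1.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 1.0, 0.0]]
bias = [0.0, 0.0]

probs = enc_head("GGCCGA", weights, bias)
print(probs)
```

In a real setup the encoder would be a pretrained model and the head would be trained on the downstream task; the sketch only shows how the two parts connect.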

🌟 Features

🤗 Download trained model

⚡ How to run inference

```bash
./scripts/infer/inference_nt_lora.sh
```

🔥 How to train

  1. Hotfix the transformers source code

    ## File: transformers/modeling_utils.py
    ## Add these 4 lines so that a missing or empty model._tp_plan falls back to an empty dict
        if not model._tp_plan:
            model_tp_plan = {}
        else:
            model_tp_plan = model._tp_plan
    
    ## Then replace model._tp_plan with model_tp_plan in the existing tp_plan_regex expression
        tp_plan_regex = (
            re.compile("|".join([re.escape(plan) for plan in model_tp_plan]))
            if _torch_distributed_available and torch.distributed.is_initialized()
            else None
        )

  2. Run training script

    swanlab login
    
    ./scripts/train/run_train.sh
    
    # or, for a test run
    ./scripts/train/run_train_mini.sh
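
The reason for the hotfix in step 1 can be illustrated in isolation. The sketch below is not the transformers code: `build_tp_plan_regex` and its `distributed_ready` flag are hypothetical names that stand in for the patched expression and for the `torch.distributed` check, and it assumes the failure being avoided is iteration over a `_tp_plan` that is `None`.

```python
import re

def build_tp_plan_regex(tp_plan, distributed_ready=True):
    """Hypothetical helper mirroring the patched expression from step 1."""
    # Fallback introduced by the hotfix: treat a missing or empty plan as {}.
    plan = tp_plan if tp_plan else {}
    if not distributed_ready:
        return None
    return re.compile("|".join(re.escape(key) for key in plan))

# Without the fallback, iterating over a None plan raises TypeError.
try:
    "|".join(re.escape(key) for key in None)
    unpatched_error = None
except TypeError as exc:
    unpatched_error = exc

# With the fallback, a None plan compiles to an empty pattern instead of failing.
patched = build_tp_plan_regex(None)

# A non-empty plan still produces an alternation of escaped keys.
# The key and value below are made up for illustration.
with_plan = build_tp_plan_regex({"layers.*.self_attn.q_proj": "colwise"})

print(type(unpatched_error).__name__, repr(patched.pattern), with_plan.pattern)
```

Because the keys are passed through `re.escape`, the resulting pattern matches them literally rather than as wildcards.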

📌 LICENSE

This project is released under the Apache License.
