TranslationalBioinformaticsUnit/Transformers-for-Multiscale-Genomics

Multimodal Foundation Transformer Models for Multiscale Genomics

This repository supplements the manuscript Multimodal Foundation Transformer Models for Multiscale Genomics. It introduces the application of Transformer models to three genomic data modalities: DNA sequences, single-cell RNA sequencing, and spatial transcriptomics.

Purpose

  • Adaptation of Transformer Models:
    Originally developed for natural language processing, Transformer models have been repurposed here to analyze complex genomic datasets.
  • Key Applications:
    • Predicting RNA Expression: Inferring RNA expression levels from DNA sequences.
    • Classifying Promoter Regions: Distinguishing promoter regions in DNA.
    • Annotating Cell Types: Leveraging single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics data for accurate cell type annotation.

Scope

  • DNA Analysis:
    Focus on classifying promoter regions and predicting gene expression from DNA sequences.
  • Single-Cell RNA Sequencing (scRNA-seq):
    Capturing intricate gene expression patterns for precise cell type annotation.
  • Spatial Transcriptomics:
    Integrating gene expression profiles with spatial coordinates to enhance cell type classification.
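One simple way to picture the spatial integration described above is to append each cell's tissue coordinates to its expression vector before classification. The sketch below is only an illustration under that assumption; the function names and toy values are hypothetical and are not taken from the notebooks, which may combine the two modalities differently.

```python
def min_max_scale(values):
    """Scale a list of numbers to the [0, 1] range (constant input maps to 0.0)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def add_spatial_features(expression, coords):
    """Append scaled (x, y) coordinates to each cell's expression vector.

    expression: list of per-cell expression vectors (hypothetical toy input)
    coords:     list of (x, y) positions, one per cell
    """
    if len(expression) != len(coords):
        raise ValueError("expression and coords must describe the same cells")
    xs = min_max_scale([x for x, _ in coords])
    ys = min_max_scale([y for _, y in coords])
    return [list(expr) + [x, y] for expr, x, y in zip(expression, xs, ys)]


# Toy data: 3 cells, 2 genes each, with made-up tissue coordinates.
toy_expression = [[1.0, 0.0], [0.5, 2.0], [0.0, 3.0]]
toy_coords = [(0.0, 10.0), (50.0, 20.0), (100.0, 30.0)]
features = add_spatial_features(toy_expression, toy_coords)
print(features)
```

Scaling the coordinates keeps them on a range comparable to normalized expression values, so neither input dominates purely because of its units.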

Background & Learning Objectives

Background

Transformer models have revolutionized natural language processing by using self-attention to model dependencies across sequential data. Their success has inspired adaptations in genomics, where they help uncover patterns and dependencies in high-dimensional biological data.
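The mechanism that lets Transformers relate positions in a sequence is scaled dot-product self-attention. The following is a minimal pure-Python sketch of that computation, assuming queries, keys, and values are all the raw token vectors. Real Transformer layers apply learned projections and multiple heads, which are omitted here, and the toy vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with queries = keys = values = tokens.

    Learned projection matrices are deliberately left out of this sketch.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all token vectors.
        outputs.append(
            [sum(w * tok[j] for w, tok in zip(row, tokens)) for j in range(dim)]
        )
    return outputs, weights


# Toy input: 3 tokens with 2-dimensional embeddings (made-up values).
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention_weights = self_attention(toy_tokens)
```

Each row of `attention_weights` sums to 1, so every output vector is a weighted average of the inputs. In genomics the tokens could stand for nucleotides, k-mers, or genes, depending on the modality.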

Learning Objectives

  • Understanding Transformers:
    Learn how Transformer architectures can be applied to sequential and high-dimensional genomic data.
  • Application in Genomics:
    Gain insights into predicting gene expression, classifying promoter regions, and annotating cell types using Transformer models.
  • Data Modalities:
    Explore methods for integrating DNA sequences, scRNA-seq, and spatial transcriptomics data.
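Before a Transformer can process a DNA sequence, the sequence has to be turned into discrete tokens. One common scheme, used here purely as an illustration, is overlapping k-mer tokenization. The choice of k = 3, the stride of 1, and the helper names below are assumptions, and the models used in the tutorials may tokenize differently.

```python
def kmer_tokenize(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(tokens):
    """Map each distinct token to an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


# Toy sequence (made up for illustration).
toy_sequence = "ATGCGATG"
tokens = kmer_tokenize(toy_sequence, k=3)
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]

print(tokens)     # ['ATG', 'TGC', 'GCG', 'CGA', 'GAT', 'ATG']
print(token_ids)  # [0, 1, 2, 3, 4, 0]
```

The integer IDs are what an embedding layer would consume. Repeated k-mers such as `ATG` map to the same ID.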

Prerequisites

  • Technical Background:
    • Basic understanding of single-cell technologies.
    • Python programming skills.
    • Familiarity with tools such as Scanpy and AnnData.

Installation Guide

Before starting the tutorials, install the required packages: Scanpy and Squidpy for RNA-seq and spatial transcriptomics analysis, and Hugging Face Transformers for the Transformer models. Use the following commands:

Required Packages

  • Scanpy & Squidpy:
    Used for RNA-seq and spatial transcriptomics analysis.
    pip install scanpy
    pip install squidpy
  • Transformers (Hugging Face):
    Used for the Transformer models. Run this command in a notebook cell:
    pip install transformers
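
After installing, it can help to confirm that the three packages are visible to the current Python environment. The check below is a small sketch using only the standard library. The helper name `find_missing` is hypothetical, and the import names are assumed to match the pip package names above.

```python
import importlib.util

# Import names assumed to match the pip package names listed above.
REQUIRED_PACKAGES = ["scanpy", "squidpy", "transformers"]


def find_missing(packages):
    """Return the packages that cannot be located in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

`find_spec` only locates each package without importing it, so the check stays fast even for heavy libraries.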
    

Data

All necessary data files and detailed instructions are provided within each notebook, so you can follow along and apply the concepts demonstrated in the tutorials.

Tutorials

Explore practical applications of Transformers through these hands-on tutorials available via Google Colab:

System Requirements

These tutorials and associated tools are designed to be platform-independent and can be run on a variety of systems. Below are the specifics regarding system compatibility and access to all the notebooks:

Feel free to explore the notebooks, experiment with the code, and dive deeper into applying Transformer models in genomics!

License

GPL-3.0 (see the LICENSE file). The repository also contains a second file, Licence, whose license was not identified.
