CLL_GRN_paper

Welcome to the CLL_GRN_paper repository. This project contains the full code, data and analysis scripts supporting our manuscript:

📄 Time-series RNA-Seq and data-driven network inference unveil dynamic cell phenotypes in Chronic Lymphocytic Leukaemia

This comprehensive resource is designed for reproducibility and transparency in our research, providing a step-by-step account of our workflow from raw data processing to network inference and figure generation.

Overview

Our research integrates time-series RNA sequencing with data-driven gene regulatory network (GRN) inference to elucidate the dynamic cell phenotypes in Chronic Lymphocytic Leukaemia (CLL) within a simulated tumour microenvironment (TME). We compare monoculture and autologous culture conditions over five time points using an in vitro model based on patient-derived CLL and immune cells. Our goal is to reveal how immune interactions drive significant changes in gene expression, and thereby to identify critical regulatory pathways and culture- and patient-specific network dynamics.

Motivation and background

Cancer is a complex disease influenced by a multitude of factors, including the interplay between cancer cells and their surrounding immune cells. In CLL, the interaction with immune cells can rewire gene regulatory networks and alter cellular phenotypes, affecting disease progression. This repository documents our comprehensive approach:

Time series analysis: Capturing dynamic changes over five time points (14 days).
GRN inference: Employing data-driven algorithms based on transcription factor activity.
Patient specificity: Emphasizing the role of patient heterogeneity in regulatory mechanisms.
Network analysis: Dissecting gene modules related to biological processes such as immune response.

Our workflow offers a reproducible blueprint for similar studies in oncology and systems biology.

Repository structure

Data

Raw_data/ → Contains the original RNA-Seq files (e.g., count matrices) and the list of human transcription factors (TFs).
Processed_data/ → Includes normalized (TPM) datasets.
Metadata/ → Detailed experimental design and time point information.
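For readers unfamiliar with the TPM unit used in Processed_data/, the sketch below shows the standard transcripts-per-million calculation. It is an illustration only and is not taken from Data_processing.R; the function name counts_to_tpm, the toy matrix, and the gene lengths are assumptions made for the example.

```r
# Minimal TPM sketch (illustrative; not the code used in Data_processing.R).
# counts:         genes x samples matrix of raw read counts
# gene_length_bp: gene lengths in base pairs, one per row of `counts`
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(is.matrix(counts), nrow(counts) == length(gene_length_bp))
  # Step 1: length-normalise to reads per kilobase (RPK).
  rpk <- counts / (gene_length_bp / 1000)
  # Step 2: scale each sample (column) so that its values sum to one million.
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}

# Toy data: 3 hypothetical genes measured in 2 hypothetical samples.
toy_counts <- matrix(
  c(10, 20, 30,
     0,  5,  5),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2"))
)
toy_lengths <- c(1000, 2000, 3000)

toy_tpm <- counts_to_tpm(toy_counts, toy_lengths)
print(colSums(toy_tpm))  # every column sums to 1e6 by construction
```

Because each sample is scaled to the same total, TPM values are comparable between samples for a given gene, which is what the downstream analyses rely on.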

Figures

This directory contains only the final figures used in the paper: high-quality, publication-ready plots such as PCA, heatmaps, networks, and time-series visualizations. No intermediate or exploratory plots are provided.

Scripts and Results

The Results directory contains the processed outputs of the computational analyses conducted in this project. Each subfolder corresponds to a specific analytical method and is directly associated with the script(s) in Script/ that generate it, ensuring full reproducibility. The mapping of results to their corresponding scripts is as follows:

Normalized data (Normalized_data/)

Generated by: Data_processing.R and Data_processing.Rmd

Differential Gene Expression Analysis (Differential_expression/)

Generated by: DGEA_pathway_enrichemnt.R

Ligand-receptor analysis (LR_analysis/)

Generated by: LR_pair_analysis_CLL.Rmd

Gene Set Variation Analysis (GSVA) (GSVA/)

Generated by: GSVA.Rmd

Independent Component Analysis (ICA) (ICA/)

Generated by: Biodica.R and Biodica.Rmd

VIPER analysis (VIPER/)

Generated by: Viper.R and Viper.Rmd

msVIPER analysis (msVIPER/)

Generated by: msVIPER.R, msVIPER.Rmd, extract_msViper_results.R
Pathway enrichment of most differentially activated TFs is generated by: path_enrichment_TF_targets.R, concatenate_pathways_results.R

GRN inference with dynGENIE3 and TF activity correlation (Network_analysis/)

Generated by: GRN_inference_dynGENIE3.Rmd and process_msviperres.R

Each script implements a computational method to process the RNA-Seq dataset, contributing to downstream analyses and biological interpretation. This structured organization ensures clarity and reproducibility in the workflow.
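The mapping above can also be written down as a small lookup table, which makes it easy to check a local clone against this README. The sketch below is a convenience and not part of the repository; the helper name check_layout is hypothetical, and it assumes that the result folders sit directly under Results/ and the scripts directly under Script/.

```r
# Result folders and the scripts listed for each of them in this README.
results_map <- list(
  Normalized_data         = c("Data_processing.R", "Data_processing.Rmd"),
  Differential_expression = "DGEA_pathway_enrichemnt.R",
  LR_analysis             = "LR_pair_analysis_CLL.Rmd",
  GSVA                    = "GSVA.Rmd",
  ICA                     = c("Biodica.R", "Biodica.Rmd"),
  VIPER                   = c("Viper.R", "Viper.Rmd"),
  msVIPER                 = c("msVIPER.R", "msVIPER.Rmd",
                              "extract_msViper_results.R",
                              "path_enrichment_TF_targets.R",
                              "concatenate_pathways_results.R"),
  Network_analysis        = c("GRN_inference_dynGENIE3.Rmd",
                              "process_msviperres.R")
)

# Hypothetical helper: one row per script, reporting whether the script and
# its result folder exist. Both directory arguments are assumed locations
# relative to the project root.
check_layout <- function(map = results_map,
                         script_dir = "Script",
                         results_dir = "Results") {
  rows <- lapply(names(map), function(folder) {
    data.frame(
      result_folder = folder,
      script        = map[[folder]],
      folder_found  = dir.exists(file.path(results_dir, folder)),
      script_found  = file.exists(file.path(script_dir, map[[folder]])),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}
```

Running check_layout() from the project root returns a data frame in which any FALSE entry points to a folder or script that is missing or located somewhere other than assumed here.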

Data description

Two types of cell culture are used in this study:

Autologous culture: This culture comprises all peripheral blood mononuclear cells (PBMCs). Samples were collected from three separate patients, and each patient's samples were processed in biological duplicate, giving two replicates per patient.
B-CLL monoculture: B-CLL cells are isolated and cultured on their own.

For both conditions, the cells in suspension are collected at five time points (D1, D4, D8, D11 and D14) and subjected to RNA sequencing.
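As an illustration of the design described above, the following sketch builds a sample sheet for the autologous culture only, since that is the condition whose patient and replicate structure is spelled out here. The patient labels (P1 to P3) and the sample_id format are hypothetical; the actual identifiers are those in Metadata/.

```r
# Time points stated in the data description, in chronological order.
time_points <- c("D1", "D4", "D8", "D11", "D14")

# Autologous culture: 3 patients x 2 biological replicates x 5 time points.
# Patient labels and the sample_id format below are hypothetical.
design <- expand.grid(
  replicate  = 1:2,
  time_point = time_points,
  patient    = paste0("P", 1:3),
  stringsAsFactors = FALSE
)
design$condition  <- "autologous"
# Numeric day, convenient for time-series models.
design$day        <- as.integer(sub("^D", "", design$time_point))
# Ordered factor levels keep D11 and D14 after D8 in plots and tables.
design$time_point <- factor(design$time_point, levels = time_points)
design$sample_id  <- paste(design$patient, design$condition,
                           design$time_point,
                           paste0("rep", design$replicate),
                           sep = "_")

nrow(design)  # 30 samples
table(design$patient, design$time_point)
```

Storing the time point as a factor with explicit levels avoids the alphabetical ordering (D1, D11, D14, D4, D8) that plain character labels would otherwise produce.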

Installation and dependencies

This project requires R 4.4.2 (or later) on Ubuntu 24.04.2 LTS. The analysis relies on Bioconductor and CRAN packages for data processing, network inference, and visualization.

Prerequisites

Operating system: Ubuntu 24.04.2 LTS (or equivalent Linux-based system)
R Version: 4.4.2 (2024-10-31)

Session information

For a complete list of dependencies and system details, refer to the session information file:

📄 sessionInfo.txt (located in the project root)

Environment management with renv

To ensure reproducibility, the project uses renv for dependency management. Restore the project environment:

renv::restore()

This ensures a fully reproducible and stable environment for the analysis workflow.
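A slightly more defensive version of the restore step might look like the sketch below. The helper names r_version_ok and restore_environment are hypothetical and not part of the repository; the sketch assumes R is started in the project root, where the renv lockfile is located.

```r
# Minimum R version stated in this README.
required_r <- "4.4.2"

# TRUE if `current` is at least the required version.
r_version_ok <- function(current = getRversion(), required = required_r) {
  as.numeric_version(current) >= as.numeric_version(required)
}

# Hypothetical helper: warn about an old R version, install renv if it is
# missing, then restore the package versions recorded in the lockfile.
restore_environment <- function() {
  if (!r_version_ok()) {
    warning("R ", as.character(getRversion()), " is older than ", required_r,
            "; the recorded package versions may not install cleanly.")
  }
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  renv::restore(prompt = FALSE)  # prompt = FALSE skips the confirmation step
}
```

Calling restore_environment() once after cloning should be enough; later sessions started in the project directory reuse the restored library.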

Usage instructions

Running the complete pipeline

The analysis workflow comprises several stages, including data preprocessing, network inference, and downstream analyses. Each analysis is conducted using R Markdown (.Rmd) scripts. To execute a stage, open the corresponding .Rmd file in the Script directory and knit it. The outputs of each stage are stored in the matching subfolder of the Results directory, which keeps them organized for interpretation and manuscript preparation.

Note: Each R Markdown file contains detailed instructions and explanations for its respective section of the analysis.
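Knitting can also be done from the R console rather than through the RStudio Knit button. The sketch below is one way to do this with rmarkdown::render(); the helper names list_notebooks and knit_notebook are hypothetical, and the Script path is assumed to be relative to the project root.

```r
# List the R Markdown files in the script directory ("Script" is assumed to
# be relative to the project root).
list_notebooks <- function(script_dir = "Script") {
  sort(list.files(script_dir, pattern = "\\.Rmd$", full.names = TRUE))
}

# Knit one notebook in a fresh environment so that objects created by one
# analysis do not leak into the next. Requires the rmarkdown package.
knit_notebook <- function(path) {
  stopifnot(file.exists(path))
  rmarkdown::render(path, envir = new.env(parent = globalenv()))
}
```

For example, knit_notebook("Script/GSVA.Rmd") would knit the GSVA analysis, and list_notebooks() shows which notebooks are available. The notebooks are not necessarily independent of each other, so the order in which they are knitted may matter.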

Running individual modules

Interactive sessions: You can also run individual R scripts interactively in RStudio for step-by-step analysis.
Configuration files: Some scripts may refer to configuration settings; please modify these directly in the R Markdown files as needed.
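Outside RStudio, a plain .R script can be run from the console with source(). The following is a minimal sketch; run_script is a hypothetical helper, and whether a given script expects objects created by an earlier stage is something to check in the script itself.

```r
# Hypothetical helper: run one R script in its own environment and return
# that environment, so the objects it created can be inspected afterwards.
run_script <- function(path, echo = TRUE) {
  stopifnot(file.exists(path))
  env <- new.env(parent = globalenv())
  # echo = TRUE prints each expression before it is evaluated, which gives a
  # step-by-step trace similar to running the script line by line.
  source(path, local = env, echo = echo)
  invisible(env)
}
```

For example, res <- run_script("Script/Viper.R") followed by ls(res) lists the objects the script produced, without adding them to the global workspace.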

Contribution guidelines

Contributions are welcome! To contribute:

🔹 Fork the repository.

🔹 Create a feature branch (git checkout -b feature/your-feature).

🔹 Commit your changes with descriptive messages.

🔹 Push to your branch (git push origin feature/your-feature).

🔹 Open a pull request describing your changes.

Follow coding conventions & include documentation updates!

Citation

The preprint with all results is available on bioRxiv.

Troubleshooting and FAQ

Common issues

Package installation errors? Ensure that you are using R 4.4.2 or later. For a reproducible environment, use renv::restore() to install the correct package versions, or refer to sessionInfo.txt to verify library versions.
Data format errors? Verify that your raw data and metadata files are formatted correctly and match the expected inputs in the R Markdown files.
Knit failures? Review the error messages when knitting R Markdown files. Often, missing packages or syntax errors in the code chunks are the cause.
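For data format errors in particular, a quick consistency check between the count matrix and the metadata table often locates the problem. The sketch below is an assumption about what such a check could look like: check_inputs is a hypothetical helper, and sample_id is a hypothetical name for the metadata column that holds the sample identifiers.

```r
# Compare a count matrix (genes x samples) with its metadata table and return
# a character vector describing any problems (empty if none were found).
# `id_col` is a hypothetical name for the metadata column with sample IDs.
check_inputs <- function(counts, metadata, id_col = "sample_id") {
  if (!id_col %in% names(metadata)) {
    return(paste0("metadata has no '", id_col, "' column"))
  }
  problems <- character(0)
  if (!is.numeric(as.matrix(counts))) {
    problems <- c(problems, "counts contain non-numeric values")
  }
  if (anyNA(counts)) {
    problems <- c(problems, "counts contain missing values")
  }
  # Samples present in one input but not in the other.
  no_meta <- setdiff(colnames(counts), metadata[[id_col]])
  if (length(no_meta) > 0) {
    problems <- c(problems, paste("samples without metadata:",
                                  paste(no_meta, collapse = ", ")))
  }
  no_counts <- setdiff(metadata[[id_col]], colnames(counts))
  if (length(no_counts) > 0) {
    problems <- c(problems, paste("metadata rows without counts:",
                                  paste(no_counts, collapse = ", ")))
  }
  problems
}
```

An empty result means that the two inputs agree on the set of samples and that the counts are numeric and complete; it does not validate the biological content of either file.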

FAQ

Q: How do I update the GRN inference parameters?
A: Modify the parameter section in Script/Run_dynGENIE3.Rmd as needed, then re-knit the document.
Q: Can this pipeline be adapted for other datasets?
A: Yes. The R scripts and R Markdown files are modular and can be adapted for similar RNA-Seq and network analysis projects.

License

This project is licensed under the MIT License. Please refer to the LICENSE file for additional details.

Contact

For questions, suggestions, or collaboration opportunities, please contact:

Hugo Chenel – hugo.chenel@inserm.fr
Malvina Marku – malvina.marku@inserm.fr
Vera Pancaldi – vera.pancaldi@inserm.fr

Maintained by the NetB(IO)² research team at CRCT, Toulouse.
