Welcome to the CLL_GRN_paper repository. This project contains the full code, data and analysis scripts supporting our manuscript:
📄 Time-series RNA-Seq and data-driven network inference unveil dynamic cell phenotypes in Chronic Lymphocytic Leukaemia
This resource is designed for reproducibility and transparency, providing a step-by-step account of our workflow, from raw data processing to network inference and figure generation.
- Overview
- Motivation and background
- Repository structure
- Data description
- Installation and dependencies
- Usage instructions
- Contribution guidelines
- Citation
- Troubleshooting and FAQ
- License
- Contact
Our research integrates time-series RNA sequencing with data-driven gene regulatory network (GRN) inference to elucidate the dynamic cell phenotypes of Chronic Lymphocytic Leukaemia (CLL) within a simulated tumour microenvironment (TME). We compare monoculture and autologous culture conditions over five time points using an in vitro model based on patient-derived CLL and immune cells. Our goal is to reveal how immune interactions drive significant changes in gene expression, leading to the identification of critical regulatory pathways and of culture- and patient-specific network dynamics.
Cancer is a complex disease influenced by a multitude of factors, including the interplay between cancer cells and their surrounding immune cells. In CLL, the interaction with immune cells can rewire gene regulatory networks and alter cellular phenotypes, affecting disease progression. This repository documents our comprehensive approach:
- Time-series analysis: Capturing dynamic changes at five time points over 14 days.
- GRN inference: Employing data-driven algorithms based on transcription factor activity.
- Patient specificity: Emphasizing the role of patient heterogeneity in regulatory mechanisms.
- Network analysis: Dissecting gene modules related to biological processes such as immune response.
Our workflow offers a reproducible blueprint for similar studies in oncology and systems biology.
- Raw_data/ → Contains the original RNA-Seq files (e.g., count matrices) and a list of human transcription factors (TFs).
- Processed_data/ → Includes normalized (TPM) datasets.
- Metadata/ → Detailed experimental design and time point information.
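TPM (transcripts per million) first divides each gene's counts by its length in kilobases, then rescales every sample so that its values sum to one million: TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6. The base-R sketch below illustrates that calculation, assuming a gene × sample count matrix and gene lengths in base pairs. The helper `counts_to_tpm` and the toy data are hypothetical; the repository's own normalization is implemented in Data_processing.R / Data_processing.Rmd.

```r
# Hypothetical helper: convert a gene x sample count matrix to TPM.
# counts:  numeric matrix, rows = genes, columns = samples
# lengths: gene lengths in base pairs, in the same order as the rows of counts
counts_to_tpm <- function(counts, lengths) {
  stopifnot(nrow(counts) == length(lengths), all(lengths > 0))
  # Reads per kilobase: the length vector is recycled down each column,
  # so every row (gene) is divided by its own length in kb
  rpk <- counts / (lengths / 1000)
  # Scale each sample (column) so that its values sum to one million
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}

# Toy example: 3 genes x 2 samples
counts <- matrix(c(10, 20, 30,
                   5, 10, 15),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), c("S1", "S2")))
lengths <- c(1000, 2000, 3000)
tpm <- counts_to_tpm(counts, lengths)
colSums(tpm)  # each sample sums to one million
```

Because the per-sample scaling happens after the length correction, TPM values can be compared across samples, which is why they suit the time-series comparisons in this project.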
The figures directory contains only the final figures used in the paper. It includes high-quality, publication-ready plots such as PCA, heatmaps, networks and time-series visualizations. No intermediate or exploratory plots are provided.
The Results directory contains the processed outputs of the computational analyses conducted in this project. Each subfolder corresponds to a specific analytical method and is linked to the script(s) in Scripts/ that generate it, ensuring full reproducibility. The mapping of results to their corresponding scripts is as follows:
Normalized data (Normalized_data/)
- Generated by: Data_processing.R and Data_processing.Rmd
Differential Gene Expression Analysis (Differential_expression/)
- Generated by: DGEA_pathway_enrichemnt.R
Ligand-receptor analysis (LR_analysis/)
- Generated by: LR_pair_analysis_CLL.Rmd
Gene Set Variation Analysis (GSVA) (GSVA/)
- Generated by: GSVA.Rmd
Independent Component Analysis (ICA) (ICA/)
- Generated by: Biodica.R and Biodica.Rmd
VIPER analysis (VIPER/)
- Generated by: Viper.R and Viper.Rmd
msVIPER analysis (msVIPER/)
- Generated by: msVIPER.R, msVIPER.Rmd and extract_msViper_results.R
- Pathway enrichment of the most differentially activated TFs is generated by: path_enrichment_TF_targets.R and concatenate_pathways_results.R
GRN inference with dynGENIE3 and TF activity correlation (Network_analysis/)
- Generated by: GRN_inference_dynGENIE3.Rmd and process_msviperres.R
Each script implements a computational method to process the RNA-Seq dataset, contributing to downstream analyses and biological interpretation. This structured organization ensures clarity and reproducibility in the workflow.
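To see which analyses still need to be run on a fresh clone, the folder-to-script mapping can be encoded directly in R. This is a minimal sketch: the folder and script names are transcribed from the mapping in this section (the TF pathway-enrichment scripts are grouped under msVIPER, as listed there), while the `Results` root path and the helper names `results_map` and `missing_results` are assumptions for illustration.

```r
# Results subfolders and the scripts that generate them, transcribed from the
# mapping above. The "Results" root path below is an assumption.
results_map <- list(
  Normalized_data         = c("Data_processing.R", "Data_processing.Rmd"),
  Differential_expression = "DGEA_pathway_enrichemnt.R",
  LR_analysis             = "LR_pair_analysis_CLL.Rmd",
  GSVA                    = "GSVA.Rmd",
  ICA                     = c("Biodica.R", "Biodica.Rmd"),
  VIPER                   = c("Viper.R", "Viper.Rmd"),
  msVIPER                 = c("msVIPER.R", "msVIPER.Rmd", "extract_msViper_results.R",
                              "path_enrichment_TF_targets.R",
                              "concatenate_pathways_results.R"),
  Network_analysis        = c("GRN_inference_dynGENIE3.Rmd", "process_msviperres.R")
)

# Hypothetical helper: return the entries whose result folder does not exist
# yet, together with the scripts that would create it.
missing_results <- function(results_dir = "Results", map = results_map) {
  present <- dir.exists(file.path(results_dir, names(map)))
  map[!present]
}
```

Calling `missing_results()` from the project root returns an empty list once every folder has been generated; otherwise it lists the scripts still to run.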
Two types of cell culture are used in this study:
- Autologous culture: This culture comprises all peripheral blood mononuclear cells (PBMCs). Samples were collected from three patients, and each patient's sample was cultured in biological duplicate, giving two replicates per patient.
- B-CLL monoculture: B-CLL cells are isolated and cultured alone.
For both conditions, cells in suspension are collected at five time points (D1, D4, D8, D11 and D14) and profiled by RNA sequencing.
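For the autologous arm, this design gives 3 patients × 2 replicates × 5 time points = 30 samples. The base-R sketch below builds such a sample sheet, which is handy when ordering samples by time for time-series methods. The patient IDs, column names and `sample_id` pattern are hypothetical placeholders, not the identifiers used in Metadata/, and the monoculture arm is omitted because its replicate structure is not described here.

```r
# Hypothetical sample sheet for the autologous-culture arm:
# 3 patients x 2 biological replicates x 5 time points.
days <- c(1L, 4L, 8L, 11L, 14L)  # D1, D4, D8, D11, D14

design <- expand.grid(
  day              = days,
  replicate        = 1:2,
  patient          = paste0("P", 1:3),  # placeholder patient IDs
  KEEP.OUT.ATTRS   = FALSE,
  stringsAsFactors = FALSE
)
design$condition <- "autologous"

# Build a readable, unique identifier for each sample
design$sample_id <- with(design, sprintf("%s_%s_R%d_D%d",
                                         condition, patient, replicate, day))

nrow(design)  # 30 samples
```

`expand.grid` varies its first argument fastest, so within each patient and replicate the rows are already in chronological order.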
This project requires R 4.4.2 (or later) and was developed on Ubuntu 24.04.2 LTS. The analysis relies on Bioconductor and CRAN packages for data processing, network inference and visualization.
- Operating system: Ubuntu 24.04.2 LTS (or equivalent Linux-based system)
- R Version: 4.4.2 (2024-10-31)
For a complete list of dependencies and system details, refer to the session information file 📄 sessionInfo.txt (located in the project root).
To ensure reproducibility, the project uses renv for dependency management. Restore the project environment:
renv::restore()
This ensures a fully reproducible and stable environment for the analysis workflow.
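Before restoring, it can help to confirm that the running R version matches the one the project was built with. A minimal sketch, assuming the renv lockfile (renv.lock) sits in the project root; the helper names `r_version_ok` and `setup_env` are hypothetical.

```r
# Hypothetical helper: TRUE if the running R is at least min_version.
# getRversion() returns a package_version object, which compares
# correctly against a version string.
r_version_ok <- function(min_version = "4.4.2") {
  getRversion() >= min_version
}

# Hypothetical setup routine: warn on an older R, install renv if it is
# missing, then restore the package versions recorded in renv.lock.
setup_env <- function(min_version = "4.4.2") {
  if (!r_version_ok(min_version)) {
    warning(sprintf("R %s or later is expected; this session runs R %s.",
                    min_version, as.character(getRversion())))
  }
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  renv::restore()
}
```

Run `setup_env()` once from the project root; afterwards, `renv::status()` reports whether the library still matches the lockfile.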
The analysis workflow encompasses multiple stages, including data preprocessing, network inference and other analyses. Each analysis is conducted using R Markdown (.Rmd) scripts. To execute any stage, open and knit the corresponding .Rmd file in the Script directory. The outputs of each stage are stored in the corresponding results directories, facilitating structured data interpretation and manuscript preparation.
Note: Each R Markdown file contains detailed instructions and explanations for its respective section of the analysis.
- Interactive sessions: You can also run individual R scripts interactively in RStudio for step-by-step analysis.
- Configuration files: Some scripts may refer to configuration settings; please modify these directly in the R Markdown files as needed.
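Knitting can also be scripted instead of done by hand in RStudio. The sketch below is an illustration, not part of the repository: `knit_scripts` is a hypothetical helper built on `rmarkdown::render()`. It defaults to the Script directory named in the usage notes (the structure section calls it Scripts/, so adjust as needed) and uses alphabetical order unless you pass an explicit order, for example with Data_processing.Rmd first so that downstream analyses find the normalized data.

```r
# Hypothetical helper: list the R Markdown files in the scripts folder and,
# if render = TRUE, knit them one after another with rmarkdown::render().
knit_scripts <- function(script_dir = "Script", files = NULL, render = TRUE) {
  rmd <- if (is.null(files)) {
    # Default: every .Rmd file in the folder, in alphabetical order
    sort(list.files(script_dir, pattern = "\\.Rmd$", full.names = TRUE))
  } else {
    # Explicit order supplied by the user
    file.path(script_dir, files)
  }
  if (render) {
    for (f in rmd) {
      message("Knitting ", f)
      # A fresh environment per document keeps objects from leaking
      # between analyses
      rmarkdown::render(f, envir = new.env())
    }
  }
  invisible(rmd)
}
```

For example, `knit_scripts(files = c("Data_processing.Rmd", "GSVA.Rmd"))` knits those two documents in that order, and `render = FALSE` only returns the list of files that would be knitted.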
Contributions are welcome! To contribute:
🔹 Fork the repository.
🔹 Create a feature branch (git checkout -b feature/your-feature).
🔹 Commit your changes with descriptive messages.
🔹 Push to your branch (git push origin feature/your-feature).
🔹 Open a pull request describing your changes.
Please follow the existing coding conventions and include documentation updates.
The preprint with all results is available on bioRxiv.
- Package installation errors? Ensure that you are using R 4.4.2. For a reproducible environment, run renv::restore() to install the correct package versions, or refer to sessionInfo.txt to verify library versions.
- Data format errors? Verify that your raw data and metadata files are formatted correctly and match the inputs expected by the R Markdown files.
- Knit failures? Review the error messages produced when knitting the R Markdown files. Missing packages or syntax errors in code chunks are the most common causes.
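To check library versions against sessionInfo.txt without reading through it line by line, the expected versions can be copied into a named vector and compared with what is installed. A minimal sketch: `check_versions` is a hypothetical helper, and the expected versions are whatever you transcribe from sessionInfo.txt.

```r
# Hypothetical helper: compare installed package versions with expected ones
# (e.g. copied from sessionInfo.txt). Returns the packages that are missing
# or whose installed version differs from the expected one.
check_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    # system.file() returns "" for packages that are not installed,
    # without loading the package
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  out <- data.frame(package   = names(expected),
                    expected  = unname(expected),
                    installed = unname(installed),
                    stringsAsFactors = FALSE)
  out[is.na(out$installed) | out$installed != out$expected, , drop = FALSE]
}
```

An empty data frame means every listed package matches; any remaining rows point to the packages that `renv::restore()` should fix.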
- Q: How do I update the GRN inference parameters?
  A: Modify the parameter section in Script/Run_dynGENIE3.Rmd as needed, then re-knit the document.
- Q: Can this pipeline be adapted for other datasets?
  A: Yes. The R scripts and R Markdown files are modular and can be adapted for similar RNA-Seq and network analysis projects.
This project is licensed under the MIT License. Please refer to the LICENSE file for details.
For questions, suggestions, or collaboration opportunities, please contact:
- Hugo Chenel – hugo.chenel@inserm.fr
- Malvina Marku – malvina.marku@inserm.fr
- Vera Pancaldi – vera.pancaldi@inserm.fr
Maintained by the NetB(IO)² research team at CRCT, Toulouse.