cluefish

Table of Contents

Overview
Installation
Usage
Contributing
License
Contact
Acknowledgments

Overview

Cluefish is a free and open-source, semi-automated R workflow designed for comprehensive and untargeted exploration of transcriptomic data series. Its name reflects the three key concepts driving the workflow: Clustering, Enrichment, and Fishing—metaphorically aligned with “fishing for clues”🎣 in complex biological data.

When used alongside the DRomics (Dose-Response for Omics) R package, Cluefish provides a more comprehensive analysis of dose-response transcriptomic data. In toxicology/ecotoxicology, this will support the understanding/highlighting of contaminant’s mode of action.

This workflow addresses the limitations of the standard Over-Representation Analysis (ORA) by applying ORA to pre-clustered networks. These clusters serve as anchors for ORA, enhancing enrichment detection sensitivity and thus enabling the identification of smaller, more specific biological processes while simultaneously forming exploratory gene groups.

Cluefish is designed to be adaptable to a wide range of organisms, both model and non-model, ensuring broad applicability across various biological contexts.

If you’re ready to dive straight into using Cluefish, check out the Introduction to Cluefish vignette

Graphical abstract of the Cluefish workflow.

(back to top)

Installation

The Cluefish tool is developed in R, so having R installed is a prerequisite. You can download it here.

For an enhanced experience, we recommend using the RStudio integrated development environment (IDE), which is available for download at the same link, here.

You can use Cluefish locally in one of two ways:

Clone the repository via a terminal:

git clone https://github.com/ellfran-7/cluefish.git

Install the developmental version of Cluefish from GitHub in R (remotes needed):

if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")

remotes::install_github("ellfran-7/cluefish")

(back to top)

Additional Requirements

Cluefish relies on external open source software for an intermediate step within its workflow. Please ensure the following tools are installed:

Cytoscape:

Cluefish uses Cytoscape in order to visualize PPI networks. Install Cytoscape from their download page.
Required Cytoscape Apps:

Within Cytoscape, install the StringApp and clusterMaker2 apps. To do this:
- Open Cytoscape
- Navigate to Apps > App Store > Show App Store
- Search for and install “StringApp” (for retrieving STRING protein interactions) and “clusterMaker2”” (for clustering network data).

You can also view more about these apps on the Cytoscape App Store.

(back to top)

Usage

To run the Cluefish workflow, you can use the make.R script, which serves as the ‘master’ script for the entire process. We recommend using this script as a template to ensure smooth and sequential execution of the workflow steps.

Required R packages

A key feature of Cluefish is the integration of renv to create reproducible environments. This allows you to install the required R packages in two ways:

Run renv::install() to install the most recent version of the packages listed in the renv.lock file.
For full reproducibility, run renv::restore() to install the exact package versions specified in the renv.lock file. Note that this process may take longer.

Required inputs

Cluefish requires two key inputs:

A background transcript list: Typically, this includes the identifiers for all detected transcripts in the experiment.
A deregulated transcript list: A subset of the background list, containing the identifiers of significantly deregulated transcripts. This list can be derived using any selection method.

Recommended Selection Method

While the inputs can be derived from any selection method, Cluefish was optimised to work seamlessly with the results from DRomics, a tool tailored for dose-response modelling of omics data.

Although using DRomics is optional, Cluefish leverages some of its visualization functions and modelling metrics to provide deeper insights into the biological interpretation of the data.

For more information on DRomics, please refer to their documentation.

(back to top)

Workflow

A schematic overview of the Cluefish workflow is shown below. For a full, step-by-step guide, refer to the vignette, Introduction to Cluefish, which provides instructions using the ZebrafishDBP example dataset. The raw count data is publicly available on NCBI GEO and can be accessed with GSE283957.

Schematic of the Cluefish workflow.

(back to top)

Citation

If you use Cluefish, please cite the associated paper as follows:

Ellis Franklin, Elise Billoir, Philippe Veber, Jérémie Ohanessian, Marie Laure Delignette-Muller, Sophie Martine Prud’homme, Cluefish: mining the dark matter of transcriptional data series with over-representation analysis enhanced by aggregated biological prior knowledge, NAR Genomics and Bioinformatics, Volume 7, Issue 3, September 2025, lqaf103, https://doi.org/10.1093/nargab/lqaf103

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag “enhancement”. Don’t forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingIdea)
Commit your Changes (git commit -m 'Add some AmazingIdea')
Push to the Branch (git push origin feature/AmazingIdea)
Open a Pull Request

(back to top)

License

This project is distributed under the CeCILL Free Software License Agreement v2.1 (CECILL-2.1). See LICENSE.txt for more information.

CECILL-2.1 is compatible with GNU GPL. See the official CeCILL site for more information.

Please note that the creative assets, such as the logos and schematics associated with Cluefish, are distributed under the CC-BY-SA-4.0 license.

(back to top)

Contact

If you have any need that is not yet covered, any feedback on Cluefish, or anything other question, feel free to contact me !

Ellis Franklin - Website - LinkedIn Bluesky - ellis.franklin@univ-lorraine.fr

Project Link: https://github.com/ellfran-7/cluefish

(back to top)

Acknowledgments

(back to top)

Name		Name	Last commit message	Last commit date
Latest commit History 311 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
R		R
analyses		analyses
data		data
docs		docs
figures		figures
inst		inst
man		man
outputs		outputs
renv		renv
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
README.Rmd		README.Rmd
README.md		README.md
_pkgdown.yml		_pkgdown.yml
codemeta.json		codemeta.json
make.R		make.R
renv.lock		renv.lock
report_comparison_results.qmd		report_comparison_results.qmd
report_lonely_results.qmd		report_lonely_results.qmd
report_workflow_results.qmd		report_workflow_results.qmd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

cluefish

Overview

Installation

Additional Requirements

Usage

Required R packages

Required inputs

Recommended Selection Method

Workflow

Citation

Contributing

License

Contact

Acknowledgments

About

Uh oh!

Releases 2

Packages

Languages

License

ellfran-7/cluefish

Folders and files

Latest commit

History

Repository files navigation

cluefish

Overview

Installation

Additional Requirements

Usage

Required R packages

Required inputs

Recommended Selection Method

Workflow

Citation

Contributing

License

Contact

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages