A comprehensive Python toolkit for Advanced Sequencing Facility (ASF) operations at the Francis Crick Institute.
ASF Tools is a Python-based command-line application designed to streamline and automate repetitive tasks within ASF operations. It provides a comprehensive suite of utilities for:
- Sequencing Data Management: Processing and organizing Illumina and Oxford Nanopore (ONT) sequencing data
- LIMS Integration: Interfacing with Clarity LIMS for sample metadata and barcode information
- Pipeline Automation: Creating and managing Nextflow pipeline runs for demultiplexing and analysis
- Data Delivery: Automated symlink creation and data delivery to researchers
- Infrastructure Management: SLURM job monitoring and SSH-based operations on Nemo
Authors: Chris Cheshire, Areda Elezi
Repository: github.com/FrancisCrickInstitute/asf-tools
ASF Tools is deployed as a containerized application on Nemo. The recommended approach for production use is via the automation scripts in asf-automation-scripts.
All operations must be run from the `scripts` folder, where the `config.sh` file is located:

```bash
cd asf-automation-scripts/scripts
./asf_tools.sh [COMMAND] [OPTIONS]
```
For development or direct access:
```bash
# Activate the environment and install dependencies
. .venv/bin/activate && uv sync --group dev

# Run commands
asf-tools pipeline [COMMAND] [OPTIONS]
```
All pipeline commands are accessed via the `pipeline` subcommand:

```bash
asf-tools pipeline [COMMAND] [OPTIONS]
```
The `gen-demux-run` command creates run directories and SLURM batch scripts for demultiplexing pipelines. It supports both ONT and Illumina modes.
```bash
asf-tools pipeline gen-demux-run \
  --source_dir /path/to/raw/data \
  --target_dir /path/to/pipeline/runs \
  --mode ont \
  --pipeline_dir /path/to/nextflow/pipeline \
  --nextflow_cache /path/to/nf/cache \
  --nextflow_work /path/to/nf/work \
  --container_cache /path/to/singularity/cache \
  --runs_dir /host/path/to/runs
```
Required Options:
- `--source_dir`: Directory containing raw sequencing data
- `--target_dir`: Directory where pipeline runs will be created
- `--mode`: Data type (`ont`, `illumina`, or `general`)
- `--pipeline_dir`: Path to Nextflow pipeline code
- `--nextflow_cache`: Nextflow cache directory
- `--nextflow_work`: Nextflow work directory
- `--container_cache`: Singularity container cache directory
- `--runs_dir`: Host path for runs folder (for containerized environments)
Optional Flags:
- `--use_api`: Generate samplesheets using the Clarity LIMS API
- `--contains TEXT`: Filter runs by substring in folder name
- `--samplesheet_only`: Only update samplesheets; don't create new runs
- `--nextflow_version VERSION`: Override the default Nextflow version in the SLURM header
Example - ONT demultiplexing with LIMS integration:
```bash
asf-tools pipeline gen-demux-run \
  --source_dir /data/ont/raw \
  --target_dir /data/ont/demux \
  --mode ont \
  --pipeline_dir /pipelines/nanopore_demux \
  --nextflow_cache /cache/nextflow \
  --nextflow_work /work/nextflow \
  --container_cache /cache/singularity \
  --runs_dir /mnt/data/runs \
  --use_api \
  --contains "PAK"
```
The `deliver-to-targets` command creates symlinks to deliver processed data to researcher directories.
```bash
asf-tools pipeline deliver-to-targets \
  --source_dir /path/to/processed/data \
  --target_dir /path/to/delivery/area
```
Required Options:
- `--source_dir`: Source directory (run directory for non-interactive mode, parent directory for interactive mode)
- `--target_dir`: Target delivery directory
Optional Arguments:
- `--host_delivery_folder`: Host path for delivery when running in a container
- `--interactive`: Run in interactive mode to manually select runs
Example - Interactive delivery:
```bash
asf-tools pipeline deliver-to-targets \
  --source_dir /data/ont/demux \
  --target_dir /delivery/ont \
  --interactive
```
The `scan-run-state` command monitors the status of sequencing and pipeline runs, checking completion states and SLURM job status.
```bash
asf-tools pipeline scan-run-state \
  --raw_dir /path/to/raw/data \
  --run_dir /path/to/pipeline/runs \
  --target_dir /path/to/delivery/area \
  --mode ont
```
Required Options:
- `--raw_dir`: Directory containing raw sequencing data
- `--run_dir`: Directory containing pipeline runs
- `--target_dir`: Data delivery directory
- `--mode`: Data type (`ont`, `illumina`, or `general`)
Optional Arguments:
- `--slurm_user`: SLURM username for job status checking
- `--job_prefix`: SLURM job name prefix for filtering
- `--slurm_file`: Path to SLURM job output file
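Example - monitoring ONT runs with SLURM job checking (following the style of the other examples; the paths, username, and job prefix are placeholders):

```shell
asf-tools pipeline scan-run-state \
  --raw_dir /data/ont/raw \
  --run_dir /data/ont/demux \
  --target_dir /delivery/ont \
  --mode ont \
  --slurm_user asf_svc \
  --job_prefix demux_
```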
The `gen-viral-genomics-samplesheet` command generates samplesheets for viral genomics pipelines from FASTQ file directories.
```bash
asf-tools pipeline gen-viral-genomics-samplesheet \
  --source_dir /path/to/fastq/files \
  --target_dir /path/to/output \
  --curr-prefix /old/path/prefix \
  --new-prefix /new/path/prefix
```
Required Options:
- `--source_dir`: Directory containing FASTQ files
- `--target_dir`: Directory to write the samplesheet to
Optional Arguments:
- `--curr-prefix`: Current path prefix to replace in FASTQ file paths
- `--new-prefix`: New path prefix to substitute
Behavior:
- Creates CSV samplesheet with sample metadata
- Each (sample_id, lane) pair becomes a row
- Automatically detects paired-end reads
- Sorts output by sample ID and read paths for consistency
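The behavior above can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: the file-naming pattern, CSV column names, and the `build_samplesheet` function are all assumptions.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path

def build_samplesheet(fastq_paths, out_csv, curr_prefix=None, new_prefix=None):
    """Group FASTQ files into (sample, lane) rows and write a sorted CSV."""
    rows = defaultdict(dict)
    for p in fastq_paths:
        # Assumed Illumina-style naming: <sample>_L<lane>_R<read>_001.fastq.gz
        m = re.match(r"(?P<sample>.+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$",
                     Path(p).name)
        if not m:
            continue
        if curr_prefix and new_prefix:
            p = p.replace(curr_prefix, new_prefix, 1)  # optional prefix substitution
        rows[(m["sample"], int(m["lane"]))][f"fastq_{m['read']}"] = p
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["sample", "lane", "fastq_1", "fastq_2"])
        # Sorted by (sample, lane) for deterministic output
        for (sample, lane), reads in sorted(rows.items()):
            # fastq_2 stays empty for single-end samples (pairing auto-detected)
            writer.writerow([sample, lane, reads.get("fastq_1", ""), reads.get("fastq_2", "")])
```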
The `upload-report` command uploads analysis reports and metadata to database tables.
```bash
asf-tools pipeline upload-report \
  --data-file /path/to/report.pkl \
  --run-id RUN123 \
  --report-type quality_metrics \
  --upload-table reports_table
```
Required Options:
- `--data-file`: Path to pickle file containing the report data
- `--run-id`: Unique run identifier
- `--report-type`: Type of report being uploaded
- `--upload-table`: Target database table
Optional Arguments:
- `--table_override`: Override the default table suffix
- Python: 3.13+ (managed via `asdf` or `pyenv`)
- UV: for fast dependency management
- Just: for task automation
- Operating System: Linux or macOS
```bash
# Clone the repository
git clone https://github.com/FrancisCrickInstitute/asf-tools.git
cd asf-tools

# Set up development environment (creates .venv automatically)
just dev
```
The `just dev` command will:
- Create a `.venv` virtual environment if it doesn't exist
- Install all dependencies, including development tools
- Activate the environment and spawn a new shell
```bash
# Create virtual environment
uv venv .venv
source .venv/bin/activate

# Install dependencies
uv sync --group dev

# Verify installation
python -c "import asf_tools; print('Installation successful')"
```
```bash
just dev            # Set up development environment
just test           # Run pytest suite
just test-cli       # Run tests with CLI output
just lint           # Run ruff linting
just python-upgrade # Upgrade Python version
```
This project follows strict TDD practices:
- Write tests first: before implementing any feature
- Run tests frequently: use `just test` after each change
- Maintain 100% coverage: all new code must be tested
- Use descriptive test names: tests should document behavior
Formatting & Linting:
```bash
# Format code
black .
isort .

# Check linting
ruff check .

# All checks
just lint
```
Testing:
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=asf_tools

# Run specific test file
pytest tests/test_specific_module.py

# Run tests with CLI output
just test-cli
```
Pre-commit Checklist:
- All tests pass (`just test`)
- Code is formatted (`black .`, `isort .`)
- No linting errors (`just lint`)
- New functionality has tests
- Documentation updated if needed
```text
asf_tools/
├── api/          # External API integrations
│   └── clarity/  # Clarity LIMS interface
├── config/       # Configuration management
├── database/     # Database models and operations
├── illumina/     # Illumina-specific data processing
├── io/           # File I/O and data management
├── nextflow/     # Nextflow pipeline generation
├── slack/        # Slack webhook notifications
├── slurm/        # SLURM job management
└── ssh/          # SSH connections and remote operations
```
Clarity LIMS Integration:
- `clarity_lims.py`: Direct API client for Clarity LIMS
- `clarity_helper_lims.py`: High-level wrapper with domain logic
- `models.py`: Pydantic models for data validation
Key functions:
- Sample metadata retrieval
- Barcode information extraction
- Project and user information
- Custom field parsing
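To illustrate the kind of domain logic the high-level wrapper adds, here is a hypothetical sketch of custom-field parsing. The record shape, field names, and the `parse_custom_fields` function are invented for illustration and do not reflect the real Clarity response format.

```python
def parse_custom_fields(sample_record):
    """Flatten a LIMS sample record's user-defined fields into plain key/value pairs.

    `sample_record` is a hypothetical dict shaped loosely like an API response.
    """
    udfs = sample_record.get("udf", {})
    return {
        "sample_name": sample_record.get("name"),
        "project": sample_record.get("project", {}).get("name"),
        # Barcode information is typically stored as a user-defined field
        "barcode": udfs.get("Index Sequence"),
        "library_type": udfs.get("Library Type"),
    }
```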
Storage Interface:
- Abstraction layer for local and remote file operations
- Supports SSH-based remote operations
- Handles permissions and directory creation
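One way to picture this abstraction layer (a sketch only; the class and method names are assumptions, not the real interface):

```python
import os
import subprocess
from abc import ABC, abstractmethod

class Storage(ABC):
    """Minimal storage interface: same calls, local or remote backend."""

    @abstractmethod
    def makedirs(self, path): ...

    @abstractmethod
    def exists(self, path): ...

class LocalStorage(Storage):
    def makedirs(self, path):
        os.makedirs(path, exist_ok=True)

    def exists(self, path):
        return os.path.exists(path)

class SshStorage(Storage):
    """Runs the same operations over ssh on a remote host (e.g. Nemo)."""

    def __init__(self, host):
        self.host = host

    def makedirs(self, path):
        subprocess.run(["ssh", self.host, "mkdir", "-p", path], check=True)

    def exists(self, path):
        return subprocess.run(["ssh", self.host, "test", "-e", path]).returncode == 0
```

Callers depend only on `Storage`, so the same delivery code can target a local filesystem or a remote one.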
Data Management:
- Pipeline state monitoring
- Run completion detection
- Data delivery automation
- Directory cleanup utilities
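As an example of completion detection, an ONT run can be considered finished once the sequencer writes its end-of-run summary file. The marker name `final_summary*.txt` is a MinKNOW convention and an assumption here, not necessarily what asf-tools checks.

```python
from pathlib import Path

def ont_run_complete(run_dir):
    """True once the sequencer has written its end-of-run summary file."""
    return any(Path(run_dir).glob("final_summary*.txt"))
```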
Pipeline Generators:
- `gen_ont_demux_run.py`: ONT demultiplexing pipeline setup
- `gen_illumina_demux_run.py`: Illumina demultiplexing pipeline setup
- `gen_viral_genomics_run.py`: Viral genomics samplesheet generation
Features:
- SLURM batch script generation
- Nextflow parameter management
- Container cache handling
- Module loading for HPC environments
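A generated launch script pairs a SLURM header with a `nextflow run` command and a module load for the HPC environment. A minimal sketch of the idea (the resource values, module name, and function are placeholders, not the actual template):

```python
def render_slurm_script(run_name, pipeline_dir, work_dir, nextflow_version="23.10.0"):
    """Build a SLURM batch script that launches a Nextflow pipeline."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={run_name}",
        "#SBATCH --time=24:00:00",
        "#SBATCH --mem=8G",
        f"module load Nextflow/{nextflow_version}",  # HPC environment module
        f"nextflow run {pipeline_dir} -work-dir {work_dir} -resume",
    ])
```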
SLURM Integration:
- Job status monitoring
- Queue management
- Resource allocation
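Job status monitoring typically boils down to parsing `squeue` output; a sketch under the assumption of a `squeue --noheader -o '%i %j %T'` format (the function and filtering are illustrative):

```python
def parse_squeue(output, job_prefix=""):
    """Parse `squeue --noheader -o '%i %j %T'` output into {job_id: state},
    keeping only jobs whose name starts with job_prefix."""
    jobs = {}
    for line in output.strip().splitlines():
        job_id, name, state = line.split()
        if name.startswith(job_prefix):
            jobs[job_id] = state
    return jobs
```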
SSH Operations:
- Remote file operations on Nemo
- Secure file transfer
- Remote command execution
- Raw Data Ingestion: Monitor sequencing instrument output
- Metadata Retrieval: Query Clarity LIMS for sample information
- Pipeline Setup: Generate Nextflow configurations and SLURM scripts
- Execution Monitoring: Track pipeline progress and job status
- Data Delivery: Create symlinks and deliver results to researchers
- Cleanup: Archive and cleanup temporary files
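In practice, one automation cycle strings the pipeline commands above together. A hedged sketch (all paths are placeholders, and the `run` helper defaults to printing each command rather than executing it; set `ASF_EXEC=1` to run for real):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry-run helper: prints commands unless ASF_EXEC=1 is set
run() { if [ "${ASF_EXEC:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

# 1. Set up demux runs for new flowcells, pulling samplesheets from LIMS
run asf-tools pipeline gen-demux-run --source_dir /data/ont/raw \
    --target_dir /data/ont/demux --mode ont --use_api \
    --pipeline_dir /pipelines/nanopore_demux --nextflow_cache /cache/nextflow \
    --nextflow_work /work/nextflow --container_cache /cache/singularity \
    --runs_dir /mnt/data/runs

# 2. Check sequencing/pipeline state and SLURM job status
run asf-tools pipeline scan-run-state --raw_dir /data/ont/raw \
    --run_dir /data/ont/demux --target_dir /delivery/ont --mode ont

# 3. Deliver finished results to researchers via symlinks
run asf-tools pipeline deliver-to-targets --source_dir /data/ont/demux \
    --target_dir /delivery/ont
```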
Configuration is managed through:
- Environment variables
- TOML configuration files (`asf_tools.config.toml_loader`)
- Command-line arguments with Click
- Container environment setup
Tests are organized in a flat structure mirroring the source code:
```text
tests/
├── test_api_clarity_lims.py            # API functionality
├── test_io_data_management.py          # Data management
├── test_nextflow_gen_ont_demux_run.py  # Pipeline generation
└── ...
```
Test Structure:
```python
class TestModuleName:
    def setup_method(self):
        # Per-test setup (fixtures, temp dirs, mocks)
        ...

    def test_specific_functionality(self):
        # Setup
        # Test
        # Assert using assert_that()
        ...
```
Assertions: use `assertpy` for readable assertions:
```python
from assertpy import assert_that

assert_that(result).is_equal_to(expected)
assert_that(file_path).exists()
assert_that(response.status_code).is_equal_to(200)
```
Mocking:
```python
import pytest
from assertpy import assert_that

# Use pytest fixtures and mocks
@pytest.fixture
def mock_api():
    return ClarityHelperLimsMock()

def test_with_mock(mock_api):
    result = process_data(mock_api)
    assert_that(result).is_not_none()
```
```bash
# All tests
pytest

# Specific module
pytest tests/test_io_data_management.py

# With coverage
pytest --cov=asf_tools --cov-report=html

# Verbose output
pytest -v

# Stop on first failure
pytest -x
```
```text
asf-tools/
├── asf_tools/          # Main source code
│   ├── __main__.py     # CLI entry point
│   ├── api/            # External integrations
│   ├── config/         # Configuration
│   ├── database/       # Database operations
│   ├── illumina/       # Illumina processing
│   ├── io/             # I/O operations
│   ├── nextflow/       # Pipeline generation
│   ├── slack/          # Notifications
│   ├── slurm/          # HPC integration
│   └── ssh/            # Remote operations
├── tests/              # Test suite (flat structure)
├── docs/               # Documentation
├── output/             # Runtime output (gitignored)
├── pyproject.toml      # Project configuration
├── justfile            # Task automation
├── Dockerfile          # Container build
├── pytest.ini          # Test configuration
├── uv.lock             # Dependency lock file
└── README.md           # This file
```
- `pyproject.toml`: Project metadata, dependencies, and tool configuration
- `justfile`: Development task automation (replaces Makefile)
- `asf_tools/__main__.py`: CLI entry point with Click framework
- `tests/`: Comprehensive test suite with pytest
- `Dockerfile`: Production container image
See LICENSE for details.
Maintainers:
- Chris Cheshire: chris.cheshire@crick.ac.uk
- Areda Elezi
Issues & Contributions: Please use the GitHub repository for bug reports, feature requests, and contributions.