asf-tools

A comprehensive Python toolkit for Advanced Sequencing Facility (ASF) operations at the Francis Crick Institute.

Overview

ASF Tools is a Python-based command-line application that streamlines and automates repetitive tasks within ASF operations. It provides utilities for:

  • Sequencing Data Management: Processing and organizing Illumina and Oxford Nanopore (ONT) sequencing data
  • LIMS Integration: Interfacing with Clarity LIMS for sample metadata and barcode information
  • Pipeline Automation: Creating and managing Nextflow pipeline runs for demultiplexing and analysis
  • Data Delivery: Automated symlink creation and data delivery to researchers
  • Infrastructure Management: SLURM job monitoring and SSH-based operations on Nemo

Authors: Chris Cheshire, Areda Elezi
Repository: github.com/FrancisCrickInstitute/asf-tools


User Guide

Production Usage

ASF Tools is deployed as a containerized application on Nemo. The recommended approach for production use is via the automation scripts in asf-automation-scripts.

Running via Automation Scripts

All operations must be run from the scripts folder where the config.sh file is located:

cd asf-automation-scripts/scripts
./asf_tools.sh [COMMAND] [OPTIONS]

Direct CLI Usage

For development or direct access:

# Activate the environment and sync dependencies
source .venv/bin/activate && uv sync --group dev

# Run commands
asf-tools pipeline [COMMAND] [OPTIONS]

CLI Commands

All pipeline commands are accessed via the pipeline subcommand:

asf-tools pipeline [COMMAND] [OPTIONS]

Data Pipeline Management

gen-demux-run

Creates run directories and SLURM batch scripts for demultiplexing pipelines. Supports both ONT and Illumina modes.

asf-tools pipeline gen-demux-run \
  --source_dir /path/to/raw/data \
  --target_dir /path/to/pipeline/runs \
  --mode ont \
  --pipeline_dir /path/to/nextflow/pipeline \
  --nextflow_cache /path/to/nf/cache \
  --nextflow_work /path/to/nf/work \
  --container_cache /path/to/singularity/cache \
  --runs_dir /host/path/to/runs

Required Options:

  • --source_dir: Directory containing raw sequencing data
  • --target_dir: Directory where pipeline runs will be created
  • --mode: Data type (ont, illumina, or general)
  • --pipeline_dir: Path to Nextflow pipeline code
  • --nextflow_cache: Nextflow cache directory
  • --nextflow_work: Nextflow work directory
  • --container_cache: Singularity container cache directory
  • --runs_dir: Host path for runs folder (for containerized environments)

Optional Flags:

  • --use_api: Generate samplesheets using the Clarity LIMS API
  • --contains TEXT: Filter runs by substring in folder name
  • --samplesheet_only: Only update samplesheets, don't create new runs
  • --nextflow_version VERSION: Override default Nextflow version in SLURM header

Example - ONT demultiplexing with LIMS integration:

asf-tools pipeline gen-demux-run \
  --source_dir /data/ont/raw \
  --target_dir /data/ont/demux \
  --mode ont \
  --pipeline_dir /pipelines/nanopore_demux \
  --nextflow_cache /cache/nextflow \
  --nextflow_work /work/nextflow \
  --container_cache /cache/singularity \
  --runs_dir /mnt/data/runs \
  --use_api \
  --contains "PAK"

deliver-to-targets

Creates symlinks to deliver processed data to researcher directories.

asf-tools pipeline deliver-to-targets \
  --source_dir /path/to/processed/data \
  --target_dir /path/to/delivery/area

Required Options:

  • --source_dir: Source directory (run directory for non-interactive, parent directory for interactive)
  • --target_dir: Target delivery directory

Optional Flags:

  • --host_delivery_folder: Host path for delivery when running in container
  • --interactive: Run in interactive mode to manually select runs

Example - Interactive delivery:

asf-tools pipeline deliver-to-targets \
  --source_dir /data/ont/demux \
  --target_dir /delivery/ont \
  --interactive
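
Delivery is symlink-based: the processed data stays in place and only links are created in the delivery area. A minimal sketch of that idea in Python, assuming a flat run layout; the function below is illustrative, not the tool's actual implementation:

from pathlib import Path

def deliver_run(source_dir: str, target_dir: str) -> None:
    # Symlink each item in a processed run into the delivery area
    source = Path(source_dir).resolve()
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for item in source.iterdir():
        link = target / item.name
        if not link.exists():
            link.symlink_to(item)  # data stays in place; only the link is delivered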

scan-run-state

Monitors the status of sequencing and pipeline runs, checking completion states and SLURM job status.

asf-tools pipeline scan-run-state \
  --raw_dir /path/to/raw/data \
  --run_dir /path/to/pipeline/runs \
  --target_dir /path/to/delivery/area \
  --mode ont

Required Options:

  • --raw_dir: Directory containing raw sequencing data
  • --run_dir: Directory containing pipeline runs
  • --target_dir: Data delivery directory
  • --mode: Data type (ont, illumina, or general)

Optional Flags:

  • --slurm_user: SLURM username for job status checking
  • --job_prefix: SLURM job name prefix for filtering
  • --slurm_file: Path to SLURM job output file

Samplesheet Generation

gen-viral-genomics-samplesheet

Generates samplesheets for viral genomics pipelines from FASTQ file directories.

asf-tools pipeline gen-viral-genomics-samplesheet \
  --source_dir /path/to/fastq/files \
  --target_dir /path/to/output \
  --curr-prefix /old/path/prefix \
  --new-prefix /new/path/prefix

Required Options:

  • --source_dir: Directory containing FASTQ files
  • --target_dir: Directory to write the samplesheet

Optional Flags:

  • --curr-prefix: Current path prefix to replace in FASTQ file paths
  • --new-prefix: New path prefix to substitute

Behavior (see the sketch below):

  • Creates CSV samplesheet with sample metadata
  • Each (sample_id, lane) pair becomes a row
  • Automatically detects paired-end reads
  • Sorts output by sample ID and read paths for consistency
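
A minimal sketch of the pairing logic described above. The Illumina-style filename convention and the column set are assumptions for illustration, not the pipeline's exact format:

import csv
import re
from pathlib import Path

# Assumed naming convention: <sample>_L<lane>_R<read>_001.fastq.gz
PATTERN = re.compile(r"(?P<sample>.+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$")

def build_rows(source_dir: str) -> list[dict]:
    rows: dict[tuple, dict] = {}
    for fq in sorted(Path(source_dir).glob("*.fastq.gz")):
        m = PATTERN.match(fq.name)
        if not m:
            continue
        key = (m["sample"], m["lane"])  # one row per (sample_id, lane) pair
        row = rows.setdefault(key, {"sample": m["sample"], "lane": m["lane"], "fastq_1": "", "fastq_2": ""})
        row["fastq_1" if m["read"] == "1" else "fastq_2"] = str(fq)  # paired-end detection
    # Sort by sample ID and read path for consistent output
    return sorted(rows.values(), key=lambda r: (r["sample"], r["fastq_1"]))

def write_samplesheet(rows: list[dict], out_path: str) -> None:
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sample", "lane", "fastq_1", "fastq_2"])
        writer.writeheader()
        writer.writerows(rows)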

Data Upload

upload-report

Uploads analysis reports and metadata to database tables.

asf-tools pipeline upload-report \
  --data-file /path/to/report.pkl \
  --run-id RUN123 \
  --report-type quality_metrics \
  --upload-table reports_table

Required Options:

  • --data-file: Path to pickle file containing report data (see the sketch below)
  • --run-id: Unique run identifier
  • --report-type: Type of report being uploaded
  • --upload-table: Target database table

Optional Flags:

  • --table_override: Override default table suffix
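
The --data-file argument expects a Python pickle. A minimal sketch of producing one; the dictionary layout here is an assumption for illustration, not the required schema:

import pickle

# Illustrative payload only; the real schema is defined by the upload code
report_data = {
    "total_reads": 1_250_000,
    "mean_quality": 32.4,
}

with open("report.pkl", "wb") as handle:
    pickle.dump(report_data, handle)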

Developer Guide

Installation & Setup

Requirements

  • Python: 3.13+ (managed via asdf or pyenv)
  • UV: For fast dependency management
  • Just: For task automation
  • Operating System: Linux or macOS

Quick Setup

# Clone the repository
git clone https://github.com/FrancisCrickInstitute/asf-tools.git
cd asf-tools

# Set up development environment (creates .venv automatically)
just dev

The just dev command will:

  1. Create a .venv virtual environment if it doesn't exist
  2. Install all dependencies including development tools
  3. Activate the environment and spawn a new shell

Manual Setup

# Create virtual environment
uv venv .venv
source .venv/bin/activate

# Install dependencies
uv sync --group dev

# Verify installation
python -c "import asf_tools; print('Installation successful')"

Available Just Commands

just dev          # Set up development environment
just test         # Run pytest suite
just test-cli     # Run tests with CLI output
just lint         # Run ruff linting
just python-upgrade  # Upgrade Python version

Development Workflow

Test-Driven Development

This project follows strict TDD practices:

  1. Write tests first - Before implementing any feature
  2. Run tests frequently - Use just test after each change
  3. Maintain 100% coverage - All new code must be tested
  4. Use descriptive test names - Tests should document behavior

Code Quality Standards

Formatting & Linting:

# Format code
black .
isort .

# Check linting
ruff check .

# All checks
just lint

Testing:

# Run all tests
pytest

# Run with coverage
pytest --cov=asf_tools

# Run specific test file
pytest tests/test_specific_module.py

# Run tests with CLI output
just test-cli

Pre-commit Checklist:

  • All tests pass (just test)
  • Code is formatted (black ., isort .)
  • No linting errors (just lint)
  • New functionality has tests
  • Documentation updated if needed

Architecture

Module Overview

asf_tools/
├── api/                 # External API integrations
│   └── clarity/         # Clarity LIMS interface
├── config/              # Configuration management
├── database/            # Database models and operations
├── illumina/            # Illumina-specific data processing
├── io/                  # File I/O and data management
├── nextflow/            # Nextflow pipeline generation
├── slack/               # Slack webhook notifications
├── slurm/               # SLURM job management
└── ssh/                 # SSH connections and remote operations

Core Components

API Layer (asf_tools.api)

Clarity LIMS Integration:

  • clarity_lims.py: Direct API client for Clarity LIMS
  • clarity_helper_lims.py: High-level wrapper with domain logic
  • models.py: Pydantic models for data validation

Key functions (hypothetical usage sketch below):

  • Sample metadata retrieval
  • Barcode information extraction
  • Project and user information
  • Custom field parsing
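
A hypothetical usage sketch of the high-level wrapper. The import path, class, method, and field names below are illustrative assumptions, not the module's actual interface:

# All names here are assumptions for illustration
from asf_tools.api.clarity.clarity_helper_lims import ClarityHelperLims

lims = ClarityHelperLims()  # credentials typically come from config or environment
samples = lims.get_samples_for_run("RUN123")  # hypothetical method name
for sample in samples:
    print(sample.id, sample.barcode)  # hypothetical fields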

Data Management (asf_tools.io)

Storage Interface (sketched below):

  • Abstraction layer for local and remote file operations
  • Supports SSH-based remote operations
  • Handles permissions and directory creation
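
The local/remote split can be pictured as a shared interface. The Protocol below is an illustrative sketch of such an abstraction, not the module's actual class hierarchy:

from typing import Protocol

class Storage(Protocol):
    # Illustrative interface; asf_tools.io defines its own abstraction
    def exists(self, path: str) -> bool: ...
    def mkdir(self, path: str) -> None: ...
    def symlink(self, source: str, link: str) -> None: ...

# A local implementation would wrap pathlib/os calls; an SSH-backed one
# would run the equivalent commands remotely on Nemo.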

Data Management:

  • Pipeline state monitoring
  • Run completion detection
  • Data delivery automation
  • Directory cleanup utilities

Pipeline Integration (asf_tools.nextflow)

Pipeline Generators:

  • gen_ont_demux_run.py: ONT demultiplexing pipeline setup
  • gen_illumina_demux_run.py: Illumina demultiplexing pipeline setup
  • gen_viral_genomics_run.py: Viral genomics samplesheet generation

Features:

  • SLURM batch script generation (sketched after this list)
  • Nextflow parameter management
  • Container cache handling
  • Module loading for HPC environments
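
A minimal sketch of how a SLURM header might be templated, as referenced above. The directives, default values, and module name are placeholders, not the generator's real output:

# All directive values below are illustrative placeholders
SLURM_HEADER = """\
#!/bin/bash
#SBATCH --job-name={job_name}
#SBATCH --time={time_limit}
#SBATCH --mem={memory}

module load Nextflow/{nextflow_version}
"""

def render_header(job_name: str, nextflow_version: str) -> str:
    # Defaults here are illustrative, not the generator's real ones
    return SLURM_HEADER.format(
        job_name=job_name,
        time_limit="24:00:00",
        memory="8G",
        nextflow_version=nextflow_version,
    )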

Infrastructure (asf_tools.slurm, asf_tools.ssh)

SLURM Integration:

  • Job status monitoring
  • Queue management
  • Resource allocation

SSH Operations (sketched below):

  • Remote file operations on Nemo
  • Secure file transfer
  • Remote command execution
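
A minimal sketch of a remote SLURM status check over SSH, using paramiko as an illustrative client library; whether the ssh module actually uses paramiko, and the host and user names below, are assumptions:

import paramiko

# Host and username are placeholders; real connection details come from config
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("nemo.example.org", username="asf")

# e.g. check SLURM queue state remotely
_, stdout, _ = client.exec_command("squeue --me")
print(stdout.read().decode())
client.close()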

Data Flow

  1. Raw Data Ingestion: Monitor sequencing instrument output
  2. Metadata Retrieval: Query Clarity LIMS for sample information
  3. Pipeline Setup: Generate Nextflow configurations and SLURM scripts
  4. Execution Monitoring: Track pipeline progress and job status
  5. Data Delivery: Create symlinks and deliver results to researchers
  6. Cleanup: Archive and clean up temporary files

Configuration Management

Configuration is managed through:

  • Environment variables
  • TOML configuration files (asf_tools.config.toml_loader; loading sketched below)
  • Command-line arguments with Click
  • Container environment setup
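
TOML loading follows the standard-library pattern. A minimal sketch, assuming a file named config.toml; the keys are examples, not the tool's actual schema:

import tomllib  # standard library on Python 3.11+

with open("config.toml", "rb") as handle:  # tomllib requires binary mode
    config = tomllib.load(handle)

# Example keys only; the real schema lives in asf_tools.config
runs_dir = config["paths"]["runs_dir"]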

Testing

Test Organization

Tests are organized in a flat structure mirroring the source code:

tests/
├── test_api_clarity_lims.py           # API functionality
├── test_io_data_management.py         # Data management
├── test_nextflow_gen_ont_demux_run.py # Pipeline generation
└── ...

Testing Guidelines

Test Structure:

class TestModuleName:
    def setup_method(self):
        # Create shared fixtures before each test
        ...

    def test_specific_functionality(self):
        # Arrange inputs, exercise the code, then assert using assert_that()
        ...

Assertions: Use assertpy for readable assertions:

from assertpy import assert_that

assert_that(result).is_equal_to(expected)
assert_that(file_path).exists()
assert_that(response.status_code).is_equal_to(200)

Mocking:

import pytest
from assertpy import assert_that

# Use pytest fixtures and mocks; ClarityHelperLimsMock and process_data
# are illustrative stand-ins for your own test doubles and code under test
@pytest.fixture
def mock_api():
    return ClarityHelperLimsMock()

def test_with_mock(mock_api):
    result = process_data(mock_api)
    assert_that(result).is_not_none()

Running Tests

# All tests
pytest

# Specific module
pytest tests/test_io_data_management.py

# With coverage
pytest --cov=asf_tools --cov-report=html

# Verbose output
pytest -v

# Stop on first failure
pytest -x

Project Structure

asf-tools/
├── asf_tools/              # Main source code
│   ├── __main__.py         # CLI entry point
│   ├── api/                # External integrations
│   ├── config/             # Configuration
│   ├── database/           # Database operations
│   ├── illumina/           # Illumina processing
│   ├── io/                 # I/O operations
│   ├── nextflow/           # Pipeline generation
│   ├── slack/              # Notifications
│   ├── slurm/              # HPC integration
│   └── ssh/                # Remote operations
├── tests/                  # Test suite (flat structure)
├── docs/                   # Documentation
├── output/                 # Runtime output (gitignored)
├── pyproject.toml          # Project configuration
├── justfile                # Task automation
├── Dockerfile              # Container build
├── pytest.ini              # Test configuration
├── uv.lock                 # Dependency lock file
└── README.md               # This file

Key Files

  • pyproject.toml: Project metadata, dependencies, and tool configuration
  • justfile: Development task automation (replaces Makefile)
  • asf_tools/__main__.py: CLI entry point with Click framework
  • tests/: Comprehensive test suite with pytest
  • Dockerfile: Production container image

License

See LICENSE for details.

Contact

Issues & Contributions: Please use the GitHub repository for bug reports, feature requests, and contributions.
