Job processing (introduction)

The job processing pipeline

BOINC is designed for processing large numbers of jobs - possibly millions per project per day. Jobs move through a software pipeline:

[Figure: pipeline flow: work generator → database → scheduler → client → application → validator → assimilator]

The pipeline stages:

  • A work generator creates jobs and moves their input files into a 'staging area' from which they can be accessed via HTTP by worker nodes. Work generators are application-specific. They may work by creating batches of jobs, or by 'streaming' jobs into the pipeline at a rate that keeps it full but not overflowing (a minimal sketch of the streaming approach follows this list).
  • A scheduler (part of BOINC) dispatches jobs to clients.
  • The client downloads input files as needed, runs the application program, and uploads output files when the job is completed.
  • A validator examines output files and tries to verify that they are correct. These are in general application-specific. They may syntax-check individual output files, or they may compare pairs of output files to do replication-based validation.
  • An assimilator handles a completed and validated job. It's application-specific. It may move output files from BOINC's staging area to a final destination, or it may parse output files and insert their information into a database.
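
The 'streaming' style of work generator mentioned above amounts to a simple polling loop. Here is a minimal Python sketch, assuming a hypothetical count_unsent_jobs() database helper; bin/create_work is part of the standard BOINC server tools, but its flags should be checked against your version, and the app and file names are placeholders.

```python
import subprocess
import time

CUSHION = 1000       # keep at least this many jobs queued
BATCH_SIZE = 100     # create at most this many jobs per pass
POLL_SECONDS = 60

def count_unsent_jobs():
    """Hypothetical helper: query the project database for the number of
    job instances not yet dispatched to clients."""
    raise NotImplementedError

def create_job(i):
    # Create one job via BOINC's create_work tool (run from the project
    # root). The input file must already be staged in the download hierarchy.
    subprocess.run(
        ["bin/create_work", "--appname", "my_app", f"input_{i}.dat"],
        check=True,
    )

def main():
    i = 0
    while True:
        shortfall = CUSHION - count_unsent_jobs()
        for _ in range(max(0, min(shortfall, BATCH_SIZE))):
            create_job(i)
            i += 1
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```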

The application-specific parts of the pipeline can be written in C++ (for maximum performance) or in scripting languages like Python and PHP.
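
For example, the comparison step of a replication-based validator can often be a short script. The following Python sketch assumes a made-up output format (one number per line) and an assumed tolerance; it compares two output files and exits 0 if they agree, which matches the general shape of scripts used with BOINC's script-based validator (check the calling convention for your version).

```python
import sys

def load_values(path):
    # Placeholder parse: one floating-point value per line.
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

def equivalent(a, b, tol=1e-6):
    # Fuzzy comparison: numerical results may differ slightly across
    # platforms, so exact byte equality is usually too strict.
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

a = load_values(sys.argv[1])
b = load_values(sys.argv[2])
sys.exit(0 if equivalent(a, b) else 1)
```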

Staging input and output files

BOINC stores input and output files in staging areas on the server, which clients can access via HTTP. These areas are 2-level directory hierarchies, one for uploads and one for downloads. Each hierarchy has 1024 subdirectories (named '0' to '3ff'). A file is placed in a subdirectory based on the MD5 hash of its filename.

This was done because projects can have large numbers (millions) of files at a given time, and some filesystems become extremely slow when a single directory contains that many files.
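
The placement rule can be mirrored in a few lines of Python. This is an illustrative sketch, not BOINC's actual implementation (the server's dir_hier_path tool computes these paths for you):

```python
import hashlib
import os

FANOUT = 1024   # subdirectories named '0' .. '3ff'

def staging_path(base_dir, filename):
    # Pick a subdirectory from the MD5 hash of the file name, so files
    # spread evenly across the hierarchy.
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    return os.path.join(base_dir, "%x" % (h % FANOUT), filename)

print(staging_path("download", "job_42_input.dat"))
```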

Before a job is submitted, its input files are moved or copied to the download staging area. When a client completes a job, it uploads the output files to the appropriate place in the upload staging area.
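
In practice you normally let the standard tools compute these paths. As a hedged sketch, staging a single input file via the stock stage_file tool might look like this (run from the project root; the file name is a placeholder and the tool's options should be checked against your version):

```python
import subprocess

# Copy an input file into the correct fan-out subdirectory of the
# download hierarchy.
subprocess.run(["bin/stage_file", "input_42.dat"], check=True)
```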

Batches

For convenience, jobs may be collected into batches. Batches may be monitored and controlled as a unit.

Ownership of jobs and batches

In projects with multiple job submitters, each submitter is identified by their user account on the project (the same kind of account that volunteers have; job submitters can also act as resource providers). A batch is associated with the user ID of its submitter.

Interfaces for job submission and file management

There are three general ways to submit jobs and manage the associated files:

Server-level

In this approach, job submitters (scientists) log in to the server and submit jobs by running command-line programs.

Remote via RPC

BOINC provides a set of XML/HTTP RPCs for managing files and submitting jobs. These RPCs have Python bindings. This lets you build systems that allow scientists to process jobs without logging in to the BOINC server.
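
For example, here is a hedged sketch of batch submission via the Python bindings. The class and function names (BATCH_DESC, JOB_DESC, FILE_DESC, submit_batch) follow the submit_api module in the BOINC source tree, but should be verified against your version; the project URL, authenticator, app name, and file names are placeholders.

```python
from submit_api import BATCH_DESC, JOB_DESC, FILE_DESC, submit_batch

batch = BATCH_DESC()
batch.project = 'https://example.edu/myproject/'  # placeholder project URL
batch.authenticator = 'YOUR_ACCOUNT_KEY'          # submitter's account key
batch.app_name = 'my_app'                         # placeholder app name
batch.batch_name = 'batch_1'
batch.jobs = []

for i in range(10):
    f = FILE_DESC()
    f.mode = 'local_staged'        # file already in the download hierarchy
    f.source = 'in_%d.dat' % i     # physical name of the staged file
    job = JOB_DESC()
    job.files = [f]
    batch.jobs.append(job)

reply = submit_batch(batch)        # returns parsed XML; check it for errors
print(reply)
```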

Remote via web interface

You can develop application-specific web interfaces that run on the BOINC project server, and that allow scientists to manage files and submit jobs remotely via a web browser.

Failures and retries

A job can fail on a BOINC worker node for a variety of reasons:

  • The application crashes.
  • The user on that node aborts the job.
  • The job exceeds its memory or disk space limits.
  • The job times out.

In some cases the job would succeed on a different node.

By default, jobs are not retried. But you can configure an app so that jobs are retried up to some limit: if a job fails on a node, a second copy (or 'instance') of the job is sent to a different node. This is repeated until an instance succeeds, or until the limit is reached, in which case the job is marked as failed and no further instances are created.
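
One way to set these limits is per job, at creation time. A hedged sketch using the standard create_work tool (flag names should be verified against your server version; the app and input file names are placeholders):

```python
import subprocess

# Create a job whose instances may be retried: give up after 3 failed
# instances, and cap the total number of instances at 5.
subprocess.run(
    ["bin/create_work",
     "--appname", "my_app",
     "--max_error_results", "3",
     "--max_total_results", "5",
     "input_file.dat"],
    check=True,
)
```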
