Job processing (introduction)

David Anderson edited this page Jan 16, 2024 · 10 revisions

Staging input and output files

Batches

Pipeline components

  • work generator
  • validator
  • assimilator

Job submission (and file management)

  • local
  • remote via RPC
  • Python bindings
  • remote via web interface

Failures and retries

A job can fail on a BOINC worker node for a variety of reasons:

  • The application crashes.
  • The user on that node aborts the job.
  • The job exceeds its memory or disk space limits.
  • The job times out.

In some cases the job would succeed on a different node, so BOINC provides a 'retry' mechanism: if a job fails on a node, a second copy (or 'instance') of the job is sent to a different node. This is repeated until an instance succeeds or until a limit on the number of instances is reached. In the latter case the job is marked as failed and no further instances are created.
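The retry logic described above can be sketched as a small simulation. This is an illustrative model only, not BOINC's scheduler code: the `Job` class, the `max_instances` field, the `run_with_retries` function, and the `run_instance` callback are all hypothetical names introduced for this example. It also simplifies by treating the supply of nodes as a fixed list of distinct names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical model of a job; not a BOINC data structure."""
    name: str
    max_instances: int                      # assumed limit on instances
    attempts: List[str] = field(default_factory=list)  # nodes tried so far
    state: str = "pending"


def run_with_retries(
    job: Job,
    nodes: List[str],
    run_instance: Callable[[str, str], bool],
) -> Job:
    """Send instances of `job` to distinct nodes until one succeeds
    or the instance limit is reached.

    `nodes` is assumed to contain distinct node names, so each new
    instance goes to a node that has not run this job before.
    `run_instance(job_name, node)` returns True on success.
    """
    for node in nodes:
        if len(job.attempts) >= job.max_instances:
            break                           # limit reached: stop creating instances
        job.attempts.append(node)
        if run_instance(job.name, node):
            job.state = "succeeded"
            return job
    # Simplification: running out of nodes is treated like hitting the limit.
    job.state = "failed"
    return job


if __name__ == "__main__":
    # Pretend the job only succeeds on node "c".
    result = run_with_retries(
        Job("example_job", max_instances=3),
        ["a", "b", "c", "d"],
        lambda job_name, node: node == "c",
    )
    print(result.state, result.attempts)
```

With a limit of 3 the example job succeeds on its third instance. With a limit of 2 it would be marked as failed after the instances on nodes "a" and "b", and no instance would be sent to "c".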
