Job processing (cookbook)
This cookbook shows how to create job processing components (work generator, validator, and assimilator) in Python.
We'll use the same example application as in the previous cookbooks:
it takes a text file and converts it to uppercase.
We'll assume that you've already created a project
and deployed this app (perhaps using VirtualBox),
and that the app name is worker.
In this example, you can create a directory containing input files - potentially thousands of them. Then - with one command - you can create a batch of jobs, one per input file.
The validator will make sure that output files are uppercase. We'll use two-fold replication, and will accept results only if the two instances produce the same result.
The assimilator will put the output files into a new directory.
This is, of course, a toy example. But it should be straightforward to use the mechanisms to handle real applications.
- If you haven't already done so, create an account on the BOINC project. Make a note of your user ID (an integer, shown on your user page).
- Go to the project's admin web page.
- Click 'User job submission privileges'.
- Click 'Add user'.
- Enter your user ID and click OK.
- Select 'All apps' and click OK.
We'll use an existing script called demo_submit_batch, which is used as follows:
bin/demo_submit_batch user_id app_name infile_dir
It creates a batch of jobs for the given app, owned by the given user. It creates one job for each file in the given directory (the app is assumed to take one input file).
The source for demo_submit_batch is here.
Let's look at how it works, so that you can adapt it to your own apps.
files = []
for entry in os.scandir(dir):
    if not entry.is_file():
        raise Exception('not file')
    files.append(entry.name)
This scans the input file directory and makes a list of the files it contains.
cmd = [
    'bin/create_batch',
    '--app_name', app_name,
    '--user_id', str(user_id),
    '--njobs', str(len(files)),
    '--name', '%s__%d'%(app_name, int(time.time()))
]
ret = subprocess.run(cmd, capture_output=True)
if ret.returncode:
    raise Exception('create_batch failed (%d): %s'%(ret.returncode, ret.stdout))
batch_id = int(ret.stdout)
This creates a batch by running a BOINC-supplied program, create_batch.
It parses the batch ID written by this program.
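The call int(ret.stdout) relies on create_batch printing nothing but the ID. As a minimal sketch under that same assumption, a slightly more defensive parse could look like the following. The helper name parse_batch_id is hypothetical and is not part of BOINC or of demo_submit_batch.

```python
def parse_batch_id(stdout):
    """Return the batch ID from create_batch output (bytes or str).

    Assumes, as the script above does, that the output is just the
    integer ID, possibly surrounded by whitespace.
    parse_batch_id is a hypothetical helper, not a BOINC function.
    """
    text = stdout.decode() if isinstance(stdout, bytes) else stdout
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        # Report what was actually printed instead of a bare int() error
        raise ValueError('unexpected create_batch output: %r' % text) from None
```

With this, batch_id = parse_batch_id(ret.stdout) behaves like the original line on success, but a non-numeric message from create_batch shows up in the exception text.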
cmd = ['bin/stage_file', '--copy', dir]
ret = subprocess.run(cmd, capture_output=True)
if ret.returncode:
    raise Exception('stage_file failed (%d): %s'%(ret.returncode, ret.stdout))
This stages the input files, copying them from the input file directory to the project's download hierarchy.
fstr = '\n'.join(files)
cmd = [
    'bin/create_work',
    '--appname', app_name,
    '--batch', str(batch_id),
    '--stdin'
]
ret = subprocess.run(cmd, input=fstr, capture_output=True, encoding='ascii')
if ret.returncode:
    raise Exception('create_work failed (%d): %s'%(ret.returncode, ret.stdout))
This creates the jobs using a BOINC-supplied program, create_work.
The --stdin option tells create_work that job descriptions will be passed via stdin, one per line.
In this case the job description is just the name of the input file.
It could also include command-line parameters; see details.
cmd = ['bin/create_work', '--enable', str(batch_id)]
ret = subprocess.run(cmd, capture_output=True)
if ret.returncode:
    raise Exception('enable batch failed (%d): %s'%(ret.returncode, ret.stdout))
This marks the batch as 'in progress'.
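Each of the steps above repeats the same pattern: run a command, check the return code, raise on failure. When adapting the script to your own app, that pattern could be factored into one function. The following is only a sketch: run_checked is a hypothetical helper (it is not in demo_submit_batch), and it uses a stand-in command instead of the BOINC programs so that it runs anywhere.

```python
import subprocess
import sys

def run_checked(cmd, stdin_text=None):
    """Run cmd and return its stdout as text; raise if it exits nonzero.

    run_checked is a hypothetical helper, not part of BOINC.
    """
    ret = subprocess.run(cmd, input=stdin_text, capture_output=True, text=True)
    if ret.returncode:
        # Prefer stderr in the message; fall back to stdout if stderr is empty
        detail = ret.stderr or ret.stdout
        raise Exception('%s failed (%d): %s' % (cmd[0], ret.returncode, detail))
    return ret.stdout

# Stand-in for a call such as run_checked(['bin/create_batch', ...])
out = run_checked([sys.executable, '-c', 'print(42)'])
print(int(out))  # 42
```

Unlike the original, this sketch reports stderr when it is non-empty, on the assumption that error messages are more likely to be written there.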
Our validator will consist of two Python scripts. The first one checks whether a file is uppercase and exits 0 or 1 accordingly.
import sys

def is_uc(path):
    with open(path) as f:
        data = f.read()
    return data == data.upper()

# Exit 0 if the file is entirely uppercase, else 1
sys.exit(0 if is_uc(sys.argv[1]) else 1)
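To see what this check accepts, here is a small sketch that runs the same is_uc logic on a few temporary files. The check_text helper is hypothetical and exists only for this illustration.

```python
import os
import tempfile

def is_uc(path):
    with open(path) as f:
        data = f.read()
    return data == data.upper()

def check_text(text):
    """Write text to a temporary file and run is_uc on it (illustration only)."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(text)
        return is_uc(path)
    finally:
        os.remove(path)

print(check_text('HELLO WORLD 123\n'))  # True: digits and punctuation are unaffected by upper()
print(check_text('Hello world\n'))      # False: contains lowercase letters
print(check_text(''))                   # True: an empty file passes the check
```

Note that an empty output file counts as uppercase. For a real application you would probably want the first script to reject empty or truncated output as well.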
The second one checks whether two files are identical:
import sys

def read(path):
    with open(path) as f:
        return f.read()

# Exit 0 if the two files have identical contents, else 1
sys.exit(0 if read(sys.argv[1]) == read(sys.argv[2]) else 1)
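Taken together, the two scripts express the acceptance rule described at the start: each output must be uppercase, and the two replicated instances must agree. The sketch below states that rule as a single function on in-memory strings. It is a simplified illustration, not the validator daemon's actual logic, and accept_pair is a hypothetical name.

```python
def is_uc_text(data):
    """Same test as the first script, applied to a string."""
    return data == data.upper()

def accept_pair(out1, out2):
    """Return True if both outputs are uppercase and identical.

    accept_pair is a hypothetical illustration of the acceptance rule;
    the real decision is made by the validator daemon using the two scripts.
    """
    return is_uc_text(out1) and is_uc_text(out2) and out1 == out2

print(accept_pair('ABC\n', 'ABC\n'))  # True: both uppercase, both equal
print(accept_pair('ABC\n', 'ABD\n'))  # False: instances disagree
print(accept_pair('abc\n', 'abc\n'))  # False: equal but not uppercase
```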
These scripts are in the BOINC source code tree, in samples/worker.
Copy them to ~/projects/test/bin.
Add the following to ~/projects/test/config.xml, in the <daemons> section:
<daemon>
    <cmd>script_validator --app worker --init_script "validate_init.py" --compare_script "validate_compare.py"</cmd>
    <output>validator_worker.out</output>
    <pid_file>validator_worker.pid</pid_file>
</daemon>
On the BOINC server:
cd ~/projects/test
mkdir infiles
Then put some text files into infiles/.
Use as many as you want; their length doesn't matter.
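If you don't have text files at hand, a few lines of Python can generate some. This is only a sketch: the make_sample_inputs helper, the file names, and the file contents are all made up for illustration. The demo at the end writes into a temporary directory; on the server you would pass 'infiles' instead.

```python
import os
import tempfile

def make_sample_inputs(dirname, n):
    """Create n small lowercase text files in dirname and return their names.

    Hypothetical helper; names and contents are arbitrary.
    """
    os.makedirs(dirname, exist_ok=True)
    names = []
    for i in range(n):
        name = 'input_%03d.txt' % i
        with open(os.path.join(dirname, name), 'w') as f:
            f.write('this is sample input file number %d\n' % i)
        names.append(name)
    return names

# Demo in a throwaway directory so the sketch has no lasting side effects
with tempfile.TemporaryDirectory() as tmp:
    names = make_sample_inputs(os.path.join(tmp, 'infiles'), 3)
    print(names)  # ['input_000.txt', 'input_001.txt', 'input_002.txt']
```

Run from ~/projects/test, make_sample_inputs('infiles', 100) would fill the directory created above with 100 files.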