Often the need arises to perform a series of tasks that are related to one another. For example, you might have a workflow that’s something like “do task A, then do task B, and then do task C”. These dependencies, task A must be done before task B and task B must be done before task C, can be encapsulated in a directed acyclic graph (DAG). There are many libraries and frameworks for constructing and executing DAGs. One in particular, HTCondor, I utilize in my work.
This tutorial walks through constructing an example DAG (shown below) using PyCondor, a Python package for building workflows that can be submitted to an HTCondor cluster. This example DAG can be found in the PyCondor example scripts on GitHub.
from pycondor import Job, Dagman
# Define the error, output, log, and submit directories
error = 'condor/error'
output = 'condor/output'
log = 'condor/log'
submit = 'condor/submit'
# Instantiate a Dagman
dagman = Dagman(name='tutorial_dagman',
submit=submit)
# Instantiate Jobs
job_date = Job(name='date_job',
executable='/bin/date',
submit=submit,
error=error,
output=output,
log=log,
dag=dagman)
job_sleep = Job(name='sleep_job',
executable='/bin/sleep',
submit=submit,
error=error,
output=output,
log=log,
dag=dagman)
job_sleep.add_arg('1')
job_sleep.add_arg('2')
job_sleep.add_arg('3')
# Add inter-job relationship
# Ensure that job_sleep finishes before job_date starts
job_sleep.add_child(job_date)
# Write all necessary submit files and submit job to HTCondor
dagman.build_submit()
Job and Dagman objects
The basic building blocks in PyCondor are the Job and Dagman objects. A Job object represents an executable (e.g. a shell command, Python script, etc.) that you would like to run on an HTCondor cluster. While the Dagman object represents a collection of Jobs to be run.
from pycondor import Job, Dagman
Both the Job and Dagman objects can be imported directly from pycondor
.
Job and Dagman file directories
There are several files that will be generated for both Job and Dagman objects. For each Job and Dagman object, PyCondor will create a submit file. This file will be formatted such that it can be submitted to HTCondor for execution. In addition to submit files, log files and files containing the standard output and error for each Job will also be generated.
# Define the error, output, log, and submit directories
error = 'condor/error'
output = 'condor/output'
log = 'condor/log'
submit = 'condor/submit'
For this tutorial, we will explicitly specify the directories that we would like the error, output, log, and submit files to be saved to. If not specified, the current working directory will be used.
Note that these file directories can also be specified by setting the PYCONDOR_SUBMIT_DIR
, PYCONDOR_ERROR_DIR
, PYCONDOR_LOG_DIR
, and PYCONDOR_OUTPUT_DIR
environment variables. For example, setting PYCONDOR_SUBMIT_DIR=condor/submit
is equivalent to the above.
Setting up a Dagman
The Dagman object (short for directed acyclic graph manager) represents a collection of Jobs to be run.
# Instantiate a Dagman
dagman = Dagman(name='tutorial_dagman',
submit=submit)
For a Dagman, only a name
has to be provided (used to construct the submit, log, etc. file names). While the submit
parameter is the directory path where the Dagman submit file will be saved (if not provided, it defaults to the current directory).
Setting up Jobs
Now we’re ready to add some Jobs to the Dagman. Both a name
and an executable
must be provided to create a Job.
# Instantiate Jobs
job_date = Job(name='date_job',
executable='/bin/date',
submit=submit,
error=error,
output=output,
log=log,
dag=dagman)
job_sleep = Job(name='sleep_job',
executable='/bin/sleep',
submit=submit,
error=error,
output=output,
log=log,
dag=dagman)
job_sleep.add_arg('1')
job_sleep.add_arg('2')
job_sleep.add_arg('3')
In this example, job_date
will run the shell date
command, and job_sleep
will run the shell sleep
command. A Job can be added to a Dagman object by passing a Dagman to the Job dag
parameter.
In addition to defining an executable for a Job to run, you can also pass arguments to the executable using the Job add_arg
method. Here, we’ve added three arguments, 1
, 2
, and 3
, to job_sleep
. This Job will now run the sleep
command on each of the provided arguments, i.e. sleep 1
, sleep 2
, and sleep 3
.
Adding inter-job relationships
One useful feature of Dagman objects is that they can support inter-job relationships between the Jobs they manage. Inter-job relationships in PyCondor can be specified using the Job add_child
and add_parent
methods.
# Add inter-job relationship
# Ensure that job_sleep finishes before job_date starts
job_sleep.add_child(job_date)
In this example, job_sleep.add_child(job_date)
sets job_date
as a child Job of job_sleep
. This means that job_date
will start running only after job_sleep
has finished. Note that job_sleep.add_child(job_date)
is equivalent to job_date.add_parent(job_sleep)
.
Build and submit Dagman
Now that the relationships between the Jobs in our Dagman have been specified, we’re ready to build all the appropriate Job and Dagman submit files and submit them to HTCondor for execution.
# Write all necessary submit files and submit job to HTCondor
dagman.build_submit()
The Dagman build_submit
method is used to both build the Job and Dagman submit files as well as submit them to HTCondor. Note that the build_submit
method is just a convenience method for the build
Dagman method followed by the submit_dag
method.
There you go, we’ve built a DAG using PyCondor! For more examples see the examples section of the PyCondor documentation.