The Job Jar is a simple batch queuing system for Unix. Its main distinguishing feature is that there is no central daemon. Instead, an arbitrary set of workers cooperatively claim jobs from a central directory. A job is any Unix executable file (usually a script). Jobs are run in a fresh directory, with a controlled environment that includes information such as the path to that directory.
Any user may run a Job Jar system. No special privileges are required, aside from read and write permission in the Job Jar installation directory (or whichever directories you have configured the system to run in; see Configuration). You may wish to create a user specifically to run the Job Jar system, for auditing purposes. If so, it is easiest to create and log in as that user before proceeding.
A standard Job Jar installation is simply a directory hierarchy, located on a disk that can be accessed by all potential workers.
Assuming the distribution is in the compressed tar file jobjar-1_0_0.tgz, to unpack it under the directory /opt, type:
cd /opt
gunzip -c jobjar-1_0_0.tgz | tar xvf -
This will create and populate the directory /opt/jobjar-1_0_0.
Unpacking the distribution will create the following subdirectories:

- bin: programs for starting, stopping, and inspecting the system
- crontabs: example crontab files
- limbo: the intermediate directory for jobs being submitted
- unclaimed: the unclaimed jar, holding jobs waiting to be claimed
- claimed: the claimed jar, holding links to jobs claimed by workers
- completed: the completed jar, holding jobs that have finished
- workers: the parent directory for per-worker directories
In the rest of this manual, it will be assumed that the Job Jar is installed under the directory named by the environment variable JJ_HOME.
The Job Jar system needs no special configuration to run. However, you can change where workers look for jobs, and control some aspects of the system's behavior, through environment variables and configuration files. See the section Configuration for details.
The Job Jar distribution includes a Python library for interfacing with the Job Jar system from Python. You can use this library from the initial installation directory, or install it in a directory in your PYTHONPATH. If you use the Python client library from the Job Jar installation directory, it will automatically infer the location of the job directories. If you install the Python client library elsewhere, you must tell it where the job and worker directories are. To do this, either:

- Set the JJ_HOME and JJ_WORKER_HOME configuration parameters in the environment or in a configuration file; or
- Call the library's configuration functions, such as jobjar.set_home() and jobjar.set_worker_home().
See the section Configuration for more details.
The crontabs directory in a Job Jar installation contains example crontab files which you can use to:
- Automatically restart workers if they quit unexpectedly, for example if there is a power failure or a worker node reboots;
- Automatically recover jobs stranded by crashed workers.
A cron job is only effective on the nodes on which it is installed. It cannot restart workers on other nodes, or find stranded jobs that were claimed by workers on other nodes. Thus, the cron job must be run on each node on which you wish to restart workers, or recover stranded jobs, automatically.
You must edit an example crontab file before submitting it to cron, to fill in the path to the Job Jar installation. Additionally, cron jobs must run in an environment that contains the Python interpreter in the default search path.
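As an illustration, a crontab based on the worker.cron example might look like the following sketch. The installation path, the schedule, and the PATH setting are all assumptions (and setting PATH in the crontab requires a cron that supports variable assignments); adjust them for your site:

# Make sure cron can find the Python interpreter
PATH=/usr/local/bin:/usr/bin:/bin
# Restart missing workers every ten minutes
0,10,20,30,40,50 * * * * /opt/jobjar-1_0_0/bin/check_workers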
Node: A single computer on which one or more Job Jar workers run
Worker: A long-lived process running on a single node, which claims and runs jobs from the unclaimed jar
Job: Any Unix executable which has been submitted to the Job Jar system
Seed Job: A job which submits other jobs to the Job Jar system when run
Priority: An integer associated with a job, which determines in which order the job will be claimed relative to other jobs (higher-priority jobs are claimed before lower-priority jobs)
Category: A string associated with a job, which determines which workers will attempt to claim the job
Limbo: A directory containing jobs that have not yet been submitted to the unclaimed jar
Unclaimed Jar: A directory containing jobs that have not yet been claimed by any worker
Claimed Jar: A directory containing symbolic links to jobs that have been claimed by some worker, but not yet completed
Completed Jar: A directory containing jobs that have been run to completion by a worker
Pause: Temporarily suspend a worker without causing it to exit
Stop: Cause a worker process to exit
Stranded Job: A job that was claimed by a worker that subsequently crashed (for example, due to a system reboot)
See the section The Worker Lifecycle for more information about workers.
To start the Job Jar system, all you need to do is start at least one worker.
To start a worker on a node, run the program bin/start_worker. A background worker process will be started and will begin looking in the unclaimed jar for jobs. Type:
cd $JJ_HOME/bin
./start_worker
You may supply the start_worker program with a list of categories that this worker is allowed to process. Type:
cd $JJ_HOME/bin
./start_worker LINUX AMD_OPTERON
This command creates a worker that will claim jobs categorized as LINUX or AMD_OPTERON, in addition to claiming uncategorized jobs. See the section Job Categories for more information about using categories.
If no categories are specified, the worker will only claim uncategorized jobs and jobs in the implicit per-node category.
Any number of workers may be started on a given node. There is no explicit load balancing in the Job Jar system, so choose the number of workers to run on each node based on the node's resources and the demands of the jobs you expect it to process.
The check_workers program inspects the worker directories in a Job Jar installation. It checks to see how many workers are running on the current node (the node on which the check_workers program is run). If fewer than the expected number of workers are running, check_workers starts as many workers as necessary to bring the number of running workers up to the expected number. The expected number of workers for a given node is determined by the JJ_MAX_WORKERS configuration parameter (see Configuration).
The check_workers program is what the worker.cron example crontab file invokes. You can also run check_workers manually, which can be useful if you wish to manage a Job Jar system without the use of crontabs. Type:
cd $JJ_HOME/bin
./check_workers
Workers in a Job Jar system may only be stopped between jobs. You cannot interrupt a running worker without manually killing the job it is running.
When a worker is stopped, it first finishes the job it is currently running. Then it moves the job to the completed jar, removes the job-specific scratch directory it created when it started the job, and exits.
The program $JJ_HOME/bin/stop_worker stops some or all of the workers in a Job Jar system.
To stop all currently-running workers, type:
cd $JJ_HOME/bin
./stop_worker all
New workers created after you run stop_worker all will not be stopped (e.g. if you are running the Job Jar system with cron jobs to restart workers). The command stop_worker all actually creates STOP files in each individual worker's directory, rather than a STOP_ALL file in the JJ_WORKER_HOME directory. See the section Stopping The System Manually for information about STOP and STOP_ALL files.
To stop all currently-running workers on a particular node, type:
cd $JJ_HOME/bin
./stop_worker NODE1 NODE2 ...
Replace NODE1, etc., with actual node names.
To stop particular workers, type:
cd $JJ_HOME/bin
./stop_worker NODE1.PID1 NODE2.PID2 ...
Replace NODE1.PID1, etc., with the name of the worker directory for the worker you want to stop.
You can list both nodes and individual workers when invoking stop_worker. For example:
cd $JJ_HOME/bin
./stop_worker rain flower.13764 meadow
The previous command will stop all workers on nodes rain and meadow, as well as the single worker on node flower with process ID 13764.
The Python client library contains the following functions to stop the Job Jar system:
To stop an individual worker in a Job Jar system, create a file named STOP in the worker's directory. For example, assuming a worker is running on node "flower" with process ID 5123, type:
cd $JJ_HOME/workers/flower.5123
touch STOP
When the worker finishes the job it is currently running, it will move the job to the completed jar, remove the job-specific scratch directory it created when it started the job, and exit.
If you wish to stop all workers in a running Job Jar system, you do not need to stop each worker individually. You can stop the entire system by creating a file named STOP_ALL in the JJ_WORKER_HOME directory. Type:
cd $JJ_WORKER_HOME
touch STOP_ALL
As each worker checks for jobs, it will notice this file and exit. This is equivalent to creating a STOP file in each individual worker directory, but more convenient.
Note: A STOP_ALL file will cause new workers to stop before claiming any jobs. If you are running a Job Jar system with cron jobs to automatically restart workers, each new worker the cron job starts will stop immediately without doing any work. Use STOP_ALL with care.
Workers may be paused between jobs. When a worker is paused, it first finishes the job it is currently running. Then it moves the job to the completed jar, removes the job-specific scratch directory it created when it started the job, and begins sleeping. It does not claim jobs from the unclaimed jar, but it does not exit completely. A paused worker periodically wakes up to check whether it should remain paused, resume work, or stop.
Stopping has priority over pausing. That is, if a STOP or STOP_ALL file is present, the worker will exit rather than pausing.
The program $JJ_HOME/bin/pause_worker pauses some or all of the workers in a Job Jar system.
To pause all currently-running workers, type:
cd $JJ_HOME/bin
./pause_worker all
New workers created after you run pause_worker all will not be paused. The command pause_worker all actually creates PAUSE files in each individual worker's directory, rather than a PAUSE_ALL file in the JJ_WORKER_HOME directory. See the section Pausing And Unpausing The System Manually for information on PAUSE and PAUSE_ALL files.
To pause all currently-running workers on a particular node, type:
cd $JJ_HOME/bin
./pause_worker NODE1 NODE2 ...
Replace NODE1, etc., with actual node names.
To pause individual workers, type:
cd $JJ_HOME/bin
./pause_worker NODE1.PID1 NODE2.PID2 ...
Replace NODE1.PID1, etc., with the name of the worker directory for the worker you want to pause.
You can list both nodes and individual workers when invoking pause_worker. For example:
cd $JJ_HOME/bin
./pause_worker rain flower.996 meadow
The previous command will pause all workers on nodes rain and meadow, as well as the single worker on node flower with process ID 996.
The program $JJ_HOME/bin/unpause_worker unpauses some or all of the workers in a Job Jar system. When a worker is unpaused, it will begin claiming jobs from the unclaimed jar at some time in the future, unless it has been stopped.
The unpause_worker program takes all the same arguments as the pause_worker program. It is fine to unpause a worker that was not paused before. For example, you can use unpause_worker all after pausing just one or two workers, rather than unpausing the individual workers.
The Python client library contains the following functions to pause and unpause the Job Jar system:
To pause an individual worker, create a file named PAUSE in the worker's directory. For example, assuming a worker is running on node "rain" with process ID 17192, type:
cd $JJ_HOME/workers/rain.17192
touch PAUSE
You can pause all workers in a Job Jar system by creating a file named PAUSE_ALL in the JJ_WORKER_HOME directory. Type:
cd $JJ_WORKER_HOME
touch PAUSE_ALL
This is equivalent to creating a PAUSE file in each individual worker directory, but more convenient. It can also be used to start new workers in a paused state.
To resume normal operations, remove the PAUSE or PAUSE_ALL files.
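For example, to resume a system that was paused with a PAUSE_ALL file, type:

cd $JJ_WORKER_HOME
rm PAUSE_ALL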
A job may be any Unix executable program. In practice, it is most effective to write Unix shell scripts that launch other programs with whatever parameters they require, since jobs are run without any command-line parameters. The program bin/submit_job in a Job Jar installation can automate creation of a shell script for simple tasks. See the section Submitting Jobs for more information.
When a job is run by a Job Jar worker, its environment will contain some useful variables, such as $JJ_SCRATCH_DIR, the path to the job's scratch directory.
Remember that the job's scratch directory ($JJ_SCRATCH_DIR) is deleted after the job finishes. If your job produces output files that need to be saved, it must move them to their destination before exiting.
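As a hedged illustration, a job script along these lines might do its work in the scratch directory and move the result somewhere permanent before exiting (the program, its arguments, and the output paths are assumptions):

#!/bin/sh
# Run the computation, writing output to the job's scratch directory
/usr/local/bin/big_computation /data/infile.tmp 104.1 30 > $JJ_SCRATCH_DIR/result.out
# Move the output before exiting; the scratch directory is deleted
# when the job finishes
mv $JJ_SCRATCH_DIR/result.out /data/results/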
A common use for the Job Jar system is parallelizing large computations across multiple nodes. One way to do this is to use a front end program which takes the parameters of the computation as input and creates multiple jobs to carry out the computation.
For example, suppose the program run_sr performs a portion of a parallelizable computation called SR, which is parameterized by filename, start index, and end index. A hypothetical front-end system might prompt the user for the number of workers and the input parameters for the computation, then create multiple output jobs and submit them to the Job Jar system. An input of filename /data/sr_input, start index 1, end index 100, and 10 total workers, might produce 10 jobs like:
run_sr /data/sr_input 1 10
run_sr /data/sr_input 11 20
...
run_sr /data/sr_input 91 100
The front-end program could save these jobs to files for the user to submit, or submit them directly to the Job Jar system. See the section Submitting Jobs for different ways to submit jobs to the system.
An alternative to writing a front-end for creating jobs is to write a job which itself spawns other jobs, then exits. A job that submits other jobs is known as a seed job.
To continue the SR example from the last section, you could write a non-interactive program named distribute_sr (the seed job) which takes parameters for the overall computation, then submits jobs for the subtasks which make it up. Then a user could submit a job like:
distribute_sr -input=/data/sr_input -start=1 -end=100 -workers=10
The distribute_sr program would then submit 10 jobs like the following, and exit:
run_sr /data/sr_input 1 10
run_sr /data/sr_input 11 20
...
run_sr /data/sr_input 91 100
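As a sketch, a hard-coded version of such a seed job might look like the following (a real distribute_sr would parse its -input, -start, -end, and -workers parameters rather than fixing them):

#!/bin/sh
# Hypothetical seed job: submit ten run_sr jobs covering indices
# 1-100 in slices of ten, then exit
start=1
while [ $start -le 91 ]; do
    end=`expr $start + 9`
    $JJ_HOME/bin/submit_job /usr/local/bin/run_sr /data/sr_input $start $end
    start=`expr $start + 10`
done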
This approach is quite similar to the approach of writing a front-end program for job creation and submission. However, seed jobs can express some ideas more concisely.
A front-end program can also be used to submit a seed job. The most flexible design is to write a simple seed job whose command-line parameters define the sub-jobs the seed job will create, along with a front-end program that generates seed jobs. This results in a modular, domain-specific interface, which can be extended or invoked in different ways.
Complex computations often involve multiple stages. Later stages often cannot be started until earlier stages are complete. This section demonstrates one way to synchronize multiple-stage jobs.
Suppose that the SR computation is easily parallelized, but that the intermediate outputs of the various workers must be merged back into a single output file for use as input to a later computation. That is, the output of the various run_sr programs must be used as input to a program called merge_sr; however, merge_sr cannot run until all the run_sr instances have completed. One way to do this is to write a simple seed job which submits all the run_sr jobs, along with another job, monitor_sr, that:

- Checks whether all the run_sr jobs have completed;
- If they have, submits merge_sr;
- If they have not, resubmits itself and exits.
A worker may claim monitor_sr at any time. If the run_sr jobs have not all completed, the worker will simply resubmit monitor_sr and look for other work.
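A minimal sketch of such a monitor job follows. It assumes each run_sr job drops a done-file into /data/sr_output when it finishes; the completion test, the paths, and the job count are all assumptions:

#!/bin/sh
# Hypothetical monitor_sr: count the finished run_sr jobs
count=`ls /data/sr_output/done.* 2>/dev/null | wc -l`
if [ $count -ge 10 ]; then
    # All run_sr jobs have completed: submit the merge job
    $JJ_HOME/bin/submit_job /usr/local/bin/merge_sr /data/sr_output
else
    # Not finished yet: resubmit this monitor and exit
    $JJ_HOME/bin/submit_job -f $0
fi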
This same idea can be applied to computations with an arbitrary number of stages. Each stage will have a seed job that submits the job or jobs that make up the current stage, along with a monitor job, like monitor_sr, that checks whether the current stage has finished. When the current stage finishes, the monitor will submit the seed job for the next stage, which will behave in the same way. In a three-stage computation:

- The first seed job submits the stage-one jobs and the stage-one monitor;
- When stage one completes, its monitor submits the seed job for stage two, which submits the stage-two jobs and monitor;
- When stage two completes, its monitor submits the seed job for stage three, which runs the final stage.
The seed jobs allow you to encapsulate each stage of a computation as a single idea, simplifying maintenance and allowing you to run later stages without running earlier stages, if necessary.
The advantages of this approach are:
The disadvantages are:
Conceptually, submitting a job means copying an executable file to the unclaimed jar. In practice, the system first copies the executable file to an intermediate directory, then moves it atomically to the unclaimed jar. This prevents a worker from claiming a partially-copied job. The intermediate directory is $JJ_HOME/limbo.
Each job has an associated priority. Higher-priority jobs are claimed by workers before lower-priority jobs. A job's priority is determined when it is submitted.
You do not need to specify a priority for a job. However, if there is already a long queue of jobs in the system, and you need a new job to be run before the older jobs complete, you can submit a job with a higher priority than what is already in the queue.
Alternatively, if you are going to submit a large number of jobs for some task that is not particularly urgent, you can submit those jobs with a low priority to let other jobs run to completion first. Note, however, that in a busy system with new jobs constantly being submitted, it is possible that low-priority jobs may never be claimed.
The default priority is 0. Negative priorities are allowed. If two jobs have equal priority, the older of the two jobs is claimed first.
Each job may be assigned to a particular category when it is submitted. Jobs that are not assigned to any category are uncategorized jobs. You can use categories to ensure that certain jobs are only run on nodes with a particular operating system, performance profile, or other distinguishing feature.
Each Job Jar worker has an ordered list of categories from which it will claim jobs. All workers will claim uncategorized jobs if there are no jobs in the other categories they are checking, so you do not need to use categories if you do not wish to. Each Job Jar worker also implicitly checks for jobs in a category with the same name as the node on which the worker is running.
Because a worker's list of categories is checked in order, job priorities only have meaning within a single category. If a worker is checking categories FAST and LINUX in that order, then it will not claim any jobs in the LINUX category until there are no jobs in the FAST category, no matter what the priorities of jobs in the LINUX category are.
There is no central list of categories. Instead, categories are created as needed. Submitting a job with a given category will create that category if it does not already exist.
Job categories are implemented as subdirectories of the unclaimed jar.
Note: If you assign a job a category that no worker will check, that job will never be run.
In addition to whatever explicit categories a worker is assigned, each worker also checks a category with the same name as the node on which the worker is running. You can submit jobs to the implicit per-node category to ensure that they run on a specific node.
In general, broader categories are more useful than per-node categories. However, running a job on a specific node is the only way to accomplish certain tasks. For example, suppose that the node "arizona" is running three workers, and the job one of the workers is processing has hung. You can submit a job that kills the hung job to the "arizona" category, so that it will run on the node where the kill command can take effect.
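For example, assuming the hung job on arizona is running with process ID 31337 (a hypothetical value), you could type:

$JJ_HOME/bin/submit_job -c arizona /bin/kill -9 31337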
The program bin/submit_job in a Job Jar installation allows you to submit jobs to the system from the Unix command line.
The submit_job program can automate creation of a shell script for simple tasks. This allows you to submit jobs consisting of a Unix program plus some command parameters without manually creating a new script just to run the job. Type:
$JJ_HOME/bin/submit_job /usr/local/bin/big_computation /data/infile.tmp 104.1 30
This will first create an executable file with a unique name, containing the lines:
#!/bin/sh -f
/usr/local/bin/big_computation /data/infile.tmp 104.1 30
It will then copy the new file to limbo. When the copy is complete, submit_job will move the new file into the unclaimed jar.
The submit_job program can also copy executable files directly with the -f (file) flag. Type:
$JJ_HOME/bin/submit_job -f /opt/local/bin/standalone_job
This will:

- Copy the file /opt/local/bin/standalone_job to limbo under a new, unique name;
- Move the copy into the unclaimed jar.
No intermediate shell script is created.
The submit_job program accepts the following options in addition to -f:
-p (priority)
The submit_job program submits jobs with a priority of 0 by default. You can specify a numeric priority with the -p (priority) flag. To submit a job with a priority of 10, type:
$JJ_HOME/bin/submit_job -p 10 /usr/local/bin/big_computation /data/infile.tmp 104.1 30

or:
$JJ_HOME/bin/submit_job -p 10 -f /opt/local/bin/standalone_job
-c (category)
The submit_job program submits jobs as uncategorized by default. You can specify a category with the -c (category) flag. To submit a job in category LINUX, type:
$JJ_HOME/bin/submit_job -c LINUX /usr/local/bin/big_computation /data/infile.tmp 104.1 30
-i (identifier)
The submit_job program copies jobs to a unique file name before submitting them to the system. This unique name includes information such as the date and time the job was submitted, and the login of the submitter. This can help you find the job file as it moves through the Job Jar system.
You can specify an additional string identifier to be put in the job's name with the -i (identifier) flag. Specifying an additional identifier can help you keep track of the purpose of a particular job. Type:
$JJ_HOME/bin/submit_job -i test_run /usr/local/bin/big_computation /data/infile.tmp 104.1 30
NOTE: Unix shells split command-line arguments into separate words based on whitespace. The submit_job program joins these words back together again with single spaces. Unix shells also interpret quotation marks specially. If you require quotation marks to be preserved in the final job that is submitted, you may need to protect them from the Unix shell with backslashes or by enclosing the entire command line in single quotes. You can also create a script file with quotes as you want them, and use the -f option to submit that file directly to the Job Jar system.
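For example, suppose the submitted job should contain the hypothetical command grep "two words" /data/infile.tmp. You could protect the quotation marks from the shell with backslashes; the shell passes the escaped quotes through, and submit_job rejoins the words with single spaces:

$JJ_HOME/bin/submit_job grep \"two words\" /data/infile.tmp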
The Python client library allows you to submit jobs directly from Python, by calling the following functions:

- jobjar.submit_job(): create and submit a job from a command line given as a string;
- jobjar.submit_job_file(): submit an existing executable file, with optional priority, identifier, and category arguments.
The following example demonstrates the use of the Python client library to submit three jobs to the unclaimed jar. It assumes that the path to the Python client library is in your Python search path:
import jobjar

jobjar.submit_job('/usr/local/bin/big_computation /data/infile.tmp 104.1 30')

# Submit this job at a low priority
jobjar.submit_job_file('/opt/local/bin/standalone_job', -3, id='test_run')

# Submit this job with category LINUX
jobjar.submit_job_file('/opt/local/bin/standalone_job', category='LINUX')
To submit a job to a Job Jar system, simply place an executable file in the unclaimed jar. You should copy the job to limbo first, then move it from limbo to the unclaimed jar, so that a worker does not claim a half-copied job. If the job you want to submit is a program named /opt/local/bin/standalone_job, and you wish the job to be uncategorized, type:
cd $JJ_HOME/limbo
cp /opt/local/bin/standalone_job .
mv standalone_job ../unclaimed
This is similar to typing:
$JJ_HOME/bin/submit_job -f /opt/local/bin/standalone_job
If you wish to submit the same job to the LINUX category, type:
cd $JJ_HOME/limbo
cp /opt/local/bin/standalone_job .
# Assumes the LINUX directory already exists
mv standalone_job ../unclaimed/LINUX
This is similar to typing:
$JJ_HOME/bin/submit_job -f /opt/local/bin/standalone_job -c LINUX
The only difference between submitting a job manually and using the submit_job program is that the submit_job program would have copied the job executable to a new, unique name in limbo, and would have created the LINUX category if it did not exist.
If you wish to set the priority of the job to something other than the default of 0, put the priority you wish in the filename's extension. In the previous example, you might specify a priority of 15 as follows:
cd $JJ_HOME/limbo
cp /opt/local/bin/standalone_job ./standalone_job.15
mv standalone_job.15 ../unclaimed
This is similar to typing:
$JJ_HOME/bin/submit_job -p 15 -f /opt/local/bin/standalone_job
The program bin/unique_job_name in a Job Jar installation generates a unique name for a job and prints the unique name to standard output. The job name is an absolute path located in the limbo directory. The first example in the previous section could be rewritten as:
jobname=`$JJ_HOME/bin/unique_job_name`
cp /opt/local/bin/standalone_job $jobname
mv $jobname $JJ_HOME/unclaimed
unset jobname
This is equivalent to typing:
$JJ_HOME/bin/submit_job -f /opt/local/bin/standalone_job
You can provide the -p option to unique_job_name to specify priority, just as with the submit_job program. The second example in the previous section could be rewritten as:
jobname=`$JJ_HOME/bin/unique_job_name -p 15`
cp /opt/local/bin/standalone_job $jobname
mv $jobname $JJ_HOME/unclaimed
unset jobname
This is equivalent to typing:
$JJ_HOME/bin/submit_job -p 15 -f /opt/local/bin/standalone_job
You can provide the -i option to unique_job_name to specify a string identifier to include in the job's name, just as with the submit_job program.
The intended purpose for the unique_job_name program is to allow you to write your own version of submit_job, which does any extra work needed in your problem domain. For example, if all of your jobs need a common preamble and postamble, you can write a program that automatically generates the preamble and postamble, so that you only need to specify the portion of each job that changes from job to job.
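For example, the following sketch wraps each submitted command line in a fixed preamble and postamble (the log file path and the shape of the preamble are assumptions):

#!/bin/sh
# Hypothetical submit wrapper: add a common preamble and postamble
# around the command line given as arguments
jobname=`$JJ_HOME/bin/unique_job_name`
cat > $jobname <<EOF
#!/bin/sh
echo "job started: \`date\`" >> /data/job.log
$*
echo "job finished: \`date\`" >> /data/job.log
EOF
chmod +x $jobname
mv $jobname $JJ_HOME/unclaimed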
The program bin/system_status can be used to print a summary of the Job Jar system. Type:
cd $JJ_HOME/bin
./system_status
The Python client library contains functions for inspecting the status of the Job Jar system.
When a worker starts, it creates a subdirectory in the workers directory of the Job Jar installation. This directory's name is of the form <node>.<pid>. For example, a worker running on node rain with process ID 7723 would create the directory $JJ_HOME/workers/rain.7723. Thus, simply inspecting the contents of the workers subdirectory of a Job Jar installation can give you an overview of how many workers there are in the system, and on what nodes they're running.
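For example, a listing like the following (hypothetical) one would indicate three workers, one each on nodes flower, meadow, and rain:

$ ls $JJ_HOME/workers
flower.5123  meadow.2201  rain.7723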
Each worker's directory is further populated as follows:
Workers do not delete their directories when they exit.
A Job Jar installation needs no special configuration to run. However, there are several aspects of a Job Jar system that can be configured if desired.
When Job Jar programs run, they determine their configuration parameters in this order:

- Explicit settings made through the Python client library (for programs that use the library);
- Environment variables;
- The personal configuration file;
- The global configuration file;
- Built-in defaults.
That is, parameters set in environment variables override parameters set in configuration files, and so on.
The following parameters may be set in configuration files, environment variables, or in the Python client library.
JJ_PAUSE_TIME. If set, this parameter's value is taken as the base time, in seconds, for a worker to sleep when it is paused. A randomizing factor of plus or minus 10%, but no more than 10 seconds, is applied to this time to help spread out access to the shared directories from workers on different nodes. The default is 60 seconds (one minute).
JJ_MAX_WORKERS. If set, this parameter's value is taken as the maximum number of workers that should run on the current node. The program check_workers consults this parameter. This parameter may only be set in an environment variable or INI file, not from the Python client library. The default is 4 workers per node.
JJ_HOME. If set, this parameter's value is taken to be an absolute path. It is interpreted as the parent of job directories (unclaimed, claimed, and completed), the limbo directory, and worker directories (<node>.<pid>). The default location for job and worker directories is the top directory of the Job Jar installation hierarchy.
JJ_WORKER_HOME. If set, this parameter's value is taken to be an absolute path. It is interpreted as the parent of worker directories (<node>.<pid>). If both JJ_HOME and JJ_WORKER_HOME are set, then job directories are under the value of JJ_HOME, while worker directories are under the value of JJ_WORKER_HOME. The default location for worker directories is $JJ_HOME/workers.
JJ_CATEGORIES. If set, this parameter's value is taken to be an ordered list of the categories workers check for jobs. Workers check these categories after checking the implicit category for the node on which the worker is running. Workers always check for uncategorized jobs as well, but only after first checking for jobs in the implicit per-node category, and categories specified by JJ_CATEGORIES.
Job Jar system programs look for configuration parameters in a file named .jobjar/config.ini in your home directory. This file is called the personal configuration file. If the environment variable JJ_CONFIG_FILE is set, then its value is taken as an absolute path to a configuration file to use instead.
If the file config.ini exists in the Job Jar installation directory, configuration parameters are first read from that file. This file is called the global configuration file. Settings in the personal configuration file will override settings in the global configuration file, and settings in the environment will override both. The Job Jar distribution includes a global configuration file. You may copy this file to the personal configuration file to edit it, or remove it entirely, if you wish.
The structure of Job Jar configuration files is similar to that of Windows INI files. Lines beginning with "#" or ";" are ignored and may be used to provide comments. The first nonblank, non-comment line of a config file must be [Job Jar]. Subsequent lines contain "name: value" entries ("name=value" is also accepted). The names you may set are those described in the section Common Configuration Parameters .
The following is an example INI file which sets all of the common configuration parameters:
# Sample Job Jar configuration file
[Job Jar]

# Pause 5 minutes when no jobs are available
JJ_PAUSE_TIME: 300

# The check_workers program may start up to 4 workers on this node
JJ_MAX_WORKERS: 4

# The parent directory for the unclaimed, claimed, completed, and
# limbo directories
JJ_HOME: /opt/jobjar-1.0.0

# The parent directory for worker-specific directories
JJ_WORKER_HOME: /opt/jobjar_workers

# The categories workers will check for jobs
JJ_CATEGORIES: LINUX
You may set environment variables to control a Job Jar system. Environment variables override settings in configuration files. All of the parameters in the section Common Configuration Parameters may be set in the environment. There is also one additional parameter that you may set: JJ_CONFIG_FILE.
JJ_CONFIG_FILE. If set, this variable's value is taken as the absolute path to the personal configuration file. See the section Configuring a System With INI Files for details. Any values set in the environment will override values set in this configuration file. For example, if the configuration file specifies JJ_HOME, and JJ_HOME is also set in the environment, the value from the environment is used.
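For example, the following command (the paths are hypothetical) points Job Jar programs at an alternate configuration file while overriding its JJ_HOME setting from the environment:

JJ_CONFIG_FILE=/opt/jobjar_conf/test.ini JJ_HOME=/opt/jobjar-test \
    /opt/jobjar-test/bin/system_status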
When the Python client library interacts with a Job Jar system, it consults any INI files and environment variables for configuration. You may also set and inspect configuration parameters directly from Python. If configuration parameters are set in INI files or environment variables, and also set explicitly using the Python client library, the settings in the Python client library will override the settings in INI files and environment variables.
The configuration-related functions in the Python client library include:

- jobjar.set_home(): set JJ_HOME;
- jobjar.set_worker_home(): set JJ_WORKER_HOME;
- jobjar.set_pause_time(): set JJ_PAUSE_TIME.
The following example demonstrates the use of the Python client library to configure the Job Jar system. It assumes that the path to the Python client library is in your Python search path:
import jobjar

jobjar.set_home('/opt/jobjar-1.0.0')
jobjar.set_worker_home('/opt/jobjar_workers')
jobjar.set_pause_time(500)
It is useful to understand what each worker process does. The pseudocode below describes the lifecycle of a Job Jar worker:
read configuration parameters from INI files and environment variables
create worker directory
do forever:
    if stopped:
        exit
    if paused:
        sleep for JJ_PAUSE_TIME seconds, plus a small randomizer
        continue
    job_to_claim = next_job_to_claim()
    if no job available:
        sleep for JJ_PAUSE_TIME seconds, plus a small randomizer
        continue
    move job_to_claim to worker directory
    if move failed:
        # Another worker claimed the job first
        continue
    make symbolic link in claimed jar to newly-claimed job
    make job scratch directory
    set environment variables for job
    run job
    move job to completed jar
    remove symbolic link from claimed jar
    remove job scratch directory

function next_job_to_claim():
    for category in [per-node category] + JJ_CATEGORIES:
        candidates = list of all jobs in this category with the highest priority
        if no candidates:
            continue
        else:
            return oldest job in candidates
    candidates = list of all uncategorized jobs with the highest priority
    return oldest job in candidates
The primary points to notice in the worker lifecycle are:

- Workers check for stop and pause conditions only between jobs;
- A job is claimed by atomically moving it out of the unclaimed jar, so no two workers can claim the same job;
- Categories are checked in a fixed order, so priorities order jobs only within a single category;
- Uncategorized jobs are claimed only when the per-node category and the categories listed in JJ_CATEGORIES are empty.