Master Control Program (MCP)
MCP is a command line utility that provides automatic resource selection for running a single parallel job on high performance computing resources. MCP uses directives provided by the user in batch submission scripts to submit the job to the queues of multiple resources. As soon as the job starts to run on one of the resources, MCP removes it from the queues of all other resources.
Note: Automatic resource selection scheduling services are a class of metascheduler. Their objective is to start jobs sooner on distributed high performance compute resources by finding the resource with the earliest availability.
MCP can be called in one of the following ways:
- Use MCP by itself: create a batch job submission script for each machine that includes #MCP directives.
- Call MCP through fullauto: a utility that simplifies the use of MCP by storing frequently used settings and automatically generating job files.
MCP requires the application to be compiled on each machine (if it does not already exist there), and the input files to be staged on the remote clusters. The MCP submission will be initiated from only one of the machines. In order for the batch job submission script to run correctly, it needs to include #MCP directives. MCP can be used by itself, but the user must manually create a job script for each machine.
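The submit-everywhere, keep-the-first-to-run behavior described above can be sketched as follows. This is only an illustration, not MCP's actual code: the function run_on_first_available, the callbacks get_state and cancel, the resource names, and the state letters "R" and "Q" are all assumptions made for this example. A real monitor would also pause between polls.

```python
from typing import Callable, Dict, List, Optional


def run_on_first_available(
    job_ids: Dict[str, str],
    get_state: Callable[[str, str], str],
    cancel: Callable[[str, str], None],
    max_polls: int = 100,
) -> Optional[str]:
    """Poll every resource; once one job is running, cancel the others.

    job_ids maps a resource name to the ID of the job queued there.
    get_state and cancel are hypothetical stand-ins for the resource
    manager's query and cancel commands.
    Returns the winning resource, or None if nothing started in time.
    """
    for _ in range(max_polls):
        for resource, job_id in job_ids.items():
            if get_state(resource, job_id) == "R":  # "R" = running (assumed)
                for other, other_id in job_ids.items():
                    if other != resource:
                        cancel(other, other_id)
                return resource
    return None


# Simulated queues: the job on cluster_b starts on the third poll.
polls = {"cluster_a": 0, "cluster_b": 0}
cancelled: List[str] = []


def fake_state(resource: str, job_id: str) -> str:
    polls[resource] += 1
    return "R" if resource == "cluster_b" and polls[resource] >= 3 else "Q"


def fake_cancel(resource: str, job_id: str) -> None:
    cancelled.append(resource)


winner = run_on_first_available(
    {"cluster_a": "101", "cluster_b": "202"}, fake_state, fake_cancel
)
print(winner, cancelled)  # cluster_b ['cluster_a']
```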
Fullauto
Fullauto is a command line utility that automatically generates job scripts for MCP. Fullauto simplifies the process of submitting a job to multiple clusters through MCP by allowing users to specify and store job and machine attributes. After learning those attributes, it selects the appropriate clusters for the job and generates a job script for each cluster. It then uses MCP to run the job on the earliest available cluster. The user-defined attributes are kept in the autojob.py module.
Locations
Currently MCP is installed on the IA-64 cluster at SDSC.
Currently fullauto is installed on the IA-64 cluster at SDSC.
MCP Workflow
The following workflow presumes the use of two resources.
- User runs grid-proxy-init or myproxy-get-delegation to establish grid credential.
  grid-proxy-init
  OR
  myproxy-get-delegation
  Note: Use myproxy-get-delegation to establish your grid credential using XSEDE Single Sign-On with your XSEDE-Wide (XSEDE User Portal) password.
- User constructs a set of appropriate job files, one for each resource. (See example job files.)
  vi jobfile_1
  vi jobfile_2
- User submits the job files to MCP with job files as the input.
  ./mcp.py [--debug] <submit_script1> [<submit_script2>]
  ./mcp.py --query=<MCP job info file>
  ./mcp.py --resume=<MCP job info file>
- MCP scans each of the job submission scripts for lines that specify the resource to which the script is to be submitted. Required lines are:
  #MCP submit_host <head node for the remote cluster>
  #MCP username <local username on the remote cluster>
  #MCP scratch_dir <scratch directory on the remote cluster>
- MCP submits the scripts to the queues of the specified resources and proceeds to continuously monitor the status of these jobs. As soon as the job begins to run on one of the resources, it is removed from queues of the other resources.
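The scanning step in the workflow above can be sketched as a small parser. This is a minimal illustration of reading "#MCP <name> <value>" lines and checking for the three required ones, not the parser used by mcp.py. The function names, the sample script body, and the application command ./my_app are assumptions for this example.

```python
from typing import Dict, List

# The three directives the workflow lists as required.
REQUIRED = ("submit_host", "username", "scratch_dir")


def parse_mcp_directives(script_text: str) -> Dict[str, str]:
    """Collect '#MCP <name> <value>' lines into a dict (hypothetical helper)."""
    directives: Dict[str, str] = {}
    for line in script_text.splitlines():
        # Split into at most three fields so values may contain spaces.
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0] == "#MCP":
            directives[parts[1]] = parts[2]
    return directives


def missing_required(directives: Dict[str, str]) -> List[str]:
    """Return the required directive names that the script does not set."""
    return [name for name in REQUIRED if name not in directives]


# Sample job script; scratch_dir is left out on purpose.
job_script = """#!/bin/sh
#MCP qtype pbs
#MCP submit_host tg-login1.sdsc.edu
#MCP username your_username
#PBS -l walltime=00:10:00
./my_app input.dat
"""

found = parse_mcp_directives(job_script)
print(found)
print(missing_required(found))  # ['scratch_dir']
```

Checking for missing directives before submission gives an early, local error instead of a failure after the script has already been sent to a remote queue.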
MCP directives, examples and explanations
Shortcut to built-in configurations
Specifies use of built-in configs for certain resource managers:
#MCP qtype [pbs|loadleveler|globusws|cobalt]
#MCP qtype pbs
Mandatory directives
Specifies on which machine the job is submitted:
#MCP submit_host [hostname]
#MCP submit_host tg-login1.sdsc.edu
Specifies remote username:
#MCP username [remote username]
#MCP username your_username
Specifies where MCP may stage files:
#MCP scratch_dir [remote scratch directory]
#MCP scratch_dir /gpfs/your_username/test/mcp
Directives mandatory for qtype globusws
Specifies whether the job submit command is run on the same machine as mcp.py (as with Globus GRAM jobs) or on the submit_host:
#MCP submit_mode [local|remote]
#MCP submit_mode local
Specifies how to contact Globus web service:
#MCP globus_factory [globus factory string]
#MCP globus_factory https://tg-login1.sdsc.teragrid.org:8443/wsrf/services/ManagedJobFactoryService
Specifies type of factory to contact through Globus:
#MCP globus_factory_type [Globus factory type]
#MCP globus_factory_type PBS
If #MCP qtype is specified, the following directives are optional
Specifies job submit command:
#MCP submit_command [command to submit job]
#MCP submit_command qsub
Allows the user to modify parsing of the returned text from the submit command:
#MCP submit_return_pattern [Python regular expression]
#MCP submit_return_pattern ^(?P<job_id>\d+).[-.\w]+\s*$
Specifies job query command:
#MCP queue_line_command [command to query job]
#MCP queue_line_command qstat
Allows the user to modify parsing of the returned text from the query command:
#MCP queue_line_pattern [Python regular expression]
#MCP queue_line_pattern ^\d+\.\S*\s+\S+\s+\S+\s+(\d\d:\d\d:\d\d|\d)*\s+(?P<state>\w)\s+\S+
Specifies job cancel command:
#MCP kill_command [command to cancel job]
#MCP kill_command qdel
MCP Notes
- Unprompted remote commands provide for smoother running of MCP than prompted commands. To use unprompted commands, the user can either set up passwordless ssh with an ssh-agent, or use grid-proxy-init and gsissh (where available). To specify the paths to ssh and scp commands that do not require passwords, set the environment variables MCPSSH and MCPSCP.
  export MCPSSH=/usr/bin/gsissh
  export MCPSCP=/usr/bin/gsiscp
- Since there may be a long delay before the job starts, use the UNIX screen utility to run the MCP session in the background (See example commands).
- The qtype directive specifies the commands to be used for job submission, monitoring and cancellation. Recognized types are pbs, loadleveler, cobalt and globusws. The user can create new qtypes and write them to the MCPResource.py file (See Scenarios 1 and 2). Alternatively, interface information can be specified manually in the job submission script (See Scenario 3).
Other resource manager commands to be used by MCP are specified in the MCPResource.py file.
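To show what the example submit_return_pattern and queue_line_pattern values from the directives section extract, the sketch below applies them with Python's re module. The two patterns are copied from the examples; the sample submit output and queue line (host name, job name, queue name) are invented for illustration and may not match what a given resource manager prints.

```python
import re

# Patterns copied from the directive examples in this document.
SUBMIT_RETURN_PATTERN = r"^(?P<job_id>\d+).[-.\w]+\s*$"
QUEUE_LINE_PATTERN = (
    r"^\d+\.\S*\s+\S+\s+\S+\s+"
    r"(\d\d:\d\d:\d\d|\d)*\s+(?P<state>\w)\s+\S+"
)

# Invented sample text standing in for the submit and query command output.
submit_output = "1234567.headnode.example.org\n"
queue_line = "1234567.headnode   my_job   your_username   00:01:30 R normal"

# The named groups carry the values of interest: the job ID and its state.
job_id = re.match(SUBMIT_RETURN_PATTERN, submit_output).group("job_id")
state = re.match(QUEUE_LINE_PATTERN, queue_line).group("state")
print(job_id, state)  # 1234567 R
```

When writing a pattern for a new qtype, the named groups job_id and state are the parts to preserve, since they are what the example patterns capture.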
Fullauto Workflow
- User runs grid-proxy-init or myproxy-get-delegation to establish grid credential.
  grid-proxy-init
  OR
  myproxy-get-delegation *
- User constructs an appropriate autojob.py module file ** with personalized settings. (See example autojob.py files.) User should create a working directory and place autojob.py in it.
  vi autojob.py
- User runs fullauto.py with autojob.py as the input.
  fullauto.py
  fullauto.py --autojobfile=<autojob file>
  fullauto.py --attributes
- Fullauto imports the autojob.py module to learn which attributes it should use to select remote clusters. It then creates job scripts for each selected cluster.
- Fullauto uses MCP to run the scripts.
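The script generation step in the workflow above can be pictured with the sketch below, which writes one job script per cluster from stored settings. It does not reproduce fullauto or the real layout of autojob.py: the CLUSTERS dictionary, the host names, the scratch paths, the make_job_script helper, and the ./my_app command are all hypothetical. Only the #MCP directive names come from this document.

```python
from typing import Dict

# Hypothetical per-cluster settings; the real autojob.py attributes may differ.
CLUSTERS: Dict[str, Dict[str, str]] = {
    "cluster_a": {
        "qtype": "pbs",
        "submit_host": "login.cluster-a.example.org",
        "username": "your_username",
        "scratch_dir": "/scratch/your_username/mcp",
    },
    "cluster_b": {
        "qtype": "loadleveler",
        "submit_host": "login.cluster-b.example.org",
        "username": "your_username",
        "scratch_dir": "/work/your_username/mcp",
    },
}


def make_job_script(settings: Dict[str, str], command: str) -> str:
    """Build a job script whose header holds one #MCP line per setting."""
    header = [f"#MCP {name} {value}" for name, value in settings.items()]
    return "\n".join(["#!/bin/sh", *header, command]) + "\n"


# One script per cluster, all running the same (hypothetical) application.
scripts = {
    name: make_job_script(settings, "./my_app input.dat")
    for name, settings in CLUSTERS.items()
}
print(scripts["cluster_a"])
```

Keeping the per-cluster values in one place is what lets the same job be described once and submitted to several queues without editing each script by hand.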
Fullauto Notes
* Use myproxy-get-delegation to establish your grid credential using XSEDE Single Sign-On with your XSEDE-Wide (XSEDE User Portal) password.
** The name of the module must be autojob.py.
Use the fullauto --always option to launch the job on any and all available queues. For example, at NCSA it might submit three jobs (fastqueue, highqueue, compute) depending on the requirements. When a job in one queue starts, the others are killed. This behaviour is a consequence of the heterogeneous nature of the clusters. The NCSA IA-64 cluster is a mix of 1300 and 1500 MHz nodes; if a user specifies a particular clock speed, only jobs appropriate to the matching queue are submitted.
Fullauto calls MCP; the user does not have to run MCP separately; however, MCP can be used to select resources automatically using manually generated scripts. See the MCP section for more information.
Unprompted remote commands provide for smoother running of MCP than prompted commands. To use unprompted commands, the user can either set up passwordless ssh with an ssh-agent, or use grid-proxy-init and gsissh (where available). To specify the paths to ssh and scp commands that do not require passwords, set the environment variables MCPSSH and MCPSCP.
export MCPSSH=/usr/bin/gsissh
export MCPSCP=/usr/bin/gsiscp
Since there may be a long delay before job start, use the screen utility to run MCP sessions in the background (See example commands).
To see all attributes of autojob.py, including machines and queues where fullauto/MCP will run, use --attributes.
/usr/local/apps/mcp)./fullauto.py --attributes
Use the fullauto --debug option for debugging. This option is passed on to MCP for extensive debugging information.