Slurm User and Administrator Guide for Cray Systems Natively


User Guide

This document describes the unique features of Slurm on Cray computers natively, that is, without the use of Cray's Application Level Placement Scheduler (ALPS). You should be familiar with Slurm's mode of operation on Linux clusters before studying the differences in Cray system operation described in this document. When running Slurm in native mode, a Cray system functions very much like a Linux cluster.

Since version 14.03, Slurm has been designed to operate as a workload manager on Cray XC (Cascade) systems without the use of ALPS. In addition to providing the same look and feel as a regular Linux cluster, this also enables many capabilities that were previously unavailable, such as:

  • Ability to run multiple jobs per node
  • Ability to query the status of running jobs with sstat
  • Full accounting support for job steps
  • Ability to run multiple jobs/steps in the background from the same session
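
For example, a running job step can be queried with sstat while it executes; the job and step IDs below are placeholders:

        sstat -j 1234.0    # report status of step 0 of job 1234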

Cray Specific Features

  • Network Performance Counters
  • To access Cray's Network Performance Counters (NPC), use the --network option of sbatch/salloc/srun to request them. There are two types of counters: system and blade.

    For the system option (--network=system), only one job can use the system-wide counters at a time, and only the nodes requested will be marked as in use for the job allocation. If the job does not fill the entire system, the remaining nodes cannot be used by other jobs requesting NPC; while idle, their state will appear as PerfCnts. These nodes remain available to jobs that do not use NPC.

    For the blade option (--network=blade), only the nodes requested will be marked as in use for the job allocation. If the job does not fill the blade(s) allocated to it, those blade(s) cannot be used by other jobs requesting NPC; while idle, their state will appear as PerfCnts. These nodes remain available to jobs that do not use NPC. See the example after this list.

  • Core Specialization
  • Ability to reserve a number of cores in the job's allocation for system operations. The application will not use these cores, but will be charged for their allocation. See the example after this list.
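
Both options are requested at submission time. The sketch below is only an illustration: the node count, core-spec count, and application name are placeholders, and --core-spec is the generic sbatch/salloc/srun flag for core specialization.

        #!/bin/bash
        #SBATCH --nodes=2            # placeholder node count
        #SBATCH --network=blade      # request blade-level Network Performance Counters
        #SBATCH --core-spec=2        # reserve 2 cores per node for system operations

        srun ./my_app                # placeholder application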

Admin Guide

Many new plugins were added to operate a Cray system without ALPS. These should be set up in your slurm.conf in addition to your normal configuration; a consolidated sketch follows the list below.

  • CoreSpec
  • To use, set CoreSpecPlugin=core_spec/cray.

  • JobSubmit
  • To use, set JobSubmitPlugins=job_submit/cray. This plugin is primarily used to set a gres=craynetwork value, which limits the number of applications that can run on a node at once. For a node without MICs that number is at most 4; for nodes with MICs it drops to 2. This craynetwork GRES needs to be defined in your slurm.conf to ensure proper functionality. For example:

        ...
        GresTypes=craynetwork
        NodeName=nid000[00-10] Gres=craynetwork:4 #node without MIC
        NodeName=nid000[11-20] Gres=craynetwork:2 #node with MIC
        ...
      

  • Proctrack
  • To use, set ProctrackType=proctrack/cray.

  • Select
  • To use, set SelectType=select/cray. This is a layered plugin, meaning it enhances a lower-level select plugin. By default it is layered on top of the select/linear plugin. It can also be layered on top of the select/cons_res plugin by setting SelectTypeParameters=other_cons_res, which allows you to run multiple jobs on a Cray node just as on a normal Linux cluster. Use additional SelectTypeParameters to identify the resources to allocate (e.g. cores, sockets, memory, etc.). See the slurm.conf man page for details.
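
    For instance, a minimal sketch of layering on top of select/cons_res; the CR_Core_Memory value is only an illustration of the additional resource parameters, so choose whatever is appropriate for your site:

        ...
        SelectType=select/cray
        SelectTypeParameters=other_cons_res,CR_Core_Memory
        ...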

  • Switch
  • To use, set SwitchType=switch/cray.

  • Task
  • To use, set TaskPlugin=cray. It is advised to use this in conjunction with other task plugins, such as the task/cgroup plugin (TaskPlugin=cgroup,cray). You can also use the task/affinity plugin with task/cray, or combine all three, depending on how you want your system configured (e.g. TaskPlugin=cgroup,affinity,cray). The plugins are used in the order they are listed in the comma-separated value.
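
Putting the items above together, a minimal sketch of the Cray-specific settings in slurm.conf might look like the following. It simply collects the values listed above and must be combined with your site's normal configuration:

        ...
        CoreSpecPlugin=core_spec/cray
        JobSubmitPlugins=job_submit/cray
        ProctrackType=proctrack/cray
        SelectType=select/cray      #optionally add SelectTypeParameters=other_cons_res
        SwitchType=switch/cray
        TaskPlugin=cgroup,cray      #or cgroup,affinity,cray
        ...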

Cray System Setup

Some services on the system need to be set up to run correctly with Slurm. Below are the commands to restart each service and the nodes on which they run. It is a good idea to set this up to happen automatically.

  • boot node
    • WLM_DETECT_ACTIVE=SLURM /etc/init.d/aeld restart
  • sdb node
    • WLM_DETECT_ACTIVE=SLURM /etc/init.d/ncmd restart
    • WLM_DETECT_ACTIVE=SLURM /etc/init.d/apptermd restart

As with Linux clusters, you will need to start a slurmd on each of your compute nodes. If you choose to use MUNGE authentication, which is advised, you will also need munge installed and a munged daemon running on each of your compute nodes. See the Quick Start Administrator Guide for more information; outside of the differences listed in this document, it can be used to set up your Cray system to run Slurm natively.
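
As a rough sketch, assuming the SysV init scripts shipped with the munge and slurm packages are installed under /etc/init.d (script names and paths may differ on your system):

        # on each compute node
        /etc/init.d/munge start    # start the munged authentication daemon
        /etc/init.d/slurm start    # start slurmd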

    Last modified 5 April 2014