Profiling under Slurm¶
This page explains how to use Slurm's profiling functionality, available since the summer 2020 update.
Profiling disabled by default
By default, jobs run without profiling enabled.
Principle¶
By default, Slurm records the memory consumption of a job, which can be viewed with the sacct command once the computation has completed. When profiling is enabled for a job, Slurm collects the data periodically and stores it in an HDF5 file.
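For example, the peak memory of a finished job can be queried with sacct; the job ID below is a placeholder and the field list is only one possible selection:

    sacct -j 123456 --format=JobID,JobName,Elapsed,MaxRSS,State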
Usage¶
When submitting the job (sbatch), add the option --profile=all.
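A minimal sketch of a job script with profiling enabled is shown below; the job name, time limit and application are placeholders, and only the --profile=all directive is specific to profiling (the option can equally be passed on the sbatch command line):

    #!/bin/bash
    #SBATCH --job-name=profiled_job    # placeholder job name
    #SBATCH --time=01:00:00            # placeholder time limit
    #SBATCH --profile=all              # enable Slurm profiling

    srun ./my_application              # placeholder application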
Once the job has finished, merge the generated per-node profiling files by executing the command sh5util -j $JOB_ID, replacing $JOB_ID with the job number. This command produces a file (job_$JOB_ID.h5) in HDF5 format containing the collected data, in the current directory.
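For example, for a job whose ID is 123456 (a placeholder), the merge step would be:

    sh5util -j 123456

which creates job_123456.h5 in the current directory.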
Data¶
On Myria, only the data related to the compute tasks (the task data type) can be collected. You can follow the evolution of memory usage over time (RSS and VMSize). The GPFS file system is not compatible with the read/write (filesystem) monitoring plugin, and the Omni-Path network is not compatible with the network monitoring plugin.
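If the installed version of sh5util supports the item-extract mode described in the Slurm profiling guide linked below, the RSS time series can also be dumped to a CSV file without opening the HDF5 file by hand; the job ID and output file name are placeholders, and the exact option names should be checked against your Slurm version:

    sh5util -j 123456 -I -s Task -d RSS -o rss_123456.csv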
To view the contents of the HDF5 file, install the HDFView software on your workstation.
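Alternatively, if the standard HDF5 command-line tools are installed, the structure of the file can be listed quickly from a terminal (the file name is a placeholder):

    h5ls -r job_123456.h5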
Going further¶
For more information, see the Slurm documentation: https://slurm.schedmd.com/archive/slurm-20.02.7/hdf5_profile_user_guide.html