AI - Deep Learning¶
Available software libraries¶
Deep learning (tensorflow, pytorch, keras) and machine learning (scikit-learn) frameworks are installed under python3 environments. These environments also provide image processing tools, such as opencv and scikit-image, as well as data analysis tools such as pandas.
There are four main installations available on Myria; additional, more recent modules are listed in the table below.
- The first installation provides the frameworks in frozen versions and is based on CUDA 9.0. The environment is accessed by loading the python3-DL/3.6.1 module: module load python3-DL/3.6.1
- The 3.6.9 and 3.7.6 installations, recommended for standard use, offer the frameworks in relatively recent versions and are based on CUDA 10.0 and CUDA 10.1 respectively. These environments are accessed by loading the python3-DL/3.6.9 module: module load python3-DL/3.6.9 (resp. python3-DL/3.7.6: module load python3-DL/3.7.6)
- Finally, the fourth installation offers the frameworks in their November 2020 versions and is based on CUDA 10.1. The environment is accessed by loading the python3-DL/3.8.5 module: module load python3-DL/3.8.5
The following table summarizes the versions of the frameworks available in each module.
module | python3-DL/3.6.1 | python3-DL/3.6.9 | python3-DL/3.7.6 | python3-DL/3.8.5 | python3-DL/3.8.8 | python3-DL/3.8.0 | python3-DL/3.9.7 |
---|---|---|---|---|---|---|---|
architecture | k80, p100, v100 | k80, p100, v100 | k80, p100, v100 | k80, p100, v100 | k80, p100, v100 | k80, p100, v100 | k80, p100, v100 |
python | 3.6.1 | 3.6.9 | 3.7.6 | 3.8.5 | 3.8.8 | 3.8.0 | 3.9.7 |
cuda | 9.0 | 10.0 | 10.1 | 10.1 | 11.0 | 11.1 | 11.1 |
keras | 2.2.4 | 2.2.5 | 2.3.1 | 2.4.3 | 2.4.3 | NA | NA |
opencv | 3.4.0 | 4.1.0 | 4.2.0 | 4.4.0 | 4.5.1 | 4.5.2 | 4.5.3 |
pandas | 0.24.0 | 0.25.1 | 1.0.1 | 1.1.4 | 1.2.3 | 1.2.5 | 1.3.3 |
scikit-image | 0.14.2 | 0.15.0 | 0.16.2 | 0.17.2 | 0.19.3 | 0.18.1 | 0.19.3 |
scikit-learn | 0.20.0 | 0.21.3 | 0.22.1 | 0.23.2 | 0.24.1 | 0.24.2 | 1.0 |
tensorflow | 1.8 | 1.14 | 2.1 | 2.3.1 | 2.4.1 | 2.5.0 | 2.6.0 |
torch | 1.0.1 | 1.2.0 | 1.4.0 | 1.6.0 | 1.7.1 | 1.8.1 | 1.9.1 |
Many additional packages complement these main tools. Once a module is loaded, the exhaustive list of installed packages can be obtained with the command: pip list
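For instance, to check whether a specific package is already provided by an environment (the module and the package name used for filtering are illustrative):

```bash
# List every package provided by the loaded environment,
# then filter for a specific one (package name is illustrative)
module load python3-DL/3.8.5
pip list
pip list | grep -i pillow
```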
Missing packages¶
If a package is missing from the proposed installations, it is recommended to request its installation from the CRIANN technical team by sending the request to support@criann.fr.
In some cases, it is possible to install packages locally in the user's home directory with the command: pip install <package> --user
In this case, the default local directory in which the package will be installed depends on the loaded module version:
- python3-DL/3.6.9: ~/.python-DL-3.6.9/site-packages
- python3-DL/3.6.1: ~/.python-DL-3.6.1/site-packages
- python3-DL/3.7.6: ~/.python-DL-3.7.6-2/site-packages
- python3-DL/3.8.5: ~/.python-DL-3.8.5/site-packages
- python3-DL/3.8.8: ~/.python-DL-3.8.8/site-packages
- python3-DL/3.8.0: ~/.python-DL-3.8.0/site-packages
- python3-DL/3.9.7: ~/.python-DL-3.9.7/site-packages
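As an illustration, a local installation with the 3.8.5 module loaded could look like the following sketch; "some_package" is a placeholder, not a package known to be missing, and the target directory matches the list above:

```bash
# Load the target environment, then install a package locally
# ("some_package" is a placeholder package name)
module load python3-DL/3.8.5
pip install some_package --user

# The package lands in the per-module local directory listed above
ls ~/.python-DL-3.8.5/site-packages
```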
Usage¶
For most deep learning frameworks, the use of GPU resources is recommended (and even mandatory in the case of tensorflow). To access GPU resources, jobs must be submitted to the gpu_k80, gpu_p100 or gpu_v100 partition.
Warning: do not forget to specify the number of GPUs to be used for your computation with the Slurm option --gres gpu:X, where X is the number of GPU devices to allocate.
Although most of the computation is done on GPUs, most frameworks make multi-threaded use of Python, so it is relevant to assign several CPUs to the same Python task. To do so, use the --cpus-per-task option of Slurm. The number of CPUs per task should be chosen in proportion to the number of GPUs used and depends on the targeted partition:
- gpu_k80: up to 7 CPUs per GPU
- gpu_p100: up to 14 CPUs per GPU
- gpu_v100: up to 8 CPUs per GPU
On Myria, the Slurm script /soft/slurm/criann_modeles_scripts/job_tensorflow.sl is an example of a single-GPU tensorflow job on the p100 architecture.
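As a hedged sketch of how the options above fit together in a batch script (the job name, time limit and Python script name are illustrative placeholders, and the contents of the official job_tensorflow.sl example may differ):

```bash
#!/bin/bash
# Illustrative single-GPU job on the p100 partition; job name, time limit
# and script name are assumptions, not the contents of job_tensorflow.sl.
#SBATCH --job-name dl_train
#SBATCH --partition gpu_p100
#SBATCH --gres gpu:1            # request 1 GPU device
#SBATCH --cpus-per-task 14      # up to 14 CPUs per GPU on gpu_p100
#SBATCH --time 02:00:00

module load python3-DL/3.7.6
srun python3 my_training_script.py   # my_training_script.py is a placeholder
```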