When performing deep learning tasks, especially on a single physical machine, there may come a moment when we need to execute tasks in parallel. Suppose that we are evaluating different models: we may need to calculate the precision and recall of one model while training another model at the same time. We could run the tasks sequentially, one by one, but life is much easier if they can run in parallel. One way to achieve this is to create several containers and perform a distinct task in each container.
NVIDIA provides a utility called nvidia-docker, which enables the creation of Docker containers that can use CUDA GPU computing when they run. Under the hood, nvidia-docker adds a new Docker runtime called nvidia during installation. When we specify this runtime while running a command in a new Docker container, the command can access the host's GPUs and be accelerated by them.
As outlined in the official documentation, the following prerequisites should be satisfied before installation.
1. Verify that the GNU/Linux kernel version is greater than 3.10
The kernel version info can be obtained by invoking this command:
$ uname -r
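If you prefer to script this check, the sketch below compares the running kernel against 3.10. It assumes Bash and GNU `sort` (for its `-V` version ordering), and `kernel_newer_than` is a hypothetical helper, not part of nvidia-docker. The distribution suffix that `uname -r` usually appends, such as `-20-generic`, is stripped before comparing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if kernel release $1 is newer than version $2.
# Assumes GNU sort, whose -V flag orders strings as version numbers.
kernel_newer_than() {
  local release="${1%%-*}"   # "4.15.0-20-generic" -> "4.15.0"
  local minimum="$2"
  # Newer means: not equal, and the minimum sorts first in version order.
  [ "$release" != "$minimum" ] &&
    [ "$(printf '%s\n' "$minimum" "$release" | sort -V | head -n 1)" = "$minimum" ]
}

if kernel_newer_than "$(uname -r)" 3.10; then
  echo "Kernel $(uname -r) is newer than 3.10"
else
  echo "Kernel $(uname -r) is not newer than 3.10"
fi
```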
2. Verify that the Docker version is at least 1.12
The Docker version info can be obtained by invoking this command:
$ docker -v
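A similar sketch for the Docker check, again assuming Bash and GNU `sort -V`; `docker_version_from` and `version_at_least` are hypothetical helpers. Version ordering matters here: a plain string comparison would rank `1.9` above `1.12`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "X.Y.Z" from `docker -v` output such as
# "Docker version 17.03.2-ce, build f5ec1e2".
docker_version_from() {
  printf '%s\n' "$1" | sed -E 's/^Docker version ([0-9]+(\.[0-9]+)*).*/\1/'
}

# Hypothetical helper: succeeds if version $1 is at least $2.
# GNU sort -V compares the strings as version numbers, not as text.
version_at_least() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a machine with Docker installed:
#   version_at_least "$(docker_version_from "$(docker -v)")" 1.12 && echo "Docker is recent enough"
```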
If you haven’t installed Docker, you can refer to the installation guide provided in this article.
3. Verify that the NVIDIA GPU architecture is newer than Fermi (i.e., the CUDA compute capability is greater than 2.1)
The architectures newer than Fermi are:
- Kepler (compute capability 3.0 – 3.7)
- Maxwell (compute capability 5.0 – 5.3)
- Pascal (compute capability 6.0 – 6.2)
- Volta and newer (compute capability 7.0 and above)
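The list above can be turned into a small lookup, shown here as a sketch. It assumes you already know the GPU's compute capability (for example from NVIDIA's published tables for your product), and `arch_for_cc` and `cc_supported` are hypothetical helpers. `awk` is used because the shell's `[ ]` tests only compare integers, not decimals such as `6.1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a CUDA compute capability such as "6.1" to the
# architecture family from the list above.
arch_for_cc() {
  awk -v cc="$1" 'BEGIN {
    c = cc + 0   # force a numeric comparison
    if      (c >= 7.0) print "Volta or newer"
    else if (c >= 6.0) print "Pascal"
    else if (c >= 5.0) print "Maxwell"
    else if (c >= 3.0) print "Kepler"
    else               print "Fermi or older"
  }'
}

# Hypothetical helper: succeeds if the compute capability is greater than 2.1,
# i.e. the GPU is newer than Fermi.
cc_supported() {
  awk -v cc="$1" 'BEGIN { exit ((cc + 0) > 2.1 ? 0 : 1) }'
}

# Example: `arch_for_cc 6.1` prints "Pascal".
```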
To check if the current GPU can be used with nvidia-docker, we can use nvidia-smi to obtain the GPU product name.
$ nvidia-smi -q | grep "Product Name"
After obtaining the product name, if you are using a desktop GPU, you can check it against the table provided in this article. If the product is not in the list, there is a high chance that nvidia-docker will not work with it.
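Because `nvidia-smi -q` prints one `Product Name` line per GPU, a small filter can strip the label and leave only the names. This is a sketch assuming the `Product Name : <name>` layout printed by `nvidia-smi -q`; `gpu_product_names` is a hypothetical helper. On many driver versions, `nvidia-smi --query-gpu=name --format=csv,noheader` prints the same names directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi -q` output on stdin and print one
# product name per GPU, dropping the "Product Name   :" label.
gpu_product_names() {
  sed -n -E 's/^[[:space:]]*Product Name[[:space:]]*:[[:space:]]*//p'
}

# Usage:
#   nvidia-smi -q | gpu_product_names
```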
4. Verify that the NVIDIA driver version is at least 361.93
To obtain the NVIDIA driver version, we can invoke the nvidia-smi command again:
$ nvidia-smi | grep "Version"
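The driver version can likewise be extracted from the `nvidia-smi` header and compared against 361.93. The sketch assumes the header contains `Driver Version: <x.y>` and that GNU `sort -V` is available; `driver_version_from` and `driver_at_least` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the driver version out of the nvidia-smi header
# read on stdin, e.g. "| NVIDIA-SMI 390.48   Driver Version: 390.48   |" -> "390.48".
driver_version_from() {
  sed -n -E 's/.*Driver Version: ([0-9.]+).*/\1/p'
}

# Hypothetical helper: succeeds if driver version $1 is at least $2,
# using GNU sort -V so that "384.111" ranks above "361.93".
driver_at_least() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a machine with the NVIDIA driver installed:
#   v="$(nvidia-smi | driver_version_from)"
#   driver_at_least "$v" 361.93 && echo "Driver $v is recent enough"
```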
If all prerequisites are satisfied, we can now proceed to the installation:
1. Add the repository GPG key
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
2. Add the repository source list
$ curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
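Step 2 hard-codes `ubuntu16.04`. As a sketch, the distribution string can instead be built from `/etc/os-release`, whose `ID` and `VERSION_ID` fields give values such as `ubuntu16.04`. The helpers `distribution_of` and `nvidia_docker_list_url` are hypothetical, and the URL assumes the repository keeps the same `<distribution>/amd64` layout for other releases, so check the official instructions for yours.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the distribution string (e.g. "ubuntu16.04") from
# an os-release file; defaults to /etc/os-release.
distribution_of() {
  # Source the file in a subshell so ID/VERSION_ID don't leak into the caller.
  ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Hypothetical helper. ASSUMPTION: the repository uses the same
# <distribution>/amd64 path layout as the ubuntu16.04 URL in step 2.
nvidia_docker_list_url() {
  printf 'https://nvidia.github.io/nvidia-docker/%s/amd64/nvidia-docker.list\n' "$(distribution_of "$1")"
}

# Usage:
#   curl -s -L "$(nvidia_docker_list_url)" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```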
3. Update the apt package index
$ sudo apt-get update
4. Install nvidia-docker2 and reload the Docker daemon configuration
$ sudo apt-get install -y nvidia-docker2
$ sudo pkill -SIGHUP dockerd
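After the daemon reloads, one way to confirm that the nvidia runtime was registered is to look for it in the `Runtimes:` line printed by `docker info`. This is a sketch assuming that line is present in your Docker version's output; `has_nvidia_runtime` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if `docker info` output (read on stdin) lists
# the nvidia runtime, e.g. a line such as "Runtimes: nvidia runc".
has_nvidia_runtime() {
  grep -Eq '^[[:space:]]*Runtimes:(.*[[:space:]])?nvidia([[:space:]]|$)'
}

# Usage after reloading the daemon:
#   sudo docker info | has_nvidia_runtime && echo "nvidia runtime registered"
```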
5. Test running a Docker container with GPU access provided by the nvidia runtime
$ sudo docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Alternatively, we can replace the docker command with nvidia-docker, a wrapper installed with nvidia-docker2 that selects the nvidia runtime for us:
$ nvidia-docker run --rm nvidia/cuda nvidia-smi
Let’s say we have a Docker image for Caffe2 and want to run a Python script for an operator test. We can execute the script in a GPU-accelerated container as follows:
$ nvidia-docker run -it caffe2ai/caffe2 python -m caffe2.python.operator_test.relu_op_test
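Coming back to the motivation of running tasks in parallel, the nvidia runtime reads the `NVIDIA_VISIBLE_DEVICES` environment variable to decide which GPUs a container can see, so each task can be given its own GPU. The sketch below assumes a machine with at least two GPUs; `run_on_gpu`, `evaluate.py`, and `train.py` are hypothetical names, and the `DOCKER` override exists only so the command can be printed instead of run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a container that only sees the GPU with index $1.
# NVIDIA_VISIBLE_DEVICES is read by the nvidia runtime. DOCKER can be
# overridden (e.g. DOCKER=echo) to print the command instead of running it.
run_on_gpu() {
  local gpu="$1"
  shift
  "${DOCKER:-docker}" run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES="$gpu" "$@"
}

# Usage (evaluate.py and train.py are hypothetical scripts):
#   run_on_gpu 0 caffe2ai/caffe2 python evaluate.py &
#   run_on_gpu 1 caffe2ai/caffe2 python train.py &
#   wait   # block until both containers exit
```

On a single-GPU machine, both containers could be given index `0`; they would then share the same GPU and its memory.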
A guide to Caffe2 installation will be covered in a future post, so stay tuned!
How was your nvidia-docker installation? If you have issues or want to share troubleshooting details, simply write in the comment section.