In recent posts, we have been going through the installation of deep learning frameworks such as Caffe2 and their dependencies, such as CUDA and cuDNN. In this post, we will go a few steps back to the most basic prerequisite for setting up a GPU-powered deep learning system: display driver installation. We will focus specifically on NVIDIA display driver installation, given how widely NVIDIA GPUs are used as deep learning infrastructure.
Key Terminology
Before proceeding to the installation, let’s discuss some key terms related to the use of NVIDIA GPUs as the computing infrastructure in a deep learning system.
GPU: Graphics Processing Unit. A processor, in the form of a chip on the graphics card, traditionally intended to perform rapid computation for rendering and displaying images and graphics. A graphics card can contain one or more GPUs, and a single GPU can be built from hundreds or thousands of cores.
CUDA: A parallel programming model, together with its implementation as a computing platform, developed by NVIDIA for performing computation on GPUs. CUDA was designed to speed up computation by running work in parallel across the hundreds or thousands of cores of a GPU.
CUDA-enabled GPUs: NVIDIA GPUs that support the CUDA programming model and implementation.
CUDA compute capability: A version number that identifies the general specifications and available features of a CUDA-enabled GPU, especially its parallel computing features. The full list of the available features in each compute capability can be seen here.
Note on CUDA compute capability and deep learning:
If you plan to use an NVIDIA GPU for deep learning, make sure that the compute capability of the GPU is at least 3.0 (Kepler architecture).
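The minimum can be checked with a small version comparison. The sketch below is only an illustration: the helper name version_at_least is made up for this post, the value 3.7 is the compute capability of a Tesla K80 typed in by hand, and it assumes a sort command that supports -V (as GNU coreutils does).

```shell
# Succeed (exit status 0) when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
    actual="$1"
    minimum="$2"
    # The smaller version sorts first; the check passes if that is the minimum.
    [ "$(printf '%s\n%s\n' "$actual" "$minimum" | sort -V | head -n 1)" = "$minimum" ]
}

# Hand-entered value for illustration: a Tesla K80 has compute capability 3.7.
gpu_compute_capability="3.7"

if version_at_least "$gpu_compute_capability" "3.0"; then
    echo "Compute capability $gpu_compute_capability meets the 3.0 minimum"
else
    echo "Compute capability $gpu_compute_capability is below the 3.0 minimum"
fi
```

Using a version sort instead of a plain string comparison keeps the check correct for two-digit major versions such as 10.0.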
Installation Steps
After ensuring that you have the right graphics card and that it is properly seated in the PCI / PCI-e slot, we’ll now proceed with the driver installation. After rebooting / turning on the machine, let’s open a terminal session for command line installation.
Step 0: Perform a quick check of the driver installation status
If you are not installing the driver on a freshly created Ubuntu VM or a newly installed Ubuntu OS, first check whether a driver is already present. Invoke this command from the terminal:
$ nvidia-smi
If you see this error:
bash: nvidia-smi: command not found
it means that the driver is not installed yet and we can safely proceed with the remaining installation steps. Otherwise, check the driver version that nvidia-smi reports and upgrade it if necessary.
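The same check can be scripted so that it does not depend on reading the error message. This is a minimal sketch: has_command is a helper name chosen here for illustration, and the --query-gpu call assumes an nvidia-smi version that supports that option.

```shell
# Succeed when the named command can be found on the PATH.
has_command() {
    command -v "$1" >/dev/null 2>&1
}

if has_command nvidia-smi; then
    echo "nvidia-smi found: a driver appears to be installed"
    # Ask the driver for its version and the GPU names in CSV form.
    nvidia-smi --query-gpu=driver_version,name --format=csv,noheader \
        || echo "nvidia-smi is present but could not query the driver"
else
    echo "nvidia-smi not found: proceed with the installation steps"
fi
```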
Step 1: Check that the graphics card is connected to the PCI bus
$ lspci | grep -i nvidia
Output:
84:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
85:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Explanation: From the output above, we can see that the graphics card model is Tesla K80. Two entries are listed because a single K80 board carries two GK210 GPUs.
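When several cards are present, it can help to pull out just the model names. The following sketch assumes the model appears between square brackets, as in the output above; nvidia_models is a hypothetical helper name, and the sample text is copied from that output so the snippet runs without a GPU.

```shell
# Print the model name found between square brackets on each NVIDIA line.
# Reads lspci output on standard input.
nvidia_models() {
    grep -i nvidia | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Sample input copied from the lspci output shown in this post.
sample='84:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
85:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)'

# Count how many devices of each model are present.
printf '%s\n' "$sample" | nvidia_models | sort | uniq -c
```

On a real machine, the sample can be replaced with the live output: lspci | nvidia_models | sort | uniq -c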
Step 2: Check the recommended driver version on the NVIDIA website
On the NVIDIA driver download page, we enter the graphics card model, the OS, and the CUDA toolkit version.
For a Tesla K80 on Ubuntu 16.04 with CUDA toolkit 9.1, the recommended driver version was 390.46.
Step 3: Check existing NVIDIA driver packages cached by apt
$ sudo apt-cache search nvidia | grep -E "nvidia-[0-9]{3}"
Note: If the output contains entries of the form nvidia-xxx (where xxx is a number), which are the package names of NVIDIA drivers, it means that the driver package(s) are already present in apt’s package cache.
Step 4: Check the NVIDIA driver that is currently installed
$ dpkg -l | grep -E "nvidia-[0-9]{3}"
Sample output:
ii nvidia-375 384.111-0ubuntu0.16.04.1 amd64 Transitional package for nvidia-384
ii nvidia-384 384.111-0ubuntu0.16.04.1 amd64 NVIDIA binary driver - version 384.111
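Note:
In the output above, the entries “ii nvidia-375 …” and “ii nvidia-384 …” correspond to installed NVIDIA driver packages (the leading ii marks a package as installed). We need to remove them from the system before installing a newer driver. The pattern is quoted so that the shell passes it to apt-get unchanged instead of expanding it against file names in the current directory.
$ sudo apt-get purge 'nvidia-*'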
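Before purging, it can be useful to list exactly which driver packages are installed. The sketch below filters dpkg -l style output; installed_nvidia_drivers is a name invented for this example, and the sample lines are taken from the Step 4 output above.

```shell
# Print the names of installed (status "ii") packages called nvidia-<number>.
# Reads `dpkg -l` output on standard input.
installed_nvidia_drivers() {
    awk '$1 == "ii" && $2 ~ /^nvidia-[0-9]+$/ { print $2 }'
}

# Sample input copied from the dpkg output shown in this post.
sample='ii nvidia-375 384.111-0ubuntu0.16.04.1 amd64 Transitional package for nvidia-384
ii nvidia-384 384.111-0ubuntu0.16.04.1 amd64 NVIDIA binary driver - version 384.111'

printf '%s\n' "$sample" | installed_nvidia_drivers
```

On a real system the function would be fed live data: dpkg -l | installed_nvidia_drivers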
Step 5: Add the graphics-drivers PPA, which packages the NVIDIA drivers for Ubuntu
$ sudo add-apt-repository ppa:graphics-drivers/ppa
Troubleshooting:
If the command fails with the error “sudo: add-apt-repository: command not found”, install the software-properties-common package and then run the command again.
$ sudo apt-get install software-properties-common
Step 6: Update apt index
$ sudo apt-get update
Step 7: Verify once again which NVIDIA driver packages are available to install
$ sudo apt-cache search nvidia | grep -E "nvidia-[0-9]{3}"
Sample output:
nvidia-304-dev - NVIDIA binary Xorg driver development files
nvidia-331 - Transitional package for nvidia-331
nvidia-331-dev - Transitional package for nvidia-340-dev
nvidia-331-updates - Transitional package for nvidia-340
nvidia-331-updates-dev - Transitional package for nvidia-340-dev
nvidia-331-updates-uvm - Transitional package for nvidia-340
nvidia-331-uvm - Transitional package for nvidia-340
nvidia-340-dev - NVIDIA binary Xorg driver development files
nvidia-340-updates - Transitional package for nvidia-340
nvidia-340-updates-dev - Transitional package for nvidia-340-dev
nvidia-340-updates-uvm - Transitional package for nvidia-340-updates
nvidia-340-uvm - Transitional package for nvidia-340
nvidia-346 - Transitional package for nvidia-346
nvidia-346-dev - Transitional package for nvidia-352-dev
nvidia-346-updates - Transitional package for nvidia-346-updates
nvidia-346-updates-dev - Transitional package for nvidia-352-updates-dev
nvidia-352 - Transitional package for nvidia-361
nvidia-352-dev - Transitional package for nvidia-361-dev
nvidia-352-updates - Transitional package for nvidia-361
nvidia-352-updates-dev - Transitional package for nvidia-361-dev
nvidia-361-updates - Transitional package for nvidia-361
nvidia-361-updates-dev - Transitional package for nvidia-361-dev
nvidia-304-updates - Transitional package for nvidia-304
nvidia-304-updates-dev - Transitional package for nvidia-304-dev
nvidia-361 - Transitional package for nvidia-367
nvidia-361-dev - Transitional package for nvidia-367-dev
nvidia-367 - Transitional package for nvidia-375
nvidia-367-dev - Transitional package for nvidia-375-dev
nvidia-375 - Transitional package for nvidia-384
nvidia-375-dev - Transitional package for nvidia-384-dev
nvidia-384-dev - NVIDIA binary Xorg driver development files
nvidia-304 - NVIDIA legacy binary driver - version 304.137
nvidia-340 - NVIDIA binary driver - version 340.106
nvidia-384 - NVIDIA binary driver - version 384.130
nvidia-387-dev - Transitional package for nvidia-390-dev
nvidia-387 - Transitional package for nvidia-390
nvidia-390-dev - NVIDIA binary Xorg driver development files
nvidia-390 - NVIDIA binary driver - version 390.48
nvidia-396-dev - NVIDIA binary Xorg driver development files
nvidia-396 - NVIDIA binary driver - version 396.18
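Most entries in a listing like this are transitional or development packages. The following sketch narrows it down to the actual driver packages, assuming that transitional entries always contain the text "Transitional package" as they do above. The function name real_driver_packages is made up for this example, and the sample is a subset of the listing.

```shell
# Keep only packages named nvidia-<number> whose description is not
# "Transitional package ...", sorted by version number.
# Reads `apt-cache search` output on standard input.
real_driver_packages() {
    awk '$1 ~ /^nvidia-[0-9]+$/ && $0 !~ /Transitional package/ { print $1 }' | sort -V
}

# Sample input: a subset of the apt-cache output shown in this post.
sample='nvidia-375 - Transitional package for nvidia-384
nvidia-384-dev - NVIDIA binary Xorg driver development files
nvidia-304 - NVIDIA legacy binary driver - version 304.137
nvidia-340 - NVIDIA binary driver - version 340.106
nvidia-384 - NVIDIA binary driver - version 384.130
nvidia-387 - Transitional package for nvidia-390
nvidia-390 - NVIDIA binary driver - version 390.48
nvidia-396 - NVIDIA binary driver - version 396.18'

printf '%s\n' "$sample" | real_driver_packages
```

Applied to the sample, this leaves five candidates (nvidia-304, nvidia-340, nvidia-384, nvidia-390 and nvidia-396), from which the branch matching the recommended version can be picked.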
Step 8: Install the latest / recommended NVIDIA driver. This time we will install the recommended 390 series through the nvidia-390 package, which the listing above shows at version 390.48.
$ sudo apt-get install nvidia-390
Step 9: Reboot the system
$ sudo reboot
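Troubleshooting: if the driver does not load after the reboot, try disabling Secure Boot in the UEFI / BIOS settings, since Secure Boot can prevent unsigned kernel modules such as the NVIDIA driver from loading.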
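Whether Secure Boot is active can usually be checked from the running system. The sketch below assumes the mokutil tool is installed and that mokutil --sb-state reports the state with the text "SecureBoot enabled" or "SecureBoot disabled"; the helper name secure_boot_enabled is hypothetical, and the sample strings stand in for real output.

```shell
# Succeed when the text on standard input reports Secure Boot as enabled.
# Assumption: input looks like the output of `mokutil --sb-state`.
secure_boot_enabled() {
    grep -q 'SecureBoot enabled'
}

# Sample string standing in for real mokutil output.
sample_state='SecureBoot enabled'

if printf '%s\n' "$sample_state" | secure_boot_enabled; then
    echo "Secure Boot is on: the NVIDIA kernel module may be blocked"
else
    echo "Secure Boot is off or its state could not be determined"
fi
```

On a real machine: mokutil --sb-state | secure_boot_enabled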
Step 10: Check that the driver has been installed by invoking the nvidia-smi command.
$ nvidia-smi
Sample output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   33C    P0    57W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   26C    P0    70W / 149W |      0MiB / 11441MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+
Concluding remark
Installing the NVIDIA graphics driver should not be too daunting. How was your experience? Did you face a specific issue that you want to share with others?
Great article getting my K80 working with ubuntu 20, however, you have a curly quote in the code for steps 3,4, and 7.
The curly quotes were the results of inadvertent auto formatting by the editor. It should have been fixed now. Thanks for the heads-up.