Guide: Installing TensorFlow 1.8 with GPU Support against CUDA 9.1 and cuDNN 7.1 on Ubuntu 16.04

One interesting aspect of the deep learning ecosystem is the wide choice of deep learning frameworks. The flip side is that more options bring more confusion, especially when trying to choose the most appropriate framework for the whole range of problems at hand. In practice, instead of relying on a single framework, we may need to work with several, choosing among them according to the nature of the problem to solve.

TensorFlow is one of the most popular deep learning frameworks (arguably the most popular in terms of GitHub stars). It comes with excellent documentation, including documentation for installation. If you go to the official installation page, you will find an elaborate installation guide for multiple OS platforms. Then why this post?

The latest version of TensorFlow with GPU support (version 1.8 at the time this post is published) is built against CUDA 9.0. However, NVIDIA has already released CUDA 9.1, and a newer version may be released in the near future. Because TensorFlow lags behind the CUDA GA version, the publicly released TensorFlow package does not work out of the box on a system that has only the latest CUDA version installed. The remedy is to install TensorFlow from source, which can be non-trivial, especially for those who are not familiar with the source build mechanism.
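The mismatch described above can be sketched as a small check. This is only an illustration, assuming that a prebuilt binary needs the same CUDA major.minor version it was built against; the function names are hypothetical and are not part of TensorFlow or CUDA.

```python
def parse_version(text):
    """Turn a dotted version string such as '9.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def prebuilt_binary_usable(built_against, installed):
    """Return True if a binary built against one CUDA version can use the installed one.

    Assumption: the binary looks for the CUDA runtime by its major.minor
    version, so only the first two components have to match.
    """
    return parse_version(built_against)[:2] == parse_version(installed)[:2]


if __name__ == "__main__":
    # Version from the text: TensorFlow 1.8 is built against CUDA 9.0.
    for installed in ("9.0", "9.1"):
        if prebuilt_binary_usable("9.0", installed):
            print("CUDA %s: the prebuilt package should work" % installed)
        else:
            print("CUDA %s: build TensorFlow from source" % installed)
```

Comparing tuples of integers rather than raw strings keeps the check correct for versions such as 10.0, which would sort before 9.1 as text.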

The final system setup after completing the installation steps explained in this post will be as follows.

Item                    Value
OS                      Ubuntu 16.04
NVIDIA driver version   390.48
CUDA version            9.1
cuDNN version           7.1.3
NCCL version            2.1.15
Python version          2.7.12
Python install method   virtualenv
TensorFlow version      1.8.0

Note that these components will be upgraded to newer versions over time. This post is expected to remain valid after such upgrades. If it does become invalid, the content will be updated or another post will be written, provided there are sufficient comments or feedback on the existing content.

How to Install NVIDIA Collective Communications Library (NCCL) 2 for TensorFlow on Ubuntu 16.04

NVIDIA Collective Communications Library (NCCL) is a library that provides parallel computation primitives for multi-GPU and multi-node environments. The idea is to enable GPUs to work collectively on a computing task, which is especially helpful when the computation is complex. With multiple GPUs working together, the task is completed in less time, resulting in a better-performing system. People with a background in distributed systems, such as Hadoop, may immediately relate this concept to the similar model applied in traditional distributed systems. Hadoop, for example, supports the MapReduce programming model, which splits a compute job into chunks that are spread across the slave nodes and collected back by the master to produce the final output.
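One of the collective operations NCCL offers is all-reduce, in which every GPU contributes a buffer and every GPU ends up with the combined result. The sketch below only simulates that behavior with plain Python lists; it does not call NCCL, and the function name is made up for illustration.

```python
def all_reduce_sum(buffers):
    """Simulate a sum all-reduce over per-device buffers of equal length.

    Each inner list stands in for the data held by one GPU.
    """
    if not buffers:
        return []
    length = len(buffers[0])
    if any(len(buf) != length for buf in buffers):
        raise ValueError("all buffers must have the same length")
    # Reduce: add up the values element by element across devices.
    total = [sum(values) for values in zip(*buffers)]
    # Broadcast: every device receives its own copy of the result.
    return [list(total) for _ in buffers]


if __name__ == "__main__":
    gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three simulated GPUs
    print(all_reduce_sum(gradients))
    # prints [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
```

Unlike the master-and-slave collection step in the MapReduce analogy, an all-reduce leaves the final result on every participant, which is what makes it useful for keeping model replicas in sync.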

How to Properly Install NVIDIA Graphics Driver on Ubuntu 16.04

In recent posts, we have been going through the installation of deep learning frameworks such as Caffe2 and their dependencies, such as CUDA and cuDNN. In this post, we will go a few steps back to the most basic prerequisite of setting up a GPU-powered deep learning system: display driver installation. We will focus specifically on NVIDIA display driver installation because of the pervasiveness and robustness of NVIDIA GPUs as deep learning infrastructure.

Key Terminologies

Before proceeding to the installation, let’s discuss some key terms related to the use of NVIDIA GPUs as the computing infrastructure in a deep learning system.

GPU: Graphical / Graphics Processing Unit. A unit of computation, in the form of a small chip on the graphics card, traditionally intended to perform rapid computation for image / graphics rendering and display. A graphics card can contain one or more GPUs, while one GPU can be built of hundreds or thousands of cores.

CUDA: A parallel programming model, together with its implementation as a computing platform, developed by NVIDIA to perform computation on GPUs. CUDA was designed to speed up computation by harnessing the parallel processing power of the hundreds or thousands of GPU cores.

CUDA-enabled GPUs: NVIDIA GPUs that support the CUDA programming model and implementation.

CUDA compute capability: A number that refers to the general specifications and available features of a CUDA-enabled GPU, especially in terms of parallel computing methods. The full list of the available features in each compute capability can be seen here.

Note on CUDA compute capability and deep learning:
It is important to note that if you plan to use an NVIDIA GPU for deep learning, you need to make sure that the compute capability of the GPU is at least 3.0 (Kepler architecture).
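The requirement in the note can be expressed as a short check. This is a minimal sketch: the function names are hypothetical, and the capability values in the example are illustrative and should be verified against NVIDIA's published list for your own card.

```python
MINIMUM_CAPABILITY = (3, 0)  # threshold stated in the note above


def parse_capability(text):
    """Convert a compute capability string such as '3.7' into (major, minor)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def meets_minimum(capability, minimum=MINIMUM_CAPABILITY):
    """Return True if the compute capability is at least the minimum."""
    return parse_capability(capability) >= minimum


if __name__ == "__main__":
    # Illustrative values only; check NVIDIA's list for your GPU.
    examples = {
        "GeForce GTX 580": "2.0",
        "Tesla K80": "3.7",
        "GeForce GTX 1080": "6.1",
    }
    for name, capability in sorted(examples.items()):
        status = "OK" if meets_minimum(capability) else "too old"
        print("%s (compute capability %s): %s" % (name, capability, status))
```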

What Object Categories / Labels Are In COCO Dataset?

One important element of deep learning, and of machine learning at large, is the dataset. A good dataset contributes to a model with good precision and recall. In the realm of object detection in images or video, some household names are commonly used and referenced by researchers and practitioners, including Pascal, ImageNet, SUN, and COCO. In this post, we will briefly discuss the COCO dataset, especially its distinctive features and labeled objects.

tl;dr The COCO dataset labels from the original paper and from the versions released in 2014 and 2017 can be viewed and downloaded from this repository.

Resolving Error “ImportError: No module named hypothesis”

When running the test script “relu_op_test.py” to verify a Caffe2 installation, you may encounter the error “ImportError: No module named hypothesis”. Let’s take a look at the content of the script to get some idea of the root cause.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

from caffe2.python import core
from hypothesis import given
import hypothesis.strategies as st
import caffe2.python.hypothesis_test_util as hu
import caffe2.python.mkl_test_util as mu
import numpy as np

import unittest

class TestRelu(hu.HypothesisTestCase):

    @given(X=hu.tensor(),
           engine=st.sampled_from(["", "CUDNN"]),
           **mu.gcs)
    def test_relu(self, X, gc, dc, engine):
        op = core.CreateOperator("Relu", ["X"], ["Y"], engine=engine)
        # go away from the origin point to avoid kink problems
        X += 0.02 * np.sign(X)
        X[X == 0.0] += 0.02
        self.assertDeviceChecks(dc, op, [X], [0])
        self.assertGradientChecks(gc, op, [X], 0, [0])


if __name__ == "__main__":
    unittest.main()
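The script imports hypothesis near the top, so the error suggests that the package is not installed in the active Python environment. As a rough sketch, the helper below (a hypothetical function, not part of Caffe2) reports which of a list of modules cannot be imported. Installing the missing package, for example with pip install hypothesis, is the usual remedy, although that step is an assumption here.

```python
import importlib


def missing_modules(names):
    """Return the names from the list that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Third-party modules imported by relu_op_test.py.
    required = ["numpy", "hypothesis", "caffe2.python"]
    for name in missing_modules(required):
        print("missing module: %s" % name)
```

Run the check inside the same virtualenv used for Caffe2, since a package installed for a different interpreter would still be reported as missing.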
