NVIDIA Collective Communications Library (NCCL) is a library that provides collective communication primitives for parallel computation in multi-GPU and multi-node environments. The idea is to enable GPUs to work collectively to complete a computing task. This is especially helpful when the computation is complex. With multiple GPUs working together, the task completes in less time, yielding a better-performing system. Readers with a background in distributed systems, such as Hadoop, may immediately relate this concept to similar models used in traditional distributed systems. Hadoop, for example, supports the MapReduce programming model, which splits a compute job into chunks that are distributed to the slave nodes, with the results collected by the master to produce the final output.
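The split-and-collect pattern described above can be illustrated with a minimal CPU-only sketch. It does not use NCCL or Hadoop; the function names `split_into_chunks` and `distributed_sum` are hypothetical, and threads stand in for the worker nodes purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_workers):
    """Split data into at most num_workers contiguous chunks."""
    # Ceiling division, with a floor of 1 so that empty input is handled.
    chunk_size = max(1, -(-len(data) // num_workers))
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def distributed_sum(data, num_workers):
    """Sum data by letting each worker sum one chunk, then combining."""
    chunks = split_into_chunks(data, num_workers)
    # "Map" step: every worker computes a partial result on its own chunk.
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        partial_sums = list(executor.map(sum, chunks))
    # "Reduce" step: the coordinator combines the partial results.
    return sum(partial_sums)
```

In a real multi-GPU setting the partial results would live on different devices, and a collective operation would take the place of the final `sum`.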
In recent posts, we have gone through the installation of deep learning frameworks like Caffe2 and their dependencies, such as CUDA and cuDNN. In this post, we will go a few steps back to the most basic prerequisite for setting up a GPU-powered deep learning system: display driver installation. We will focus specifically on NVIDIA display driver installation because of the pervasiveness and robustness of NVIDIA GPUs as deep learning infrastructure.
Before proceeding to the installation, let’s discuss some key terms related to the use of NVIDIA GPUs as the computing infrastructure in a deep learning system.
GPU: Graphics Processing Unit. A unit of computation, in the form of a small chip on the graphics card, traditionally intended to perform rapid computation for image and graphics rendering and display. A graphics card can contain one or more GPUs, and a single GPU can consist of hundreds or thousands of cores.
CUDA: A parallel programming model, and its implementation as a computing platform, developed by NVIDIA to perform computation on GPUs. CUDA was designed to speed up computation by harnessing the parallel processing power of the hundreds or thousands of GPU cores.
CUDA-enabled GPUs: NVIDIA GPUs that support the CUDA programming model and implementation.
CUDA compute capability: A version number that identifies the general specifications and available features of a CUDA-enabled GPU, especially in terms of parallel computing methods. The full list of features available in each compute capability can be seen here.
Note on CUDA compute capability and deep learning:
It is important to note that if you plan to use an NVIDIA GPU for deep learning, you need to make sure that the compute capability of the GPU is at least 3.0 (Kepler architecture).
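The minimum-capability rule can be written as a small check. This is only a sketch: it assumes you have already looked up the compute capability of your GPU and have it as a string such as "6.1". The helper names below are hypothetical and are not part of any NVIDIA tool.

```python
# Minimum compute capability stated in the note above (Kepler architecture).
MINIMUM_CAPABILITY = (3, 0)


def parse_compute_capability(text):
    """Turn a string like '6.1' into a (major, minor) tuple of integers."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def meets_minimum_capability(text, minimum=MINIMUM_CAPABILITY):
    """Return True if the given compute capability is at least the minimum."""
    # Tuples compare element by element, so (3, 5) >= (3, 0) holds
    # without the rounding pitfalls of comparing floats.
    return parse_compute_capability(text) >= minimum
```

Comparing (major, minor) tuples rather than floating point numbers keeps the check exact for every version string of this form.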
One important element of deep learning, and of machine learning at large, is the dataset. A good dataset contributes to a model with good precision and recall. In the realm of object detection in images or video, there are some household names commonly used and referenced by researchers and practitioners, including Pascal, ImageNet, SUN, and COCO. In this post, we will briefly discuss the COCO dataset, especially its distinctive features and labeled objects.
When running the test script “relu_op_test.py” to verify the Caffe2 installation, you may encounter the error “ImportError: No module named hypothesis”. Let’s take a look at the content of the script to get some idea of the root cause.
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    from __future__ import unicode_literals

    from caffe2.python import core
    from hypothesis import given
    import hypothesis.strategies as st
    import caffe2.python.hypothesis_test_util as hu
    import caffe2.python.mkl_test_util as mu
    import numpy as np

    import unittest


    class TestRelu(hu.HypothesisTestCase):

        @given(X=hu.tensor(),
               engine=st.sampled_from(["", "CUDNN"]),
               **mu.gcs)
        def test_relu(self, X, gc, dc, engine):
            op = core.CreateOperator("Relu", ["X"], ["Y"], engine=engine)
            # go away from the origin point to avoid kink problems
            X += 0.02 * np.sign(X)
            X[X == 0.0] += 0.02
            self.assertDeviceChecks(dc, op, [X], )
            self.assertGradientChecks(gc, op, [X], 0, )


    if __name__ == "__main__":
        unittest.main()
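The script imports `hypothesis` directly (`from hypothesis import given`), so the import fails when that package is not available to the interpreter. A quick way to confirm this before running the test is sketched below. The helper `missing_modules` is hypothetical, and the sketch assumes Python 3, where `importlib.util.find_spec` is available.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    missing = []
    for name in names:
        try:
            # find_spec returns None when a top-level module is not installed.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Third-party top-level modules imported by the test script above.
    print(missing_modules(["hypothesis", "numpy", "caffe2"]))
```

Any name in the printed list is a module the interpreter cannot locate.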
In previous posts, we have gone through the installation of deep learning infrastructure components such as Docker, nvidia-docker, CUDA Toolkit, and cuDNN. With the infrastructure set up, we can conveniently start delving into deep learning: building, training, and validating deep neural network models, and applying the models to a particular problem domain. Translating deep learning primitives into low-level bytecode execution can be an enormous task, especially for practitioners with no interest in the calculus behind deep learning. Fortunately, there are several deep learning frameworks that provide a high-level programming interface to assist in performing deep learning tasks.
In this post, we will go through the installation of Caffe2, one of the major deep learning frameworks. Caffe2 is adapted from Caffe, a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) of UC Berkeley. Caffe2 was started with the aim of improving on Caffe, especially to better support large-scale distributed model training, mobile deployment, reduced-precision computation, new hardware, and flexible porting to multiple platforms.