How to Build and Install The Latest TensorFlow without CUDA GPU and with Optimized CPU Performance on Ubuntu

In this post, we are about to accomplish something less common: building and installing TensorFlow with CPU-only support on an Ubuntu server, desktop, or laptop. We are targeting machines with older CPUs, for example those without Advanced Vector Extensions (AVX) support. This kind of setup can be a sensible choice when we are not using TensorFlow to train a new AI model but only to obtain predictions (inference) served by a trained model. Compared with model training, model inference is less computationally intensive. Hence, instead of performing the computation with GPU acceleration, the task can be handled by the CPU alone.

tl;dr The WHL file from TensorFlow CPU build is available for download from this Github repository.

Since we will build TensorFlow with CPU support only, the physical server does not need additional graphics card(s) mounted on the PCI slot(s). This differs from the case in which we build TensorFlow with GPU support, where we need at least one discrete (non built-in) graphics card that supports CUDA. Naturally, running TensorFlow on the CPU is an economical approach to deep learning. Then how about the performance? Benchmark results have shown that GPUs outperform CPUs on deep learning tasks, especially model training. However, this does not mean that TensorFlow on the CPU is not a feasible option. With proper CPU optimization, TensorFlow can exhibit improved performance that narrows the gap with its GPU counterpart. When cost is a serious issue, say we can only do model training and inference in the cloud, leaning towards TensorFlow CPU can be a decision that also makes sense from a financial standpoint.

Optimizing the TensorFlow CPU Build

We optimize the TensorFlow CPU build by turning on all the computation optimization opportunities provided by the CPU. We are interested in the flags field exposed through /proc/cpuinfo, which we can obtain with this command:

$ more /proc/cpuinfo | grep flags

A sample output from the command invocation is shown below:

...
flags   : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl pni cx16 hypervisor lahf_lm
...

Looking at the output, we may be inundated with cryptic text that looks meaningless. In fact, each flag has a precise meaning: it corresponds to a CPU feature, and a visit to the Linux kernel source helps unravel what each one stands for. For a more human-readable explanation of the flags, we can refer to the following article on StackExchange.

We will then customize the TensorFlow source build to take advantage of the CPU features that contribute to speedier execution of TensorFlow code. The list of relevant CPU features is provided below.

No  Flag     CPU Feature                                                         Additional Info
1   ssse3    Supplemental Streaming SIMD Extensions 3 (SSSE-3) instruction set   Link
2   sse4_1   Streaming SIMD Extensions 4.1 (SSE-4.1) instruction set             Link
3   sse4_2   Streaming SIMD Extensions 4.2 (SSE-4.2) instruction set             Link
4   fma      Fused multiply-add (FMA) instruction set                            Link
5   cx16     CMPXCHG16B instruction (double-width compare-and-swap)              Link
6   popcnt   Population count instruction (count number of bits set to 1)       Link
7   avx      Advanced Vector Extensions (AVX) instruction set                    Link
8   avx2     Advanced Vector Extensions 2 (AVX2) instruction set                 Link
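Before configuring the build, it can help to see at a glance which of these features the current machine actually has. The following is a minimal sketch (Linux-specific, since it reads /proc/cpuinfo):

```shell
#!/bin/sh
# Report which of the eight TensorFlow-relevant CPU features listed above
# are present on this machine. Relies on the Linux /proc/cpuinfo interface.
CPU_FLAGS=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d ':' -f 2)
REPORT=""
for feature in ssse3 sse4_1 sse4_2 fma cx16 popcnt avx avx2; do
    case " $CPU_FLAGS " in
        *" $feature "*) REPORT="${REPORT}${feature}: supported\n" ;;
        *)              REPORT="${REPORT}${feature}: NOT supported\n" ;;
    esac
done
printf "%b" "$REPORT"
```

On the sample CPU above, this would report only cx16 as supported.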

From the previous sample of /proc/cpuinfo output, we can see that the CPU does not support AVX and AVX2. The CPU also does not support SSSE-3, SSE-4.1, SSE-4.2, FMA, and POPCNT; of the eight features listed, only CMPXCHG16B (cx16) is available. Apparently, there is not much performance optimization that can be done for this build. However, on a machine with a more modern CPU, more of these features will be available, giving us more opportunity to optimize TensorFlow performance.

Populating System Information

Prior to building the source, we first need to gather the current system information. The build process described in this post was tested on Ubuntu 16.04 LTS with Python 2.7. Your mileage may vary if you perform the build on a machine with a different Ubuntu or Python version. Let’s proceed with obtaining the necessary system information as follows.

– Ubuntu version
Command:
$ lsb_release -a | grep "Release" | awk '{print $2}'

– Python version
Command:
$ python --version 2>&1 | awk '{print $2}'

– GCC version
Command:
$ gcc --version | grep "gcc" | awk '{print $4}'

– TensorFlow optimization flags

The optimization flags will be supplied when configuring the TensorFlow source build. The following command is used to populate the optimization flags:

$ grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | { read FLAGS; OPT="-march=native"; for flag in $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done; MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
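The same logic can be expanded into a multi-line script for readability; the sketch below is functionally equivalent to the one-liner above:

```shell
#!/bin/sh
# Build the GCC optimization flag string from the CPU features in /proc/cpuinfo.
FLAGS=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d ':' -f 2 | tr '[:upper:]' '[:lower:]')
OPT="-march=native"
for flag in $FLAGS; do
    case "$flag" in
        ssse3|sse4_1|sse4_2|fma|cx16|popcnt|avx|avx2)
            OPT="$OPT -m$flag" ;;
    esac
done
# GCC expects dots rather than underscores (e.g. -msse4.1, not -msse4_1),
# so translate them before printing the final flag string.
OPT=$(echo "$OPT" | tr '_' '.')
echo "$OPT"
```

The underscore-to-dot substitution at the end is the reason the one-liner contains `${OPT//_/\.}`.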

After invoking all the commands above, we put the gathered system information into the following table, filling the “Current” column with the output of each command invocation.

Item                     Expected                Current
Ubuntu version           16.04                   ...
Python version           >= 2.7.12               ...
GCC version              >= 5.4.0                ...
TF optimization flags*   -march=native -mcx16    ...

* For the TF optimization flags, the value in the “Expected” column is only an example taken from the sample CPU, not a required value
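For convenience, the four lookups can also be combined into a single script that prints one line per table item. This is a sketch that reuses the commands above; it assumes lsb_release, python, and gcc are installed:

```shell
#!/bin/sh
# Print the "Current" column of the system information table in one go.
UBUNTU_VERSION=$(lsb_release -a 2>/dev/null | grep "Release" | awk '{print $2}')
PYTHON_VERSION=$(python --version 2>&1 | awk '{print $2}')
GCC_VERSION=$(gcc --version 2>/dev/null | grep "gcc" | awk '{print $4}')
# Same pipeline as the optimization-flags one-liner, reformatted for a script.
TF_OPT_FLAGS=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d ':' -f 2 |
  tr '[:upper:]' '[:lower:]' | {
    read FLAGS
    OPT="-march=native"
    for flag in $FLAGS; do
      case "$flag" in
        ssse3|sse4_1|sse4_2|fma|cx16|popcnt|avx|avx2) OPT="$OPT -m$flag" ;;
      esac
    done
    echo "$OPT" | tr '_' '.'
  })
echo "Ubuntu version       : $UBUNTU_VERSION"
echo "Python version       : $PYTHON_VERSION"
echo "GCC version          : $GCC_VERSION"
echo "TF optimization flags: $TF_OPT_FLAGS"
```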

Pre-Installation

We will install TensorFlow in an isolated environment. To do so, we need to first create a Python virtual environment using virtualenv. Additionally, we will install Bazel, which will be used to build the TensorFlow source code. The steps are explained as follows.

Step 1: Set locale to UTF-8

$ export LC_ALL="en_US.UTF-8"
$ export LC_CTYPE="en_US.UTF-8"
$ sudo dpkg-reconfigure locales

Step 2: Install pip and virtualenv for Python 2 and TensorFlow
$ sudo apt-get -y install python-pip python-dev python-virtualenv python-numpy python-wheel

Step 3: Create virtualenv environment for Python 2 (Virtualenv location: ~/virtualenv/tensorflow)

$ mkdir -p ~/virtualenv/tensorflow
$ virtualenv --system-site-packages ~/virtualenv/tensorflow

Step 4: Activate the virtualenv environment
$ source ~/virtualenv/tensorflow/bin/activate

Verify the prompt is changed to:
(tensorflow) $

Step 5: (virtualenv) Ensure pip >= 8.1 is installed and upgrade to the latest version
– Get currently installed pip version
(tensorflow) $ pip --version | awk '{print $2}'

– Upgrade pip if necessary
(tensorflow) $ pip install --upgrade pip

Step 6: (virtualenv) Deactivate the virtualenv
(tensorflow) $ deactivate

Step 7: Install bazel to build TensorFlow

– Install Java JDK 8 (Open JDK) if there is no JDK installed
$ sudo apt-get install openjdk-8-jdk

– Add bazel private repository into source repository list

$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

– Install the latest version of bazel
$ sudo apt-get update && sudo apt-get -y install bazel

The Installation

We are now ready to build and install TensorFlow. For the installation steps, we will proceed as follows.

Step 1: Create directory for the source
$ mkdir -p ~/installers/tensorflow/tf-cpu

Step 2: Download the latest stable release of TensorFlow (release 1.10.0 at the time this post is written) into the source directory

$ cd ~/installers/tensorflow/tf-cpu
$ wget https://github.com/tensorflow/tensorflow/archive/v1.10.0.zip

Step 3: Unzip the installer
$ unzip v1.10.0.zip

Step 4: Go to the inflated TensorFlow source directory
$ cd tensorflow-1.10.0

Step 5: Activate the virtualenv
$ source ~/virtualenv/tensorflow/bin/activate

Step 6: (virtualenv) Install additional Python modules required to build TensorFlow (enum and mock)
(tensorflow) $ pip install --upgrade enum34 mock

Step 7: (virtualenv) Configure the build file. We will configure TensorFlow without CUDA and with CPU optimization.

(tensorflow) $ ./configure

Sample configuration for reference:

Please specify the location of python. [Default is /home/MYUSER/virtualenv/tensorflow/bin/python]: /home/MYUSER/virtualenv/tensorflow/bin/python
Please input the desired Python library path to use.  Default is [/home/MYUSER/virtualenv/tensorflow/lib/python2.7/site-packages]
/home/MYUSER/virtualenv/tensorflow/lib/python2.7/site-packages
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: Y
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: Y
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: Y
Do you wish to build TensorFlow with Amazon AWS Platform support? [Y/n]: Y
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: Y
Do you wish to build TensorFlow with XLA JIT support? [y/N]: N
Do you wish to build TensorFlow with GDR support? [y/N]: N
Do you wish to build TensorFlow with VERBS support? [y/N]: N
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: N
Do you wish to download a fresh release of clang? (Experimental) [y/N]: N
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native -mcx16
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
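For repeatable builds, the same answers can be supplied non-interactively through environment variables that the configure script reads. The variable names below are my reading of configure.py in the TF 1.10 source tree; treat them as assumptions and verify each one against the release you are actually building:

```shell
# Hypothetical non-interactive equivalent of the interactive session above.
# Double-check every TF_* variable name against configure.py in your source tree.
export PYTHON_BIN_PATH="$HOME/virtualenv/tensorflow/bin/python"
export PYTHON_LIB_PATH="$HOME/virtualenv/tensorflow/lib/python2.7/site-packages"
export TF_NEED_JEMALLOC=1
export TF_NEED_GCP=1
export TF_NEED_HDFS=1
export TF_NEED_AWS=1
export TF_NEED_KAFKA=1
export TF_ENABLE_XLA=0
export TF_NEED_GDR=0
export TF_NEED_VERBS=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_CUDA=0
export TF_DOWNLOAD_CLANG=0
export TF_NEED_MPI=0
export CC_OPT_FLAGS="-march=native -mcx16"   # use the flags gathered earlier
export TF_SET_ANDROID_WORKSPACE=0
./configure
```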

Step 8: (virtualenv) Build TensorFlow source

– For GCC >= 5.x
(tensorflow) $ bazel build --config=opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/tools/pip_package:build_pip_package

– For GCC 4.x:
(tensorflow) $ bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

Step 9: (virtualenv) Create the .whl file from the bazel build

(tensorflow) $ mkdir tensorflow-pkg
(tensorflow) $ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow-pkg

– Install the .whl file
(tensorflow) $ cd tensorflow-pkg && ls -al

After knowing the .whl file name:
(tensorflow) $ pip install tensorflow-1.10.0-cp27-cp27mu-linux_x86_64.whl

Step 10: (virtualenv) Verify the installation

– Check the installed TensorFlow version in the virtualenv
(tensorflow) $ python -c 'import tensorflow as tf; print(tf.__version__)'

– Run TensorFlow Hello World
(tensorflow) $ python -c 'import tensorflow as tf; hello = tf.constant("Hello, TensorFlow!"); sess = tf.Session(); print(sess.run(hello));'

Output of the last command:
Hello, TensorFlow!

Concluding Remark

We have successfully installed the latest TensorFlow with CPU-only support. If you are interested in running TensorFlow without a CUDA GPU, you can start building from source as described in this post. I have also created a Github repository that hosts the WHL file created from the build; feel free to check it out.

Future work will include a performance benchmark between TensorFlow CPU and GPU. If this is something that you want to see in a future post, please write in the comment section.
