CUDA Installation

TensorFlow uses NVIDIA's CUDA toolkit and cuDNN, NVIDIA's GPU-accelerated library of deep neural network primitives, to run certain computation-heavy operations on GPUs. To launch a Poseidon job on your GPU, you must therefore first install both CUDA and cuDNN.

CUDA

Refer to this link to download and install the CUDA toolkit. Download the version 8.0 deb/rpm local package, then follow the installation instructions listed underneath the download link. The default installation places the toolkit in /usr/local/cuda-8.0 and creates a symbolic link to it at /usr/local/cuda.
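To confirm that the toolkit landed where the text above says it should, a quick check along the following lines may help. This is a minimal sketch, not part of the official instructions: check_cuda_toolkit is a hypothetical helper name, and it assumes a Linux system where readlink -f is available and nvcc lives under the toolkit's bin directory.

```shell
# Hypothetical helper (not from the CUDA docs): report where a toolkit path
# resolves to and, if nvcc is present, print its release line.
check_cuda_toolkit() {
    cuda_home="${1:-/usr/local/cuda}"   # default location described above
    if [ ! -e "$cuda_home" ]; then
        echo "missing: $cuda_home"
        return 1
    fi
    # readlink -f follows the symbolic link to the versioned directory
    echo "toolkit: $(readlink -f "$cuda_home")"
    if [ -x "$cuda_home/bin/nvcc" ]; then
        "$cuda_home/bin/nvcc" --version | grep release
    else
        echo "nvcc not found under $cuda_home/bin"
    fi
}

# Check the default location; do not abort the shell if it is absent.
check_cuda_toolkit /usr/local/cuda || true
```

With a default installation, the first line of output should point at /usr/local/cuda-8.0.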

cuDNN

Download cuDNN v5.1 from here. Choose the library download that matches your operating system. The following shell commands extract the archive and copy the cuDNN files into the toolkit directory. Assuming the toolkit is installed in /usr/local/cuda, run the following commands, adjusting the archive name to match the cuDNN version you downloaded:

tar -xvf cudnn-8.0-linux-x64-v5.1-ga.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Finally, run the following command to refresh the shared object cache:

sudo ldconfig
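After copying the files, it can be worth confirming which cuDNN version the toolkit directory now contains. The sketch below reads the version macros from the copied header and then looks for libcudnn in the linker cache. cudnn_version is a hypothetical helper name, and the sketch assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL directly in cudnn.h, as the v5.1 header does; other releases may lay this out differently.

```shell
# Hypothetical helper: print the cuDNN version recorded in a cudnn.h header.
cudnn_version() {
    header="${1:-/usr/local/cuda/include/cudnn.h}"   # path used by the cp step above
    if [ ! -r "$header" ]; then
        echo "cannot read $header"
        return 1
    fi
    # Each macro appears on a line of the form: #define CUDNN_MAJOR 5
    major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" {print $3}' "$header")
    minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" {print $3}' "$header")
    patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" {print $3}' "$header")
    echo "cuDNN $major.$minor.$patch"
}

cudnn_version || true

# List libcudnn entries known to the dynamic linker after running ldconfig.
ldconfig -p 2>/dev/null | grep libcudnn || echo "libcudnn not in the linker cache"
```

If the second command reports that libcudnn is not in the cache, /usr/local/cuda/lib64 may not be on the linker's search path on your system.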

Check NVIDIA Drivers

GPU device drivers are sometimes incorrectly configured. To make sure your drivers are installed properly, run nvidia-smi in a terminal. It should print a table listing the installed GPUs along with statistics such as temperature and memory usage. If the command fails, update to the latest drivers.
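The driver check can also be scripted, for example before launching a job. The following is a minimal sketch: check_gpu_driver is a hypothetical helper name, and it assumes your nvidia-smi supports the --query-gpu and --format=csv options, which very old driver releases may lack.

```shell
# Hypothetical helper: fail with a readable message if the driver is unusable.
check_gpu_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; the NVIDIA driver may not be installed"
        return 1
    fi
    # Print one CSV row per GPU: name, driver version, temperature.
    if ! nvidia-smi --query-gpu=name,driver_version,temperature.gpu --format=csv; then
        echo "nvidia-smi failed; the driver may be misconfigured or out of date"
        return 1
    fi
}

check_gpu_driver || true
```

A non-zero return value from the helper corresponds to the failure case described above, where the drivers need to be updated.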

If your operating system is CentOS 7.2, you can refer to the Troubleshooting section of the tutorial for advice.