Installing Nvidia GPU Driver And Nvidia Docker in Ubuntu EC2 instance

Install Docker

Follow the instructions in Install Docker on Ubuntu to install docker-engine.

Install GPU driver

Follow the instructions in Deploy on Amazon EC2 in nvidia-docker wiki to install the GPU driver and nvidia-docker. Here I create a GPU EC2 instance using Ubuntu 16.04 LTS AMI by AWS web console intead of docker-machine (Note that different instance type (P2, G2, CG1) has different GPU hardware, please refere to the document and Nvidia official website to find out the GPU product of your instance and its corresponding driver.)

# ssh to the instance 
$ ssh -i {your key}.pem [email protected]{instance ip or domain name}

# Install official NVIDIA driver package
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
$ sudo apt-get update && sudo apt-get install -y --no-install-recommends cuda-drivers

# Install nvidia-docker and nvidia-docker-plugin
$ wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
$ sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb

Next, verify the driver is installed.

$ nvidia-smi

# Thu Dec 29 09:20:51 2016
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
# | N/A   31C    P8    17W / 125W |      0MiB /  4036MiB |      0%      Default |
# +-------------------------------+----------------------+----------------------+
# 
# +-----------------------------------------------------------------------------+
# | Processes:                                                       GPU Memory |
# |  GPU       PID  Type  Process name                               Usage      |
# |=============================================================================|
# |  No running processes found                                                 |
# +-----------------------------------------------------------------------------+

And the nvidia-docker.

$ sudo nvidia-docker

# Usage: docker [OPTIONS] COMMAND [arg...]
#        docker [ --help | -v | --version ]
# 
# A self-sufficient runtime for containers.
# 
# Options:
# 
#   --config=~/.docker              Location of client config files
#   -D, --debug                     Enable debug mode
#   -H, --host=[]                   Daemon socket(s) to connect to
#   -h, --help                      Print usage
#   -l, --log-level=info            Set the logging level
#   --tls                           Use TLS; implied by --tlsverify
#   --tlscacert=~/.docker/ca.pem    Trust certs signed only by this CA
#   --tlscert=~/.docker/cert.pem    Path to TLS certificate file
#   --tlskey=~/.docker/key.pem      Path to TLS key file
#   --tlsverify                     Use TLS and verify the remote
#   -v, --version                   Print version information and quit

Troubleshooting

Error: Unable to load the kernel module ‘nvidia.ko’

If you see the error message below.

Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or
improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver
such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics
device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log'
for more information.

execute following commands to solve the issue.

$ sudo apt-get install linux-image-extra-virtual
$ reboot

Error: Unable to find the kernel source tree

If the installation shows Uable to find the kernel source tree error, then execute the command

$ sudo apt-get install linux-headers-`uname -r`

Disable nouveau

Install Nvidia driver instead nouveau shows how to disable nouveau.