CUDA is a computing platform for graphical processing units (GPUs) developed by NVIDIA, widely used to accelerate machine learning. Existing frameworks, such as TensorFlow or PyTorch, use it under the hood without requiring any CUDA-specific coding from the user. However, it is still necessary to set up its dependencies properly, particularly the compiler nvcc, to benefit from the acceleration. In this short note, I share an interesting use case that occurred while prototyping with the Kaggle and NVIDIA Docker images.
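For instance, a PyTorch user triggers GPU kernels without writing any CUDA code; a minimal sketch (assuming a CUDA-enabled PyTorch build and a visible GPU):

# torch_gpu_demo.py
import torch

# PyTorch dispatches to CUDA kernels under the hood;
# no CUDA-specific coding is required from the user.
if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x  # the matrix product runs on the GPU
    print(y.device)  # cuda:0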
Compatibility of CUDA tools and targeted libraries
It turns out that one of the Kaggle images was released with incompatible CUDA dependencies: the compilation tools were not aligned with PyTorch, as revealed when attempting to compile detectron2, an object detection library by Facebook.
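Such a mismatch can be diagnosed from inside the image by comparing the CUDA version that PyTorch was built with against the installed toolkit; a minimal sketch (torch.version.cuda on one side, nvcc on the other):

# check_cuda_versions.py
import subprocess
import torch

# CUDA version the installed PyTorch binaries were compiled with
print("PyTorch built with CUDA:", torch.version.cuda)

# CUDA toolkit version present in the image, as reported by nvcc
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)

Here is the session that exposed the problem: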
(base) maciej.skorski@shared-notebooks:~$ docker images
REPOSITORY                        TAG                        IMAGE ID       CREATED        SIZE
gcr.io/kaggle-gpu-images/python   latest                     87983e20c290   4 weeks ago    48.1GB
nvidia/cuda                       11.6.2-devel-ubuntu20.04   e1687ea9fbf2   7 weeks ago    5.75GB
gcr.io/kaggle-gpu-images/python   <none>                     2b12fe42f372   2 months ago   50.2GB
(base) maciej.skorski@shared-notebooks:~$ docker run -d \
-it \
--name kaggle-test \
--runtime=nvidia \
--mount type=bind,source=/home/maciej.skorski,target=/home \
2b12fe42f372
(base) maciej.skorski@shared-notebooks:~$ docker exec -it kaggle-test python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
...
RuntimeError:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.8). Please make sure to use the same CUDA versions.
In order to compile detectron2, it was necessary to align the CUDA toolkit version. Rather than trying to install it manually – which is known to be an error-prone task – a working solution was to change the Kaggle image. It turns out that the gap was bridged in a subsequent release:
(base) maciej.skorski@shared-notebooks:~$ docker run 87983e20c290 nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
(base) maciej.skorski@shared-notebooks:~$ docker run 2b12fe42f372 nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
And indeed, the Facebook library installed smoothly under the new image 👍
(base) maciej.skorski@shared-notebooks:~$ docker run -d \
-it \
--name kaggle-test \
--runtime=nvidia \
--mount type=bind,source=/home/maciej.skorski,target=/home \
87983e20c290
bf60d0e3f3bdb42c5c08b24598bb3502b96ba2c461963d11b31c1fda85f9c26b
(base) maciej.skorski@shared-notebooks:~$ docker exec -it kaggle-test python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
Collecting git+https://github.com/facebookresearch/detectron2.git
...
Successfully built detectron2 fvcore antlr4-python3-runtime pycocotools
Compatibility of CUDA tools and GPU drivers
The compiler version should not be newer than the CUDA version supported by the driver (in particular, not a newer major release), as reported by nvidia-smi:
(base) maciej.skorski@shared-notebooks:~$ nvidia-smi
Thu Aug 10 14:56:44 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   69C    P0    30W /  70W |  12262MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8635      C   ...detectron_venv/bin/python    12260MiB |
+-----------------------------------------------------------------------------+
Consider the following simple CUDA script, which queries the GPU device properties:
// query_GPU.cu
#include <stdio.h>

int main() {
  int nDevices;
  cudaGetDeviceCount(&nDevices);
  for (int i = 0; i < nDevices; i++) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    printf("Device Number: %d\n", i);
    printf("  Name: %s\n", prop.name);
    printf("  Integrated: %d\n", prop.integrated);
    printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
    // memory clock in kHz, bus width in bits, factor 2 for double data rate
    printf("  Peak Memory Bandwidth (GB/s): %f\n\n",
           2.0 * prop.memoryClockRate * (prop.memoryBusWidth / 8) / 1.0e6);
    printf("  Total global mem: %ld\n", prop.totalGlobalMem);
    printf("  Multiprocessor count: %d\n", prop.multiProcessorCount);
  }
}
This code compiles and presents the GPU properties only under an image equipped with a matching major compiler version (select the appropriate nvidia/cuda image tag accordingly):
(base) maciej.skorski@shared-notebooks:~$ docker run -d \
-it \
--name nvidia-cuda \
--runtime=nvidia \
--mount type=bind,source=$(pwd),target=/home \
--privileged \
nvidia/cuda:11.6.2-devel-ubuntu20.04
docker exec -it nvidia-cuda sh -c "nvcc /home/query_GPU.cu -o /home/query_GPU && /home/query_GPU"
Device Number: 0
Name: Tesla T4
Integrated: 0
Compute capability: 7.5
Peak Memory Bandwidth (GB/s): 320.064000
Total global mem: 15634661376
  Multiprocessor count: 40
However, the container doesn't even start with a mismatching major version:
(base) maciej.skorski@shared-notebooks:~$ docker run -d \
> -it \
> --name nvidia-cuda \
> --runtime=nvidia \
> --mount type=bind,source=$(pwd),target=/home \
> --privileged \
> nvidia/cuda:12.2.0-devel-ubuntu20.04
d14d07b8b04bc7e6e27ce8312452850946d98d82611cb24c3e662ceb27d708c5
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.2, please update your driver to a newer version, or use an earlier cuda container: unknown.
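The driver's supported CUDA version can also be queried programmatically, which helps to script such compatibility checks before pulling an image. A minimal sketch using ctypes to call cuDriverGetVersion from the driver library (the name libcuda.so.1 assumes a Linux NVIDIA driver; the version is encoded as 1000*major + 10*minor):

# check_driver_cuda.py
import ctypes

# libcuda ships with the NVIDIA driver, not with the CUDA toolkit
libcuda = ctypes.CDLL("libcuda.so.1")

version = ctypes.c_int()
libcuda.cuDriverGetVersion(ctypes.byref(version))

# e.g. 11060 decodes to 11.6
major, minor = version.value // 1000, (version.value % 1000) // 10
print(f"Driver supports CUDA up to {major}.{minor}")

On the machine above, this prints 11.6, consistent with the nvidia-smi banner and with the rejection of the 12.2 image.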
Fixing Reproducibility of Scientific Repos
As the last example, consider the recent cuZK project, which implements some state-of-the-art cryptographic protocols on GPUs. The original code was missing dependencies and compilation instructions, so I shared a working fork.
To work with the code, let's use the NVIDIA Docker image in the appropriate version; here, I selected the tag 11.6.2-devel-ubuntu20.04. Check out the code and start a container that mounts the working directory with the GitHub code, like below:
docker run -d \
-it \
--name nvidia-cuda \
--runtime=nvidia \
--mount type=bind,source=$(pwd),target=/home \
--privileged \
nvidia/cuda:11.6.2-devel-ubuntu20.04
To work with the code, we need a few more dependencies within the container:
apt-get update
apt-get install -y git libgmp3-dev
After adjusting the headers in the Makefile, the CUDA code can be compiled and run:
root@7816e1643c2a:/home/cuZK/test# make
...
root@7816e1643c2a:/home/cuZK/test# ls
BLS377 MSMtestbn.cu Makefile core msmtesta testb testbn.cu
MSMtestbls.cu MSMtestmnt.cu libgmp.a msmtestb testbls.cu testmnt.cu
root@7816e1643c2a:/home/cuZK/test# ./msmtestb
Please enter the MSM scales (e.g. 20 represents 2^20) 