nvidia-docker 2.0 is a simple package whose main job is to let Docker use the NVIDIA Container Runtime by modifying Docker's configuration file, /etc/docker/daemon.json.
The operating environment of this article: Windows 10 system, Docker version 20.10.11, Dell G3 computer.
Introduction to NVIDIA Docker
NVIDIA began designing NVIDIA-Docker in 2016 to make it easier for containers to use NVIDIA GPUs. The first generation, nvidia-docker 1.0, wraps the docker client and mounts the necessary GPU devices and libraries into the container when the container starts. However, this design is tightly coupled to the Docker runtime and lacks flexibility. Its shortcomings are as follows:
The design is tightly coupled to Docker and does not support other container runtimes, such as LXC, CRI-O, and runtimes that may be added in the future.
It cannot take full advantage of other tools in the Docker ecosystem, such as docker compose.
The GPU cannot be treated as a resource that a scheduling system can schedule flexibly.
GPU support in the container runtime needs improvement, for example automatically discovering user-level NVIDIA driver libraries and NVIDIA kernel modules, handling device ordering, and so on.
To address these shortcomings, NVIDIA began designing its next-generation container runtime: nvidia-docker 2.0.
The implementation mechanism of nvidia-docker 2.0
First, a brief introduction to how nvidia-docker 2.0, containerd, nvidia-container-runtime, libnvidia-container, and runc relate to one another.
The following picture shows the relationship between them:
nvidia-docker 2.0
nvidia-docker 2.0 is a simple package whose main job is to let Docker use the NVIDIA Container Runtime by modifying the Docker configuration file /etc/docker/daemon.json.
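As a rough illustration of the kind of change made to /etc/docker/daemon.json, the sketch below registers a runtime named "nvidia" in a configuration dictionary. The helper name add_nvidia_runtime and the make_default flag are hypothetical and exist only for this example. The "runtimes" and "default-runtime" keys are standard Docker daemon options, but the exact content written by the package on your system may differ.

```python
import json


def add_nvidia_runtime(config, make_default=False):
    """Return a copy of a daemon.json dict with an "nvidia" runtime registered.

    Hypothetical helper for illustration; it does not touch any file on disk.
    """
    updated = dict(config)
    runtimes = dict(updated.get("runtimes", {}))
    # Assumes the nvidia-container-runtime binary is on the daemon's PATH.
    runtimes["nvidia"] = {
        "path": "nvidia-container-runtime",
        "runtimeArgs": [],
    }
    updated["runtimes"] = runtimes
    if make_default:
        # Optional: use the NVIDIA runtime for every container by default.
        updated["default-runtime"] = "nvidia"
    return updated


if __name__ == "__main__":
    existing = {"log-driver": "json-file"}  # example of unrelated settings
    print(json.dumps(add_nvidia_runtime(existing, make_default=True), indent=4))
```

The helper copies the existing settings rather than overwriting them, since daemon.json usually holds other options that should be preserved.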
nvidia-container-runtime
nvidia-container-runtime is the real core. It builds on runc, Docker's original container runtime, and adds a prestart hook that calls the libnvidia-container library.
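To make the idea of a prestart hook more concrete, here is a minimal sketch that adds a hook entry to an OCI runtime configuration represented as a Python dictionary. The "hooks" and "prestart" keys and the "path" and "args" fields follow the OCI runtime specification. The function name inject_prestart_hook and the hook location in HOOK_PATH are assumptions made for this example, not a description of how nvidia-container-runtime is actually implemented.

```python
import copy

# Assumed install location of the hook binary; it may differ on your system.
HOOK_PATH = "/usr/bin/nvidia-container-runtime-hook"


def inject_prestart_hook(oci_spec, hook_path=HOOK_PATH):
    """Return a copy of an OCI spec dict with a prestart hook appended.

    Hypothetical helper for illustration only.
    """
    spec = copy.deepcopy(oci_spec)
    hooks = spec.setdefault("hooks", {})
    prestart = hooks.setdefault("prestart", [])
    # Avoid registering the same hook twice.
    if not any(hook.get("path") == hook_path for hook in prestart):
        prestart.append({"path": hook_path, "args": [hook_path, "prestart"]})
    return spec


if __name__ == "__main__":
    original = {"ociVersion": "1.0.2", "process": {"args": ["sh"]}}
    print(inject_prestart_hook(original)["hooks"])
```

Working on a deep copy keeps the original specification unchanged, which makes the effect of the hook injection easy to compare before and after.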
libnvidia-container
libnvidia-container provides a library and a simple CLI tool that make NVIDIA GPUs available to Linux containers.
Containerd
Containerd is mainly responsible for:
Manage the life cycle of the container (from creation to destruction)
Pull/Push container images
Storage management (manage the storage of images and container data)
Call runc to run the container
Manage the container's network interfaces and networking
After containerd receives a request, it makes the necessary preparations. It can either call runc itself or create a containerd-shim that then calls runc. runc creates the container based on the OCI file. This is the basic process of creating an ordinary container.
RunC
RunC is a lightweight tool for running containers. It does one thing and does it well. We can think of it as a small command-line utility that can run containers directly, without going through the docker engine. In fact, runC is a product of standardization: it creates and runs containers according to the OCI standards. The OCI (Open Container Initiative) organization aims to develop open industry standards around container formats and runtimes.
You can create a container directly with the RunC command line, which also provides simple interaction capabilities.
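As a small sketch of what using the RunC command line directly can look like, the code below only builds the two commands involved and prints them; it does not execute anything. It assumes a bundle directory that already contains an unpacked root filesystem in a rootfs subdirectory. The bundle path, the container ID, and the helper name runc_commands are hypothetical values chosen for this example.

```python
import shlex


def runc_commands(bundle_dir, container_id):
    """Build the runc commands that create a default spec and run a container.

    Hypothetical helper; bundle_dir is assumed to already contain a rootfs.
    """
    return [
        # Generate a default OCI config.json inside the bundle directory.
        ["runc", "spec", "--bundle", bundle_dir],
        # Create and start the container described by that bundle.
        ["runc", "run", "--bundle", bundle_dir, container_id],
    ]


if __name__ == "__main__":
    for command in runc_commands("/tmp/mycontainer", "demo"):
        print(shlex.join(command))
```

Running the printed commands typically requires root privileges (or a rootless setup) and an installed runc binary.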
The role of each component and the relationships between them have been introduced above. Next, let's walk through the picture in detail.
Create a container normally
The process of creating an ordinary container is as follows:
docker --> dockerd --> containerd --> containerd-shim --> runc --> container-process
The docker client sends the request to create a container to dockerd. When dockerd receives the request, it forwards it to containerd. After checking and validating the request, containerd either starts a containerd-shim or starts the container process itself.
Create a container that uses GPU
The process of creating a GPU container is as follows:
docker --> dockerd --> containerd --> containerd-shim --> nvidia-container-runtime --> nvidia-container-runtime-hook --> libnvidia-container --> runc --> container-process
The basic process is similar to that of a container that does not use a GPU, except that Docker's default runtime is replaced by NVIDIA's own nvidia-container-runtime.
As a result, when nvidia-container-runtime creates a container, it first executes the nvidia-container-runtime-hook hook to check whether the container needs to use the GPU (determined by the environment variable NVIDIA_VISIBLE_DEVICES). If it does, the hook calls libnvidia-container to expose the GPU to the container. Otherwise, the default runc logic is used.
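The check described above can be sketched as a small decision function over the container's environment, given as a list of KEY=VALUE strings in the style of the OCI process.env field. This is a simplified illustration, not the hook's actual code. The function names are hypothetical, and treating an unset, empty, or "void" value as "no GPU needed" is an assumption based on NVIDIA's documented behavior as commonly described.

```python
def visible_devices(env):
    """Return the value of NVIDIA_VISIBLE_DEVICES, or None if it is not set."""
    value = None
    for entry in env:
        key, _, candidate = entry.partition("=")
        if key == "NVIDIA_VISIBLE_DEVICES":
            value = candidate  # a later duplicate overrides an earlier one
    return value


def needs_gpu(env):
    """Decide whether the container asks for GPU support (simplified)."""
    # Assumption: unset, empty, and "void" all mean plain runc behavior.
    return visible_devices(env) not in (None, "", "void")


def choose_path(env):
    """Name the branch taken: GPU setup or the default runc logic."""
    return "libnvidia-container" if needs_gpu(env) else "default runc logic"


if __name__ == "__main__":
    print(choose_path(["PATH=/usr/bin", "NVIDIA_VISIBLE_DEVICES=all"]))
    print(choose_path(["PATH=/usr/bin"]))
```

In practice the variable is usually set by the image or on the command line, for example with docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all.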
At this point, the general mechanism of nvidia-docker 2.0 should be clear. The individual projects involved (nvidia-container-runtime, libnvidia-container, containerd, and runc) are not covered one by one in this article. If you are interested, you can explore them on your own; the addresses of these projects are linked in the article.