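This is giving me a headache. I have spent a week setting up this environment, read a lot of material, and asked GPT and Claude until they were worn out, and it still does not work.
The main tutorial I have been following is "2023最新WSL搭建深度学习平台教程(适用于Docker-gpu、tensorflow-gpu、pytorch-gpu) - 知乎 (zhihu.com)".
My goal:
I want to install WSL on a Windows Server machine and run TensorFlow 2 code inside Docker. This requires both WSL and Docker to detect the GPU and use it for acceleration.
I am currently stuck at the step that runs a Docker container to verify that the GPU is detected: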
docker run --rm --gpus all nvidia/cuda:12.3.2-cudnn9-devel-ubuntu20.04 nvidia-smi
The output is:
"This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
/opt/nvidia/nvidia_entrypoint.sh: line 67: /usr/bin/nvidia-smi: Permission denied
/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: /usr/bin/nvidia-smi: cannot execute: Permission denied"
I assume this means my Docker container cannot detect my GPU, right? How do I fix it?
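One thing worth separating out: "Permission denied" from exec says the kernel refused to execute the file at /usr/bin/nvidia-smi. On its own it does not say whether the GPU devices are visible in the container. A minimal sketch for telling "file missing" apart from "file present but not executable" follows. The name classify_exec is a hypothetical helper for illustration, not part of Docker or the NVIDIA tooling.

```shell
# Hypothetical helper: report whether a path is missing,
# present but not executable by the current user, or executable.
classify_exec() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"
    elif [ ! -x "$path" ]; then
        echo "not-executable"
    else
        echo "ok"
    fi
}

# Example: check the path that the container entrypoint failed to exec.
classify_exec /usr/bin/nvidia-smi
```

To look at the same file from inside the container without going through the NVIDIA entrypoint script, something like docker run --rm --gpus all --entrypoint ls nvidia/cuda:12.3.2-cudnn9-devel-ubuntu20.04 -l /usr/bin/nvidia-smi might show its mode bits.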
My setup:
Windows Server 2022 21H2, OS Version 20348.1249.
I can run nvidia-smi and nvcc --version in the Windows terminal:
nvidia-smi
Mon Apr 1 14:19:54 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.86                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   36C    P8             15W /  350W |     175MiB /  12288MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080 Ti   WDDM  |   00000000:03:00.0 Off |                  N/A |
| 28%   40C    P8             18W /  250W |       0MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1380    C+G   C:\Windows\System32\LogonUI.exe             N/A      |
|    0   N/A  N/A      1388    C+G   C:\Windows\System32\dwm.exe                 N/A      |
|    0   N/A  N/A      9048    C+G   C:\Windows\System32\WUDFHost.exe            N/A      |
|    0   N/A  N/A      9128    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A      9136    C+G   C:\Windows\System32\dwm.exe                 N/A      |
|    0   N/A  N/A     10308    C+G   ...Search_cw5n1h2txyewy\SearchApp.exe       N/A      |
|    0   N/A  N/A     10816    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     11612    C+G   C:\Program Files\ToDesk\ToDesk.exe          N/A      |
|    0   N/A  N/A     12864    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:30:42_Pacific_Standard_Time_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
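My WSL distribution:
WSL2 Ubuntu 20.04
Kernel version: 5.10.16
At first the nvidia-smi command was not recognized in the WSL terminal. I worked around that with the following two commands: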
cp /usr/lib/wsl/lib/nvidia-smi /usr/bin/nvidia-smi
chmod ogu+x /usr/bin/nvidia-smi
After that I could run nvidia-smi in WSL, as shown below:
====================================
nvidia-smi
Mon Apr 1 14:38:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.65                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     On  |   00000000:01:00.0 Off |                  N/A |
|  0%   33C    P8             12W /  350W |     298MiB /  12288MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080 Ti     On  |   00000000:03:00.0 Off |                  N/A |
| 22%   33C    P8             17W /  250W |       0MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
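A possible alternative to copying the binary, sketched here under the assumption that the WSL driver files live in /usr/lib/wsl/lib (the directory the cp command above reads from): put that directory on PATH instead. A copied file is a static snapshot and will not follow later driver updates on the Windows side. The helper name prepend_path_once is hypothetical.

```shell
# Directory where WSL exposes the Windows-side NVIDIA driver tools
# (taken from the cp command in the post; treat it as an assumption elsewhere).
WSL_LIB_DIR="/usr/lib/wsl/lib"

# Hypothetical helper: prepend a directory to PATH only if it is not already there.
prepend_path_once() {
    dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;               # already on PATH, nothing to do
        *) PATH="$dir:$PATH" ;;
    esac
}

prepend_path_once "$WSL_LIB_DIR"
export PATH

# Show which nvidia-smi (if any) the shell now resolves.
command -v nvidia-smi || echo "nvidia-smi not found on PATH"
```

To make this persistent, the same lines could go into ~/.bashrc. Whether this has any effect on the container error is not something this sketch establishes.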
The nvcc command is still not recognized in the WSL terminal.
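For context, nvcc ships with the CUDA Toolkit, not with the driver, so the Windows toolkit install does not provide a Linux nvcc inside WSL. The -devel CUDA images normally bundle their own toolkit, so a missing nvcc on the WSL side may not matter for code that runs only in the container. A small sketch for listing which of the relevant tools the WSL shell can actually find; report_tool is a hypothetical helper name.

```shell
# Hypothetical helper: print where a command resolves, or say that it is not on PATH.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not on PATH"
    fi
}

# Tools mentioned in this post.
for tool in nvidia-smi nvcc docker; do
    report_tool "$tool"
done
```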
Docker is installed inside WSL (to save resources).
docker version
Client: Docker Engine - Community
Version: 26.0.0
API version: 1.45
Go version: go1.21.8
Git commit: 2ae903e
Built: Wed Mar 20 15:17:51 2024
OS/Arch: linux/amd64
Context: default
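Besides the client version, it may help to see whether the Docker daemon has an nvidia runtime registered, which is one sign that the NVIDIA Container Toolkit was configured. This is only a rough sketch: it assumes the runtime is registered under the name "nvidia", and since --gpus all can work through the toolkit's hook without a registered runtime, a negative result is not conclusive. has_nvidia_runtime is a hypothetical helper name.

```shell
# Hypothetical helper: reads the JSON printed by
#   docker info --format '{{json .Runtimes}}'
# on stdin and reports whether a runtime named "nvidia" appears in it.
has_nvidia_runtime() {
    if grep -q '"nvidia"'; then
        echo "nvidia runtime registered"
    else
        echo "nvidia runtime not registered"
    fi
}

# Only query the daemon when the docker CLI is available.
if command -v docker >/dev/null 2>&1; then
    docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime
fi
```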