Troubleshooting
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1 #154
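See the linked issue (#154) for details. The fix is a clean reinstall of Docker. Note that removing /var/lib/docker and /var/lib/containerd deletes all existing images, containers, and volumes.

    # Remove the current Docker packages and their data directories
    sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-ce-rootless-extras
    sudo rm -rf /var/lib/docker
    sudo rm -rf /var/lib/containerd
    # Remove distribution packages that conflict with docker-ce
    for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove "$pkg"; done
    # Reinstall Docker and restart the daemon
    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    sudo systemctl restart docker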
no compatible GPUs were discovered / Failed to initialize NVML: Unknown Error
The Ollama Docker container cannot find the GPU.
If you use docker compose, the configuration looks like this.

    services:
      ollama:
        image: ollama/ollama:latest
        restart: always
        hostname: ollama
        runtime: nvidia
        user: root
        ports:
          - '11434:11434'
        volumes:
          - /data/docker/llm/ollama:/root/.ollama
        networks:
          - default
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]
Fix: on the host, edit /etc/nvidia-container-runtime/config.toml and set no-cgroups (in the [nvidia-container-cli] section) to false.

    no-cgroups = false  # changed from true to false
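Before restarting, it can help to confirm that the edit is active and not commented out. The following is a minimal sketch, not part of the original fix: `no_cgroups_value` is a hypothetical helper name, and it assumes the setting appears on a single line in the form shown above.

```shell
# Hypothetical helper: print the active no-cgroups value from a config file.
# Lines that start with '#' are commented out and are ignored.
no_cgroups_value() {
    # $1: path to config.toml (defaults to the path edited above)
    config="${1:-/etc/nvidia-container-runtime/config.toml}"
    # Capture the word after '=' and drop any trailing comment; keep the first match.
    sed -n 's/^[[:space:]]*no-cgroups[[:space:]]*=[[:space:]]*\([a-z]*\).*$/\1/p' "$config" | head -n 1
}
```

Calling `no_cgroups_value` with no argument reads the default path. Empty output means the setting is missing or still commented out.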
Restart Docker.

    sudo systemctl restart docker
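Once Docker is back up, one way to check that the container can see the GPU is to list the GPUs from inside it. This is a sketch under assumptions: the Compose service is named `ollama` as in the file above, the command runs from the directory that holds that Compose file, and `nvidia-smi` is made available inside the container by the NVIDIA runtime. `gpu_visible_in_container` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if nvidia-smi inside the service lists at least one GPU.
gpu_visible_in_container() {
    # $1: Compose service name (assumed to be "ollama")
    service="${1:-ollama}"
    # -T disables TTY allocation so the output can be piped.
    # nvidia-smi -L prints one line per GPU, each starting with "GPU <index>:".
    docker compose exec -T "$service" nvidia-smi -L | grep -q '^GPU '
}
```

For example, `gpu_visible_in_container && echo "GPU visible"` prints the message only when a GPU is listed. If the NVML error persists, the function returns a non-zero status.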