[toc]
# Installation

## Install the container runtime

### docker
```shell
# ubuntu
sudo apt-get install -y ca-certificates curl gnupg lsb-release

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# install docker engine
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
```
Newer versions of Kubernetes using Docker as the runtime also require cri-docker. Download the binary from here, then save the following content as two files, `cri-docker.service` and `cri-docker.socket`.
```ini
# cri-docker.service
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
```
```ini
# cri-docker.socket
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service

[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker

[Install]
WantedBy=sockets.target
```
Move the files into place.

```shell
# move the binary
sudo mv cri-docker /usr/bin
# move the systemd units
sudo mv cri-docker.service /etc/systemd/system/
sudo mv cri-docker.socket /etc/systemd/system/
```
Enable and start the service with systemd.

```shell
sudo systemctl daemon-reload
sudo systemctl enable cri-docker.service
sudo systemctl enable --now cri-docker.socket

# start the service
sudo service cri-docker start
```
## Install kube*

Install kubeadm, kubelet, and kubectl. For mirrors inside China, see the Alibaba Cloud mirror or the Tsinghua open-source mirror.
```shell
sudo apt-get update && sudo apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -

# run as root
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
```
## Pull images

```shell
$ sudo kubeadm config images pull \
    --image-repository registry.aliyuncs.com/google_containers \
    --cri-socket unix:///var/run/cri-dockerd.sock
```
## Initialize

```shell
# 10.244.0.0/16 is the CIDR the flannel add-on is configured with
# --apiserver-advertise-address is the master node's IP; on a single machine, it is that machine's IP
$ kubeadm init --image-repository registry.aliyuncs.com/google_containers \
    --pod-network-cidr 10.244.0.0/16 \
    --control-plane-endpoint 10.1.0.145 \
    --cri-socket unix:///var/run/cri-dockerd.sock
```
### Using kubectl as root

Set `KUBECONFIG=/etc/kubernetes/admin.conf`.

```shell
root@k8s-master-1:/home/ubuntu# export KUBECONFIG=/etc/kubernetes/admin.conf
root@k8s-master-1:/home/ubuntu# kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
k8s-master-1   Ready    control-plane   5m55s   v1.24.1
```
### Configure for non-root users

```shell
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
### Verify

```shell
ubuntu@k8s-master-1:~$ kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
k8s-master-1   Ready    control-plane   4m52s   v1.24.1
```
## Rollback

```shell
kubeadm reset [flags]
  preflight              Run reset pre-flight checks
  update-cluster-status  Remove this node from the ClusterStatus object.
  remove-etcd-member     Remove a local etcd member.
  cleanup-node           Run cleanup node.
```
## Configure networking

This step is critical: if the cluster network is misconfigured, pods may be unable to communicate with each other, and `kubectl proxy` will not work (typically the pods run fine, but connections are refused). Taking flannel as an example, first install flannel.
```shell
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
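The `Network` value in flannel's net-conf.json must match the `--pod-network-cidr` passed to `kubeadm init` (which is why `10.244.0.0/16` was used above). A minimal sanity-check sketch, using sample data in the format flannel stores in its ConfigMap — on a live cluster you would read the real ConfigMap instead (its name and namespace vary by manifest version):

```shell
# Sample net-conf.json content, as found in flannel's ConfigMap (illustrative data).
net_conf='{
  "Network": "10.244.0.0/16",
  "Backend": { "Type": "vxlan" }
}'

pod_cidr="10.244.0.0/16"   # what was passed to kubeadm init

# Extract the Network field and compare it with the kubeadm setting.
flannel_net=$(printf '%s\n' "$net_conf" | grep -o '"Network": *"[^"]*"' | cut -d'"' -f4)
if [ "$flannel_net" = "$pod_cidr" ]; then
  echo "OK: flannel Network matches --pod-network-cidr ($flannel_net)"
else
  echo "MISMATCH: flannel=$flannel_net kubeadm=$pod_cidr"
fi
```

If the two differ, either re-run `kubeadm init` with the matching CIDR or edit flannel's ConfigMap before deploying it.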
Use the mk-docker-opts.sh tool to generate the network settings. The script ships inside the flannel Docker image and can be located on the host with `sudo find / -name 'mk-docker-opts.sh'`.

```shell
$ mk-docker-opts.sh -d /run/docker_opts.env -c
```
Modify the docker service.

```shell
# run as root
$ vim /lib/systemd/system/docker.service
# add this line
EnvironmentFile=/run/docker_opts.env
# change this line
ExecStart=/usr/bin/dockerd $DOCKER_OPTS -H fd:// ...

# restart docker
$ systemctl daemon-reload
$ systemctl restart docker
```
## Add nodes

Newly added nodes need the same network configuration, and the docker_opts.env file from another node cannot be reused.
```shell
# on the master node
$ kubeadm token create --print-join-command
...

# on the node to be joined, append the cri socket to the generated command
$ kubeadm join 10.1.0.145:6443 --token nxxcv7.gge00x97wiphualw \
    --discovery-token-ca-cert-hash sha256:cfb324b2ee7ee548b08e38d2e6d60905e392553bf6715504e87888183a1238fd \
    --cri-socket unix:///var/run/cri-dockerd.sock

# label the new node
$ kubectl label node k8s-worker-1 node-role.kubernetes.io/worker=worker

# verify
ubuntu@k8s-master-1:~$ kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
k8s-master-1   Ready    control-plane   9h      v1.24.1
k8s-worker-1   Ready    worker         3m17s   v1.24.1
```
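The `sha256:...` value in the join command is a hash of the cluster CA's public key (on the master it is derived from `/etc/kubernetes/pki/ca.crt`). A self-contained sketch of how it is computed — a throwaway certificate stands in for the real CA so the example is runnable anywhere:

```shell
# Generate a throwaway CA certificate in place of /etc/kubernetes/pki/ca.crt.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=demo-ca" -keyout /tmp/demo-ca.key -out /tmp/demo-ca.crt 2>/dev/null

# The pipeline kubeadm documents for recomputing --discovery-token-ca-cert-hash:
# extract the public key, convert it to DER, and take its sha256.
hash=$(openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | sed 's/^.* //')

echo "sha256:$hash"
```

Running the same pipeline against the real `/etc/kubernetes/pki/ca.crt` reproduces the hash printed by `kubeadm token create --print-join-command`.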
## Install the dashboard

```shell
# install the dashboard
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.6.0/aio/deploy/recommended.yaml

# start the proxy
$ kubectl proxy --address=0.0.0.0
```
### Create a service account

Save the following as `account.yaml`.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
```
Then apply it.

```shell
kubectl apply -f account.yaml
```
### Grant permissions

Save the following as `permission.yaml`.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
```
Then apply it.

```shell
kubectl apply -f permission.yaml
```
The two manifests can also be combined into a single file.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
```
### Get a token

```shell
kubectl -n kubernetes-dashboard create token admin-user
```
### HTTPS certificates

The dashboard generates its own certificate by default, so this step can be skipped. For HTTPS you can either generate a certificate yourself or use a commercial CA. A self-generated certificate can be created manually, or automatically by adding `--auto-generate-certificates`; see here for more parameters.

```shell
# self-signed certificate
# generate dashboard.pass.key
$ openssl genrsa -des3 -passout pass:over4chars -out dashboard.pass.key 2048
# generate dashboard.key
$ openssl rsa -passin pass:over4chars -in dashboard.pass.key -out dashboard.key
$ rm dashboard.pass.key
# generate dashboard.csr
$ openssl req -new -key dashboard.key -out dashboard.csr
# generate dashboard.crt
$ openssl x509 -req -sha256 -days 365 -in dashboard.csr -signkey dashboard.key -out dashboard.crt
```
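The `openssl req -new` step above prompts interactively for the subject fields. A non-interactive sketch that produces an equivalent key/certificate pair in one command (the `CN` value is just a placeholder — substitute your own host name or IP):

```shell
# One shot: private key + self-signed cert, no passphrase, no prompts.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=dashboard.local" \
  -keyout dashboard.key -out dashboard.crt 2>/dev/null

# Inspect the result: subject and validity window.
openssl x509 -in dashboard.crt -noout -subject -dates
```

This skips the intermediate `.pass.key` and `.csr` files entirely, which is convenient for scripted setups.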
### Remove the dashboard

```shell
$ kubectl delete -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.6.0/aio/deploy/recommended.yaml
```
# Network add-ons

## flannel

### Install

```shell
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
# Common commands

## List nodes

```shell
$ kubectl get nodes
NAME                    STATUS     ROLES                  AGE     VERSION
localhost.localdomain   NotReady   control-plane,master   2m13s   v1.22.1
```
## List pods

```shell
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
kube-system   coredns-78fcd69978-9dk4n                        0/1     Pending   0          2m52s
kube-system   coredns-78fcd69978-w52zc                        0/1     Pending   0          2m52s
kube-system   etcd-localhost.localdomain                      1/1     Running   0          3m6s
kube-system   kube-apiserver-localhost.localdomain            1/1     Running   0          3m6s
kube-system   kube-controller-manager-localhost.localdomain   1/1     Running   0          3m8s
kube-system   kube-proxy-4w84n                                1/1     Running   0          2m52s
kube-system   kube-scheduler-localhost.localdomain            1/1     Running   0          3m6s
```
## Manage pods

```shell
# list pods
$ kubectl get pods -A

# delete a pod
$ kubectl delete pod kubernetes-dashboard --namespace=kubernetes-dashboard
```
# References
# Troubleshooting
## Configuration

The kubelet configuration is at `/etc/kubernetes/kubelet.conf`.
## Node not found

```shell
ubuntu@k8s-master-1:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Wed 2022-06-08 16:18:53 UTC; 2s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 69055 (kubelet)
      Tasks: 29 (limit: 38495)
     Memory: 39.5M
     CGroup: /system.slice/kubelet.service
             └─69055 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/co>

Jun 08 16:18:55 k8s-master-1 kubelet[69055]: E0608 16:18:55.158403   69055 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.1.0>
Jun 08 16:18:55 k8s-master-1 kubelet[69055]: E0608 16:18:55.193156   69055 kubelet.go:2419] "Error getting node" err="node \"k8s-master-1\" not found"
...
```
As noted here, this usually means the API server cannot be reached; since the CRI here is Docker, check Docker's status.

> node not found is a misleading error by the kubelet. at this point it means that the kubelet was unable to register a Node object with the api server.
>
> https://nalshsvrk8ss01.railcarmgt.com:6443/healthz?timeout=10s in 0 milliseconds
>
> this means that the api server cannot be connected. could be caused by a number of things (including firewall).
```shell
root@k8s-master-1:~# service docker status
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-06-08 16:30:32 UTC; 34min ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 1039 (dockerd)
      Tasks: 23
     Memory: 134.1M
     CGroup: /system.slice/docker.service
             └─1039 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Jun 08 17:04:52 k8s-master-1 dockerd[1039]: time="2022-06-08T17:04:52.874831490Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get \"https://k8s.gcr.io/v2/\": context deadline exceeded"
Jun 08 17:04:58 k8s-master-1 dockerd[1039]: time="2022-06-08T17:04:58.867350258Z" level=warning msg="Error getting v2 registry: Get \"https://k8s.gcr.io/v2/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
...
```
So the problem is indeed the images: pulls from k8s.gcr.io are failing. Following the instructions here, a custom image repository can be specified.

```shell
# follow the service log in real time
$ journalctl -u docker -f
```
Setting `--image-repository` gets the images pulled, but bringing up the control plane still times out.

```
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
```
In the end, just put Docker behind a proxy.

```shell
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/proxy.conf > /dev/null <<'EOF'
[Service]
Environment="HTTP_PROXY=socks5://192.168.6.19:3213"
Environment="HTTPS_PROXY=socks5://192.168.6.19:3213"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl restart kubelet
```
## kubectl proxy cannot reach services on other nodes

Below is the error when accessing the dashboard. The runtime is Docker, kubernetes-dashboard runs on another worker node, and accessing the dashboard service through the master node's proxy returns the following error.

```json
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "error trying to reach service: dial tcp 172.17.0.3:8443: connect: connection refused",
  "reason": "ServiceUnavailable",
  "code": 503
}
```
The cause is that the network used by the Docker containers (172.17.0.1/16) is not the same network as the network add-on (flannel, 10.244.0.0/16), so the service is unreachable. The 172.17.0.3 here is actually an address on the worker node's default Docker bridge: the container was deployed before flannel was configured, so it is not on the flannel network.

```shell
# inspect the flannel network
$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1400
FLANNEL_IPMASQ=true
```
Generate the Docker environment variable DOCKER_OPTS.

```shell
# find mk-docker-opts.sh; it is inside the flannel image, or can be obtained from the flannel binary release
$ sudo find / -name 'mk-docker-opts.sh'
/var/lib/docker/overlay2/8779d2bd83ddf0e237da15f5c0e62fd79bbf6d3868cea87ec926c471f1184774/merged/opt/bin/mk-docker-opts.sh
/var/lib/docker/overlay2/99462f1d9e955f5c40a11844119dc1e0f295208c20a696e7bea76b39324a9943/diff/opt/bin/mk-docker-opts.sh

# as root; generate docker opts
$ alias mk-docker-opts="/var/lib/docker/overlay2/99462f1d9e955f5c40a11844119dc1e0f295208c20a696e7bea76b39324a9943/diff/opt/bin/mk-docker-opts.sh"
$ mk-docker-opts -d /run/docker_opts.env -c
```
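What the script produces is essentially a dockerd option string derived from subnet.env: the node's pod subnet becomes `--bip` and the MTU becomes `--mtu`. A minimal sketch of that derivation, using the subnet.env values shown above as sample input (the real mk-docker-opts.sh handles more options, including inverting the ip-masq flag, so treat this as an illustration only):

```shell
# Sample input in the format of /run/flannel/subnet.env.
cat > /tmp/subnet.env <<'EOF'
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1400
FLANNEL_IPMASQ=true
EOF

# Load the variables and emit a combined DOCKER_OPTS line, the shape
# mk-docker-opts.sh -c writes. flannel already masquerades, so docker's
# own ip-masq is turned off here.
. /tmp/subnet.env
echo "DOCKER_OPTS=\" --bip=${FLANNEL_SUBNET} --ip-masq=false --mtu=${FLANNEL_MTU}\"" > /tmp/docker_opts.env

cat /tmp/docker_opts.env
```

Once dockerd starts with this `--bip`, new containers on the node get addresses from the flannel subnet instead of the default 172.17.0.0/16 bridge.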
Modify the docker service.

```shell
# run as root
$ vim /lib/systemd/system/docker.service
# add this line
EnvironmentFile=/run/docker_opts.env
# change this line
ExecStart=/usr/bin/dockerd $DOCKER_OPTS -H fd:// ...

# restart docker
$ systemctl daemon-reload
$ systemctl restart docker
```
Verify the pod IP.

```shell
# alias k8s="kubectl --namespace=default"
$ k8s get pods
NAME                                        READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-8c47d4b5d-2bbbx   1/1     Running   0          43s
kubernetes-dashboard-59fccbc7d7-9wmn9       1/1     Running   0          43s

$ k8s describe pod kubernetes-dashboard-59fccbc7d7-9wmn9
Name:         kubernetes-dashboard-59fccbc7d7-9wmn9
Namespace:    default
Priority:     0
Node:         k8s-worker-1/10.1.0.123
Start Time:   Sat, 11 Jun 2022 07:03:10 +0000
Labels:       k8s-app=kubernetes-dashboard
              pod-template-hash=59fccbc7d7
Annotations:  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:       Running
IP:           10.244.1.3
...
```
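The pod IP 10.244.1.3 above confirms the pod now gets its address from the flannel range rather than the 172.17.0.0/16 Docker bridge. A small self-contained check of CIDR membership, if you want to verify an address programmatically:

```shell
#!/bin/sh
# Check whether an IP falls inside a CIDR range, e.g. to confirm a pod
# address comes from the flannel network rather than the docker0 bridge.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

in_cidr() {  # usage: in_cidr IP NETWORK PREFIX_LEN
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

in_cidr 10.244.1.3 10.244.0.0 16 && echo "10.244.1.3 is on the flannel network"
in_cidr 10.244.1.3 172.17.0.0 16 || echo "10.244.1.3 is NOT on the docker0 bridge"
```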
Note that these steps must be performed on every node, and the docker_opts.env file cannot be shared between nodes.
## Cannot log in to the dashboard

The simplest approach is to change the service type to NodePort and open the dashboard in Firefox.

If you open the login page through kubectl proxy instead, login is disabled, with the following message.

```
Insecure access detected. Sign in will not be available. Access Dashboard securely over HTTPS or using localhost. Read more here.
```
Even with the following arguments added.

```shell
--auto-generate-certificates
--namespace=default
--insecure-bind-address=0.0.0.0
--insecure-port=8080
--enable-insecure-login
```
As explained here, `--enable-insecure-login` only applies when logging in over HTTP via `127.0.0.1` or `localhost`; it does not help when going through kubectl proxy over HTTP.
### Port forwarding

```shell
$ kubectl port-forward -n default service/kubernetes-dashboard 18082:443 --address='0.0.0.0'
```
### Solutions

The dashboard's SSL setup is a bit tricky. For HTTPS, there are two directions to solve it:

- buy a certificate from a commercial CA, use a domain name, and configure DNS resolution
- run a self-signed site

Buying a CA certificate is not covered here; below are a few workable self-signed options.

- Port forwarding
  - Firefox can access it
  - Chrome cannot inspect the certificate; Safari, Chrome, and Opera cannot access it
- nginx in front with a self-signed certificate
  - Safari and Firefox can ignore the warning and proceed
  - Chrome can access it after downloading the certificate and marking it as trusted
### Set up nginx with a self-signed certificate

For setting it up, see here; change the service type to NodePort.
```yaml
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: default
spec:
  ports:
    - port: 443
      targetPort: 8443
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard
```
Add the nginx configuration.

```nginx
upstream k8s-dashboard {
    server 10.244.1.2:8443;
}

server {
    # auth_basic "Auth Message";
    # auth_basic_user_file /etc/apache2/.htpasswd;
    client_max_body_size 100m;

    listen 8443 ssl default_server;
    listen [::]:8443 ssl default_server;
    server_name 192.168.6.103;
    include snippets/self-signed.conf;

    location / {
        proxy_pass https://k8s-dashboard;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # enable websocket
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
Then the dashboard can be opened in Chrome.

```shell
# enter the address in the browser
https://192.168.6.103:8443/#/login
```