ubuntu network

Static address

# vim /etc/netplan/99_config.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      addresses:
        - 10.10.10.2/24        # static address
      gateway4: 10.10.10.1     # deprecated in newer netplan releases, see the Netplan section
      nameservers:             # optional
        search: [mydomain, otherdomain]
        addresses: [10.10.10.1, 1.1.1.1]

$ sudo netplan apply
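A quick sanity check before running netplan apply is to confirm that the gateway actually sits inside the subnet of the static address. The sketch below uses only Python's standard ipaddress module. The function name gateway_in_subnet is made up for illustration and is not part of netplan.

```python
import ipaddress


def gateway_in_subnet(address_cidr: str, gateway: str) -> bool:
    """Return True if the gateway lies inside the subnet of address_cidr."""
    # ip_interface keeps both the host address and its network
    # (10.10.10.2/24 -> network 10.10.10.0/24)
    interface = ipaddress.ip_interface(address_cidr)
    return ipaddress.ip_address(gateway) in interface.network


if __name__ == "__main__":
    # Values taken from the netplan example above
    print(gateway_in_subnet("10.10.10.2/24", "10.10.10.1"))   # True
    print(gateway_in_subnet("10.10.10.2/24", "192.168.6.1"))  # False
```

A gateway outside the subnet is not always wrong (on-link routes exist), but in a simple home setup it usually points to a typo.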

Restart networking

# method 1
$ sudo netplan apply

# method 2
$ sudo nmcli networking off
$ sudo nmcli networking on

# method 3
$ sudo systemctl stop NetworkManager
$ sudo systemctl start NetworkManager

# method 4
$ sudo service network-manager restart

# method 5 (requires the ifupdown package)
$ sudo ifdown -a
$ sudo ifup -a

# method 6
$ sudo systemctl restart systemd-networkd

Interfaces

# up / down
ip link set dev ens7 up     # up
ip link set dev ens7 down   # down

Routes

# show the routing table
ip route show

# add a route
ip route add <network_ip>/<cidr> via <gateway_ip> dev <network_card_name>
## examples
ip route add 192.168.6.0/24 dev eth0
ip route add default via <gateway_ip>

# delete a route
ip route del default   # delete the default route

Configure DNS

## ubuntu 20.04+
# cat /etc/netplan/***.yaml
network:
  version: 2
  ethernets:
    ens4:
      dhcp4: true
      match:
        macaddress: fa:16:3e:65:2c:6b
      mtu: 1450
      set-name: ens4
      nameservers:
        addresses: [192.168.6.1, 8.8.8.8]   # DNS servers

# apply
sudo netplan apply

Firewall

# check the firewall status
$ sudo ufw status
# active: enabled, inactive: disabled

# disable
$ sudo ufw disable

Netplan

  • gateway4 is deprecated; use routes instead

# deprecated
gateway4: 192.168.6.1
# current form
routes:
  - to: default
    via: 192.168.6.1
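If several files still use gateway4, the rewrite is mechanical. The sketch below works on an interface definition that has already been loaded into a Python dict (for example with a YAML parser, which is not shown here). migrate_gateway4 is a hypothetical helper, not a netplan tool.

```python
import copy


def migrate_gateway4(iface: dict) -> dict:
    """Return a copy of a netplan interface dict with gateway4 replaced by a default route."""
    result = copy.deepcopy(iface)
    gateway = result.pop("gateway4", None)
    if gateway is not None:
        routes = result.setdefault("routes", [])
        # Do not add a second default route if one is already present
        if not any(route.get("to") == "default" for route in routes):
            routes.append({"to": "default", "via": gateway})
    return result


if __name__ == "__main__":
    old = {"addresses": ["192.168.6.8/24"], "gateway4": "192.168.6.1"}
    print(migrate_gateway4(old))
```

The input dict is left untouched, so the old and new forms can be compared before anything is written back.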

Configure routes

network:
  version: 2
  ethernets:
    enp0s31f6:
      match:
        macaddress: 6c:4b:90:85:5d:59
      dhcp4: true
      wakeonlan: true
      set-name: enp0
      nameservers:
        addresses: [223.5.5.5, 8.8.8.8]
      addresses: [192.168.6.8/24]
      routes:
        - to: default
          via: 192.168.6.1
          table: 100
      routing-policy:
        - to: 192.168.6.0/24
          table: 100
          priority: 100
    enxf8e43b1a1229:
      match:
        macaddress: f8:e4:3b:1a:12:29
      addresses: [192.168.9.99/24]
      set-name: ext0
      nameservers:
        addresses: [223.5.5.5, 8.8.8.8]

wireshark

Install

apt install tshark

See the tshark documentation for details.

Usage

tshark -i <network-interface>

# filter by port
tshark -i enp0 -f "src port 30800"

Wake On Lan

# configuration
$ cat /etc/netplan/*_config.yaml
# Let NetworkManager manage all devices on this system
network:
  version: 2
  renderer: NetworkManager
  ethernets:
    enp4s0:
      match:
        macaddress: 2a:04:a0:46:08:38
      dhcp4: true
      wakeonlan: true
    enp5s0:
      match:
        macaddress: 2a:04:a0:46:08:39
      dhcp4: true
      wakeonlan: true

# apply
$ sudo netplan apply

Note

  1. If the setting does not take effect, try adding a macaddress match
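Once Wake-on-LAN is enabled, the machine is woken by a "magic packet": six 0xFF bytes followed by the target MAC address repeated sixteen times, usually sent as a UDP broadcast. The sketch below builds and sends one with the standard library only. The broadcast address 255.255.255.255 and port 9 are common defaults and are assumptions here, as are the function names.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC such as 2a:04:a0:46:08:38."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    # 6 bytes of 0xFF, then the MAC repeated 16 times (102 bytes in total)
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumed defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


if __name__ == "__main__":
    packet = build_magic_packet("2a:04:a0:46:08:38")
    print(len(packet))  # 102
```

The sender has to be on the same layer-2 network as the sleeping machine, since routers normally drop broadcast traffic.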

yarn

Common commands

View logs

$ yarn logs -applicationId <application-id>

Kill an application

$ yarn app -kill <application-id>

List running applications

$ yarn top

htop

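Keys

F2 or Shift + S opens the setup screen. On keyboards without a Delete key, try Fn + Backspace.

Configuration

Change the number of CPUs shown per row

image-20231102111705618

Adjust the CPUs meters in the left and right columns.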

iterm2

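Shortcuts

# switch tab
Command + arrow keys
Command + number

# switch pane
Command + Option + arrow keys

Scroll wheel

Setting: Advanced -> Scroll wheel sends arrow keys when in alternate screen mode

  • yes: scroll-wheel events are sent as arrow-key events
    • in vi, screen, and tmux the wheel can be used to move the cursor up and down quickly
  • no
    • avoids the wheel stepping through history instead of scrolling the screen (when that happens, the content can only be scrolled with the scroll bar)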

python random

Common usage

Picking from a list

>>> import random
# define a list
>>> l = [1, 2, 3, 4, 5, 6, 7]
# pick one element at random
>>> random.choice(l)
6
# pick several, repeats allowed
>>> random.choices(l, k=3)
[7, 4, 7]
# pick several, no repeats
>>> random.sample(l, 3)
[3, 5, 7]
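The results above change on every run. When a script or test needs repeatable picks, a dedicated random.Random instance with a fixed seed produces the same sequence each time. This is a small sketch; the seed value 42 is arbitrary.

```python
import random

items = [1, 2, 3, 4, 5, 6, 7]

# Two generators created with the same seed produce identical sequences
rng_a = random.Random(42)
rng_b = random.Random(42)

picks_a = rng_a.sample(items, 3)
picks_b = rng_b.sample(items, 3)

print(picks_a == picks_b)       # True
print(len(set(picks_a)) == 3)   # True: sample never repeats an element
```

Using a separate Random instance also keeps the seeding local, so it does not affect other code that relies on the module-level functions.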

vncserver

centos

# install vncserver
$ sudo yum install tigervnc-server

# set the password
$ vncpasswd

# copy the unit file
$ sudo cp /lib/systemd/system/vncserver@.service /etc/systemd/system/vncserver@:1.service

# edit the unit file
$ sudo vim /etc/systemd/system/vncserver@\:1.service

Configuration

The example below uses the user wii; replace it with your own user.

[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target

[Service]
Type=forking
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'
ExecStart=/sbin/runuser -l wii -c "/usr/bin/vncserver %i -geometry 1920x1080"
# PIDFile=/home/wii/.vnc/%H%i.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'

[Install]
WantedBy=multi-user.target

Manage with systemctl

$ sudo systemctl daemon-reload         # reload unit files
$ sudo systemctl start vncserver@:1    # start the vnc server
$ sudo systemctl status vncserver@:1   # check the status
$ sudo systemctl enable vncserver@:1   # start on boot

ubuntu

jmeter

Run from the command line

$ ./jmeter -n -t config.jmx -l results.jtl

Parameters

# command format
jmeter -n -t test-file [-p property-file] [-l results-file] [-j log-file]

# options
-n non gui
-t test file
-l result file
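The results file written by -l can be summarized without opening the GUI. The sketch below assumes the JTL is in JMeter's CSV format with a header row that contains elapsed and success columns. That depends on the jmeter.save.saveservice settings, so check the file first. summarize_jtl is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io


def summarize_jtl(lines) -> dict:
    """Summarize a CSV-format JTL: sample count, error count, mean elapsed time in ms."""
    rows = list(csv.DictReader(lines))
    elapsed = [int(row["elapsed"]) for row in rows]
    errors = sum(1 for row in rows if row["success"].lower() != "true")
    return {
        "samples": len(rows),
        "errors": errors,
        "mean_elapsed_ms": sum(elapsed) / len(elapsed) if elapsed else 0.0,
    }


if __name__ == "__main__":
    # Tiny made-up sample; a real file would be opened with open("results.jtl", newline="")
    sample = io.StringIO(
        "timeStamp,elapsed,label,responseCode,success\n"
        "1629880000000,120,Java Request,OK,true\n"
        "1629880000150,80,Java Request,FAILED,false\n"
    )
    print(summarize_jtl(sample))
```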

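RPC load testing

Load testing an RPC service requires a custom sampler built on the service SDK. The overall flow is as follows.

  • Develop against the SDK

    • Add the JMeter dependencies
    • Write a sampler class that implements the JavaSamplerClient interface
  • Copy the jar and its dependencies into JMeter's lib/ext directory

  • Restart JMeter

  • Create the test plan

    • Add a thread group

      image-20210825164558728

    • Add a Java Request sampler

      image-20210825164643817

      image-20210825164927950

    • Select the sampler class from the drop-down list

    • Parameters must be returned by the class's getDefaultParameters method, but their values can be edited and saved in the JMeter GUI

    • Add a View Results Tree listener

      image-20210825165136784

  • Click the start button to run the test

  • Inspect the results in the View Results Tree

    image-20210825165307925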

Code example

import java.util.Random;

import com.google.protobuf.util.JsonFormat;
import org.apache.jmeter.config.Arguments;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;

// EchoService is an already-running gRPC service
public class EchoClientExampleForJMeter implements JavaSamplerClient {
    // Not declared in the original snippet; added so the default parameters compile
    private static final Random RANDOM = new Random();

    EchoService echoService;

    @Override
    public void setupTest(JavaSamplerContext javaSamplerContext) {
        echoService = ...; // initialize echoService
    }

    @Override
    public SampleResult runTest(JavaSamplerContext javaSamplerContext) {
        SampleResult result = new SampleResult();
        result.sampleStart();

        // Values come from the Java Request parameters in the JMeter GUI
        String id = javaSamplerContext.getParameter("id");
        String name = javaSamplerContext.getParameter("name");

        EchoRequest request = EchoRequest.newBuilder()
                .setId(id)
                .setName(name)
                .build();
        try {
            EchoResponse response = echoService.echo(request);
            result.sampleEnd();
            result.setSuccessful(true);
            result.setResponseData(JsonFormat.printer().print(response), null);
            result.setDataType(SampleResult.TEXT);
            result.setResponseCode("OK");
        } catch (Exception e) {
            result.sampleEnd();
            result.setSuccessful(false);
            // Put the stack trace into the response so it shows up in the View Results Tree
            java.io.StringWriter stringWriter = new java.io.StringWriter();
            e.printStackTrace(new java.io.PrintWriter(stringWriter));
            result.setResponseData(stringWriter.toString(), null);
            result.setDataType(SampleResult.TEXT);
            result.setResponseCode("FAILED");

            e.printStackTrace();
        }

        return result;
    }

    @Override
    public void teardownTest(JavaSamplerContext javaSamplerContext) {
    }

    @Override
    public Arguments getDefaultParameters() {
        Arguments arguments = new Arguments();
        arguments.addArgument("id", String.valueOf(RANDOM.nextInt(10) + 2000));
        // Parentheses make this an integer addition rather than string concatenation
        arguments.addArgument("name", "pressure" + (RANDOM.nextInt(100000000) + 10000000));
        return arguments;
    }
}

Errors

OutOfMemoryError

Edit the bin/jmeter script.

...
: "${HEAP:="-Xms4g -Xmx4g -XX:MaxMetaspaceSize=1024m"}" # raise the heap and Metaspace limits
...

k8s installation

Install

Install the container runtime

docker

# ubuntu
## install dependencies
sudo apt-get install -y ca-certificates curl gnupg lsb-release
## add the GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
## add the apt repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
## install docker engine
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

Newer k8s releases also need cri-dockerd in order to use docker as the runtime. Download the cri-dockerd binary from its release page, then save the content below as two files, cri-docker.service and cri-docker.socket.

# cri-docker.service
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target

# cri-docker.socket
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service

[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker

[Install]
WantedBy=sockets.target

Move the files into place

# move the binary (the unit file above expects /usr/bin/cri-dockerd)
sudo mv cri-dockerd /usr/bin/
# move the systemd unit files
sudo mv cri-docker.service /etc/systemd/system/
sudo mv cri-docker.socket /etc/systemd/system/

Start the service with systemd

sudo systemctl daemon-reload
sudo systemctl enable cri-docker.service
sudo systemctl enable --now cri-docker.socket

# start the service
sudo service cri-docker start

Install kube*

Install kubeadm, kubelet, and kubectl. For mirrors inside China, see the Aliyun mirror or the Tsinghua open-source mirror.

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
# run as root
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl

Pull the images

$ sudo kubeadm config images pull \
    --image-repository registry.aliyuncs.com/google_containers \
    --cri-socket unix:///var/run/cri-dockerd.sock

Initialize

# 10.244.0.0/16 is the pod network that the flannel add-on is configured for
# --control-plane-endpoint is the IP of the master node; on a single machine, use that machine's IP
$ kubeadm init --image-repository registry.aliyuncs.com/google_containers \
    --pod-network-cidr 10.244.0.0/16 \
    --control-plane-endpoint 10.1.0.145 \
    --cri-socket unix:///var/run/cri-dockerd.sock

Use as root

Set KUBECONFIG=/etc/kubernetes/admin.conf.

root@k8s-master-1:/home/ubuntu# export KUBECONFIG=/etc/kubernetes/admin.conf
root@k8s-master-1:/home/ubuntu# kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
k8s-master-1   Ready    control-plane   5m55s   v1.24.1

Use as a non-root user

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Verify

ubuntu@k8s-master-1:~$ kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
k8s-master-1   Ready    control-plane   4m52s   v1.24.1

Roll back

kubeadm reset [flags]

preflight              Run reset pre-flight checks
update-cluster-status  Remove this node from the ClusterStatus object.
remove-etcd-member     Remove a local etcd member.
cleanup-node           Run cleanup node.

Configure the network

This step is critical. If the cluster network is not configured correctly, pods may be unable to talk to each other and kubectl proxy will not work properly (the usual symptom is that pods are running but connections are refused). Taking flannel as the example, install flannel first.

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Use the mk-docker-opts.sh tool to generate the network settings. The script ships inside the flannel image, so it can also be located with sudo find / -name 'mk-docker-opts.sh'.

$ mk-docker-opts.sh -d /run/docker_opts.env -c

Edit the docker service.

# run as root
$ vim /lib/systemd/system/docker.service
# add this line
EnvironmentFile=/run/docker_opts.env
# change this line
ExecStart=/usr/bin/dockerd $DOCKER_OPTS -H fd:// ...

# restart docker
$ systemctl daemon-reload
$ systemctl restart docker

Add a node

The network must also be configured on every added node, and the docker_opts.env file from another node must not be reused.

# on the master node
$ kubeadm token create --print-join-command
...

# on the node that is joining: append the cri socket to the command generated above
$ kubeadm join 10.1.0.145:6443 --token nxxcv7.gge00x97wiphualw \
    --discovery-token-ca-cert-hash sha256:cfb324b2ee7ee548b08e38d2e6d60905e392553bf6715504e87888183a1238fd \
    --cri-socket unix:///var/run/cri-dockerd.sock

# label the new node
$ kubectl label node k8s-worker-1 node-role.kubernetes.io/worker=worker

# verify
ubuntu@k8s-master-1:~$ kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
k8s-master-1   Ready    control-plane   9h      v1.24.1
k8s-worker-1   Ready    worker          3m17s   v1.24.1

Install the dashboard

# install the dashboard
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.6.0/aio/deploy/recommended.yaml

# start the proxy
$ kubectl proxy --address=0.0.0.0

Create a service account

Save the following as account.yaml.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

Then apply it.

kubectl apply -f account.yaml

Grant permissions

Save the following as permission.yaml.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard

Then apply it.

kubectl apply -f permission.yaml

Both objects can also go into a single file.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard

Get a token

kubectl -n kubernetes-dashboard create token admin-user

HTTPS certificate

By default the k8s dashboard generates its own certificate, so this step can be skipped. For HTTPS you can either generate a certificate yourself or obtain one from a certificate authority. A self-generated certificate can be created by hand, or automatically by passing --auto-generate-certificates; see the dashboard documentation for the other arguments.

# self-signed certificate
# generate dashboard.pass.key
$ openssl genrsa -des3 -passout pass:over4chars -out dashboard.pass.key 2048
# generate dashboard.key
$ openssl rsa -passin pass:over4chars -in dashboard.pass.key -out dashboard.key
$ rm dashboard.pass.key # no longer needed
# generate dashboard.csr
$ openssl req -new -key dashboard.key -out dashboard.csr # press Enter at every prompt
# generate dashboard.crt
$ openssl x509 -req -sha256 -days 365 -in dashboard.csr -signkey dashboard.key -out dashboard.crt

Remove the dashboard

$ kubectl delete -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.6.0/aio/deploy/recommended.yaml

Network add-ons

flannel

Install

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Common commands

Show node information

$ kubectl get nodes
NAME                    STATUS     ROLES                  AGE     VERSION
localhost.localdomain   NotReady   control-plane,master   2m13s   v1.22.1

Show pod information

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
kube-system   coredns-78fcd69978-9dk4n                        0/1     Pending   0          2m52s
kube-system   coredns-78fcd69978-w52zc                        0/1     Pending   0          2m52s
kube-system   etcd-localhost.localdomain                      1/1     Running   0          3m6s
kube-system   kube-apiserver-localhost.localdomain            1/1     Running   0          3m6s
kube-system   kube-controller-manager-localhost.localdomain   1/1     Running   0          3m8s
kube-system   kube-proxy-4w84n                                1/1     Running   0          2m52s
kube-system   kube-scheduler-localhost.localdomain            1/1     Running   0          3m6s

Pod management

# show pod information
$ kubectl get pods -A # A = all-namespaces

# delete a pod
$ kubectl delete pod kubernetes-dashboard --namespace=kubernetes-dashboard

Troubleshooting

Configuration

The k8s (kubelet) configuration lives in /etc/kubernetes/kubelet.conf

Node not found

ubuntu@k8s-master-1:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2022-06-08 16:18:53 UTC; 2s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 69055 (kubelet)
Tasks: 29 (limit: 38495)
Memory: 39.5M
CGroup: /system.slice/kubelet.service
└─69055 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/co>

Jun 08 16:18:55 k8s-master-1 kubelet[69055]: E0608 16:18:55.158403 69055 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.1.0>
Jun 08 16:18:55 k8s-master-1 kubelet[69055]: E0608 16:18:55.193156 69055 kubelet.go:2419] "Error getting node" err="node \"k8s-master-1\" not found"
...

The discussion quoted below suggests that the API server cannot be reached. Because the CRI here is docker, the next step is to check the docker status.

node not found is a misleading error by the kubelet. at this point it means that the kubelet was unable to register a Node object with the api server.

https://nalshsvrk8ss01.railcarmgt.com:6443/healthz?timeout=10s in 0 milliseconds

this means that the api server cannot be connected. could be caused by a number of things (including firewall).

root@k8s-master-1:~# service docker status
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-06-08 16:30:32 UTC; 34min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 1039 (dockerd)
Tasks: 23
Memory: 134.1M
CGroup: /system.slice/docker.service
└─1039 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Jun 08 17:04:52 k8s-master-1 dockerd[1039]: time="2022-06-08T17:04:52.874831490Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get \"https://k8s.gcr.io/v2/\": context deadline exceeded"
Jun 08 17:04:58 k8s-master-1 dockerd[1039]: time="2022-06-08T17:04:58.867350258Z" level=warning msg="Error getting v2 registry: Get \"https://k8s.gcr.io/v2/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
...

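As suspected, the problem is the images: pulling from k8s.gcr.io fails. A custom image repository can be specified instead, as described in the kubeadm documentation.

# follow the service log in real time
$ journalctl -u docker -f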

With --image-repository set, the images can be pulled, but starting the control plane still times out.

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

In the end, the reliable fix is to route docker through a proxy.

sudo mkdir -p /etc/systemd/system/docker.service.d
# write the drop-in with tee so the file does not need to be made world-writable
sudo tee /etc/systemd/system/docker.service.d/proxy.conf > /dev/null <<'EOF'
[Service]
Environment="HTTP_PROXY=socks5://192.168.6.19:3213"
Environment="HTTPS_PROXY=socks5://192.168.6.19:3213"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl restart kubelet

kubectl proxy cannot reach services on other nodes

The error below appears when accessing the dashboard. The runtime is docker, kubernetes-dashboard runs on a separate worker node, and the dashboard service is accessed through the proxy on the master node.

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "error trying to reach service: dial tcp 172.17.0.3:8443: connect: connection refused",
  "reason": "ServiceUnavailable",
  "code": 503
}

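The cause is that the docker containers use the docker network (172.17.0.1/16) while the network add-on (flannel) uses 10.244.0.0/16. The two are not the same network, so the service cannot be reached. The address 172.17.0.3 here is actually an address on the worker node's docker network: the proxy container was deployed before flannel, so it is not on the flannel network.

# show the flannel network
$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1400
FLANNEL_IPMASQ=true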
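The mismatch can be checked directly: parse subnet.env and test whether the address from the error message belongs to the flannel network. This is a minimal sketch using the standard library. parse_subnet_env and in_flannel_network are illustrative names, and the file content is passed in as a string so that the snippet runs anywhere.

```python
import ipaddress


def parse_subnet_env(text: str) -> dict:
    """Parse KEY=VALUE lines such as those in /run/flannel/subnet.env."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            env[key] = value
    return env


def in_flannel_network(pod_ip: str, env: dict) -> bool:
    """Return True if pod_ip lies inside FLANNEL_NETWORK."""
    network = ipaddress.ip_network(env["FLANNEL_NETWORK"])
    return ipaddress.ip_address(pod_ip) in network


if __name__ == "__main__":
    subnet_env = "FLANNEL_NETWORK=10.244.0.0/16\nFLANNEL_SUBNET=10.244.0.1/24\n"
    env = parse_subnet_env(subnet_env)
    print(in_flannel_network("172.17.0.3", env))  # False: docker bridge address
    print(in_flannel_network("10.244.1.3", env))  # True: address handed out by flannel
```

A pod IP that fails this check was most likely assigned by docker's default bridge rather than by flannel.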

Generate the docker environment variable DOCKER_OPTS.

# locate mk-docker-opts.sh; it is inside the flannel image, and it can also be found in the flannel binary release
$ sudo find / -name 'mk-docker-opts.sh'
/var/lib/docker/overlay2/8779d2bd83ddf0e237da15f5c0e62fd79bbf6d3868cea87ec926c471f1184774/merged/opt/bin/mk-docker-opts.sh
/var/lib/docker/overlay2/99462f1d9e955f5c40a11844119dc1e0f295208c20a696e7bea76b39324a9943/diff/opt/bin/mk-docker-opts.sh

# as root: generate the docker opts
$ alias mk-docker-opts="/var/lib/docker/overlay2/99462f1d9e955f5c40a11844119dc1e0f295208c20a696e7bea76b39324a9943/diff/opt/bin/mk-docker-opts.sh"
$ mk-docker-opts -d /run/docker_opts.env -c

Edit the docker service.

# run as root
$ vim /lib/systemd/system/docker.service
# add this line
EnvironmentFile=/run/docker_opts.env
# change this line
ExecStart=/usr/bin/dockerd $DOCKER_OPTS -H fd:// ...

# restart docker
$ systemctl daemon-reload
$ systemctl restart docker

Verify the pod IP.

# alias k8s="kubectl --namespace=default"
$ k8s get pods
NAME                                        READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-8c47d4b5d-2bbbx   1/1     Running   0          43s
kubernetes-dashboard-59fccbc7d7-9wmn9       1/1     Running   0          43s
$ k8s describe pod kubernetes-dashboard-59fccbc7d7-9wmn9
Name:         kubernetes-dashboard-59fccbc7d7-9wmn9
Namespace:    default
Priority:     0
Node:         k8s-worker-1/10.1.0.123
Start Time:   Sat, 11 Jun 2022 07:03:10 +0000
Labels:       k8s-app=kubernetes-dashboard
              pod-template-hash=59fccbc7d7
Annotations:  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:       Running
IP:           10.244.1.3
...

Note that the steps above must be repeated on every node; the docker_opts.env file cannot be shared between nodes.

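Cannot log in to the dashboard

The simplest approach is to set the Service type to NodePort and open the dashboard in Firefox.

When the login page is opened through the proxy, login is not possible and the following message is shown.

Insecure access detected. Sign in will not be available. Access Dashboard securely over HTTPS or using localhost. Read more here .

This happens even with the following arguments added.

--auto-generate-certificates
--namespace=default
--insecure-bind-address=0.0.0.0
--insecure-port=8080
--enable-insecure-login

According to the dashboard documentation, --enable-insecure-login only applies when logging in over HTTP at 127.0.0.1 or localhost. It does not apply to HTTP access through kubectl proxy.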

Port forwarding

$ kubectl port-forward -n default service/kubernetes-dashboard 18082:443 --address='0.0.0.0'

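Solutions

The dashboard's SSL handling has a few pitfalls. The following points are confirmed.

  • HTTP access from any address other than localhost or 127.0.0.1 is not allowed, and no setting works around this, so

    • kubectl proxy is only usable when k8s is installed on the local machine
    • HTTP access can basically be ruled out
  • Anything other than localhost or 127.0.0.1 has to use HTTPS

So for HTTPS there are two ways forward.

  • Buy a certificate from a certificate authority, use a domain name, and configure DNS for it
  • Set up a self-signed site

Buying a certificate is not covered here. A few workable options are recorded below.

  • Port forwarding
    • Firefox can connect
    • Chrome cannot view the certificate; Safari, Chrome, and Opera cannot connect
  • An nginx server with a self-signed certificate
    • Safari and Firefox can connect after dismissing the warning
    • Chrome can connect after the certificate is downloaded and marked as trusted

Set up nginx with a self-signed certificate

For the nginx setup itself, see the linked guide. Change the Service type to NodePort.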

kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: default
spec:
  ports:
    - port: 443
      targetPort: 8443
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard

Add the nginx configuration.

upstream k8s-dashboard {
    server 10.244.1.2:8443;
}

server {
    # auth_basic "Auth Message";
    # auth_basic_user_file /etc/apache2/.htpasswd;
    client_max_body_size 100m;
    listen 8443 ssl default_server;
    listen [::]:8443 ssl default_server;
    server_name 192.168.6.103;
    include snippets/self-signed.conf;

    location / {
        proxy_pass https://k8s-dashboard;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # enable websocket
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

The dashboard can then be opened in Chrome.

# enter this address in the browser
https://192.168.6.103:8443/#/login

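Building the machine

Motivation

  • Build a machine that can be freely experimented on
  • Good value for money, not too expensive
  • A big-data stack may be deployed on it later, so performance cannot be too weak

Hardware

Component     Spec                          Qty   Price
Motherboard   华南金牌 x99 f8d              1     894
CPU           E5 2683 v4                    2     1940
Memory        2133 ddr4 16g                 4     1060
Disk          500G ssd                      2     950
CPU cooler    利民 as120                    2     248
Case          -                             1     429
PSU           850w                          1     889
GPU           -                             -     60
Total         -                             -     6470

Changes

Component     Spec                          Qty   Price
Memory        2133 ddr4 32g                 2     960
Disk          1T nvme                       1     597
Total         -                             -     1557

Grand total

6470 + 1557

= 8027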
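The totals can be double-checked with a few lines of Python. The numbers are the per-row prices from the two tables above; the dictionary keys are just labels chosen for this sketch.

```python
# Row prices from the original hardware table
initial = {
    "motherboard": 894, "cpu": 1940, "memory": 1060, "disk": 950,
    "cooler": 248, "case": 429, "psu": 889, "gpu": 60,
}
# Row prices from the later changes
changes = {"memory": 960, "disk": 597}

initial_total = sum(initial.values())
changes_total = sum(changes.values())
grand_total = initial_total + changes_total

print(initial_total, changes_total, grand_total)  # 6470 1557 8027
```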

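Notes

  • The two CPU power connectors on the motherboard are far apart, so the PSU needs two separate CPU power cables. If the PSU only comes with a one-to-two CPU power cable, buy an extra CPU power cable in advance
  • Check which sockets the cooler supports; the one used here is compatible with the 2011 socket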

Configuration

wii@srv:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi        13Gi       110Gi       2.0Mi       1.7Gi       110Gi
Swap:         8.0Gi          0B       8.0Gi
wii@srv:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
Stepping: 1
CPU MHz: 1200.097
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4199.82
Virtualization: VT-x
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 8 MiB
L3 cache: 80 MiB
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs
bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi
flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
dtherm ida arat pln pts md_clear flush_l1d
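The logical CPU count in the lscpu output follows from the topology lines: sockets × cores per socket × threads per core. The sketch below parses those fields and checks the arithmetic; parse_lscpu is an illustrative helper, and the sample text is a short excerpt of the output above.

```python
def parse_lscpu(text: str) -> dict:
    """Turn 'Key: value' lines from lscpu into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    output = (
        "CPU(s): 64\n"
        "Thread(s) per core: 2\n"
        "Core(s) per socket: 16\n"
        "Socket(s): 2\n"
    )
    info = parse_lscpu(output)
    logical = (
        int(info["Socket(s)"])
        * int(info["Core(s) per socket"])
        * int(info["Thread(s) per core"])
    )
    print(logical == int(info["CPU(s)"]))  # True: 2 * 16 * 2 = 64
```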

Benchmark

image-20210824204253078

See the linked page for the detailed results.

Postscript

  • The two CPUs cost 1940; a year later they go for 800~900. A heavy loss

server

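BIOS settings

Power on automatically after power is restored

  • Hold Delete during boot to enter the BIOS
  • IntelRCSetup -> PCH Configuration -> PCH Devices -> Restore AC after Power Loss
  • Set it to Power On

Powering on after power is restored is what makes remote startup possible.

Operating system

CentOS 7.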

Configuration

# disable selinux
$ sudo vim /etc/selinux/config
SELINUX=enforcing -> SELINUX=disabled
$ sudo setenforce 0

# disable swap
$ sudo vim /etc/fstab
comment out the /dev/mapper/centos-swap line

# disable the firewall
$ systemctl stop firewalld
$ systemctl disable firewalld

Software

Essentials

sudo yum install git telnet -y

zsh

$ sudo yum install zsh
# oh my zsh
$ sh -c "$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

docker

# one-step install script
$ curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun

vnc

Server

sudo apt install xfonts-base xfonts-75dpi xfonts-100dpi
sudo apt install tightvncserver

# centos

Configuration

# /etc/systemd/system/vncserver@:1.service
[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target

[Service]
#User=wii
#Group=wii
#WorkingDirectory=/home/wii
Type=forking

# Clean any existing files in /tmp/.X11-unix environment
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'
# ExecStart=/sbin/runuser -l wii -c "/usr/bin/vncserver %i -geometry 1920x1080"
ExecStart=/bin/sh -c "/usr/bin/vncserver %i -geometry 1920x1080"
PIDFile=/home/wii/.vnc/%H%i.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'

[Install]
WantedBy=multi-user.target

Client

Download a VNC client from the linked page.

jdk

Manual download

# download the jdk from https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

# install
$ sudo yum install jdk-8u301-linux-x64.rpm
$ sudo alternatives --config java

yum

$ yum install -y java-1.8.0-openjdk-devel  # install the jdk
$ yum install -y java-1.8.0-openjdk        # install the jre

mvn

Download it from the linked page.

# edit conf/settings.xml
# comment out the following block
<mirror>
  <id>maven-default-http-blocker</id>
  <mirrorOf>external:http:*</mirrorOf>
  <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
  <url>http://0.0.0.0/</url>
  <blocked>true</blocked>
</mirror>

npm

$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.37.2/install.sh | bash
$ nvm install v12
# nrm (installed globally so the nrm command is on PATH)
$ npm install -g nrm
$ nrm use taobao

mysql / mariadb

$ yum install mariadb mariadb-server
$ systemctl start mariadb      # start mariadb
$ systemctl enable mariadb     # start on boot
$ mysql_secure_installation    # set the root password and related options
$ mysql -uroot -p              # test the login

ambari (using MapR)

Dependencies

  • jdk

  • mvn

  • rpm-build(centos)

  • npm

  • python-devel

    • sudo yum install -y python-devel
  • ant

    #!/bin/bash
    set -ex
    ANT_VERSION=1.10.11
    wget http://archive.apache.org/dist/ant/binaries/apache-ant-${ANT_VERSION}-bin.tar.gz
    sudo tar xvfvz apache-ant-${ANT_VERSION}-bin.tar.gz -C /opt
    sudo ln -sfn /opt/apache-ant-${ANT_VERSION} /opt/ant
    sudo sh -c 'echo ANT_HOME=/opt/ant >> /etc/environment'
    sudo ln -sfn /opt/ant/bin/ant /usr/bin/ant

    ant -version
    rm apache-ant-${ANT_VERSION}-bin.tar.gz
  • gcc

Download

Download it from the linked page, or clone it with git: git clone git@github.com:apache/ambari.git, then switch branches with git checkout branch-2.7

Build

See the linked build guide.

# add -Drat.skip=true
$ mvn -B clean install rpm:rpm -DnewVersion=2.7.5.0.0 -DbuildNumber=5895e4ed6b30a2da8a90fee2403b6cab91d19972 -DskipTests -Drat.skip=true -Dpython.ver="python >= 2.6"

# every link that starts with https://s3.amazonaws.com/dev.hortonworks.com/ has to be changed
# follow the changes in this PR: https://github.com/apache/ambari/pull/3283/commits/3dca705f831383274a78a8c981ac2b12e2ecce85

Errors

# error output
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.3:npm (npm install) on project ambari-admin: Failed to run task: 'npm install --unsafe-perm' failed. (error code 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :ambari-admin

# fix
cd ambari-admin/src/main/resources/ui/admin-web
npm install --unsafe-perm

# resume the build
mvn -B clean install rpm:rpm -DnewVersion=2.7.5.0.0 -DbuildNumber=5895e4ed6b30a2da8a90fee2403b6cab91d19972 -DskipTests -Drat.skip=true -Dpython.ver="python >= 2.6" -rf :ambari-admin

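Issues and notes

Remote startup

The machine draws around 100 W with no load. That is a lot for something that is not used often, though it may be needed during work hours. Remote shutdown and startup are handled by combining the BIOS power-on-after-power-loss setting with a Xiaomi smart plug.

Usage

The systems tried were centos 7, centos 8, and ubuntu 20.04 (desktop + server), and the software tried was ambari, mapr, openstack, and microstack. The final setup is ubuntu 20.04 + openstack Wallaby.

The initial plan was centos 7, on the assumption that it would be more stable; it is also what company servers usually run.

The goal was a big-data platform (zookeeper, hadoop, impala, yarn, spark, kudu, and so on). ambari came first, but downloading CDH ran into a paywall, so that was abandoned. MapR was a pleasant discovery. The first attempt was on centos 7, which turned out not to be supported by the latest release. An in-place upgrade from centos 7 to centos 8 failed. After a fresh centos 8 install followed by MapR, the machine would not boot once configured, so the system was reinstalled again.

Whenever something went wrong, the cheap display-only graphics card showed nothing, which meant hauling the case around and swapping in the graphics card from another machine. Exhausting. That ended any wish to install much on bare metal; everything heavy goes into virtual machines. virtual box was considered but is not very convenient, so openstack was chosen.

openstack was first tried on centos 7, but cinder failed every time a volume was created, and pip was pinned at 8.x.x. The plan to try centos 8 was eventually dropped.

After moving to Ubuntu, the desktop edition was the first preference, since a GUI is nice to have. But the display-only card worked fine during installation and showed nothing once the system booted, so the server edition was chosen instead.

Snap has MicroStack, which installs openstack in one step. It worked, but the files inside a snap package are read-only and cannot be changed. In the end the decision was to install step by step following the official guide.

The official openstack documentation misses some details, but overall it is very good.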

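Problems

Hardware errors

Intel SSD compatibility issue

Everywhere kernel logs were visible, the messages below were being printed nonstop. The cause turned out to be an Intel NVMe SSD; replacing it fixed the problem.

pcieport 0000:80:02.0: Multiple Uncorrected (Non-Fatal) error received: 0000:80:02.0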
pcieport 0000:80:02.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Request ID)
pcieport 0000:80:02.0: device [8086:bf] error status/mask=88100000/88000000
pcieport 0000:80:02.0: [28] UnsupReq (First)
pcieport 0000:80:02.0: TLP Header: 34000000 01000010 00000000 00000000
pcieport 0000:80:02.0: device recovery successful
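When the log is flooded like this, counting the errors per PCIe port helps confirm that a single device is responsible. The sketch below matches the "PCIe Bus Error" lines with a regular expression. The pattern is written against the lines shown above and may need adjusting for other kernel versions; count_bus_errors is an illustrative name.

```python
import re
from collections import Counter

# Matches lines such as:
# pcieport 0000:80:02.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=...
ERROR_RE = re.compile(
    r"pcieport (?P<port>[0-9a-f:.]+): PCIe Bus Error: severity=(?P<severity>[^,]+),"
)


def count_bus_errors(lines) -> Counter:
    """Count PCIe bus errors per (port, severity) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match["port"], match["severity"])] += 1
    return counts


if __name__ == "__main__":
    log = [
        "pcieport 0000:80:02.0: Multiple Uncorrected (Non-Fatal) error received: 0000:80:02.0",
        "pcieport 0000:80:02.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Request ID)",
        "pcieport 0000:80:02.0: device recovery successful",
    ]
    print(count_bus_errors(log))
```

In practice the lines could come from the kernel log, for example by iterating over a saved copy of the dmesg output.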

image-20210831233753508

Final result

image-20210831235937113