openstack

Installation

This installation guide is for personal study only; be careful about using it in production. For installing openstack on Ubuntu, following the official guide is recommended; here is a helpful companion.

Environment Check

Open the installation guide. Under the Environment chapter there is an OpenStack packages link, below which are package notes for the various linux distributions, e.g. centos, including the distribution version requirements; here lists all the openstack releases. For example, centos 7 can only run releases before Ussuri, i.e. Train and earlier.
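
To check the distribution version before picking a release, a quick look on centos:

$ cat /etc/redhat-release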

Other components also need to be set up, such as mariadb, rabbitmq, memcached, and etcd.

Environment Preparation

Take centos 7 + openstack train as the example. Here is the Train release page, and here is the installation guide page with the Train install link Minimal deployment for Train. Following the minimal deployment, install the Identity service first; clicking through gives install tutorials for the common distributions. Click Install and configure under centos; a note at the top reminds you to complete the prerequisite steps from the OpenStack Install Guide first.

# Don't forget to install the openstack client
$ yum install python-openstackclient

# Or install the openstack client with pip
$ yum install python-devel python-pip
$ pip install python-openstackclient

Install the Repository

Installing directly with yum fails:

$ yum install openstack-keystone -y
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.bupt.edu.cn
* epel: mirrors.bfsu.edu.cn
* extras: mirrors.bupt.edu.cn
* updates: mirrors.bupt.edu.cn
No package openstack-keystone available.
Error: Nothing to do

Search the repositories.

$ yum search openstack
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.bupt.edu.cn
* epel: mirrors.tuna.tsinghua.edu.cn
* extras: mirrors.bupt.edu.cn
* updates: mirrors.bupt.edu.cn
============================================ N/S matched: openstack ============================================
ansible-openstack-modules.noarch : Unofficial Ansible modules for managing Openstack
centos-release-openstack-queens.noarch : OpenStack from the CentOS Cloud SIG repo configs
centos-release-openstack-rocky.noarch : OpenStack from the CentOS Cloud SIG repo configs
centos-release-openstack-stein.noarch : OpenStack from the CentOS Cloud SIG repo configs
centos-release-openstack-train.noarch : OpenStack from the CentOS Cloud SIG repo configs
diskimage-builder.noarch : Image building tools for OpenStack
golang-github-rackspace-gophercloud-devel.noarch : The Go SDK for Openstack http://gophercloud.io
php-opencloud.noarch : PHP SDK for OpenStack/Rackspace APIs
php-opencloud-doc.noarch : Documentation for PHP SDK for OpenStack/Rackspace APIs
python2-oslo-sphinx.noarch : OpenStack Sphinx Extensions and Theme for Python 2

Name and summary matches only, use "search all" for everything.

Install the noarch package that matches train.

$ yum install centos-release-openstack-train.noarch

Then continue with the rest of the flow.

Install keystone

# First create the keystone user
$ useradd -g keystone keystone

# Before installing glance, create the domain and project
# Create the domain
$ openstack domain create --description "Default Domain" default
# Create the project
$ openstack project create --domain default --description "Service Project" service
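
The keystone package install itself comes earlier in the official Train guide; for completeness, it looks like this:

$ yum install -y openstack-keystone httpd mod_wsgi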

Following the tutorial to the end, you end up with the following.

export OS_USERNAME=admin
export OS_PASSWORD=ADMIN_PASS
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3

After replacing ADMIN_PASS, save this as the file admin-openrc; it is needed later.

The installation guide shows how to create domains, projects, users, and roles. For the rest of the installation, prefer the defaults over newly created projects:

Item                 Value
domain / domain_id   default
domain_name          Default
project              service

Install glance

# Step two includes a source admin-openrc, which is the file produced at the end of the keystone install

Install placement

Verification

If verification fails with Expecting value: line 1 column 1 (char 0), see this article.

Install horizon

If http://controller/dashboard returns a 404, add the following when configuring local_settings.

WEBROOT = '/dashboard'
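
After changing local_settings, restart the web server and the session storage so the setting takes effect (the usual centos service names):

$ systemctl restart httpd.service memcached.service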

Create an Instance

Prerequisites

  • Create an image
  • Create a flavor
  • Create a network
    • choose vxlan (a CLI sketch of all three prerequisites follows)
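
A minimal sketch of the three prerequisites with the openstack CLI; the image file, names, flavor sizing, and subnet range below are all placeholders:

# image (file name is a placeholder)
$ openstack image create --file cirros-0.4.0-x86_64-disk.img \
  --disk-format qcow2 --container-format bare --public cirros
# flavor
$ openstack flavor create --vcpus 1 --ram 512 --disk 1 m1.tiny
# self-service (vxlan) network and subnet
$ openstack network create selfservice
$ openstack subnet create --network selfservice \
  --subnet-range 172.16.1.0/24 selfservice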

LVM

Create the LVM volumes on the disk.

# Create
$ pvcreate /dev/sda
$ vgcreate cinder-volumes /dev/sda

# Delete
## Remove the volume
$ lvremove cinder--volumes-cinder--volumes--pool_tmeta
## Delete the volume group
$ vgremove cinder-volumes
## Delete the physical volume
$ pvremove /dev/sda

If pvcreate reports excluded by a filter, check the filter setting in /etc/lvm/lvm.conf:

filter = [ "a/sda/", "a/nvme/", "r/.*/" ]

To accept a block device, use a pattern like the following.

"a|.*|"

To reject a block device, use a pattern like the following.

"r|/dev/cdrom|"

Ubuntu Installation

Refer to the installation guide. Below is a single-node install; the IP is 192.168.6.55.

Version Info

Software/System   Version
Ubuntu            22.04.2
Openstack         antelope

Install Dependencies

# apt install apache2 libapache2-mod-uwsgi tgt 

Environment Configuration

Hosts

# vim /etc/hosts
## Add the following; replace the ip with your own
192.168.6.55 controller
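
A quick check that the name resolves:

# ping -c 1 controller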

Install the Repository

See here and pick a release, for example:

# add-apt-repository cloud-archive:antelope
# apt install nova-compute
# apt install python3-openstackclient

SQL Database

# apt install mariadb-server python3-pymysql

Configuration

# vim /etc/mysql/mariadb.conf.d/99-openstack.cnf
## Add the following
[mysqld]
bind-address = 192.168.6.55

default-storage-engine = innodb
innodb_file_per_table = on
max_connections = 4096
collation-server = utf8_general_ci
character-set-server = utf8

bind-address is the Controller node's ip.

Restart the Service

# service mysql restart

Initialize mariadb

# mysql_secure_installation

Message Queue

Note

  • Replace RABBIT_PASS in the commands below
# apt install rabbitmq-server
# rabbitmqctl add_user openstack RABBIT_PASS
# rabbitmqctl set_permissions openstack ".*" ".*" ".*"
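
A quick sanity check that the user and its permissions landed:

# rabbitmqctl list_users
# rabbitmqctl list_permissions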

Memcached

# apt install memcached python3-memcache
# vim /etc/memcached.conf
## Replace -l 127.0.0.1 with the Controller's ip
-l 192.168.6.55
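
Restart memcached so the new listen address takes effect:

# service memcached restart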

Etcd

# apt install etcd
# vim /etc/default/etcd
## Edit along the following lines
ETCD_NAME="controller"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-01"
ETCD_INITIAL_CLUSTER="controller=http://192.168.6.55:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.6.55:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.6.55:2379"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.6.55:2379"
# systemctl enable etcd
# systemctl restart etcd
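
A quick health check against the client endpoint (assuming the etcdctl v3 API is available):

# ETCDCTL_API=3 etcdctl --endpoints=http://192.168.6.55:2379 endpoint health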

Install Openstack Services

See here; we go with the minimal deployment.

Identity service

# mysql
MariaDB [(none)]> CREATE DATABASE keystone;
## Replace KEYSTONE_DBPASS below
MariaDB [(none)]> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'localhost' \
IDENTIFIED BY 'KEYSTONE_DBPASS';
MariaDB [(none)]> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'%' \
IDENTIFIED BY 'KEYSTONE_DBPASS';
MariaDB [(none)]> exit

# apt install keystone
# vim /etc/keystone/keystone.conf
## Modify the following; replace KEYSTONE_DBPASS
[database]
# ...
connection = mysql+pymysql://keystone:KEYSTONE_DBPASS@controller/keystone
[token]
# ...
provider = fernet

# su -s /bin/sh -c "keystone-manage db_sync" keystone
# keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
# keystone-manage credential_setup --keystone-user keystone --keystone-group keystone
# keystone-manage bootstrap --bootstrap-password ADMIN_PASS \
--bootstrap-admin-url http://controller:5000/v3/ \
--bootstrap-internal-url http://controller:5000/v3/ \
--bootstrap-public-url http://controller:5000/v3/ \
--bootstrap-region-id RegionOne
# vim /etc/apache2/apache2.conf
## Add the following
ServerName controller

## Add the following to the environment (replace ADMIN_PASS)
$ export OS_USERNAME=admin
$ export OS_PASSWORD=ADMIN_PASS
$ export OS_PROJECT_NAME=admin
$ export OS_USER_DOMAIN_NAME=Default
$ export OS_PROJECT_DOMAIN_NAME=Default
$ export OS_AUTH_URL=http://controller:5000/v3
$ export OS_IDENTITY_API_VERSION=3
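
With those variables set, requesting a token verifies keystone end to end:

$ openstack token issue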

Glance

# mysql
## Replace GLANCE_DBPASS
MariaDB [(none)]> CREATE DATABASE glance;
MariaDB [(none)]> GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'localhost' \
IDENTIFIED BY 'GLANCE_DBPASS';
MariaDB [(none)]> GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'%' \
IDENTIFIED BY 'GLANCE_DBPASS';
MariaDB [(none)]> exit
## Keep the environment variables set above
$ openstack user create --domain default --password-prompt glance
$ openstack role add --project service --user glance admin
$ openstack service create --name glance \
--description "OpenStack Image" image
$ openstack endpoint create --region RegionOne \
image public http://controller:9292
# apt install glance
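
The guide then has you edit /etc/glance/glance-api.conf; the database fragment mirrors the keystone one (replace GLANCE_DBPASS):

# vim /etc/glance/glance-api.conf
[database]
connection = mysql+pymysql://glance:GLANCE_DBPASS@controller/glance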

Placement

Compute

Note

  • In nova.conf, set auth_url to http://controller:5000/v3 (see the fragment below)
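
For reference, the corresponding nova.conf fragment, sketched from the upstream guide; NOVA_PASS is a placeholder:

# /etc/nova/nova.conf (fragment)
[keystone_authtoken]
www_authenticate_uri = http://controller:5000/
auth_url = http://controller:5000/v3
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = nova
password = NOVA_PASS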

Networking

Dashboard

Block Storage

Other Configuration

tgt Configuration

$ vim /etc/tgt/targets.conf
# Add the following
include /var/lib/cinder/volumes/*
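
Restart the service for the change to take effect (service name assumed to be tgt on Ubuntu):

$ sudo systemctl restart tgt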

Common centos Issues


Encoding

# Failed to set locale, defaulting to C

# vim /etc/profile
# Add the following
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
export LC_COLLATE=C
export LC_CTYPE=en_US.UTF-8
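
Apply to the current shell without logging in again:

source /etc/profile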

yum

yum Commands Hang

# Seen on Amazon Linux 2: all yum commands hang
ps aux | grep yum

kill -9 <pid> # kill all yum processes

Other Issues

Failed to download metadata for repo 'appstream'

# Error
CentOS Linux 8 - AppStream 9.5 B/s | 38 B 00:03
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist

# Fix
dnf --disablerepo '*' --enablerepo=extras swap centos-linux-repos centos-stream-repos -y
dnf distro-sync -y

No Network Access After Openstack Initialization

sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0

# Add DNS1
DNS1=192.168.6.1

# Restart the network service
sudo systemctl restart NetworkManager

Install gcc 7 on Centos 7

sudo yum install gcc72-c++

Packages

# lcov
yum install lcov.noarch

ubuntu network

Static Address

# vim /etc/netplan/99_config.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      addresses:
        - 10.10.10.2/24 # static address
      gateway4: 10.10.10.1
      nameservers: # optional
        search: [mydomain, otherdomain]
        addresses: [10.10.10.1, 1.1.1.1]

$ sudo netplan apply

Restart

# method 1
$ sudo netplan apply

# method 2
$ sudo nmcli networking off
$ sudo nmcli networking on

# method 3
$ sudo systemctl start NetworkManager
$ sudo systemctl stop NetworkManager

# method 4
$ sudo service network-manager restart

# method 5
$ sudo ifdown -a
$ sudo ifup -a

# method 6
$ sudo systemctl restart systemd-networkd

Interfaces

# up / down
ip link set dev ens7 up # up
ip link set dev ens7 down # down
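
The brief view is handy for checking the state after toggling:

# show interface state (-br = brief)
ip -br link show dev ens7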

Routing

# Show routes
ip route show

# Add a route
ip route add <network_ip>/<cidr> via <gateway_ip> dev <network_card_name>
## Examples
ip route add 192.168.6.0/24 dev eth0
ip route add default via 192.168.6.1

# Delete a route
ip route del default # delete the default route

Set DNS

## ubuntu 20.04+
# cat /etc/netplan/***.yaml
network:
  version: 2
  ethernets:
    ens4:
      dhcp4: true
      match:
        macaddress: fa:16:3e:65:2c:6b
      mtu: 1450
      set-name: ens4
      nameservers:
        addresses: [192.168.6.1,8.8.8.8] # set dns

# Apply
sudo netplan apply

Firewall

# Check firewall status
$ sudo ufw status
# active: enabled, inactive: disabled

# disable
$ sudo ufw disable
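
If you keep ufw enabled, open the needed ports instead, e.g.:

# allow ssh
$ sudo ufw allow 22/tcp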

Netplan

  • gateway4 is deprecated; use routes instead
# deprecated
gateway4: 192.168.6.1
# new syntax
routes:
  - to: default
    via: 192.168.6.1

Set Routes

network:
  version: 2
  ethernets:
    enp0s31f6:
      match:
        macaddress: 6c:4b:90:85:5d:59
      dhcp4: true
      wakeonlan: true
      set-name: enp0
      nameservers:
        addresses: [223.5.5.5, 8.8.8.8]
      addresses: [192.168.6.8/24]
      routes:
        - to: default
          via: 192.168.6.1
          table: 100
      routing-policy:
        - to: 192.168.6.0/24
          table: 100
          priority: 100
    enxf8e43b1a1229:
      match:
        macaddress: f8:e4:3b:1a:12:29
      addresses: [192.168.9.99/24]
      set-name: ext0
      nameservers:
        addresses: [223.5.5.5, 8.8.8.8]

wireshark

Installation

apt install tshark

See the documentation here.

Usage

tshark -i <network-interface>

# filter by port
tshark -i enp0 -f "src port 30800"

Wake On Lan

# Configuration
$ cat /etc/netplan/*_config.yaml
# Let NetworkManager manage all devices on this system
network:
  version: 2
  renderer: NetworkManager
  ethernets:
    enp4s0:
      match:
        macaddress: 2a:04:a0:46:08:38
      dhcp4: true
      wakeonlan: true
    enp5s0:
      match:
        macaddress: 2a:04:a0:46:08:39
      dhcp4: true
      wakeonlan: true

# Apply
$ sudo netplan apply

Notes

  1. If it does not take effect, try adding a macaddress match

yarn

Common Commands

View Logs

$ yarn logs -applicationId <application-id>

Kill an Application

$ yarn app -kill <application-id>

View Applications

$ yarn top
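
To find application ids in the first place, listing applications helps:

$ yarn application -list -appStates RUNNING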

htop

Keys

F2 or Shift + S. For keyboards without a Delete key, try Fn + Backspace.

Configuration

Change the number of CPUs shown per row


In the setup screen, adjust the CPUS meters in the left and right columns.

iterm2

Shortcuts

# Switch tabs
Command + arrow keys
Command + number

# Switch panes
Command + Option + arrow keys

Scroll Wheel

Setting: Advanced -> Scroll wheel sends arrow keys when in alternate screen mode.

  • yes: scroll wheel events are sent as arrow-key events
    • in vi, screen, and tmux the wheel can then move the cursor up and down quickly
  • no
    • avoids cycling through shell history instead of scrolling (screen content can then only be scrolled via the scrollbar)

python random

Common Usage

Pick from a List

# Define a list (import random first)
>>> import random
>>> l = [1, 2, 3, 4, 5, 6, 7]
# Pick one at random
>>> random.choice(l)
6
# Pick several, with replacement
>>> random.choices(l, k=3)
[7, 4, 7]
# Pick several, without replacement
>>> random.sample(l, 3)
[3, 5, 7]

vncserver

centos

# Install vncserver
$ sudo yum install tigervnc-server

# Set a password
$ vncpasswd

# Copy the config
$ sudo cp /lib/systemd/system/vncserver@.service /etc/systemd/system/vncserver@:1.service

# Edit the config
$ sudo vim /etc/systemd/system/vncserver@\:1.service

Configuration

Using the user wii as an example; replace it with your own.

[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target

[Service]
Type=forking
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'
ExecStart=/sbin/runuser -l wii -c "/usr/bin/vncserver %i -geometry 1920x1080"
# PIDFile=/home/wii/.vnc/%H%i.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'

[Install]
WantedBy=multi-user.target

Manage with systemctl

$ sudo systemctl daemon-reload       # reload configs
$ sudo systemctl start vncserver@:1  # start the vnc server
$ sudo systemctl status vncserver@:1 # check status
$ sudo systemctl enable vncserver@:1 # enable at boot

References

ubuntu

References

jmeter

Run from the Command Line

$ ./jmeter -n -t config.jmx -l results.jtl

Parameters

# Command format
jmeter -n -t test-file [-p property-file] [-l results-file] [-j log-file]

# Options
-n run in non-GUI mode
-t the test (.jmx) file
-l the results file

RPC Load Testing

RPC load tests require custom development against the SDK; the overall flow is as follows.

  • Develop against the SDK

    • Add the jmeter dependency
    • Write a load-test class that implements the JavaSamplerClient interface
  • Copy the jar and its dependencies into jmeter's lib/ext directory

  • Restart jmeter

  • Create a load-test plan

    • Add a thread group

    • Add a Java request

    • Select our implementation class in the dropdown

    • Parameters must be returned by the implementation's getDefaultParameters method, but the values can be edited and saved in the jmeter GUI

    • Add a View Results Tree

  • Click the start button to run the test

  • Inspect the results in the View Results Tree

Code Example

// EchoService is a running gRPC service
import org.apache.jmeter.config.Arguments;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;
import com.google.protobuf.util.JsonFormat;

import java.util.Random;

public class EchoClientExampleForJMeter implements JavaSamplerClient {
    private static final Random RANDOM = new Random();

    EchoService echoService;

    @Override
    public void setupTest(JavaSamplerContext javaSamplerContext) {
        echoService = ...; // initialize echoService
    }

    @Override
    public SampleResult runTest(JavaSamplerContext javaSamplerContext) {
        SampleResult result = new SampleResult();
        result.sampleStart();

        String id = javaSamplerContext.getParameter("id");
        String name = javaSamplerContext.getParameter("name");

        EchoRequest request = EchoRequest.newBuilder()
                .setId(id)
                .setName(name)
                .build();
        try {
            EchoResponse response = echoService.echo(request);
            result.sampleEnd();
            result.setSuccessful(true);
            result.setResponseData(JsonFormat.printer().print(response), null);
            result.setDataType(SampleResult.TEXT);
            result.setResponseCode("OK");
        } catch (Exception e) {
            result.sampleEnd();
            result.setSuccessful(false);
            java.io.StringWriter stringWriter = new java.io.StringWriter();
            e.printStackTrace(new java.io.PrintWriter(stringWriter));
            result.setResponseData(stringWriter.toString(), null);
            result.setDataType(SampleResult.TEXT);
            result.setResponseCode("FAILED");

            e.printStackTrace();
        }

        return result;
    }

    @Override
    public void teardownTest(JavaSamplerContext javaSamplerContext) {
    }

    @Override
    public Arguments getDefaultParameters() {
        Arguments arguments = new Arguments();
        arguments.addArgument("id", String.valueOf(RANDOM.nextInt(10) + 2000));
        arguments.addArgument("name", "pressure" + RANDOM.nextInt(100000000) + 10000000);
        return arguments;
    }
}
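
After building, copy the jar (file name below is a placeholder) plus its dependencies into jmeter's lib/ext, as described above:

$ cp target/echo-jmeter-example.jar $JMETER_HOME/lib/ext/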

Exceptions

OutOfMemoryError

Edit the bin/jmeter script.

...
: "${HEAP:="-Xms4g -Xmx4g -XX:MaxMetaspaceSize=1024m"}" # 调大堆内存和 Meta 内存
...

k8s Installation


Installation

Install a Runtime

docker

# ubuntu
## Install prerequisites
sudo apt-get install -y ca-certificates curl gnupg lsb-release
## Add the GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
## Add the apt repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
## install docker engine
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
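
A quick smoke test that the engine works:

sudo docker run hello-world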

For newer k8s releases, using docker also requires cri-docker. Download the binary from here, and save the following as two files, cri-docker.service and cri-docker.socket.

# cri-docker.service
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target

# cri-docker.socket
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service

[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker

[Install]
WantedBy=sockets.target

Move the Files

# Move the binary
sudo mv cri-docker /usr/bin
# Move the systemd configs
sudo mv cri-docker.service /etc/systemd/system/
sudo mv cri-docker.socket /etc/systemd/system/

Start the Service with systemd

sudo systemctl daemon-reload
sudo systemctl enable cri-docker.service
sudo systemctl enable --now cri-docker.socket

# Start the service
sudo service cri-docker start
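
To confirm the runtime answers on the CRI socket (assuming crictl is installed):

sudo crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock version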

Install kub*

Install kubeadm, kubelet, and kubectl. For mirrors inside China, see the Aliyun mirror or the Tsinghua mirror.

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
# run as root
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl

Pull Images

$ sudo kubeadm config images pull \
--image-repository registry.aliyuncs.com/google_containers \
--cri-socket unix:///var/run/cri-dockerd.sock

Initialize

# 10.244.0.0/16 is the pod network used by the flannel add-on
# --apiserver-advertise-address is the master node's ip; on a single machine, it is that machine's ip
$ kubeadm init --image-repository registry.aliyuncs.com/google_containers \
--pod-network-cidr 10.244.0.0/16 \
--control-plane-endpoint 10.1.0.145 \
--cri-socket unix:///var/run/cri-dockerd.sock

Use as root

KUBECONFIG=/etc/kubernetes/admin.conf needs to be set.

root@k8s-master-1:/home/ubuntu# export KUBECONFIG=/etc/kubernetes/admin.conf
root@k8s-master-1:/home/ubuntu# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-1 Ready control-plane 5m55s v1.24.1

Use as a Non-root User

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Verify

ubuntu@k8s-master-1:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-1 Ready control-plane 4m52s v1.24.1

Rollback

kubeadm reset [flags]

preflight Run reset pre-flight checks
update-cluster-status Remove this node from the ClusterStatus object.
remove-etcd-member Remove a local etcd member.
cleanup-node Run cleanup node.

Configure Networking

This step is critical: if the cluster network is misconfigured, pods may be unable to talk to each other and kubectl proxy will not work (typically the pods run fine but connections are refused). Taking flannel as the example, install flannel first.

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
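
Once applied, the flannel pods should come up (newer manifests use the kube-flannel namespace, older ones kube-system):

$ kubectl get pods -A | grep flannel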

Use the mk-docker-opts.sh tool to generate the network settings; the tool can also be found in the docker container using sudo find / -name 'mk-docker-opts.sh'.

$ mk-docker-opts.sh -d /run/docker_opts.env -c

Modify the docker service.

# run as root
$ vim /lib/systemd/system/docker.service
# add this line
EnvironmentFile=/run/docker_opts.env
# modify this line
ExecStart=/usr/bin/dockerd $DOCKER_OPTS -H fd:// ...

# restart docker
$ systemctl daemon-reload
$ systemctl restart docker

Add Nodes

Newly added nodes need the same network configuration, and must not reuse another node's docker_opts.env file.

# on the master node
$ kubeadm token create --print-join-command
...

# on the joining node, append the cri socket to the generated command
$ kubeadm join 10.1.0.145:6443 --token nxxcv7.gge00x97wiphualw \
    --discovery-token-ca-cert-hash sha256:cfb324b2ee7ee548b08e38d2e6d60905e392553bf6715504e87888183a1238fd \
    --cri-socket unix:///var/run/cri-dockerd.sock

# label the new node
$ kubectl label node k8s-worker-1 node-role.kubernetes.io/worker=worker

# verify
ubuntu@k8s-master-1:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-1 Ready control-plane 9h v1.24.1
k8s-worker-1 Ready worker 3m17s v1.24.1

Install the Dashboard

# install the dashboard
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.6.0/aio/deploy/recommended.yaml

# start the proxy
$ kubectl proxy --address=0.0.0.0
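
The dashboard is then reachable through the proxy at the standard path:

# dashboard URL behind kubectl proxy (default port 8001)
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/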

Create a Service Account

Save the following as account.yaml.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

Then run:

kubectl apply -f account.yaml

Grant Permissions

Save the following as permission.yaml.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

Then run:

kubectl apply -f permission.yaml

The two can also be combined into a single file.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

Get a Token

kubectl -n kubernetes-dashboard create token admin-user

HTTPS Certificates

The k8s dashboard generates its own certificate by default, so this can be skipped. For HTTPS you can either generate a certificate yourself or buy one from a CA. A self-generated certificate can be created manually, or automatically by adding --auto-generate-certificates; see here for more parameters.

# self-signed certificate
# generate dashboard.pass.key
$ openssl genrsa -des3 -passout pass:over4chars -out dashboard.pass.key 2048
# generate dashboard.key
$ openssl rsa -passin pass:over4chars -in dashboard.pass.key -out dashboard.key
$ rm dashboard.pass.key # no longer needed
# generate dashboard.csr
$ openssl req -new -key dashboard.key -out dashboard.csr # press Enter through the prompts
# generate dashboard.crt
$ openssl x509 -req -sha256 -days 365 -in dashboard.csr -signkey dashboard.key -out dashboard.crt
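
To have the dashboard pick these files up, they can be packed into the secret the deployment mounts (secret name per the dashboard docs; namespace assumed to be kubernetes-dashboard):

$ kubectl create secret generic kubernetes-dashboard-certs \
  --from-file=dashboard.crt --from-file=dashboard.key \
  -n kubernetes-dashboard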

Delete the Dashboard

$ kubectl delete -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.6.0/aio/deploy/recommended.yaml

Network Add-ons

flannel

Installation

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Common Commands

View Nodes

$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
localhost.localdomain NotReady control-plane,master 2m13s v1.22.1

View Pods

$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-78fcd69978-9dk4n 0/1 Pending 0 2m52s
kube-system coredns-78fcd69978-w52zc 0/1 Pending 0 2m52s
kube-system etcd-localhost.localdomain 1/1 Running 0 3m6s
kube-system kube-apiserver-localhost.localdomain 1/1 Running 0 3m6s
kube-system kube-controller-manager-localhost.localdomain 1/1 Running 0 3m8s
kube-system kube-proxy-4w84n 1/1 Running 0 2m52s
kube-system kube-scheduler-localhost.localdomain 1/1 Running 0 3m6s

Pod Management

# view pods
$ kubectl get pods -A # A = all-namespaces

# delete a pod
$ kubectl delete pod kubernetes-dashboard --namespace=kubernetes-dashboard

References

Issues

Configuration

The k8s config is here: /etc/kubernetes/kubelet.conf

Node Not Found

ubuntu@k8s-master-1:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2022-06-08 16:18:53 UTC; 2s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 69055 (kubelet)
Tasks: 29 (limit: 38495)
Memory: 39.5M
CGroup: /system.slice/kubelet.service
└─69055 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/co>

Jun 08 16:18:55 k8s-master-1 kubelet[69055]: E0608 16:18:55.158403 69055 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.1.0>
Jun 08 16:18:55 k8s-master-1 kubelet[69055]: E0608 16:18:55.193156 69055 kubelet.go:2419] "Error getting node" err="node \"k8s-master-1\" not found"
...

As noted here, the api server may be unreachable; since docker is the CRI, check docker's status.

node not found is a misleading error by the kubelet. at this point it means that the kubelet was unable to register a Node object with the api server.

https://nalshsvrk8ss01.railcarmgt.com:6443/healthz?timeout=10s in 0 milliseconds

this means that the api server cannot be connected. could be caused by a number of things (including firewall).

root@k8s-master-1:~# service docker status
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-06-08 16:30:32 UTC; 34min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 1039 (dockerd)
Tasks: 23
Memory: 134.1M
CGroup: /system.slice/docker.service
└─1039 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Jun 08 17:04:52 k8s-master-1 dockerd[1039]: time="2022-06-08T17:04:52.874831490Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get \"https://k8s.gcr.io/v2/\": context deadline exceeded"
Jun 08 17:04:58 k8s-master-1 dockerd[1039]: time="2022-06-08T17:04:58.867350258Z" level=warning msg="Error getting v2 registry: Get \"https://k8s.gcr.io/v2/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
...

Sure enough, it is an image problem: pulls from k8s.gcr.io fail. Per the notes here, a custom image repository can be specified.

# tail the service logs
$ journalctl -u docker -f

Setting --image-repository gets the images pulled, but starting the control plane still times out.

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

So, just put a proxy in front.

sudo mkdir -p /etc/systemd/system/docker.service.d 
sudo touch /etc/systemd/system/docker.service.d/proxy.conf
sudo chmod 777 /etc/systemd/system/docker.service.d/proxy.conf
sudo echo '
[Service]
Environment="HTTP_PROXY=socks5://192.168.6.19:3213"
Environment="HTTPS_PROXY=socks5://192.168.6.19:3213"
' >> /etc/systemd/system/docker.service.d/proxy.conf
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl restart kubelet

kubectl proxy Cannot Reach Services on Other Nodes

Below is the error when accessing the dashboard: the runtime is docker, kubernetes-dashboard runs on a different worker node, and accessing it through the master node's proxy fails as follows.

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "error trying to reach service: dial tcp 172.17.0.3:8443: connect: connection refused",
  "reason": "ServiceUnavailable",
  "code": 503
}

The cause: the network used by the docker containers (172.17.0.1/16) and the network add-on's (flannel, 10.244.0.0/16) are not the same network, so traffic cannot get through. The 172.17.0.3 here is actually an address on the worker node; the pod's container was deployed before flannel and so is not on the same network.

# view the flannel network
$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1400
FLANNEL_IPMASQ=true

Generate the docker environment variable DOCKER_OPTS.

# locate mk-docker-opts.sh; it is inside the flannel image, or in the flannel binary release
$ sudo find / -name 'mk-docker-opts.sh'
/var/lib/docker/overlay2/8779d2bd83ddf0e237da15f5c0e62fd79bbf6d3868cea87ec926c471f1184774/merged/opt/bin/mk-docker-opts.sh
/var/lib/docker/overlay2/99462f1d9e955f5c40a11844119dc1e0f295208c20a696e7bea76b39324a9943/diff/opt/bin/mk-docker-opts.sh

# as root; generate the docker opts
$ alias mk-docker-opts="/var/lib/docker/overlay2/99462f1d9e955f5c40a11844119dc1e0f295208c20a696e7bea76b39324a9943/diff/opt/bin/mk-docker-opts.sh"
$ mk-docker-opts -d /run/docker_opts.env -c

Modify the docker service.

# run as root
$ vim /lib/systemd/system/docker.service
# add this line
EnvironmentFile=/run/docker_opts.env
# modify this line
ExecStart=/usr/bin/dockerd $DOCKER_OPTS -H fd:// ...

# restart docker
$ systemctl daemon-reload
$ systemctl restart docker

Verify the pod ip.

# alias k8s="kubectl --namespace=default"
$ k8s get pods
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-8c47d4b5d-2bbbx 1/1 Running 0 43s
kubernetes-dashboard-59fccbc7d7-9wmn9 1/1 Running 0 43s
$ k8s describe pod kubernetes-dashboard-59fccbc7d7-9wmn9
Name: kubernetes-dashboard-59fccbc7d7-9wmn9
Namespace: default
Priority: 0
Node: k8s-worker-1/10.1.0.123
Start Time: Sat, 11 Jun 2022 07:03:10 +0000
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=59fccbc7d7
Annotations: seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP: 10.244.1.3
...

Note that the steps above must be run on every node; the docker_opts.env file cannot be shared.

Cannot Log In to the Dashboard

The simplest approach: set the service type to NodePort and open the dashboard in Firefox.

Opening the login page through the proxy, sign-in is unavailable with the following message.

Insecure access detected. Sign in will not be available. Access Dashboard securely over HTTPS or using localhost. Read more here .

Even with the following arguments added.

--auto-generate-certificates
--namespace=default
--insecure-bind-address=0.0.0.0
--insecure-port=8080
--enable-insecure-login

Per the discussion here, --enable-insecure-login only applies to http logins at 127.0.0.1 or localhost; it does not help http access through kubectl proxy.

Port Forwarding

$ kubectl port-forward -n default service/kubernetes-dashboard 18082:443 --address='0.0.0.0' 

Solutions

The dashboard's ssl handling is a bit of a trap; the following points can be confirmed.

  • HTTP access from any address other than localhost / 127.0.0.1 is not allowed, and no configuration works around this, so

    • the kubectl proxy approach is only usable when k8s runs on the local machine
    • HTTP access is essentially off the table
  • anything other than localhost / 127.0.0.1 can only be reached over HTTPS

So, for HTTPS there are two directions.

  • Buy a CA-signed certificate, use a domain name, and configure DNS resolution
  • Run a self-signed site

Buying a CA certificate needs no explanation; here are a few workable options.

  • Port forwarding
    • Firefox can access it
    • Chrome cannot view the certificate; Safari, Chrome, and Opera cannot access it
  • Run nginx with a self-signed certificate
    • Safari and Firefox can ignore the risk and proceed
    • Chrome requires downloading the certificate and trusting it first

Set Up nginx with a Self-Signed Certificate

For the setup itself see here; change the service type to NodePort.

kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: default
spec:
  ports:
    - port: 443
      targetPort: 8443
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard

Add the nginx configuration.

upstream k8s-dashboard {
    server 10.244.1.2:8443;
}

server {
    # auth_basic "Auth Message";
    # auth_basic_user_file /etc/apache2/.htpasswd;
    client_max_body_size 100m;
    listen 8443 ssl default_server;
    listen [::]:8443 ssl default_server;
    server_name 192.168.6.103;
    include snippets/self-signed.conf;

    location / {
        proxy_pass https://k8s-dashboard;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # enable websocket
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
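
Validate and reload nginx after adding the config:

$ sudo nginx -t
$ sudo systemctl reload nginx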

Then the dashboard can be opened in Chrome.

# open in the browser
https://192.168.6.103:8443/#/login