Kubernetes v1.24 Installation and Deployment: Preparing the Base Environment

Installing Kubernetes (k8s) v1.24.0 from Binaries

Environment Preparation

Hostname               Role         IP           Software installed
k8s-master.boysec.cn   master node  10.1.1.100   etcd, kubelet, kube-proxy, kube-apiserver, kube-controller-manager, kube-scheduler, containerd
k8s-node01.boysec.cn   worker node  10.1.1.120   etcd, kubelet, kube-proxy, containerd
k8s-node02.boysec.cn   worker node  10.1.1.130   etcd, kubelet, kube-proxy, containerd

  • 3 VMs, each with at least 2 GB of RAM.
  • OS: CentOS 7.9
  • containerd: v1.6.4
  • Kubernetes: v1.24
  • etcd: v3.3.22
  • flannel: v0.12.0
  • Certificate signing tool CFSSL: v1.6.0

This guide deploys a single master node. If you need multiple masters, see "Compile and Install Kubernetes Step by Step: Installing the master compute nodes".

Install CFSSL

CFSSL download links

wget -O /usr/local/bin/cfssl https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssl_1.6.0_linux_amd64
wget -O /usr/local/bin/cfssljson https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssljson_1.6.0_linux_amd64

chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson

Create the CA certificate JSON configuration files

mkdir /opt/certs/ -p
cd /opt/certs/
cat > /opt/certs/ca-csr.json << EOF
{
    "CN": "kubernetes-ca",
    "hosts": [
    ],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "ST": "beijing",
            "L": "beijing",
            "O": "system:masters",
            "OU": "kubernetes"
        }
    ],
    "ca": {
        "expiry": "876000h"
    }
}
EOF

## Generate the CA certificate and private key (ca.pem, ca-key.pem)
cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
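To confirm the CA was generated correctly, the certificate can be inspected before it is used to sign anything. A small helper, assuming openssl is installed (it is not part of the original steps):

```shell
# Print the subject, issuer and validity window of a CA certificate.
# For the CA above, the subject should show CN = kubernetes-ca and the
# expiry should be roughly 100 years out (876000h).
inspect_ca() {
    openssl x509 -in "${1:-/opt/certs/ca.pem}" -noout -subject -issuer -dates
}
```

On the signing host, run `inspect_ca /opt/certs/ca.pem` and check the CN before proceeding.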
cat > /opt/certs/ca-config.json << EOF
{
    "signing": {
        "default": {
            "expiry": "876000h"
        },
        "profiles": {
            "kubernetes": {
                "expiry": "876000h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            },
            "etcd": {
                "expiry": "876000h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            }
        }
    }
}
EOF
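Before feeding these files to cfssl, it is worth checking that they parse as valid JSON, since a stray comma produces cryptic cfssl errors. A minimal sketch, assuming python3 is available on the host:

```shell
# Validate that a file parses as JSON; prints "OK: <file>" on success,
# otherwise python prints the parse error.
check_json() {
    python3 -m json.tool "$1" > /dev/null && echo "OK: $1"
}
```

Run `check_json /opt/certs/ca-csr.json` and `check_json /opt/certs/ca-config.json` after editing either file.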

Installing the Basic k8s Components

Install Containerd

Install Containerd as the runtime on all k8s nodes.

cd /server/tools/
wget https://github.com/containerd/containerd/releases/download/v1.6.4/cri-containerd-cni-1.6.4-linux-amd64.tar.gz
mkdir /opt/containerd-1.6.4
tar xf cri-containerd-cni-1.6.4-linux-amd64.tar.gz -C /opt/containerd-1.6.4
cd /opt/containerd-1.6.4/
ln -s /opt/containerd-1.6.4/usr/local/bin/* /usr/local/bin/
## Install the systemd service unit
cp /opt/containerd-1.6.4/etc/systemd/system/containerd.service /usr/lib/systemd/system/

Configure the kernel modules required by Containerd

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
## Load the modules
systemctl restart systemd-modules-load.service

Configure the kernel parameters required by Containerd

cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

# Apply the kernel parameters
sysctl --system

Configure runc support

containerd is designed to be embedded into a larger system rather than used directly by developers or end users. With containerd and runc as the standardized container foundation, higher-level applications can be built directly on top of them; the goal here is a minimal container system built on containerd and runc, with the containers carrying only essential components such as a shell. However, the runc binary bundled in the containerd release archive fails with "undefined symbol: seccomp_notify_respond", so a working runc must be downloaded and installed separately:

wget -O /usr/local/sbin/runc https://github.com/opencontainers/runc/releases/download/v1.1.2/runc.amd64
chmod +x /usr/local/sbin/runc

Create the Containerd configuration file

mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml

# 1. Edit the Containerd config file (use either method 1 or 2)
sed -i "s#SystemdCgroup\ \=\ false#SystemdCgroup\ \=\ true#g" /etc/containerd/config.toml

grep SystemdCgroup /etc/containerd/config.toml

# 2. Or find containerd.runtimes.runc.options and add SystemdCgroup = true under it:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".cni]
# 3. Add an Alibaba Cloud registry mirror
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
        endpoint = ["https://l2v84zex.mirror.aliyuncs.com"] # set your own mirror URL here
# 4. Change sandbox_image to an address matching your version
sandbox_image = "kubernetes/pause" # the default registry may be unreachable from China
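The sed edit in method 1 can be rehearsed against a scratch file first, to confirm the escaped pattern actually matches before touching the real config (the scratch path below is arbitrary):

```shell
# Dry-run the SystemdCgroup substitution on a throwaway file that
# mimics the relevant line of the default config.toml.
printf '            SystemdCgroup = false\n' > /tmp/config.toml.test
sed -i "s#SystemdCgroup\ \=\ false#SystemdCgroup\ \=\ true#g" /tmp/config.toml.test
grep -q "SystemdCgroup = true" /tmp/config.toml.test && echo "sed pattern OK"
```

If this prints "sed pattern OK", the same sed is safe to run on /etc/containerd/config.toml.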

Configure the CNI network

mkdir -p /etc/cni/net.d
cat > /etc/cni/net.d/10-flannel.conflist <<EOF
{
    "name": "flannel",
    "cniVersion": "0.3.1",
    "plugins": [
        {
            "type": "flannel",
            "delegate": {
                "isDefaultGateway": true
            }
        },
        {
            "type": "portmap",
            "capabilities": {
                "portMappings": true
            }
        }
    ]
}
EOF
mkdir /opt/cni/bin -p
ln -s /opt/containerd-1.6.4/opt/cni/bin/* /opt/cni/bin/

Start Containerd and enable it at boot

systemctl daemon-reload
systemctl enable --now containerd

Point the crictl client at the container runtime socket

cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

# Test
systemctl restart containerd
crictl info

Installing the etcd Cluster

Create the certificates

cat > /opt/certs/etcd-csr.json << EOF
{
    "CN": "etcd-peer",
    "hosts": [
        "10.1.1.100",
        "10.1.1.110",
        "10.1.1.120",
        "10.1.1.130"
    ],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "ST": "beijing",
            "L": "beijing",
            "O": "etcd",
            "OU": "Etcd Security"
        }
    ]
}
EOF

## Generate the certificates
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd etcd-csr.json | cfssljson -bare etcd
# or, equivalently:
cfssl gencert \
    -ca=ca.pem \
    -ca-key=ca-key.pem \
    -config=ca-config.json \
    -hostname=10.1.1.100,10.1.1.110,10.1.1.120,10.1.1.130 \
    -profile=etcd \
    etcd-csr.json | cfssljson -bare etcd
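After signing, confirm that every etcd member IP actually made it into the certificate's Subject Alternative Name list, since a missing SAN makes peers reject TLS connections. A sketch, assuming openssl is installed:

```shell
# Print the SAN entries embedded in a certificate; all four IPs from
# etcd-csr.json should appear for etcd.pem.
show_sans() {
    openssl x509 -in "$1" -noout -text | grep -A1 "Subject Alternative Name"
}
```

Run `show_sans /opt/certs/etcd.pem` and verify 10.1.1.100/110/120/130 are all listed.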

Install etcd

etcd download link

### Create the etcd user
useradd -s /sbin/nologin -M etcd

## Unpack
cd /server/tools
tar xf etcd-v3.3.22-linux-amd64.tar.gz -C /opt
ln -s /opt/etcd-v3.3.22-linux-amd64 /opt/etcd

### Create directories and copy the certificates
mkdir -p /opt/etcd/{ssl,cfg}

### Copy the ca.pem, etcd-key.pem and etcd.pem generated on the ops host into /opt/etcd/ssl; note the private key must be mode 600
chown etcd.etcd /opt/etcd/ssl/*
chmod 600 /opt/etcd/ssl/etcd-key.pem
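Copying the three files to every node can be scripted. A sketch, assuming root SSH access and using the hostnames from the table at the top (adjust users, hostnames and the target directory to your environment):

```shell
# Push ca.pem, etcd.pem and etcd-key.pem from the signing host to each
# etcd node's cert directory (defaults to /opt/etcd/ssl).
push_etcd_certs() {
    local dest=${1:-/opt/etcd/ssl}
    local node
    for node in k8s-master.boysec.cn k8s-node01.boysec.cn k8s-node02.boysec.cn; do
        scp /opt/certs/ca.pem /opt/certs/etcd.pem /opt/certs/etcd-key.pem \
            "root@${node}:${dest}/"
    done
}
```

Remember to re-run the chown/chmod commands above on each node after copying.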
## On k8s-master (10.1.1.100), etcd-1:
cat > /opt/etcd/cfg/etcd.config.yml << EOF
name: 'etcd-1'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://10.1.1.100:2380'
listen-client-urls: 'https://10.1.1.100:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://10.1.1.100:2380'
advertise-client-urls: 'https://10.1.1.100:2379,http://127.0.0.1:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcd-1=https://10.1.1.100:2380,etcd-2=https://10.1.1.120:2380,etcd-3=https://10.1.1.130:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
## On k8s-node01 (10.1.1.120), etcd-2:
cat > /opt/etcd/cfg/etcd.config.yml << EOF
name: 'etcd-2'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://10.1.1.120:2380'
listen-client-urls: 'https://10.1.1.120:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://10.1.1.120:2380'
advertise-client-urls: 'https://10.1.1.120:2379,http://127.0.0.1:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcd-1=https://10.1.1.100:2380,etcd-2=https://10.1.1.120:2380,etcd-3=https://10.1.1.130:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
## On k8s-node02 (10.1.1.130), etcd-3:
cat > /opt/etcd/cfg/etcd.config.yml << EOF
name: 'etcd-3'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://10.1.1.130:2380'
listen-client-urls: 'https://10.1.1.130:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://10.1.1.130:2380'
advertise-client-urls: 'https://10.1.1.130:2379,http://127.0.0.1:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcd-1=https://10.1.1.100:2380,etcd-2=https://10.1.1.120:2380,etcd-3=https://10.1.1.130:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
cat > /usr/lib/systemd/system/etcd.service << EOF
[Unit]
Description=Etcd Service
Documentation=https://coreos.com/etcd/docs/latest/
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/opt/etcd/etcd --config-file=/opt/etcd/cfg/etcd.config.yml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Alias=etcd3.service
EOF

Start etcd

systemctl daemon-reload
systemctl enable --now etcd

Check the etcd cluster status

export ETCDCTL_API=3
/opt/etcd/etcdctl --endpoints="10.1.1.100:2379,10.1.1.120:2379,10.1.1.130:2379" --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem endpoint status --write-out=table
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| 10.1.1.100:2379 | 4988e076821369e3 | 3.3.22  |   20 kB | true      |        86 |          9 |
| 10.1.1.120:2379 | 2612ebaf51b393a5 | 3.3.22  |   20 kB | false     |        86 |          9 |
| 10.1.1.130:2379 | 8de0ef816eba4013 | 3.3.22  |   20 kB | false     |        86 |          9 |
+-----------------+------------------+---------+---------+-----------+-----------+------------+

Common etcd Errors

Background:
A 3-node etcd cluster was deployed, and one day all three machines lost power and went down at once. After rebooting, the K8s cluster worked normally, but a check of the components showed that etcd on one node would not start.
Investigation revealed the system time was wrong. After correcting it with ntpdate ntp.aliyun.com and restarting etcd, the service still failed to start, with the following log:

Jun 26 05:38:12 moban etcd: listening for peers on https://10.1.1.120:2380
Jun 26 05:38:12 moban etcd: ignoring client auto TLS since certs given
Jun 26 05:38:12 moban etcd: pprof is enabled under /debug/pprof
Jun 26 05:38:12 moban etcd: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
Jun 26 05:38:12 moban etcd: The scheme of client url http://127.0.0.1:2379 is HTTP while client cert auth (--client-cert-auth) is enabled. Ignored client cert auth for this url.

Solution:
The logs show no obvious error. In practice, losing a single etcd node has little impact: the cluster was already usable again, but the broken node still refused to start. To recover it:
Back up the existing data in the etcd data directory:
cd /var/lib/etcd/member/
cp -r * /data/bak/
Delete all the data files in that directory:
rm -rf /var/lib/etcd/member/*
Then stop the other two etcd nodes and start all three together, since a member rejoining with an empty data directory needs the cluster members started at the same time; once they come up, the cluster is back in service.
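The backup step above can be wrapped in a small helper that snapshots the member directory into a timestamped folder before anything is deleted (the paths mirror the steps above; this helper itself is a sketch, not part of the original procedure):

```shell
# Recursively copy the etcd member directory into a timestamped backup
# folder and print the backup path, so nothing is lost before the wipe.
backup_etcd_member() {
    local src=${1:-/var/lib/etcd/member}
    local dst="${2:-/data/bak}/etcd-member-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dst" && cp -a "$src"/. "$dst"/ && echo "$dst"
}
```

Run `backup_etcd_member` on the broken node, note the printed path, and only then remove the contents of /var/lib/etcd/member/.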