How to Deploy a Standby/Temporary Cluster Based on Percona Operator for PostgreSQL
Having a standby cluster ensures maximum data availability and provides a disaster recovery solution. In this blog post, we will cover how to set up a standby cluster using streaming replication, and also how to create a temporary/standby cluster that uses a remote pgBackRest repository. The source and target clusters can be deployed in different namespaces, regions, or data centers, with no dependency on each other.
Let's dive into each of these procedures.
Building a standby cluster using streaming replication
- Below is the main/primary cluster, which is already set up and running:
kubectl get pods -n postgres-operator
NAME READY STATUS RESTARTS AGE
cluster1-backup-wffk-9lbcf 0/1 Completed 0 2d22h
cluster1-instance1-wltm-0 4/4 Running 1 (6h39m ago) 22h
cluster1-pgbouncer-556659fb94-szvjt 2/2 Running 0 3d21h
cluster1-repo-host-0 2/2 Running 0 2d22h
percona-postgresql-operator-6746bff4c7-729z5 1/1 Running 3 (11h ago) 3d21h
In order for the standby cluster to connect to the primary one, we need to expose the service in the below section of the [cr.yaml] file:
  image: docker.io/percona/percona-postgresql-operator:2.7.0-ppg17.5.2-postgres
  imagePullPolicy: Always
  postgresVersion: 17
#  port: 5432
  expose:
#    annotations:
#      my-annotation: value1
#    labels:
#      my-label: value2
    type: ClusterIP
Apply the change:
kubectl apply -f cr.yaml -n postgres-operator
Then check the services:
kubectl get services -n postgres-operator
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cluster1-ha ClusterIP 10.43.101.40 <none> 5432/TCP 2d15h
cluster1-ha-config ClusterIP None <none> <none> 2d15h
cluster1-pgbouncer ClusterIP 10.43.149.182 <none> 5432/TCP 2d15h
cluster1-pods ClusterIP None <none> <none> 2d15h
cluster1-primary ClusterIP None <none> 5432/TCP 2d15h
cluster1-replicas ClusterIP 10.43.85.169 <none> 5432/TCP 2d15h
The exact endpoint details below will be used later in the standby cluster configuration (note that a ClusterIP service is reachable only from within the same Kubernetes cluster; a standby running in a different cluster or data center would need the primary exposed as LoadBalancer or NodePort instead):
<service-name>.<namespace>.svc.cluster.local
For example:
cluster1-ha.postgres-operator.svc.cluster.local
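Before moving on, it can be worth a quick sanity check that this endpoint answers from the standby side. A minimal sketch, assuming the standby instance pod shown later in this post already exists and that pg_isready is on the PATH of its database container:
# Confirm the exposed primary service resolves and accepts connections on 5432.
kubectl exec -it cluster1-instance1-ft6m-0 -n postgres-operator2 -c database -- \
  pg_isready -h cluster1-ha.postgres-operator.svc.cluster.local -p 5432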
- Next, we need to make sure all the certificates are copied from the primary cluster and that the same certificates are deployed on the standby cluster, which is set up under a different namespace [postgres-operator2]:
kubectl get secret cluster1-cluster-ca-cert -n postgres-operator -o yaml > backup-cluster1-cluster-ca-cert.yaml
kubectl get secret cluster1-cluster-cert -n postgres-operator -o yaml > backup-cluster1-cluster-cert.yaml
kubectl get secret cluster1-replication-cert -n postgres-operator -o yaml > backup-cluster1-replication-cert.yaml
Take a backup of the old certificates on the newly set up/standby cluster and then remove them (if required):
kubectl get secret cluster1-cluster-ca-cert -n postgres-operator2 -o yaml > backup-cluster1-cluster-ca-cert.yaml
kubectl get secret cluster1-cluster-cert -n postgres-operator2 -o yaml > backup-cluster1-cluster-cert.yaml
kubectl get secret cluster1-replication-cert -n postgres-operator2 -o yaml > backup-cluster1-replication-cert.yaml
Then delete the old secrets:
kubectl delete secret cluster1-cluster-ca-cert -n postgres-operator2
kubectl delete secret cluster1-cluster-cert -n postgres-operator2
kubectl delete secret cluster1-replication-cert -n postgres-operator2
Before applying the new secrets, make sure the namespace inside the exported manifests is changed to the one of the new cluster [postgres-operator2]; one way to do that is sketched below.
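A minimal sketch using sed, assuming the manifests were exported with the file names used above and that the namespace field appears exactly as exported; adjust as needed:
# Rewrite the namespace field in the exported secret manifests (GNU sed shown).
sed -i 's/namespace: postgres-operator$/namespace: postgres-operator2/' \
  backup-cluster1-cluster-ca-cert.yaml \
  backup-cluster1-cluster-cert.yaml \
  backup-cluster1-replication-cert.yaml
Then apply the updated manifests in the standby namespace: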
kubectl apply -f backup-cluster1-cluster-ca-cert.yaml -n postgres-operator2
kubectl apply -f backup-cluster1-cluster-cert.yaml -n postgres-operator2
kubectl apply -f backup-cluster1-replication-cert.yaml -n postgres-operator2
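To double-check that the re-applied secrets carry exactly the same certificate material in both namespaces, the data of each secret can be compared. A small sketch, assuming jq is available on the workstation (repeat for the other two secrets):
# The two checksums should be identical if the certificate data matches.
for ns in postgres-operator postgres-operator2; do
  kubectl get secret cluster1-cluster-cert -n "$ns" -o json | jq -S .data | sha256sum
done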
- If we change the certificate names to anything different, the corresponding changes need to be made in the standby [cr.yaml] file and re-applied there:
  secrets:
#    customRootCATLSSecret:
#      name: cluster1-ca-cert
#      items:
#        - key: "tls.crt"
#          path: "root.crt"
#        - key: "tls.key"
#          path: "root.key"
    customTLSSecret:
      name: cluster1-cert
    customReplicationTLSSecret:
      name: replication1-cert
Additionally, we need to enable the standby option and add the primary endpoint details in the standby [cr.yaml]:
  standby:
    enabled: true
    host: cluster1-ha.postgres-operator.svc.cluster.local
- Finally, we can deploy the modified changes:
kubectl apply -f deploy/cr.yaml -n postgres-operator2
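The rollout can then be followed until the standby instance reports ready; the overall state is also visible on the PerconaPGCluster resource itself. A quick sketch:
# Watch the standby pods come up in the new namespace.
kubectl get pods -n postgres-operator2 -w
# Check the cluster state reported by the operator.
kubectl get perconapgclusters -n postgres-operator2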
If the changes are not reflected, make sure to delete the pod and the associated PVC:
kubectl delete pvc cluster1-instance1-ft6m-pgdata -n postgres-operator2
kubectl delete pod cluster1-instance1-ft6m-0 -n postgres-operator2
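Before verifying, some sample data can be created on the primary so there is something to compare. A hedged sketch; the hello database and h1 table simply mirror what appears in the outputs below:
# Create a test database and table on the primary instance.
kubectl exec -it cluster1-instance1-wltm-0 -n postgres-operator -c database -- \
  psql -c "CREATE DATABASE hello;"
kubectl exec -it cluster1-instance1-wltm-0 -n postgres-operator -c database -- \
  psql -d hello -c "CREATE TABLE h1 (id int); INSERT INTO h1 VALUES (1);"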
- Verify the changes on the standby cluster
Primary/leader cluster:
kubectl exec -it cluster1-instance1-wltm-0 -n postgres-operator -- sh
hello=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | h1   | table | postgres
(1 row)
Standby cluster:
kubectl exec -it cluster1-instance1-ft6m-0 -n postgres-operator2 -- sh
hello=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | h1   | table | postgres
(1 row)
Patroni also reports the instance as a standby leader:
sh-5.1$ patronictl list
+ Cluster: cluster1-ha (7569663519331602522) -------------------------+----------------+---------------------+----+-----------+
| Member                    | Host                                    | Role           | State               | TL | Lag in MB |
+---------------------------+-----------------------------------------+----------------+---------------------+----+-----------+
| cluster1-instance1-ft6m-0 | cluster1-instance1-ft6m-0.cluster1-pods | Standby Leader | in archive recovery |  6 |           |
+---------------------------+-----------------------------------------+----------------+---------------------+----+-----------+
sh-5.1$
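Beyond patronictl, the recovery state can also be checked directly in PostgreSQL. A small sketch, assuming the database container is named database, as in a default operator v2 deployment; note that pg_stat_wal_receiver is only populated while WAL is actually being streamed and stays empty while the node replays from archive only:
# pg_is_in_recovery() should return 't' on the standby leader.
kubectl exec -it cluster1-instance1-ft6m-0 -n postgres-operator2 -c database -- \
  psql -c "SELECT pg_is_in_recovery();" \
       -c "SELECT status, sender_host, sender_port FROM pg_stat_wal_receiver;"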
Building a standby/temporary cluster using a pgBackRest repository
- Consider the below standby cluster:
kubectl get pods -n postgres-operator2
NAME READY STATUS RESTARTS AGE
cluster1-instance1-ft6m-0 4/4 Running 0 36h
cluster1-pgbouncer-556659fb94-qk2ng 2/2 Running 0 2d15h
cluster1-repo-host-0 2/2 Running 0 2d15h
percona-postgresql-operator-6746bff4c7-w7l9h 1/1 Running 0 3d11h
- Next, we need to set up our bucket/S3 credentials in a Secret file:
cat <<EOF | base64 -b 0
[global]
repo1-s3-key=minioadmin
repo1-s3-key-secret=minioadmin
EOF
(The -b 0 option disables line wrapping in the macOS base64; on GNU/Linux, base64 -w 0 is the equivalent.)
Output:
W2dsb2JhbF0KcmVwbzEtczMta2V5PW1pbmlvYWRtaW4KcmVwbzEtczMta2V5LXNlY3JldD1taW5pb2FkbWluCg==
Place the encoded value into a Secret manifest [cluster1-pgbackrest-secrets.yaml]:
apiVersion: v1
kind: Secret
metadata:
  name: cluster1-pgbackrest-secrets
type: Opaque
data:
  s3.conf: W2dsb2JhbF0KcmVwbzEtczMta2V5PW1pbmlvYWRtaW4KcmVwbzEtczMta2V5LXNlY3JldD1taW5pb2FkbWluCg==
Apply the Secret:
kubectl apply -f deploy/cluster1-pgbackrest-secrets.yaml -n postgres-operator2
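As an alternative to hand-encoding the credentials, the same Secret could be created directly from a local configuration file; this sketch is equivalent to the manifest above, so use only one of the two approaches for a given secret name:
# Write the pgBackRest S3 credentials to a local file...
cat > s3.conf <<'EOF'
[global]
repo1-s3-key=minioadmin
repo1-s3-key-secret=minioadmin
EOF
# ...and let kubectl handle the base64 encoding.
kubectl -n postgres-operator2 create secret generic cluster1-pgbackrest-secrets \
  --from-file=s3.conf=s3.conf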
Note - For configuring other storage types (GCS, Azure Blob Storage, etc.), please refer to the manual: https://docs.percona.com/percona-operator-for-postgresql/2.0/backups-storage.html#__tabbed_1_3
- Once the Secret file is deployed, we need to add the remote bucket/endpoint details, along with the above secret [cluster1-pgbackrest-secrets], to the pgBackRest backup section of the [cr.yaml] file. The backups stored in the remote S3 repository are initiated by the main/primary cluster, which runs a similar pgBackRest configuration (see the on-demand backup sketch right after the block below):
  backups:
#    trackLatestRestorableTime: true
    pgbackrest:
#      metadata:
#        labels:
      image: docker.io/percona/percona-pgbackrest:2.55.0
#      initContainer:
#        image: docker.io/percona/percona-postgresql-operator:2.7.0
#        resources:
#          limits:
#            cpu: 2.0
#            memory: 4Gi
#          requests:
#            cpu: 1.0
#            memory: 3Gi
#        containerSecurityContext:
#          runAsUser: 1001
#          runAsGroup: 1001
#          runAsNonRoot: true
#          privileged: false
#          allowPrivilegeEscalation: false
#          readOnlyRootFilesystem: true
#          capabilities:
#            add:
#              - NET_ADMIN
#              - SYS_TIME
#            drop:
#              - ALL
#          seccompProfile:
#            type: Localhost
#            localhostProfile: localhost/profile.json
#          procMount: Default
#          seLinuxOptions:
#            type: spc_t
#            level: s0:c123,c456
#      containers:
#        pgbackrest:
#          resources:
#            limits:
#              cpu: 200m
#              memory: 128Mi
#            requests:
#              cpu: 150m
#              memory: 120Mi
#        pgbackrestConfig:
#          resources:
#            limits:
#              cpu: 200m
#              memory: 128Mi
#            requests:
#              cpu: 150m
#              memory: 120Mi
#
      configuration:
        - secret:
            name: cluster1-pgbackrest-secrets
#      jobs:
#        restartPolicy: OnFailure
#        backoffLimit: 2
#        priorityClassName: high-priority
#        ttlSecondsAfterFinished: 60
#        resources:
#          limits:
#            cpu: 200m
#            memory: 128Mi
#          requests:
#            cpu: 150m
#            memory: 120Mi
#        tolerations:
#          - effect: NoSchedule
#            key: role
#            operator: Equal
#            value: connection-poolers
#
#        securityContext:
#          fsGroup: 1001
#          runAsUser: 1001
#          runAsNonRoot: true
#          fsGroupChangePolicy: "OnRootMismatch"
#          runAsGroup: 1001
#          seLinuxOptions:
#            type: spc_t
#            level: s0:c123,c456
#          seccompProfile:
#            type: Localhost
#            localhostProfile: localhost/profile.json
#          supplementalGroups:
#            - 1001
#          sysctls:
#            - name: net.ipv4.tcp_keepalive_time
#              value: "600"
#            - name: net.ipv4.tcp_keepalive_intvl
#              value: "60"
#
      global:
#        repo1-retention-full: "14"
#        repo1-retention-full-type: time
        repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
#        repo1-cipher-type: aes-256-cbc
        repo1-s3-uri-style: path
        repo1-s3-verify-tls: 'n'
#        repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
#        repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
#        repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4
      repoHost:
#        resources:
#          limits:
#            cpu: 200m
#            memory: 128Mi
#          requests:
#            cpu: 150m
#            memory: 120Mi
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 1
                podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      postgres-operator.crunchydata.com/data: pgbackrest
                  topologyKey: kubernetes.io/hostname
#        tolerations:
#          - effect: NoSchedule
#            key: role
#            operator: Equal
#            value: connection-poolers
#        priorityClassName: high-priority
#
#        topologySpreadConstraints:
#          - maxSkew: 1
#            topologyKey: my-node-label
#            whenUnsatisfiable: ScheduleAnyway
#            labelSelector:
#              matchLabels:
#                postgres-operator.crunchydata.com/pgbackrest: ""
#
#        securityContext:
#          fsGroup: 1001
#          runAsUser: 1001
#          runAsNonRoot: true
#          fsGroupChangePolicy: "OnRootMismatch"
#          runAsGroup: 1001
#          seLinuxOptions:
#            type: spc_t
#            level: s0:c123,c456
#          seccompProfile:
#            type: Localhost
#            localhostProfile: localhost/profile.json
#          supplementalGroups:
#            - 1001
#          sysctls:
#            - name: net.ipv4.tcp_keepalive_time
#              value: "600"
#            - name: net.ipv4.tcp_keepalive_intvl
#              value: "60"
#
      manual:
        repoName: repo1
        options:
          - --type=full
#        initialDelaySeconds: 120
      repos:
#        - name: repo1
#          schedules:
#            full: "0 0 * * 6"
#            differential: "0 1 * * 1-6"
#            incremental: "0 1 * * 1-6"
#          volume:
#            volumeClaimSpec:
#              storageClassName: standard
#              accessModes:
#                - ReadWriteOnce
#              resources:
#                requests:
#                  storage: 1Gi
        - name: repo1
          s3:
            bucket: "ajtest"
            endpoint: "https://host.k3d.internal:9000"
            region: "us-east-1"
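On the primary side, a matching pgBackRest configuration points at the same bucket, and that is where the backups in the shared repo come from. If a fresh backup set is needed for the standby to pick up, an on-demand full backup can be triggered there through a PerconaPGBackup resource; a hedged sketch, with an arbitrary resource name:
# Request an on-demand full backup of the primary cluster into repo1.
cat <<EOF | kubectl apply -n postgres-operator -f -
apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
  name: cluster1-ondemand-full
spec:
  pgCluster: cluster1
  repoName: repo1
  options:
    - --type=full
EOF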
- Also, enable standby in the [cr.yaml] file and mention the target repo name:
  standby:
    enabled: true
    repoName: repo1
Finally, we can apply the modifications:
kubectl apply -f deploy/cr.yaml -n postgres-operator2
- Verify the data synchronization
The existing pgBackRest backups are now listed on the standby side as well:
kubectl exec -it cluster1-repo-host-0 -n postgres-operator -- sh
sh-5.1$ pgbackrest info
stanza: db
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 00000002000000000000000B/000000060000000000000022

        full backup: 20251107-164421F
            timestamp start/stop: 2025-11-07 16:44:21+00 / 2025-11-07 16:44:24+00
            wal start/stop: 00000002000000000000000C / 00000002000000000000000C
            database size: 30.7MB, database backup size: 30.7MB
            repo1: backup set size: 4MB, backup size: 4MB

        full backup: 20251107-165613F
            timestamp start/stop: 2025-11-07 16:56:13+00 / 2025-11-07 16:56:17+00
            wal start/stop: 000000020000000000000013 / 000000020000000000000013
            database size: 38.3MB, database backup size: 38.3MB
            repo1: backup set size: 5MB, backup size: 5MB

        full backup: 20251111-070032F
            timestamp start/stop: 2025-11-11 07:00:32+00 / 2025-11-11 07:00:35+00
            wal start/stop: 000000060000000000000025 / 000000060000000000000026
            database size: 38.8MB, database backup size: 38.8MB
            repo1: backup set size: 5.1MB, backup size: 5.1MB
Also, if we access the standby database, the synced data is reflected there:
kubectl exec -it cluster1-instance1-ft6m-0 -n postgres-operator2 -- sh
sh-5.1$ psql
psql (17.5 - Percona Server for PostgreSQL 17.5.2)
Type "help" for help.
postgres=# \c hello
You are now connected to database "hello" as user "postgres".
hello=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | h1   | table | postgres
(1 row)
If the changes are not reflected, try deleting the old pod/PVC:
kubectl delete pod <pod_name> -n <namespace>
kubectl delete pvc <pvc_name> -n <namespace>
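If the goal is a one-time clone rather than a continuously synced standby, the cluster can later be promoted to a standalone read-write cluster by turning standby mode off, either by setting standby.enabled to false in cr.yaml and re-applying it, or with a patch along these lines (a sketch):
# Promote the standby: disable standby mode on the PerconaPGCluster resource.
kubectl patch perconapgcluster cluster1 -n postgres-operator2 --type merge \
  -p '{"spec":{"standby":{"enabled":false}}}'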
Summary
The procedures we discussed above basically outline a couple of ways of deploying a new standalone/standby cluster from a source primary cluster in a Kubernetes/Percona Operator based environment. This also gives the flexibility either to serve a continuous data flow or to just build a one-time cluster with an exact data set.