repmgr安装测试自动故障切换-postgresql12

时间:2021-08-21 20:02:49   收藏:0   阅读:23

安装

环境准备

两台PG12 (host1 host2)host1为原主库,host2为原备库。主从异步流复制。

下载最新repmgr包:repmgr-5.2.1.tar.gz (https://repmgr.org/

./configure
sudo make && sudo make install

配置

  1. 主库配置

    listen_addresses = ‘*‘
    shared_preload_libraries = ‘repmgr‘
    wal_log_hints = ‘on‘
    wal_level = replica
    hot_standby = on
    
    local repmgr repmgr trust
    host repmgr repmgr 127.0.0.1/24 trust
    host repmgr repmgr 192.168.73.140/24 trust
    host repmgr repmgr 192.168.73.141/24 trust
    host postgres repmgr 192.168.73.140/24 trust
    
    local replication repmgr trust
    host replication repmgr 127.0.0.1/24 trust
    host replication repmgr 192.168.73.140/24 trust
    host replication repmgr 192.168.73.141/24 trust
    

    [postgres@pg1 opt]$ pg_config --sysconfdir
    /opt/pgsql/etc

    sudo mkdir -pv /opt/pgsql/etc/
    
    touch   /opt/pgsql/etc/repmgr.conf
    cat >> /opt/pgsql/etc/repmgr.conf << EOF
    node_id=2
    node_name=‘pg1‘
    conninfo=‘host=192.168.73.140 port=5432 dbname=repmgr user=repmgr connect_timeout=2‘
    data_directory =‘/opt/pgsql/data‘
    pg_bindir=‘/opt/pgsql/bin‘
    config_directory=‘/opt/pgsql/data‘
    log_level=INFO
    log_facility=STDERR
    log_file=‘/opt/pgsql/etc/repmgr.log‘
    EOF
    
  2. 注册主节点

    repmgr primary register
    repmgr service status
    psql -h 192.168.73.140 -U repmgr -d repmgr -c ‘SELECT * FROM repmgr.nodes;‘
    
  3. 开启主库守护进程repmgrd

    repmgrd -d
    
  4. 从库配置

sudo mkdir -pv /opt/pgsql/etc/
touch   /opt/pgsql/etc/repmgr.conf
cat >> /opt/pgsql/etc/repmgr.conf << EOF
node_id=2
node_name=‘pg2‘
conninfo=‘host=192.168.73.141 port=5432 dbname=repmgr user=repmgr connect_timeout=2‘
data_directory =‘/opt/pgsql/data‘
pg_bindir=‘/opt/pgsql/bin‘
config_directory=‘/opt/pgsql/data‘
log_level=INFO
log_facility=STDERR
log_file=‘/opt/pgsql/etc/repmgr.log‘
EOF
  1. 克隆主库

    [postgres@pg2 etc]$ repmgr -h 192.168.73.140 -U repmgr -d repmgr -f /opt/pgsql/etc/repmgr.conf standby clone -F
    WARNING: following problems with command line parameters detected:
      "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
    NOTICE: destination directory "/opt/pgsql/data" provided
    INFO: connecting to source node
    DETAIL: connection string is: host=192.168.73.140 user=repmgr dbname=repmgr
    DETAIL: current installation size is 884 MB
    INFO: replication slot usage not requested;  no replication slot will be set up for this standby
    NOTICE: checking for available walsenders on the source node (2 required)
    NOTICE: checking replication connections can be made to the source server (2 required)
    WARNING: directory "/opt/pgsql/data" exists but is not empty
    NOTICE: -F/--force provided - deleting existing data directory "/opt/pgsql/data"
    NOTICE: starting backup (using pg_basebackup)...
    HINT: this may take some time; consider using the -c/--fast-checkpoint option
    INFO: executing:
      /opt/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /opt/pgsql/data -h 192.168.73.140 -p 5432 -U repmgr -X stream 
    NOTICE: standby clone (using pg_basebackup) complete
    NOTICE: you can now start your PostgreSQL server
    HINT: for example: pg_ctl -D /opt/pgsql/data start
    HINT: after starting the server, you need to register this standby with "repmgr standby register"
    

    6.启动备库并注册

    pg_ctl -D /opt/pgsql/data start
    
    repmgr standby register
    

    7.启动repmgr守护进程

    repmgrd -d

    8.查看从库repmgr状态

    repmgr service status
    
    [postgres@pg2 opt]$  repmgr service status
    ID | Name | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
    ----+------+---------+-----------+----------+---------+------+---------+--------------------
    1  | pg1  | primary | * running |          | running | 9603 | no      | n/a                
    2  | pg2  | standby |   running | pg1      | running | 2551 | no      | 0 second(s) ago 
    
    [postgres@pg2 opt]$ repmgr cluster show
    ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                        
    ----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------------------
    1  | pg1  | primary | * running |          | default  | 100      | 7        | host=192.168.73.140 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
    2  | pg2  | standby |   running | pg1      | default  | 100      | 7        | host=192.168.73.141 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
    
    

使用repmgr执行switch over切换

测试切换repmgr  standby switchover --siblings-follow --dry-run
正式进行主备切换 repmgr -f repmgr.conf standby switchover
主备切换后主库需要重新注册 repmgr -f repmgr.conf standby register --force
重新注册后必须重启备库

使用repmgr执行fail over切换

1.手动关闭主库模拟数据库异常: pg_ctl -m fast stop

2.备库强制提升为主库:
postgres@HWFBSB6:~> repmgr -f repmgr.conf standby promote

3.新主库查看主备库状态:repmgr -f repmgr.conf cluster show
WARNING: the following problems were found in the configuration file:
parameter “cluster” is deprecated and will be ignored
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----±--------±--------±----------±---------±---------±---------±---------±---------------------------------------
1 | HWFBSB5 | primary | - failed | | default | 100 | ? | host=HWFBSB5 user=repmgr dbname=repmgr
2 | HWFBSB6 | primary | * running | | default | 100 | 2 | host=HWFBSB6 user=repmgr dbname=repmgr

4.启动原主库:
postgres@HWFBSB5:~> pg_ctl -D /pgdata/pg113 -l /pgdata/pg113/server.log start
waiting for server to start… done
server started

5.原主库查看主备库状态:
postgres@HWFBSB5:~> repmgr -f repmgr.conf cluster show
WARNING: the following problems were found in the configuration file:
parameter “cluster” is deprecated and will be ignored
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----±--------±--------±---------------------±---------±---------±---------±---------±---------------------------------------
1 | HWFBSB5 | primary | * running | | default | 100 | 1 | host=HWFBSB5 user=repmgr dbname=repmgr
2 | HWFBSB6 | standby | ! running as primary | | default | 100 | 2 | host=HWFBSB6 user=repmgr dbname=repmgr

6.可以看到从主备库查看的状态都带有警告,这时如果要把剔除掉的原主库作为备库状态重新加入,需要repmgr node rejoin操作

? 首先关闭原主库,使用repmgr node rejoin重新添加原主库:

? 预命令

repmgr -f repmgr.conf node rejoin -d ‘host=197.0.1.105 user=repmgr dbname=repmgr’ --force-rewind --dry-run --verbose

正式执行repmgr node rejoin重新添加原主库命令:
repmgr -f repmgr.conf node rejoin -d ‘host=197.0.1.105 user=repmgr dbname=repmgr’ --force-rewind --verbose

一主一备利用repmgrd实现自动fail over

cat >> repmgr.conf << EOF
monitoring_history=true (启用监控参数)
monitor_interval_secs=5(定义监视数据间隔写入时间参数)
reconnect_attempts=10(故障转移之前,尝试重新连接主库次数(默认为6)参数)
reconnect_interval=5(每间隔5s尝试重新连接一次参数)
EOF

如下参数为自动故障转移所需配置的重要参数:

failover=automatic
promote_command=’/pgsql/pg113/bin/repmgr standby promote -f /home/postgres/repmgr.conf --log-to-file’
follow_command=’/pgsql/pg113/bin/repmgr standby follow -f /home/postgres/repmgr.conf --log-to-file --upstream-node-id=%n’

*附加参数:
replication_lag_critical(如果备用数据库的复制延迟(已秒为单位)超过此值,则将终止切换)

备:参数配置完成后需要重启主备库使其生效:

repmgr node service --action=restart

启动主备库的repmgrd:
repmgrd --pid-file /tmp/repmgrd.pid

练习环境自测

  1. 关闭主库

     repmgr cluster show
     ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                        
    ----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------------------
     1  | pg1  | standby |   running | pg2      | default  | 100      | 9        | host=192.168.73.140 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
     2  | pg2  | primary | * running |          | default  | 100      | 9        | host=192.168.73.141 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
    [postgres@pg2 data]$ 
    [postgres@pg2 data]$ pg_ctl stop
    
    
  2. 备库查看状态

    [postgres@pg1 data]$ repmgr cluster show
     ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                        
    ----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------------------
     1  | pg1  | primary | * running |          | default  | 100      | 10       | host=192.168.73.140 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
     2  | pg2  | primary | - failed  | ?        | default  | 100      |          | host=192.168.73.141 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
    
    WARNING: following issues were detected
      - unable to connect to node "pg2" (ID: 2)
    
    HINT: execute with --verbose option to see connection error messages
    
    
  3. 原主库修改timeline

    [postgres@pg2 data]$ cat postgresql.auto.conf 
    # Do not edit this file manually!
    # It will be overwritten by the ALTER SYSTEM command.
    archive_mode = ‘on‘
    archive_command = ‘cp -i %p /data/archive_wals/%f‘
    wal_keep_segments = ‘512‘
    primary_conninfo = ‘host=192.168.73.140 user=repmgr application_name=pg2 password=repmgr connect_timeout=2 port=5432‘
    recovery_target_timeline = ‘10‘
    wal_retrieve_retry_interval = ‘5000‘
    
  4. 拉起原主库,查看状态,切换成功

    [postgres@pg1 data]$ repmgr cluster show
     ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                        
    ----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------------------
     1  | pg1  | primary | * running |          | default  | 100      | 10       | host=192.168.73.140 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
     2  | pg2  | primary | ! running |          | default  | 100      | 9        | host=192.168.73.141 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
    
    WARNING: following issues were detected
      - node "pg2" (ID: 2) is running but the repmgr node record is inactive
    
    
    
    [postgres@pg2 data]$ repmgr cluster show
     ID | Name | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string                                                                        
    ----+------+---------+----------------------+----------+----------+----------+----------+-------------------------------------------------------------------------------------------
     1  | pg1  | standby | ! running as primary |          | default  | 100      | 10       | host=192.168.73.140 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
     2  | pg2  | primary | * running            |          | default  | 100      | 9        | host=192.168.73.141 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
    
    WARNING: following issues were detected
      - node "pg1" (ID: 1) is registered as standby but running as primary
    
    
    [postgres@pg2 data]$ pg_ctl start
    server started
    
    [postgres@pg2 data]$ repmgr cluster show
     ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                        
    ----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------------------
     1  | pg1  | primary | * running |          | default  | 100      | 10       | host=192.168.73.140 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
     2  | pg2  | standby |   running | pg1      | default  | 100      | 9        | host=192.168.73.141 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
    [postgres@pg2 data]$ repmgr standby follow
    NOTICE: attempting to find and follow current primary
    INFO: timelines are same, this server is not ahead
    DETAIL: local node lsn is 0/C9024B00, follow target lsn is 0/C9024B00
    NOTICE: setting node 2‘s upstream to node 1
    NOTICE: stopping server using "/opt/pgsql/bin/pg_ctl  -D ‘/opt/pgsql/data‘ -w -m fast stop"
    2021-08-21 02:28:29.339 EDT [23663] LOG:  received fast shutdown request
    2021-08-21 02:28:29.340 EDT [23663] LOG:  aborting any active transactions
    2021-08-21 02:28:29.340 EDT [23672] FATAL:  terminating walreceiver process due to administrator command
    2021-08-21 02:28:29.340 EDT [23676] FATAL:  terminating connection due to administrator command
    2021-08-21 02:28:29.340 EDT [23674] FATAL:  terminating connection due to administrator command
    2021-08-21 02:28:29.341 EDT [23669] LOG:  shutting down
    2021-08-21 02:28:29.353 EDT [23663] LOG:  database system is shut down
    NOTICE: starting server using "/opt/pgsql/bin/pg_ctl  -w -D ‘/opt/pgsql/data‘ start"
    NOTICE: STANDBY FOLLOW successful
    DETAIL: standby attached to upstream node "pg1" (ID: 1)
    
    

    ?

发现原主库时间线不对:

?

[postgres@pg2 data]$ repmgr standby follow
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/C9024B00, follow target lsn is 0/C9024B00
NOTICE: setting node 2‘s upstream to node 1
NOTICE: stopping server using "/opt/pgsql/bin/pg_ctl  -D ‘/opt/pgsql/data‘ -w -m fast stop"
2021-08-21 02:28:29.339 EDT [23663] LOG:  received fast shutdown request
2021-08-21 02:28:29.340 EDT [23663] LOG:  aborting any active transactions
2021-08-21 02:28:29.340 EDT [23672] FATAL:  terminating walreceiver process due to administrator command
2021-08-21 02:28:29.340 EDT [23676] FATAL:  terminating connection due to administrator command
2021-08-21 02:28:29.340 EDT [23674] FATAL:  terminating connection due to administrator command
2021-08-21 02:28:29.341 EDT [23669] LOG:  shutting down
2021-08-21 02:28:29.353 EDT [23663] LOG:  database system is shut down
NOTICE: starting server using "/opt/pgsql/bin/pg_ctl  -w -D ‘/opt/pgsql/data‘ start"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "pg1" (ID: 1)


[postgres@pg2 data]$ repmgr cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                        
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------------------
 1  | pg1  | primary | * running |          | default  | 100      | 10       | host=192.168.73.140 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
 2  | pg2  | standby |   running | pg1      | default  | 100      | 10       | host=192.168.73.141 port=5432 dbname=repmgr user=repmgr password=repmgr connect_timeout=2

发现正常了~~

repmgr相关命令

repmgr cluster show
repmgr node status
repmgr  cluster event
repmgr cluster crosscheck

repmgr  node service --list-actions
测试切换repmgr  standby switchover --siblings-follow --dry-run
正式进行主备切换 repmgr -f repmgr.conf standby switchover
主备切换后主库需要重新注册 repmgr -f repmgr.conf standby register --force

利用repmgr switch over数据库:
repmgr -f /home/postgres/repmgr.conf standby switchover --siblings-follow --dry-run
repmgr -f/home/postgres/repmgr.conf standby switchover

利用repmgr promote备库:
repmgr -f /home/postgres/repmgr.conf standby promote

利用rpmgr rejoin备用数据库:(inactive时)
repmgr  node rejoin -d ‘host=192.168.73.141 user=repmgr dbname=repmgr‘ --force-rewind --dry-run –verbose
repmgr  node rejoin -d ‘host=192.168.73.141 user=repmgr dbname=repmgr‘ --force-rewind –verbose


repmgr primary register  安装pg的repmgr扩展并注册为主节点
repmgr primary unregister 注销不活动的主节点
repmgr standby clone 从其他节点复制数据到从节点
repmgr standby register 注册从节点(添加从的信息到repmgr元数据)
repmgr standby unregister repmgr  元数据中移除从的信息
repmgr standby promote 将从提升为主
repmgr standby follow  将从跟随新主
repmgr standby switchover 将从提升为主并将主降级为从
repmgr witness register  注册一个观察节点
repmgr witness unregister  移除一个观察节点
repmgr node status  显示节点的基本信息和复制状态
repmgr node check  从复制的角度对节点进行健康监测
repmgr node rejoin   重新加入一个失效节点到集群
repmgr cluster show  显示所有集群中注册的节点信息
repmgr cluster matrix 在所有节点运行show并汇总
repmgr cluster crosscheck  在节点间两两交叉监测连接
repmgr cluster event 输出时间记录
repmgr cluster cleanup   清理监控历史
repmgr service status  节点状态

留个坑,还差witness没有尝试。

原文:https://www.cnblogs.com/ddlearning/p/15169618.html

评论(0
© 2014 bubuko.com 版权所有 - 联系我们:wmxa8@hotmail.com
打开技术之扣,分享程序人生!