Corosync+Pacemaker+NFS+Httpd High-Availability Web Cluster Deployment
Corosync+Pacemaker+NFS+Httpd high-availability web cluster deployment (configured with the resource management tool pcs)
Stack: pcs (Corosync + Pacemaker) + NFS + httpd
Cluster node 1: 192.168.88.132
Cluster node 2: 192.168.88.133
Cluster node 3: 192.168.88.134
VIP: 192.168.88.188
NFS server: node1.field.com
Web server: cen7.field.com
Prerequisites for configuring the cluster (a quick sketch of the first two items follows the reference links below):
(1) Time synchronization across all nodes;
(2) Nodes can reach one another by the hostnames they are currently using;
(3) Decide whether a quorum device will be used.
Cluster prerequisites: see "Configuring Corosync+Pacemaker with the resource management tool pcs":
http://blog.sina.com.cn/s/blog_b8918f7b0102ybf1.html
NFS server setup: see "Corosync+pacemaker+nfs+web high-availability server (using the resource management tool crmsh)"
Installing and configuring httpd on each node: see the same article: http://blog.sina.com.cn/s/blog_b8918f7b0102yc8h.html
These steps are not repeated here.
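A minimal sketch for prerequisites (1) and (2), run on every node (hedged example commands; chrony is assumed as the time source, and the name-to-address mapping must match the node IPs listed above):
# (2) every node must resolve every other node's current hostname, e.g. via /etc/hosts
[root@cen7 ~]# getent hosts cen7.field.com node1.field.com node2.field.com
# (1) time synchronization; chrony is the default NTP client on CentOS 7
[root@cen7 ~]# yum -y install chrony
[root@cen7 ~]# systemctl start chronyd && systemctl enable chronyd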
I. Clean up the corosync and pacemaker configuration left over from the earlier crmsh setup
1. Use the crm interactive mode to delete the previously defined resources
[root@cen7 corosync]# crm configure
crm(live)configure# edit
ERROR: Cannot delete running resources: webip, webserver, webstore
Edit or discard changes (yes to edit, no to discard) (y/n)? n
This error shows that running resources cannot be deleted; they must be stopped first.
# The resource sub-level is where running resources are managed (started, stopped, etc.)
crm(live)configure# cd
crm(live)# cd resource
crm(live)resource# help
# The usage of any command can be viewed with "help [command]"
crm(live)resource# help stop
Usage:
stop <rsc> [<rsc> ...]
# Stop the previously defined resources: webip, webserver, webstore
crm(live)resource# stop webip webserver webstore
# Delete the previously defined resources: delete <id> [<id> ...]
crm(live)resource# cd ../configure
crm(live)configure# delete webip webserver webstore
INFO: hanging order:webserver_after_webstore deleted
INFO: constraint colocation:webserver_with_webstore_and_webip updated
INFO: hanging order:webstore_after_webip deleted
INFO: hanging colocation:webserver_with_webstore_and_webip deleted
INFO: hanging location:webservice_prefer_cen7 deleted
crm(live)configure# verify
ERROR: Warnings found during check: config may not be valid
Note: if leftover, meaningless configuration remains, verify reports the error above. Use show to inspect the configuration, then edit it directly to remove the invalid entries and save with :wq.
crm(live)configure# show
node 1: cen7.field.com \
node 2: node2.field.com
node 3: node1.field.com
property cib-bootstrap-options: \
crm(live)configure# commit
2. Check the pcs status
[root@cen7 corosync]# pcs status
Cluster name:
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
0 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
No resources
Daemon Status:
3. Check the configuration files and back up the corosync.conf that was used under crmsh
[root@cen7 corosync]# ls
authkey
[root@cen7 corosync]# cp corosync.conf corosync.conf.bak080215
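The ansible commands used from here on target a host group named hacluster; a minimal sketch of the assumed inventory (the group name comes from the commands below, and the members are the three cluster node addresses listed at the top):
# /etc/ansible/hosts (assumed)
[hacluster]
192.168.88.132
192.168.88.133
192.168.88.134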
Stop the pacemaker and corosync services:
[root@cen7 corosync]# ansible hacluster -m service -a 'name=pacemaker state=stopped'
[root@cen7 corosync]# ansible hacluster -m service -a 'name=corosync state=stopped'
II. Install and configure the pcs cluster management tool
1. Install pcs
[root@cen7 corosync]# ansible hacluster -m yum -a 'name=pcs state=latest'
2. Configure pcs
1) Set a password for the default user hacluster created when pcs is installed; use the same password on all nodes.
[root@cen7 corosync]# ansible hacluster -m shell -a 'echo hacluster |passwd --stdin hacluster'
192.168.88.133 | SUCCESS | rc=0 >>
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
192.168.88.134 | SUCCESS | rc=0 >>
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
192.168.88.132 | SUCCESS | rc=0 >>
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
2) Authenticate the cluster nodes to one another
Note: authentication must use hacluster, the default user created by the pcs installation; otherwise it will not succeed.
[root@cen7 corosync]# pcs cluster auth cen7.field.com node1.field.com node2.field.com -u hacluster
Password:
Error: Unable to communicate with node1.field.com
Error: Unable to communicate with cen7.field.com
Error: Unable to communicate with node2.field.com
Authentication fails here. Check the possible causes one by one: iptables, SELinux, firewalld (CentOS 7 enables firewalld by default).
In this case all nodes fail to authenticate because the pcsd service has not been started yet (a few quick checks are sketched below).
[root@cen7 corosync]# firewall-cmd --state
not running
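A few quick checks for the other possible causes listed above (a hedged sketch; run on each node as needed):
[root@cen7 corosync]# getenforce                 # SELinux mode
[root@cen7 corosync]# iptables -L -n | head      # any packet-filtering rules?
[root@cen7 corosync]# systemctl is-active pcsd   # pcsd must be running on every node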
[root@cen7 corosync]# ansible hacluster -m service -a 'name=pcsd state=started enabled=yes'
[root@cen7 corosync]# pcs cluster auth cen7.field.com node1.field.com node2.field.com -u hacluster
Password:
node1.field.com: Authorized
cen7.field.com: Authorized
node2.field.com: Authorized
3) Create the cluster
[root@cen7 corosync]# pcs cluster setup --name=mycluster cen7.field.com node1.field.com node2.field.com
Error: cen7.field.com: node is already in a cluster    --> the nodes were already set up as a cluster earlier; use --force to destroy it and create the new cluster
Error: node1.field.com: node is already in a cluster
Error: node2.field.com: node is already in a cluster
Error: nodes availability check failed, use --force to override. WARNING: This will destroy existing cluster on the nodes.
[root@cen7 corosync]# pcs cluster setup --name=mycluster cen7.field.com node1.field.com node2.field.com --force
Destroying cluster on nodes: cen7.field.com, node1.field.com, node2.field.com...
node1.field.com: Stopping Cluster (pacemaker)...
node2.field.com: Stopping Cluster (pacemaker)...
cen7.field.com: Stopping Cluster (pacemaker)...
node2.field.com: Successfully destroyed cluster
node1.field.com: Successfully destroyed cluster
cen7.field.com: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'cen7.field.com', 'node1.field.com', 'node2.field.com'
node1.field.com: successful distribution of the file 'pacemaker_remote authkey'
node2.field.com: successful distribution of the file 'pacemaker_remote authkey'
cen7.field.com: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
cen7.field.com: Succeeded
node1.field.com: Succeeded
node2.field.com: Succeeded
Synchronizing pcsd certificates on nodes cen7.field.com, node1.field.com, node2.field.com...
node1.field.com: Success
cen7.field.com: Success
node2.field.com: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1.field.com: Success
cen7.field.com: Success
node2.field.com: Success
# The output above shows that the cluster has been set up successfully
4) Check the pcs status: it reports that no cluster is running on this node yet (the cluster has been created but not started)
[root@cen7 corosync]# pcs status
Error: cluster is not currently running on this node
III. Configure Corosync
1. Edit the Corosync configuration file and start the cluster
[root@cen7 corosync]# ls
corosync.conf
[root@cen7 corosync]# vim corosync.conf
[root@cen7 corosync]# grep -v '^[[:space:]]*#' corosync.conf
totem {
}
nodelist {
}
quorum {
}
logging {
}
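As a reference, a working corosync.conf for this three-node cluster might look like the sketch below (the secauth, transport, and logging values are assumptions; the cluster name, node names, and node IDs match this setup):
totem {
    version: 2
    cluster_name: mycluster
    secauth: on                      # assumed, since an authkey is copied to the other nodes below
    transport: udpu
}
nodelist {
    node {
        ring0_addr: cen7.field.com
        nodeid: 1
    }
    node {
        ring0_addr: node1.field.com
        nodeid: 2
    }
    node {
        ring0_addr: node2.field.com
        nodeid: 3
    }
}
quorum {
    provider: corosync_votequorum
}
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}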
Configure the other nodes
The Corosync configuration file and the authentication key must be identical on every node: simply copy corosync.conf and authkey to node2 and node1 to complete their configuration.
[root@cen7 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/
# scp -p preserves the original modes and times
authkey
corosync.conf
[root@cen7 corosync]# scp -p authkey corosync.conf node1:/etc/corosync/
2. Start all cluster nodes and verify their status
[root@cen7 corosync]# pcs cluster start --all
cen7.field.com: Starting Cluster...
node1.field.com: Starting Cluster...
node2.field.com: Starting Cluster...
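If the cluster should also start automatically at boot, pcs can enable the services on all nodes (optional; this step is not performed in this article):
[root@cen7 corosync]# pcs cluster enable --all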
1) Use "corosync-cfgtool -s" to check the ring status on each cluster node
As shown below, nodes 192.168.88.132/133/134 have all started successfully, and each reports the status "ring 0 active with no faults".
[root@cen7 corosync]# ansible hacluster -m shell -a 'corosync-cfgtool -s'
192.168.88.133 | SUCCESS | rc=0 >>
Printing ring status.
Local node ID 3
RING ID 0
        id      = 192.168.88.133
        status  = ring 0 active with no faults
192.168.88.134 | SUCCESS | rc=0 >>
Printing ring status.
Local node ID 2
RING ID 0
        id      = 192.168.88.134
        status  = ring 0 active with no faults
192.168.88.132 | SUCCESS | rc=0 >>
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.88.132
        status  = ring 0 active with no faults
2) Use "corosync-cmapctl | grep members" to check the cluster membership
[root@cen7 corosync]# corosync-cmapctl |grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.88.132)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.88.134)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(192.168.88.133)
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
3) Check the cluster node status
[root@cen7 corosync]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
0 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
No resources
Daemon Status:
4) Use "crm_verify -L -V" to check whether the configuration is valid
[root@cen7 corosync]# crm_verify -L -V
Errors found during check: config not valid
The error above appears because stonith is enabled by default. Configure the cluster property with the following command to disable stonith, and the error goes away.
[root@cen7 corosync]# pcs property set stonith-enabled=false
[root@cen7 corosync]# crm_verify -L -V
IV. Configure the Corosync+Pacemaker+NFS+httpd high-availability cluster with pcs
1. Define the cluster resources and constraints
1) Define the VIP resource
Common pcs resource commands and options:
pcs resource create <resource id> <type> [options]: create a resource
pcs resource delete <resource id>: delete a resource
op: specify operation options for a resource
monitor: the monitoring operation, with interval (how often the resource is checked) and timeout (how long a check may take)
The following command configures the virtual IP address 192.168.88.188 with a monitor interval of 20s and a timeout of 10s:
[root@cen7 corosync]# pcs resource create webip ocf:heartbeat:IPaddr ip="192.168.88.188" op monitor interval=20s timeout=10s
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
1 resource configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
# webip has been defined successfully and is running on the cen7 node
Daemon Status:
# webip can be deleted and re-created in the same way:
[root@cen7 corosync]# pcs resource delete webip
[root@cen7 corosync]# pcs resource create webip ocf:heartbeat:IPaddr ip="192.168.88.188" op monitor interval=20s timeout=10s
2) Define the NFS filesystem resource (Filesystem is an ocf resource agent)
For a Filesystem resource, the mandatory parameters are device (the device or export to mount), directory (the mount point), and fstype (the filesystem type).
Common operation options for the Filesystem resource:
start: interval and timeout for starting the resource
stop: interval and timeout for stopping the resource
monitor: interval (how often the resource is checked) and timeout (monitor timeout)
The following command creates the NFS filesystem resource:
It mounts the NFS export 192.168.88.134:/www/hadocs on /var/www/html/ with a 60s start timeout, a 60s stop timeout, a 20s monitor interval, and a 40s monitor timeout (a sketch of the assumed NFS export on node1 follows the status output below).
[root@cen7 corosync]# pcs resource create webstore ocf:heartbeat:Filesystem device="192.168.88.134:/www/hadocs" directory="/var/www/html/" fstype="nfs" op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=40s
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
2 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
# webstore has been defined successfully and is running on the node1 node
Daemon Status:
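As noted above, the NFS export on node1 (192.168.88.134) that webstore mounts is assumed to look roughly like the following (only the /www/hadocs path comes from the resource definition; the client range and export options are assumptions):
[root@node1 ~]# cat /etc/exports
/www/hadocs    192.168.88.0/24(ro)
[root@node1 ~]# exportfs -arv
[root@node1 ~]# showmount -e 192.168.88.134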
3) Define the httpd resource
The following command defines the httpd resource with a monitor interval of 30s and a timeout of 20s:
[root@cen7 corosync]# pcs resource create webserver systemd:httpd op monitor interval=30s timeout=20s
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
3 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
# webserver has been defined successfully and is running on the node2 node
Daemon Status:
4) Define a resource group: resources in a group always run on the same node
The following command creates the webservice group containing the webip, webstore, and webserver resources:
[root@cen7 corosync]# pcs resource group add webservice webip webstore webserver
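Resources in a group are implicitly colocated and started in the listed order (webip, then webstore, then webserver), so no extra constraints are needed for that. The resulting definition can be reviewed with the following command (a hedged example using the pcs 0.9 syntax shipped with CentOS 7; output omitted):
[root@cen7 corosync]# pcs resource show webservice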
2. Verify the cluster configuration and its failover behaviour
1) Check the cluster status:
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
3 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
# After defining the group, all resources are running on the cen7 node
Daemon Status:
2) Manually put the current node into standby and verify that the resources fail over
Command notes:
pcs cluster standby <node>: put a node into standby
pcs cluster unstandby <node>: bring a node back online
[root@cen7 corosync]# pcs cluster standby cen7.field.com
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
3 resources configured
Node cen7.field.com: standby
Online: [ node1.field.com node2.field.com ]
Full list of resources:
# The resources have successfully moved to the node1 node
Daemon Status:
3) Confirm that the VIP has moved: the VIP 192.168.88.188 is now up on node1
[root@cen7 corosync]# ansible 192.168.88.134 -m shell -a 'ip addr list |grep ens'
192.168.88.134 | SUCCESS | rc=0 >>
2: ens34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
4) Confirm that the web page is reachable: every node can access it successfully
[root@cen7 corosync]# ansible hacluster -m shell -a 'curl 192.168.88.188 warn=False'
192.168.88.133 | SUCCESS | rc=0 >>
hacluster page on NFS Service
192.168.88.134 | SUCCESS | rc=0 >>
hacluster page on NFS Service
192.168.88.132 | SUCCESS | rc=0 >>
hacluster page on NFS Service
5) Put node1 into standby as well and verify that the resources fail over again
[root@cen7 corosync]# pcs cluster standby node1.field.com
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
3 resources configured
Node cen7.field.com: standby
Node node1.field.com: standby
Online: [ node2.field.com ]
Full list of resources:
# The resources have successfully moved to the node2 node
Daemon Status:
6) Bring the nodes back online
[root@cen7 corosync]# pcs cluster unstandby node1.field.com
[root@cen7 corosync]# pcs cluster unstandby cen7.field.com
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
3 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
Daemon Status:
3. Define resource constraints
Common resource constraint commands:
pcs constraint [--full]: view the defined constraints (--full also shows the constraint IDs)
pcs constraint colocation: colocation constraints, which resources must run together on the same node
pcs constraint order: ordering constraints, the order in which resources are started and stopped
pcs constraint location: location constraints, which nodes a resource prefers to run on; the preference is decided by the score assigned to each node
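Because the webservice group already colocates and orders its members, only a location constraint is added below. For reference, the same webip -> webstore -> webserver dependency chain used in the earlier crmsh setup could also be expressed with standalone constraints; a hedged sketch, not applied here:
[root@cen7 corosync]# pcs constraint order webip then webstore
[root@cen7 corosync]# pcs constraint order webstore then webserver
[root@cen7 corosync]# pcs constraint colocation add webstore with webip INFINITY
[root@cen7 corosync]# pcs constraint colocation add webserver with webstore INFINITY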
1) Specify which node a resource prefers: define a location score for a single node
The following command gives the webservice group a location score of 100 on node cen7:
[root@cen7 corosync]# pcs constraint location add webservice_prefer_cen7 webservice cen7.field.com 100
Check the cluster status:
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug
Last change: Thu Aug
3 nodes configured
3 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
# After defining the location preference, all resources are running on the cen7 node
Daemon Status:
2) Define the default resource stickiness for the cluster
pcs property list: view the cluster properties
[root@cen7 corosync]# pcs property list --all | grep default
The following command sets the default resource stickiness to 0:
[root@cen7 corosync]# pcs property set default-resource-stickiness=0
[root@cen7 corosync]# pcs property list --all | grep default-resource-stickiness
[root@cen7 corosync]# ip addr list| grep ens
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
[root@cen7 corosync]# curl 192.168.88.188
hacluster page on NFS Service
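Note on the final state: with default-resource-stickiness=0 and a location score of 100 on cen7, the resources move back to cen7 as soon as it is online again, which is why everything ends up on cen7 above. If resources should instead stay where they are after a failover, the default stickiness could be raised above the location score, for example (a hedged alternative, not used in this article):
[root@cen7 corosync]# pcs property set default-resource-stickiness=200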
