
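1.1.Server configuration
Two identical HP servers are used here. Each has three network interfaces, which appear under Linux as eth0, eth1 and eth2. The operating system is 32-bit RHEL 4.0.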
| | Server 1 (primary) | Server 2 (standby) |
| Hostname | jmvs8 | jmvs7 |
| eth0 | 192.168.100.3 | 192.168.100.4 |
| eth1 | 192.167.0.3 | 192.167.0.4 |
| eth2 | 192.168.105.3 | 192.168.105.4 |
| Virtual IP | 192.168.100.2 | |
1.2.Installation
(1) Install libnet, which is required to build Heartbeat HA:
tar xzvf libnet.tar.gz
cd libnet
ls
./configure
make
make install
(2) Build and install Heartbeat HA:
groupadd haclient
useradd hacluster -g haclient
tar xzvf heartbeat-2.0.3.tar.gz
ls
cd heartbeat-2.0.3
ls
./ConfigureMe configure
make
make install
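1.3.Configuration
Four files need to be configured: authkeys, ha.cf, haresources and hosts.
Each of them is described below.
Note: you can copy authkeys, ha.cf and haresources straight onto each machine, then adjust the node names and the heartbeat address in ha.cf and the virtual address in haresources, and finally edit the hosts file.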
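Because the node names in ha.cf must match the output of uname -n (as the comments in section 1.3.2 point out), a quick check after copying the files can catch a forgotten edit. The following is only a sketch: the function name node_listed is made up for illustration, and it assumes that node directives look like "node name ..." with optional trailing # comments.

```shell
# Succeeds if the given host name appears on an active "node" line of an ha.cf file.
# Usage (hypothetical): node_listed /etc/ha.d/ha.cf "$(uname -n)" || echo "node name missing"
node_listed() {
    hacf=$1
    name=$2
    # Strip comments first, then scan the remaining words of every "node" line.
    sed 's/#.*//' "$hacf" | awk -v n="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }   # exit status 0 only when the name was seen
    '
}
```

Running it on both servers with the local host name is a cheap way to confirm that each machine finds itself in the copied file.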
1.3.1.authkeys
The CRC method is used here. The configuration content is:
auth 2
2 crc
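Besides crc, the md5 and sha1 authentication methods are also available.
This file is identical on both servers.
File location: /etc/ha.d/authkeys
Addendum from linuxer_jlu: the comments in the file are quite clear, but here is a short explanation anyway. This file is used for authentication between the two cluster nodes. The algorithm and the key (if any) must be the same on every node in the cluster. Three algorithms are currently provided: md5, sha1 and crc. crc provides no authentication; it can only check whether a packet has been corrupted. sha1 and md5 both need a key to authenticate. In terms of resource usage, crc is the cheapest, md5 costs more and sha1 costs the most, but sha1 also gives the strongest authentication, so sha1 is generally the recommended choice.
To use sha1, uncomment the auth directive in authkeys and set it to 2, then uncomment the corresponding "2 sha1" line (remove the #) and replace the key after it with your own (it must be the same on both nodes). Save the file and change its permissions to 600, otherwise heartbeat will fail to start. The command is: chmod 600 authkeys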
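The sha1 setup described above can be sketched as a small shell function. This is an illustration rather than part of the original procedure: the function name write_authkeys and the key value in the usage comment are placeholders, and the target directory is passed in so the sketch can be tried outside /etc/ha.d first.

```shell
# Write an authkeys file that selects sha1 with a shared key, readable by the owner only.
# Usage (hypothetical key): write_authkeys /etc/ha.d 'MySharedSecret'
write_authkeys() {
    dir=$1
    key=$2
    # Create the file under a restrictive umask so the key is never world-readable.
    ( umask 077 && printf 'auth 2\n2 sha1 %s\n' "$key" > "$dir/authkeys" ) || return 1
    # heartbeat refuses to start unless the mode is 600.
    chmod 600 "$dir/authkeys"
}
```

The same key must be used when the function is run on the second node.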
1.3.2.ha.cf
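File location: /etc/ha.d/ha.cf
The additional explanatory comments come from linuxer_jlu; the uncommented directives are my settings on the primary server. This configuration file is identical on both servers.
For heartbeat to start properly, the haclient group and the hacluster user must exist. If they were not already created during installation in section 1.2, create them with:
groupadd haclient
useradd -g haclient hacluster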
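Before starting heartbeat it can be worth confirming that the haclient and hacluster accounts really exist on each node. A minimal sketch, assuming a Linux system where id is available (getent is used when present, with /etc/group as a fallback); the function names are illustrative:

```shell
# Succeeds if a user account with the given name exists.
have_user() {
    id -u "$1" >/dev/null 2>&1
}

# Succeeds if a group with the given name exists.
have_group() {
    if command -v getent >/dev/null 2>&1; then
        getent group "$1" >/dev/null 2>&1
    else
        # Fallback that only sees local groups.
        grep -q "^$1:" /etc/group
    fi
}

# Example: have_group haclient && have_user hacluster || echo "accounts missing"
```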
The content of the configuration file follows; the uncommented lines are the settings actually in effect.
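# Records heartbeat debug messages
#debugfile /var/log/ha-debug
# logfile records heartbeat log messages
logfile /var/log/ha.log
# If the log files above are not defined, log messages are sent to local0 (which maps to
# /var/log/messages). If none of these three logging directives is defined, heartbeat by default
# creates ha-debug and ha-log under /var/log to record the corresponding messages.
logfacility local0
# Interval between heartbeat packets. The default unit is seconds; to use milliseconds,
# append the ms suffix, e.g. 1500ms means 1.5s
keepalive 2
# How long to wait without heartbeats before declaring the peer node dead
deadtime 30
# How long to wait before issuing a "late heartbeat" warning
warntime 10
# Time allowed for the network to come up after boot
initdead 120
# UDP port used for broadcast/unicast communication
udpport 694
# Baud rate for serial communication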
#baud 19200
# serial serialportname ...
# Serial device to use; on Linux this is /dev/ttyS0 (1, 2, 3, ...)
#serial /dev/ttyS0 # Linux
#serial /dev/cuaa0 # FreeBSD
#serial /dev/cuad0 # FreeBSD 6.x
#serial /dev/cua/a # Solaris
# Network interface used for heartbeats
#bcast eth0 # Linux
#bcast eth1 eth2 # Linux
#bcast eth1
#bcast le0 # Solaris
#bcast le1 le2 # Solaris
bcast eth1
# Set up a multicast heartbeat medium
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
# 224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (set this value to the
# same value as "udpport" above)
# [ttl] the ttl value for outbound heartbeats. this effects
# how far the multicast packet will propagate. (0-255)
# Must be greater than zero.
# [loop] toggles loopback for outbound multicast heartbeats.
# if enabled, an outbound packet will be looped back and
# received by the interface it was sent on. (0 or 1)
# Set this value to zero.
#
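# For multicast communication, this sets the interface to use, the multicast IP address to bind
# (within 224.0.0.0 - 239.255.255.255), the port, the ttl (time to live, the number of router
# hops a packet may cross), and whether loopback is allowed (whether locally sent packets are also received locally)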
#mcast eth0 225.0.0.1 694 1 0
#
# Set up a unicast / udp heartbeat medium
# ucast [dev] [peer-ip-addr]
#
# [dev] device to send/rcv heartbeats on
# [peer-ip-addr] IP address of peer to send packets to
# For unicast, configure the network interface and the peer IP address to use
#ucast eth0 192.168.1.2
#
#
# About boolean values...
#
# Any of the following case-insensitive values will work for true:
# true, on, yes, y, 1
# Any of the following case-insensitive values will work for false:
# false, off, no, n, 0
#
#
#
# auto_failback: determines whether a resource will
# automatically fail back to its "primary" node, or remain
# on whatever node is serving it until that node fails, or
# an administrator intervenes.
#
# The possible values for auto_failback are:
# on - enable automatic failbacks
# off - disable automatic failbacks
# legacy - enable automatic failbacks in systems
# where all nodes do not yet support
# the auto_failback option.
#
# auto_failback "on" and "off" are backwards compatible with the old
# "nice_failback on" setting.
#
# See the FAQ for information on how to convert
# from "legacy" to "on" without a flash cut.
# (i.e., using a "rolling upgrade" process)
#
# The default value for auto_failback is "legacy", which
# will issue a warning at startup. So, make sure you put
# an auto_failback directive in your ha.cf file.
# (note: auto_failback can be any boolean or "legacy")
#
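# Determines what happens to a resource once its owner node recovers: whether it migrates
# back to the owner, or keeps running on the current node until that node fails.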
auto_failback off
#
#
# Basic STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
#
# stonith
#
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
# In cluster environments with shared resources, STONITH fencing is used to keep data consistent
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
# STONITH support
# You can configure multiple stonith devices using this directive.
# The format of the line is:
# stonith_host
#
# to or * to mean it is accessible from any host.
#
# supported drives is in /usr/lib/stonith.)
#
# format for a particular device, run:
# stonith -l -t
#
#
# Note that if you put your stonith device access information in
# here, and you make this file publicly readable, you're asking
# for a denial of service attack ;-)
#
# To get a list of supported stonith devices, run
# stonith -L
# For detailed information on which stonith devices are supported
# and their detailed configuration options, run this command:
# stonith -h
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
# Watchdog is the watchdog timer. If our own heart doesn't beat for
# a minute, then our machine will reboot.
# NOTE: If you are using the software watchdog, you very likely
# wish to load the module with the parameter "nowayout=0" or
# compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
# an orderly shutdown of heartbeat will trigger a reboot, which is
# very likely NOT what you want.
# This directive sets up the watchdog timer: if the node has no heartbeat for
# one minute, the node reboots
#watchdog /dev/watchdog
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
#node ken3
#node kathy
# Set the nodes in the cluster. Note: the node names must match uname -n
node jmvs8
node jmvs7
#
# Less common options...
#
# Treats 10.10.10.254 as a pseudo-cluster-member
# Used together with ipfail below...
# note: don't use a cluster node as ping node
#ping 10.10.10.254
# The ping directive and the ping_group directive below set up pseudo cluster members.
# They must be used together with the ipfail directive below. Their purpose is to monitor
# the physical link: if a cluster node cannot reach the pseudo member, that node is not
# allowed to take over resources or services, and it releases the resources it holds.
#
# Treats 10.10.10.254 and 10.10.10.253 as a pseudo-cluster-member
# called group1. If either 10.10.10.254 or 10.10.10.253 are up
# then group1 is up
# Used together with ipfail below...
#
#ping_group group1 10.10.10.254 10.10.10.253
ping_group group1 192.167.0.3 192.167.0.4
#
# HBA ping directive for Fiber Channel
# Treats fc-card-name as pseudo-cluster-member
# used with ipfail below ...
#
# You can obtain HBAAPI from http://hbaapi.sourceforge.net. You need
# to get the library specific to your HBA directly from the vendor
# To install HBAAPI stuff, all You need to do is to compile the common
# part you obtained from the sourceforge. This will produce libHBAAPI.so
# which you need to copy to /usr/lib. You need also copy hbaapi.h to
# /usr/include.
#
# The fc-card-name is the name obtained from the hbaapitest program
# that is part of the hbaapi package. Running hbaapitest will produce
# a verbose output. One of the first line is similar to:
# Adapter number 0 is named: qlogic-qla2200-0
# Here fc-card-name is qlogic-qla2200-0.
#
#hbaping fc-card-name
#
#
# Processes started and stopped with heartbeat. Restarted unless
# they exit with rc=100
# Processes that should start and stop together with heartbeat can be defined here
#respawn userid /path/name/to/run
# respawn hacluster /usr/lib/heartbeat/ipfail
#
# Access control for client api
# default is no access
# Sets the access permissions for the processes you start
#apiauth client-name gid=gidlist uid=uidlist
#apiauth ipfail gid=haclient uid=hacluster
###########################
# The options below are rarely used and are not covered in detail here
# Unusual options.
#
###########################
#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
# deadping - dead time for ping nodes
#deadping 30
#
# hbgenmethod - Heartbeat generation number creation method
# Normally these are stored on disk and incremented as needed.
#hbgenmethod time
#
# realtime - enable/disable realtime execution (high priority, etc.)
# defaults to on
#realtime off
#
# debug - set debug level
# defaults to zero
#debug 1
#
# API Authentication - replaces the fifo-permissions-based system of the past
#
#
# You can put a uid list and/or a gid list.
# If you put both, then a process is authorized if it qualifies under either
# the uid list, or under the gid list.
#
# The groupname "default" has special meaning. If it is specified, then
# this will be used for authorizing groupless clients, and any client groups
# not otherwise specified.
#
# There is a subtle exception to this. "default" will never be used in the
# following cases (actual default auth directives noted in brackets)
# ipfail (uid=HA_CCMUSER)
# ccm (uid=HA_CCMUSER)
# ping (gid=HA_APIGROUP)
# cl_status (gid=HA_APIGROUP)
#
# This is done to avoid creating a gaping security hole and matches the most
# likely desired configuration.
#
#apiauth ipfail uid=hacluster
#apiauth ccm uid=hacluster
#apiauth cms uid=hacluster
#apiauth ping gid=haclient uid=alanr,root
#apiauth default gid=haclient
# message format in the wire, it can be classic or netstring,
# default: classic
#msgfmt classic/netstring
# Do we use logging daemon?
# If logging daemon is used, logfile/debugfile/logfacility in this file
# are not meaningful any longer. You should check the config file for logging
# daemon (the default is /etc/logd.cf)
# Setting use_logd to "yes" is recommended
#
# use_logd yes/no
#
# the interval we reconnect to logging daemon if the previous connection failed
# default: 60 seconds
#conn_logd_time 60
#
#
# Configure compression module
# It could be zlib or bz2, depending on whether u have the corresponding
# library in the system.
#compression bz2
#
# Configure compression threshold
# This value determines the threshold to compress a message,
# e.g. if the threshold is 1, then any message with size greater than 1 KB
# will be compressed, the default is 2 (KB)
#compression_threshold 2
1.3.3.haresources
File location: /etc/ha.d/haresources
Configuration on the primary server (the address here is the virtual IP that heartbeat brings up):
The content is shown below. Only a single IP address, 192.168.100.2, is made highly available here.
jmvs8 IPaddr::192.168.100.2
Configuration on the standby server:
The content is shown below. Again, only the same single IP address, 192.168.100.2, is made highly available.
jmvs7 IPaddr::192.168.100.2
In other words, when the primary server fails, the standby server takes over and brings up this IP address, 192.168.100.2.
1.3.4.hosts
File location: /etc/hosts
Configuration on the primary server:
127.0.0.1 jmvs8 localhost.localdomain localhost
192.168.100.3 jmvs8
192.168.100.4 jmvs7
Configuration on the standby server:
127.0.0.1 jmvs7 localhost.localdomain localhost
192.168.100.3 jmvs8
192.168.100.4 jmvs7
1.4.Startup
First start heartbeat on the primary server:
service heartbeat start
Then start heartbeat on the standby server:
service heartbeat start
After about 120 seconds (matching the initdead setting), ifconfig shows that the virtual IP address 192.168.100.2 has been brought up, and it responds to ping.
After the primary server is shut down, the standby server brings up the IP address 192.168.100.2 about 30 seconds later (matching the deadtime setting).
When the primary server boots again and heartbeat is started on it, it brings up 192.168.100.2 once more after roughly another 120 seconds and takes the address over.
Note: for hot standby to work, do not start heartbeat on both machines at the same moment, otherwise both of them bring up the virtual IP. It is best to start the primary first and to start the standby only after the primary has successfully brought up the virtual IP.
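The ifconfig check described above can be scripted so that it is easy to repeat on either node during a failover test. This is a rough sketch: the function name has_vip is illustrative, and the pattern is an assumption that covers the common "inet addr:A.B.C.D" and "inet A.B.C.D/NN" layouts printed by ifconfig and ip addr.

```shell
# Reads an interface listing on stdin and succeeds if the given IPv4 address is configured.
# Example: ifconfig | has_vip 192.168.100.2 && echo "virtual IP is up on this node"
has_vip() {
    # Escape the dots so they match literally in the regular expression.
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    # Require a delimiter after the address so 192.168.100.2 does not match 192.168.100.21.
    grep -Eq "inet (addr:)?${pattern}([ /]|\$)"
}
```

Running the example line on both servers at the same time shows quickly whether exactly one node holds the address.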
