前言: (PS:前言是第二次修改本文時(shí)加的)對(duì)于這篇文章,有博友提出了一些疑問和見解,有了博友的關(guān)注,也促使我想把這套東西做的更實(shí)用、更安全。后來(lái)又經(jīng)過(guò)思考,對(duì)腳本中一些條件和行為做了些改變。經(jīng)過(guò)修改,現(xiàn)在終于敢說(shuō)讓小伙伴本使用這套東西了。
前言:(PS: 前言是第二次修改本文時(shí)加的)對(duì)于這篇文章,有博友提出了一些疑問和見解, 有了博友的關(guān)注,也促使我想把這套東西做的更實(shí)用、更安全。后來(lái)又經(jīng)過(guò)思考, 對(duì)腳本中一些條件和行為做了些改變。經(jīng)過(guò)修改,現(xiàn)在終于敢說(shuō)讓小伙伴本使用這套東西了。
主要目的:
以雙主結(jié)構(gòu)配合keepalived解決MySQL主從結(jié)構(gòu)中主庫(kù)的單點(diǎn)故障;同時(shí)通過(guò)具體的查詢語(yǔ)句提供更細(xì)粒度、更為真實(shí)的關(guān)于主庫(kù)可用性的判斷。
基本思路:
將DB1和DB2做成主動(dòng)被動(dòng)模式的雙主結(jié)構(gòu):DB1主動(dòng)、DB2被動(dòng),通過(guò)keepalived的VIP對(duì)外,將VIP設(shè)置成原DB1的IP,保證改造過(guò)程對(duì)代碼透明
三個(gè)前提:
兩臺(tái)MySQL的配置文件里需要加上“l(fā)og_slave_updates = 1”;
并且“備用機(jī)”通過(guò)“read_only”參數(shù)實(shí)現(xiàn)除root用戶之外的只讀特性;
分別在兩個(gè)數(shù)據(jù)庫(kù)創(chuàng)建test.test表,插入幾條數(shù)據(jù),供檢測(cè)腳本使用。
正常時(shí),VIP在DB1,通過(guò)keepalived調(diào)用腳本定期檢查mysql服務(wù)可用性(通過(guò)一個(gè)低權(quán)限用戶連接mysql服務(wù)器并執(zhí)行一個(gè)簡(jiǎn)單查詢,根據(jù)返回結(jié)果來(lái)判定mysql是否可用)
若無(wú)法執(zhí)行查詢:
1. 第一次檢測(cè)失敗后,檢查服務(wù)狀態(tài),:
2. 開始執(zhí)行系列切換動(dòng)作
3. 管理員修復(fù)DB1后,通過(guò)腳本“change_to_backup.sh”將主庫(kù)切換回DB1。腳本思路如下: 注:涉及到切換主備,就會(huì)有中斷時(shí)間,所以推薦此步驟在業(yè)務(wù)低谷期執(zhí)行
關(guān)于“數(shù)據(jù)一致性”和“切換時(shí)間”: 連續(xù)兩次失敗以后,通過(guò)對(duì)主MySQL設(shè)置read_only屬性,同時(shí)kill掉用戶線程來(lái)保證在DB2接管服務(wù)之前,DB1上已經(jīng)沒有寫操作,避免主從數(shù)據(jù)不一致。并且切換時(shí)間基本上是可確定的: 30s(兩次檢測(cè)間隔)+2s(等待kill命令時(shí)間)+約1s(keepalived 切換VIP),總時(shí)間不會(huì)超過(guò)35s。 |
以上是大致思路,具體實(shí)現(xiàn)看過(guò)下面的腳本,就會(huì)一目了然了。
DB1上keepalived 配置
! Configuration File for keepalived vrrp_script chk_mysql { script "/etc/keepalived/check_mysql.sh" interval 30 #這里我的檢查間隔設(shè)置的比較長(zhǎng),因?yàn)槲覀償?shù)據(jù)庫(kù)前面有redis做緩存,數(shù)據(jù)庫(kù)一兩分鐘級(jí)別的終端對(duì)整體可用性影響不大。這也是我沒有采用成熟的方案而自己搞了這一套方案的“定心丸” } vrrp_instance VI_1 { state BACKUP #通過(guò)下面的priority來(lái)區(qū)分MASTER和BACKUP,也只有如此,底下的nopreempt才有效 interface em2 virtual_router_id 51 priority 100 advert_int 1 nopreempt #防止切換到從庫(kù)后,主keepalived恢復(fù)后自動(dòng)切換回主庫(kù) authentication { auth_type PASS auth_pass 1111 } track_script { chk_mysql } virtual_ipaddress { 192.168.1.5/24 } }
/etc/keepalived/check_mysql.sh腳本內(nèi)容如下(主要的判斷邏輯都在這里)
#!/bin/sh ###判斷如果上次檢查的腳本還沒執(zhí)行完,則退出此次執(zhí)行 if [ `ps -ef|grep -w "$0"|grep "/bin/sh*"|grep "?"|grep "?"|grep -v "grep"|wc -l` -gt 2 ];then #理論上這里應(yīng)該是1,但是實(shí)驗(yàn)的結(jié)果卻是2 exit 0 fi alias mysql_con='mysql -uxxxx -pxxxx' ###定義一個(gè)簡(jiǎn)單判斷mysql是否可用的函數(shù) function excute_query { mysql_con -e "select * from test.test;" 2>>/etc/keepalived/logs/check_mysql.err } ###定義無(wú)法執(zhí)行查詢,且mysql服務(wù)異常時(shí)的處理函數(shù) function service_error { echo -e "`date "+%F %H:%M:%S"` -----mysql service error,now stop keepalived-----" >> /etc/keepalived/logs/check_mysql.err /sbin/service keepalived stop &>> /etc/keepalived/logs/check_mysql.err echo -e "\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/check_mysql.err } ###定義無(wú)法執(zhí)行查詢,但mysql服務(wù)正常的處理函數(shù) function query_error { echo -e "`date "+%F %H:%M:%S"` -----query error, but mysql service ok, retry after 30s-----" >> /etc/keepalived/logs/check_mysql.err sleep 30 excute_query if [ $? -ne 0 ];then echo -e "`date "+%F %H:%M:%S"` -----still can't execute query-----" >> /etc/keepalived/logs/check_mysql.err ###對(duì)DB1設(shè)置read_only屬性 echo -e "`date "+%F %H:%M:%S"` -----set read_only = 1 on DB1-----" >> /etc/keepalived/logs/check_mysql.err mysql_con -e "set global read_only = 1;" 2>> /etc/keepalived/logs/check_mysql.err ###kill掉當(dāng)前客戶端連接 echo -e "`date "+%F %H:%M:%S"` -----kill current client thread-----" >> /etc/keepalived/logs/check_mysql.err rm -f /tmp/kill.sql &>/dev/null ###這里其實(shí)是一個(gè)批量kill線程的小技巧 mysql_con -e 'select concat("kill ",id,";") from information_schema.PROCESSLIST where command="Query" or command="Execute" into outfile "/tmp/kill.sql";' mysql_con -e "source /tmp/kill.sql" sleep 2 ###給kill一個(gè)執(zhí)行和緩沖時(shí)間 ###關(guān)閉本機(jī)keepalived echo -e "`date "+%F %H:%M:%S"` -----stop keepalived-----" >> /etc/keepalived/logs/check_mysql.err /sbin/service keepalived stop &>> /etc/keepalived/logs/check_mysql.err echo -e "\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/check_mysql.err else echo -e "`date "+%F %H:%M:%S"` -----query ok after 30s-----" >> /etc/keepalived/logs/check_mysql.err echo -e "\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/check_mysql.err fi } ###檢查開始: 執(zhí)行查詢 excute_query if [ $? -ne 0 ];then /sbin/service mysql status &>/dev/null if [ $? -ne 0 ];then service_error else query_error fi fi
DB2上keepalived配置:
! Configuration File for keepalived vrrp_instance VI_1 { state BACKUP interface em2 virtual_router_id 51 priority 90 advert_int 1 authentication { auth_type PASS auth_pass 1111 } notify_master /etc/keepalived/notify_master_mysql.sh #此條指令告訴keepalived發(fā)現(xiàn)自己轉(zhuǎn)為MASTER后執(zhí)行的腳本 virtual_ipaddress { 192.168.1.5/24 } }
/etc/keepalived/notify_master_mysql.sh腳本內(nèi)容:
#!/bin/bash ###當(dāng)keepalived監(jiān)測(cè)到本機(jī)轉(zhuǎn)為MASTER狀態(tài)時(shí),執(zhí)行該腳本 alias mysql_con='mysql -uxxxx -pxxxx' echo -e "`date "+%F %H:%M:%S"` -----keepalived change to MASTER-----" >> /etc/keepalived/logs/state_change.log ###判斷是否已經(jīng)將從master接收到的binlog全部在本地執(zhí)行(這么做仍然無(wú)法完全確定從庫(kù)就已經(jīng)追上了主庫(kù),因?yàn)殡m然說(shuō)從庫(kù)延時(shí)一般情況都是慢在sql_thread上,但是也無(wú)法完全保證io_thread完全就沒有延時(shí)。但至少能保證已經(jīng)讀取到的binlog在本地執(zhí)行完畢) pos=`mysql_con -e "show slave status\G;"|grep "Master_Log_Pos"|awk '{printf ("%s",$NF "\t")}'` read_pos=`echo $pos|awk '{print $1}'` exec_pos=`echo $pos|awk '{print $2}'` until [ $read_pos = $exec_pos ] do echo -e "`date "+%F %H:%M:%S"` -----Exec_Master_Log_Pos is behind Read_Master_Log_Pos, wait......" >> /etc/keepalived/logs/state_ch ange.log sleep 1 done ###然后解除read_only屬性 echo -e "`date "+%F %H:%M:%S"` -----set read_only = 0 on DB2-----" >> /etc/keepalived/logs/state_change.log mysql_con -e "set global read_only = 0;" 2>> /etc/keepalived/logs/state_change.log echo "DB2 keepalived changed to MASTER,online DB server changed to DB2"|/bin/mailx -s "DB2 keepalived change to MASTER" xxxx@xxxx.com 2>> /etc/keepalived/logs/state_change.log echo -e "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/state_change.log
DB2上手動(dòng)切換回DB1的腳本change_to_backup.sh:
#!/bin/sh ###手動(dòng)執(zhí)行將主庫(kù)切換回DB1的操作 alias mysql_con='mysql -uxxxx -pxxxx' echo -e "`date "+%F %H:%M:%S"` -----change to BACKUP manually-----" >> /etc/keepalived/logs/state_change.log echo -e "`date "+%F %H:%M:%S"` -----set read_only = 1 on DB2-----" >> /etc/keepalived/logs/state_change.log mysql_con -e "set global read_only = 1;" 2>> /etc/keepalived/logs/state_change.log ###kill掉當(dāng)前客戶端連接 echo -e "`date "+%F %H:%M:%S"` -----kill current client thread-----" >> /etc/keepalived/logs/state_change.log rm -f /tmp/kill.sql &>/dev/null ###這里其實(shí)是一個(gè)批量kill線程的小技巧 mysql_con -e 'select concat("kill ",id,";") from information_schema.PROCESSLIST where command="Query" or command="Execute" into outfile "/tmp/kill.sql";' mysql_con -e "source /tmp/kill.sql" 2>> /etc/keepalived/logs/state_change.log sleep 2 ###給kill一個(gè)執(zhí)行和緩沖時(shí)間 ###重啟DB2的keepalived使VIP漂移到DB1 echo -e "`date "+%F %H:%M:%S"` -----make VIP move to DB1-----" >> /etc/keepalived/logs/state_change.log /sbin/service keepalived restart &>> /etc/keepalived/logs/state_change.log ###確保DB1已經(jīng)追上了,下面的repl為復(fù)制所用的賬戶,-h后跟DB1的內(nèi)網(wǎng)IP pos=`mysql -urepl -pxxxx -h192.168.1.x -e "show slave status\G;"|grep "Master_Log_Pos"|awk '{printf ("%s",$NF "\t")}'` read_pos=`echo $pos|awk '{print $1}'` exec_pos=`echo $pos|awk '{print $2}'` until [ $read_pos = $exec_pos ] do echo -e "`date "+%F %H:%M:%S"` -----DB1 Exec_Master_Log_Pos($exec_pos) is behind Read_Master_Log_Pos($read_pos), wait......" >> /etc/keepalived/logs/state_change.log sleep 1 done ###然后解除DB1的read_only屬性 echo -e "`date "+%F %H:%M:%S"` -----set read_only = 0 on DB1-----" >> /etc/keepalived/logs/state_change.log ssh -pxxxx 192.168.1.x 'mysql -uxxxx -pxxxx -e "set global read_only = 0;"' 2>> /etc/keepalived/logs/state_change.log echo "DB2 keepalived轉(zhuǎn)為BACKUP狀態(tài),線上數(shù)據(jù)庫(kù)切換至DB1"|/bin/mailx -s "DB2 keepalived change to BACKUP" xxx@xxxx.com 2>> /etc/keepalived/logs/state_change.log echo -e "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n" >> /etc/keepalived/logs/state_change.log
日志截圖:
DB1 mysql服務(wù)故障
DB1 mysql服務(wù)正常,查詢失敗
DB2 一次切換過(guò)程
DB2 執(zhí)行腳本手動(dòng)切回DB1:
總結(jié):此方案適用于中小型企業(yè),解決了主從中master節(jié)點(diǎn)的單點(diǎn)問題;同時(shí),在此基礎(chǔ)上,可以再增加從庫(kù)實(shí)現(xiàn)讀寫分離等架構(gòu)。
聲明:本網(wǎng)頁(yè)內(nèi)容旨在傳播知識(shí),若有侵權(quán)等問題請(qǐng)及時(shí)與本網(wǎng)聯(lián)系,我們將在第一時(shí)間刪除處理。TEL:177 7030 7066 E-MAIL:11247931@qq.com