Oracle DBA Career Journal: September 2017

In a Redis cluster test environment I noticed that the cluster state was unstable. Checking the log, I found a large number of messages like the following:

5009:S 11 Sep 00:09:00.086 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5009:S 11 Sep 00:11:11.086 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5009:S 11 Sep 00:16:05.098 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5009:S 11 Sep 00:16:30.030 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5009:S 11 Sep 00:21:37.687 # Cluster state changed: fail
5009:S 11 Sep 00:21:38.772 # Cluster state changed: ok
5009:S 11 Sep 00:21:38.810 * FAIL message received from 903ab5e1fb428082550d14de7b1bf97830bf8ad9 about 00b03db3d55d9c9045023d5e620c6f5d68ccab01
5009:S 11 Sep 00:21:41.368 # Start of election delayed for 754 milliseconds (rank #0, offset 33496646).
5009:S 11 Sep 00:21:41.368 # Cluster state changed: fail
5009:S 11 Sep 00:21:42.169 # Starting a failover election for epoch 72.
5009:S 11 Sep 00:21:42.248 # Failover election won: I'm the new master.
5009:S 11 Sep 00:21:42.248 # configEpoch set to 72 after successful failover
5009:M 11 Sep 00:21:42.248 # Connection with master lost.
5009:M 11 Sep 00:21:42.248 * Caching the disconnected master state.
5009:M 11 Sep 00:21:42.248 * Discarding previously cached master state.
5009:M 11 Sep 00:21:42.845 * Slave 10.22.21.186:8101 asks for synchronization
5009:M 11 Sep 00:21:42.845 * Full resync requested by slave 10.22.21.186:8101
5009:M 11 Sep 00:21:42.845 * Starting BGSAVE for SYNC with target: disk
5009:M 11 Sep 00:21:42.846 * Background saving started by pid 25548

After the following message had appeared over and over,

5009:S 11 Sep 00:16:30.030 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.

Redis started failing over, swapping the master and slave roles between nodes.
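To see how often the warning appears and whether it lines up with the failovers, the relevant messages can be tallied from the log. The following is a minimal Python sketch, assuming the log line layout shown above (pid:role, date, time, level marker, message); the function name summarize and the event labels are my own and are not part of Redis. Besides counting messages, it records when a process switches role, such as pid 5009 going from S (slave) to M (master).

```python
import re
import sys
from collections import Counter

# Assumed layout (Redis 3.x/4.x): "<pid>:<role> <day> <mon> <hh:mm:ss.mmm> <level> <message>"
# role:  M master, S slave, C child process, X sentinel
# level: "." debug, "-" verbose, "*" notice, "#" warning
LINE_RE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[MSCX]) "
    r"(?P<day>\d{1,2}) (?P<mon>\w{3}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[.\-*#]) (?P<msg>.*)$"
)

# Labels are hypothetical; the message fragments come from the log excerpt above.
EVENTS = {
    "aof_fsync_slow": "Asynchronous AOF fsync is taking too long",
    "cluster_fail": "Cluster state changed: fail",
    "cluster_ok": "Cluster state changed: ok",
    "failover_won": "Failover election won",
    "full_resync": "Full resync requested",
}


def summarize(lines):
    """Count the events in EVENTS and list role switches (e.g. S -> M) per pid."""
    counts = Counter()
    last_role = {}
    switches = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a Redis log line
        pid, role, msg = m["pid"], m["role"], m["msg"]
        for label, fragment in EVENTS.items():
            if fragment in msg:
                counts[label] += 1
        prev = last_role.get(pid)
        if prev is not None and prev != role:
            switches.append((m["time"], pid, prev, role))
        last_role[pid] = role
    return counts, switches


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts, switches = summarize(f)
    for label in EVENTS:
        print(f"{label:15} {counts[label]}")
    for time, pid, old, new in switches:
        print(f"{time}  pid {pid}: {old} -> {new}")
```

Run it with the log file path as the only argument. A burst of aof_fsync_slow shortly before cluster_fail and failover_won would suggest that disk latency, rather than the network, is what keeps pushing the cluster into failover.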

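Possible causes:

1. Slow queries -- the slowest entry in the slow log took only 0.064 seconds, so this is not the problem.

2. AOF file too large -- running bgrewriteaof reduced the file size, but the error message came back after about 10 minutes, although less often than before.

3. CPU busy -- the host was still 98% idle at the time, so this was ruled out.

4. Disk too slow -- this Redis cluster runs on a VM, and all of its instances sit on the same host, so their AOF writes all compete for the same disk.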
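To check item 4 directly, one can time fsync on the disk that holds the AOF file. Redis prints the warning above when the background fsync from the previous second is still running and it stops waiting for it, so long fsync times on that disk are what to look for. Below is a rough Python sketch, assuming the probe is run in the same directory as the AOF file (the dir setting in redis.conf) while the cluster is under its normal load; measure_fsync_latency and its parameters are illustrative names, and the 4 KB write size is an arbitrary choice.

```python
import os
import statistics
import sys
import tempfile
import time


def measure_fsync_latency(directory, writes=50, block_size=4096):
    """Append block_size bytes `writes` times, fsync after each append,
    and return the fsync latencies in milliseconds."""
    payload = os.urandom(block_size)
    latencies = []
    # The probe file lives in `directory`, i.e. on the same device as the AOF.
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # this is the call the AOF background thread waits on
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)  # leave nothing behind in the Redis data directory
    return latencies


if __name__ == "__main__" and len(sys.argv) > 1:
    lat = measure_fsync_latency(sys.argv[1])
    print(f"fsync ms: median={statistics.median(lat):.2f} max={max(lat):.2f}")
```

Because all instances share one VM, running the probe while they are busy shows roughly the latency they experience together; fsyncs that regularly take hundreds of milliseconds or more would be consistent with the warnings.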

What can be tuned is how often the AOF file is synced to disk.

Parameter: appendfsync

It accepts three values:

always

everysec

no

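The official explanation is:

always: write all contents of the aof_buf buffer to the AOF file and sync it to disk.

everysec: write all contents of the aof_buf buffer to the AOF file; if more than one second has passed since the AOF file was last synced, sync it again. The sync is performed by a dedicated thread.

no: write all contents of the aof_buf buffer to the AOF file, but do not sync it; the operating system decides when to sync.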
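The warning in the log comes from the everysec branch. Below is a simplified Python model of what I understand flushAppendOnlyFile() in Redis's aof.c (3.x/4.x) to do for each appendfsync value. It is not Redis code: the class, field, and function names only loosely mirror aof.c, and the two-second grace period is taken from my reading of that source. The point it illustrates is that with everysec, if the background fsync thread is still busy (for example on a slow, shared disk), Redis postpones the write for up to about two seconds, then writes anyway and logs the warning; with always or no, that branch is never taken.

```python
from dataclasses import dataclass

# Assumption: grace period before Redis stops waiting for a running
# background fsync (my reading of flushAppendOnlyFile in aof.c, 3.x/4.x).
POSTPONE_LIMIT_SEC = 2

SLOW_FSYNC_MSG = ("Asynchronous AOF fsync is taking too long (disk is busy?). "
                  "Writing the AOF buffer without waiting for fsync to complete, "
                  "this may slow down Redis.")


@dataclass
class AofState:
    policy: str               # "always", "everysec" or "no"
    buffer: bytes = b""       # plays the role of aof_buf
    written: bytes = b""      # bytes already handed to write(2)
    last_fsync: int = 0       # unix second of the last requested fsync
    postponed_since: int = 0  # 0 means the write is not being postponed
    delayed_fsync: int = 0    # how many times we wrote without waiting


def flush(state, now, bg_fsync_running, log):
    """Simulate one flush attempt at unix second `now` and return what happened."""
    if not state.buffer:
        return "nothing"
    if state.policy == "everysec" and bg_fsync_running:
        # write(2) could block behind the running fsync, so wait first...
        if state.postponed_since == 0:
            state.postponed_since = now
            return "postponed"
        if now - state.postponed_since < POSTPONE_LIMIT_SEC:
            return "postponed"
        # ...but after about two seconds give up waiting and write anyway.
        state.delayed_fsync += 1
        log(SLOW_FSYNC_MSG)
    state.postponed_since = 0
    state.written += state.buffer
    state.buffer = b""
    if state.policy == "always":
        state.last_fsync = now
        return "write+fsync"          # synchronous fsync in the main thread
    if state.policy == "everysec" and now > state.last_fsync:
        state.last_fsync = now
        if not bg_fsync_running:
            return "write+bg_fsync"   # fsync handed to a background thread
    # No fsync requested here: policy "no", or everysec already synced this
    # second, or the background thread is still busy.
    return "write"
```

In this model, switching to no removes the postpone branch entirely, which is why the warning goes away; the price is durability, since the kernel decides when data reaches the disk (the Redis persistence documentation notes this is typically about every 30 seconds on Linux with default settings).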

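If data safety is not critical, you can try setting this value to no, accepting that writes the OS has not yet flushed may be lost if the host crashes.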
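The setting can be changed without a restart by sending CONFIG SET appendfsync no to each instance (for example redis-cli -h <host> -p <port> CONFIG SET appendfsync no), followed by CONFIG REWRITE if the change should also be written back to redis.conf. Every node in the cluster keeps its own AOF, so this has to be done per instance. For machines without redis-cli, here is a minimal standard-library Python sketch that speaks the Redis protocol (RESP) directly; encode_command and send_command are hypothetical helpers, and the reply handling only covers single-line replies such as +OK or -ERR.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def send_command(host, port, *args, timeout=5.0):
    """Send one command and return its single-line reply, e.g. b'+OK'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command(*args))
        reply = b""
        # Simple replies (+OK / -ERR ...) end with CRLF; stop reading there.
        while not reply.endswith(b"\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection
            reply += chunk
        return reply.rstrip(b"\r\n")
```

For example, send_command("10.22.21.186", 8101, "CONFIG", "SET", "appendfsync", "no") targets the slave seen in the log above; the remaining host:port pairs have to come from your own cluster (CLUSTER NODES lists them).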

Below is the official Redis guide to troubleshooting latency:

https://redis.io/topics/latency


Sunday, September 10, 2017

Redis cluster keeps crashing
