在有一台 active data guard Crash 掉 , 查看一下 log 出現
WARNING: inbound connection timed out (ORA-3136)
Time: 19-DEC-2018 17:14:42
Wed Dec 19 17:14:42 2018
Errors in file /ORA/db/diag/rdbms/lg_st/LG/trace/LG_lgwr_189317.trc:
ORA-04021: timeout occurred while waiting to lock object
nt OS err code: 0
Tracing not turned on.
Tns error struct:
ns main err code: 12535
WARNING: inbound connection timed out (ORA-3136)
TNS-12535: TNS:operation timed out
ns secondary err code: 12606
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
WARNING: inbound connection timed out (ORA-3136)
LGWR (ospid: 189317): terminating the instance due to error 4021
在這之前出現大量
WARNING: inbound connection timed out (ORA-3136)
最主要是
LGWR (ospid: 189317): terminating the instance due to error 4021
LGWR 導致 DB Crash 掉
首先先看一下 LGWR trace log
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /ORA/db/11.2.4
System name: Linux
Node name: ora-74
Release: 2.6.32-696.el6.x86_64
Version: #1 SMP Tue Feb 21 00:53:17 EST 2017
Machine: x86_64
Instance name: LG
Redo thread mounted by this instance: 1
Oracle process number: 21
Unix process pid: 189317, image: oracle@ora-74 (LGWR)
*** 2018-12-19 17:14:42.853
*** SESSION ID:(1261.1) 2018-12-19 17:14:42.853
*** CLIENT ID:() 2018-12-19 17:14:42.853
*** SERVICE NAME:(SYS$BACKGROUND) 2018-12-19 17:14:42.853
*** MODULE NAME:() 2018-12-19 17:14:42.853
*** ACTION NAME:() 2018-12-19 17:14:42.853
error 4021 detected in background process
ORA-04021: timeout occurred while waiting to lock object
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5570<-ksbrdp()+3507<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
----- End of Abridged Call Stack Trace -----
*** 2018-12-19 17:14:42.929
LGWR (ospid: 189317): terminating the instance due to error 4021
ksuitm: waiting up to [5] seconds before killing DIAG(189287)
這幾個狀況 蠻符合 metalink 上說的
Bug 18242740 : ACTIVE STANDBY DATABASE CRASHED WITH ORA-4021
LGWR (ospid: Xxx): Terminating The Instance Due To Error 4021 (Doc ID 2177473.1)
LGWR需要獲取到解析鎖(parse lock)才能刷新,可能這個會話話費了大量時間解析。超過了設定時間,依然沒有獲取到解析鎖,就發生了ORA-4021崩潰
解決方法: 增加時間
alter system set "_adg_parselock_timeout" = 3000 scope = both ;
受影響的版本 :12.2 已下
也可透過官方 patch 包 修正
也可透過官方 patch 包 修正