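This morning, shortly after I got to the office, I received a call from a customer in Nanjing: their core membership database had gone down, and they asked me to take a look remotely. I had not even had time to read the newspaper, so I opened my laptop right away and connected to the customer's server to diagnose the problem.
The customer's production environment is Oracle 11.2.0.3.0 on AIX 6.1, and the failure occurred at around 1:40 a.m. To avoid exposing customer information, the database instance name has been replaced. The detailed analysis follows:
1. Analysis of the database alert.log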
Mon Jan 05 01:40:50 2015
WARNING: ASM communication error: op 18 state 0x50 (3113)
ERROR: slave communication error with ASM
NOTE: Deferred communication with ASM instance
Errors in file /u01/app/oracle/diag/rdbms/test5/test5/trace/test5_ora_16581034.trc:
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 288 Serial number: 5649
NOTE: deferred map free for map id 4422
Mon Jan 05 01:40:55 2015
NOTE: ASMB terminating
Mon Jan 05 01:40:55 2015
***********************************************************************
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Time: 05-JAN-2015 01:40:55
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS:operation timed out
ns secondary err code: 12606
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=test2)(PORT=64460))
WARNING: inbound connection timed out (ORA-3136)
Mon Jan 05 01:40:55 2015
***********************************************************************
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Time: 05-JAN-2015 01:40:55
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS:operation timed out
ns secondary err code: 12606
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Mon Jan 05 01:40:55 2015
***********************************************************************
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=test2)(PORT=64530))
Errors in file /u01/app/oracle/diag/rdbms/test5/test5/trace/test5_asmb_5898342.trc:
ORA-15064: communication failure with ASM instance
ORA-03135: connection lost contact
Process ID:
Session ID: 216 Serial number: 7
Fatal NI connect error 12170.
WARNING: inbound connection timed out (ORA-3136)
Errors in file /u01/app/oracle/diag/rdbms/test5/test5/trace/test5_asmb_5898342.trc:
ORA-15064: communication failure with ASM instance
ORA-03135: connection lost contact
Process ID:
Session ID: 216 Serial number: 7
VERSION INFORMATION:
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Time: 05-JAN-2015 01:40:55
Tracing not turned on.
Tns error struct:
ASMB (ospid: 5898342): terminating the instance due to error 15064
ns main err code: 12535
TNS-12535: TNS:operation timed out
Mon Jan 05 01:40:55 2015
***********************************************************************
ns secondary err code: 12606
Fatal NI connect error 12170.
nt main err code: 0
VERSION INFORMATION:
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
nt secondary err code: 0
Time: 05-JAN-2015 01:40:55
Tracing not turned on.
nt OS err code: 0
Mon Jan 05 01:40:56 2015
System state dump requested by (instance=1, osid=5898342 (ASMB)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/test5/test5/trace/test5_diag_20578640.trc
Mon Jan 05 01:40:59 2015
ORA-1092 : opitsk aborting process
Mon Jan 05 01:41:00 2015
ORA-1092 : opitsk aborting process
Instance terminated by ASMB, pid = 5898342
Mon Jan 05 08:46:45 2015
Starting ORACLE instance (normal)
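Because several processes wrote to the alert.log at the same moment, the excerpt above is interleaved and hard to read by eye. As a rough aid, the following is a minimal Python sketch that counts the ORA-/TNS- error codes and picks out the instance-termination lines. The function name `summarize_alert_log` and the abridged `sample` text are illustrative assumptions and not part of any Oracle tooling. Each termination line is paired with the most recent timestamp line above it, which is only an approximation when output is interleaved.

```python
import re
from collections import Counter

# Timestamp lines in this alert.log look like "Mon Jan 05 01:40:50 2015".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
# Error codes such as ORA-03113 or TNS-12535 (ORA-1092 has only four digits).
ERR_RE = re.compile(r"\b(?:ORA|TNS)-\d{3,5}\b")


def summarize_alert_log(text):
    """Hypothetical helper: return (error_counts, termination_events)."""
    current_ts = None
    counts = Counter()
    terminations = []
    for raw in text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            # Remember the latest timestamp seen so far.
            current_ts = line
            continue
        counts.update(ERR_RE.findall(line))
        if ("terminating the instance" in line
                or line.startswith("Instance terminated by")):
            terminations.append((current_ts, line))
    return counts, terminations


# Abridged lines copied from the excerpt above (not the full log).
sample = """\
Mon Jan 05 01:40:50 2015
WARNING: ASM communication error: op 18 state 0x50 (3113)
ORA-03113: end-of-file on communication channel
Mon Jan 05 01:40:55 2015
NOTE: ASMB terminating
ORA-15064: communication failure with ASM instance
ORA-03135: connection lost contact
ASMB (ospid: 5898342): terminating the instance due to error 15064
Mon Jan 05 01:41:00 2015
ORA-1092 : opitsk aborting process
Instance terminated by ASMB, pid = 5898342
"""

counts, terminations = summarize_alert_log(sample)
for code, n in sorted(counts.items()):
    print(code, n)
for ts, line in terminations:
    print(ts, "|", line)
```

On a real system the same function could be fed the contents of the alert log file; the sample here is only meant to show that the ASMB process reporting ORA-15064 is what brings the instance down.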