Sunday, September 1, 2019

ORA-00368: checksum error in redo log bloc

Issue: The Archive Logs on the Physical Standby are corrupted while shipping,

Error Details found in the Alert Log:
Completed: alter database recover managed standby database using current logfile disconnect from session
Mon Aug 05 14:04:57 2019
Media Recovery Log +RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867
Incomplete read from log member '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'. Trying next member.
Incomplete read from log member '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'. Trying next member.
Errors in file /u01/app/oracle/product/diag/rdbms/trace/XXXXX_pr00_29975.trc  (incident=16775):
ORA-00353: log corruption near block 513423 change 437172397 time 07/28/2019 18:00:15
ORA-00334: archived log: '+RECO/XXXXXDR/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'
Incident details in: /u01/app/oracle/product/diag/rdbms/incident/incdir_16775/XXXXX_pr00_29975_i16775.trc
Errors with log +RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867
MRP0: Background Media Recovery terminated with error 368
Mon Aug 05 14:04:59 2019
Errors in file /u01/app/oracle/product/diag/rdbms/trace/XXXXX_pr00_29975.trc:
ORA-00368: checksum error in redo log block
ORA-00353: log corruption near block 513423 change 437172397 time 07/28/2019 18:00:15
ORA-00334: archived log:
'+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'
Mon Aug 05 14:04:59 2019
Errors in file /u01/app/oracle/product/diag/rdbms/trace/XXXXX_m001_136.trc:
ORA-48132: requested file lock is busy, [stg2_16775_inc.swp] [/u01/app/oracle/product/diag/rdbms/lck/SW_16775_1.lck]
ORA-48170: unable to lock file - already in use
SVR4 Error: 11: Resource temporarily unavailable
Additional information: 8
Additional information: 134
Mon Aug 05 14:04:59 2019
Sweep [inc][16775]: completed
Sweep [inc2][16775]: completed
Mon Aug 05 14:04:59 2019
MRP0: Background Media Recovery process shutdown ()
Mon Aug 05 14:05:00 2019
Dumping diagnostic data in directory=[cdmp_20190805140500], requested by (instance=1, osid=4294997271 (PR00)), summary=[incident=16775].
Mon Aug 05 14:05:01 2019
Incomplete read from log member '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'. Trying next member.
Incomplete read from log member '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'. Trying next member.
Errors in file /u01/app/oracle/product/diag/rdbms/trace/XXXXX_m000_134.trc  (incident=17347):
ORA-00353: log corruption near block 513423 change 436876222 time 07/28/2019 18:00:15
ORA-00334: archived log: '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'
Incident details in: /u01/app/oracle/product/diag/rdbms/incident/incdir_17347/XXXXX_m000_134_i17347.trc
Mon Aug 05 14:05:03 2019
Dumping diagnostic data in directory=[cdmp_20190805140503], requested by (instance=1, osid=4294967430 (M000)), summary=[incident=17347].
Mon Aug 05 14:05:43 2019
DMON: NSV0 network call timeout. Killing it now.
Starting background process NSV0
Mon Aug 05 14:05:44 2019
NSV0 started with pid=47, OS id=143
Mon Aug 05 14:06:02 2019
Sweep [inc][17347]: completed
Sweep [inc2][17347]: completed

Troubleshooting:

Troubleshooting:

1. Based on the Logs collected, to ensure the datafiles and archivelogs are not corrupted, we verified the integrity using below methods.
a. Verified the Database and Datafile Status using the RMAN utility
RMAN>backup validate database;

b. Verified the Archive Logs status using the "Validate
RMAN>backup validate archivelog all;

c. Verified using query.
SQL> select * from v$database_block_corruptions.

2. Since there are no corruptions in the Primary archive logs we decided to copy the archivelogs from the primary site to the physical standby using a secured protocol where no firewall is in place, and registered the archivelog and started the MRP then the registered archivelog applied successfully but the consecutive archives failed to apply on the Physical Standby Database.

SQL> select * from v$archive_gap

Or you can use the v$managed_standby view to find where the log apply stuck.

SQL> select sequence#,process,status from v$managed_standby;

Copy the logs to the standby site from the primary site

Using the below command

$ scp log_file_name_n.arc oracle@standby:/log/file/location/log_file_name_n.arc

At standby site, Do the log file registration at the standby site until all the missing log files are registered, Use this below command.

SQL> alter database register logfile '/log/file/location/log_file_name_n.arc';

Cause: The Network Packets are getting corrupted during the log shipping due to the Network Firewall Rules,

Firewall caused partial archive log transferred.

The contents of the mrp trace file show:
The other common error is ORA-3135 and recommend you to check this cause and solution.

ORA-03135: connection lost contact when shipping redo log to standby database
ORA-03135: connection lost contact
ARC2: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)

Solution:

Disable any features like below which are enabled on the firewall, or you 

- SQLNet fixup protocol 
- Deep Packet Inspection (DPI) 
- SQLNet packet inspection 
- SQL Fixup 
- SQL ALG (Juniper firewall)