rkorclappsdbastuff’s

Sunday, September 1, 2019

ORA-00368: checksum error in redo log block

Issue: Archive logs shipped to the physical standby arrive corrupted, so managed recovery (MRP) fails.

Error Details found in the Alert Log:
Completed: alter database recover managed standby database using current logfile disconnect from session
Mon Aug 05 14:04:57 2019
Media Recovery Log +RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867
Incomplete read from log member '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'. Trying next member.
Incomplete read from log member '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'. Trying next member.
Errors in file /u01/app/oracle/product/diag/rdbms/trace/XXXXX_pr00_29975.trc (incident=16775):
ORA-00353: log corruption near block 513423 change 437172397 time 07/28/2019 18:00:15
ORA-00334: archived log: '+RECO/XXXXXDR/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'
Incident details in: /u01/app/oracle/product/diag/rdbms/incident/incdir_16775/XXXXX_pr00_29975_i16775.trc
Errors with log +RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867
MRP0: Background Media Recovery terminated with error 368
Mon Aug 05 14:04:59 2019
Errors in file /u01/app/oracle/product/diag/rdbms/trace/XXXXX_pr00_29975.trc:
ORA-00368: checksum error in redo log block
ORA-00353: log corruption near block 513423 change 437172397 time 07/28/2019 18:00:15
ORA-00334: archived log:
'+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'
Mon Aug 05 14:04:59 2019
Errors in file /u01/app/oracle/product/diag/rdbms/trace/XXXXX_m001_136.trc:
ORA-48132: requested file lock is busy, [stg2_16775_inc.swp] [/u01/app/oracle/product/diag/rdbms/lck/SW_16775_1.lck]
ORA-48170: unable to lock file - already in use
SVR4 Error: 11: Resource temporarily unavailable
Additional information: 8
Additional information: 134
Mon Aug 05 14:04:59 2019
Sweep [inc][16775]: completed
Sweep [inc2][16775]: completed
Mon Aug 05 14:04:59 2019
MRP0: Background Media Recovery process shutdown ()
Mon Aug 05 14:05:00 2019
Dumping diagnostic data in directory=[cdmp_20190805140500], requested by (instance=1, osid=4294997271 (PR00)), summary=[incident=16775].
Mon Aug 05 14:05:01 2019
Incomplete read from log member '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'. Trying next member.
Incomplete read from log member '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'. Trying next member.
Errors in file /u01/app/oracle/product/diag/rdbms/trace/XXXXX_m000_134.trc (incident=17347):
ORA-00353: log corruption near block 513423 change 436876222 time 07/28/2019 18:00:15
ORA-00334: archived log: '+RECO/ARCHIVELOG/2019_08_05/thread_1_seq_447.741.1015502867'
Incident details in: /u01/app/oracle/product/diag/rdbms/incident/incdir_17347/XXXXX_m000_134_i17347.trc
Mon Aug 05 14:05:03 2019
Dumping diagnostic data in directory=[cdmp_20190805140503], requested by (instance=1, osid=4294967430 (M000)), summary=[incident=17347].
Mon Aug 05 14:05:43 2019
DMON: NSV0 network call timeout. Killing it now.
Starting background process NSV0
Mon Aug 05 14:05:44 2019
NSV0 started with pid=47, OS id=143
Mon Aug 05 14:06:02 2019
Sweep [inc][17347]: completed
Sweep [inc2][17347]: completed
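When there are several incidents to triage, a small script can pull the affected archive logs out of the alert log. The sketch below is a minimal Python example, not an Oracle tool. It pairs each ORA-00353 line with the ORA-00334 line that follows it, as in the excerpt above, and extracts the thread, sequence and corrupt block numbers. It assumes archive log names contain `thread_<n>_seq_<n>`, as the ASM file names above do.

```python
import re
from collections import defaultdict

# ORA-00353 gives the corrupt block; the ORA-00334 that follows names the
# archived log. The file name may wrap onto the next line, hence \s*.
PAIR_RE = re.compile(
    r"ORA-00353: log corruption near block (\d+).*?"
    r"ORA-00334: archived log:\s*'([^']+)'",
    re.DOTALL,
)
THREAD_SEQ_RE = re.compile(r"thread_(\d+)_seq_(\d+)")


def corrupted_archivelogs(alert_text):
    """Return {(thread#, sequence#): sorted list of corrupt block numbers}."""
    found = defaultdict(set)
    for block, path in PAIR_RE.findall(alert_text):
        match = THREAD_SEQ_RE.search(path)
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            found[key].add(int(block))
    return {key: sorted(blocks) for key, blocks in found.items()}
```

Run against the excerpt above, it reports thread 1, sequence 447 with block 513423. That is the same log MRP stopped on.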

Troubleshooting:

1. Based on the logs collected, we used the following methods to verify that the datafiles and archive logs on the primary were not corrupted.
a. Verified the database and datafile status using RMAN:
RMAN> backup validate database;

b. Verified the archive logs using BACKUP VALIDATE:
RMAN> backup validate archivelog all;

c. Checked for recorded block corruptions with the following query:
SQL> select * from v$database_block_corruption;

2. Since the archive logs on the primary showed no corruption, we copied the affected archive logs from the primary site to the physical standby. We used a secure copy protocol over a path with no firewall in between, registered the copied archive log, and restarted MRP. The registered archive log applied successfully, but the subsequent archive logs failed to apply on the physical standby database.

To identify the missing sequences on the standby, query V$ARCHIVE_GAP:

SQL> select * from v$archive_gap;

Alternatively, use the V$MANAGED_STANDBY view to find where log apply is stuck:

SQL> select sequence#,process,status from v$managed_standby;

Copy the missing logs from the primary site to the standby site using the command below:

$ scp log_file_name_n.arc oracle@standby:/log/file/location/log_file_name_n.arc

On the standby site, register each copied log file with the command below, repeating until all the missing log files are registered:

SQL> alter database register logfile '/log/file/location/log_file_name_n.arc';
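When the gap spans several sequences, typing each command by hand is error-prone. The Python sketch below builds the scp and register commands for every sequence between LOW_SEQUENCE# and HIGH_SEQUENCE# of a V$ARCHIVE_GAP row. The file-name pattern `arch_{thread}_{seq}.arc` is hypothetical; substitute whatever your LOG_ARCHIVE_FORMAT produces. The destination directory and `oracle@standby` are the placeholders from the commands above.

```python
def gap_commands(thread, low_seq, high_seq,
                 name_fmt="arch_{thread}_{seq}.arc",  # hypothetical naming scheme
                 dest_dir="/log/file/location",
                 target="oracle@standby"):
    """Build scp commands (run on the primary, from the archive directory)
    and REGISTER LOGFILE statements (run on the standby) for every
    sequence in [low_seq, high_seq]."""
    if low_seq > high_seq:
        raise ValueError("low_seq must not be greater than high_seq")
    names = [name_fmt.format(thread=thread, seq=seq)
             for seq in range(low_seq, high_seq + 1)]
    scp = ["scp {0} {1}:{2}/{0}".format(n, target, dest_dir) for n in names]
    sql = ["alter database register logfile '{}/{}';".format(dest_dir, n)
           for n in names]
    return scp, sql
```

For example, `gap_commands(1, 447, 449)` returns three scp commands and three register statements, one pair per missing sequence.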

Cause: Network packets were being corrupted during log shipping by the network firewall rules.

The firewall caused archive logs to be only partially transferred.

Another common error with the same cause is ORA-03135. If you see it, check for this cause and apply the same solution:

ORA-03135: connection lost contact when shipping redo log to standby database
ORA-03135: connection lost contact
ARC2: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)

Solution:

Disable any of the following features that are enabled on the firewall:

- SQLNet fixup protocol

- Deep Packet Inspection (DPI)

- SQLNet packet inspection

- SQL Fixup

- SQL ALG (Juniper firewall)

Saturday, March 9, 2019

Issue: emcli login failed

[oemapp@xxxxx emcli]$ emcli login -username="sysman" -password="oracle123"
Error: Connection to the current OMS could not be established. Check the log files for further details.
Log file location is : /mwhome/app/gc_inst/em/EMGC_OMS1/sysman/emcli/setup/.emcli/.emcli.log
[oemapp@xxxxx emcli]$ pwd
/mwhome/app/gc_inst/em/EMGC_OMS1/sysman/emcli
[oemapp@xxxxx emcli]$ cd ../../..
[oemapp@xxxxx em]$ ls
EMGC_OMS1

Cause:
emcli.log:
========

Mar 06, 2019 2:05:14 PM oracle.sysman.emCLI.omsbrowser.OMSBrowser submitRequest
SEVERE: IOException in 5(5) try of submitRequest to /em/console/cli :
Mar 06, 2019 2:05:14 PM oracle.sysman.emSDK.emCLI.verb.RemoteVerb execute
SEVERE:
oracle.sysman.emCLI.omsbrowser.HttpConnectionException:
at oracle.sysman.emCLI.omsbrowser.OMSBrowser.submitRequest2(OMSBrowser.java:1797)
at oracle.sysman.emCLI.omsbrowser.OMSBrowser.submitRequest(OMSBrowser.java:1607)
at oracle.sysman.emCLI.omsbrowser.OMSBrowser.submitCommand(OMSBrowser.java:1507)
at oracle.sysman.emCLI.omsbrowser.OMSBrowser.getPageInternal(OMSBrowser.java:948)
at oracle.sysman.emCLI.omsbrowser.OMSBrowser.getPageCommon(OMSBrowser.java:854)
at oracle.sysman.emCLI.omsbrowser.OMSBrowser.getPageSplitStream(OMSBrowser.java:828)
at oracle.sysman.emSDK.emCLI.verb.RemoteVerb.execute(RemoteVerb.java:273)
at oracle.sysman.install.emcli.GetPlatformClientVerb.execute(GetPlatformClientVerb.java:53)
at oracle.sysman.emSDK.emCLI.CLIController.execute(CLIController.java:367)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at oracle.sysman.emCLI.StandAloneLaunchHandler.invoke(StandAloneLaunchHandler.java:413)
at oracle.sysman.emCLI.StandAloneLaunchHandler.launch(StandAloneLaunchHandler.java:286)
at oracle.sysman.emSDK.emCLI.CLIController.launch(CLIController.java:255)
at oracle.sysman.emSDK.emCLI.CLIController.main(CLIController.java:207)
Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2020)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1127)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:750)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:167)
at HTTPClient.HTTPConnection.sendRequest(HTTPConnection.java:3524)
at HTTPClient.HTTPConnection.handleRequest(HTTPConnection.java:3436)
at HTTPClient.HTTPConnection$10.run(HTTPConnection.java:3187)
at HTTPClient.HTTPConnection$10.run(HTTPConnection.java:3178)
at HTTPClient.HttpClientConfiguration.doAction(HttpClientConfiguration.java:1083)
at HTTPClient.HTTPConnection.doAction(HTTPConnection.java:5616)
at HTTPClient.HTTPConnection.setupRequest(HTTPConnection.java:3178)
at HTTPClient.HTTPConnection.Post(HTTPConnection.java:1131)
at HTTPClient.HTTPConnection.Post(HTTPConnection.java:1085)
at oracle.sysman.emCLI.omsbrowser.OMSBrowser.submitRequest2(OMSBrowser.java:1760)
... 16 more

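Some JDK 1.8 updates do not allow communication between EM CLI and the OMS. The SSL handshake fails, as shown by "javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure" in the log above.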

Follow the solution in the MOS note below:

EM13c : Emcli Fails While Login And Sync - Error: Connection to the current OMS could not be established ( Doc ID 2392010.1 )

Action Plan:

[root@xxxxx opt]# tar xvf /mwhome/app/patches/jdk-7u211-linux-x64.tar.gz
[root@xxxxx opt]# cd jdk1.8.0_201/
[root@xxxxx jdk1.8.0_201]# alternatives --install /usr/bin/java java /opt/jdk1.7.0_211/bin/java 2
[root@xxxxx jdk1.8.0_201]# alternatives --config java

There are 3 programs which provide 'java'.

Selection Command
-----------------------------------------------
1 java-1.7.0-openjdk.x86_64 (/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.201-2.6.16.1.0.1.el7_6.x86_64/jre/bin/java)
* 2 java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre/bin/java)
+ 3 /opt/jdk1.8.0_201/bin/java

Enter to keep the current selection[+], or type selection number:
[root@xxxxx jdk1.8.0_201]# alternatives --install /usr/bin/jar jar /opt/jdk1.7.0_211/bin/jar 2
[root@xxxxx jdk1.8.0_201]# alternatives --install /usr/bin/javac javac /opt/jdk1.7.0_211/bin/javac 2
[root@xxxxx jdk1.8.0_201]# alternatives --set jar /opt/jdk1.7.0_211/bin/jar
[root@xxxxx jdk1.8.0_201]# alternatives --set javac /opt/jdk1.7.0_211/bin/javac
[root@xxxxx jdk1.8.0_201]# java -version
[root@xxxxx jdk1.8.0_201]# su - oemapp
[oemapp@xxxxx ~]$ export JAVA_HOME=/mwhome/app/oemapp/oracle_common/jdk/jre
[oemapp@xxxxx ~]$ export JRE_HOME=/mwhome/app/oemapp/oracle_common/jdk/jre
[oemapp@xxxxx ~]$ export PATH=$PATH:/mwhome/app/oemapp/oracle_common/jdk/bin:/mwhome/app/oemapp/oracle_common/jdk/jre/bin

[oemapp@xxxxx jre]$ /mwhome/app/oemapp/oracle_common/jdk/jre/bin/java -version
java version "1.7.0_111"
Java(TM) SE Runtime Environment (build 1.7.0_111-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.111-b13, mixed mode)

[oemapp@xxxxx jre]$ emcli status
Oracle Enterprise Manager 13c Release 2 EM CLI.
Copyright (c) 1996, 2016 Oracle Corporation and/or its affiliates. All rights reserved.

Instance Home : /mwhome/app/gc_inst/em/EMGC_OMS1/sysman/emcli/setup/.emcli
Verb Jars Home : /mwhome/app/oemapp/bin/./bindings/13.2.0.0.0/.emcli
Status : Configured
EM CLI Home : /mwhome/app/oemapp/bin
EM CLI Version : 13.2.0.0.0
Java Home : /mwhome/app/oemapp/oracle_common/jdk/jre
Java Version : 1.7.0_111
Log file : /mwhome/app/gc_inst/em/EMGC_OMS1/sysman/emcli/setup/.emcli/.emcli.log
Log level : SEVERE
EM URL : https://xxxxx.xxxxx.net:7803/em
EM user : sysman
Auto login : false
Trust all certificates : true

[oemapp@xxxxx jre]$
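To confirm which JDK EM CLI is actually using, rather than relying on the shell's `java -version`, you can parse the `emcli status` output. Here is a minimal Python sketch, assuming the `Key : Value` layout shown above. The helper names are hypothetical.

```python
def parse_emcli_status(text):
    """Turn the 'Key : Value' lines of `emcli status` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first " : " only, so URLs such as https://host:7803
        # (colons without surrounding spaces) stay intact.
        key, sep, value = line.partition(" : ")
        if sep:
            info[key.strip()] = value.strip()
    return info


def java_release(version):
    """'1.7.0_111' -> (1, 7); assumes the legacy 1.x version scheme."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)
```

For the output above, `java_release(info["Java Version"])` returns `(1, 7)`. This confirms EM CLI runs on the JDK under oracle_common rather than a system JDK 1.8 update.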

Set up EM CLI
##########

[oemapp@xxxxx jre]$ emcli setup -url=https://xxxxx.xxxxx.net:7803/em -username="sysman" -password="oracle123" -trustall
Oracle Enterprise Manager 13c Release 2.
Copyright (c) 1996, 2016 Oracle Corporation and/or its affiliates. All rights reserved.

Emcli setup successful

[oemapp@xxxxx jre]$ emcli get_supported_platforms
-----------------------------------------------
Version = 13.2.0.0.0
Platform = Linux x86-64
-----------------------------------------------
Version = 13.2.0.0.0
Platform = Microsoft Windows x64 (64-bit)
-----------------------------------------------
Version = 13.2.0.0.0
Platform = Oracle Solaris on SPARC (64-bit)
-----------------------------------------------
Platforms list displayed successfully.

[oemapp@xxxxx jre]$

Thursday, July 19, 2018

ORA-27125 When attempting to start 12c Database On Solaris 11

Issue: When attempting to start a 12c database on Solaris 11, we hit the following error:

ORA-27125: unable to create shared memory segment
Invalid Argument.

Cause: project.max-shm-memory was undersized. It was set to 8 GB on a server with 200 GB of RAM.

Solution:

Increase project.max-shm-memory to a value greater than 1/4 of physical memory.

The project.max-shm-memory resource control limits the total amount of shared memory that one project can use.
project.max-shm-memory should be set larger than the sum of all shared memory segments used by the project.

If project.max-shm-memory is not set but shmmax is defined, the system uses shmmax * shmmni as max-shm-memory. If shmmax is also undefined, the default value is 1/4 of physical memory:

http://docs.oracle.com/cd/E19082-01/819-2724/6n50b0793/index.html
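The precedence described above can be written as a small function. This Python sketch illustrates the rule as stated in this post: an explicit project.max-shm-memory wins, otherwise shmmax * shmmni, otherwise 1/4 of physical memory. It does not query the system, and the function names are hypothetical.

```python
GIB = 1024 ** 3


def effective_max_shm_memory(physical_bytes, project_limit=None,
                             shmmax=None, shmmni=None):
    """Shared-memory limit for a project, per the precedence above."""
    if project_limit is not None:
        # project.max-shm-memory set explicitly (8 GB in this case).
        return project_limit
    if shmmax is not None and shmmni is not None:
        # Only shmmax defined: limit derived as shmmax * shmmni.
        return shmmax * shmmni
    # Neither set: default is 1/4 of physical memory.
    return physical_bytes // 4


def should_increase(project_limit, physical_bytes):
    """Per this post's guidance: raise the limit if it is not greater
    than 1/4 of physical memory."""
    return project_limit <= physical_bytes // 4
```

For the server in this post, `should_increase(8 * GIB, 200 * GIB)` is True, because 8 GB is well below the 50 GB quarter of RAM.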

For example, to set project.max-shm-memory to 10 GB for the user.dba project, you can use either the 'prctl' command or the 'projmod' command. Changes made with 'prctl' last only until the next system reboot; use 'projmod' to make changes that persist across reboots.

The following command requires a reboot:

projmod -sK "project.max-shm-memory=(privileged,10G,deny)" user.dba

To modify the value only until the next system reboot:

prctl -n project.max-shm-memory -v 10gb -r -i project user.dba

Thursday, July 12, 2018

What is the Cluster Time Synchronization Service (CTSS)?

The Cluster Time Synchronization Service (CTSS) is one of the new processes in the Oracle Clusterware 11gR2 layer. CTSS is installed during the Grid Infrastructure installation.

The daemon is responsible for time management within the cluster, making sure each node in the cluster uses the same time. Before doing this, CTSS first determines whether a Network Time Protocol (NTP) daemon is running on the system. If not, CTSS runs in active mode and performs the same task as the NTP daemon. If an NTP daemon is running, CTSS is still started, but in observer mode. Oracle implemented CTSS to make sure time management within the cluster is always performed; CTSS is Oracle's implementation of the NTP daemon that OS administrators normally set up.

The first node in the cluster on which CTSS starts becomes the master time manager. The other CTSS daemons communicate with the master CTSS and validate the time. If a time difference between hosts in the cluster is detected, CTSS adjusts the time, similar to the NTP daemon. CTSS never moves the time backward. Time differences are reported in the alert.log. If the time difference between hosts during startup is too large (1000 msec), Oracle Clusterware will not start on the newly joined nodes. An alert is written to the alert.log of the Oracle Clusterware home, e.g. (/u01/app/11.2.0/grid/log//alert.log). In that case you need to correct the time manually and then start Oracle Clusterware.

Solution

How does Oracle Clusterware decide to start CTSS in observe or active mode?

CTSS is a process that runs as root on each node. As soon as Oracle Clusterware starts, the CTSS daemon checks whether the /etc/ntp.conf file exists; if it does, CTSS runs in observer mode. To determine whether the NTP daemon is actually active, use the Cluster Verification Utility (cluvfy).

root 3582 1 0 11:24 ? 00:00:12 /u01/app/11.2.0/grid/bin/octssd.bin reboot
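The mode decision described above comes down to a file-existence test. Here is a minimal Python sketch, assuming the 11gR2 behaviour described in this post: only the presence of /etc/ntp.conf is checked, not whether ntpd is actually running. The function names are hypothetical.

```python
import os


def ctss_mode_for(ntp_conf_exists):
    """Observer when an NTP configuration file exists, otherwise Active."""
    return "Observer" if ntp_conf_exists else "Active"


def current_ctss_mode(ntp_conf="/etc/ntp.conf"):
    # Mirrors the check only; it says nothing about whether ntpd is healthy.
    return ctss_mode_for(os.path.exists(ntp_conf))
```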

How to validate CTSS is running in observer mode or active mode?

The cluster-level alert.log reports the status, as does the trace file. However, the two easiest ways to check are crsctl and cluvfy. crsctl tells you whether CTSS is running and, when the role is Active, reports the offset in msec. cluvfy reports much more information. Examples:

$ crsctl check ctss
$ cluvfy comp clocksync -n all
$ cluvfy comp clocksync -verbose

These commands report whether CTSS is running and then its current mode, active or observer. If the mode is active, they also report whether a time synchronization issue exists.

Sample output using CRSCTL:

[oracle@server1 log]$ /u01/app/11.2.0/grid/bin/crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
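If you want to monitor CTSS from a script, the crsctl output is easy to parse. Here is a minimal Python sketch. The regular expressions assume the message wording shown in the sample above and deliberately do not depend on the CRS message number.

```python
import re

MODE_RE = re.compile(r"Cluster Time Synchronization Service is in (\w+) mode")
OFFSET_RE = re.compile(r"Offset \(in msec\): (-?\d+(?:\.\d+)?)")


def parse_crsctl_ctss(output):
    """Return (mode, offset_msec) from `crsctl check ctss` output.
    offset_msec is None when no offset line is present."""
    mode = MODE_RE.search(output)
    offset = OFFSET_RE.search(output)
    return (mode.group(1) if mode else None,
            float(offset.group(1)) if offset else None)
```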

Sample output with CTSS in observer mode but NTP not active:

[oracle@server1 server1]$ cluvfy comp clocksync
Verifying Clock Synchronization across the cluster nodes

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...

CTSS resource check passed

Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...

CTSS is in Observer state. Switching over to clock synchronization checks using NTP

Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
NTP Configuration file check passed

Checking daemon liveness...
Liveness check failed for "ntpd"
Check failed on nodes: server1

PRVF-5415: Check to see if NTP daemon is running failed
Clock synchronization check using Network Time Protocol(NTP) failed
PRVF-9652: Cluster Time Synchronization Services check failed

Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes.

Here we see error PRVF-5415 followed by PRVF-9652, indicating an issue with NTP. This is expected, as the NTP daemon was not running.

Sample output with CTSS in observer mode but NTP not active, in verbose mode:
[oracle@server1]$ cluvfy comp clocksync -verbose
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes

  Node Name                             Status
  ------------------------------------  ------------------------
  server1                               passed
Result: CTSS resource check passed

Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
  Node Name                             State
  ------------------------------------  ------------------------
  server1                               Observer

CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed

Checking daemon liveness...
Check: Liveness for "ntpd"
  Node Name                             Running?
  ------------------------------------  ------------------------
  server1                               no
Result: Liveness check failed for "ntpd"
PRVF-5415 : Check to see if NTP daemon is running failed
Result: Clock synchronization check using Network Time Protocol(NTP) failed
PRVF-9652 : Cluster Time Synchronization Services check failed

Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes.

Same result as above, but here it is easier to see the state and that NTP is not active.

Sample output when CTSS is in active mode using cluvfy:

[oracle@server1 ~]$ cluvfy comp clocksync -verbose

Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
NodeName                            Status
------------------------------------  ------------------------
server1                              passed
Result: CTSS resource check passed

Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
  NodeName                            State
  ------------------------------------  ------------------------
  server1                              Active
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Reference Time Offset Limit: 1000.0 msecs
Check: Reference Time Offset
  Node Name     TimeOffset              Status
  ------------  ------------------------ ------------------------
  server1      0.0                      passed

Time offset is within the specified limits on the following set of nodes:
"[server1]"
Result: Check of clock time offsets passed

Oracle Cluster Time Synchronization Services check passed

Verification of Clock Synchronization across the cluster nodes was successful.
[oracle@server1 ~]$

Sample output when time offset is violated using cluvfy:

[oracle@server1 ~]$ cluvfy comp clocksync -n all
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...

CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
PRVF-9661 : Time offset is NOT within the specified limits on the following nodes:"[server2]"

PRVF-9652 : Cluster Time Synchronization Services check failed

Verification of Clock Synchronization across the cluster nodes was unsuccessful.
Checks did not pass for the following node(s): server2

[oracle@server2 ~]$ crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 13700
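The offset check can be sketched in Python as a comparison of each node's offset with the 1000 msec reference limit reported by cluvfy earlier. Using the absolute offset and a strict `>` comparison are assumptions on my part, not documented cluvfy internals.

```python
# "Reference Time Offset Limit: 1000.0 msecs" from the cluvfy output above.
REFERENCE_LIMIT_MSEC = 1000.0


def nodes_outside_limit(offsets_msec, limit_msec=REFERENCE_LIMIT_MSEC):
    """Return the sorted names of nodes whose absolute offset exceeds
    the limit (assumption: strict '>' on the absolute value)."""
    return sorted(node for node, offset in offsets_msec.items()
                  if abs(offset) > limit_msec)
```

With the offsets from this example, `nodes_outside_limit({"server1": 0.0, "server2": 13700.0})` returns `["server2"]`, the node cluvfy flagged with PRVF-9661.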

How to switch from observer mode to active mode, or vice versa?

This is very simple. CTSS runs in active or observer mode depending on whether the ntp.conf file exists. To switch to active mode, make sure /etc/ntp.conf is not available: remove or rename the file.

Every 30 seconds, CTSS checks whether its current state is still correct. When it discovers the state is incorrect, it switches automatically, from observer to active mode once the file has been removed. If you don't want to use CTSS for time management, create the ntp.conf file again, and the state changes on the fly.

Trace output explaining the above:

2009-09-28 16:27:13.768: [CTSS][3010210704]sclsctss_gvss1: NTP default config file found
2009-09-28 16:27:13.768: [CTSS][3010210704]sclsctss_gvss8: Return [0] and NTP status [2].
2009-09-28 16:27:13.768: [ CTSS][3010210704]ctss_check_vendor_sw: Vendor timesync software is detected. status [2].
2009-09-28 16:27:15.375: [ CTSS][3020700560]ctsscomm_prh: Handler called
[ CTSS][3020700560]ctss_process_request_handler: Master: Received sync message event
2009-09-28 16:27:15.375: [ CTSS][3020700560]ctsscomm_pi: Received sync msg
2009-09-28 16:27:15.375: [ CTSS][3020700560]ctsscomm_pi: Received from slave (mode [0x46] nodenum [2] hostname [server2] )
2009-09-28 16:27:23.378: [ CTSS][3020700560]ctsscomm_prh: Handler called
[ CTSS][3020700560]ctss_process_request_handler: Master: Received sync message event
2009-09-28 16:27:23.378: [ CTSS][3020700560]ctsscomm_pi: Received sync msg
2009-09-28 16:27:23.378: [ CTSS][3020700560]ctsscomm_pi: Received from slave (mode [0x46] nodenum [2] hostname [server2] )
2009-09-28 16:27:31.389: [ CTSS][3020700560]ctsscomm_prh: Handler called
[ CTSS][3020700560]ctss_process_request_handler: Master: Received sync message event
2009-09-28 16:27:31.389: [ CTSS][3020700560]ctsscomm_pi: Received sync msg
2009-09-28 16:27:31.389: [ CTSS][3020700560]ctsscomm_pi: Received from slave (mode [0x46] nodenum [2] hostname [server2] )
2009-09-28 16:27:39.389: [ CTSS][3020700560]ctsscomm_prh: Handler called
[ CTSS][3020700560]ctss_process_request_handler: Master: Received sync message event
2009-09-28 16:27:39.389: [ CTSS][3020700560]ctsscomm_pi: Received sync msg
2009-09-28 16:27:39.389: [ CTSS][3020700560]ctsscomm_pi: Received from slave (mode [0x46] nodenum [2] hostname [server2] )
2009-09-28 16:27:43.383: [ CTSS][2978741136]ctss_checkcb: clsdm requested checkalive. Returns [6e]
2009-09-28 16:27:43.769: [ CTSS][3010210704]sclsctss_gvss2: NTP default pid file not found   <==== here /etc/ntp.conf is renamed.
2009-09-28 16:27:43.770: [CTSS][3010210704]sclsctss_gvss8: Return [0] and NTP status [1].
2009-09-28 16:27:43.770: [ CTSS][3010210704]ctss_check_vendor_sw: Vendor timesync software is not detected. status [1].
2009-09-28 16:27:43.786: [ CTSS][3010210704]ctsselect_determine_role: node [1] with mode [0x4e] in the modes table
2009-09-28 16:27:43.799: [ CTSS][3010210704]ctsselect_determine_role: node [2] with mode [0x46] in the modes table
2009-09-28 16:27:43.799: [ CTSS][3010210704]ctsselect_determine_role: Vendor time synchronization software is not detected on any node in the cluster. Switched to active role.
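The trace shows one check at 16:27:13 and the next at 16:27:43, 30 seconds later, when the role flips to active. The Python sketch below simulates that polling for a single node. Given a series of (elapsed seconds, ntp.conf present) samples, it reports each mode switch. It deliberately ignores the cluster-wide condition visible in the trace, where the switch to active happens only when no node in the cluster has vendor time synchronization software. The function names are hypothetical.

```python
CHECK_INTERVAL_SEC = 30  # 16:27:13 -> 16:27:43 in the trace above


def mode_for(ntp_conf_exists):
    return "Observer" if ntp_conf_exists else "Active"


def mode_transitions(samples):
    """samples: (elapsed_seconds, ntp_conf_exists) pairs, one per check.
    Returns (elapsed_seconds, old_mode, new_mode) for each switch."""
    transitions = []
    current = None
    for elapsed, conf_exists in samples:
        new = mode_for(conf_exists)
        if current is not None and new != current:
            transitions.append((elapsed, current, new))
        current = new
    return transitions
```

For example, suppose ntp.conf is present at the first check and renamed before the second. Then `[(t, t < 30) for t in range(0, 120, CHECK_INTERVAL_SEC)]` yields a single transition, `(30, "Observer", "Active")`.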

Output from the alert.log when there is a time synchronization issue:

[ctssd(3416)]CRS-2408:The clock on host server2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

2009-10-01 13:50:51.727
[ctssd(3416)]CRS-2411:The Cluster Time Synchronization Service will take a long time to perform time synchronization as local time is significantly different from mean cluster time. Details in /u01/app/11.2.0/grid/log/server2/ctssd/octssd.log.

You can find similar output in the operating system log file.

Remark:
- CTSS runs in observer mode as soon as an NTP configuration file is found. This does not tell you whether the NTP daemon is really working properly. Be aware of this! A default Linux installation has the ntp.conf file in /etc/.
- Use either NTP or CTSS for time management. Don't "play" with CTSS in production environments; discuss what you require first.

Additional trace information:

If you look into the details of cluvfy, you will find that it performs the following checks:

Validate that this is a cluster environment (does ocr.loc exist?)
Check whether CTSS is running using: /u01/app/11.2.0/grid/bin/crsctl check ctss
Check whether the NTP configuration file exists (when found, mark it as existing): /tmp/CVU_11.2.0.1.0_oracle/exectask.sh -chkfile /etc/ntp.conf
Validate whether the NTP daemon is really active using: /tmp/CVU_11.2.0.1.0_oracle/exectask.sh -chkalive ntpd
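For a quick local sanity check, the last two checks can be approximated in Python. This is a hypothetical helper and not how cluvfy is implemented. It skips the NTP checks when no NTP configuration file exists, since CTSS would then be active. Otherwise it reports whether a process named `ntpd` is running, the condition behind PRVF-5415 above. The `/proc` scan works on Linux only.

```python
import os


def ntp_liveness_ok(ntp_conf_exists, process_names):
    """None: no NTP config, so CTSS is active and NTP checks are skipped.
    True/False: whether a process named 'ntpd' is running."""
    if not ntp_conf_exists:
        return None
    return "ntpd" in process_names


def running_process_names(proc_root="/proc"):
    """Linux-only: collect command names from /proc/<pid>/comm."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                names.add(fh.read().strip())
        except OSError:
            # The process may have exited between listdir() and open().
            continue
    return names


def local_ntp_check(ntp_conf="/etc/ntp.conf"):
    return ntp_liveness_ok(os.path.exists(ntp_conf), running_process_names())
```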

Alert.log output when in observer mode:

[ctssd(3582)]CRS-2403:The Cluster Time Synchronization Service on host server1 is in observer mode.
2009-09-27 21:24:46.766
[ctssd(3582)]CRS-2407:The new Cluster Time Synchronization Service reference node is host server2.
2009-09-27 21:24:46.938
[ctssd(3582)]CRS-2412:The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time. Details in /u01/app/11.2.0/grid/log/server1/ctssd/octssd.log.
2009-09-27 21:24:46.986
[ctssd(3582)]CRS-2409:The clock on host server1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2009-09-27 21:24:47.277
[ctssd(3582)]CRS-2401:The Cluster Time Synchronization Service started on host server1.
2009-09-27 21:26:38.926.....
[ctssd(3582)]CRS-2409:The clock on host server1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2009-09-27 22:00:32.725
[ctssd(3582)]CRS-2409:The clock on host server1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

Here you can read the state of CTSS, and also see that there is a synchronization issue. No action is taken to fix it, because CTSS is in observer mode.

Alert.log will show when in active mode:

[ctssd(3578)]CRS-2401:The Cluster Time Synchronization Service started on host server1 is in active mode.

Advice: As time management is extremely important in cluster environments, you must make sure NTP is running correctly. In general, NTP is configured as standard on each system. If you cannot confirm this, I would advise removing the file /etc/ntp.conf so that CTSS takes over this responsibility and becomes active. Make sure this is done on each node in the cluster.