DataGuard: GAP resolution doesn’t work anymore

This is a rather interesting situation: GAP resolution stops working after some time, yet the primary is still able to send redo to the standby…

I have been working with Oracle standby databases for more than a decade, but I faced this particular issue for the first time. There are some similar issues that can be classified as well known, and those don’t take much time to resolve.

Let me provide some details about this particular issue:

  • it’s a 10.2.0.4 system, but the issue may be seen in other releases
  • DataGuard was set up and had been working fine for some time
  • GAP resolution was working perfectly
  • the standby was shut down for maintenance
  • nothing was changed in the configuration, so this is not a configuration change issue
  • after startup, the standby wasn’t able to resolve the GAP and wrote the following messages to its alert.log:

Fetching gap sequence in thread 1, gap sequence xxxxx-xxxxx

FAL[client]: Failed to request gap sequence

FAL[client]: All defined FAL servers have been attempted.

  • at the same time we see the following messages in the primary alert.log:

Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid

  • the primary database started sending redo to the standby:

LNS: Standby redo logfile selected for thread 1 sequence xxxx for destination LOG_ARCHIVE_DEST_3

  • the primary is working OK and the standby is receiving redo, but it is unable to resolve the GAP

After some ineffective troubleshooting, I found a similar issue described on the My Oracle Support portal in the following notes:
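
The standby-side symptom can be checked quickly by scanning the standby alert.log for the FAL client messages quoted above. This is only a minimal shell sketch, not an Oracle-supplied tool: the function name `count_fal_failures` is made up for illustration, and the only things taken from the post are the two message strings.

```shell
# Hypothetical helper: count the FAL client failure lines in an alert log.
# Usage: count_fal_failures /path/to/standby/alert.log
count_fal_failures() {
    # -c prints the number of matching lines, -F treats patterns as fixed
    # strings (so the [ ] are literal), -e supplies each pattern.
    grep -c -F \
        -e 'FAL[client]: Failed to request gap sequence' \
        -e 'FAL[client]: All defined FAL servers have been attempted' \
        "$1"
}
```

If the count keeps growing after the standby has been restarted while the primary alert.log still shows LNS shipping redo, the situation probably matches the one described here.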

1130523.1 Logs are not shipped to the physical standby database

736739.1 Primary Site No Longer Transmits Log Files To Standby Site

The cause of the issue:

  • the ARCH (archiver) processes hang because of an OS, network or some other issue
  • because standby redo logs exist on the standby and redo shipping was configured with LGWR, LNS was able to send redo to the standby, but GAP resolution, which involves ARCH, didn’t work

Solution (do one of the following):

  • restart the primary database to clean up the ARCH processes
  • kill the ARCH processes on the primary (they will be restarted automatically)

ps -ef | grep arc

kill -9 <pid>

  • disable and enable the log archive destination by altering LOG_ARCHIVE_DEST_STATE_x

  • start additional ARCH processes by increasing LOG_ARCHIVE_MAX_PROCESSES
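
The second and third options above can be sketched in shell as follows. Treat this as an illustration under stated assumptions rather than a tested procedure: the function names are hypothetical, the archiver processes are assumed to follow the usual Unix naming `ora_arcN_<SID>`, and the SID is assumed to be plain alphanumeric because it is used inside a regular expression.

```shell
# Hypothetical helpers; the names are illustrative, not part of any Oracle tool.

# Read `ps -ef` output on stdin and print the PIDs of the archiver
# processes that belong to the instance given as $1.
# Assumes the PID is the second column and the process name is the last one.
arch_pids() {
    awk -v sid="$1" '$NF ~ ("^ora_arc[0-9a-z]_" sid "$") { print $2 }'
}

# Print the SQL that defers and then re-enables archive destination number $1.
bounce_dest_sql() {
    printf 'ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_%s=DEFER;\n' "$1"
    printf 'ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_%s=ENABLE;\n' "$1"
}
```

For example, `ps -ef | arch_pids PROD` (PROD being a made-up SID) lists only the ARCH processes of that instance, so the PIDs can be reviewed before passing them to kill, and `bounce_dest_sql 3 | sqlplus -s "/ as sysdba"` would bounce LOG_ARCHIVE_DEST_3, the destination seen in the primary alert.log above.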

Interesting facts:

  1. Oracle thinks that the main cause of this problem is a network or OS issue
  2. because of fact #1, THERE ARE NO FIXES FOR THIS PROBLEM from Oracle’s side, so you may face this issue on any release/platform!

4 thoughts on “DataGuard: GAP resolution doesn’t work anymore”

  1. Excellent.
    We faced exactly the same issue after a network outage in our datacenter.
    All of a sudden the gap resolution stopped working.
    Rebooting the primary solved the problem.

    Thanks

    • Rebooting the primary is not always possible because of availability demands…
      Because of this I suggest killing the ARCH processes.
      I have done it several times in production and it was always a safe solution.
