Investigation of the 11g Oracle*Net connection delay when DNS server is unreachable

Oleksandr Denysenko:

“I’d like to introduce new Contributor to this Blog – Robert Oliynichenko.

Hi will make post on topics that are related to main direction of this Blog.

So, here we have The First Post about Slow 11.2 SQL*Net connections on AIX

Problem description:

Oracle 11g utilities like tnsping, lsnrctl or sqlplus on AIX 6.1 hanged for about 2,5 minutes (150 seconds) before connection with alias to any destination when DNS server was unreachable. In the same conditions (host and configuration) 10g utilities connected without delay. Oracle*Net was configured for local naming method only – NAMES.DIRECTORY_PATH=(tnsnames) in the both 10g and 11g Oracle Home. The tnsnames.ora files where identical in the 10g and 11g Oracle Home. The aliases in the tnsnames.ora used host names instead of IP addresses in the connection descriptions; if a host name was substituted by IP address the delay disappeared for this alias. Nevertheless all the host names from tnsnames where described in the /etc/hosts and resolution order was hosts = local, bind in the netsvc.conf (analog of  the nsswitch.conf in other UNIXs). This configuration should guarantee local name resolution when DNS server was unreachable. What is the root reason of this delay in 11g Oracle*Net ?

Diagnostic:

First of all I checked configuration of the Oracle*Net – sqlnet.ora, tnsnames.ora and AIX configuration of network name resolution  – /etc/resolv.conf, /etc/netsvc.conf. The configuration was typical.

The next steps were Oracle*Net and OS level tracing both 11g and 10g tnsping utility when DNS was reachable and unreachable. To emulate unreachable DNS server it’s IP address was changed in the /etc/resolv.conf

The Oracle*Net Support level tracing was enabled:

  TNSPING.TRACE_LEVEL = 16
TRACE_LEVEL_CLIENT = 16

The comparison of 11g Oracle*Net traces gave no clue. Traces with and without reachable DNS server were almost identical. The comparison of 10g and 11g Oracle*Net traces did not help too. There were some differences but generally they were similar.

To trace tnsping at OS level were used AIX truss utility (truss -aefo tnsping.trc tnsping db_test). I checked the traces but din’t found any system call (like gethostbyname, gethostbyname2, gethostbyaddr or getaddrinfo) which could use DNS for the name resolution.

Comparing 11g with 10g trace files I found some new actions in the 11g trace. Here are some parts of tracing tnsping (11g and 10g) when DNS server was unreachable:

10g part:

 kopen(“/etc/resolv.conf”, O_RDONLY)    = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 38
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 0
statx(“/etc/resolv.conf”, 0x0FFFFFFFFFFFA2D8, 176, 0) = 0
close(4)                = 0
socket(1, 1, 0)            = 4
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
connext(4, 0x0FFFFFFFFFFF9DA8, 1025)    Err#2  ENOENT
close(4)                = 0
kopen(“/etc/netsvc.conf”, O_RDONLY)    = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” #   @ ( # ) 4 3        “.., 4096) = 4096
kread(4, ” o n   a n d   r e s o l”.., 4096) = 663
close(4)                = 0
_thread_self()            = 33816773
__libc_sbrk(0x0000000000010020)    = 0x00000001109A2AA0
_thread_self()            = 33816773
kopen(“/etc/hesiod.conf”, O_RDONLY)    Err#2  ENOENT
kopen(“/etc/irs.conf”, O_RDONLY)    Err#2  ENOENT
_thread_self()            = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 33816773
_thread_self()            = 33816773
__libc_sbrk(0x0000000000010020)    = 0x00000001109B2AC0
kopen(“/etc/hosts”, O_RDONLY)        = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 2029
kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 0
close(4)                = 0
socket(2, 1, 0)            = 4
getsockopt(4, 65535, 4104, 0x0FFFFFFFFFFFC124, 0x0FFFFFFFFFFFC120) = 0
connext(4, 0x00000001109B9E30, 16)    = 0
ngetsockname(4, 0x0FFFFFFFFFFFC1D4, 0x0FFFFFFFFFFFC1D0) = 0
getsockopt(4, 65535, 4097, 0x0FFFFFFFFFFFC324, 0x0FFFFFFFFFFFC320) = 0
getsockopt(4, 65535, 4098, 0x0FFFFFFFFFFFC324, 0x0FFFFFFFFFFFC320) = 0
setsockopt(4, 6, 1, 0x0FFFFFFFFFFFC32C, 4) = 0
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
sigprocmask(2, 0x09001000A01BAAC0, 0x0FFFFFFFFFFFC330) = 0
_sigaction(13, 0x0FFFFFFFFFFFC460, 0x0FFFFFFFFFFFC4B8) = 0
thread_setmystate(0x0FFFFFFFFFFFBF20, 0x0000000000000000) = 0
kwrite(4, ” W0101 901 ,”.., 87) = 87
kread(4, ” A04 ” 5″.., 2064) = 65
close(4)                = 0
lseek(3, 27136, 0)            = 27136
kread(3, “\r\r ® V\r ¯”.., 512) = 512
kwrite(1, ” O K   ( 1 0   m s e c )”.., 13) = 13
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
close(3)                = 0
kfcntl(1, F_GETFL, 0x00000000120400C5) = 67110914
kfcntl(2, F_GETFL, 0x00000000120400C5) = 67110914
_exit(0)

11g part:

kopen(“/etc/resolv.conf”, O_RDONLY)    = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 38
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 0
statx(“/etc/resolv.conf”, 0x0FFFFFFFFFFF7B08, 176, 0) = 0
close(4)                = 0
socket(1, 1, 0)            = 4
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
connext(4, 0x0FFFFFFFFFFF75D8, 1025)    Err#2  ENOENT
close(4)                = 0
kopen(“/etc/netsvc.conf”, O_RDONLY)    = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” #   @ ( # ) 4 3        “.., 4096) = 4096
kread(4, ” o n   a n d   r e s o l”.., 4096) = 663
close(4)                = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
kopen(“/etc/hesiod.conf”, O_RDONLY)    Err#2  ENOENT
kopen(“/etc/irs.conf”, O_RDONLY)    Err#2  ENOENT
_thread_self()            = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
__libc_sbrk(0x0000000000010020)    = 0x0000000110A699A0
kopen(“/etc/hosts”, O_RDONLY)        = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 2029
kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 0
close(4)                = 0
 kopen(“/etc/hosts”, O_RDONLY)        = 4
 kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
 kfcntl(4, F_SETFD, 0x0000000000000001) = 0
 kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
 kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 2029
 kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 0
 socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 5000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 5000)    = 0
close(5)                = 0
 socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 10000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 10000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 20000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 20000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 40000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 40000)    = 0
close(5)                = 0
 socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 5000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 5000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 10000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 10000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 20000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 20000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 40000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 40000)    = 0
 close(5)                = 0
close(4)                = 0
socket(2, 1, 0)            = 4
kfcntl(4, F_GETFL, 0x0000000000000000) = 2
kfcntl(4, F_SETFL, 0x0000000000000006) = 0
kioctl(4, -2147195266, 0x0FFFFFFFFFFF89C8, 0x0000000000000000) = 0
getsockopt(4, 65535, 4104, 0x0FFFFFFFFFFF9594, 0x0FFFFFFFFFFF9590) = 0
connext(4, 0x0000000110A76590, 16)    = 0
kfcntl(4, F_GETFL, 0x0000000000000000) = 6
kfcntl(4, F_SETFL, 0x0000000000000002) = 0
kioctl(4, -2147195266, 0x0FFFFFFFFFFF89C8, 0x0000000000000000) = 0
ngetsockname(4, 0x0FFFFFFFFFFF9FD8, 0x0FFFFFFFFFFF9680) = 0
getsockopt(4, 65535, 4097, 0x0FFFFFFFFFFFA634, 0x0FFFFFFFFFFFA630) = 0
getsockopt(4, 65535, 4098, 0x0FFFFFFFFFFFA634, 0x0FFFFFFFFFFFA630) = 0
setsockopt(4, 6, 1, 0x0FFFFFFFFFFFA63C, 4) = 0
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
sigprocmask(2, 0x09001000A01BAAC0, 0x0FFFFFFFFFFFB0C0) = 0
_sigaction(13, 0x0FFFFFFFFFFFB1F0, 0x0FFFFFFFFFFFB248) = 0
thread_setmystate(0x0FFFFFFFFFFFACB0, 0x0000000000000000) = 0
__libc_sbrk(0x0000000000010020)    = 0x0000000110A799C0
kwrite(4, ” W0101 :01 ,”.., 87) = 87
kread(4, ” A04 ” 5″.., 8208) = 65
close(4)                = 0
lseek(3, 27136, 0)            = 27136
kread(3, “0E\t Ý \\t Þ”.., 512) = 512
kwrite(1, ” O K   ( 1 5 0 0 2 0   m”.., 17) = 17
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
close(3)                = 0
kfcntl(1, F_GETFL, 0x0000000011500055) = 67110914
kfcntl(2, F_GETFL, 0x0000000011500055) = 67110914
_exit(0)

We can see some new actions in the 11g part:

  • /etc/hosts look up were done 2 times; after the second lookup the file wasn’t close;
  • socket(2, 2, 0) was opened and closed 8 times (2 cycles by 4 times);
  • every time through this socket was sent a message (_esend) and status of the message queue was checked (_poll);
  • every time the checking of this message queue was waiting for response and finally timed out;
  • the waiting time was doubled (5000, 10000, 20000, 40000 ms) in the every next try in the cycle;
  • if we sum all the waiting time (5+10+20+40)*2 = 150 seconds !

The question is why all this new actions were done?

The reason

The answer is IPv6. It’s support was included in 11g Oracle*Net.

Even when the host name was resolved in IPv4 successfuly, the second look up in /etc/hosts is needed for IPv6 address host name resolution. If IPv6 address was not found in /etc/hosts then Oracle*Net requests DNS server for IPv6 name resolution because name resolution order is local, bind configured in the /etc/netsvc.conf.

Solution

To resolve the problem we should replace local, bind by local4, bind4  in the /etc/netsvc.conf. The meaning of the local4 and bind4 is explained in the comments in /etc/netsvc.conf:

# bind4       Uses BIND/DNS services for resolving only IPv4 addresses
# bind6       Uses BIND/DNS services for resolving only IPv6 addresses
# local4      Searches the local /etc/hosts file for resolving only IPv4 addresses
# local6      Searches the local /etc/hosts file for resolving only IPv6 addresses

Advertisements

5 thoughts on “Investigation of the 11g Oracle*Net connection delay when DNS server is unreachable

  1. Pingback: IBM PowerHA(HACMP) shutdowns oracle 11.2 because of IPv6 and DNS is not accessable « Oleksandr Denysenko's Blog

  2. Hi Sir,

    I have some doubt.

    > Whenever we do the tnsping , it tries to resolve the hostname from DNS or /etc/hots ?.

    > what is the importance of netsvc.conf when we are using the tnsping ?

    > would you please tell me what is ” IPv6 address host name ” , which hostname it is taking about ( DB hostname or someting else )

    > when I do the tnsping from the client side , from the client side I am able to ping the host name of DB , Even then why tnsping was delayed “The Listener responds with a TNS Refuse packet (its “pong”).”, which Address its trying to resolve, if any ?

    =====
    Once the hostname and port number have been acquired, a TCP connection is opened to the target host and port the Oracle Listener is on. To open this TCP connection, the host name must be resolved to the IP address of the destination host. TCP/IP then performs the three-way handshake to complete the connection.

    Tnsping then sends a Oracle TNS Connect packet (the “ping”) to the Oracle Listener. The Listener responds with a TNS Refuse packet (its “pong”). The TCP/IP connection between the two hosts is then terminated.

    =======

    We having a same issue with tnsping , Just after making a changes For DNS on DB server , tnsping start working fine.

    During that time I checked that ping to server is working fine.

    • Hi.

      > Whenever we do the tnsping , it tries to resolve the hostname from DNS or /etc/hots ?.
      ===================================
      it depends on your OS configuration
      actually, by default, os prefer to use DNS for name to IP resolution
      but your goal may be to use /etc/hosts

      > what is the importance of netsvc.conf when we are using the tnsping ?
      ===================================
      it specify resolution order
      if first bind/nis/ldap specified, but doesn’t have correct configuration
      you may experience long-resolution time
      check next link netsvc.conf on this topic

      > would you please tell me what is ” IPv6 address host name ” , which hostname it is taking about ( DB hostname or someting else )
      ===================================
      it’s hostname of server running DB – it may have both IPv4 and IPv6 addresses,
      but IPv6 address is different from usual IPv4 address
      the main reason for this story is built-in support of IPv6 into Oracle 11.2
      and if OS configured to allow IPv6, but not configured correct IPv6 resolution,
      than we may see mentioned situation

      > when I do the tnsping from the client side , from the client side I am able to ping the host name of DB , Even then why tnsping was delayed “The Listener responds with a TNS Refuse packet (its “pong”).”, which Address its trying to resolve, if any ?
      ===================================
      actually, it may be well-known issue in 11.2.0.1,
      when listener try to do Reverse DNS lookup of client hostname from client IP address
      you may check this by tracing SQL*Net from the server-side

      Oleksandr

  3. Pingback: AIX 7.1 on Power 750 is very slow with no high memory or process consumption

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s