Strange multiple Oracle (CRS and other) processes and Linux Threads

We were once contacted by a customer with a very strange problem:

“On one node of our two node RAC cluster on RHEL 4.5 we have quite strange behavior – multiple oracle CRS processes and as a result we are unable to stop CRS using crsctl stop crs… – we have to reboot node…”

The situation looked quite interesting from the following point of view:

# ps aux | grep crsd.bin
root 3135 0.0 0.3 124832 14924 ? Ss 13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3139 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3220 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3226 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3227 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3228 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3229 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3230 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3231 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3232 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
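
One detail in the listing is worth a closer look: every entry reports exactly the same VSZ and RSS. The sketch below groups `ps aux`-style lines by command and memory figures so that such clusters stand out. The function name `group_by_memory` is made up for illustration, and identical figures are only a hint that the entries share one address space, not proof.

```shell
# Group `ps aux`-style lines by command, VSZ and RSS, and print group sizes.
# Reads from stdin; assumes the `ps aux` column layout
# (VSZ = field 5, RSS = field 6, COMMAND = field 11).
group_by_memory() {
  awk '{ key = $11 " vsz=" $5 " rss=" $6; count[key]++ }
       END { for (k in count) print count[k], k }' | sort -rn
}

# Sample input: three lines taken from the listing above.
group_by_memory <<'EOF'
root 3135 0.0 0.3 124832 14924 ? Ss 13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3139 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 3220 0.0 0.3 124832 14924 ? S  13:48 0:00 /opt/oracle/product/10.2.0/crs/bin/crsd.bin reboot
EOF
```

On a live system the same function can be fed with `ps aux | grep '[c]rsd.bin' | group_by_memory`; the `[c]` keeps grep from matching its own command line.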

The same issue was mentioned without resolution 5 years ago on forums.oracle.com – Thread: Multiple “crsd.bin reboot” at once

Some time was spent identifying the cause of this issue, and the interesting finding was this:

node1# getconf GNU_LIBPTHREAD_VERSION
linuxthreads-0.10
node2# getconf GNU_LIBPTHREAD_VERSION
NPTL 2.3.4

So the first node uses the old LinuxThreads implementation instead of the required NPTL (Native POSIX Thread Library). Under LinuxThreads every thread is a separate kernel process with its own PID, so ps lists each crsd.bin thread as if it were a separate process. How this situation came about is outside the scope of this post. For the record, the LD_ASSUME_KERNEL environment variable wasn’t explicitly set on either node for any user.
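
The check is easy to script for comparison across nodes. The following is a minimal sketch, not part of the original troubleshooting: the helper name `classify_threadlib` is hypothetical, and it only inspects the string reported by `getconf` and the current shell's environment.

```shell
# Classify the string printed by `getconf GNU_LIBPTHREAD_VERSION`.
classify_threadlib() {
  case "$1" in
    NPTL*)         echo "nptl" ;;
    linuxthreads*) echo "linuxthreads" ;;
    *)             echo "unknown" ;;
  esac
}

# Report the thread library seen by this shell; tolerate a missing getconf.
current="$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null || true)"
echo "thread library: ${current:-not reported} ($(classify_threadlib "$current"))"

# LD_ASSUME_KERNEL can influence which implementation the loader selects,
# so print it as well (only for the current shell, not for running daemons).
echo "LD_ASSUME_KERNEL: ${LD_ASSUME_KERNEL:-<unset>}"
```

Running this as the same user that starts the affected processes gives the most relevant answer, since the environment can differ between users and init scripts.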

The issue was fixed by updating the glibc packages:

rpm -Uvh --force glibc-2.3.*.i386.rpm
rpm -Uvh --force glibc-common-2.3.*.i386.rpm

There is quite an interesting note on the My Oracle Support portal that relates to NPTL and, in part, to this issue:

841292.1 Linux Threads: Why some Oracle RDBMS Releases do not work on some Linux Releases?

“Oracle RDBMS Release 10gR2 (aka 10.2.0.x) and newer built upon the newer Native POSIX Thread Library (NPTL)

RHEL AS /ES 4 and OEL 4 (the 2.6.9 kernel) included both Linux Threads and NPTL support.

Oracle’s older products (such as 9iR2 and 10gR1) will never be able to run on new Linux OSs (such as RHEL 5, OEL 5, SLES 10, and SLES 11) because those newer OSs have dropped the older Linux Threads support.”

References:

841292.1 Linux Threads: Why some Oracle RDBMS Releases do not work on some Linux Releases?

433292.1 Effect of setting LD_ASSUME_KERNEL Environment Variable For Opmn Module Of Oracle Application Server

1302964.1 Why Manager Process Shows More Than One Process In “ps -ef”
