High CPU Archive

Weblogic Server Stuck thread and very high cpu usage when executing a query

We were observing frequent server hangs.
On taking thread dumps at the time of hang we found the following culprit thread.

[STUCK] ExecuteThread: ‘6’ for queue: ‘weblogic.kernel.Default (self-tuning)'” daemon prio=1 tid=01baae30 nid=55 lwp_id=9965842 runnable [1ff05000..1ff03878]
at oracle.jdbc.oci8.OCIDBAccess.do_fetch(Native Method)
at oracle.jdbc.oci8.OCIDBAccess.executeFetch(OCIDBAccess.java:1758)
– locked < 36582dd0> (a oracle.jdbc.oci8.OCIDBAccess)
at oracle.jdbc.driver.OracleStatement.doExecuteQuery(OracleStatement.java:2659)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2832)
at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:608)
– locked < ae1c1400> (a oracle.jdbc.driver.OraclePreparedStatement)
– locked < 36589900> (a oracle.jdbc.driver.OracleConnection)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:536)
– locked < ae1c1400> (a oracle.jdbc.driver.OraclePreparedStatement)
– locked < 36589900> (a oracle.jdbc.driver.OracleConnection)
at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:100)

We solved the issue by switching to Type 4 driver.

Shell Script to send alert on HighCPU Usage on Solaris

Find below a sample script which you can use to send a mail when the there is a High CPU.
I have tested it on Solaris. The server name is TestServer. Feel free to use it in your environment and to share any thoughts.

pid=`/usr/ucb/ps -auxwww | grep TestServer | tr -s ‘ ‘ | cut -d ” ” -f 2 | sed ‘2,$ d’`

prstat -L -p $pid 1 1 > prstat.trace

for x in `sed ‘1d;$d’ prstat.trace | tr -s ‘ ‘ | cut -d ” ” -f 10`
do

a=`echo $x | tr -d ‘%’`
tHold=2.0

res=`expr $a \> $tHold`

if [ $res -eq 1 ]
then
echo “High CPU!! $a % “
fi

done

Investigating High CPU Usage with Weblogic on Solaris

1 )First determine the Weblogic Server Process id

/usr/ucb/ps -auxwww | grep java

faiz 8448 1.2 24.025954081965448 ? R Sep 21 249:19 /bea10/jdk1.6.0_14/bin/sparcv9/java -server -Xms2048m -Xmx2048m -verbose:gc -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+UseParallelGC -XX:NewRatio=3 -da -Dplatform.home=/web_apps/bea10/wlserver_10.3 -Dwls.home=/web_apps/bea10/wlserver_10.3/server -Dweblogic.home=/web_apps/bea10/wlserver_10.3/server -Dweblogic.management.discover=false -Dweblogic.management.server=http://xx.xx.xx.xx:10000 -Dwlw.iterativeDev=false -Dwlw.testConsole=false -Dwlw.logErrorsToConsole= -Dweblogic.ext.dirs=/bea10/patch_wls1030/profiles/default/sysext_manifest_classpath:/bea10/patch_cie660/profiles/default/sysext_manifest_classpath -Dweblogic.Name=Server1 -Djava.security.policy=/bea10/wlserver_10.3/server/lib/weblogic.policy weblogic.Server

2) Find the threads (LWPID) that are using the CPU

prstat -L -p 8448 1 1 > prstat.trace

prstat.trace

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
8448 faiz 2535M 1919M sleep 59 0 0:16:49 2.0% java/98
8448 faiz 2535M 1919M sleep 59 0 0:11:10 0.2% java/99
8448 faiz 2535M 1919M sleep 59 0 0:10:03 0.2% java/123
8448 faiz 2535M 1919M sleep 59 0 0:02:09 0.1% java/38
8448 faiz 2535M 1919M sleep 59 0 0:02:04 0.1% java/68
8448 faiz 2535M 1919M sleep 59 0 0:01:59 0.1% java/23
8448 faiz 2535M 1919M sleep 59 0 0:02:00 0.0% java/22
8448 faiz 2535M 1919M sleep 59 0 0:01:57 0.0% java/21
8448 faiz 2535M 1919M sleep 59 0 0:14:17 0.0% java/19
8448 faiz 2535M 1919M sleep 59 0 0:16:57 0.0% java/5
8448 faiz 2535M 1919M sleep 59 0 0:00:37 0.0% java/12
8448 faiz 2535M 1919M sleep 59 0 0:08:59 0.0% java/309
8448 faiz 2535M 1919M sleep 59 0 0:00:28 0.0% java/72
8448 faiz 2535M 1919M sleep 59 0 0:00:33 0.0% java/34
8448 faiz 2535M 1919M sleep 59 0 0:00:31 0.0% java/28

3) Find the mapping betweek LWP and the process id.

pstack 8448 > pstack.trace

In this example Thread 98 uses the maximum CPU, so lets find the corresponding LWM id ( decimal) from the pstack trace

—————– lwp# 98 / thread# 98 ——————–
ffffffff7edda5c4 lwp_cond_wait (103446b48, 103446b30, 0, 0)
ffffffff7e2f9218 __1cCosNPlatformEventEpark6M_v_ (103446b30, 0, ffffffff7e4fcee9, ffffffff7e69e000, 103446b00, ffffffff7e71b958) + 100
ffffffff7e3a0058 __1cNObjectMonitorEwait6MxbpnGThread__v_ (1053a9e20, 0, 1, 10547b000, 10547b000, 0) + 370
ffffffff7de237b0 __1cSObjectSynchronizerEwait6FnGHandle_xpnGThread__v_ (10343ad88, 0, 4cc33, 10343ad88, 1053a9e20, 10547b000) + 178
ffffffff7de233d4 JVM_MonitorWait (0, 10343ad88, 0, 10547b000, 191, ffffffff7e69e000) + 234
ffffffff7880dff4 * java/lang/Object.wait(J)V+-4888
ffffffff7880df9c * java/lang/Object.wait(J)V+0
ffffffff7958ec78 * *java/lang/Object.wait()V [compiled] +3 (line 970)
ffffffff7958ec78 * *weblogic/work/ExecuteThread.waitForRequest()V+9 (line 312)
ffffffff7958ec78 * *weblogic/work/ExecuteThread.run()V+74 (line 343)
ffffffff78800240 * StubRoutines (1)
ffffffff7dd92b44 __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (1, 10547b000, fffffffed79ff7b8, a, ffffffff788001e0, fffffffed79ff4f0) + 1f4
ffffffff7e0dab28 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ (fffffffed79ff9c8, ffffffff7e6ea148, ffffffff629b3738, 4c148, fffffffed79ff7b8, ffffffffff6f4950) + 130
ffffffff7de2ba7c __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ (fffffffed79ff9c8, 10343ad60, 10343ad68, ffffffff7e71c2b0, ffffffff7e71c798, 10547b000) + 50
ffffffff7de3e488 __1cMthread_entry6FpnKJavaThread_pnGThread__v_ (ffffffff629b3738, 10547b000, 7dd50, 85fc64, ffffffff7e71bd50, 7dc00) + f0
ffffffff7e3bd70c __1cKJavaThreadRthread_main_inner6M_v_ (10547b000, 62, ffffffff629b3738, 0, 2e0934, ffffffff7de3e398) + 44
ffffffff7e2f15b8 java_start (10547b000, 2100, ffffffff7e6ffbb4, ffffffff7e69e000, ffffffff7e4fb2c7, ffffffff7e717094) + 238
ffffffff7edd6488 _lwp_start (0, 0, 0, 0, 0, 0)

Here, the LWP id is the same as thread number (98). LWP and the thread number might not be the same in all boxes.

4) Take thread Dump

kill -3 8448

Convert 98(Dec) to hex value, which will be 62.

Search for string nid=0x62 in the thread dump

“[ACTIVE] ExecuteThread: ‘4’ for queue: ‘weblogic.kernel.Default (self-tuning)'” daemon prio=3 tid=0x000000010547b000 nid=0x62 in Object.wait() [0xfffffffed79ff000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
– waiting on <0xfffffffefe6068f8> (a weblogic.work.ExecuteThread)
at java.lang.Object.wait(Object.java:485)
at weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:157)
– locked <0xfffffffefe6068f8> (a weblogic.work.ExecuteThread)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)

We observe that the Thread is waiting on monitor state. 2.0% is significantly less amount of CPU usage so we didnt see any suspicious trace.
But in your case this you might.

The steps described above will help you zero in on the culprit thread.