I have found critical situation about deadlock.
Http thread was full. Most of apache workers  was  in sending reply. 

Busy http thread grew up until suspending its function. When the apache could not respond any request, it removed from L4 (it have L7 function) because of no response.

Every web servers (600 ea) always does not happen, but some web application servers(10ea) fell into the trouble per 2-3 days.

Apache Status : 

Server 
Application Version : 2.0.59,  Mod_jk 1.2.18, Tomcat 5.5.17, JDK 6.0 update 6  Linux Kernel 2.6.9-67.0.22.ELsmp #1 SMP,
OS : CentOS
gcc version
 : gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)
glibc
버전 glibc-2.3.4-2.39


I was wonderying why happened. 

1) I captured apache server status.
Apache Server Status

Server uptime: 4 days 20 hours 41 minutes 2 seconds
Total accesses: 43491259 - Total Traffic: 10.8 GB
CPU Usage: u33.75 s6.42 cu0 cs0 - .00956% CPU load
104 requests/sec - 27.0 kB/second - 266 B/request
109 requests currently being processed, 4 idle workers
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW_WWW.WWWWWWWWWWWW
CWWWWWWWWWWWWWWWWWWWWW_WWWWCWWWWWWWWWRWWWWWW__CCR.C.............
................................................................
................................................................

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process


2) lsof
The number of CLOSE_WAIT status tcp sockets was 140. The number was close equal to the number of  apache workers was in sending reply.
java    16907  www   52u  IPv6 1014864218                 TCP localhost.localdomain:8009->localhost.localdomain:35205 (CLOSE_WAIT)
java    16907  www   53u  IPv6 1014869414                 TCP localhost.localdomain:8009->localhost.localdomain:35414 (CLOSE_WAIT)
java    16907  www   54u  IPv6 1014861929                 TCP localhost.localdomain:8009->localhost.localdomain:35171 (CLOSE_WAIT)
java    16907  www   55u  IPv6 1014863474                 TCP localhost.localdomain:8009->localhost.localdomain:35200 (CLOSE_WAIT)
java    16907  www   56u  IPv6 1014781598                 TCP localhost.localdomain:8009->localhost.localdomain:60006 (CLOSE_WAIT)
java    16907  www   57u  IPv6 1014860437                 TCP localhost.localdomain:8009->localhost.localdomain:35091 (CLOSE_WAIT)
java    16907  www   58u  IPv6 1014859292                 TCP localhost.localdomain:8009->localhost.localdomain:35062 (CLOSE_WAIT)
java    16907  www   59u  IPv6 1014853698                 TCP localhost.localdomain:8009->localhost.localdomain:34860 (CLOSE_WAIT)
java    16907  www   60u  IPv6 1014864865                 TCP localhost.localdomain:8009->localhost.localdomain:35353 (CLOSE_WAIT)
java    16907  www   61u  IPv6 1014868132                 TCP localhost.localdomain:8009->localhost.localdomain:35356 (CLOSE_WAIT)
java    16907  www   62u  IPv6 1014864283                 TCP localhost.localdomain:8009->localhost.localdomain:35222 (CLOSE_WAIT)
..


3) Thread dump
I found dead lock situation in dumped thread in java log.

                         i.         java.lang.Thread.State: BLOCKED (on object monitor)         
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:916)
          - waiting to lock <0x8fedb478> (a org.apache.commons.pool.impl.GenericObjectPool)

1.       "TP-Processor2" daemon prio=10 tid=0x0821f400 nid=0x4286 waiting for monitor entry [0x88ed3000..0x88ed50a0]

2.    "TP-Processor2" daemon prio=10 tid=0x0821f400 nid=0x4286 waiting for monitor entry [0x88ed3000..0x88ed50a0] 



Found one Java-level deadlock:

=============================

"TP-Processor200":

 waiting to lock monitor 0x080dfd24 (object 0x8fedb478, a org.apache.commons.pool.impl.GenericObjectPool),

 which is held by "Timer-0"

"Timer-0":

waiting to lock monitor 0x080dfcc0 (object 0x8fee6f70, a org.apache.commons.dbcp.PoolableConnection),

which is held by "TP-Processor28"

"TP-Processor28":

waiting to lock monitor 0x080dfd24 (object 0x8fedb478, a org.apache.commons.pool.impl.GenericObjectPool),

which is held by "Timer-0"

 

"TP-Processor200": (Select query) return queryForList("xxx.selectXXXCount",param);

 

"TP-Processor28": (Select query

queryForObject("selectmusic",params);



This deadlock reported in "Dead lock using the evictor"(https://issues.apache.org/jira/browse/DBCP-270)
In this jira, there are patch on that.


Commons Dbcp

Dead lock using the evictor

Created: 26/Jun/08 10:18 AM   Updated: 27/Jun/08 03:41 AM
Component/s: None Affects Version/s: 1.2.2 Fix Version/s: 1.3

Patch attached, dead lock reported. the abandonedtrace does synchronized(this) when it really only needs to do synchronized(this.trace)

Dead lock reported when using the evictor

=============================
"Timer-3":
waiting to lock monitor 0x0000000053b40548 (object 0x00002aaabf3210f0, a
org.apache.tomcat.dbcp.dbcp.PoolableConnection),
which is held by "TP-Processor27"
"TP-Processor27":
waiting to lock monitor 0x0000000053b404d0 (object 0x00002aaab9fa8b08, a
org.apache.tomcat.dbcp.pool.impl.GenericObjectPool),
which is held by "Timer-3"

Java stack information for the threads listed above:

patch source

Index: src/java/org/apache/commons/dbcp/AbandonedTrace.java
===================================================================
--- src/java/org/apache/commons/dbcp/AbandonedTrace.java (revision 671937)
+++ src/java/org/apache/commons/dbcp/AbandonedTrace.java (working copy)
@@ -172,7 +172,7 @@
      * @param trace AbandonedTrace object to add
      */
     protected void addTrace(AbandonedTrace trace) {
-        synchronized (this) {
+        synchronized (this.trace) {
             this.trace.add(trace);
         }
         setLastUsed();
@@ -182,8 +182,8 @@
      * Clear the list of objects being traced by this
      * object.
      */
-    protected synchronized void clearTrace() {
-        if (this.trace != null) {
+    protected void clearTrace() {
+        synchronized(this.trace) {
             this.trace.clear();
         }
     }
@@ -194,8 +194,10 @@
      * @return List of objects
      */
     protected List getTrace() {
-        return trace;
+        synchronized (this.trace) {
+            return new ArrayList(trace);
     }
+    }
 
     /**
      * If logAbandoned=true, print a stack trace of the code that
@@ -206,7 +208,7 @@
             System.out.println(format.format(new Date(createdTime)));
             createdBy.printStackTrace(System.out);
         }
-        synchronized(this) {
+        synchronized(this.trace) {
             Iterator it = this.trace.iterator();
             while (it.hasNext()) {
                 AbandonedTrace at = (AbandonedTrace)it.next();
@@ -220,8 +222,8 @@
      *
      * @param trace AbandonedTrace object to remvoe
      */
-    protected synchronized void removeTrace(AbandonedTrace trace) {
-        if (this.trace != null) {
+    protected void removeTrace(AbandonedTrace trace) {
+        synchronized(this.trace) {
             this.trace.remove(trace);
         }
     }


Because Commons-dbcp 1.3 is not year released in formal. (now : 4. 2009) and I do not know what willing happen if the snapshop patch version, I used patched commons-dbcp 1.2.2 library.

After patching that version, there is no trouble or problem like that.


Posted by '김용환'
,