i have sometimes in a production application a thread stuck, that dont allow database get or return any connection, until i kill all running user task that is in state Vwait, i tried to check many times the running commands at database side, but without success to check what SQL command blocked tomcat, this time i get a java heap dump, and used Memory Analyzer (MAT) http://www.eclipse.org/mat/ to inspect this heap, this tool have an OQL feature that allow us to seek for specific objects, i tried:
SELECT t.mem.data.toString().substring(112) FROM com.sap.dbtech.jdbc.packet.RequestPacket t
this maybe shows current running commands, i say maybe because that data is out of order, and sometimes unknown for human reading.
SELECT dominatorof(t).toString(), t FROM com.sap.dbtech.jdbc.ConnectionSapDB t
this one shows active connections and for some of this connections the dominator is java.lang.ref.Finalizer and for anothers its blank.
anyone tried to check this type of problem using MAT?
any tips or experience are welcome.
it just get blocked
follow is the thread that block others:
"http-80-11" daemon prio=10 tid=0x0000002b0f582000 nid=0x2b7f runnable [0x000000004bd54000..0x000000004bd55db0] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at com.sap.dbtech.rte.comm.BasicSocketComm.receiveData(BasicSocketComm.java:577) at com.sap.dbtech.rte.comm.BasicSocketComm.receive(BasicSocketComm.java:666) at com.sap.dbtech.rte.comm.JdbcCommunication.execute(JdbcCommunication.java:41) at com.sap.dbtech.jdbc.ConnectionSapDB.execute(ConnectionSapDB.java:536) - locked <0x0000002aae91ea30> (a com.sap.dbtech.jdbc.ConnectionSapDB) at com.sap.dbtech.jdbc.ConnectionSapDB.execute(ConnectionSapDB.java:461) at com.sap.dbtech.jdbc.CallableStatementSapDB.execute(CallableStatementSapDB.java:441) at com.sap.dbtech.jdbc.CallableStatementSapDB.execute(CallableStatementSapDB.java:313) at com.sap.dbtech.jdbc.CallableStatementSapDB.executeUpdate(CallableStatementSapDB.java:778) at com.sap.dbtech.jdbc.trace.PreparedStatement.executeUpdate(PreparedStatement.java:81) at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:102) at org.exolab.castor.jdo.engine.SQLEngine.create(SQLEngine.java:616) at org.exolab.castor.jdo.engine.SQLEngine.create(SQLEngine.java:536) at org.exolab.castor.persist.ClassMolder.create(ClassMolder.java:891) at org.exolab.castor.persist.LockEngine.create(LockEngine.java:458) at org.exolab.castor.persist.TransactionContext.create(TransactionContext.java:803) - locked <0x0000002aae8be3e8> (a org.exolab.castor.jdo.engine.TransactionContextImpl) at org.exolab.castor.jdo.engine.DatabaseImpl.create(DatabaseImpl.java:338)
and is the Finalizer is BLOCKED because its waiting to lock <0x0000002aae91ea30> (a com.sap.dbtech.jdbc.ConnectionSapDB) as follow::
"Finalizer" daemon prio=10 tid=0x0000002b0af94800 nid=0x1ee0 waiting for monitor entry [0x0000000041531000..0x0000000041531b30] java.lang.Thread.State: BLOCKED (on object monitor) at com.sap.dbtech.jdbc.StatementSapDB.closeResultSet(StatementSapDB.java:189) - waiting to lock <0x0000002aae91ea30> (a com.sap.dbtech.jdbc.ConnectionSapDB) at com.sap.dbtech.jdbc.StatementSapDB.close(StatementSapDB.java:174) at com.sap.dbtech.jdbc.StatementSapDB.finalize(StatementSapDB.java:442) at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method) at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83) at java.lang.ref.Finalizer.access$100(Finalizer.java:14) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)
here we can see that something inside ConnectionSapDB locks something that no other threads can achieve a new connection, until thread http-80-11 reach ends the system get stuck, that will never occurs because the database user task is in Vwait.
as i cant check JDBC source to check why ConnectionSapDB have a lock inside, my trial is to check what SQL statement is blocking and generating the Vwait at database side.
i can only activate trace in production system, but i never activate to check this error because its so random that can occurs 2 times per day or never occurs for a month, and dont is related to high load, its related to one or more specific SQL statements that cause Vwait, and is this SQL statements i'm seeking. The other problem about trace is that i dont know how much bottleneck the trace will cause, since that is in production and the server have about many requests per minute, the last measure from DBAnalyser i get:
I SQL commands executed: 2533045, avg. 8416.26 per second
then i think in how to reach the wrong SQL in this amount?
this looks like an unfortunate phenomen that we already observed elsewhere:
- a statement (and/or its result set) was not closed after processing
- there is an out-of-memory situation triggering a garbage collection, triggering the finalizer to close the result/statement at the database
- while the finalizer runs, other threads are halted, including one that just sent a statement to the database and waits for the reply (which would be checked after the finalizer is finished)
- this leads to never-ending deadlock
I suggest trying to check where the statement comes from that the finalizer wants to close, and check that result sets
are appropriately closed.
but wait, analyzing the stack trace that i post the Finalizer is the one that is BLOCKED, and the thread http-80-11 is RUNNING, for my undestanding, the http-80-11 thread blocked ConnectionSapDB and is waiting for that Vwait at database side to end, and the Finalizer only executes when http-80-11 thread reach the end, then is really right that i must check Finalizer?
in time,i inspected in thread dump what thread http-80-11 was doing when the block thing occurs, its trying to insert data, but i think that the table in that case maybe have a table lock issued by an SELECT statement and is this maybe that causes the others enter in Vwait state, that i'm try to seek. i'm tell developers to put in every SELECT statement that maybe causes a table lock, insert a WITH LOCK ISOLATION LEVEL 0, to avoid table locks, that in commands generated by developers, but are commands generated by Castor, and this one we cant append MaxDB specific instruction. The problem i have is how to get commands running when this type of block situation occurs.