Issue Details (XML | Word | Printable)

Key: SFOS-1086
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Minor Minor
Assignee: Steve Loughran
Reporter: Steve Loughran
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
SmartFrog

Possible deadlock in DelayedTerminator

Created: 27/Jan/09 05:15 PM (GMT)   Updated: 28/Jan/09 01:03 PM (GMT)
Component/s: .sfCore
Affects Version/s: 3.17.004
Fix Version/s: 3.17.010

Time Tracking:
Not Specified

Compatibility: backwards compatible


 Description  « Hide
the job submit test is failing on shutdown; possible cause appears to be the DelayedTerminator deadlocking

 All   Comments   Work Log   Change History      Sort Order: Ascending order - Click to sort in descending order
Steve Loughran added a comment - 27/Jan/09 05:16 PM (GMT)
thread dump

D DUMP ===============
[sf-startdaemon-debug] Tue Jan 27 17:10:33 2009
[sf-startdaemon-debug] BEA JRockit(R) R27.4.0-90-89592-1.6.0_02-20070928-1715-linux-x86_64
[sf-startdaemon-debug] "(Signal Handler)" id=2 idx=0x8 tid=22796 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(GC Main Thread)" id=3 idx=0xc tid=22797 prio=5 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] "(GC Worker Thread 1)" id=? idx=0x10 tid=22798 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(GC Worker Thread 2)" id=? idx=0x14 tid=22799 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(GC Worker Thread 3)" id=? idx=0x18 tid=22800 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(GC Worker Thread 4)" id=? idx=0x1c tid=22801 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(Code Generation Thread 1)" id=4 idx=0x20 tid=22802 prio=5 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] "(Code Optimization Thread 1)" id=5 idx=0x24 tid=22803 prio=5 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] "(VM Periodic Task)" id=6 idx=0x28 tid=22804 prio=10 alive, in native, daemon
[sf-startdaemon-debug] "Finalizer" id=7 idx=0x2c tid=22805 prio=8 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] at jrockit/memory/Finalizer.waitForFinalizees([Ljava/lang/Object;)I(Native Method)
[sf-startdaemon-debug] at jrockit/memory/Finalizer.access$500(Finalizer.java:12)
[sf-startdaemon-debug] at jrockit/memory/Finalizer$4.run(Finalizer.java:149)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "Reference Handler" id=8 idx=0x30 tid=22806 prio=10 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] at java/lang/ref/Reference.getPending()Ljava/lang/ref/Reference;(Native Method)
[sf-startdaemon-debug] at java/lang/ref/Reference.access$000(Reference.java:24)
[sf-startdaemon-debug] at java/lang/ref/Reference$ReferenceHandler.run(Reference.java:144)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "(Sensor Event Thread)" id=9 idx=0x34 tid=22807 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "RMI TCP Accept-0" id=11 idx=0x38 tid=22810 prio=5 alive, in native, daemon
[sf-startdaemon-debug] at java/net/PlainSocketImpl.socketAccept(Ljava/net/SocketImpl;)V(Native Method)
[sf-startdaemon-debug] at java/net/PlainSocketImpl.accept(PlainSocketImpl.java:384)
[sf-startdaemon-debug] ^-- Holding lock: java/net/SocksSocketImpl@0x431f8bc0[thin lock]
[sf-startdaemon-debug] at java/net/ServerSocket.implAccept(ServerSocket.java:453)
[sf-startdaemon-debug] at java/net/ServerSocket.accept(ServerSocket.java:421)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$AcceptLoop.run(TCPTransport.java:341)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI Reaper" id=12 idx=0x3c tid=22811 prio=5 alive, in native, waiting
[sf-startdaemon-debug] -- Waiting for notification on: java/lang/ref/ReferenceQueue$Lock@0x431f89e8[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at java/lang/ref/ReferenceQueue.remove(ReferenceQueue.java:116)
[sf-startdaemon-debug] ^-- Lock released while waiting: java/lang/ref/ReferenceQueue$Lock@0x431f89e8[fat lock]
[sf-startdaemon-debug] at java/lang/ref/ReferenceQueue.remove(ReferenceQueue.java:132)
[sf-startdaemon-debug] at sun/rmi/transport/ObjectTable$Reaper.run(ObjectTable.java:333)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "GC Daemon" id=13 idx=0x40 tid=22812 prio=2 alive, in native, waiting, daemon
[sf-startdaemon-debug] -- Waiting for notification on: sun/misc/GC$LatencyLock@0x431f9c78[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at sun/misc/GC$Daemon.run(GC.java:100)
[sf-startdaemon-debug] ^-- Lock released while waiting: sun/misc/GC$LatencyLock@0x431f9c78[fat lock]
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "LivenessSender_HOST morzine.hpl.hp.com:rootProcess" id=14 idx=0x44 tid=22813 prio=5 alive, in native, waiting
[sf-startdaemon-debug] -- Waiting for notification on: org/smartfrog/sfcore/prim/LivenessSender@0x434fe350[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at org/smartfrog/sfcore/common/Timer.timerWait(Timer.java:138)
[sf-startdaemon-debug] ^-- Lock released while waiting: org/smartfrog/sfcore/prim/LivenessSender@0x434fe350[fat lock]
[sf-startdaemon-debug] at org/smartfrog/sfcore/common/Timer.run(Timer.java:186)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI TCP Accept-3800" id=15 idx=0x48 tid=22814 prio=5 alive, in native, daemon
[sf-startdaemon-debug] at java/net/PlainSocketImpl.socketAccept(Ljava/net/SocketImpl;)V(Native Method)
[sf-startdaemon-debug] at java/net/PlainSocketImpl.accept(PlainSocketImpl.java:384)
[sf-startdaemon-debug] ^-- Holding lock: java/net/SocksSocketImpl@0x43580f40[thin lock]
[sf-startdaemon-debug] at java/net/ServerSocket.implAccept(ServerSocket.java:453)
[sf-startdaemon-debug] at java/net/ServerSocket.accept(ServerSocket.java:421)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$AcceptLoop.run(TCPTransport.java:341)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "Thread-3" id=17 idx=0x50 tid=22795 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "RMI TCP Connection(5)-127.0.1.1" id=18 idx=0x54 tid=22816 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/services/assertions/TestCompoundImpl@0x43500f78[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/eventbus/EventCompoundImpl.deregister(EventCompoundImpl.java:142)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Reflect.invokeMethod(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
[sf-startdaemon-debug] at sun/reflect/NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
[sf-startdaemon-debug] at sun/reflect/NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[sf-startdaemon-debug] at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[sf-startdaemon-debug] at java/lang/reflect/Method.invoke(Method.java:597)
[sf-startdaemon-debug] at sun/rmi/server/UnicastServerRef.dispatch(UnicastServerRef.java:305)
[sf-startdaemon-debug] at sun/rmi/transport/Transport$1.run(Transport.java:159)
[sf-startdaemon-debug] at jrockit/vm/AccessController.doPrivileged(AccessController.java:255)
[sf-startdaemon-debug] at sun/rmi/transport/Transport.serviceCall(Transport.java:155)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport.handleMessages(TCPTransport.java:535)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI Scheduler(0)" id=19 idx=0x58 tid=22842 prio=5 alive, in native, parked, daemon
[sf-startdaemon-debug] at jrockit/vm/Locks.park0(J)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.park(Locks.java:2443)
[sf-startdaemon-debug] at sun/misc/Unsafe.park(ZJ)V(Native Method)
[sf-startdaemon-debug] at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:198)
[sf-startdaemon-debug] at java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
[sf-startdaemon-debug] at java/util/concurrent/DelayQueue.take(DelayQueue.java:164)
[sf-startdaemon-debug] at java/util/concurrent/ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:582)
[sf-startdaemon-debug] at java/util/concurrent/ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:575)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:946)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:906)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI TCP Connection(4)-127.0.1.1" id=20 idx=0x5c tid=22847 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/services/assertions/TestCompoundImpl@0x43500f78[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfDetachAndTerminate(PrimImpl.java:1326)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Reflect.invokeMethod(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
[sf-startdaemon-debug] at sun/reflect/NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
[sf-startdaemon-debug] at sun/reflect/NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[sf-startdaemon-debug] at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[sf-startdaemon-debug] at java/lang/reflect/Method.invoke(Method.java:597)
[sf-startdaemon-debug] at sun/rmi/server/UnicastServerRef.dispatch(UnicastServerRef.java:305)
[sf-startdaemon-debug] at sun/rmi/transport/Transport$1.run(Transport.java:159)
[sf-startdaemon-debug] at jrockit/vm/AccessController.doPrivileged(AccessController.java:255)
[sf-startdaemon-debug] at sun/rmi/transport/Transport.serviceCall(Transport.java:155)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport.handleMessages(TCPTransport.java:535)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "LivenessSender_HOST morzine.hpl.hp.com:rootProcess:testLocalTaskTracker" id=21 idx=0x60 tid=22848 prio=5 alive, in native, waiting, daemon
[sf-startdaemon-debug] -- Waiting for notification on: org/smartfrog/sfcore/prim/LivenessSender@0x410c23d8[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at org/smartfrog/sfcore/common/Timer.timerWait(Timer.java:138)
[sf-startdaemon-debug] ^-- Lock released while waiting: org/smartfrog/sfcore/prim/LivenessSender@0x410c23d8[fat lock]
[sf-startdaemon-debug] at org/smartfrog/sfcore/common/Timer.run(Timer.java:186)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI RenewClean-[127.0.1.1:36769]" id=22 idx=0x64 tid=22851 prio=5 alive, in native, waiting, daemon
[sf-startdaemon-debug] -- Waiting for notification on: java/lang/ref/ReferenceQueue$Lock@0x410d15c8[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at java/lang/ref/ReferenceQueue.remove(ReferenceQueue.java:116)
[sf-startdaemon-debug] ^-- Lock released while waiting: java/lang/ref/ReferenceQueue$Lock@0x410d15c8[fat lock]
[sf-startdaemon-debug] at sun/rmi/transport/DGCClient$EndpointEntry$RenewCleanThread.run(DGCClient.java:516)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "Thread-10" id=29 idx=0x78 tid=22857 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x43501208[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.isForcedShutdown(DelayedTerminator.java:207)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.endTestRun(TestCompoundImpl.java:808)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.onChildTerminated(TestCompoundImpl.java:631)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/eventbus/EventCompoundImpl.sfTerminatedWith(EventCompoundImpl.java:247)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.terminateNotifying(PrimImpl.java:1254)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfTerminate(PrimImpl.java:1177)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.run(DelayedTerminator.java:180)
[sf-startdaemon-debug] ^-- Holding lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x412ced68[thin lock]
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "Thread-14" id=35 idx=0x90 tid=22865 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x412ced68[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.shutdown(DelayedTerminator.java:111)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.shutdown(TestCompoundImpl.java:467)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.sfTerminateWith(TestCompoundImpl.java:453)
[sf-startdaemon-debug] ^-- Holding lock: org/smartfrog/services/assertions/TestCompoundImpl@0x43500f78[thin lock]
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.terminateNotifying(PrimImpl.java:1243)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfTerminate(PrimImpl.java:1177)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.onChildTerminated(TestCompoundImpl.java:641)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/eventbus/EventCompoundImpl.sfTerminatedWith(EventCompoundImpl.java:247)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.terminateNotifying(PrimImpl.java:1254)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfTerminate(PrimImpl.java:1177)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.run(DelayedTerminator.java:180)
[sf-startdaemon-debug] ^-- Holding lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x43501208[thin lock]
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "LeaseChecker" id=149 idx=0x258 tid=22990 prio=5 alive, in native, sleeping, native_waiting, daemon
[sf-startdaemon-debug] at java/lang/Thread.sleep(J)V(Native Method)
[sf-startdaemon-debug] at org/apache/hadoop/hdfs/DFSClient$LeaseChecker.run(DFSClient.java:957)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)

Steve Loughran added a comment - 27/Jan/09 05:19 PM (GMT)
the DelayedTerminator is used in the TestBlockImpl and TestCompoundImpl; its in the latter that the problem appears to have arisen

Steve Loughran added a comment - 28/Jan/09 11:20 AM (GMT)
[sf-startdaemon-debug] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
[sf-startdaemon-debug] at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:305)
[sf-startdaemon-debug] ... 44 more
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] INFO mapred.ExtJobTracker : State change: JobTracker is now TERMINATED
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] INFO mapred.JobTracker : Stopping infoServer
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] INFO mapred.JobTracker : Stopping interTrackerServer
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] INFO ipc.Server : Stopping server on 8012
[sf-startdaemon-debug] 09/01/27 17:07:53 [IPC Server handler 0 on 8012] INFO ipc.Server : IPC Server handler 0 on 8012: exiting
[sf-startdaemon-debug] 09/01/27 17:07:53 [IPC Server handler 1 on 8012] INFO ipc.Server : IPC Server handler 1 on 8012: exiting
[sf-startdaemon-debug] 09/01/27 17:07:53 [IPC Server handler 2 on 8012] INFO ipc.Server : IPC Server handler 2 on 8012: exiting
[sf-startdaemon-debug] 09/01/27 17:07:53 [IPC Server handler 3 on 8012] INFO ipc.Server : IPC Server handler 3 on 8012: exiting
[sf-startdaemon-debug] 09/01/27 17:07:53 [IPC Server listener on 8012] INFO ipc.Server : Stopping IPC Server listener on 8012
[sf-startdaemon-debug] 09/01/27 17:07:53 [IPC Server Responder] INFO ipc.Server : Stopping IPC Server Responder
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] INFO mapred.JobTracker : Stopping expireTrackersThread
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] INFO mapred.JobTracker : Stopping retirer
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] INFO mapred.EagerTaskInitializationListener : Stopping initer
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] INFO mapred.JobTracker : Stopping expireLaunchingTasks
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] INFO mapred.JobTracker : stopped all jobtracker services
[sf-startdaemon-debug] 09/01/27 17:07:53 [TerminatorThread] WARN mapred.JobTracker : When closing the filesystem: java.io.IOException: Filesystem closed
[sf-startdaemon-debug] java.io.IOException: Filesystem closed
[sf-startdaemon-debug] at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
[sf-startdaemon-debug] at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:208)
[sf-startdaemon-debug] at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:260)
[sf-startdaemon-debug] at org.apache.hadoop.mapred.JobTracker.closeTheFilesystemQuietly(JobTracker.java:1700)
[sf-startdaemon-debug] at org.apache.hadoop.mapred.ExtJobTracker.innerClose(ExtJobTracker.java:75)
[sf-startdaemon-debug] at org.apache.hadoop.util.Service.close(Service.java:312)
[sf-startdaemon-debug] at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl.terminateService(HadoopServiceImpl.java:354)
[sf-startdaemon-debug] at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl.terminateHadoopService(HadoopServiceImpl.java:336)
[sf-startdaemon-debug] at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl.sfTerminateWith(HadoopServiceImpl.java:104)
[sf-startdaemon-debug] at org.smartfrog.sfcore.prim.PrimImpl.terminateNotifying(PrimImpl.java:1243)
[sf-startdaemon-debug] at org.smartfrog.sfcore.prim.PrimImpl.sfTerminateQuietlyWith(PrimImpl.java:1356)
[sf-startdaemon-debug] at org.smartfrog.sfcore.common.TerminatorThread.execute(TerminatorThread.java:211)
[sf-startdaemon-debug] at org.smartfrog.sfcore.utils.SmartFrogThread.run(SmartFrogThread.java:279)
[sf-startdaemon-debug] ===== FULL THREAD DUMP ===============
[sf-startdaemon-debug] Tue Jan 27 17:10:33 2009
[sf-startdaemon-debug] BEA JRockit(R) R27.4.0-90-89592-1.6.0_02-20070928-1715-linux-x86_64
[sf-startdaemon-debug] "(Signal Handler)" id=2 idx=0x8 tid=22796 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(GC Main Thread)" id=3 idx=0xc tid=22797 prio=5 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] "(GC Worker Thread 1)" id=? idx=0x10 tid=22798 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(GC Worker Thread 2)" id=? idx=0x14 tid=22799 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(GC Worker Thread 3)" id=? idx=0x18 tid=22800 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(GC Worker Thread 4)" id=? idx=0x1c tid=22801 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "(Code Generation Thread 1)" id=4 idx=0x20 tid=22802 prio=5 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] "(Code Optimization Thread 1)" id=5 idx=0x24 tid=22803 prio=5 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] "(VM Periodic Task)" id=6 idx=0x28 tid=22804 prio=10 alive, in native, daemon
[sf-startdaemon-debug] "Finalizer" id=7 idx=0x2c tid=22805 prio=8 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] at jrockit/memory/Finalizer.waitForFinalizees([Ljava/lang/Object;)I(Native Method)
[sf-startdaemon-debug] at jrockit/memory/Finalizer.access$500(Finalizer.java:12)
[sf-startdaemon-debug] at jrockit/memory/Finalizer$4.run(Finalizer.java:149)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "Reference Handler" id=8 idx=0x30 tid=22806 prio=10 alive, in native, native_waiting, daemon
[sf-startdaemon-debug] at java/lang/ref/Reference.getPending()Ljava/lang/ref/Reference;(Native Method)
[sf-startdaemon-debug] at java/lang/ref/Reference.access$000(Reference.java:24)
[sf-startdaemon-debug] at java/lang/ref/Reference$ReferenceHandler.run(Reference.java:144)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "(Sensor Event Thread)" id=9 idx=0x34 tid=22807 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "RMI TCP Accept-0" id=11 idx=0x38 tid=22810 prio=5 alive, in native, daemon
[sf-startdaemon-debug] at java/net/PlainSocketImpl.socketAccept(Ljava/net/SocketImpl;)V(Native Method)
[sf-startdaemon-debug] at java/net/PlainSocketImpl.accept(PlainSocketImpl.java:384)
[sf-startdaemon-debug] ^-- Holding lock: java/net/SocksSocketImpl@0x431f8bc0[thin lock]
[sf-startdaemon-debug] at java/net/ServerSocket.implAccept(ServerSocket.java:453)
[sf-startdaemon-debug] at java/net/ServerSocket.accept(ServerSocket.java:421)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$AcceptLoop.run(TCPTransport.java:341)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI Reaper" id=12 idx=0x3c tid=22811 prio=5 alive, in native, waiting
[sf-startdaemon-debug] -- Waiting for notification on: java/lang/ref/ReferenceQueue$Lock@0x431f89e8[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at java/lang/ref/ReferenceQueue.remove(ReferenceQueue.java:116)
[sf-startdaemon-debug] ^-- Lock released while waiting: java/lang/ref/ReferenceQueue$Lock@0x431f89e8[fat lock]
[sf-startdaemon-debug] at java/lang/ref/ReferenceQueue.remove(ReferenceQueue.java:132)
[sf-startdaemon-debug] at sun/rmi/transport/ObjectTable$Reaper.run(ObjectTable.java:333)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "GC Daemon" id=13 idx=0x40 tid=22812 prio=2 alive, in native, waiting, daemon
[sf-startdaemon-debug] -- Waiting for notification on: sun/misc/GC$LatencyLock@0x431f9c78[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at sun/misc/GC$Daemon.run(GC.java:100)
[sf-startdaemon-debug] ^-- Lock released while waiting: sun/misc/GC$LatencyLock@0x431f9c78[fat lock]
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "LivenessSender_HOST morzine.hpl.hp.com:rootProcess" id=14 idx=0x44 tid=22813 prio=5 alive, in native, waiting
[sf-startdaemon-debug] -- Waiting for notification on: org/smartfrog/sfcore/prim/LivenessSender@0x434fe350[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at org/smartfrog/sfcore/common/Timer.timerWait(Timer.java:138)
[sf-startdaemon-debug] ^-- Lock released while waiting: org/smartfrog/sfcore/prim/LivenessSender@0x434fe350[fat lock]
[sf-startdaemon-debug] at org/smartfrog/sfcore/common/Timer.run(Timer.java:186)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI TCP Accept-3800" id=15 idx=0x48 tid=22814 prio=5 alive, in native, daemon
[sf-startdaemon-debug] at java/net/PlainSocketImpl.socketAccept(Ljava/net/SocketImpl;)V(Native Method)
[sf-startdaemon-debug] at java/net/PlainSocketImpl.accept(PlainSocketImpl.java:384)
[sf-startdaemon-debug] ^-- Holding lock: java/net/SocksSocketImpl@0x43580f40[thin lock]
[sf-startdaemon-debug] at java/net/ServerSocket.implAccept(ServerSocket.java:453)
[sf-startdaemon-debug] at java/net/ServerSocket.accept(ServerSocket.java:421)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$AcceptLoop.run(TCPTransport.java:341)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "Thread-3" id=17 idx=0x50 tid=22795 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "RMI TCP Connection(5)-127.0.1.1" id=18 idx=0x54 tid=22816 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/services/assertions/TestCompoundImpl@0x43500f78[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/eventbus/EventCompoundImpl.deregister(EventCompoundImpl.java:142)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Reflect.invokeMethod(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
[sf-startdaemon-debug] at sun/reflect/NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
[sf-startdaemon-debug] at sun/reflect/NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[sf-startdaemon-debug] at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[sf-startdaemon-debug] at java/lang/reflect/Method.invoke(Method.java:597)
[sf-startdaemon-debug] at sun/rmi/server/UnicastServerRef.dispatch(UnicastServerRef.java:305)
[sf-startdaemon-debug] at sun/rmi/transport/Transport$1.run(Transport.java:159)
[sf-startdaemon-debug] at jrockit/vm/AccessController.doPrivileged(AccessController.java:255)
[sf-startdaemon-debug] at sun/rmi/transport/Transport.serviceCall(Transport.java:155)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport.handleMessages(TCPTransport.java:535)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI Scheduler(0)" id=19 idx=0x58 tid=22842 prio=5 alive, in native, parked, daemon
[sf-startdaemon-debug] at jrockit/vm/Locks.park0(J)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.park(Locks.java:2443)
[sf-startdaemon-debug] at sun/misc/Unsafe.park(ZJ)V(Native Method)
[sf-startdaemon-debug] at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:198)
[sf-startdaemon-debug] at java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
[sf-startdaemon-debug] at java/util/concurrent/DelayQueue.take(DelayQueue.java:164)
[sf-startdaemon-debug] at java/util/concurrent/ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:582)
[sf-startdaemon-debug] at java/util/concurrent/ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:575)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:946)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:906)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI TCP Connection(4)-127.0.1.1" id=20 idx=0x5c tid=22847 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/services/assertions/TestCompoundImpl@0x43500f78[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfDetachAndTerminate(PrimImpl.java:1326)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Reflect.invokeMethod(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
[sf-startdaemon-debug] at sun/reflect/NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Native Method)
[sf-startdaemon-debug] at sun/reflect/NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[sf-startdaemon-debug] at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[sf-startdaemon-debug] at java/lang/reflect/Method.invoke(Method.java:597)
[sf-startdaemon-debug] at sun/rmi/server/UnicastServerRef.dispatch(UnicastServerRef.java:305)
[sf-startdaemon-debug] at sun/rmi/transport/Transport$1.run(Transport.java:159)
[sf-startdaemon-debug] at jrockit/vm/AccessController.doPrivileged(AccessController.java:255)
[sf-startdaemon-debug] at sun/rmi/transport/Transport.serviceCall(Transport.java:155)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport.handleMessages(TCPTransport.java:535)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
[sf-startdaemon-debug] at sun/rmi/transport/tcp/TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
[sf-startdaemon-debug] at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "LivenessSender_HOST morzine.hpl.hp.com:rootProcess:testLocalTaskTracker" id=21 idx=0x60 tid=22848 prio=5 alive, in native, waiting, daemon
[sf-startdaemon-debug] -- Waiting for notification on: org/smartfrog/sfcore/prim/LivenessSender@0x410c23d8[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at org/smartfrog/sfcore/common/Timer.timerWait(Timer.java:138)
[sf-startdaemon-debug] ^-- Lock released while waiting: org/smartfrog/sfcore/prim/LivenessSender@0x410c23d8[fat lock]
[sf-startdaemon-debug] at org/smartfrog/sfcore/common/Timer.run(Timer.java:186)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "RMI RenewClean-[127.0.1.1:36769]" id=22 idx=0x64 tid=22851 prio=5 alive, in native, waiting, daemon
[sf-startdaemon-debug] -- Waiting for notification on: java/lang/ref/ReferenceQueue$Lock@0x410d15c8[fat lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
[sf-startdaemon-debug] at java/lang/Object.wait(J)V(Native Method)
[sf-startdaemon-debug] at java/lang/ref/ReferenceQueue.remove(ReferenceQueue.java:116)
[sf-startdaemon-debug] ^-- Lock released while waiting: java/lang/ref/ReferenceQueue$Lock@0x410d15c8[fat lock]
[sf-startdaemon-debug] at sun/rmi/transport/DGCClient$EndpointEntry$RenewCleanThread.run(DGCClient.java:516)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "Thread-10" id=29 idx=0x78 tid=22857 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x43501208[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.isForcedShutdown(DelayedTerminator.java:207)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.endTestRun(TestCompoundImpl.java:808)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.onChildTerminated(TestCompoundImpl.java:631)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/eventbus/EventCompoundImpl.sfTerminatedWith(EventCompoundImpl.java:247)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.terminateNotifying(PrimImpl.java:1254)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfTerminate(PrimImpl.java:1177)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.run(DelayedTerminator.java:180)
[sf-startdaemon-debug] ^-- Holding lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x412ced68[thin lock]
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "Thread-14" id=35 idx=0x90 tid=22865 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x412ced68[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.shutdown(DelayedTerminator.java:111)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.shutdown(TestCompoundImpl.java:467)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.sfTerminateWith(TestCompoundImpl.java:453)
[sf-startdaemon-debug] ^-- Holding lock: org/smartfrog/services/assertions/TestCompoundImpl@0x43500f78[thin lock]
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.terminateNotifying(PrimImpl.java:1243)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfTerminate(PrimImpl.java:1177)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.onChildTerminated(TestCompoundImpl.java:641)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/eventbus/EventCompoundImpl.sfTerminatedWith(EventCompoundImpl.java:247)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.terminateNotifying(PrimImpl.java:1254)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfTerminate(PrimImpl.java:1177)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.run(DelayedTerminator.java:180)
[sf-startdaemon-debug] ^-- Holding lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x43501208[thin lock]
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace
[sf-startdaemon-debug] "LeaseChecker" id=149 idx=0x258 tid=22990 prio=5 alive, in native, sleeping, native_waiting, daemon
[sf-startdaemon-debug] at java/lang/Thread.sleep(J)V(Native Method)
[sf-startdaemon-debug] at org/apache/hadoop/hdfs/DFSClient$LeaseChecker.run(DFSClient.java:957)
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)



What's going on?


1. An RMI call is blocked in EventCompoundImpl.deregister waiting for a lock on
TestCompoundImpl@0x43500f78

[sf-startdaemon-debug] "Thread-3" id=17 idx=0x50 tid=22795 prio=5 alive, in native, daemon
[sf-startdaemon-debug] "RMI TCP Connection(5)-127.0.1.1" id=18 idx=0x54 tid=22816 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/services/assertions/TestCompoundImpl@0x43500f78[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/eventbus/EventCompoundImpl.deregister(EventCompoundImpl.java:142)

2. Another RMI call is in PrimImpl.sfDetachAndTerminate for the same Component,
TestCompoundImpl@0x43500f78

[sf-startdaemon-debug] "RMI TCP Connection(4)-127.0.1.1" id=20 idx=0x5c tid=22847 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/services/assertions/TestCompoundImpl@0x43500f78[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfDetachAndTerminate(PrimImpl.java:1326)

3. DelayedTerminator@0x412ced68 (in sychronized) has run and triggered its own sfTerminate, which is raising an onChildTerminated() of the parent, which is then trying to shut down the other DelayedTerminator, DelayedTerminator@0x43501208

is trying to get a lock on itself in the isForcedShutdown() operation
in its sfTerminateWith() operation

[sf-startdaemon-debug] "Thread-10" id=29 idx=0x78 tid=22857 prio=5 alive, in native, blocked, daemon
[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x43501208[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.isForcedShutdown(DelayedTerminator.java:207)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.endTestRun(TestCompoundImpl.java:808)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.onChildTerminated(TestCompoundImpl.java:631)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/eventbus/EventCompoundImpl.sfTerminatedWith(EventCompoundImpl.java:247)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.terminateNotifying(PrimImpl.java:1254)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfTerminate(PrimImpl.java:1177)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.run(DelayedTerminator.java:180)
[sf-startdaemon-debug] ^-- Holding lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x412ced68[thin lock]
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)
[sf-startdaemon-debug] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[sf-startdaemon-debug] -- end of trace

4. Something holding TestCompoundImpl's lock is trying to get the lock of DelayedTerminator
for DelayedTerminator.shutdown(). The TestCompoundImpl's sfTerminate() has been called, which triggers a notification of the child terminating


The DelayedTerminator@0x43501208 has triggered sfTerminate() of its managed component; this
has triggered a a call to onChildTerminated() of the parent, which triggers an sfTerminateWith() of
TestCompoundImpl@0x43500f78; this is blocked trying to get the lock of DelayedTerminator@0x412ced68

[sf-startdaemon-debug] -- Blocked trying to get lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x412ced68[thin lock]
[sf-startdaemon-debug] at jrockit/vm/Threads.sleep(I)V(Native Method)
[sf-startdaemon-debug] at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[sf-startdaemon-debug] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.shutdown(DelayedTerminator.java:111)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.shutdown(TestCompoundImpl.java:467)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.sfTerminateWith(TestCompoundImpl.java:453)
[sf-startdaemon-debug] ^-- Holding lock: org/smartfrog/services/assertions/TestCompoundImpl@0x43500f78[thin lock]
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.terminateNotifying(PrimImpl.java:1243)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfTerminate(PrimImpl.java:1177)
[sf-startdaemon-debug] at org/smartfrog/services/assertions/TestCompoundImpl.onChildTerminated(TestCompoundImpl.java:641)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/eventbus/EventCompoundImpl.sfTerminatedWith(EventCompoundImpl.java:247)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.terminateNotifying(PrimImpl.java:1254)
[sf-startdaemon-debug] at org/smartfrog/sfcore/prim/PrimImpl.sfTerminate(PrimImpl.java:1177)
[sf-startdaemon-debug] at org/smartfrog/sfcore/workflow/combinators/DelayedTerminator.run(DelayedTerminator.java:180)
[sf-startdaemon-debug] ^-- Holding lock: org/smartfrog/sfcore/workflow/combinators/DelayedTerminator@0x43501208[thin lock]
[sf-startdaemon-debug] at java/lang/Thread.run(Thread.java:619)

Steve Loughran added a comment - 28/Jan/09 11:23 AM (GMT)
summary. There are two DelayedTerminator instances, both have triggered termination; both trigger the termination of their common parent TestCompoundImpl instance, which then tries to terminate both DelayedTerminators. Only it can't, because they are deadlocking each other.

1. the isForcedShutdown() should go unsync; that variable must go to being volatile. But is that enough?

2. the run() operation in the delayed terminator should be rewritten to not hold any locks on self.

3. What about TestCompoundImpl?

Steve Loughran added a comment - 28/Jan/09 01:03 PM (GMT)
removing synchronization from the isShutdown() method was enough to take this DL away, but I cut back on the synchronized sections of all the classes at fault to make the risk even smaller.