Issue Details

Key: SFOS-994
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Steve Loughran
Reporter: Steve Loughran
Votes: 0
Watchers: 0
SmartFrog

Hadoop tests fail because the namenode storage directory is locked. Assumption: we aren't terminating namenodes properly

Created: 07/Oct/08 05:04 PM (BST)   Updated: 25/Feb/09 04:52 PM (GMT)
Component/s: _service_hadoop
Affects Version/s: 3.17.004
Fix Version/s: 3.17.010

Time Tracking:
Not Specified

Compatibility: unknown


Description
The second and every later time a namenode starts in the sf-hadoop tests, it fails with a locked-directory error. This implies that another namenode instance still holds the lock.
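The following is a minimal Java sketch of the suspected failure mode. It assumes that `Storage$StorageDirectory.lock()` (named in the stack trace in the first comment) takes an exclusive `java.nio` file lock on a file inside the storage directory. The `in_use.lock` file name and the `StorageLockDemo` class are illustrative assumptions, not SmartFrog or Hadoop code. If a first instance never releases its lock, the second instance's attempt fails. Once the lock is released on termination, a later instance can take it.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;

/** Hypothetical reproduction of a storage directory that stays locked. */
public class StorageLockDemo {

    // Assumed lock-file name, chosen to mirror Hadoop's storage lock file.
    public static final String LOCK_FILE = "in_use.lock";

    /** Tries to lock dir; returns the held lock, or null if another holder has it. */
    public static FileLock tryLockDir(File dir) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(new File(dir, LOCK_FILE), "rws");
        try {
            FileLock lock = raf.getChannel().tryLock();
            if (lock == null) {
                raf.close(); // locked by another process
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            raf.close(); // locked by another channel in this same JVM
            return null;
        } catch (IOException e) {
            raf.close();
            throw e;
        }
    }

    /** Releases the lock and closes its file, as a clean shutdown should. */
    public static void unlock(FileLock lock) throws IOException {
        lock.release();
        lock.channel().close(); // also closes the underlying RandomAccessFile
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("dfs-name").toFile();

        FileLock first = tryLockDir(dir);
        System.out.println("first instance locked: " + (first != null));

        // The first instance has not been terminated, so this attempt fails.
        System.out.println("second instance locked: " + (tryLockDir(dir) != null));

        unlock(first); // what terminating the first instance should do
        FileLock third = tryLockDir(dir);
        System.out.println("instance after release locked: " + (third != null));
        unlock(third);
    }
}
```

The stack trace shows the namenode being started inside the SmartFrog process. For that reason, the sketch treats `OverlappingFileLockException` (lock held by another channel in the same JVM) the same way as a `null` result from `tryLock()` (lock held by another process).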

Comments
Steve Loughran added a comment - 07/Oct/08 05:04 PM (BST)
stack:
at org.smartfrog.services.hadoop.core.SFHadoopException.forward(SFHadoopException.java:203)
at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl.innerDeploy(HadoopServiceImpl.java:300)
at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl.access$000(HadoopServiceImpl.java:41)
at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl$ServiceDeployerThread.execute(HadoopServiceImpl.java:377)
at org.smartfrog.sfcore.utils.SmartFrogThread.run(SmartFrogThread.java:279)
at org.smartfrog.sfcore.utils.WorkflowThread.run(WorkflowThread.java:117)
Caused by: java.io.IOException: Cannot lock storage /tmp/hadoop/dfs/name. The directory is already locked.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:511)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1062)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1085)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:83)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:310)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:289)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:164)
at org.apache.hadoop.hdfs.server.namenode.NameNode.innerStart(NameNode.java:225)
at org.apache.hadoop.util.Service.start(Service.java:183)
at org.smartfrog.services.hadoop.components.cluster.HadoopServiceImpl.innerDeploy(HadoopServiceImpl.java:294)

Steve Loughran added a comment - 07/Oct/08 05:29 PM (BST)
Consider adding tests that check, at the end of the run, that the storage locks have been released and the ports have been closed. Also consider adding some simple health checks in the code itself.
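The end-of-run checks suggested in this comment could look roughly like the following minimal sketch. It rests on three assumptions. First, the namenode's storage lock is an exclusive `java.nio` file lock on a file inside the storage directory. Second, the lock file is named `in_use.lock`. Third, a port counts as closed if the test can bind to it itself. The `TeardownChecks` class, its methods and the default port 8020 are hypothetical and are not part of SmartFrog or Hadoop. The default directory comes from the stack trace above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Hypothetical teardown health checks for a test that ran a namenode. */
public final class TeardownChecks {

    private TeardownChecks() {
    }

    /** True if an exclusive lock on dir/lockName can be taken (it is then dropped). */
    public static boolean isLockReleased(File dir, String lockName) throws IOException {
        File lockFile = new File(dir, lockName);
        if (!lockFile.exists()) {
            return true; // no lock file at all, so nothing can be holding it
        }
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // still held by another process
            }
            lock.release();
            return true;
        } catch (OverlappingFileLockException e) {
            return false; // still held by another channel in this JVM
        }
    }

    /** True if nothing is listening on the port, i.e. we can bind it ourselves. */
    public static boolean isPortClosed(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            return false; // bind failed: something still owns the port
        }
    }

    /** Usage: java TeardownChecks [storageDir] [port]; the defaults are assumptions. */
    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : "/tmp/hadoop/dfs/name");
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        System.out.println("lock released: " + isLockReleased(dir, "in_use.lock"));
        System.out.println("port " + port + " closed: " + isPortClosed(port));
    }
}
```

A test's teardown could call both methods after terminating the cluster and fail if either returns false. That way a leaked namenode is reported where it leaks, rather than as a "directory is already locked" error in the next test.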

Steve Loughran added a comment - 25/Feb/09 04:52 PM (GMT)
done