Issue Details (XML | Word | Printable)

Key: SFOS-1079
Type: New Feature New Feature
Status: Resolved Resolved
Resolution: Fixed
Priority: Minor Minor
Assignee: Steve Loughran
Reporter: Steve Loughran
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
SmartFrog

entry points could report network connectivity problems better

Created: 23/Jan/09 05:37 PM (GMT)   Updated: 23/Jan/09 10:37 PM (GMT)
Component/s: .sfCore
Affects Version/s: 3.17.004
Fix Version/s: 3.17.010

Time Tracking:
Not Specified

Environment:
Home, desktop PC running linux, trying to talk to a VMWare image of Centos running on a nearby laptop, connected by ethernet. LAN is a private 192.168.2.1 DNS, laptop is an out-the-box Centos image with minimal customisation and hostname set to localhost.localnet, IPAddr given by the DHCP server in the wifi box. There is no DNS server on this network.

Compatibility: backwards compatible


 Description  « Hide
When sfPing fails it could be more helpful

2009/01/23 17:32:28:483 GMT [WARN ][main] SFCORE_LOG - SmartFrog security is NOT active
 - FAILED when trying PING of '-v', host:192.168.1.6
    Result:
      * Exception: 'Unable to connect to sfDaemon on: 192.168.1.6.
        Reason: sfDaemon may not be running on 192.168.1.6java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
         java.net.ConnectException: Connection refused'


1. Prettier layout
2. Better stack trace (how?)

 All   Comments   Work Log   Change History      Sort Order: Ascending order - Click to sort in descending order
Steve Loughran added a comment - 23/Jan/09 06:35 PM (GMT)
also, misleading errors

2009/01/23 18:34:47:728 GMT [WARN ][main] SFCORE_LOG - SmartFrog security is NOT active
 - FAILED when trying PING of 'rootProcess', host:192.168.1.6
    Result:
      * Exception: 'SmartFrogException:: Unable to connect to sfDaemon on: /192.168.1.6.
        Reason: sfDaemon may not be authenticated properly
        cause: java.rmi.ConnectIOException: Exception creating connection to: 192.168.1.6; nested exception is:
         java.net.NoRouteToHostException: No route to host
        SmartFrog 3.17.005dev (2009-01-23 16:27:10 GMT)'

Steve Loughran added a comment - 23/Jan/09 07:16 PM (GMT)
A big problem here is that the stack trace is nonexistent, it comes from a different thread or something. To get the stack trace, you need that of the wrapping exception,

org.smartfrog.sfcore.processcompound.SFProcess.sfSelectTargetProcess(SFProcess.java:573)
org.smartfrog.sfcore.processcompound.SFProcess.sfSelectTargetProcess(SFProcess.java:513)
org.smartfrog.sfcore.common.ConfigurationAction.selectTargetProcess(ConfigurationAction.java:48)
...

It is happening in DefaultRootLocatorImpl, in ProcessCompound getRootProcessCompound(InetAddress hostAddress,
        int portNum)

here:
ProcessCompound pc = (ProcessCompound) reg.lookup(defaultName);

Steve Loughran added a comment - 23/Jan/09 08:03 PM (GMT)
the problem here is that the reg variable is an instance of Registry, created via LocateRegistry.getRegistry(host, port);, and it is just a proxy of (address,port). There is no attempt to reach the host/port until the RMI call is made, at which point you are at the mercy of RMI and its errors.


I propose that for the ping command, we start by trying to open a tcp connection to the port, catching early instances of
NoRouteToHost: network problems, laptop has moved
hostname doesnt resolve
port connection refused

Steve Loughran added a comment - 23/Jan/09 08:37 PM (GMT)
ping can also get into trouble in ActionPing.execute() in
targetC = (Prim) targetP.sfResolveWithParser(name);

It is here where the misleading ip address error message is creeping in
java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused

Steve Loughran added a comment - 23/Jan/09 08:41 PM (GMT)
more stack trace

   Result:
      * Exception: 'Unable to connect to sfDaemon on: 192.168.1.6.
        Reason: sfDaemon may not be running on 192.168.1.6java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
         java.net.ConnectException: Connection refused'
      * StackTrace: '
          java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
           java.net.ConnectException: Connection refused
           at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
           at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
           at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
           at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:128)
           at org.smartfrog.sfcore.processcompound.ProcessCompoundImpl_Stub.sfPing(Unknown Source)
           at org.smartfrog.sfcore.common.ActionPing.ping(ActionPing.java:97)
           at org.smartfrog.sfcore.common.ActionPing.execute(ActionPing.java:57)
           at org.smartfrog.sfcore.common.ConfigurationAction.execute(ConfigurationAction.java:113)
           at org.smartfrog.sfcore.common.ConfigurationDescriptor.execute(ConfigurationDescriptor.java:1076)
           at org.smartfrog.SFSystem.runConfigurationDescriptor(SFSystem.java:367)
           at org.smartfrog.SFSystem.runConfigurationDescriptor(SFSystem.java:346)
           at org.smartfrog.SFSystem.runConfigurationDescriptors(SFSystem.java:303)
           at org.smartfrog.SFSystem.runSmartFrog(SFSystem.java:591)
           at org.smartfrog.SFSystem.execute(SFSystem.java:434)
           at org.smartfrog.SFSystem.main(SFSystem.java:395)
          Caused by: java.net.ConnectException: Connection refused
           at java.net.PlainSocketImpl.socketConnect(Native Method)
           at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
           at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
           at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
           at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:381)
           at java.net.Socket.connect(Socket.java:537)
           at java.net.Socket.connect(Socket.java:487)
           at java.net.Socket.<init>(Socket.java:384)
           at java.net.Socket.<init>(Socket.java:198)
           at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
           at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:146)
           at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
           ... 14 more
          '
      * Command line SFACT: ':PING:::192.168.1.6:'
      * To String: ', type:PING, url:, host:192.168.1.6, resultType:FAILED, resultExceptionMessage:Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused, java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: , java.net.ConnectException: Connection refused, at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619), at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216), at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202), at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:128), at org.smartfrog.sfcore.processcompound.ProcessCompoundImpl_Stub.sfPing(Unknown Source), at org.smartfrog.sfcore.common.ActionPing.ping(ActionPing.java:97), at org.smartfrog.sfcore.common.ActionPing.execute(ActionPing.java:57), at org.smartfrog.sfcore.common.ConfigurationAction.execute(ConfigurationAction.java:113), at org.smartfrog.sfcore.common.ConfigurationDescriptor.execute(ConfigurationDescriptor.java:1076), at org.smartfrog.SFSystem.runConfigurationDescriptor(SFSystem.java:367), at org.smartfrog.SFSystem.runConfigurationDescriptor(SFSystem.java:346), at org.smartfrog.SFSystem.runConfigurationDescriptors(SFSystem.java:303), at org.smartfrog.SFSystem.runSmartFrog(SFSystem.java:591), at org.smartfrog.SFSystem.execute(SFSystem.java:434), at org.smartfrog.SFSystem.main(SFSystem.java:395), Caused by: java.net.ConnectException: Connection refused, at java.net.PlainSocketImpl.socketConnect(Native Method), at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310), at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176), at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163), at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:381), at java.net.Socket.connect(Socket.java:537), at java.net.Socket.connect(Socket.java:487), at java.net.Socket.<init>(Socket.java:384), at java.net.Socket.<init>(Socket.java:198), at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40), at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:146), at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613), ... 14 more, '

Steve Loughran added a comment - 23/Jan/09 08:58 PM (GMT)
It looks like the root cause here is the remote endpoint is returning invalid remote object references, 127.0.0.1 as the IPAddress,
ProcessCompoundImpl_Stub[UnicastRef [liveRef: [endpoint:[127.0.0.1:53194](remote),objID:[-a1b19d7:11f05149fc4:-7fff, -7266556617222224901]]]]

We can talk to the remote registry, but when you try to talk to anything it returns, error with Connection refused to host: 127.0.0.1, the joke being that if you had a local daemon, it would be irrelevant -the reference would be wrong and you are still going to miss pinging the far end.

We may be able to add some special diags for this;
-When reg.lookup(defaultName) returns a process compound, we do a toString() and look for that 127.0.0.1 value, and warn that there may be trouble here, as the machine doesn't know who it is. No bailout, just a warning, so that when the problem arises elsewhere, the root cause is more obvious

Steve Loughran added a comment - 23/Jan/09 10:02 PM (GMT)
sfDeploy isnt happy either
namenode:
   [deploy]
   [deploy] SmartFrog 3.17.005dev (2009-01-23 16:27:10 GMT)
   [deploy] (C) Copyright 1998-2009 Hewlett-Packard Development Company, LP
   [deploy]
   [deploy] 2009/01/23 22:01:15:791 GMT [WARN ][main] SFCORE_LOG - SmartFrog security is NOT active
   [deploy] 2009/01/23 22:01:15:796 GMT [WARN ][main] SFCORE_LOG - Warning: stack trace logging enabled
   [deploy] 2009/01/23 22:01:15:810 GMT [WARN ][main] SFCORE_LOG - Possible problem with classpath:
   [deploy] 2 occurrences for sf-anubis
   [deploy] /home/slo/.ivy/cache/org.smartfrog/sf-anubis/jars/sf-anubis-3.17.005dev.jar
   [deploy] /home/slo/.ivy/cache/org.smartfrog/sf-anubis/jars/sf-anubis-3.17.005dev.jar
   [deploy]
   [deploy] 2009/01/23 22:01:15:817 GMT [WARN ][main] SFCORE_LOG - Possible problem with codebase: 2 occurrences for sf-anubis
   [deploy] /home/slo/.ivy/cache/org.smartfrog/sf-anubis/jars/sf-anubis-3.17.005dev.jar file
   [deploy] /home/slo/.ivy/cache/org.smartfrog/sf-anubis/jars/sf-anubis-3.17.005dev.jar file
   [deploy]
   [deploy] - FAILED when trying DEPLOY of 'namenode', [/org/smartfrog/extras/hadoop/cluster/services/namenode/default.sf], host:192.168.1.6
   [deploy] Result:
   [deploy] * Exception: 'SmartFrogException:: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
   [deploy] java.net.ConnectException: Connection refused
   [deploy] SmartFrog 3.17.005dev (2009-01-23 16:27:10 GMT)'
   [deploy] * StackTrace: '
   [deploy] SmartFrogException:: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
   [deploy] java.net.ConnectException: Connection refused, SmartFrog 3.17.005dev (2009-01-23 16:27:10 GMT)
   [deploy] at org.smartfrog.sfcore.common.SmartFrogException.forward(SmartFrogException.java:92)
   [deploy] at org.smartfrog.sfcore.common.ActionDeploy.Deploy(ActionDeploy.java:192)
   [deploy] at org.smartfrog.sfcore.common.ActionDeploy.doDeploy(ActionDeploy.java:297)
   [deploy] at org.smartfrog.sfcore.common.ActionDeploy.execute(ActionDeploy.java:272)
   [deploy] at org.smartfrog.sfcore.common.ConfigurationAction.execute(ConfigurationAction.java:113)
   [deploy] at org.smartfrog.sfcore.common.ConfigurationDescriptor.execute(ConfigurationDescriptor.java:1076)
   [deploy] at org.smartfrog.SFSystem.runConfigurationDescriptor(SFSystem.java:367)
   [deploy] at org.smartfrog.SFSystem.runConfigurationDescriptor(SFSystem.java:346)
   [deploy] at org.smartfrog.SFSystem.runConfigurationDescriptors(SFSystem.java:303)
   [deploy] at org.smartfrog.SFSystem.runSmartFrog(SFSystem.java:591)
   [deploy] at org.smartfrog.SFSystem.execute(SFSystem.java:434)
   [deploy] at org.smartfrog.SFSystem.main(SFSystem.java:395)
   [deploy] Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
   [deploy] java.net.ConnectException: Connection refused
   [deploy] at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
   [deploy] at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
   [deploy] at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
   [deploy] at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110)
   [deploy] at org.smartfrog.sfcore.processcompound.ProcessCompoundImpl_Stub.sfDeployComponentDescription(Unknown Source)
   [deploy] at org.smartfrog.sfcore.common.ActionDeploy.Deploy(ActionDeploy.java:155)
   [deploy] ... 10 more
   [deploy] Caused by: java.net.ConnectException: Connection refused
   [deploy] at java.net.PlainSocketImpl.socketConnect(Native Method)
   [deploy] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
   [deploy] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
   [deploy] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
   [deploy] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
   [deploy] at java.net.Socket.connect(Socket.java:519)
   [deploy] at java.net.Socket.connect(Socket.java:469)
   [deploy] at java.net.Socket.<init>(Socket.java:366)
   [deploy] at java.net.Socket.<init>(Socket.java:180)
   [deploy] at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
   [deploy] at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
   [deploy] at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595)
   [deploy] ... 15 more
   [deploy] ALL: SmartFrogException: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
   [deploy] java.net.ConnectException: Connection refused'
   [deploy] * Command line SFACT: 'namenode:DEPLOY:'"/org/smartfrog/extras/hadoop/cluster/services/namenode/default.sf"'::192.168.1.6:'
   [deploy] * To String: ' name:namenode, type:DEPLOY, url:/org/smartfrog/extras/hadoop/cluster/services/namenode/default.sf, host:192.168.1.6, resultType:FAILED, resultExceptionMessage:java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
   [deploy] java.net.ConnectException: Connection refused, SmartFrogException:: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: , java.net.ConnectException: Connection refused, SmartFrog 3.17.005dev (2009-01-23 16:27:10 GMT), at org.smartfrog.sfcore.common.SmartFrogException.forward(SmartFrogException.java:92), at org.smartfrog.sfcore.common.ActionDeploy.Deploy(ActionDeploy.java:192), at org.smartfrog.sfcore.common.ActionDeploy.doDeploy(ActionDeploy.java:297), at org.smartfrog.sfcore.common.ActionDeploy.execute(ActionDeploy.java:272), at org.smartfrog.sfcore.common.ConfigurationAction.execute(ConfigurationAction.java:113), at org.smartfrog.sfcore.common.ConfigurationDescriptor.execute(ConfigurationDescriptor.java:1076), at org.smartfrog.SFSystem.runConfigurationDescriptor(SFSystem.java:367), at org.smartfrog.SFSystem.runConfigurationDescriptor(SFSystem.java:346), at org.smartfrog.SFSystem.runConfigurationDescriptors(SFSystem.java:303), at org.smartfrog.SFSystem.runSmartFrog(SFSystem.java:591), at org.smartfrog.SFSystem.execute(SFSystem.java:434), at org.smartfrog.SFSystem.main(SFSystem.java:395), Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: , java.net.ConnectException: Connection refused, at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601), at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198), at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184), at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110), at org.smartfrog.sfcore.processcompound.ProcessCompoundImpl_Stub.sfDeployComponentDescription(Unknown Source), at org.smartfrog.sfcore.common.ActionDeploy.Deploy(ActionDeploy.java:155), ... 10 more, Caused by: java.net.ConnectException: Connection refused, at java.net.PlainSocketImpl.socketConnect(Native Method), at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333), at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195), at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182), at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366), at java.net.Socket.connect(Socket.java:519), at java.net.Socket.connect(Socket.java:469), at java.net.Socket.<init>(Socket.java:366), at java.net.Socket.<init>(Socket.java:180), at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22), at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128), at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595), ... 15 more, ALL: SmartFrogException: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
   [deploy] java.net.ConnectException: Connection refused'
   [deploy]

Steve Loughran added a comment - 23/Jan/09 10:08 PM (GMT)
I've committed a patch which works for sfPing, sfManagementConsole, and issues a warning that the problem is on the far end, not locally

 sfManagementConsole -h 192.168.1.6
SFHOME undefined, using /home/slo/Projects/SmartFrog/Forge/core/smartfrog/dist as base directory
Starting management window...
sfManagementConsole connecting to 192.168.1.6:3800
2009/01/23 22:03:03:263 GMT [WARN ][main] SFCORE_LOG - Remote Process Compound for /192.168.1.6 appears to think its IP address is "127.0.0.1"
2009/01/23 22:03:03:268 GMT [WARN ][main] SFCORE_LOG - Communications with components are likely to fail with Connection refused exceptions
Error: Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused

Steve Loughran added a comment - 23/Jan/09 10:37 PM (GMT)
The patch fixes the ant tasks too

   [deploy] 2009/01/23 22:35:52:647 GMT [WARN ][main] SFCORE_LOG - Remote Process Compound for /192.168.1.6 appears to think its IP address is "127.0.0.1"
   [deploy] 2009/01/23 22:35:52:647 GMT [WARN ][main] SFCORE_LOG - Communications with components are likely to fail with Connection refused exceptions
   [deploy] - FAILED when trying DEPLOY of 'namenode', [/org/smartfrog/extras/hadoop/cluster/services/namenode/default.sf], host:192.168.1.6
   [deploy] Result:
   [deploy] * Exception: 'SmartFrogException:: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
   [deploy] java.net.ConnectException: Connection refused
   [deploy] SmartFrog 3.17.005dev (2009-01-23 22:07:29 GMT)'
   [deploy] * StackTrace: '
   [deploy] SmartFrogException:: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:

It's a bit ugly -looking for strings, but the only alternative is looking for the text "Connection refused to host: 127.0.0.1" in a ConnectException; same problem, only later, and could even internationalise worse.