Shell access and logs work intermittently after upgrade to 1.4

Since we upgraded our HA setup from Rancher 1.1.4 to 1.4, shell access and log viewing regularly fail. Oddly enough, every once in a while they do work, even though we changed nothing.

After reviewing our security groups, the agent logs, the audit logs and the resources of our managers we could not find a reason why it wouldn’t work. The FAQ mentions that the servers may not be time-synced but I have verified they are (and the mentioned error does not occur).

We currently allow UDP 4500, 500 between nodes, and port 9345 is open between the server nodes. In addition we allow traffic from port 22. Reviewing the audit logs I can see the connection attempts but a proper response seems to be returned.

Does anyone here have an angle that I can try or something I might have missed during troubleshooting?

Hi,

I am having the same issues: the shell just opens and closes, logs say "disconnected", and stats are stuck in the "connecting" state.
All servers are in sync and there is no firewall between them. Would love to see this resolved as well.

We are also running Rancher 1.4.

kr,
Milos

Our issues with the terminal and logs appear to be solved. We discovered in the rancher-server logs that the servers were unable to communicate with each other through port 80 on the public interface. Since we opened that up in the security groups, it seems to be working consistently again.

Actually, I discovered that every time I try to execute a shell I get this in the agent logs (nothing on the Rancher server):

time="2017-02-09T21:22:38Z" level=error msg="Error writing EOT message." error="write unix @->/var/run/docker.sock: write: broken pipe"

kr,
Milos
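For anyone who wants to check whether their agent shows the same error, here is a minimal Python sketch that filters agent log output for it. It assumes the key=value log format shown in the line above; the function names `parse_log_line` and `is_broken_pipe_error` are made up for this example and are not part of Rancher.

```python
import shlex
import sys


def parse_log_line(line):
    """Split a key=value log line (as in the post above) into a dict."""
    # Forum software often turns straight quotes into curly ones; undo that.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as unparseable.
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def is_broken_pipe_error(line):
    """True for error-level lines whose error field mentions a broken pipe."""
    fields = parse_log_line(line)
    return fields.get("level") == "error" and "broken pipe" in fields.get("error", "")


if __name__ == "__main__":
    # Reads log lines from stdin and prints only the matching ones.
    for raw in sys.stdin:
        if is_broken_pipe_error(raw):
            print(raw.rstrip())
```

Pipe the agent container's log output into the script on stdin; how you obtain that output depends on your setup.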

Where did you open port 80: on the Rancher HA servers or on the nodes?

So do your ingress rules allow port 80 access directly, only from your load balancer, or between the nodes (restricted to self)?

I opened port 80 between the HA servers themselves. I should add that we configured Rancher HA to listen on port 80 instead of 8080, so for others I think 8080 would make more sense.
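To verify this kind of server-to-server reachability, a small TCP check run from each HA server against its peers may help. This is a rough sketch, not an official Rancher tool: the peer addresses are placeholders, and the port list (80, 8080, 9345) is simply taken from the ports mentioned in this thread. It only tests TCP, so it says nothing about the UDP 500/4500 rules.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(peers, ports, timeout=3.0):
    """Map every (host, port) pair to whether it accepted a connection."""
    return {(h, p): port_open(h, p, timeout) for h in peers for p in ports}


if __name__ == "__main__":
    # Placeholder addresses: replace with the public addresses of your
    # other HA servers, since the failure above was on the public interface.
    peers = ["203.0.113.11", "203.0.113.12"]
    for (host, port), ok in check_peers(peers, [80, 8080, 9345]).items():
        print(f"{host}:{port} {'open' if ok else 'unreachable'}")
```

Run it on each HA server in turn; a port that shows as unreachable from one server but open from another points at a security group or ingress rule rather than at Rancher itself.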