Getting "Timeout getting IP address" error on Rancher v1.4.1

I'm getting "Timeout getting IP address" when trying to run any container, and most of the infrastructure containers are not working now.

Is there a fix or a known root cause for this?

This essentially means cross-host networking isn't working. Check that each host shows up in the UI with a unique IP, and that each host can reach every other host at those IPs on 500/udp and 4500/udp.
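If it helps, here is a rough Python sketch of those two checks. It is only an illustration, not a Rancher tool: the host names and IPs are made up, and `find_duplicate_ips` and `udp_probe` are hypothetical helpers. Keep in mind that UDP is connectionless, so "no-answer" from the probe does not prove the port is reachable (a firewall silently dropping packets looks the same); only "refused" or a socket error is a clear sign of a problem.

```python
import socket
from collections import defaultdict


def find_duplicate_ips(hosts):
    """Return {ip: [host names]} for every IP registered by more than one host.

    `hosts` maps a host name to the IP shown for it in the UI (copied by hand;
    this sketch does not query the Rancher API).
    """
    by_ip = defaultdict(list)
    for name, ip in hosts.items():
        by_ip[ip].append(name)
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}


def udp_probe(ip, port, timeout=2.0):
    """Send one datagram to ip:port and classify what comes back.

    Returns "reply", "refused" (ICMP port unreachable was received),
    "no-answer" (nothing came back before the timeout), or "error: ...".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((ip, port))
            sock.send(b"probe")
            sock.recv(1024)
            return "reply"
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "no-answer"
        except OSError as exc:
            return "error: {}".format(exc)


# Example usage (hypothetical hosts, run from each host in turn):
#   hosts = {"host-a": "10.0.0.11", "host-b": "10.0.0.12"}
#   print(find_duplicate_ips(hosts))
#   for name, ip in hosts.items():
#       for port in (500, 4500):
#           print(name, port, udp_probe(ip, port))
```

Any entry returned by `find_duplicate_ips` means two hosts registered with the same IP, which is the first thing to fix before looking at the ports.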

Thanks a lot.

I removed all the infrastructure containers, then restarted the machines after making sure that the IPs are shown in the UI.

I installed the infrastructure containers again one by one, and I can ping from each container to the others now.

I don't know why the servers didn't see each other before.

It happened again. Now I don't know what makes the machines lose their connections to each other after a day. Any ideas?

@ayman2nov, what version of Rancher are you running?
Can you grab the logs from the network-manager container and share them?
Also, it might be better to open an issue at https://github.com/rancher/rancher/issues for this.

I'm using v1.4.1.

This is the error log:

2/19/2017 2:07:44 PM time="2017-02-19T19:07:44Z" level=error msg="Failed to evaluate network state for 85ca24e23b563723a83c8e1cb89f6d43f0f442d9ccf97a41f570c4a93a917331: Couldn't bring up network: failed to open netns "/proc/9146/ns/net": failed to Statfs "/proc/9146/ns/net": no such file or directory"
2/19/2017 2:07:44 PM time="2017-02-19T19:07:44Z" level=error msg="Error processing event &docker.APIEvents{Action:"start", Type:"container", Actor:docker.APIActor{ID:"85ca24e23b563723a83c8e1cb89f6d43f0f442d9ccf97a41f570c4a93a917331", Attributes:map[string]string{"io.rancher.service.deployment.unit":"8b3bd256-252c-4813-b9f3-ae1ee050d7e7", "io.rancher.service.launch.config":"io.rancher.service.primary.launch.config", "io.rancher.stack_service.name":"Default/team", "io.rancher.project_service.name":"Default/team", "io.rancher.container.ip":"", "io.rancher.container.name":"Default_team_1", "io.rancher.container.pull_image":"always", "io.rancher.container.uuid":"4c5eeaf3-fb75-4027-b6d0-3d5fecff5243", "io.rancher.project.name":"Default", "io.rancher.stack.name":"Default", "name":"r-Default_team_1", "image":"openproject/community:6.0"}}, Status:"start", ID:"85ca24e23b563723a83c8e1cb89f6d43f0f442d9ccf97a41f570c4a93a917331", From:"openproject/community:6.0", Time:1487531264, TimeNano:1487531264896873785}. Error: Couldn't bring up network: failed to open netns "/proc/9146/ns/net":

Did you find the cause? I'm seeing the same thing here.

I removed all the network-related stacks and reinstalled them all (the Library stacks), and it works for me.