I have ROS deployed in an HA configuration (3 ROS nodes with a non-clustered load balancer in front of them). I have a node template that deploys my primary workhorse cluster, and it was working really well.
I wanted to change the networking a bit so that I had a network with more IP room for auto-scaling (I didn’t think about this before I created the cluster).
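To make the "IP room" point concrete, here is a minimal Python sketch using the standard `ipaddress` module. The two prefixes are placeholders picked for illustration (the real masks aren't listed in this post), and the helper names `usable_hosts` and `in_network` are made up for this example.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Return how many host addresses can be assigned in an IPv4 network."""
    net = ipaddress.ip_network(cidr, strict=True)
    # /31 and /32 networks have no network/broadcast pair to subtract.
    if net.prefixlen >= 31:
        return net.num_addresses
    return net.num_addresses - 2


def in_network(address: str, cidr: str) -> bool:
    """Return True if the address falls inside the given network."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


# Placeholder networks (assumptions, not the real VLAN definitions).
old_net = "172.16.0.0/24"
new_net = "172.16.4.0/22"

print("old network hosts:", usable_hosts(old_net))
print("new network hosts:", usable_hosts(new_net))
# 172.16.0.250 is the address the agent reports in the log further down.
print("agent address in old network:", in_network("172.16.0.250", old_net))
print("agent address in new network:", in_network("172.16.0.250", new_net))
```

Plugging in the real network definitions shows how many nodes auto-scaling could add, and which network the address reported by a node actually belongs to.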
So I created a new VLAN. I added it to my vSphere server, my managed switch, my pfSense DHCP/DNS server… everywhere it should be. I can ping to and from the new VLAN, and there are no firewall rules or anything blocking my path to isolation! (I can SSH through to it.)
I updated my node template to use the new vSphere network, and… nothing. The node hangs on registering with Kubernetes.
Looking at the logs on the worker node, I see:
INFO: Arguments: --server can’t-link://myranch.internal.domain --token REDACTED --ca-checksum b397189c7c3dfa1066d5158e947b9fc3c40bc7a2bb554076acfb3cde2400c470 -r -n m-rvgtm
INFO: Environment: CATTLE_ADDRESS=172.16.0.250 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=m-rvgtm CATTLE_SERVER=can’t-link://myranch.internal.domain CATTLE_TOKEN=REDACTED
INFO: Using resolv.conf: domain internal.domain nameserver 172.16.0.252
INFO: can’t-link://myranch.internal.domain/ping is accessible
INFO: Value from can’t-link://myranch.internal.domain/v3/settings/cacerts is an x509 certificate
time="2019-02-22T07:23:59Z" level=info msg="Rancher agent version v2.1.6 is starting"
time="2019-02-22T07:23:59Z" level=info msg="Option customConfig=map[roles:[] label:map[] address:172.16.0.250 internalAddress:]"
time="2019-02-22T07:23:59Z" level=info msg="Option etcd=false"
time="2019-02-22T07:23:59Z" level=info msg="Option controlPlane=false"
time="2019-02-22T07:23:59Z" level=info msg="Option worker=false"
time="2019-02-22T07:23:59Z" level=info msg="Option requestedHostname=m-rvgtm"
time="2019-02-22T07:23:59Z" level=info msg="Listening on /tmp/log.sock"
time="2019-02-22T07:23:59Z" level=info msg="Connecting to wss://myranch.internal.domain/v3/connect with token w4prnrhdmdgmv64pt28fzmwbpgk2rd8tbdmcz9tq882q65fpbhbx92"
time="2019-02-22T07:23:59Z" level=info msg="Connecting to proxy" url="wss://myranch.int.nvoke.ca/v3/connect"
time=“2019-02-22T07:23:59Z” level=info msg="Error while getting agent config: invalid response 500: failed to find plan for 172.16.0.250"
And the error just repeats until the cluster tries to redeploy that node again… aaand… rinse/repeat. The closest thing I’ve seen is the thread "Change external IP of AWS Node", but it didn’t have an answer… I’m hoping to have better luck.
I only tried changing the IP scheme on the one node, and when that failed I changed it back in the node template, so now there shouldn’t be any changes… same error. So now I’m really stuck. Any thoughts?
Here’s the docker version, the /etc/os-release, and the Rancher version… let me know what else you need.
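In case it helps anyone triaging the same loop, here is a minimal Python sketch that pulls the address the agent advertises (`CATTLE_ADDRESS`) and the address named in each "failed to find plan for" error out of log lines like the ones above. This is not Rancher tooling; the function name `summarize_agent_log` and the regular expressions are assumptions based only on the log format shown in this post.

```python
import re

# Patterns are guesses based on the log lines pasted above.
ENV_ADDRESS = re.compile(r"CATTLE_ADDRESS=(\S+)")
NO_PLAN = re.compile(r"failed to find plan for (\d{1,3}(?:\.\d{1,3}){3})")


def summarize_agent_log(lines):
    """Return (advertised_address, [addresses the server had no plan for])."""
    advertised = None
    rejected = []
    for line in lines:
        env_match = ENV_ADDRESS.search(line)
        if env_match:
            advertised = env_match.group(1)
        plan_match = NO_PLAN.search(line)
        if plan_match:
            rejected.append(plan_match.group(1))
    return advertised, rejected


# Two lines shortened from the log above.
sample = [
    "INFO: Environment: CATTLE_ADDRESS=172.16.0.250 CATTLE_AGENT_CONNECT=true "
    "CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=m-rvgtm",
    'level=info msg="Error while getting agent config: invalid response 500: '
    'failed to find plan for 172.16.0.250"',
]

advertised, rejected = summarize_agent_log(sample)
print("agent advertises:", advertised)
print("server has no plan for:", rejected)
```

For the lines above, both values come out as the same address, so the agent is registering with 172.16.0.250 and the server is reporting that it has no plan for that address.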
docker version
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 01:35:00 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 01:35:00 2017
OS/Arch: linux/amd64
Experimental: false
[docker@work3 ~]$ cat /etc/os-release
NAME="RancherOS"
VERSION=v1.5.1
ID=rancheros
ID_LIKE=
VERSION_ID=v1.5.1
PRETTY_NAME="RancherOS v1.5.1"
HOME_URL="can’t-link://rancher.com/rancher-os/"
SUPPORT_URL="can’t-link://forums.rancher.com/c/rancher-os"
BUG_REPORT_URL="can’t-link://github.com/rancher/os/issues"
BUILD_ID=
|Component|Version|
|---|---|
|Rancher|v2.1.6|
|User Interface|v2.1.19|
|Helm|v2.10.0-rancher5|
|Machine|v0.15.0-rancher1-1|
Thanks for any assistance you can offer!