Hi,
I migrated the Rancher server to another machine running Ubuntu 18.04 by simply copying the MySQL database (stopped Rancher, copied the database, did a fresh install on the new server, restored the database there, and started the Rancher server). I’m using Rancher 1.6.25.
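For reference, the steps were roughly the following (treat this as a sketch; the MySQL data directory path is a placeholder, and I run the server with the data directory bind-mounted as the docs suggest):

# old server: stop Rancher and copy the bind-mounted MySQL data directory
docker stop rancher-server
rsync -a /opt/rancher/mysql/ root@new-server:/opt/rancher/mysql/

# new server: start a fresh Rancher 1.6.25 on top of the copied database
docker run -d --restart=unless-stopped -p 8080:8080 \
  -v /opt/rancher/mysql:/var/lib/mysql \
  rancher/server:v1.6.25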
Then I re-ran the rancher-agent command (docker run etc.) on one of the hosts.
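It was the standard command from the ‘Add Host’ screen, something like this (registration token replaced with a placeholder):

sudo docker run --rm --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.2.11 \
  http://207.154.200.246:8080/v1/scripts/<REGISTRATION_TOKEN>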
It also seemed fine at first, but Rancher kept recreating the scheduler container, so I thought the problem was that the initial scheduler container was ‘standing in the way’: it was now identified as a standalone container. I deleted the initial scheduler container through the Rancher interface, but Rancher still kept recreating the new scheduler container. I then restarted the host, and now the healthcheck container isn’t working either, in addition to the scheduler container.
Any ideas how I can get past this?
This is what I keep getting in the healthcheck container logs:
time="2019-01-24T12:03:36Z" level=error msg="Failed to report status 6573f81c-ebfc-48f8-8aa8-81a73fde790c_1a9f83c1-6b68-4bc8-bcb4-41f8f255e9bc_2=DOWN: Bad response from [http://207.154.200.246:8080/v1/serviceevents], statusCode [403]. Status [403 Forbidden]. Body: [{\"id\":\"449637b7-265f-4238-a955-75e830645649\",\"type\":\"error\",\"links\":{},\"actions\":{},\"status\":403,\"code\":\"CantVerifyHealthcheck\",\"message\":\"CantVerifyHealthcheck\",\"detail\":null,\"baseType\":\"error\"}]"
And this is what I get in the scheduler container logs:
time="2019-01-24T12:05:29Z" level=info msg="Listening on /tmp/log.sock" time="2019-01-24T12:05:29Z" level=info msg="Connecting to cattle event stream." time="2019-01-24T12:05:29Z" level=info msg="Subscribing to metadata changes." time="2019-01-24T12:05:29Z" level=info msg="Listening for health checks on 0.0.0.0:80/healthcheck" time="2019-01-24T12:05:29Z" level=info msg="Initializing event router" workerCount=100 time="2019-01-24T12:05:29Z" level=info msg="Connection established" time="2019-01-24T12:05:29Z" level=info msg="Starting websocket pings" time="2019-01-24T12:05:30Z" level=info msg="Adding resource pool [instanceReservation] with total 1000000 and used 15 for host 6573f81c-ebfc-48f8-8aa8-81a73fde790c" time="2019-01-24T12:05:30Z" level=info msg="Adding resource pool [cpuReservation] with total 2000 and used 0 for host 6573f81c-ebfc-48f8-8aa8-81a73fde790c" time="2019-01-24T12:05:30Z" level=info msg="Adding resource pool [memoryReservation] with total 4135583744 and used 0 for host 6573f81c-ebfc-48f8-8aa8-81a73fde790c" time="2019-01-24T12:05:30Z" level=info msg="Adding resource pool [storageSize] with total 81032015 and used 0 for host 6573f81c-ebfc-48f8-8aa8-81a73fde790c" time="2019-01-24T12:05:30Z" level=info msg="Adding resource pool [portReservation], ip set [0.0.0.0], ports map tcp map[0.0.0.0:map[80:30eee514-b6ac-491b-be8e-5fb8dfec2754]], ports map udp map[0.0.0.0:map[500:a760f279-482d-4880-8bd3-3b1ee7800912 4500:a760f279-482d-4880-8bd3-3b1ee7800912]] for host 6573f81c-ebfc-48f8-8aa8-81a73fde790c" time="2019-01-24T12:05:30Z" level=info msg="Adding resource pool [hostLabels] with label map [map[io.rancher.host.agent_image:rancher/agent:v1.2.11 io.rancher.host.docker_version:18.06 io.rancher.host.kvm:true io.rancher.host.linux_kernel_version:4.15 io.rancher.host.os:linux]]"
Am I missing something regarding the migration of the Rancher server? I don’t understand why it doesn’t simply work as it should. I can’t find anything about this in the documentation; I’d gladly read it if someone could point me to it.