Cannot add multiple hosts in rancher server

I have run into an issue when adding multiple hosts to a Rancher server.

The first host I add works fine and shows up in the Rancher UI, but as soon as I add a second host, the first one is removed automatically. It looks as if the second host replaces the first.

Steps I followed to add the hosts:

  • run the latest rancher server Docker image in single-node mode
  • add a new environment with Kubernetes
  • switch to the new environment, click ‘Add Host’, copy the command, and run it on minion server 1
  • repeat the previous step on minion server 2

The environment I used:

  • Rancher version v1.2.1
  • Linux 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Docker version 1.12.5, build 7392c3b

All the VMs were cloned from one VM template. After cloning, on each VM I:

  • changed the hostname (edited /etc/hosts and /etc/hostname) and rebooted
  • stopped the Docker service with "service docker stop"
  • removed everything under /var/lib/docker to start clean
  • flushed all iptables rules with "iptables -F"
  • started the Docker service again
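
For reference, the cleanup steps listed above can be sketched as a small shell function. This is only an outline of those steps: the names `run` and `clean_docker_state` and the `DRY_RUN` switch are made up for illustration and are not part of Docker or Rancher. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the per-VM cleanup steps listed above.
# run, clean_docker_state and DRY_RUN are hypothetical names for illustration.

run() {
    # Print the command; execute it only when DRY_RUN is explicitly set to 0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

clean_docker_state() {
    run service docker stop
    run rm -rf /var/lib/docker   # wipes all images, containers and volumes
    run iptables -F              # flushes the chains of the default (filter) table
    run service docker start
}
```

Running `clean_docker_state` as-is prints the four commands without touching the system; `DRY_RUN=0 clean_docker_state` as root would actually execute them.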

After the first host had been removed automatically, I checked the containers on each machine:

  • On the first minion,
    1. the rancher agent exited with code 137
    7052c81844b6 rancher/agent:v1.1.1 "/run.sh run" 41 minutes ago Exited (137) 22 minutes ago

  • On the second minion,
    1. the rancher agent started successfully
    f42c19a95b60 rancher/agent:v1.1.1 "/run.sh run" 29 minutes ago Up 29 minutes rancher-agent

  • On the rancher server,
    1. the logs from when the second host was added

    2016-12-20 09:22:46,767 ERROR [61bdca2e-41ee-4e85-abf7-aaa01014293d:616] [instance:42] [instance.start->(InstanceStart)] [] [ecutorService-1] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [42]
    2016-12-20 09:22:56,230 ERROR [:] [] [] [] [cutorService-10] [o.a.c.m.context.NoExceptionRunnable ] Expected state running but got removed
    time="2016-12-20T09:22:56Z" level=info msg="Shutting down backend 53c8c681-58ab-49de-97c5-279a61aad5d0. Connection closed because: websocket: close 1006 unexpected EOF."
    time="2016-12-20T09:22:56Z" level=info msg="Not removing backend for key 53c8c681-58ab-49de-97c5-279a61aad5d0. The provided session ID 074db0e2-51d5-4777-ad27-3383ba8f94bd doesn't match registered session ID 74fdaeba-15b5-4d1a-943e-9824774d913d."
    2016/12/20 09:22:56 http: proxy error: net/http: request canceled
    time="2016-12-20T09:22:58Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    time="2016-12-20T09:22:58Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    time="2016-12-20T09:22:58Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    time="2016-12-20T09:23:02Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    time="2016-12-20T09:23:03Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    2016-12-20 09:23:16,798 ERROR [f6abd45e-b12d-47b4-9406-7ef4fd35e995:616] [instance:42] [instance.start->(InstanceStart)] [] [ecutorService-7] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [42]
    time="2016-12-20T09:23:17Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    time="2016-12-20T09:23:25Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    time="2016-12-20T09:23:26Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    2016-12-20 09:23:29,902 ERROR [0a5f1f74-950f-4f5e-be84-a53f0a64651b:977] [instance:61] [instance.start->(InstanceStart)] [] [ecutorService-7] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [61]
    time="2016-12-20T09:23:32Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    time="2016-12-20T09:23:35Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    2016-12-20 09:23:46,790 ERROR [f243bacb-2210-47cb-998c-7286b864f571:616] [instance:42] [instance.start->(InstanceStart)] [] [cutorService-10] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [42]
    time="2016-12-20T09:23:47Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    2016-12-20 09:23:53,958 ERROR [c877180b-3f46-4e84-a607-79cedf3bd2e3:740] [service:12] [service.update] [] [cutorService-16] [c.p.e.p.i.DefaultProcessInstanceImpl] Expected state running but got stopped
    2016-12-20 09:23:53,965 ERROR [:] [] [] [] [cutorService-16] [.e.s.i.ProcessInstanceDispatcherImpl] Expected state running but got stopped
    time="2016-12-20T09:23:54Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    2016-12-20 09:24:16,803 ERROR [b027aec6-dc22-4140-8d25-3438732449f6:616] [instance:42] [instance.start->(InstanceStart)] [] [ecutorService-2] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [42]
    time="2016-12-20T09:24:17Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    2016-12-20 09:24:17,917 ERROR [e96f55b3-f6f0-41fa-931e-7368d50cd897:977] [instance:61] [instance.start->(InstanceStart)] [] [cutorService-15] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [61]
    2016-12-20 09:24:18,825 ERROR [4350c2bb-b649-4b03-9e81-ced8a22ac3d1:1103] [instance:64] [instance.start->(InstanceStart)] [] [cutorService-15] [i.c.p.process.instance.InstanceStart] Failed to Scheduling for instance [64]
    2016-12-20 09:24:19,051 ERROR [9459a7e7-5adb-44ee-bf78-eee0c90ea049:1100] [instance:65] [instance.start->(InstanceStart)] [] [ecutorService-4] [i.c.p.process.instance.InstanceStart] Failed to Scheduling for instance [65]
    2016-12-20 09:24:21,965 ERROR [:] [] [] [] [cutorService-10] [o.a.c.m.context.NoExceptionRunnable ] Expected state running but got stopped
    time="2016-12-20T09:24:29Z" level=error msg="Error looking up token." error="HTTP error: 404 Not Found, 404"
    2016-12-20 09:24:39,314 ERROR [20d3a1c0-20df-4f99-921b-5084778544a7:1178] [instance:66] [instance.start->(InstanceStart)] [] [cutorService-15] [i.c.p.process.instance.InstanceStart] Failed to Scheduling for instance [66]
    2016-12-20 09:24:39,932 ERROR [4ed69705-e00b-4b45-bd7b-2e3678d070b1:1176] [instance:67] [instance.start->(InstanceStart)] [] [cutorService-17] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [67]
    time="2016-12-20T09:24:40Z" level=info msg="Stack Create Event Received" eventId=247fd11e-22ea-419c-9f22-c1088ac16761 resourceId=1st12
    time="2016-12-20T09:24:40Z" level=info msg="Stack Create Event Done" eventId=247fd11e-22ea-419c-9f22-c1088ac16761 resourceId=1st12
    2016-12-20 09:25:01,841 ERROR [8d71ffba-0b19-4751-8e52-beaf73d02aa0:1176] [instance:67] [instance.start->(InstanceStart)] [] [ecutorService-1] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [67]
    2016-12-20 09:25:31,919 ERROR [e0324bf9-096d-4487-b481-79cf4ba08c10:1176] [instance:67] [instance.start->(InstanceStart)] [] [ecutorService-8] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [67]
    2016-12-20 09:26:01,416 ERROR [58c66eff-7896-4b42-81e1-16edc6c112ba:1218] [instance:68] [instance.start->(InstanceStart)] [] [ecutorService-9] [i.c.p.process.instance.InstanceStart] Failed to Scheduling for instance [68]
    2016-12-20 09:26:01,947 ERROR [d92d789b-5f92-439c-877f-5bbd1362e55a:1216] [instance:69] [instance.start->(InstanceStart)] [] [cutorService-16] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [69]

I also tried Rancher 1.1.4 and 1.2.0, and the issue occurs there as well.

I am new to Rancher and have no idea what is causing this. Has anyone encountered this issue before, or could you tell me how to resolve it?

Thanks for your help.

/var/lib/rancher/state contains a UUID that identifies the host. If that directory is part of your VM template, every clone carries the same UUID, so when you add a 2nd host we think it’s the same as the one that already exists and it replaces it. rm -rf that directory as part of your template prep.
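
A minimal sketch of that template-prep step, assuming it runs as root on the template (or on each clone before its agent is registered). The function name `reset_rancher_state` and its optional path argument are illustrative only; the one detail taken from the explanation above is the /var/lib/rancher/state path.

```shell
#!/bin/sh
# Hypothetical helper: remove the Rancher agent state directory so that a
# cloned VM does not reuse the host UUID stored in the template.

reset_rancher_state() {
    # Default to the path named above; an argument overrides it (handy for testing).
    state_dir="${1:-/var/lib/rancher/state}"
    if [ -d "$state_dir" ]; then
        rm -rf "$state_dir"
        echo "removed $state_dir"
    else
        echo "nothing to remove at $state_dir"
    fi
}
```

Calling `reset_rancher_state` with no argument targets the default path; passing a directory lets you try it on a scratch copy first.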

@vincent, thanks, I really appreciate your help. I removed the files on the VMs and it works well now. :slight_smile:

BTW, is there any documentation on Rancher's implementation details that we can follow? Thanks again.