Got duplicate MachineID on AWS EC2

I’ve been using Rancher in a development environment for around 5 months and everything works very well. I just tried using it in production, where all hosts/services run on AWS EC2. I can successfully add newly created hosts using the custom host option (running rancher-agent by copying the command from the Rancher UI, roughly as shown below) as long as they are created from a fresh EC2 instance of the Amazon Linux AMI, with the required software installed by hand (e.g. Docker, Rancher, rancher-agent).
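For reference, the registration command copied from the UI looks roughly like this (the agent version, server URL, and token here are placeholders, not my real values):

```bash
# Registration command copied from the Rancher UI (placeholders only).
# It starts the rancher-agent container, which registers this host with the server.
sudo docker run -d --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  rancher/agent:<version> \
  http://<RANCHER_SERVER>:8080/v1/scripts/<REGISTRATION_TOKEN>
```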

However, if I create an instance from my own AMI image (built from an already set-up instance), the new host never appears in Rancher. I already tried removing the running rancher-agent container on the host and running the agent again, but it still does not show up. I can see only the first host (the one set up from scratch from the Amazon Linux AMI). In the rancher-agent logs I found that the Machine ID of both hosts is the same. Any suggestions? How can I change the Machine ID that rancher-agent uses?

Here are the component versions I am now using:
| Component | Version |
| --- | --- |
| Rancher | v0.47.0 |
| Cattle | v0.115.0 |
| User Interface | v0.68.0 |
| Rancher Compose | v0.5.3 |

@skrityak can you share the logs where it says the Machine ID is the same?

We maintain some state used to identify a host in /var/lib/rancher/agent/state on each host (the files are hidden and start with a dot). So if you create an AMI from an existing machine, you will run into problems because the agent will see and reuse the pre-existing UUIDs.
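If you want to verify this, listing that directory on each host should show identical identity files on the original and the clone (the exact file names may vary by agent version):

```bash
# Show the hidden state files the agent uses to identify this host.
# Two hosts cloned from the same AMI will carry identical copies.
sudo ls -la /var/lib/rancher/agent/state
```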

Here are the relevant lines from the two log files. The MachineIDs are the same; the EC2 InstanceIDs are different.

I0411 16:34:25.488652 27337 manager.go:205] Machine: {NumCores:4 CpuFrequency:2400056 MemoryCapacity:16830611456 MachineID:5f4fbbdb2804c0eda76019d75707f8ae SystemUUID:EC2C6CED-0F17-032D-5688-D1A9E3BF7C5D BootID:f8ba677d-2b80-45d7-a8e1-3dab68e447c0 Filesystems:[{Device:/dev/xvda1 Capacity:528310767616 Type:vfs Inodes:32768000} {Device:/dev/mapper/docker-202:1-264176-575ea16ea9e7750127ae79dc7e300f4d250a7ce8a2a6f46748d711dd89ad9d1b Capacity:107320705024 Type:vfs Inodes:104856576} {Device:/dev/mapper/docker-202:1-264176-e65c89424eb4231686417000d40d9c387a712f77260f8f0b69293d7beebb9276 Capacity:107320705024 Type:vfs Inodes:104856576} {Device:/dev/mapper/docker-202:1-264176-5e073b12125751406e61339a7fb2f453881def2164950e36fd2e89779050cf34 Capacity:107320705024 Type:vfs Inodes:104856576}] DiskMap:map[202:0:{Name:xvda Major:202 Minor:0 Size:536870912000 Scheduler:none} 253:0:{Name:dm-0 Major:253 Minor:0 Size:107374182400 Scheduler:none} 253:1:{Name:dm-1 Major:253 Minor:1 Size:107374182400 Scheduler:none} 253:2:{Name:dm-2 Major:253 Minor:2 Size:107374182400 Scheduler:none} 253:3:{Name:dm-3 Major:253 Minor:3 Size:107374182400 Scheduler:none} 253:4:{Name:dm-4 Major:253 Minor:4 Size:107374182400 Scheduler:none}] NetworkDevices:[{Name:eth0 MacAddress:06:cb:33:4d:b9:87 Speed:10000 Mtu:9001}] Topology:[{Id:0 Memory:16830611456 Cores:[{Id:0 Threads:[0 2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]} {Id:1 Threads:[1 3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:31457280 Type:Unified Level:3}]}] CloudProvider:AWS InstanceType:m4.xlarge InstanceID:i-ea742064}

I0411 17:17:17.850260 24700 manager.go:205] Machine: {NumCores:4 CpuFrequency:2400092 MemoryCapacity:16830611456 MachineID:5f4fbbdb2804c0eda76019d75707f8ae SystemUUID:EC28A429-2CA3-7BBE-00C6-5F5FEFF8CEA6 BootID:f0fb2c66-697c-4599-a6ea-38d4cc0ccb78 Filesystems:[{Device:/dev/mapper/docker-202:1-264176-575ea16ea9e7750127ae79dc7e300f4d250a7ce8a2a6f46748d711dd89ad9d1b Capacity:107320705024 Type:vfs Inodes:104856576} {Device:/dev/mapper/docker-202:1-264176-b115e899af3d56276a1543b936234f17ff3a149a1f73916ff4fdeb4d034fb231 Capacity:107320705024 Type:vfs Inodes:104856576} {Device:/dev/xvda1 Capacity:528310767616 Type:vfs Inodes:32768000}] DiskMap:map[202:0:{Name:xvda Major:202 Minor:0 Size:536870912000 Scheduler:none} 253:0:{Name:dm-0 Major:253 Minor:0 Size:107374182400 Scheduler:none} 253:1:{Name:dm-1 Major:253 Minor:1 Size:107374182400 Scheduler:none} 253:2:{Name:dm-2 Major:253 Minor:2 Size:107374182400 Scheduler:none} 253:3:{Name:dm-3 Major:253 Minor:3 Size:107374182400 Scheduler:none}] NetworkDevices:[{Name:eth0 MacAddress:06:2f:a2:b6:08:87 Speed:10000 Mtu:9001}] Topology:[{Id:0 Memory:16830611456 Cores:[{Id:0 Threads:[0 2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]} {Id:1 Threads:[1 3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:31457280 Type:Unified Level:3}]}] CloudProvider:AWS InstanceType:m4.xlarge InstanceID:i-e3683c6d}

After cleaning up everything in /var/lib/rancher/ and restarting the agent, I can now see the new host in the Rancher UI.
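For anyone hitting the same problem, this is roughly what I ran on the cloned host (the container name and registration URL are from my setup; yours may differ):

```bash
# Stop and remove the stale rancher-agent container on the cloned host.
sudo docker stop rancher-agent && sudo docker rm rancher-agent

# Wipe the identity/state copied over from the AMI so fresh UUIDs get generated.
sudo rm -rf /var/lib/rancher/*

# Re-register the host with the command copied from the Rancher UI.
sudo docker run -d --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  rancher/agent:<version> http://<RANCHER_SERVER>:8080/v1/scripts/<REGISTRATION_TOKEN>
```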

I just found another issue. It seems to me that Rancher expects unique Docker container IDs (or a combination with a Docker server ID?), so I cannot see any containers on the cloned host in the Rancher UI (I can only see the original, active host, even though the containers on the clone are running fine). If I use the Docker command line on the cloned host to stop and remove a container whose ID is the same as one on the original host, the container with the same ID on the original host is also removed.
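The collision is easy to see by comparing full container IDs on the two hosts; run this on each and diff the output:

```bash
# Print full (untruncated) IDs of all containers on this host; containers
# that already existed when the AMI was taken show identical IDs on both hosts.
sudo docker ps --all --no-trunc --quiet
```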

Ah yes, I think you’re right: we do assume unique Docker container IDs. We also add some metadata to containers in the form of labels. Specifically, we add a RANCHER_UUID label that we absolutely expect to be unique. Creating an AMI from an existing host would also break that.
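You can see those labels with docker inspect, something like this (the exact label key may differ between Rancher versions):

```bash
# Dump the labels Rancher attaches to a container; the Rancher UUID label
# is expected to be unique, and cloning an AMI duplicates it.
sudo docker inspect --format '{{json .Config.Labels}}' <container-id>
```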

I think the TL;DR here is that it is not a great idea to create an AMI from a host with containers running on it and use that to provision additional hosts.
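If you do want a reusable base AMI, a rough pre-imaging cleanup on the source host would look like this (assuming you can afford to remove every container on it):

```bash
# Remove all containers, including rancher-agent, before taking the image.
sudo docker rm -f $(sudo docker ps -aq)

# Drop the host identity/state so instances launched from the AMI register fresh.
sudo rm -rf /var/lib/rancher/*
```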