I have been attempting to replicate the behavior advertised in this blog article.
In short, the idea is to run your Rancher cluster on Spot Instances. When a Spot node in the cluster is terminated, a new Spot node is provisioned and the containers on the terminating instance are live migrated to it using CRIU, before the terminating node is removed from the cluster and shut down.
I’ve been tracing the API calls to narrow down the cause of the issue. In doing so, I found an API call to the following:
I’m assuming that this is the call that is supposed to drain the containers, but what invariably happens is that the host is deleted. The containers then move to another host, but they are not live migrated; they start fresh on the new host.
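To make the difference between what I expect and what I observe concrete, here is a rough toy sketch. It does not call Rancher, Spotinst, or CRIU at all; `Container`, `Host`, `live_migrate`, and `reschedule` are hypothetical names I made up for illustration, and the `state` dict simply stands in for whatever in-memory state a real checkpoint would capture.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    image: str
    # Stand-in for in-memory process state (hypothetical, for illustration only).
    state: dict = field(default_factory=dict)


@dataclass
class Host:
    name: str
    containers: dict = field(default_factory=dict)


def live_migrate(name, src, dst):
    """Expected behavior: checkpoint on the source, restore on the destination."""
    # The deep copy stands in for a checkpoint/restore; state is preserved.
    checkpoint = copy.deepcopy(src.containers.pop(name))
    dst.containers[name] = checkpoint
    return dst.containers[name]


def reschedule(name, src, dst):
    """Observed behavior: the host is deleted and the container starts fresh."""
    old = src.containers.pop(name)
    # Recreated from the image only, so the previous state is lost.
    dst.containers[name] = Container(old.name, old.image)
    return dst.containers[name]


def demo():
    # Case 1: what the blog article describes.
    old_a, new_a = Host("spot-old"), Host("spot-new")
    old_a.containers["web"] = Container("web", "nginx", {"requests_served": 42})
    migrated = live_migrate("web", old_a, new_a)

    # Case 2: what I actually see when the host is deleted.
    old_b, new_b = Host("spot-old"), Host("spot-new")
    old_b.containers["web"] = Container("web", "nginx", {"requests_served": 42})
    rescheduled = reschedule("web", old_b, new_b)
    return migrated, rescheduled


if __name__ == "__main__":
    migrated, rescheduled = demo()
    print("live migrated state:", migrated.state)
    print("rescheduled state:", rescheduled.state)
```

In this toy model the migrated container keeps its state, while the rescheduled one comes up empty. The second case matches what I am seeing on my cluster.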
I’m trying to tie this audit log event back to the API and work out which API call is being made. My hope is that by understanding the call and its intent, I can troubleshoot further. I have already contacted Spotinst and am waiting on a response from them, but I am still confused about how they expect these containers to live migrate.
Any help translating this audit log event into a specific API call would be appreciated.
Also, if anybody else has successfully implemented this functionality, I would love to hear how it was done and what lessons were learned.