I need advice on removing some hung Rancher processes from my Rancher Server, possibly by deleting them from the MySQL schema.
My production Rancher cluster shows 23 processes that have been “Delayed” since either 12/1/2016 or 1/5/2017. The Rancher process name is instance.purge. The processes are listed at https://rancher.example.org/admin/processes/list?which=delayed
The Rancher Server logs are filled with thousands of errors and Java stack traces like these:
{"log":"2017-03-31 19:41:26,121 ERROR [:] [] [] [] [ecutorService-1] [.e.s.i.ProcessInstanceDispatcherImpl] Unknown exception running process [instance.purge:1247761] on [9669], canceled by [State [activating] is not valid for process [instancehostmap.remove:null] on resource [9106]] \n","stream":"stdout","time":"2017-03-31T19:41:26.121741677Z"}
{"log":"2017-03-31 19:41:26,124 ERROR [:] [] [] [] [cutorService-14] [.e.s.i.ProcessInstanceDispatcherImpl] Unknown exception running process [instance.purge:11181739] on [204711], canceled by [State [activating] is not valid for process [instancehostmap.remove:null] on resource [202811]] \n","stream":"stdout","time":"2017-03-31T19:41:26.124572917Z"}
{"log":"2017-03-31 19:41:26,127 ERROR [:] [] [] [] [cutorService-12] [.e.s.i.ProcessInstanceDispatcherImpl] Unknown exception running process [instance.purge:11184576] on [204754], canceled by [State [activating] is not valid for process [instancehostmap.remove:null] on resource [202867]] \n","stream":"stdout","time":"2017-03-31T19:41:26.128013789Z"}
{"log":"2017-03-31 19:41:41,112 ERROR [657ae095-4d79-4e8f-84a8-b0bb08137e75:11180638] [instance:204677] [instance.purge] [] [ecutorService-7] [c.p.e.p.i.DefaultProcessInstanceImpl] Unknown exception java.lang.IllegalStateException: Attempt to cancel when process is still transitioning\n","stream":"stdout","time":"2017-03-31T19:41:41.11392176Z"}
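To see how many distinct processes are behind these repeated errors, the log lines can be grouped by process and resource ID. This is only a minimal sketch: it assumes the log is in Docker's JSON format, one object per line with a "log" field, as in the lines above. The regular expression and the `summarize` function are my own illustration, not anything Rancher provides.

```python
import json
import re
from collections import Counter

# Matches e.g. "running process [instance.purge:1247761] on [9669]"
PATTERN = re.compile(
    r"running process \[(?P<name>[\w.]+):(?P<pid>\d+)\] on \[(?P<resource>\d+)\]"
)

def summarize(lines):
    """Count error lines per (process name, process ID, resource ID)."""
    counts = Counter()
    for line in lines:
        try:
            message = json.loads(line)["log"]
        except (ValueError, KeyError, TypeError):
            continue  # skip lines that are not JSON log records
        match = PATTERN.search(message)
        if match:
            key = (match["name"], int(match["pid"]), int(match["resource"]))
            counts[key] += 1
    return counts
```

Calling `summarize(open(path))` on a saved copy of the container log (the path is whatever file you exported it to) returns a counter whose keys can be compared against the 23 delayed processes in the UI.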
If I view the host/instance ID in the API, the purge button is not clickable. If I use the ‘Delete’ button on that screen, Rancher seems to return an error message under “HTTP Response:”, and the process is not removed.
{
"id": "3322515d-5452-441c-90eb-1d3541c605d5",
"type": "error",
"links": { },
"actions": { },
"status": 409,
"code": "Conflict",
"message": "Conflict",
"detail": null,
"baseType": "error"
}
I suppose I could remove these processes from the database. Is there a clear procedure for doing that? Is the MySQL schema documented? I could try deleting the rows from process_instance (after backing up the schema), but I’m unclear about other tables.
DELETE FROM process_instance WHERE process_name LIKE "instance.purge" AND start_time LIKE "2016-12-01%";
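If I do go the database route, I would want the DELETE guarded so it only runs when the number of matching rows is what I expect. Here is a minimal sketch of that guard. It uses an in-memory SQLite table as a stand-in for MySQL, so the `id` column, the demo rows, and both function names are assumptions for illustration; only `process_instance`, `process_name`, and `start_time` come from my query above. It says nothing about whether other tables reference these rows, which is still my open question.

```python
import sqlite3

# Same filter as the DELETE above, with bound parameters
WHERE = "process_name = ? AND start_time LIKE ?"

def delete_if_count_matches(conn, process_name, start_prefix, expected):
    """Delete matching rows only if exactly `expected` rows match.

    Returns the number of rows deleted (0 if nothing was touched).
    """
    params = (process_name, start_prefix + "%")
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM process_instance WHERE " + WHERE, params)
    if cur.fetchone()[0] != expected:
        return 0  # unexpected match count: leave the table alone
    cur.execute("DELETE FROM process_instance WHERE " + WHERE, params)
    if cur.rowcount != expected:
        conn.rollback()  # rows changed between the check and the delete
        return 0
    conn.commit()
    return expected

def make_demo_db():
    """Build a throwaway table with made-up rows for trying the guard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE process_instance ("
        "id INTEGER PRIMARY KEY, process_name TEXT, start_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO process_instance (process_name, start_time) VALUES (?, ?)",
        [
            ("instance.purge", "2016-12-01 08:00:00"),
            ("instance.purge", "2016-12-01 09:30:00"),
            ("instance.purge", "2017-01-05 11:00:00"),
            ("instance.start", "2016-12-01 08:00:00"),
        ],
    )
    conn.commit()
    return conn
```

Against the real server the same pattern would go through a MySQL driver, where the placeholder is typically `%s` rather than `?`, and only after a full backup.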
For the record, this post is an attempt to fix my issue reported on GitHub as rancher/rancher issue #8316: Dozens of processes named "instance.purge" have been "delayed" since 12/1/2016 and 1/5/2017.