Rancher 1.6.10 upgrade issues - ipsec upgrade error

I’m currently trying to upgrade my organization’s development environment in Rancher from version 1.5.5 to version 1.6.10.

  • We have 10+ machines running Ubuntu 16.04 with Docker 1.12.6.
  • Rancher Server is running on a separate host.

The Rancher server upgrade went smoothly, without any problems.

I followed the infrastructure services upgrade order: first the network-services stack, then the ipsec stack.

After upgrading the network-services stack (which finished successfully), I started upgrading the ipsec stack, and it failed on the very first host.
The upgrade kept trying to create and start the rancher/net:v0.11.9 container without success, failing with the following error:

Rancher managed network does not work, failing with error `Couldn't bring up network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input`

On the server side, this is the log from when the stack upgrade started:

time="2017-10-02T14:58:05Z" level=info msg="Upgrade Stack Event Received" eventId=943d1871-919f-4969-85c3-98e0d8d36a5f resourceId=1st74
time="2017-10-02T14:58:05Z" level=info msg="[ipsec:]: Starting project " eventId=943d1871-919f-4969-85c3-98e0d8d36a5f resourceId=1st74
time="2017-10-02T14:58:05Z" level=info msg="[ipsec:ipsec]: Starting " eventId=943d1871-919f-4969-85c3-98e0d8d36a5f resourceId=1st74
time="2017-10-02T14:58:05Z" level=info msg="[ipsec:cni-driver]: Starting " eventId=943d1871-919f-4969-85c3-98e0d8d36a5f resourceId=1st74
time="2017-10-02T14:58:05Z" level=info msg="Creating service cni-driver"
time="2017-10-02T14:58:05Z" level=info msg="Updating ipsec"
time="2017-10-02T14:58:05Z" level=error msg="Failed Starting cni-driver : Bad response statusCode [422]. Status [422 status code 422]. Body: [code=NotUnique, fieldName=name, baseType=error] from [http://localhost:8080/v2-beta/projects/1a5/networkdriverservices]"
time="2017-10-02T14:58:05Z" level=error msg="Failed to start: cni-driver : Bad response statusCode [422]. Status [422 status code 422]. Body: [code=NotUnique, fieldName=name, baseType=error] from [http://localhost:8080/v2-beta/projects/1a5/networkdriverservices]"
time="2017-10-02T14:58:06Z" level=info msg="Upgrading ipsec"
2017-10-02 14:59:09,945 ERROR [4bb0ae47-7668-4a07-898b-9d44168e847b:266803] [instance:16741] [instance.start->(InstanceStart)] [] [cutorService-80] [i.c.p.process.instance.InstanceStart] Failed [Dependencies readiness error instance is not running] for instance [16741]
2017-10-02 14:59:15,520 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740->instanceHostMap:7917] [instance.start->(InstanceStart)->instancehostmap.activate] [] [cutorService-79] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=74]: Timeout getting IP address
2017-10-02 14:59:15,520 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740] [instance.start->(InstanceStart)] [] [cutorService-79] [i.c.p.process.instance.InstanceStart] Failed [1/5] to Starting for instance [16740]
2017-10-02 14:59:15,545 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740->instanceHostMap:7917] [instance.start->(InstanceStart)->instancehostmap.activate] [] [cutorService-79] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=74]: Timeout getting IP address
2017-10-02 14:59:15,545 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740] [instance.start->(InstanceStart)] [] [cutorService-79] [i.c.p.process.instance.InstanceStart] Failed [2/5] to Starting for instance [16740]
2017-10-02 14:59:15,568 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740->instanceHostMap:7917] [instance.start->(InstanceStart)->instancehostmap.activate] [] [cutorService-79] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=74]: Couldn't bring up network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
2017-10-02 14:59:15,568 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740] [instance.start->(InstanceStart)] [] [cutorService-79] [i.c.p.process.instance.InstanceStart] Failed [3/5] to Starting for instance [16740]
2017-10-02 14:59:15,588 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740->instanceHostMap:7917] [instance.start->(InstanceStart)->instancehostmap.activate] [] [cutorService-79] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=74]: Timeout getting IP address
2017-10-02 14:59:15,588 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740] [instance.start->(InstanceStart)] [] [cutorService-79] [i.c.p.process.instance.InstanceStart] Failed [4/5] to Starting for instance [16740]
2017-10-02 14:59:15,605 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740->instanceHostMap:7917] [instance.start->(InstanceStart)->instancehostmap.activate] [] [cutorService-79] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=74]: Couldn't bring up network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
2017-10-02 14:59:15,605 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740] [instance.start->(InstanceStart)] [] [cutorService-79] [i.c.p.process.instance.InstanceStart] Failed [5/5] to Starting for instance [16740]
2017-10-02 14:59:15,880 ERROR [0389c68e-b8d2-4674-8aec-f470709a0eb3:266799] [instance:16740] [instance.start] [] [cutorService-79] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=74]: Couldn't bring up network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
2017-10-02 14:59:16,015 ERROR [:] [] [] [] [cutorService-79] [.e.s.i.ProcessInstanceDispatcherImpl] Agent error for [compute.instance.activate.reply;agent=74]: Couldn't bring up network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
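
The agent errors in the log above alternate between two messages, so tallying them per agent may make the pattern easier to see. The following is a minimal sketch, not part of Rancher: the function name `summarize_agent_errors` is hypothetical, and it assumes each log entry occupies a single unwrapped line, as it would in the server's own log file.

```python
import re
from collections import Counter

# Captures the agent id and the message that follows
# "Agent error for [<event>;agent=<id>]: " in a server log line.
AGENT_ERROR = re.compile(
    r"Agent error for \[(?P<event>[^;\]]+);agent=(?P<agent>\d+)\]: (?P<msg>.+)$"
)


def summarize_agent_errors(lines):
    """Count how often each (agent id, error message) pair occurs."""
    counts = Counter()
    for line in lines:
        match = AGENT_ERROR.search(line.rstrip("\n"))
        if match:
            counts[(match.group("agent"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    # Shortened sample lines modelled on the log excerpt in this post.
    sample = [
        "2017-10-02 14:59:15,520 ERROR [...] Agent error for "
        "[compute.instance.activate.reply;agent=74]: Timeout getting IP address",
        "2017-10-02 14:59:15,545 ERROR [...] Agent error for "
        "[compute.instance.activate.reply;agent=74]: Timeout getting IP address",
        'time="2017-10-02T14:58:06Z" level=info msg="Upgrading ipsec"',
    ]
    for (agent, msg), n in summarize_agent_errors(sample).most_common():
        print(f"agent={agent} x{n}: {msg}")
```

To run it against a real log, the lines of the file can be passed in directly, for example `summarize_agent_errors(open(path))`, where `path` is wherever the server log was saved.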

Surprisingly, I managed to fully upgrade a different environment on the same Rancher server, one with just a single host. I noticed that cni-driver is now a separate service in the ipsec stack there. That did not happen in the upgrade attempt described above…
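
The 422 response in the server log reports `code=NotUnique, fieldName=name` for cni-driver, which reads as a name clash on the networkdriverservices collection. One way to check for an existing entry with that name is sketched below. It is only a sketch under assumptions: it expects the JSON already fetched from the collection URL shown in the log, and it assumes that payload is an object with a `data` list whose items carry `name`, `id` and `state` fields, which is not confirmed in this thread. The helper name `find_by_name` and the sample values are hypothetical.

```python
def find_by_name(collection, name):
    """Return the entries of an API collection whose "name" equals name.

    Assumes the collection is a dict with a "data" list of dicts.
    """
    return [item for item in collection.get("data", []) if item.get("name") == name]


if __name__ == "__main__":
    # Hypothetical payload: the ids and states are made up for illustration.
    sample = {
        "type": "collection",
        "data": [
            {"id": "example-1", "name": "cni-driver", "state": "active"},
            {"id": "example-2", "name": "network-manager", "state": "active"},
        ],
    }
    for item in find_by_name(sample, "cni-driver"):
        print(item["id"], item["state"])
```

If an entry named cni-driver shows up before the ipsec upgrade has created one, that would be consistent with the NotUnique error, though it would not by itself explain where the entry came from.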

I don’t know what else I can do. Currently I’m unable to upgrade the ipsec services without hitting the errors mentioned above.

This is my current status…

Any advice?

Moved discussion to GitHub issue: https://github.com/rancher/rancher/issues/9994

Thanks for reporting the problem and creating a GitHub issue. Closing this thread.