java.lang.IllegalStateException: More than one volume exists for volume URI

I just upgraded Rancher from 1.1.4 to 1.5.5. We currently use the Docker volume driver https://github.com/mcuadros/gce-docker to mount Google GCE/GCP disks.

After the upgrade, when creating or upgrading services that use GCP disks, the containers never launch. The server logs have exceptions such as:

2017-04-22 19:59:16,732 ERROR [:] [] [] [] [ecutorService-1] [.e.s.i.ProcessInstanceDispatcherImpl] Unknown exception running process [instance.start:227208] on [1650] java.lang.IllegalStateException: More than one volume exists for volume URI [gce:///mnt/rancher-charts-selenium-v2] or name/externalId [rancher-charts-selenium-v2] and storage pool [6].
	at io.cattle.platform.docker.process.dao.impl.DockerComputeDaoImpl.getDockerVolumeInPool(DockerComputeDaoImpl.java:55) ~[cattle-docker-compute-0.5.0-SNAPSHOT.jar:na]
	at sun.reflect.GeneratedMethodAccessor937.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_72]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_72]
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99) ~[spring-tx-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:280) ~[spring-tx-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96) ~[spring-tx-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at com.sun.proxy.$Proxy68.getDockerVolumeInPool(Unknown Source) ~[na:na]
	at io.cattle.platform.docker.process.instancehostmap.DockerPostInstanceHostMapActivate.createVolumeInStoragePool(DockerPostInstanceHostMapActivate.java:213) ~[cattle-docker-compute-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.docker.process.instancehostmap.DockerPostInstanceHostMapActivate.processVolumes(DockerPostInstanceHostMapActivate.java:192) ~[cattle-docker-compute-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.docker.process.instancehostmap.DockerPostInstanceHostMapActivate.handle(DockerPostInstanceHostMapActivate.java:115) ~[cattle-docker-compute-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandler(DefaultProcessInstanceImpl.java:470) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:421) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:415) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.engine.idempotent.Idempotent.execute(Idempotent.java:42) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
...

This previously worked without issue in 1.1.4. I’ve found a similar bug at https://github.com/rancher/rancher/issues/7489, as well as the forum topic “Issue with duplicate volumes being created using external EFS driver on Rancher 1.2.1”. Anyone have thoughts on this?

To share more context, I currently launch a service that uses a GCE disk like this, with a docker-compose v1 config:

some-service:
  image: some:container
  volume_driver: gce
  volumes:
    - rancher-charts-selenium-v2:/some/mount/point

A gce-docker container runs on every machine to provide the Docker volume driver for GCE.
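
For reference, the per-host driver container is launched roughly like this (a sketch based on the mounts the plugin needs; see the gce-docker README for the authoritative invocation):

# run the GCE volume plugin on each host (sketch)
docker run -d --restart=always --privileged \
  -v /:/rootfs \
  -v /run/docker/plugins:/run/docker/plugins \
  -v /var/run/docker.sock:/var/run/docker.sock \
  mcuadros/gce-docker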

Digging in deeper on one particular volume: all containers that used the volume from before my upgrade have been deleted, yet I have 4 entries in the volume table, one of which has the volume’s URI set while the others are blank. When I create a new container using the existing volume, another row is added (now 5 rows), where 4 of the 5 are missing the URI.

e.g.

id	name	account_id	kind	uuid	description	state	created	removed	remove_time	data	physical_size_mb	virtual_size_mb	device_number	format	allocation_state	attached_state	instance_id	image_id	offering_id	zone_id	uri	external_id	access_mode	host_id	deployment_unit_id	environment_id	volume_template_id	storage_driver_id	size_mb
382	rancher-charts-selenium-v2	5	volume	eeb36fbe-c487-4d56-ac81-7a54fff33602	NULL	detached	2017-01-17 02:45:42	NULL	NULL	{"fields":{"isHostPath":false,"driver":"gce"}}	NULL	NULL	-1	docker	active	inactive	NULL	NULL	NULL	1	gce:///mnt/rancher-charts-selenium-v2	rancher-charts-selenium-v2	NULL	NULL	NULL	NULL	NULL	NULL	NULL
1802	rancher-charts-selenium-v2	5	volume	ee53ef50-8d32-4adb-a772-c60350db7219	NULL	inactive	2017-04-22 19:28:42	NULL	NULL	{"fields":{"driver":"gce"}}	NULL	NULL	NULL	NULL	active	NULL	NULL	NULL	NULL	1	NULL	rancher-charts-selenium-v2	NULL	NULL	NULL	NULL	NULL	NULL	NULL
1814	rancher-charts-selenium-v2	5	volume	09895055-d5e8-4e7d-a2ad-e06fa742293e	NULL	inactive	2017-04-22 19:56:45	NULL	NULL	{"fields":{"driver":"gce"}}	NULL	NULL	NULL	NULL	active	NULL	NULL	NULL	NULL	1	NULL	rancher-charts-selenium-v2	NULL	NULL	NULL	NULL	NULL	NULL	NULL
1817	rancher-charts-selenium-v2	5	volume	18c7cb33-ef65-4852-aa6d-2adcb1c21918	NULL	inactive	2017-04-22 20:31:49	NULL	NULL	{"fields":{"driver":"gce"}}	NULL	NULL	NULL	NULL	active	NULL	NULL	NULL	NULL	1	NULL	rancher-charts-selenium-v2	NULL	NULL	NULL	NULL	NULL	NULL	NULL
1819	rancher-charts-selenium-v2	5	volume	6f0e4dc4-7062-4cdf-aa7c-9ea7d8ea4003	NULL	inactive	2017-04-22 20:59:49	NULL	NULL	{"fields":{"driver":"gce"}}	NULL	NULL	NULL	NULL	active	NULL	NULL	NULL	NULL	1	NULL	rancher-charts-selenium-v2	NULL	NULL	NULL	NULL	NULL	NULL	NULL
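
If anyone wants to check their own install, a quick way to spot the duplicate rows (assuming the default Cattle MySQL database named "cattle"; adjust user/database to your setup):

# list live rows for one volume name from the Cattle DB (sketch)
mysql -u cattle -p cattle -e \
  "SELECT id, name, state, uri, external_id, created
   FROM volume
   WHERE name = 'rancher-charts-selenium-v2' AND removed IS NULL;"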

I don’t really want to go poking the DB, but I’m under the impression I’m currently in a bad state somehow?

If I use a new GCE disk that hasn’t been used in Rancher before, containers are created without trouble. I created a new container with a new disk and deleted the service afterwards. The resulting row looks different from the row created under v1.1.4:

id	name	account_id	kind	uuid	description	state	created	removed	remove_time	data	physical_size_mb	virtual_size_mb	device_number	format	allocation_state	attached_state	instance_id	image_id	offering_id	zone_id	uri	external_id	access_mode	host_id	deployment_unit_id	environment_id	volume_template_id	storage_driver_id	size_mb
1821	rancher-charts-selenium-v3	5	volume	1c0792aa-606f-4466-ac76-e18b41503448	NULL	detached	2017-04-22 21:17:32	NULL	NULL	{"fields":{"driver":"gce"}}	NULL	NULL	NULL	docker	active	NULL	NULL	NULL	NULL	1	NULL	rancher-charts-selenium-v3	NULL	NULL	NULL	NULL	NULL	NULL	NULL

However, if I try to recreate the service with that same new disk, I’m back in the same bad state with the same exception message:

2017-04-22 21:26:02,327 ERROR [:] [] [] [] [cutorService-11] [.e.s.i.ProcessInstanceDispatcherImpl] Unknown exception running process [instance.start:227399] on [1655] java.lang.IllegalStateException: More than one volume exists for volume URI [gce:///mnt/rancher-charts-selenium-v3] or name/externalId [rancher-charts-selenium-v3] and storage pool [6].

This appears to be a Rancher bug?

Maybe not a bug, but by design? If so, this might be an undocumented breaking change?

It seems volumes are created on stack creation. Previously, I’m guessing, Rancher wasn’t really involved in tracking volumes the way it is now, so using a Docker volume driver was fine.

Separately, I tried using the “external” keyword, but that fails with the error: Volume ___ declared as external, but could not be found. I assume that since the volume was never created in Rancher, it errors despite the Docker volume driver including the volume in its list implementation. I found https://github.com/rancher/rancher/issues/8497 as well; might this be related?
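
For what it’s worth, the plugin itself does see the volume on the hosts, which you can confirm with:

# on a host running the gce-docker plugin:
docker volume ls --filter driver=gce
docker volume inspect rancher-charts-selenium-v2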

For those who are wondering, I’ve managed to take a guess at the root cause and stumbled into a solution.

I think there are issues with how Rancher 1.5.5 handles docker-compose v1 files with Docker volume plugins (not verified). It seems Rancher tries to create a volume with every service create (and perhaps delete one with every service delete). In Rancher 1.1.4 the behavior was to not create/delete volumes at all. Volumes also seem to be created even if they already exist, leading to https://github.com/rancher/rancher/issues/7489.

Secondly, you can convert to docker-compose v2 – however, this only works if you are OK with Rancher creating/deleting volumes. If you are using a Docker volume plugin, you must create a new storage service to represent your plugin. Here’s where it gets really fuzzy for me, as I’m not sure which of my steps were actually necessary – I created a service that registered itself as a storage service with:

version: '2'
services:
  gce-driver:
    image: mcuadros/gce-docker:latest
    privileged: true
    tty: true
    stdin_open: true
    labels:
      io.rancher.scheduler.global: 'true'
      io.rancher.container.create_agent: 'true'
      io.rancher.container.agent.role: environment
    volumes:
      - /:/rootfs
      - /run/docker/plugins:/run/docker/plugins
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/rancher/volumes:/var/lib/rancher/volumes:shared

and the corresponding rancher-compose.yml:

version: '2'
services:
  gce-driver:
    storage_driver:
      description: A GCE Volume Driver
      name: gce-driver
      scope: environment
      volume_access_mode: singleHostRW
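
To check whether the driver actually registered, you should be able to list storage drivers via the API (I believe v2-beta exposes a storageDrivers collection; the project id 1a5 is from my setup):

# list registered storage drivers for the environment (sketch)
curl -u "${RANCHER_ACCESS_KEY}:${RANCHER_SECRET_KEY}" \
  'https://rancher.bloomapi.com/v2-beta/projects/1a5/storageDrivers'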

Once created, you then need to turn this service into a system service by going to the API view, clicking edit, and checking “system”. You then must upgrade the service without making any changes. This added the service to the storage UI.
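
If you’d rather script that step, I believe it amounts to a PUT like the following (a hypothetical sketch – I only did it through the API UI’s edit form, and the service id 1s123 is a placeholder):

# flip the "system" flag on the storage service (hypothetical)
curl -u "${RANCHER_ACCESS_KEY}:${RANCHER_SECRET_KEY}" \
  -X PUT \
  -H 'Content-Type: application/json' \
  -d '{"system": true}' \
  'https://rancher.bloomapi.com/v2-beta/projects/1a5/services/1s123'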

However, it seems you can’t use the UI: you need to mark the volumes as external, and the external flag in your docker-compose files doesn’t seem to be respected (see https://github.com/rancher/rancher/issues/8497). I had to create volume templates via:

curl -u "${RANCHER_ACCESS_KEY}:${RANCHER_SECRET_KEY}" \
-X POST \
-H 'Content-Type: application/json' \
-d '{
  "driver": "gce-driver",
  "external": true,
  "name": "rancher-charts-selenium-v4"
}' 'https://rancher.bloomapi.com/v2-beta/projects/1a5/volumeTemplates'
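
You can read the collection back to confirm the template was created (filtering by name via the query string, which the Rancher API supports as far as I know):

# verify the volume template exists (sketch)
curl -u "${RANCHER_ACCESS_KEY}:${RANCHER_SECRET_KEY}" \
  'https://rancher.bloomapi.com/v2-beta/projects/1a5/volumeTemplates?name=rancher-charts-selenium-v4'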

before it seemed to work correctly. Again, I have no idea how much of this actually did something and how much didn’t do what I think it did, but I now have a system that works with the GCE volume plugin. Note that GCE specifically doesn’t work with anything except environment-scoped volumes, due to https://github.com/rancher/rancher/issues/8572.

@untoldone, we had similar issues using the EFS driver. Apparently from 1.2 onwards, Rancher tries to manage the volumes for you, which it wasn’t doing earlier.

If you have defined a volume with a driver in your docker-compose file, Rancher will try to create a new volume for you even if it already exists (unless you have an external volume created in Rancher), or if you are trying to mount the same named volume from multiple containers.

I went ahead and updated the EFS volume driver that Rancher had initially provided to fix the issues that I saw. This worked better for me than updating the volume driver that we were using, since I am a Go newbie.

The “system” services are what Rancher defines through “infra” templates. I think you might be able to set up the GCE volume driver as storage infrastructure using infra-templates.

On the application side, the way we handle EFS volumes now is:

  1. Make a Rancher API call to create an EFS volume; this just creates a logical entry in Rancher, and no physical resources are created (see the sketch after this list).
  2. Have Rancher automatically parse the named volume as external when it parses the docker-compose file.
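
A minimal sketch of step 1, assuming the v2-beta volumes collection (the host, project id, and volume name here are placeholders):

# create a logical volume entry in Rancher (sketch)
curl -u "${RANCHER_ACCESS_KEY}:${RANCHER_SECRET_KEY}" \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "myapp_EFS",
    "driver": "rancher-efs"
  }' 'https://rancher.example.com/v2-beta/projects/1a5/volumes'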

Original docker-compose.yml

    volumes:
      - ${Variable}_EFS:<container path>
      - ...
    volume_driver: rancher-efs

Parsed docker-compose.yml

version: '2'
volumes:
  <variable value>_EFS:
    external: true
    driver: rancher-efs
services:
  <service name>:
    ...
    volumes:
      - <variable value>_EFS:<container path>