Our infrastructure runs on AWS. We have 1 server (c4.4xlarge) and 50 nodes (m4.large). The database for Cattle is hosted on an RDS instance (db.m3.xlarge). We are running about 1,100 stacks, each with 1 service (httpd).
Our Rancher server is running v1.0.1 and is NOT in HA mode.
Problem:
For the last 24 hours, our RDS has been under heavy load.
Spinning up a new stack/service takes a really long time (20 mins; it usually takes 5 mins). Services stay in the “Waiting for [instance:xxxxxxxx_1]. Instance status: Networking” state for a long time.
2016-06-16 12:31:49,895 ERROR [444fa322-5139-40f6-8465-ab1b3b54b34c:2575718] [instance:30845] [instance.start->(InstanceStart)->instance.allocate] [] [torService-1883] [c.p.e.p.i.DefaultProcessInstanceImpl] Unknown exception io.cattle.platform.eventing.exception.EventExecutionException: Scheduling failed: No candidates available at io.cattle.platform.eventing.exception.EventExecutionException.fromEvent(EventExecutionException.java:53) ~[cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.eventing.impl.AbstractEventService.callSync(AbstractEventService.java:258) ~[cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.common.handler.EventBasedProcessHandler.handle(EventBasedProcessHandler.java:109) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.instance.InstanceAllocate.handle(InstanceAllocate.java:48) ~[cattle-iaas-logic-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandler(DefaultProcessInstanceImpl.java:446) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:393) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:387) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.idempotent.Idempotent.execute(Idempotent.java:42) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandlers(DefaultProcessInstanceImpl.java:387) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runLogic(DefaultProcessInstanceImpl.java:493) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at 
io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runWithProcessLock(DefaultProcessInstanceImpl.java:320) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$2.doWithLockNoResult(DefaultProcessInstanceImpl.java:260) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:7) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:3) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.acquireLockAndRun(DefaultProcessInstanceImpl.java:257) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runDelegateLoop(DefaultProcessInstanceImpl.java:185) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.executeWithProcessInstanceLock(DefaultProcessInstanceImpl.java:158) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:108) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:105) 
[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.execute(DefaultProcessInstanceImpl.java:105) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.common.handler.AbstractObjectProcessLogic.execute(AbstractObjectProcessLogic.java:131) [cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.instance.InstanceStart.allocate(InstanceStart.java:217) [cattle-iaas-logic-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.instance.InstanceStart.handle(InstanceStart.java:75) [cattle-iaas-logic-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandler(DefaultProcessInstanceImpl.java:446) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:393) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:387) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.idempotent.Idempotent.execute(Idempotent.java:42) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandlers(DefaultProcessInstanceImpl.java:387) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at 
io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runLogic(DefaultProcessInstanceImpl.java:493) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
2016-06-16 12:31:45,824 ERROR [:] [] [] [] [ServiceReplay-2] [i.c.p.e.e.i.ProcessEventListenerImpl] Unknown exception running process [volume.activate:2280417] on [28158] io.cattle.platform.eventing.exception.AgentRemovedException: Agent is removed at io.cattle.platform.agent.impl.WrappedEventService.call(WrappedEventService.java:93) ~[cattle-iaas-agent-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.agent.impl.EventCallProgressHelper.call(EventCallProgressHelper.java:57) ~[cattle-iaas-agent-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.agent.impl.RemoteAgentImpl.call(RemoteAgentImpl.java:99) ~[cattle-iaas-agent-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.agent.impl.RemoteAgentImpl.callSync(RemoteAgentImpl.java:72) ~[cattle-iaas-agent-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.agent.impl.RemoteAgentImpl.callSync(RemoteAgentImpl.java:135) ~[cattle-iaas-agent-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.common.handler.AgentBasedProcessHandler.callSync(AgentBasedProcessHandler.java:180) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.common.handler.AgentBasedProcessHandler.handleEvent(AgentBasedProcessHandler.java:166) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.common.handler.AgentBasedProcessHandler.handle(AgentBasedProcessHandler.java:104) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandler(DefaultProcessInstanceImpl.java:446) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:393) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:387) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.idempotent.Idempotent.execute(Idempotent.java:42) 
~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandlers(DefaultProcessInstanceImpl.java:387) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runLogic(DefaultProcessInstanceImpl.java:493) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runWithProcessLock(DefaultProcessInstanceImpl.java:320) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$2.doWithLockNoResult(DefaultProcessInstanceImpl.java:260) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:7) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:3) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.acquireLockAndRun(DefaultProcessInstanceImpl.java:257) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runDelegateLoop(DefaultProcessInstanceImpl.java:185) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at 
io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.executeWithProcessInstanceLock(DefaultProcessInstanceImpl.java:158) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:108) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:105) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.execute(DefaultProcessInstanceImpl.java:105) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.object.process.impl.DefaultObjectProcessManager.executeStandardProcess(DefaultObjectProcessManager.java:29) ~[cattle-framework-object-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.common.handler.AbstractObjectProcessLogic.createIgnoreCancel(AbstractObjectProcessLogic.java:89) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.common.handler.AbstractObjectProcessLogic.createThenActivate(AbstractObjectProcessLogic.java:83) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.volume.VolumeActivate.activatePool(VolumeActivate.java:48) ~[cattle-iaas-logic-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.process.volume.VolumeActivate.handle(VolumeActivate.java:37) 
~[cattle-iaas-logic-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandler(DefaultProcessInstanceImpl.java:446) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:393) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:387) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.idempotent.Idempotent.execute(Idempotent.java:42) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandlers(DefaultProcessInstanceImpl.java:387) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runLogic(DefaultProcessInstanceImpl.java:493) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runWithProcessLock(DefaultProcessInstanceImpl.java:320) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$2.doWithLockNoResult(DefaultProcessInstanceImpl.java:260) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:7) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:3) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) 
~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.acquireLockAndRun(DefaultProcessInstanceImpl.java:257) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runDelegateLoop(DefaultProcessInstanceImpl.java:185) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.executeWithProcessInstanceLock(DefaultProcessInstanceImpl.java:158) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:108) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:105) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) ~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.execute(DefaultProcessInstanceImpl.java:105) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at io.cattle.platform.engine.eventing.impl.ProcessEventListenerImpl.processExecute(ProcessEventListenerImpl.java:74) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at 
io.cattle.platform.engine.server.impl.ProcessInstanceParallelDispatcher$1.runInContext(ProcessInstanceParallelDispatcher.java:27) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na] at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na] at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na] at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:108) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na] at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na] at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_95] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_95] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_95] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_95] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_95]
@msound, can you get me the output of docker info from your agent host? It would also help to know which processes are the highest CPU consumers on your agent hosts.
@msound
Can I trouble you for a couple more things? How many connections per second are you seeing on your DB? Could I also get the output of netstat -an on your Rancher server and on a loaded host, as well as the output of ps -ef on your Rancher server and host?
Are you launching your server with more memory, e.g. by passing -e JAVA_OPTS="-Xmx4096m" to the docker run command that starts the Rancher server?
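To answer the connections question, a rough sketch like the one below may help. It assumes the RDS instance runs MySQL (the thread does not say which engine) and that you sample the cumulative Connections status counter twice, e.g. via SHOW GLOBAL STATUS LIKE 'Connections'. The helper names connections_per_second and count_established are hypothetical, and the netstat sample (addresses, ports) is made up for illustration in the Linux netstat -an layout.

```python
# Hypothetical helpers for summarizing DB connection load.
# Assumption: the RDS database is MySQL, listening on the default port 3306.

def connections_per_second(first: int, second: int, interval_s: float) -> float:
    """Rate from two samples of MySQL's cumulative 'Connections' status counter
    taken interval_s seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if second < first:
        raise ValueError("counter went backwards (server restart?)")
    return (second - first) / interval_s


def count_established(netstat_output: str, port: int = 3306) -> int:
    """Count ESTABLISHED TCP connections whose foreign address uses `port`
    (i.e. outbound connections from this host to the database)."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Linux layout: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "ESTABLISHED":
            if fields[4].endswith(f":{port}"):
                count += 1
    return count


# Made-up sample output, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:45678          10.0.1.9:3306           ESTABLISHED
tcp        0      0 10.0.0.5:45680          10.0.1.9:3306           ESTABLISHED
tcp        0      0 10.0.0.5:45682          10.0.1.9:3306           TIME_WAIT
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
"""

print(connections_per_second(120000, 126000, 60))  # 100.0
print(count_established(SAMPLE))  # 2
```

Sampling a cumulative counter over an interval gives a rate, while the netstat count is only a snapshot of currently open connections; both together show whether the server is churning connections or holding many open at once.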
It looks like you're hitting a scheduling issue caused by the number of stacks in a single environment. We’re working on a fix. What happens is that during each stack launch we update metadata across all hosts. As an interim measure, you could add more RAM, which could improve scheduling speed on the Rancher server, although this would most likely be a short-lived improvement. A more robust interim solution would be to split your hosts across several environments within your Rancher deployment. That would limit the scope of the broadcast effect, where metadata has to be updated across all hosts. We are looking at solving quite a few of these issues in the 1.2 release, but that's due out ~Sept/Oct.
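The broadcast effect can be made concrete with a deliberately simplified, hypothetical model: assume every stack launch pushes one metadata update to every host in the same environment, and that stacks and hosts are split evenly across environments. Real Rancher behavior is more involved, and the function name deliveries_per_full_redeploy is made up; the point is only how splitting environments shrinks the fan-out.

```python
# Simplified, hypothetical model of the metadata "broadcast effect":
# every stack launch notifies every host in the same environment.
# Numbers are illustrative, not measurements of Rancher.

def deliveries_per_full_redeploy(stacks: int, hosts: int, environments: int = 1) -> int:
    """Total host-level metadata updates when every stack is launched once,
    assuming stacks and hosts are split evenly across environments."""
    if environments < 1:
        raise ValueError("need at least one environment")
    stacks_per_env = stacks / environments
    hosts_per_env = hosts / environments
    # A launch in one environment only notifies that environment's hosts.
    per_env = stacks_per_env * hosts_per_env
    return round(per_env * environments)


single = deliveries_per_full_redeploy(1100, 50)
split = deliveries_per_full_redeploy(1100, 50, environments=5)
print(single, split)  # 55000 11000
```

Under this model the total work is stacks × hosts / environments, so splitting the 50 hosts and 1,100 stacks into 5 environments cuts the number of updates by a factor of 5, which is why separating environments helps more durably than adding RAM.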
A guideline here would be to stay under 200 stacks / 750 containers per environment. I think that's the upper limit; if you stay below it, your environment should remain responsive. Feel free to reach out if you hit any more issues beyond that.
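As a quick sizing check against that guideline, the sketch below computes the smallest number of environments that keeps both counts at or below the per-environment limits, assuming stacks and containers can be spread evenly. The function name min_environments is hypothetical, and the usage line assumes 1 container per service (scale 1), which the thread does not state.

```python
import math

# Rule of thumb from this thread, not a hard limit enforced by Rancher.
MAX_STACKS_PER_ENV = 200
MAX_CONTAINERS_PER_ENV = 750


def min_environments(stacks: int, containers: int) -> int:
    """Smallest number of environments keeping both counts at or below the
    guideline, assuming an even split across environments."""
    by_stacks = math.ceil(stacks / MAX_STACKS_PER_ENV)
    by_containers = math.ceil(containers / MAX_CONTAINERS_PER_ENV)
    return max(1, by_stacks, by_containers)


# 1,100 single-service stacks, assuming one container per service.
print(min_environments(1100, 1100))  # 6
```

Here the stack limit dominates: 1,100 stacks need 6 environments (about 184 stacks each), while 1,100 containers alone would only need 2.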
Do you have an update on this? We are still having the same issues w/ the metadata service. We could split into multiple environments, but it would be very inconvenient.
@veered you're responding to a thread that's over a year old. I don't have much context wrt your environment either. Can you create a new issue and provide the relevant details there?
@aemneina I know it’s a year old, but I wanted the people who were having this issue to see my response. I suspect they are still having issues, since we’ve been having this issue for more than a year.
Briefly, when there are a lot of containers in a single environment, redeploying is really slow, can peg the host CPUs at 100%, and sometimes causes host disconnects. This is because redeploying triggers a large number of metadata updates, each of which has to be propagated to all metadata subscribers.
According to the GitHub issue I linked to, most of this time is spent encoding/decoding YAML haha. So even if the only change were switching the data format to JSON, things might be fine (since JSON encoding/decoding is something like 20x faster than YAML encoding/decoding).
The reason they are YAML is that YAML supports references. As JSON, the metadata would be hundreds of megabytes of redundant info, because the same information is reachable from many different paths.
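To illustrate why references matter, the sketch below builds a hypothetical metadata tree in which one service record is reachable from 1,000 paths. The record's fields and the "*svc" pointer convention are made up; the pointer only mimics the idea behind YAML anchors (&) and aliases (*). Plain JSON has no reference mechanism, so json.dumps writes the shared record out once per path.

```python
import json

# Hypothetical metadata: one service record reachable from many paths
# (for example under each host and each stack).
service = {"name": "httpd", "stack_name": "web", "containers": ["web_httpd_1"]}
paths = 1000

# Every path points at the same Python object...
tree = {f"path_{i}": service for i in range(paths)}
# ...but JSON has no references, so the record is serialized once per path.
as_json = json.dumps(tree)

# Reference-style layout (the idea behind YAML anchors/aliases):
# store the record once and point to it from each path.
with_refs = json.dumps({
    "defs": {"svc": service},
    "tree": {f"path_{i}": "*svc" for i in range(paths)},
})

print(len(as_json), len(with_refs))
print(as_json.count('"httpd"'), with_refs.count('"httpd"'))  # 1000 1
```

The duplicated form grows with (record size × number of paths), while the referenced form grows with (record size + number of paths), which matches the concern about JSON metadata ballooning to hundreds of megabytes. The trade-off raised above is that YAML's richer format is slower to encode and decode.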