A simple way to configure scale-to-zero for some deployments is a very appealing feature for us. To be useful, I think we would need to be able to specify how long k8s waits before initiating scale-down. Especially for apps with a significant initialization delay (30 seconds, say), this could make scaling to zero tolerable. For instance, for a relatively infrequently used customer training instance, we could configure scaling between 0 and 1 replicas with a scale-down delay of 3 hours, and explain that the startup inconvenience would only be experienced by the first user. They could ping the app before the students get in and nobody would notice.
Would an app wake up when any traffic at all comes in? I wonder how many “false positive” wake-up requests would in reality be received and cause an instance to wake up even though no user cares. Noodling on what the ideal might look like (at least for our apps), it might be to present a login screen without waking up the app, and to initiate wake-up only once a user actually clicks login. That seems like it would filter out a lot of the random noise; otherwise, miscellaneous scanning traffic from the internet might keep the endpoint needlessly active.
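To make the login-gated wake-up idea concrete, here is a rough sketch of the routing decision I have in mind. It is not how the feature works today; the `Request` type, the `/login` path, and the action names are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical login path; a real app would use its own route.
LOGIN_PATH = "/login"


@dataclass
class Request:
    method: str
    path: str


def should_wake(req: Request) -> bool:
    """Only an explicit login submission counts as a real user."""
    return req.method == "POST" and req.path == LOGIN_PATH


def route(req: Request, app_running: bool) -> str:
    """Decide what a front proxy does with a request (assumed action names)."""
    if app_running:
        return "forward"            # app is up: pass everything through
    if should_wake(req):
        return "wake"               # a user clicked login: start the app
    if req.method == "GET" and req.path in ("/", LOGIN_PATH):
        return "static-login-page"  # serve the login screen without waking
    return "ignore"                 # scanners and other noise never wake the app
```

With this, a scanner probing `GET /wp-admin` gets `"ignore"`, a visitor loading `/` sees the static login page, and only `POST /login` triggers a wake-up.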
…I’d be very interested in any more details on how this feature works, how effective it might be given the above, etc. And if the devs think there is sufficient merit in letting us specify a ‘cool down’ time for scaling events, that might be a prerequisite for adoption for some of us. If the cool-down is 2 minutes, for instance, I think scale-down (and the cold start that follows) would happen many times throughout the day.
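As a back-of-the-envelope illustration of why the cool-down length matters, here is a small sketch that counts cold starts for a given request pattern. It assumes the app scales to zero as soon as it has been idle longer than the cool-down, and it ignores the initialization delay itself; the function name and the sample traffic are invented for the example.

```python
def count_cold_starts(request_times_s, cooldown_s):
    """Count requests that arrive while the app is scaled to zero.

    Assumes the app starts at zero replicas and scales down once the gap
    since the previous request exceeds cooldown_s.
    """
    cold_starts = 0
    last_request = None
    for t in sorted(request_times_s):
        if last_request is None or t - last_request > cooldown_s:
            cold_starts += 1
        last_request = t
    return cold_starts


# Sample traffic: one request every 10 minutes over an 8-hour day.
times = [i * 600 for i in range(48)]

two_minutes = count_cold_starts(times, cooldown_s=120)
three_hours = count_cold_starts(times, cooldown_s=3 * 3600)
print(two_minutes, three_hours)  # 48 1
```

For this traffic, a 2-minute cool-down makes every single request a cold start, while a 3-hour cool-down leaves only the first one.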