I posted about 2.5 weeks ago and got no help.
We have been “using” Rancher in production for 2+ months and committed to it, which is starting to feel like a mistake.
We can put a light load on it, and it seems to work fine… but once you push anything to it (30 to 50 services), it starts really misbehaving… deployments taking 1 hour+, a lot of services not starting or working, the front end not responding.
And the UI goes unresponsive, forcing a reboot.
I’m not sure what the problem is with it. We’ve tried a myriad of things. I posted 2+ weeks ago, filled in a bunch of details for people, and got no answer.
So, is Rancher BS and not really ready for this? I need to know. It’s killing my little startup at this point, with a simple “code push” causing massive instability, downtime, and 1+ day of cleaning crap up to get things running again.
I worked all morning to reduce the image size to around 1GB. We managed to get a subset of the system up, but now, as is becoming usual with Rancher… the UI is giving us the grey spinner and no actual UI. The “work” of setup is already done (I think), and now it just died.
To give even more data: this is an 8-core, 64GB RAM machine with a 1TB disk. The database is hosted on the same machine. The machine doesn’t show any load at all.
What version of Rancher? What version of Docker? Is the DB internal to the Rancher server, or do you have a MySQL service running on the box which holds the DB? Where are your servers hosted? What are the specs of the Rancher hosts? What is the disk I/O of the Rancher server and hosts when it takes 1 hour to deploy something?
Is there any firewall or proxy between the hosts and the Rancher server?
We are so sorry about the problems you are suffering. Obviously, this behavior doesn’t seem normal.
In addition to @Phillip_Ulberg’s questions, could you please tell us how many servers you have? Is your system coming from an upgrade, or is it a fresh installation? Is your Rancher server installed in HA or standalone?
Is there any possibility that we could connect to your system to take a deeper look?
(FYI @ibuildthecloud and I went through a bunch of stuff with @William_Flanagan on IRC Friday and at least one problem is the external DB being non-responsive)
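For anyone following along with similar symptoms, here is a minimal sketch of one way to check whether a database port is answering promptly from the Rancher server. It only times TCP connects, so it would catch a dead or badly stalled listener but says nothing about slow queries. The function names `tcp_connect_latency` and `probe` are made up for illustration, and the host and port in the usage comment (127.0.0.1 and 3306, MySQL’s default) are assumptions, not values confirmed in this thread.

```python
import socket
import time
from typing import Optional


def tcp_connect_latency(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (refused, timed out, unreachable...).
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def probe(host: str, port: int, attempts: int = 5, timeout: float = 3.0) -> dict:
    """Try several connects and summarise failures and the slowest success."""
    samples = [tcp_connect_latency(host, port, timeout) for _ in range(attempts)]
    ok = [s for s in samples if s is not None]
    return {
        "attempts": attempts,
        "failures": attempts - len(ok),
        "worst_seconds": max(ok) if ok else None,
    }


# Hypothetical usage, assuming MySQL listens on its default port on the same box:
# print(probe("127.0.0.1", 3306))
```

Running it while a deploy is in progress and again while the system is idle would give a rough comparison. Any failures, or a worst-case connect time in the seconds range on a local socket, would point toward the DB rather than the hosts.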
So, update: after totally wiping and starting from scratch, doing a bunch of tuning to get the system as clean and pristine as possible, and changing our deploy strategy to catalog updates and a manual, 1-by-1 push, we were able to get through an initial deploy.
However, on upgrade, we died again… exact same symptoms. And the system has now reverted to the behavior above.
FYI Vincent, I tried to reach out to you on IRC as well to give you an update yesterday… and now today, I’m back to totally wiping and rebuilding things from scratch.
This is miserable. As a startup, my app is DOWN right now, as I got most of my back end deployed, but can’t get my front end deployed as Rancher has locked up/crashed.
@rawmind: 1 server (standalone). Fresh 1.2.0 installation (no upgrade from 1.1.4 from a DB perspective). I have 10 “hosts”, and MySQL is cohosted. The Rancher server is on a machine with 64GB RAM and a 2TB HDD.
And, I’m happy to give someone access to look around… I offered that to Vincent last Friday.
root@69209d48bf71:/etc/mysql# service mysql restart
Stopping MySQL database server mysqld /usr/sbin/mysqld: error while loading shared libraries: libaio.so.1: cannot open shared object file: Permission denied
Not sure if that’s a red herring, or something legitimate. So, I thought I’d add it here.
And one last bit, more about the “environment”. There are 10 nodes, each with 64GB RAM, running 13 to 30 containers. Each container has at least 2GB potentially available to it. It’s hard to imagine that this would need more hardware.
This is from our Kibana logserver (Kibana is NOT running inside Rancher).
Like I said before, it could be a red herring. But it does correlate with my app’s existing, running deployed processes being locked down by Rancher and not running.
I’d love to log in and look, but my app is currently down because I’m in the middle of a busted config.
Hi @rawmind, we sent a key, usernames and passwords for you to get in. Please let me know when you are done so we can remove them… as we do not typically have user-based login enabled.