HELP - Rancher Standalone - Crash - How to recover AKS Cluster

Hi all,

I have installed a standalone Rancher and used it to create an AKS cluster.

I have deployed many applications, but today I found that the Rancher UI is not reachable.

So I tried to restart Rancher with the usual command:

docker run -d --restart=unless-stopped -p 80:80 -p 443:443 -v /root/cert.pem:/etc/rancher/ssl/cert.pem -v /root/key.pem:/etc/rancher/ssl/key.pem rancher/rancher:latest --no-cacerts

But I can’t see my AKS cluster in Rancher.

PLEASE HELP ME! Is there any chance to recover my cluster?

Re-creating it and re-installing all the applications would take a long time, and I wouldn’t be able to do my DEMO.

HELP ME!!

What was the state of the host/Docker/rancher container when it was not reachable? That is probably the best lead on what happened.

The command shown won’t restart the container; it creates a new one. That shouldn’t be an issue as long as the previous container had a persistent volume mounted to store Rancher’s data (see https://rancher.com/docs/rancher/v2.x/en/installation/other-installation-methods/single-node/advanced/#persistent-data). If you have that, adding the -v parameter to mount the data should restore Rancher. If not, check whether the old container is still there and start that one. If that container is broken (and starting a new one with --volumes-from doesn’t work either), you can look inside the Docker data directory to find the files with the data.
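The two recovery paths described above can be sketched as shell functions. This is only a sketch under assumptions: the function names are made up for illustration, and the container ID passed in is whatever `docker ps -a` reports for the original Rancher container. The ports, certificate paths, and image tag are copied from the command in the first post; note that `latest` may resolve to a newer Rancher version than the one that created the data.

```shell
# Hypothetical helper: start the ORIGINAL container again.
# Its volumes (including any anonymous volume holding Rancher's data) are reused.
restart_old_rancher() {
    old_container="$1"   # ID or name from `docker ps -a`
    docker start "$old_container"
}

# Hypothetical helper: create a NEW container that mounts the volumes
# of the old one, for the case where the old container itself will not start.
run_with_old_volumes() {
    old_container="$1"   # ID or name from `docker ps -a`
    docker run -d --restart=unless-stopped \
        -p 80:80 -p 443:443 \
        --volumes-from "$old_container" \
        -v /root/cert.pem:/etc/rancher/ssl/cert.pem \
        -v /root/key.pem:/etc/rancher/ssl/key.pem \
        rancher/rancher:latest --no-cacerts
}
```

Usage would look like `restart_old_rancher <container-id>`, falling back to `run_with_old_volumes <container-id>` only if the old container cannot be started. Ports 80 and 443 must be free, so any other Rancher container bound to them has to be stopped first.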

Hi superseb,

Docker was not running, and when I tried to restart it, it bound port 443 but didn’t present any certificate:

[root@AIRETAIL-RANCHER _data]# openssl s_client -connect 127.0.0.1:443
CONNECTED(00000003)
139923439417232:error:140790E5:SSL routines:ssl23_write:ssl handshake failure:s23_lib.c:177:

no peer certificate available

No client certificate CA names sent

SSL handshake has read 0 bytes and written 289 bytes

New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : 0000
Session-ID:
Session-ID-ctx:
Master-Key:
Key-Arg : None
Krb5 Principal: None
PSK identity: None
PSK identity hint: None
Start Time: 1579202638
Timeout : 300 (sec)
Verify return code: 0 (ok)

So we tried to restart it with the command that, as you say, creates another Rancher (I didn’t know that command starts a new Rancher).

The Docker process is running this way:

/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

and so there aren’t any persistent volumes.

Have I lost everything? :frowning:

Please share the output of:

docker ps -a
docker inspect id_of_original_rancher/rancher_container
docker logs id_of_original_rancher/rancher_container

Make sure to redact any sensitive info.
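Since the full `docker inspect` output is long, a narrower query can show just the part that matters for recovery, namely which volumes the container has mounted. A minimal sketch, assuming a hypothetical helper name; `--format` with a Go template is a standard `docker inspect` option:

```shell
# Hypothetical helper: print one line per mount (type, host source, container destination).
list_container_mounts() {
    container="$1"   # ID or name from `docker ps -a`
    docker inspect \
        --format '{{range .Mounts}}{{println .Type .Source .Destination}}{{end}}' \
        "$container"
}
```

A line whose destination is /var/lib/rancher would indicate that Rancher's data lives in a Docker volume on the host, even if no -v option for it was given at creation time.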

Hi superseb, below is the output of the commands (the original Rancher is running now):

[root@AIRETAIL-RANCHER ~]# docker ps -a
CONTAINER ID   IMAGE                    COMMAND                  CREATED       STATUS                      PORTS                                      NAMES
fe804af609da   rancher/rancher:latest   "entrypoint.sh --no-…"   2 hours ago   Exited (1) 40 minutes ago                                              tender_dirac
35ee616eb36c   rancher/rancher:latest   "entrypoint.sh --no-…"   4 weeks ago   Up 21 seconds               0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   flamboyant_wright

[root@AIRETAIL-RANCHER ~]# docker inspect 35ee616eb36c
[
{
“Id”: “35ee616eb36c5465330da6feeeb73a885dc17fe13fafbf6f3e10c09e64ecab3e”,
“Created”: “2019-12-13T18:00:18.099313962Z”,
“Path”: “entrypoint.sh”,
“Args”: [
“--no-cacerts”
],
“State”: {
“Status”: “running”,
“Running”: true,
“Paused”: false,
“Restarting”: false,
“OOMKilled”: false,
“Dead”: false,
“Pid”: 36061,
“ExitCode”: 0,
“Error”: “”,
“StartedAt”: “2020-01-16T20:04:02.218385076Z”,
“FinishedAt”: “2020-01-16T20:03:57.36649658Z”
},
“Image”: “sha256:5ebba94410d81660683c61c9713153374ccce19bfe333a878093c7996da41c28”,
“ResolvConfPath”: “/var/lib/docker/containers/35ee616eb36c5465330da6feeeb73a885dc17fe13fafbf6f3e10c09e64ecab3e/resolv.conf”,
“HostnamePath”: “/var/lib/docker/containers/35ee616eb36c5465330da6feeeb73a885dc17fe13fafbf6f3e10c09e64ecab3e/hostname”,
“HostsPath”: “/var/lib/docker/containers/35ee616eb36c5465330da6feeeb73a885dc17fe13fafbf6f3e10c09e64ecab3e/hosts”,
“LogPath”: “/var/lib/docker/containers/35ee616eb36c5465330da6feeeb73a885dc17fe13fafbf6f3e10c09e64ecab3e/35ee616eb36c5465330da6feeeb73a885dc17fe13fafbf6f3e10c09e64ecab3e-json.log”,
“Name”: “/flamboyant_wright”,
“RestartCount”: 3,
“Driver”: “overlay2”,
“Platform”: “linux”,
“MountLabel”: “”,
“ProcessLabel”: “”,
“AppArmorProfile”: “”,
“ExecIDs”: null,
“HostConfig”: {
“Binds”: [
“/root/cert.pem:/etc/rancher/ssl/cert.pem”,
“/root/key.pem:/etc/rancher/ssl/key.pem”
],
“ContainerIDFile”: “”,
“LogConfig”: {
“Type”: “json-file”,
“Config”: {}
},
“NetworkMode”: “default”,
“PortBindings”: {
“443/tcp”: [
{
“HostIp”: “”,
“HostPort”: “443”
}
],
“80/tcp”: [
{
“HostIp”: “”,
“HostPort”: “80”
}
]
},
“RestartPolicy”: {
“Name”: “unless-stopped”,
“MaximumRetryCount”: 0
},
“AutoRemove”: false,
“VolumeDriver”: “”,
“VolumesFrom”: null,
“CapAdd”: null,
“CapDrop”: null,
“Capabilities”: null,
“Dns”: [],
“DnsOptions”: [],
“DnsSearch”: [],
“ExtraHosts”: null,
“GroupAdd”: null,
“IpcMode”: “private”,
“Cgroup”: “”,
“Links”: null,
“OomScoreAdj”: 0,
“PidMode”: “”,
“Privileged”: false,
“PublishAllPorts”: false,
“ReadonlyRootfs”: false,
“SecurityOpt”: null,
“UTSMode”: “”,
“UsernsMode”: “”,
“ShmSize”: 67108864,
“Runtime”: “runc”,
“ConsoleSize”: [
0,
0
],
“Isolation”: “”,
“CpuShares”: 0,
“Memory”: 0,
“NanoCpus”: 0,
“CgroupParent”: “”,
“BlkioWeight”: 0,
“BlkioWeightDevice”: [],
“BlkioDeviceReadBps”: null,
“BlkioDeviceWriteBps”: null,
“BlkioDeviceReadIOps”: null,
“BlkioDeviceWriteIOps”: null,
“CpuPeriod”: 0,
“CpuQuota”: 0,
“CpuRealtimePeriod”: 0,
“CpuRealtimeRuntime”: 0,
“CpusetCpus”: “”,
“CpusetMems”: “”,
“Devices”: [],
“DeviceCgroupRules”: null,
“DeviceRequests”: null,
“KernelMemory”: 0,
“KernelMemoryTCP”: 0,
“MemoryReservation”: 0,
“MemorySwap”: 0,
“MemorySwappiness”: null,
“OomKillDisable”: false,
“PidsLimit”: null,
“Ulimits”: null,
“CpuCount”: 0,
“CpuPercent”: 0,
“IOMaximumIOps”: 0,
“IOMaximumBandwidth”: 0,
“MaskedPaths”: [
“/proc/asound”,
“/proc/acpi”,
“/proc/kcore”,
“/proc/keys”,
“/proc/latency_stats”,
“/proc/timer_list”,
“/proc/timer_stats”,
“/proc/sched_debug”,
“/proc/scsi”,
“/sys/firmware”
],
“ReadonlyPaths”: [
“/proc/bus”,
“/proc/fs”,
“/proc/irq”,
“/proc/sys”,
“/proc/sysrq-trigger”
]
},
“GraphDriver”: {
“Data”: {
“LowerDir”: “/var/lib/docker/overlay2/e4edfa3b0713329c861c94f318d3f8a86d73652b016a90ef866719397553ef48-init/diff:/var/lib/docker/overlay2/7a60b10ac7068010f2ad0b183bfc4e1d8e89848179e52a2c5897103b55d9dfcd/diff:/var/lib/docker/overlay2/fc4ffdf9b3bb767a193d654904c1a803628086bcfb9d918d15ad55fca5516375/diff:/var/lib/docker/overlay2/4ae1180fb0698d575b72734b51363bd5cb362b07b0078e54c7ff47955385930a/diff:/var/lib/docker/overlay2/1fef9833b6b3362c4c51cbd4c3bb256572fc69fac031e1a04cbca24f11bde643/diff:/var/lib/docker/overlay2/1c9d0f82b046628ff645225b0c7407ba99cf978c8a93de1a2c1645eecb24d278/diff:/var/lib/docker/overlay2/832365129b030a4d8e28aa28ba994f3db98554521c32304ef108f8345538d399/diff:/var/lib/docker/overlay2/724bb2fb63b8b6fba5ce51df828d2e68b9ca45444d8a9c0f08f8c633023055a2/diff:/var/lib/docker/overlay2/d521d518e38788b3dfaa1c48e9b5c1d5dfdb412141b07025e989f70168e7f5ec/diff:/var/lib/docker/overlay2/ef35d7df17ffcd47d0c00b901fd8dcd98e928dccd9650c71f2d619856e077a4a/diff:/var/lib/docker/overlay2/0365e59cbcfa46e96ef38e640bc18c02b562fe0899cfc02379ef0a1df1612513/diff:/var/lib/docker/overlay2/dc761c7e95caf290543ba255ee48f7d5160ed95e59a7896b378402fd4dc90336/diff:/var/lib/docker/overlay2/8d016831604f8ae35adcef8f9dba78cebc63448f59ec021e030241086b57d11d/diff:/var/lib/docker/overlay2/d299c5bd01af51574255be795941a5baf23b626de6a9bbd5503a313896bca9f8/diff:/var/lib/docker/overlay2/69ca9da9406e5609f62de78cd32f2530aea397c9402f5bae1473bf548deca574/diff”,
“MergedDir”: “/var/lib/docker/overlay2/e4edfa3b0713329c861c94f318d3f8a86d73652b016a90ef866719397553ef48/merged”,
“UpperDir”: “/var/lib/docker/overlay2/e4edfa3b0713329c861c94f318d3f8a86d73652b016a90ef866719397553ef48/diff”,
“WorkDir”: “/var/lib/docker/overlay2/e4edfa3b0713329c861c94f318d3f8a86d73652b016a90ef866719397553ef48/work”
},
“Name”: “overlay2”
},
“Mounts”: [
{
“Type”: “bind”,
“Source”: “/root/cert.pem”,
“Destination”: “/etc/rancher/ssl/cert.pem”,
“Mode”: “”,
“RW”: true,
“Propagation”: “rprivate”
},
{
“Type”: “bind”,
“Source”: “/root/key.pem”,
“Destination”: “/etc/rancher/ssl/key.pem”,
“Mode”: “”,
“RW”: true,
“Propagation”: “rprivate”
},
{
“Type”: “volume”,
“Name”: “6df4f76640ddb4a81b3dd9aae3b5d592a68d093a85573d05cc6cd45a018726ff”,
“Source”: “/var/lib/docker/volumes/6df4f76640ddb4a81b3dd9aae3b5d592a68d093a85573d05cc6cd45a018726ff/_data”,
“Destination”: “/var/lib/rancher”,
“Driver”: “local”,
“Mode”: “”,
“RW”: true,
“Propagation”: “”
},
{
“Type”: “volume”,
“Name”: “9a532ee45ac72031b6c614e4a9e7a37f6c730e667cf8c8e8c1608e7b71387413”,
“Source”: “/var/lib/docker/volumes/9a532ee45ac72031b6c614e4a9e7a37f6c730e667cf8c8e8c1608e7b71387413/_data”,
“Destination”: “/var/log/auditlog”,
“Driver”: “local”,
“Mode”: “”,
“RW”: true,
“Propagation”: “”
}
],
“Config”: {
“Hostname”: “35ee616eb36c”,
“Domainname”: “”,
“User”: “”,
“AttachStdin”: false,
“AttachStdout”: false,
“AttachStderr”: false,
“ExposedPorts”: {
“443/tcp”: {},
“80/tcp”: {}
},
“Tty”: false,
“OpenStdin”: false,
“StdinOnce”: false,
“Env”: [
“PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin”,
“CATTLE_SYSTEM_CHART_DEFAULT_BRANCH=release-v2.3”,
“CATTLE_HELM_VERSION=v2.14.3-rancher1”,
“CATTLE_K3S_VERSION=v0.8.0”,
“CATTLE_MACHINE_VERSION=v0.15.0-rancher20-1”,
“CATTLE_ETCD_VERSION=v3.3.14”,
“LOGLEVEL_VERSION=v0.1.2”,
“TINI_VERSION=v0.18.0”,
“TELEMETRY_VERSION=v0.5.10”,
“KUBECTL_VERSION=v1.16.1”,
“DOCKER_MACHINE_LINODE_VERSION=v0.1.8”,
“LINODE_UI_DRIVER_VERSION=v0.3.0”,
“TINI_URL_amd64=https://github.com/krallin/tini/releases/download/v0.18.0/tini”,
“TINI_URL_arm64=https://github.com/krallin/tini/releases/download/v0.18.0/tini-arm64”,
“TINI_URL=TINI_URL_amd64”,
“HELM_URL_amd64=https://github.com/rancher/helm/releases/download/v2.14.3-rancher1/rancher-helm”,
“HELM_URL_arm64=https://github.com/rancher/helm/releases/download/v2.14.3-rancher1/rancher-helm-arm64”,
“HELM_URL=HELM_URL_amd64”,
“TILLER_URL_amd64=https://github.com/rancher/helm/releases/download/v2.14.3-rancher1/rancher-tiller”,
“TILLER_URL_arm64=https://github.com/rancher/helm/releases/download/v2.14.3-rancher1/rancher-tiller-arm64”,
“TILLER_URL=TILLER_URL_amd64”,
“K3S_URL_amd64=https://github.com/rancher/k3s/releases/download/v0.8.0/k3s”,
“K3S_URL_arm64=https://github.com/rancher/k3s/releases/download/v0.8.0/k3s-arm64”,
“K3S_URL=K3S_URL_amd64”,
“ETCD_URL_amd64=https://github.com/etcd-io/etcd/releases/download/v3.3.14/etcd-v3.3.14-linux-amd64.tar.gz”,
“ETCD_URL_arm64=https://github.com/etcd-io/etcd/releases/download/v3.3.14/etcd-v3.3.14-linux-arm64.tar.gz”,
“ETCD_URL=ETCD_URL_amd64”,
“CATTLE_UI_PATH=/usr/share/rancher/ui”,
“CATTLE_UI_VERSION=2.3.25”,
“CATTLE_CLI_VERSION=v2.3.2”,
“CATTLE_API_UI_VERSION=1.1.6”,
“AUDIT_LOG_PATH=/var/log/auditlog/rancher-api-audit.log”,
“AUDIT_LOG_MAXAGE=10”,
“AUDIT_LOG_MAXBACKUP=10”,
“AUDIT_LOG_MAXSIZE=100”,
“AUDIT_LEVEL=0”,
“CATTLE_CLI_URL_DARWIN=https://releases.rancher.com/cli2/v2.3.2/rancher-darwin-amd64-v2.3.2.tar.gz”,
“CATTLE_CLI_URL_LINUX=https://releases.rancher.com/cli2/v2.3.2/rancher-linux-amd64-v2.3.2.tar.gz”,
“CATTLE_CLI_URL_WINDOWS=https://releases.rancher.com/cli2/v2.3.2/rancher-windows-386-v2.3.2.zip”,
“CATTLE_SERVER_VERSION=v2.3.3”,
“CATTLE_AGENT_IMAGE=rancher/rancher-agent:v2.3.3”,
“CATTLE_SERVER_IMAGE=rancher/rancher”,
“ETCD_UNSUPPORTED_ARCH=amd64”,
“ETCDCTL_API=3”,
“SSL_CERT_DIR=/etc/rancher/ssl”
],
“Cmd”: [
“--no-cacerts”
],
“Image”: “rancher/rancher:latest”,
“Volumes”: {
“/var/lib/rancher”: {},
“/var/log/auditlog”: {}
},
“WorkingDir”: “/var/lib/rancher”,
“Entrypoint”: [
“entrypoint.sh”
],
“OnBuild”: null,
“Labels”: {
“org.label-schema.build-date”: “2019-11-27T19:08:53Z”,
“org.label-schema.schema-version”: “1.0”,
“org.label-schema.vcs-ref”: “a61b8c734e56bcdddc30394cb075c2203bf074f2”,
“org.label-schema.vcs-url”: “https://github.com/rancher/rancher.git”
}
},
“NetworkSettings”: {
“Bridge”: “”,
“SandboxID”: “fdfc92e43d94cb86cc8179faf9a2aebb16df7ad999914a0fb13537cdc5130291”,
“HairpinMode”: false,
“LinkLocalIPv6Address”: “”,
“LinkLocalIPv6PrefixLen”: 0,
“Ports”: {
“443/tcp”: [
{
“HostIp”: “0.0.0.0”,
“HostPort”: “443”
}
],
“80/tcp”: [
{
“HostIp”: “0.0.0.0”,
“HostPort”: “80”
}
]
},
“SandboxKey”: “/var/run/docker/netns/fdfc92e43d94”,
“SecondaryIPAddresses”: null,
“SecondaryIPv6Addresses”: null,
“EndpointID”: “b63e9a51f92c52235023242ee9b94f1918bf140acdec04d70a7dcdd21c231cb4”,
“Gateway”: “172.17.0.1”,
“GlobalIPv6Address”: “”,
“GlobalIPv6PrefixLen”: 0,
“IPAddress”: “172.17.0.2”,
“IPPrefixLen”: 16,
“IPv6Gateway”: “”,
“MacAddress”: “02:42:ac:11:00:02”,
“Networks”: {
“bridge”: {
“IPAMConfig”: null,
“Links”: null,
“Aliases”: null,
“NetworkID”: “b9b16a9dbad6a16200793d68f1b2b2b4592b0b7542cc6767de634b3562521aba”,
“EndpointID”: “b63e9a51f92c52235023242ee9b94f1918bf140acdec04d70a7dcdd21c231cb4”,
“Gateway”: “172.17.0.1”,
“IPAddress”: “172.17.0.2”,
“IPPrefixLen”: 16,
“IPv6Gateway”: “”,
“GlobalIPv6Address”: “”,
“GlobalIPv6PrefixLen”: 0,
“MacAddress”: “02:42:ac:11:00:02”,
“DriverOpts”: null
}
}
}
}
]

The logs are very big; how can I share them?

Thanks a lot!

Alessio
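The Mounts section of the inspect output above lists a local volume mounted at /var/lib/rancher, which is where this container keeps Rancher's data. Before experimenting further, that data could be archived. The following is a sketch only: the function name and archive name are hypothetical, and it assumes the `busybox` image can be pulled on the host. The container is stopped during the copy so the files are not changing underneath `tar`.

```shell
# Hypothetical helper: archive /var/lib/rancher from an existing container's volumes.
backup_rancher_data() {
    container="$1"            # ID or name of the original Rancher container
    backup_dir="${2:-$PWD}"   # host directory that receives the archive
    archive="rancher-data-backup-$(date +%Y%m%d).tar.gz"

    docker stop "$container" || return 1

    # A throwaway container mounts the same volumes and writes the tarball to the host.
    status=0
    docker run --rm --volumes-from "$container" \
        -v "$backup_dir:/backup" \
        busybox tar czf "/backup/$archive" /var/lib/rancher || status=$?

    # Start Rancher again whether or not the backup succeeded.
    docker start "$container" || return 1
    return "$status"
}
```

For example, `backup_rancher_data <container-id> /root/backups` would leave a dated tarball in /root/backups, assuming that directory exists.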

I assume it keeps running (and doesn’t exit)? The openssl command can take a few moments to properly respond, as Rancher needs to start and the webserver needs to start serving. Does it still reply with the same output at least 1 minute after startup?

You can place all the logs in a GitHub Gist or pastebin, or post just the last 50 lines or so to give an indication of the issue.
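One way to collect just the tail of the logs into a file that can be pasted into a Gist is sketched below. The helper name and default file name are hypothetical; `docker logs --tail` is a standard option. Stderr is captured as well, because `docker logs` replays the container's stderr stream on stderr.

```shell
# Hypothetical helper: save the last N log lines of a container to a file.
save_recent_logs() {
    container="$1"                     # ID or name from `docker ps -a`
    lines="${2:-50}"                   # how many trailing lines to keep
    outfile="${3:-rancher-tail.log}"   # where to write them
    docker logs --tail "$lines" "$container" > "$outfile" 2>&1
}
```

The resulting file should still be reviewed for sensitive information before it is shared.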

If the container stays running after starting, please try the openssl command and possibly curl -sLk https://127.0.0.1 to check what the response is.
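Because Rancher can take a while to start serving, a single check right after startup may be misleading. A polling loop along these lines could be used instead; it is a sketch with a hypothetical function name and arbitrary default timings, relying on curl's `-w '%{http_code}'` output, which is `000` when no HTTP response was received.

```shell
# Hypothetical helper: poll a URL until the server answers with any HTTP status.
wait_for_rancher() {
    url="${1:-https://127.0.0.1}"
    attempts="${2:-12}"   # number of tries
    delay="${3:-5}"       # seconds between tries
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -k skips certificate verification, as in the curl command above.
        code=$(curl -sk -o /dev/null -w '%{http_code}' "$url") || true
        if [ -n "$code" ] && [ "$code" != "000" ]; then
            echo "got HTTP $code after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "no HTTP response after $attempts attempts" >&2
    return 1
}
```

With the defaults this waits up to roughly one minute; if it still reports no response, the container logs are the next place to look.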

Hi superseb,

I tried the openssl and curl commands:

[root@AIRETAIL-RANCHER ~]# curl -sLk https://127.0.0.1
[root@AIRETAIL-RANCHER ~]#
[root@AIRETAIL-RANCHER ~]# openssl s_client -connect 127.0.0.1:443
CONNECTED(00000003)
140503558477712:error:140790E5:SSL routines:ssl23_write:ssl handshake failure:s23_lib.c:177:

no peer certificate available

No client certificate CA names sent

SSL handshake has read 0 bytes and written 289 bytes

New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : 0000
Session-ID:
Session-ID-ctx:
Master-Key:
Key-Arg : None
Krb5 Principal: None
PSK identity: None
PSK identity hint: None
Start Time: 1579260582
Timeout : 300 (sec)
Verify return code: 0 (ok)

So Docker doesn’t expose the certificate.

Please help me! :frowning:

Hi superseb,

After rebooting the Rancher server and increasing its RAM, it now works again without problems.

Is it possible that the RAM caused this problem? The server had 8 GB of RAM before and now has 16 GB.

Thanks,

Alessio

Do you know if something else on the server was using memory or just Rancher? The requirements are listed on https://rancher.com/docs/rancher/v2.x/en/installation/requirements/.
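To answer the memory question with data, a few standard commands can be combined. This is a sketch with a hypothetical helper name; `free -m`, `docker stats --no-stream`, and the `.State.OOMKilled` and `.RestartCount` fields of `docker inspect` all exist, but whether they reveal the cause here is an assumption.

```shell
# Hypothetical helper: snapshot host memory and the container's memory-related state.
check_memory_pressure() {
    container="$1"   # ID or name of the Rancher container
    # Host-wide memory in MiB.
    free -m
    # One-shot resource usage of the container.
    docker stats --no-stream "$container"
    # Whether the kernel OOM killer hit the container, and how often Docker restarted it.
    docker inspect \
        --format 'OOMKilled={{.State.OOMKilled}} RestartCount={{.RestartCount}}' \
        "$container"
}
```

A rising restart count combined with very little free memory on the host would be consistent with memory pressure, though it would not prove it.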

Do you have the logging from the container when it wasn’t reachable?

Hi superseb,

Only Rancher is running on that machine. According to your link, the medium size calls for 8 GB, but on CentOS on Azure that amount of memory was not sufficient.

I don’t have the logs :frowning: