Rancher becomes very slow, with permission issues



Hello,

For the past couple of days, for no apparent reason, my Rancher has become very slow at some actions, such as upgrades. It takes approx. 15-30 minutes to upgrade a container. Moreover, the “scheduler” containers of all my environments have disappeared, and they are stuck like this:


The container is not on the host either, and I don’t know how to create one manually.

Finally, I can’t delete anything. My account is an Admin and I’m the owner of all the environments, but I get an error message every time I try to delete something (container, stack, API keys, environment, etc.):

Error deleting. 
Access Denied.
Description: You are not allowed to access the document you requested.

I’ve deleted my Rancher container and recreated it with --env JAVA_OPTS="-Xmx4096m", but that didn’t change anything.
I can’t find any relevant error messages in the logs of the MySQL server or the Rancher container.
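
A minimal sketch of such a log check, assuming the containers are named rancher-server and mariadb (placeholder names, to be replaced with the real ones from `docker ps`). The keyword list is also only a guess at what a relevant message might contain.

```shell
# Placeholder container names (assumptions); override via environment variables.
RANCHER_CONTAINER="${RANCHER_CONTAINER:-rancher-server}"
DB_CONTAINER="${DB_CONTAINER:-mariadb}"

# Keep only lines that look like problems (case-insensitive match).
# "|| true" keeps the exit status at 0 when nothing matches.
filter_errors() {
  grep -iE 'error|exception|denied|timeout' || true
}

# Scan the last hour of logs from both containers.
scan_logs() {
  for c in "$RANCHER_CONTAINER" "$DB_CONTAINER"; do
    echo "== $c =="
    docker logs --since 1h "$c" 2>&1 | filter_errors
  done
}
```

Calling `scan_logs` on the Rancher host prints the matching lines per container. An empty section means none of the keywords appeared in the last hour.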

I don’t know what to do or what to check to solve these serious problems.

Have you ever experienced problems like this?
Can you help me?

Here is more information about my installation:

Rancher versions:
rancher/server: v1.6.22
rancher/rancher-agent: v1.2.11

Infrastructure Stack versions:
healthcheck: v0.3.6
ipsec: v0.13.11
network-services: v0.7.20
scheduler: v0.8.5

Docker version: (docker version,docker info preferred)

$ docker info
Containers: 16
 Running: 13
 Paused: 0
 Stopped: 3
Images: 18
Server Version: 18.03.1-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.48-coreos-r1
Operating System: Container Linux by CoreOS 1745.6.0 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 7.7GiB
Name: <serverhostname>
ID: KKPZ:NBGJ:SRL2:CQE5:KWEK:QB64:X7O2:SNIQ:UW26:GTCU:GJQ3:K6PH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 dckr-infra-01.cdsp.sciences-po.fr:5000
 127.0.0.0/8
Live Restore Enabled: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1745.6.0
VERSION_ID=1745.6.0
BUILD_ID=2018-06-08-0926
PRETTY_NAME="Container Linux by CoreOS 1745.6.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
$ uname -r
4.14.48-coreos-r1

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
VM on XenServer 7.1 hypervisor

Setup details: (single node rancher vs. HA rancher, internal DB vs. external DB)
I’m running the Rancher container linked directly to a MariaDB container. There is no HA.

Environment Template: (Cattle/Kubernetes/Swarm/Mesos)
Cattle for all my environments.