Cattle-cluster-agent can't bind to port 6443

I’ve been using Rancher 2 in our development environment for a few months, hosted on DigitalOcean. We’re now working on a bare-metal installation in a test environment, with all sorts of restrictions and requirements, and we’ve had a few problems that we’ve managed to overcome.
But we don’t really know what to do with the following.

We’re running Rancher 2 in HA mode with a single node for the Rancher cluster.
Everything was set up nicely, and we have pods running fine on the user cluster, communicating; everything is great.
But I suddenly noticed, under the cattle-system namespace of the local (system) cluster, that the cattle-cluster-agent is constantly restarting, with thousands of restarts since we installed the environment earlier this week.
When viewing the log, it terminates with:
time="2020-06-30T13:22:03Z" level=info msg="Refreshing all schemas"
time="2020-06-30T13:22:03Z" level=fatal msg="steve exited: listen tcp 127.0.0.1:6443: bind: address already in use"

The cattle-cluster-agent pod is running on the Rancher cluster node, and running netstat on the node shows that port 6443 is taken by kube-apiserver.

As I mentioned before, we’re currently not seeing any functional problems in our cluster. Is this a problem that should be dealt with, or will it go away when we add another node to the Rancher cluster?
Alternatively, is there any way to change the port of kube-apiserver through Rancher, or to allow the cattle-cluster-agent to use a different port?
Thanks for making an amazing product!

We’ll take a look at this.

What version of Rancher are you running?

Thanks!
I’m running Rancher v2.4.5.

Are you able to SSH onto the host where that cluster-agent container is running, run docker inspect on it, and post the results here?

Hello @bicycledan.
I was not able to reproduce this issue on a 1 node HA setup running Rancher v2.4.5.

  • Ubuntu 20.04
  • Docker 19.03
  • k8s: v1.18.3

I also tried a 1 node imported RKE cluster.

  • RKE v1.1.3
  • Ubuntu 20.04
  • k8s: v1.18.3

The cattle-cluster-agent logs did not show this error in either setup.
If there are any more details that would help the investigation, please post them.
Thank you.

Hi,
I am facing the same issue on a cluster with 3 etcd + control plane nodes and 4 worker nodes.
The cattle-cluster-agent pod is crashing; some logs are below:

time="2020-07-02T17:56:36Z" level=info msg="Watching metadata for node.k8s.io/v1beta1, Kind=RuntimeClass"
time="2020-07-02T17:56:36Z" level=info msg="Watching metadata for policy/v1beta1, Kind=PodSecurityPolicy"
time="2020-07-02T17:56:36Z" level=info msg="Watching metadata for crd.projectcalico.org/v1, Kind=BGPConfiguration"
I0702 17:56:37.267405       1 request.go:621] Throttling request took 1.197714061s, request: GET:https://10.43.0.1:443/apis/rbac.authorization.k8s.io/v1/rolebindings?limit=500
I0702 17:56:46.124085       1 trace.go:116] Trace[1437902002]: "Reflector ListAndWatch" name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-07-02 17:56:36.072931722 +0000 UTC m=+12.812229782) (total time: 10.051122322s):
Trace[1437902002]: [10.051098009s] [10.051098009s] Objects listed
I0702 17:56:46.487078       1 trace.go:116] Trace[793909336]: "Reflector ListAndWatch" name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-07-02 17:56:36.073174696 +0000 UTC m=+12.812472756) (total time: 10.413866082s):
Trace[793909336]: [10.413828361s] [10.413828361s] Objects listed
I0702 17:56:46.730959       1 trace.go:116] Trace[1149509107]: "Reflector ListAndWatch" name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-07-02 17:56:36.073037903 +0000 UTC m=+12.812335961) (total time: 10.657887413s):
Trace[1149509107]: [10.657797665s] [10.657797665s] Objects listed
I0702 17:56:46.823580       1 trace.go:116] Trace[512906503]: "Reflector ListAndWatch" name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-07-02 17:56:36.073196202 +0000 UTC m=+12.812494241) (total time: 10.75035345s):
Trace[512906503]: [10.750284868s] [10.750284868s] Objects listed
I0702 17:56:46.869402       1 trace.go:116] Trace[272199843]: "Reflector ListAndWatch" name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-07-02 17:56:36.073320518 +0000 UTC m=+12.812618575) (total time: 10.796051918s):
Trace[272199843]: [10.796031992s] [10.796031992s] Objects listed
I0702 17:56:47.142679       1 trace.go:116] Trace[881161598]: "Reflector ListAndWatch" name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-07-02 17:56:36.073392929 +0000 UTC m=+12.812690989) (total time: 11.069246511s):
Trace[881161598]: [11.069179081s] [11.069179081s] Objects listed
I0702 17:56:47.270546       1 trace.go:116] Trace[1342381351]: "Reflector ListAndWatch" name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-07-02 17:56:36.073526612 +0000 UTC m=+12.812824673) (total time: 11.196987046s):
Trace[1342381351]: [11.196933818s] [11.196933818s] Objects listed
I0702 17:56:47.467361       1 request.go:621] Throttling request took 11.393630398s, request: GET:https://10.43.0.1:443/api/v1/serviceaccounts?limit=500
I0702 17:56:47.475003       1 trace.go:116] Trace[1339189757]: "Reflector ListAndWatch" name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-07-02 17:56:36.073645358 +0000 UTC m=+12.812943416) (total time: 11.401329337s):
Trace[1339189757]: [11.401204867s] [11.401204867s] Objects listed
I0702 17:56:47.708069       1 trace.go:116] Trace[1783558010]: "Reflector ListAndWatch" name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-07-02 17:56:36.073711522 +0000 UTC m=+12.813009561) (total time: 11.634325869s):
Trace[1783558010]: [11.634278798s] [11.634278798s] Objects listed

time="2020-07-02T17:56:47Z" level=fatal msg="steve exited: listen tcp 127.0.0.1:6443: bind: address already in use"

The port is bound by kube-apiserver:

netstat -tulpn | grep 6443
tcp6       0      0 :::6443                 :::*                    LISTEN      18979/kube-apiserve 

Please help with this.

Thanks

@cjellick
Sure, output is below.

@aaronRancher,
We’re running on

  • Ubuntu 18.04
  • Rancher 2.4.5
  • Docker 19.03.12
  • k8s: v1.17.6

I don’t really know where to go from here… Any pointers?

[
{
"Id": "8a523d546456ccb5d7a1dc973b091503ab4f7b491f6b96e232f1ca487809cdbb",
"Created": "2020-07-02T19:25:18.719926124Z",
"Path": "run.sh",
"Args": [],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 5104,
"ExitCode": 0,
"Error": "",
"StartedAt": "2020-07-02T19:25:19.035656788Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
"Image": "sha256:2e6c7ac4e07232fd68d06ed269472cc01b179b808e1f6f5c5af43a9cae3a2778",
"ResolvConfPath": "/var/lib/docker/containers/47c6c2a7a47d230c253f7fcae6cb97e247fb20e958553c638a725ac2e7682263/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/47c6c2a7a47d230c253f7fcae6cb97e247fb20e958553c638a725ac2e7682263/hostname",
"HostsPath": "/var/lib/kubelet/pods/004b72b4-a819-450a-81c6-fc5e5cf80194/etc-hosts",
"LogPath": "/var/lib/docker/containers/8a523d546456ccb5d7a1dc973b091503ab4f7b491f6b96e232f1ca487809cdbb/8a523d546456ccb5d7a1dc973b091503ab4f7b491f6b96e232f1ca487809cdbb-json.log",
"Name": "/k8s_cluster-register_cattle-cluster-agent-76f6664dbf-9h6jg_cattle-system_004b72b4-a819-450a-81c6-fc5e5cf80194_562",
"RestartCount": 0,
"Driver": "overlay2",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "docker-default",
"ExecIDs": null,
"HostConfig": {
"Binds": [
"/var/lib/kubelet/pods/004b72b4-a819-450a-81c6-fc5e5cf80194/volumes/kubernetes.io~secret/cattle-credentials:/cattle-credentials:ro",
"/var/lib/kubelet/pods/004b72b4-a819-450a-81c6-fc5e5cf80194/volumes/kubernetes.io~secret/cattle-token-sjgk5:/var/run/secrets/kubernetes.io/serviceaccount:ro",
"/var/lib/kubelet/pods/004b72b4-a819-450a-81c6-fc5e5cf80194/etc-hosts:/etc/hosts",
"/var/lib/kubelet/pods/004b72b4-a819-450a-81c6-fc5e5cf80194/containers/cluster-register/c9da478b:/dev/termination-log"
],
"ContainerIDFile": "",
"LogConfig": {
"Type": "json-file",
"Config": {}
},
"NetworkMode": "container:47c6c2a7a47d230c253f7fcae6cb97e247fb20e958553c638a725ac2e7682263",
"PortBindings": null,
"RestartPolicy": {
"Name": "no",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": null,
"CapDrop": null,
"Capabilities": null,
"Dns": null,
"DnsOptions": null,
"DnsSearch": null,
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "container:47c6c2a7a47d230c253f7fcae6cb97e247fb20e958553c638a725ac2e7682263",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 1000,
"PidMode": "",
"Privileged": false,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": [
"seccomp=unconfined"
],
"UTSMode": "host",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 2,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "/kubepods/besteffort/pod004b72b4-a819-450a-81c6-fc5e5cf80194",
"BlkioWeight": 0,
"BlkioWeightDevice": null,
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 100000,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DeviceCgroupRules": null,
"DeviceRequests": null,
"KernelMemory": 0,
"KernelMemoryTCP": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": null,
"OomKillDisable": false,
"PidsLimit": null,
"Ulimits": null,
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"MaskedPaths": [
"/proc/acpi",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/proc/scsi",
"/sys/firmware"
],
"ReadonlyPaths": [
"/proc/asound",
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
},
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/1d0927acc94263b26f60b3964cd06e406fd89ca4ec19e943eaa96468b314a1ea-init/diff:/var/lib/docker/overlay2/8809c31a426fb7cfcc22538c7ccadf867f4ba4e1abf8ae4d5d86600205c29c36/diff:/var/lib/docker/overlay2/2f54d7f67f1d52898ccdeee683d40970782556063ffa1859a43a55eefe07d03d/diff:/var/lib/docker/overlay2/a81dcdb176095bac6cf729826526cbaec24971c3402d836ec667b36356ddd49d/diff:/var/lib/docker/overlay2/5e4a6ba6d959aab4a3523dc02ac3c05e6ad116288e2bacad32775c2f5a0ede6b/diff:/var/lib/docker/overlay2/e9c021af2a8c08718d866413423cd305348e2b502d882b080fdf37dd8f4bf901/diff:/var/lib/docker/overlay2/0ebb35d8a275e588ac11e22303b4a6fd6ef534624f2744cd73c67bec5a180c42/diff:/var/lib/docker/overlay2/7591839110bbf2a7c8701b51af776f1391e835769b0707b5bde3683a8fdbe191/diff:/var/lib/docker/overlay2/308988c7d45a97ea13a2abcfd333b4970d0b5e99f18b20c1f36a85a7b845674b/diff:/var/lib/docker/overlay2/f1dfe61d952fff8c6e06f07241383bfb1019b0a42658a25e71e16963247a7071/diff:/var/lib/docker/overlay2/7e055978fea5105859cd95439e6abe685d2aaf01dd766decf976829f40df076e/diff",
"MergedDir": "/var/lib/docker/overlay2/1d0927acc94263b26f60b3964cd06e406fd89ca4ec19e943eaa96468b314a1ea/merged",
"UpperDir": "/var/lib/docker/overlay2/1d0927acc94263b26f60b3964cd06e406fd89ca4ec19e943eaa96468b314a1ea/diff",
"WorkDir": "/var/lib/docker/overlay2/1d0927acc94263b26f60b3964cd06e406fd89ca4ec19e943eaa96468b314a1ea/work"
},
"Name": "overlay2"
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/004b72b4-a819-450a-81c6-fc5e5cf80194/containers/cluster-register/c9da478b",
"Destination": "/dev/termination-log",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/004b72b4-a819-450a-81c6-fc5e5cf80194/volumes/kubernetes.io~secret/cattle-credentials",
"Destination": "/cattle-credentials",
"Mode": "ro",
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/004b72b4-a819-450a-81c6-fc5e5cf80194/volumes/kubernetes.io~secret/cattle-token-sjgk5",
"Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
"Mode": "ro",
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/004b72b4-a819-450a-81c6-fc5e5cf80194/etc-hosts",
"Destination": "/etc/hosts",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
}
],
"Config": {
"Hostname": "rancher-test-01",
"Domainname": "",
"User": "0",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"CATTLE_CA_CHECKSUM=6216ee904f2aee3983923907f85f118420d5d3102261ae81c76b9e753abf582d",
"CATTLE_CLUSTER=true",
"CATTLE_FEATURES=dashboard=true",
"CATTLE_K8S_MANAGED=true",
"CATTLE_SERVER=https://rancher-test-01",
"KUBERNETES_SERVICE_PORT_HTTPS=443",
"KUBERNETES_PORT=tcp://10.43.0.1:443",
"KUBERNETES_PORT_443_TCP=tcp://10.43.0.1:443",
"RANCHER_PORT=tcp://10.43.116.134:80",
"RANCHER_PORT_80_TCP_PORT=80",
"KUBERNETES_SERVICE_PORT=443",
"RANCHER_SERVICE_HOST=10.43.116.134",
"RANCHER_PORT_80_TCP_PROTO=tcp",
"KUBERNETES_PORT_443_TCP_PORT=443",
"KUBERNETES_PORT_443_TCP_ADDR=10.43.0.1",
"RANCHER_SERVICE_PORT=80",
"RANCHER_SERVICE_PORT_HTTP=80",
"RANCHER_PORT_80_TCP=tcp://10.43.116.134:80",
"KUBERNETES_SERVICE_HOST=10.43.0.1",
"KUBERNETES_PORT_443_TCP_PROTO=tcp",
"RANCHER_PORT_80_TCP_ADDR=10.43.116.134",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"DOCKER_URL_amd64=https://get.docker.com/builds/Linux/x86_64/docker-1.10.3",
"DOCKER_URL_arm64=https://github.com/rancher/docker/releases/download/v1.10.3-ros1/docker-1.10.3_arm64",
"DOCKER_URL=DOCKER_URL_amd64",
"KUBECTL_VERSION=v1.17.6",
"LOGLEVEL_VERSION=v0.1.3",
"KUBEPROMPT_VERSION=v1.0.10",
"DOCKER_API_VERSION=1.24",
"AGENT_IMAGE=rancher/rancher-agent:v2.4.5",
"SSL_CERT_DIR=/etc/kubernetes/ssl/certs"
],
"Cmd": null,
"Healthcheck": {
"Test": [
"NONE"
]
},
"Image": "sha256:2e6c7ac4e07232fd68d06ed269472cc01b179b808e1f6f5c5af43a9cae3a2778",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": [
"run.sh"
],
"OnBuild": null,
"Labels": {
"annotation.io.kubernetes.container.hash": "4eebb640",
"annotation.io.kubernetes.container.restartCount": "562",
"annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
"annotation.io.kubernetes.container.terminationMessagePolicy": "File",
"annotation.io.kubernetes.pod.terminationGracePeriod": "30",
"io.cattle.agent": "true",
"io.kubernetes.container.logpath": "/var/log/pods/cattle-system_cattle-cluster-agent-76f6664dbf-9h6jg_004b72b4-a819-450a-81c6-fc5e5cf80194/cluster-register/562.log",
"io.kubernetes.container.name": "cluster-register",
"io.kubernetes.docker.type": "container",
"io.kubernetes.pod.name": "cattle-cluster-agent-76f6664dbf-9h6jg",
"io.kubernetes.pod.namespace": "cattle-system",
"io.kubernetes.pod.uid": "004b72b4-a819-450a-81c6-fc5e5cf80194",
"io.kubernetes.sandbox.id": "47c6c2a7a47d230c253f7fcae6cb97e247fb20e958553c638a725ac2e7682263",
"org.label-schema.build-date": "2020-06-20T01:39:56Z",
"org.label-schema.schema-version": "1.0",
"org.label-schema.vcs-ref": "c642f1caa0ba4be2a27ebec1e1a0a6f3e6a482d5",
"org.label-schema.vcs-url": "https://github.com/rancher/rancher.git"
}
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {},
"SandboxKey": "",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"MacAddress": "",
"Networks": {}
}
}
]

Same problem here:
time="2020-07-08T06:56:45Z" level=fatal msg="steve exited: listen tcp 127.0.0.1:6443: bind: address already in use"

Dear developer,

We have also met the same problem.
The versions are:
"KUBECTL_VERSION=v1.17.6",
"LOGLEVEL_VERSION=v0.1.3",
"KUBEPROMPT_VERSION=v1.0.10",
"DOCKER_API_VERSION=1.24",
"AGENT_IMAGE=rancher/rancher-agent:v2.4.5",

Do you have any idea about the root cause and how to solve it?

Thanks!

@all
I think I found the reason, in the following commit.

The port was changed from 8443 to 6443, and it is hardcoded (pkg/agent/steve/steve.go).

So versions released after this commit should have the same problem.

@ibuildthecloud can you help to check if this is the root cause?
Thanks!

we have same problem…
Trace[1987427276]: [36.17856718s] [36.17856718s] Objects listed

8.09.2020 19:48:19 I0908 17:48:19.255608 1 trace.go:116] Trace[892820556]: “Reflector ListAndWatch” name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-09-08 17:47:40.312043551 +0000 UTC m=+38.716087818) (total time: 38.943512313s):

8.09.2020 19:48:19 Trace[892820556]: [38.94070566s] [38.94070566s] Objects listed

8.09.2020 19:48:20 I0908 17:48:20.007100 1 trace.go:116] Trace[1936230580]: “Reflector ListAndWatch” name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-09-08 17:47:40.238731172 +0000 UTC m=+38.642775339) (total time: 39.768331334s):

8.09.2020 19:48:20 Trace[1936230580]: [39.762462724s] [39.762462724s] Objects listed

8.09.2020 19:48:32 I0908 17:48:32.316815 1 trace.go:116] Trace[552022175]: “Reflector ListAndWatch” name:github.com/rancher/steve/pkg/clustercache/controller.go:164 (started: 2020-09-08 17:47:40.252903138 +0000 UTC m=+38.656947305) (total time: 52.063836427s):

8.09.2020 19:48:32 Trace[552022175]: [52.024943395s] [52.024943395s] Objects listed

8.09.2020 19:48:32 time=“2020-09-08T17:48:32Z” level=fatal msg=“steve exited: listen tcp 127.0.0.1:6443: bind: address already in use”

We have the same problem; it occurred suddenly.

The simplest workaround would be to disable the Dashboard feature in settings. I’ll look into why the agent seems to be double-binding the same port (I suspect a TCP disconnect is causing it).
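The docker inspect output above shows the agent running with CATTLE_FEATURES=dashboard=true, which is presumably what this workaround toggles off. As a sketch only (parseFeatures is a hypothetical helper, not Rancher’s actual parser), a key=value feature string like that could gate whether the listener is started at all:

```go
package main

import (
	"fmt"
	"strings"
)

// parseFeatures turns a comma-separated string such as
// "dashboard=true,foo=false" into a map of feature flags.
// A steve-like agent could skip binding 127.0.0.1:6443 entirely
// when the dashboard flag parses as false.
func parseFeatures(s string) map[string]bool {
	features := map[string]bool{}
	for _, kv := range strings.Split(s, ",") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) == 2 {
			features[parts[0]] = parts[1] == "true"
		}
	}
	return features
}

func main() {
	f := parseFeatures("dashboard=true")
	fmt.Println(f["dashboard"]) // true
}
```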


Thanks, it’s working! This solution works!