Hello,
I encountered an SSH error on RHEL 8.5. The same configuration was tested on RHEL 7.6 and worked smoothly; the problem only occurs on RHEL 8.
RKE version: v1.2.19
Docker version: Docker version 20.10.12, build e91ed57
Operating system and kernel (cat /etc/os-release):
NAME="Red Hat Enterprise Linux"
VERSION="8.5 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.5 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.5
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.5"
Type/provider of hosts: VMware vSphere
cluster.yml file:
# If you intend to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: machineadd
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: ""
  user: docker
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: /home/docker/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    # extra_args: {}
    # Source : https://rancher.com/docs/rke/latest/en/os/#flatcar-container-linux
    extra_args:
      flex-volume-plugin-dir: /opt/kubernetes/kubelet-plugins/volume/exec/
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
network:
  plugin: calico
  # options: {}
  # Source : https://rancher.com/docs/rke/latest/en/os/#flatcar-container-linux
  options:
    calico_flex_volume_plugin_dir: /opt/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    flannel_backend_type: vxlan
  mtu: 0
  node_selector: {}
  update_strategy: null
  tolerations: []
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
# Do not configure system_image because we already configure private_registries.
# Source : https://github.com/rancher/rke/issues/2720#issuecomment-950768397
# system_images:
# etcd: rancher/mirrored-coreos-etcd:v3.4.15-rancher1
# alpine: rancher/rke-tools:v0.1.80
# nginx_proxy: rancher/rke-tools:v0.1.80
# cert_downloader: rancher/rke-tools:v0.1.80
# kubernetes_services_sidecar: rancher/rke-tools:v0.1.80
# kubedns: rancher/mirrored-k8s-dns-kube-dns:1.15.10
# dnsmasq: rancher/mirrored-k8s-dns-dnsmasq-nanny:1.15.10
# kubedns_sidecar: rancher/mirrored-k8s-dns-sidecar:1.15.10
# kubedns_autoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.1
# coredns: rancher/mirrored-coredns-coredns:1.8.0
# coredns_autoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.1
# nodelocal: rancher/mirrored-k8s-dns-node-cache:1.15.13
# kubernetes: rancher/hyperkube:v1.20.15-rancher1
# flannel: rancher/mirrored-coreos-flannel:v0.15.1
# flannel_cni: rancher/flannel-cni:v0.3.0-rancher6
# calico_node: rancher/mirrored-calico-node:v3.17.2
# calico_cni: rancher/mirrored-calico-cni:v3.17.2
# calico_controllers: rancher/mirrored-calico-kube-controllers:v3.17.2
# calico_ctl: rancher/mirrored-calico-ctl:v3.17.2
# calico_flexvol: rancher/mirrored-calico-pod2daemon-flexvol:v3.17.2
# canal_node: rancher/mirrored-calico-node:v3.17.2
# canal_cni: rancher/mirrored-calico-cni:v3.17.2
# canal_controllers: rancher/mirrored-calico-kube-controllers:v3.17.2
# canal_flannel: rancher/mirrored-coreos-flannel:v0.15.1
# canal_flexvol: rancher/mirrored-calico-pod2daemon-flexvol:v3.17.2
# weave_node: weaveworks/weave-kube:2.8.1
# weave_cni: weaveworks/weave-npc:2.8.1
# pod_infra_container: rancher/mirrored-pause:3.6
# ingress: rancher/nginx-ingress-controller:nginx-1.1.0-rancher1
# ingress_backend: rancher/mirrored-nginx-ingress-controller-defaultbackend:1.5-rancher1
# ingress_webhook: rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
# metrics_server: rancher/mirrored-metrics-server:v0.5.0
# windows_pod_infra_container: rancher/mirrored-pause:3.6
# aci_cni_deploy_container: noiro/cnideploy:5.1.1.0.1ae238a
# aci_host_container: noiro/aci-containers-host:5.1.1.0.1ae238a
# aci_opflex_container: noiro/opflex:5.1.1.0.1ae238a
# aci_mcast_container: noiro/opflex:5.1.1.0.1ae238a
# aci_ovs_container: noiro/openvswitch:5.1.1.0.1ae238a
# aci_controller_container: noiro/aci-containers-controller:5.1.1.0.1ae238a
# aci_gbp_server_container: noiro/gbp-server:5.1.1.0.1ae238a
# aci_opflex_server_container: noiro/opflex-server:5.1.1.0.1ae238a
ssh_key_path: /home/docker/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: null
kubernetes_version: ""
# private_registries: []
# Source : https://rancher.com/docs/rke/latest/en/config-options/private-registries/#default-registry
private_registries:
- url: "dockerproxy.company"
  is_default: true # All system images will be pulled using this registry.
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
  update_strategy: null
  http_port: 0
  https_port: 0
  network_mode: ""
  tolerations: []
  default_backend: null
  default_http_backend_priority_class_name: ""
  nginx_ingress_controller_priority_class_name: ""
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
win_prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
  update_strategy: null
  replicas: null
  tolerations: []
  metrics_server_priority_class_name: ""
restore:
  restore: false
  snapshot_name: ""
rotate_encryption_key: false
dns: null
Steps to Reproduce:
First, log in as a user other than root that can access the Docker socket (i.e. is able to run docker ps successfully).
Here, I tried to bring up a cluster on the node I am logged into, as if the target were a remote machine.
# login as user 'docker'
ssh docker@machine
# make sure that id_rsa.pub is in authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# At this point, I can do ssh docker@localhost without entering any password
# in the folder where cluster.yml is located
rke up
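As an aside, since the failure is specific to RHEL 8, it is worth ruling out the usual public-key pitfalls on that platform before blaming rke. The following sanity check is my own addition, not part of the original reproduction; the paths match the defaults used in cluster.yml:

```shell
# sshd silently rejects public-key auth when these paths are
# group- or world-writable, so enforce strict permissions
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# on RHEL, a wrong SELinux context on ~/.ssh can also break key auth;
# restorecon resets it (skipped on systems without SELinux tooling)
command -v restorecon >/dev/null 2>&1 && restorecon -R ~/.ssh || true
```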
Results:
[docker@mymachine terraform]$ rke -d up
DEBU[0000] Loglevel set to [debug]
INFO[0000] Running RKE version: v1.2.19
DEBU[0000] audit log policy found in cluster.yml
INFO[0000] Initiating Kubernetes cluster
DEBU[0000] metadataInitialized: [False] []
DEBU[0000] Loading data.json from local source
DEBU[0000] data.json SHA256 checksum: 74664a6ce625a6aeaef8183de2f65f289cd752a80103768c7d2d4359ac423172
DEBU[0000] metadata initialized successfully
DEBU[0000] metadataInitialized: [true] []
DEBU[0000] No DNS provider configured, setting default based on cluster version [1.20.15-rancher1-2]
DEBU[0000] DNS provider set to [coredns]
DEBU[0000] Checking if cluster version [1.20.15-rancher1-2] needs to have kube-api audit log enabled
DEBU[0000] Cluster version [1.20.15-rancher1-2] needs to have kube-api audit log enabled
DEBU[0000] Enabling kube-api audit log for cluster version [v1.20.15-rancher1-2]
DEBU[0000] No input provided for maxUnavailableWorker, setting it to default value of 10 percent
DEBU[0000] No input provided for maxUnavailableControlplane, setting it to default value of 1
DEBU[0000] Host: mymachine.fqdn has role: controlplane
DEBU[0000] Host: mymachine.fqdn has role: worker
DEBU[0000] Host: mymachine.fqdn has role: etcd
DEBU[0000] [state] previous state not found, possible legacy cluster
INFO[0000] [dialer] Setup tunnel for host [mymachine.fqdn]
DEBU[0000] Connecting to Docker API for host [mymachine.fqdn]
DEBU[0000] FIXME: Got an status-code for which error does not match any expected type!!!: -1 module=api status_code=-1
WARN[0000] Failed to set up SSH tunneling for host [mymachine.fqdn]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed)
WARN[0000] Removing host [mymachine.fqdn] from node lists
WARN[0000] [state] can't fetch legacy cluster state from Kubernetes: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [mymachine.fqdn]
INFO[0000] [certificates] Generating CA kubernetes certificates
INFO[0000] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
INFO[0000] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
INFO[0000] [certificates] Generating Kubernetes API server certificates
INFO[0000] [certificates] Generating Service account token key
INFO[0000] [certificates] Generating Kube Controller certificates
INFO[0000] [certificates] Generating Kube Scheduler certificates
INFO[0001] [certificates] Generating Kube Proxy certificates
INFO[0001] [certificates] Generating Node certificate
INFO[0001] [certificates] Generating admin certificates and kubeconfig
INFO[0001] [certificates] Generating Kubernetes API server proxy client certificates
INFO[0001] Successfully Deployed state file at [./cluster.rkestate]
DEBU[0001] Checking if cluster version [1.20.15-rancher1-2] needs to have kube-api audit log enabled
DEBU[0001] Cluster version [1.20.15-rancher1-2] needs to have kube-api audit log enabled
DEBU[0001] Enabling kube-api audit log for cluster version [v1.20.15-rancher1-2]
INFO[0001] Building Kubernetes cluster
FATA[0001] Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config
However, note that the user 'docker' has access to the Docker socket and can connect as docker to the target machine (the same one) with its key pair, which means the SSH public key is properly authorized.
# docker command OK
[docker@mymachine terraform]$ ll /var/run/docker.sock
srw-rw----. 1 root docker 0 Apr 14 16:44 /var/run/docker.sock
[docker@mymachine terraform]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# SSH ok
[docker@mymachine terraform]$ ssh -i ~/.ssh/id_rsa docker@mymachine.fqdn
Last login: Wed Apr 20 17:50:47 2022 from 1.2.3.4
[docker@mymachine ~]$
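As I understand it, rke reaches the remote Docker daemon by forwarding /var/run/docker.sock through the SSH tunnel, which is exactly the step that fails above ("ssh: rejected: connect failed (open failed)"). That step can be reproduced by hand, independently of rke; the hostname is the one from my setup and /tmp/docker-test.sock is an arbitrary local path:

```shell
# forward the remote Docker socket to a local Unix socket, the same
# way rke's dialer does (Unix-socket forwarding needs OpenSSH >= 6.7)
ssh -fnNT -L /tmp/docker-test.sock:/var/run/docker.sock docker@mymachine.fqdn
# if the forwarding works, this prints the remote daemon's version info
docker -H unix:///tmp/docker-test.sock version
# clean up the local socket afterwards
rm -f /tmp/docker-test.sock
```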
Could somebody give me some hints? I have no idea which part of my OS configuration causes this. sshd_config looks fine to me, since I can connect via SSH with the public key; it seems the rke command does not properly honor sshd_config.
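In case it helps: my assumption is that the relevant sshd settings are the ones controlling forwarding, since rke tunnels the Docker socket over SSH. They can be dumped on the target host like this (sshd -T prints the effective configuration and requires root):

```shell
# show the effective sshd options that govern the forwarding
# rke relies on; all of them must permit the tunnel
sudo sshd -T | grep -iE 'allowtcpforwarding|allowstreamlocalforwarding|disableforwarding|permitopen'
```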
Thank you in advance for your help.
Regards,
Rahenda