Failed calling webhook (cert-manager)

Hi ranchers!

I recently started setting up my HA RKE cluster and got stuck at nearly the last step :frowning_face:. The issue is related to the topic below, but that topic has no explanation of how to resolve the problem.

Briefly, when I was installing Rancher on the RKE cluster, this error popped up:

```
Error: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.189.166:443: i/o timeout
```

while running:

```sh
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.my.org \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=me@example.org
```
I tried to resolve the issue, but had no luck and can't understand what I did wrong. As a note, instead of following the outdated cert-manager installation process in the Rancher docs, I used the instructions from cert-manager itself.
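I'm not sure what else to check. From what I understand, the timeout is to the webhook Service's ClusterIP, so the first things to verify would be whether the cert-manager pods are running and whether that Service is reachable from inside the cluster at all. A rough sketch of what I mean (assumes the default `cert-manager` namespace from the error message; the `webhook-test` pod name and the curl image are arbitrary throwaway choices):

```sh
# Are the cert-manager controller, cainjector and webhook pods all Running?
kubectl get pods -n cert-manager -o wide

# Does the webhook Service have endpoints, and does its ClusterIP match
# the 10.43.189.166 address from the error?
kubectl get svc,endpoints cert-manager-webhook -n cert-manager

# Can a pod on the overlay network reach the webhook at all?
# An i/o timeout here (rather than a TLS/certificate error) would point at
# the cluster network rather than at cert-manager itself.
kubectl run webhook-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -vk --max-time 10 https://cert-manager-webhook.cert-manager.svc:443/mutate
```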

Here is my cluster.yml:
```yaml
nodes:
  - address: 65.21.7.132
    internal_address: 10.0.3.2
    hostname_override: rke-node-1
    user: root
    role:
      - controlplane
      - etcd
      - worker
  - address: 65.21.1.94
    internal_address: 10.0.3.1
    hostname_override: rke-node-2
    user: root
    role:
      - controlplane
      - etcd
      - worker
  - address: 95.217.221.160
    internal_address: 10.0.1.1
    hostname_override: rke-node-3
    user: root
    role:
      - controlplane
      - etcd
      - worker

# If set to true, RKE will not fail when unsupported Docker version
# are found
ignore_docker_version: false

# Cluster level SSH private key
# Used if no ssh information is set for the node
ssh_key_path: ~/.ssh/id_rsa

# Enable use of SSH agent to use SSH private keys with passphrase
# This requires the environment SSH_AUTH_SOCK configured pointing
# to your SSH agent which has the private key added
ssh_agent_auth: true

# List of registry credentials
# If you are using a Docker Hub registry, you can omit the url
# or set it to docker.io
# is_default set to true will override the system default
# registry set in the global settings
private_registries:
# - url: registry.com
#   user: Username
#   password: password
#   is_default: true

# Bastion/Jump host configuration
bastion_host:
# address: x.x.x.x
# user: ubuntu
# port: 22
# ssh_key_path: /home/user/.ssh/bastion_rsa
# or
# ssh_key: |-
#   -----BEGIN RSA PRIVATE KEY-----
#
#   -----END RSA PRIVATE KEY-----

# Set the name of the Kubernetes cluster
cluster_name: octollenge-dev-hel

kubernetes_version: v1.17.17-rancher1-1

system_images:
# kubernetes: rancher/hyperkube:v1.10.3-rancher2
# etcd: rancher/coreos-etcd:v3.1.12
# alpine: rancher/rke-tools:v0.1.9
# nginx_proxy: rancher/rke-tools:v0.1.9
# cert_downloader: rancher/rke-tools:v0.1.9
# kubernetes_services_sidecar: rancher/rke-tools:v0.1.9
# kubedns: rancher/k8s-dns-kube-dns-amd64:1.14.8
# dnsmasq: rancher/k8s-dns-dnsmasq-nanny-amd64:1.14.8
# kubedns_sidecar: rancher/k8s-dns-sidecar-amd64:1.14.8
# kubedns_autoscaler: rancher/cluster-proportional-autoscaler-amd64:1.0.0
# pod_infra_container: rancher/pause-amd64:3.1

services:
  etcd:
  kube_api:
    # IP range for any services created on Kubernetes
    # This must match the service_cluster_ip_range in kube-controller
    service_cluster_ip_range: 10.43.0.0/16
    # Expose a different port range for NodePort services
    service_node_port_range: 30000-32767
    pod_security_policy: false
    # Add additional arguments to the kubernetes API server
    # This WILL OVERRIDE any existing defaults
    extra_args:
      # Enable audit log to stdout
      audit-log-path: "-"
      # Increase number of delete workers
      delete-collection-workers: 3
      # Set the level of log output to debug-level
      v: 4
  # Note for Rancher 2 users: If you are configuring Cluster Options
  # using a Config File when creating Rancher Launched Kubernetes,
  # the names of services should contain underscores only:
  # kube_controller. This only applies to Rancher v2.0.5 and v2.0.6.
  kube_controller:
    # CIDR pool used to assign IP addresses to pods in the cluster
    cluster_cidr: 10.42.0.0/16
    # IP range for any services created on Kubernetes
    # This must match the service_cluster_ip_range in kube-api
    service_cluster_ip_range: 10.43.0.0/16
  kubelet:
    # Base domain for the cluster
    cluster_domain: cluster.local
    # IP address for the DNS service endpoint
    cluster_dns_server: 10.43.0.10
    # Fail if swap is on
    fail_swap_on: false
    generate_serving_certificate: true
    extra_args:
      max-pods: 150
      enforce-node-allocatable: "pods"
      # as only the pods rule is enforced above, system+kube reservation is zeroed for 8-1=7 allocatable memory
      system-reserved: "cpu=300m,memory=5Mi,ephemeral-storage=1Gi"
      kube-reserved: "cpu=200m,memory=5Mi,ephemeral-storage=1Gi"
      eviction-hard: "memory.available<1Gi,nodefs.available<10%"
    # Optionally define additional volume binds to a service
    extra_binds:
      - "/usr/libexec/kubernetes/kubelet-plugins:/usr/libexec/kubernetes/kubelet-plugins"

# Currently, only authentication strategy supported is x509.
# You can optionally create additional SANs (hostnames or IPs) to
# add to the API server PKI certificate.
# This is useful if you want to use a load balancer for the
# control plane servers.
authentication:
  strategy: x509
  sans:
    - "10.0.0.7"
    - "167.233.14.241"

# Kubernetes Authorization mode
# Use mode: rbac to enable RBAC
# Use mode: none to disable authorization
authorization:
  mode: rbac

# If you want to set a Kubernetes cloud provider, you specify
# the name and configuration
cloud_provider:
# name: aws

# Add-ons are deployed using kubernetes jobs. RKE will give
# up on trying to get the job status after this timeout in seconds..
addon_job_timeout: 60

# Specify network plug-in (canal, calico, flannel, weave, or none)
network:
  plugin: canal

# Specify DNS provider (coredns or kube-dns)
dns:
  provider: coredns

# Currently only nginx ingress provider is supported.
# To disable ingress controller, set provider: none
# node_selector controls ingress placement and is optional
ingress:
  provider: nginx
  default_backend: true
  # node_selector:
  #   app: ingress

# All add-on manifests MUST specify a namespace
addons:
# addons: |-
#   apiVersion: v1
#   kind: Pod
#   metadata:
#     name: my-nginx
#     namespace: default
#   spec:
#     containers:
#     - name: my-nginx
#       image: nginx
#       ports:
#       - containerPort: 80

addons_include:
```
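Since the error is an i/o timeout to a ClusterIP, I suspect the canal overlay network configured above rather than cert-manager itself, but I don't know how to confirm that. Something along these lines is what I'd try next (a sketch only; the `net-test` pod name is arbitrary and busybox is just a throwaway client):

```sh
# Are the canal CNI pods healthy on every node?
kubectl get pods -n kube-system -o wide | grep canal

# Do in-cluster DNS and ClusterIP routing work at all from a pod?
kubectl run net-test --rm -it --restart=Never --image=busybox:1.33 --command -- \
  nslookup kubernetes.default.svc.cluster.local
```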

I have the same issue.
I recreated the RKE cluster and changed the network plugin from "canal" to "calico". After that, the issue did not happen again.
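For anyone else hitting this, the change itself is just the network section of the cluster.yml above; a sketch, keeping in mind that (as far as I know) RKE cannot switch the network provider on an existing cluster, which is why recreating it (e.g. `rke remove` and then `rke up` again) is needed:

```yaml
# Specify network plug-in (canal, calico, flannel, weave, or none)
network:
  plugin: calico
```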
