Kubelet in restart loop when I change cloud provider to aws

Hello,

I am trying to get a simple Rancher/Kubernetes environment set up within my AWS instance.

I had a simple environment running with the Rancher server (v1.6.10) and a host running in AWS. I changed the Kubernetes cloud provider from ‘rancher’ to ‘aws’ to take advantage of ELB and EBS, but since then I have run into a number of issues.

I did a fresh install of Rancher, set ‘aws’ as the cloud provider in Kubernetes, and provisioned a new Ubuntu EC2 host via the console. The host started without issue; however, the Kubernetes infrastructure stack does not start.

The kubelet and kubernetes services keep switching between the Unhealthy and Active states. I have drilled into the logs and I see the following:

01/12/2017 16:45:09 + exec kubelet --kubeconfig=/etc/kubernetes/ssl/kubeconfig --api_servers=https://kubernetes.kubernetes.rancher.internal:6443 --allow-privileged=true --register-node=true --cloud-provider=aws --healthz-bind-address=0.0.0.0 --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --network-plugin=cni --cni-conf-dir=/etc/cni/managed.d --pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 --cgroup-driver=cgroupfs --hostname-override lab01-01
01/12/2017 16:45:11 Flag --api-servers has been deprecated, Use --kubeconfig instead. Will be removed in a future version.
01/12/2017 16:45:11 I1201 05:45:11.036835 31191 feature_gate.go:144] feature gates: map[]
01/12/2017 16:45:11 I1201 05:45:11.037051 31191 aws.go:806] Building AWS cloudprovider
01/12/2017 16:45:11 I1201 05:45:11.037125 31191 aws.go:769] Zone not specified in configuration file; querying AWS metadata service
01/12/2017 16:45:11 error: failed to run Kubelet: could not init cloud provider "aws": error finding instance i-094f9d182903958df: error listing AWS instances: NoCredentialProviders: no valid providers in chain. Deprecated.
01/12/2017 16:45:11 For verbose messaging see aws.Config.CredentialsChainVerboseErrors

From digging around I see that this was a known issue (https://github.com/rancher/rancher/issues/7829) that has been resolved, but unfortunately not for me. Having followed the recommendations, I believe the IAM role on my EC2 instance has the correct privileges:

{
  "Action": "ec2:*",
  "Effect": "Allow",
  "Resource": "*"
},
{
  "Effect": "Allow",
  "Action": "elasticloadbalancing:*",
  "Resource": "*"
},
{
  "Effect": "Allow",
  "Action": "cloudwatch:*",
  "Resource": "*"
},
{
  "Effect": "Allow",
  "Action": "autoscaling:*",
  "Resource": "*"
},

I am unsure if the issue is something to do with the AWS VPC I am deploying the host into. I did not create the VPC.

My next step is to start building a new VPC, but before I do that, can anybody suggest anything?

I only see this when the IAM roles are not properly configured. Are they actually attached to the host? According to the docs (http://rancher.com/docs/rancher/v1.6/en/kubernetes/providers/), this is the minimal IAM policy for using the aws cloud provider:

{
  "Effect": "Allow",
  "Action": "ec2:Describe*",
  "Resource": "*"
}
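
One way to answer the "are they attached to the host?" question is to ask the EC2 instance metadata service from the host itself. The sketch below is only an illustration and not something taken from the Rancher docs. It assumes the IMDSv1-style endpoint `http://169.254.169.254/latest/meta-data/iam/security-credentials/`, which lists the name of the role attached to the instance. The helper names `http_get` and `attached_role` are made up for this example, and instances that enforce IMDSv2 would need a session token first, which this does not handle.

```python
import urllib.error
import urllib.request

# Instance metadata path that lists the IAM role attached to this instance.
ROLE_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def http_get(url, timeout=2):
    """Fetch a URL and return the body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def attached_role(fetch=http_get):
    """Return the attached role name, or None if no role is visible."""
    try:
        body = fetch(ROLE_URL)
    except (urllib.error.URLError, OSError):
        # An HTTP error or a timeout here suggests no instance profile is reachable.
        return None
    names = body.split()
    return names[0] if names else None
```

Running `print(attached_role())` on the host and getting `None` would suggest that no instance profile is attached, which is consistent with the `NoCredentialProviders` error in the kubelet log.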

Thanks Seb.

With your advice I have managed to resolve the issue by:

1 - Creating an IAM profile with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::kubernetes-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "ec2:Describe*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:AttachVolume",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DetachVolume",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ec2:*"],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": ["elasticloadbalancing:*"],
      "Resource": ["*"]
    },
    {
        "Effect": "Allow",
        "Action": [
            "route53:ChangeResourceRecordSets",
            "route53:GetChange",
            "route53:GetHostedZone",
            "route53:ListHostedZones",
            "route53:ListHostedZonesByName",
            "route53:ListResourceRecordSets"
        ],
        "Resource": [
            "*"
        ]
    },
    {
        "Effect": "Allow",
        "Action": [
            "acm:DescribeCertificate"
        ],
        "Resource": "*"
    }
  ]
}

2 - Creating the node manually rather than via the Rancher Admin UI, and then adding it to Rancher using the Custom option when adding a host.
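
As a side note on the policy in step 1: several of its statements overlap, since `ec2:*` already covers `ec2:Describe*`, `ec2:AttachVolume` and `ec2:DetachVolume`. The sketch below shows one way to spot that. It is only an approximation: it treats IAM action wildcards as case-insensitive glob patterns via Python's `fnmatch`, ignores `Resource` and `Condition` entirely, and the function names are invented for this example.

```python
import fnmatch


def as_list(value):
    """IAM allows a single string or a list of strings; normalise to a list."""
    return [value] if isinstance(value, str) else list(value)


def redundant_actions(statements):
    """Return Allow actions that a broader Allow pattern already covers."""
    actions = []
    for stmt in statements:
        if stmt.get("Effect") == "Allow":
            actions.extend(a.lower() for a in as_list(stmt["Action"]))
    covered = []
    for action in actions:
        for pattern in actions:
            is_wildcard = pattern != action and "*" in pattern
            if is_wildcard and fnmatch.fnmatchcase(action, pattern):
                covered.append(action)
                break
    return covered


# The EC2-related statements from the policy in step 1, trimmed for brevity.
statements = [
    {"Effect": "Allow", "Action": "ec2:Describe*", "Resource": "*"},
    {"Effect": "Allow", "Action": "ec2:AttachVolume", "Resource": "*"},
    {"Effect": "Allow", "Action": "ec2:DetachVolume", "Resource": "*"},
    {"Effect": "Allow", "Action": ["ec2:*"], "Resource": ["*"]},
]

print(redundant_actions(statements))
```

The overlap is harmless, but trimming the covered statements (or, going the other way, dropping `ec2:*` in favour of the narrower actions) would make the policy easier to review.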

Cheers