Hello everyone,
I have been trying to build a basic Kubernetes cluster using Rancher on AWS.
I can deploy on one host fine, but as soon as I add a second host I seem to run into trouble with etcd, and the system containers keep restarting. I'm not quite sure which logs you would need to give me a hand, but this is what I find for etcd:
20/09/2016 12:13:09 Get http://10.42.239.235:2379/health: dial tcp 10.42.239.235:2379: getsockopt: no route to host
20/09/2016 12:13:27 Get http://10.42.239.235:2379/health: dial tcp 10.42.239.235:2379: getsockopt: no route to host
20/09/2016 12:13:45 Get http://10.42.239.235:2379/health: dial tcp 10.42.239.235:2379: getsockopt: no route to host
20/09/2016 12:14:03 Get http://10.42.239.235:2379/health: dial tcp 10.42.239.235:2379: getsockopt: no route to host
20/09/2016 12:14:21 Get http://10.42.239.235:2379/health: dial tcp 10.42.239.235:2379: getsockopt: no route to host
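The failing check above is a plain HTTP GET against etcd's /health endpoint on the client port. For anyone who wants to reproduce it by hand from the second host, a minimal Python 3.9+ sketch could look like this; the function name etcd_healthy is my own and is not part of Rancher or etcd, and it only assumes the URL from the log is the one worth testing.

```python
import urllib.error
import urllib.request


def etcd_healthy(url: str, timeout: float = 3.0) -> tuple[bool, str]:
    """GET an etcd /health URL and return (ok, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            # A 200 response means the endpoint answered; the body is returned for inspection
            return resp.status == 200, body
    except urllib.error.URLError as exc:
        # Connection failures ("no route to host", "connection refused") and HTTP errors land here
        return False, str(exc.reason)
    except OSError as exc:
        # Anything else at the socket level, e.g. a timeout while reading the response
        return False, str(exc)
```

Calling etcd_healthy("http://10.42.239.235:2379/health") from the affected host would be expected to return False with a routing error if it fails the same way the Rancher check does.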
And this is what I have in the logs of the etcd container at 10.42.239.235:
20/09/2016 12:02:47 ++ giddyup service scale etcd
20/09/2016 12:02:47 + SCALE=3
20/09/2016 12:02:47 ++ giddyup ip myip
20/09/2016 12:02:47 + IP=10.42.239.235
20/09/2016 12:02:47 + META_URL=http://rancher-metadata.rancher.internal/2015-12-19
20/09/2016 12:02:47 ++ wget -q -O - http://rancher-metadata.rancher.internal/2015-12-19/self/stack/name
20/09/2016 12:02:47 + STACK_NAME=Kubernetes
20/09/2016 12:02:47 ++ wget -q -O - http://rancher-metadata.rancher.internal/2015-12-19/self/container/create_index
20/09/2016 12:02:47 + CREATE_INDEX=34
20/09/2016 12:02:47 ++ wget -q -O - http://rancher-metadata.rancher.internal/2015-12-19/self/container/service_index
20/09/2016 12:02:47 + SERVICE_INDEX=1
20/09/2016 12:02:47 ++ wget -q -O - http://rancher-metadata.rancher.internal/2015-12-19/self/host/uuid
20/09/2016 12:02:47 + HOST_UUID=ec5e47d2-9345-44c9-a863-83849ae01dcb
20/09/2016 12:02:47 + LEGACY_DATA_DIR=/data
20/09/2016 12:02:47 + DATA_DIR=/pdata
20/09/2016 12:02:47 + DR_FLAG=/pdata/DR
20/09/2016 12:02:47 + export ETCD_DATA_DIR=/pdata/data.current
20/09/2016 12:02:47 + ETCD_DATA_DIR=/pdata/data.current
20/09/2016 12:02:47 + export ETCDCTL_ENDPOINT=http://etcd.Kubernetes:2379
20/09/2016 12:02:47 + ETCDCTL_ENDPOINT=http://etcd.Kubernetes:2379
20/09/2016 12:02:47 ++ tr . -
20/09/2016 12:02:47 ++ echo 10.42.239.235
20/09/2016 12:02:47 + NAME=10-42-239-235
20/09/2016 12:02:47 + '[' 1 -eq 0 ']'
20/09/2016 12:02:47 + eval node
20/09/2016 12:02:47 ++ node
20/09/2016 12:02:47 ++ mkdir -p /pdata/data.current
20/09/2016 12:02:47 ++ '[' -d /data/member ']'
20/09/2016 12:02:47 ++ '[' -d /data/data.current ']'
20/09/2016 12:02:47 ++ '[' -f /pdata/DR ']'
20/09/2016 12:02:47 ++ '[' -d /pdata/data.current/member ']'
20/09/2016 12:02:47 +++ cat /pdata/data.current/ip
20/09/2016 12:02:47 ++ '[' 10.42.239.235 == 10.42.239.235 ']'
20/09/2016 12:02:47 ++ restart_node
20/09/2016 12:02:47 ++ ++ healthcheck_proxyrolling_backup
20/09/2016 12:02:47
20/09/2016 12:02:47 ++ ++ etcd --name 10-42-239-235 --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://10.42.239.235:2379 --listen-peer-urls http://0.0.0.0:2380 --initial-advertise-peer-urls http://10.42.239.235:2380 --initial-cluster-state existing
20/09/2016 12:02:47 ++ WAIT=60s
20/09/2016 12:02:47 ++ etcdwrapper healthcheck-proxy --port=:2378 --wait=60s --debug=false
20/09/2016 12:02:47 EMBEDDED_BACKUPS=true
20/09/2016 12:02:47 ++ '[' true == true ']'
20/09/2016 12:02:47 ++ BACKUP_PERIOD=15m
20/09/2016 12:02:47 ++ BACKUP_RETENTION=24h
20/09/2016 12:02:47 ++ giddyup leader elect --proxy-tcp-port=2160 etcdwrapper rolling-backup --period=15m --retention=24h --index=1
20/09/2016 12:02:47 time="2016-09-20T10:02:47Z" level=info msg="Listening on 0.0.0.0:2160"
20/09/2016 12:02:47 time="2016-09-20T10:02:47Z" level=info msg="Forwarding setup to: :2160"
20/09/2016 12:02:48 2016-09-20 10:02:48.049350 I | flags: recognized and used environment variable ETCD_DATA_DIR=/pdata/data.current
20/09/2016 12:02:48 2016-09-20 10:02:48.049574 I | etcdmain: etcd Version: 2.3.7
20/09/2016 12:02:48 2016-09-20 10:02:48.049632 I | etcdmain: Git SHA: fd17c91
20/09/2016 12:02:48 2016-09-20 10:02:48.049652 I | etcdmain: Go Version: go1.6.2
20/09/2016 12:02:48 2016-09-20 10:02:48.049686 I | etcdmain: Go OS/Arch: linux/amd64
20/09/2016 12:02:48 2016-09-20 10:02:48.049698 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
20/09/2016 12:02:48 2016-09-20 10:02:48.049738 W | etcdmain: found invalid file/dir ip under data dir /pdata/data.current (Ignore this if you are upgrading etcd)
20/09/2016 12:02:48 2016-09-20 10:02:48.049753 N | etcdmain: the server is already initialized as member before, starting as etcd member...
20/09/2016 12:02:48 2016-09-20 10:02:48.049845 I | etcdmain: listening for peers on http://0.0.0.0:2380
20/09/2016 12:02:48 2016-09-20 10:02:48.049901 I | etcdmain: listening for client requests on http://0.0.0.0:2379
20/09/2016 12:02:48 2016-09-20 10:02:48.207867 I | etcdserver: recovered store from snapshot at index 50005
20/09/2016 12:02:48 2016-09-20 10:02:48.207893 I | etcdserver: name = 10-42-239-235
20/09/2016 12:02:48 2016-09-20 10:02:48.207899 I | etcdserver: data dir = /pdata/data.current
20/09/2016 12:02:48 2016-09-20 10:02:48.207905 I | etcdserver: member dir = /pdata/data.current/member
20/09/2016 12:02:48 2016-09-20 10:02:48.207909 I | etcdserver: heartbeat = 100ms
20/09/2016 12:02:48 2016-09-20 10:02:48.207913 I | etcdserver: election = 1000ms
20/09/2016 12:02:48 2016-09-20 10:02:48.207916 I | etcdserver: snapshot count = 10000
20/09/2016 12:02:48 2016-09-20 10:02:48.207930 I | etcdserver: advertise client URLs = http://10.42.239.235:2379
20/09/2016 12:02:48 2016-09-20 10:02:48.435244 I | etcdserver: restarting member a113af6263612296 in cluster 758a82db1924ffd2 at commit index 57370
20/09/2016 12:02:48 2016-09-20 10:02:48.435534 I | raft: a113af6263612296 became follower at term 2
20/09/2016 12:02:48 2016-09-20 10:02:48.435556 I | raft: newRaft a113af6263612296 [peers: [a113af6263612296], term: 2, commit: 57370, applied: 50005, lastindex: 57370, lastterm: 2]
20/09/2016 12:02:48 2016-09-20 10:02:48.438018 I | etcdserver: added member a113af6263612296 [http://10.42.239.235:2380] to cluster 758a82db1924ffd2 from store
20/09/2016 12:02:48 2016-09-20 10:02:48.438040 I | etcdserver: set the cluster version to 2.3 from store
20/09/2016 12:02:48 2016-09-20 10:02:48.438218 I | etcdserver: starting server... [version: 2.3.7, cluster version: 2.3]
20/09/2016 12:02:48 time="2016-09-20T10:02:48Z" level=info msg="Initializing Rolling Backups" period=15m0s retention=24h0m0s
20/09/2016 12:02:49 2016-09-20 10:02:49.138421 I | raft: a113af6263612296 is starting a new election at term 2
20/09/2016 12:02:49 2016-09-20 10:02:49.138458 I | raft: a113af6263612296 became candidate at term 3
20/09/2016 12:02:49 2016-09-20 10:02:49.138465 I | raft: a113af6263612296 received vote from a113af6263612296 at term 3
20/09/2016 12:02:49 2016-09-20 10:02:49.138599 I | raft: a113af6263612296 became leader at term 3
20/09/2016 12:02:49 2016-09-20 10:02:49.138618 I | raft: raft.node: a113af6263612296 elected leader a113af6263612296 at term 3
20/09/2016 12:02:49 2016-09-20 10:02:49.139075 I | etcdserver: published {Name:10-42-239-235 ClientURLs:[http://10.42.239.235:2379]} to cluster 758a82db1924ffd2
20/09/2016 12:17:49 time="2016-09-20T10:17:49Z" level=info msg="Created backup" name="2016-09-20T10:17:48Z_etcd_1" runtime=426.536346ms
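The etcd command line in that log shows it listening for clients on port 2379 and for peers on port 2380. Since the health check fails with "no route to host" rather than a timeout or an HTTP error, a plain TCP connect test against both ports, run from the second host, might help separate a routing problem from an etcd problem. A minimal standard-library Python sketch; the names ETCD_PORTS, tcp_reachable and check_etcd_ports are my own and hypothetical, not part of Rancher or etcd.

```python
import socket

# Hypothetical mapping; port numbers taken from the etcd flags in the log above
# (--listen-client-urls ...:2379, --listen-peer-urls ...:2380).
ETCD_PORTS = {"client": 2379, "peer": 2380}


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Attempt a plain TCP connect and return (reachable, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # "No route to host", "Connection refused" and timeouts are all OSError subclasses
        return False, str(exc)


def check_etcd_ports(host: str, timeout: float = 3.0) -> dict[str, tuple[bool, str]]:
    """Check every port in ETCD_PORTS for one etcd member IP."""
    return {name: tcp_reachable(host, port, timeout) for name, port in ETCD_PORTS.items()}
```

For example, check_etcd_ports("10.42.239.235") run on each host in turn: if it connects from the first host but reports a routing error from the second, the problem would seem to sit in the network between the hosts rather than in etcd itself.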
The hosts are in the security groups that Rancher created; I even tried manually adding them to the default AWS security group, but no luck.
Thanks a lot!