Up until yesterday I was able to deploy stacks both through rancher-compose and through the UI (by uploading my docker-compose.yml and rancher-compose.yml configurations).
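For reference, the two files I upload look roughly like this (a minimal, illustrative sketch; the real service names, images, and scale are different):

docker-compose.yml:

version: '2'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"

rancher-compose.yml:

version: '2'
services:
  web:
    scale: 1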
I am running rancher server v1.6.6 on the same machine as my single host (which runs docker v17.03.1-ce). Shortly after running rancher-compose up -d, I observe the following logs for the rancher server (through docker logs; the exact commands I use are sketched after this excerpt):
time="2017-08-11T15:32:38Z" level=info msg="Stack Create Event Received" eventId=5834314d-73fe-42e1-a2b9-f0df5f53965b resourceId=1st104
time="2017-08-11T15:32:38Z" level=info msg="Stack Create Event Done" eventId=5834314d-73fe-42e1-a2b9-f0df5f53965b resourceId=1st104
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007fda911ea3de, pid=9, tid=140575876556544
#
# JRE version: OpenJDK Runtime Environment (8.0_72-b15) (build 1.8.0_72-b15)
# Java VM: OpenJDK 64-Bit Server VM (25.72-b15 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x8d33de] PerfLongVariant::sample()+0x1e
#
# Core dump written. Default location: /var/lib/cattle/core or core.9
#
# An error report file with more information is saved as:
# /var/lib/cattle/hs_err_pid9.log
#
# If you would like to submit a bug report, please visit:
# http://www.azulsystems.com/support/
#
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60908->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60906->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60888->[::1]:8081: read: connection reset by peer
time="2017-08-11T15:33:27Z" level=fatal msg="Exiting go-machine-service: Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [] from [http://localhost:8080/v2-beta]" service=gms
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60892->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60890->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60880->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60866->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60850->[::1]:8081: read: connection reset by peer
time="2017-08-11T15:33:27Z" level=fatal msg="Unable to create event router" error="Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [] from [http://localhost:8080/v2-beta]"
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60840->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60790->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: dial tcp [::1]:8081: getsockopt: connection refused
time="2017-08-11T15:33:27Z" level=info msg="Shutting down backend e2830dda-9ef9-4e5a-695e-09a9d73ebeb5. Connection closed because: websocket: close 1006 unexpected EOF."
time="2017-08-11T15:33:27Z" level=info msg="Removed backend. Key: e2830dda-9ef9-4e5a-695e-09a9d73ebeb5. Session ID c4c6fd5a-1f29-48e5-a0a7-f7145c603f71 ."
s6-svscanctl: fatal: unable to control /service: supervisor not listening
time="2017-08-11T15:33:27Z" level=fatal msg="Parent process went away. Shutting down."
I also notice that the scheduler (v0.8.2) crashes with the following logs:
time="2017-08-11T15:46:39Z" level=debug msg="Processing event." event={scheduler.reserve 55dabcd7-55ef-44f5-8d19-83d6e652ee8c reply.2117500817702455091 1i808 instance map[schedulerRequest:map[resourceRequests:[map[resource:instanceReservation amount:1 type:computePool]] phase:instance.start context:[map[state:starting accountId:143 imageId:809 token:GZiTQTzSr8N4qo1VLf9KABZ7aQimHieMu4FAFPZY2OhiltfXAmVUIwg5bMjR3IKfRq8DC5OE6gEXZX1NPCLsQ nativeContainer:false allocationState:active system:false kind:container data:map[fields:map[dnsSearch:[torque.rancher.internal fasttext-service.torque.rancher.internal rancher.internal] dns:[169.254.169.250] networkMode:managed vcpu:1 runInit:false stdinOpen:false allocatedIpAddress:<nil> dataVolumeMounts:map[] transitioningMessage:Starting transitioningProgress:84 startOnCreate:true tty:true dataVolumes:[] serviceIndex:1 internalVolumes:[] imageUuid:docker:docker.int.thomsonreuters.com:5001/tr-rd/fasttext_models hostId:20 logConfig:map[] publishAllPorts:false readOnly:false labels:map[io.rancher.project.name:torque io.rancher.project_service.name:torque/fasttext-service/fasttext-models io.rancher.stack.name:torque io.rancher.service.hash:5355140935fcb1d1e1e518b7d23ad99b9606bce9 io.rancher.service.deployment.unit:ac4d3840-bae2-4b1e-acd5-8efaa5c3f242 io.rancher.stack_service.name:torque/fasttext-service/fasttext-models io.rancher.service.launch.config:fasttext-models] privileged:false networkIds:[45]]] description:<nil> created:1.502466269e+12 name:torque-fasttext-service-fasttext-models-1 zoneId:1 memoryMb:<nil> serviceId:<nil> stackId:<nil> memoryReservation:<nil> compute:<nil> externalId:<nil> healthState:<nil> firstRunning:<nil> deploymentUnitUuid:ac4d3840-bae2-4b1e-acd5-8efaa5c3f242 healthUpdated:<nil> networkContainerId:<nil> dnsSearchInternal:torque.rancher.internal,fasttext-service.torque.rancher.internal,rancher.internal registryCredentialId:192 milliCpuReservation:<nil> removeTime:<nil> dnsInternal:169.254.169.250 domain:<nil> id:808 serviceIndexId:463 hostname:<nil> version:c0536133-3512-466a-a640-165ef839e177 userdata:<nil> uuid:58df7077-36bf-4cb9-9959-aa1d91026e4b createIndex:1 agentId:<nil> removed:<nil> startCount:0 instanceTriggeredStop:stop]] hostID:1b3ab50c-aa3f-4e0a-a5f6-8d0c2f86e0ca]] 1.502466399502e+12}
time="2017-08-11T15:46:39Z" level=info msg="Received event: Name: scheduler.reserve, Event Id: 55dabcd7-55ef-44f5-8d19-83d6e652ee8c, ResourceId : 1i808"
time="2017-08-11T15:46:39Z" level=info msg="Reserving [] for 1b3ab50c-aa3f-4e0a-a5f6-8d0c2f86e0ca. Force=false"
time="2017-08-11T15:46:39Z" level=info msg="Reply: Name: reply.2117500817702455091, PreviousIds: [55dabcd7-55ef-44f5-8d19-83d6e652ee8c], ResourceId: 1i808, Data: map[]."
time="2017-08-11T15:46:49Z" level=debug msg="No changes in metadata version"
time="2017-08-11T15:46:55Z" level=error msg="Exiting scheduler with error: <nil>"
I’ve also noticed that, on server startup, I get the following logs, which I had not seen before:
time="2017-08-11T15:25:59Z" level=info msg="Downloading key from http://localhost:8081/v1/scripts/api.crt"
time="2017-08-11T15:25:59Z" level=fatal msg="Error getting config." error="Invalid key content"
and
time="2017-08-11T15:26:00Z" level=warning msg="Couldn't load install uuid: Get http://localhost:8080/v2-beta: dial tcp [::1]:8080: getsockopt: connection refused"
time="2017-08-11T15:26:00Z" level=fatal msg="Unable to create event router" error="Get http://localhost:8080/v2-beta: dial tcp [::1]:8080: getsockopt: connection refused"
time="2017-08-11T15:26:00Z" level=fatal msg="Failed to configure cattle client: Get http://localhost:8080/v2-beta: dial tcp [::1]:8080: getsockopt: connection refused"
Any ideas why this might be happening?
Thanks,
Aidan