Up until yesterday I was able to deploy stacks both through rancher-compose and through the UI (by uploading my docker-compose.yml and rancher-compose.yml configurations).
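For reference, the two files I upload look roughly like this (a minimal, illustrative sketch; the real service names, images, and scale are different):

docker-compose.yml:

version: '2'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"

rancher-compose.yml:

version: '2'
services:
  web:
    scale: 1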
I am running rancher server v1.6.6 on the same machine as my single host (which runs docker v17.03.1-ce). Shortly after running rancher-compose up -d, I observe the following logs for the rancher server (through docker logs; the exact commands I use are sketched after this excerpt):
time="2017-08-11T15:32:38Z" level=info msg="Stack Create Event Received" eventId=5834314d-73fe-42e1-a2b9-f0df5f53965b resourceId=1st104
time="2017-08-11T15:32:38Z" level=info msg="Stack Create Event Done" eventId=5834314d-73fe-42e1-a2b9-f0df5f53965b resourceId=1st104
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007fda911ea3de, pid=9, tid=140575876556544
#
# JRE version: OpenJDK Runtime Environment (8.0_72-b15) (build 1.8.0_72-b15)
# Java VM: OpenJDK 64-Bit Server VM (25.72-b15 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x8d33de] PerfLongVariant::sample()+0x1e
#
# Core dump written. Default location: /var/lib/cattle/core or core.9
#
# An error report file with more information is saved as:
# /var/lib/cattle/hs_err_pid9.log
#
# If you would like to submit a bug report, please visit:
# http://www.azulsystems.com/support/
#
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60908->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60906->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60888->[::1]:8081: read: connection reset by peer
time="2017-08-11T15:33:27Z" level=fatal msg="Exiting go-machine-service: Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [] from [http://localhost:8080/v2-beta]" service=gms
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60892->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60890->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60880->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60866->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60850->[::1]:8081: read: connection reset by peer
time="2017-08-11T15:33:27Z" level=fatal msg="Unable to create event router" error="Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [] from [http://localhost:8080/v2-beta]"
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60840->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: read tcp [::1]:60790->[::1]:8081: read: connection reset by peer
2017/08/11 15:33:27 http: proxy error: dial tcp [::1]:8081: getsockopt: connection refused
time="2017-08-11T15:33:27Z" level=info msg="Shutting down backend e2830dda-9ef9-4e5a-695e-09a9d73ebeb5. Connection closed because: websocket: close 1006 unexpected EOF."
time="2017-08-11T15:33:27Z" level=info msg="Removed backend. Key: e2830dda-9ef9-4e5a-695e-09a9d73ebeb5. Session ID c4c6fd5a-1f29-48e5-a0a7-f7145c603f71 ."
s6-svscanctl: fatal: unable to control /service: supervisor not listening
time="2017-08-11T15:33:27Z" level=fatal msg="Parent process went away. Shutting down."
I also notice that the scheduler (v0.8.2) crashes with the following logs:
time="2017-08-11T15:46:39Z" level=debug msg="Processing event." event={scheduler.reserve 55dabcd7-55ef-44f5-8d19-83d6e652ee8c reply.2117500817702455091 1i808 instance map[schedulerRequest:map[resourceRequests:[map[resource:instanceReservation amount:1 type:computePool]] phase:instance.start context:[map[state:starting accountId:143 imageId:809 token:GZiTQTzSr8N4qo1VLf9KABZ7aQimHieMu4FAFPZY2OhiltfXAmVUIwg5bMjR3IKfRq8DC5OE6gEXZX1NPCLsQ nativeContainer:false allocationState:active system:false kind:container data:map[fields:map[dnsSearch:[torque.rancher.internal fasttext-service.torque.rancher.internal rancher.internal] dns:[169.254.169.250] networkMode:managed vcpu:1 runInit:false stdinOpen:false allocatedIpAddress:<nil> dataVolumeMounts:map[] transitioningMessage:Starting transitioningProgress:84 startOnCreate:true tty:true dataVolumes:[] serviceIndex:1 internalVolumes:[] imageUuid:docker:docker.int.thomsonreuters.com:5001/tr-rd/fasttext_models hostId:20 logConfig:map[] publishAllPorts:false readOnly:false labels:map[io.rancher.project.name:torque io.rancher.project_service.name:torque/fasttext-service/fasttext-models io.rancher.stack.name:torque io.rancher.service.hash:5355140935fcb1d1e1e518b7d23ad99b9606bce9 io.rancher.service.deployment.unit:ac4d3840-bae2-4b1e-acd5-8efaa5c3f242 io.rancher.stack_service.name:torque/fasttext-service/fasttext-models io.rancher.service.launch.config:fasttext-models] privileged:false networkIds:[45]]] description:<nil> created:1.502466269e+12 name:torque-fasttext-service-fasttext-models-1 zoneId:1 memoryMb:<nil> serviceId:<nil> stackId:<nil> memoryReservation:<nil> compute:<nil> externalId:<nil> healthState:<nil> firstRunning:<nil> deploymentUnitUuid:ac4d3840-bae2-4b1e-acd5-8efaa5c3f242 healthUpdated:<nil> networkContainerId:<nil> dnsSearchInternal:torque.rancher.internal,fasttext-service.torque.rancher.internal,rancher.internal registryCredentialId:192 milliCpuReservation:<nil> removeTime:<nil> dnsInternal:169.254.169.250 domain:<nil> id:808 serviceIndexId:463 hostname:<nil> version:c0536133-3512-466a-a640-165ef839e177 userdata:<nil> uuid:58df7077-36bf-4cb9-9959-aa1d91026e4b createIndex:1 agentId:<nil> removed:<nil> startCount:0 instanceTriggeredStop:stop]] hostID:1b3ab50c-aa3f-4e0a-a5f6-8d0c2f86e0ca]] 1.502466399502e+12}
time="2017-08-11T15:46:39Z" level=info msg="Received event: Name: scheduler.reserve, Event Id: 55dabcd7-55ef-44f5-8d19-83d6e652ee8c, ResourceId : 1i808"
time="2017-08-11T15:46:39Z" level=info msg="Reserving [] for 1b3ab50c-aa3f-4e0a-a5f6-8d0c2f86e0ca. Force=false"
time="2017-08-11T15:46:39Z" level=info msg="Reply: Name: reply.2117500817702455091, PreviousIds: [55dabcd7-55ef-44f5-8d19-83d6e652ee8c], ResourceId: 1i808, Data: map[]."
time="2017-08-11T15:46:49Z" level=debug msg="No changes in metadata version"
time="2017-08-11T15:46:55Z" level=error msg="Exiting scheduler with error: <nil>"
I’ve also noticed that, on server startup, I get the following logs, which I had not seen before:
time="2017-08-11T15:25:59Z" level=info msg="Downloading key from http://localhost:8081/v1/scripts/api.crt"
time="2017-08-11T15:25:59Z" level=fatal msg="Error getting config." error="Invalid key content"
and
time="2017-08-11T15:26:00Z" level=warning msg="Couldn't load install uuid: Get http://localhost:8080/v2-beta: dial tcp [::1]:8080: getsockopt: connection refused"
time="2017-08-11T15:26:00Z" level=fatal msg="Unable to create event router" error="Get http://localhost:8080/v2-beta: dial tcp [::1]:8080: getsockopt: connection refused"
time="2017-08-11T15:26:00Z" level=fatal msg="Failed to configure cattle client: Get http://localhost:8080/v2-beta: dial tcp [::1]:8080: getsockopt: connection refused"
Any ideas why this might be happening?
Thanks,
Aidan