I’m currently experiencing an issue where the Longhorn Manager is unable to connect to the Instance Manager on the same node, or at the very least suffers transient failures when connecting. Here’s an example of the logs I’m seeing in the Longhorn Manager:
longhorn-manager-q4x8p longhorn-manager W1215 03:35:22.010195 1 logging.go:59] [core] [Channel #197 SubChannel #198] grpc: addrConn.createTransport failed to connect to {
longhorn-manager-q4x8p longhorn-manager "Addr": "10.42.10.18:8502",
longhorn-manager-q4x8p longhorn-manager "ServerName": "10.42.10.18:8502",
longhorn-manager-q4x8p longhorn-manager "Attributes": null,
longhorn-manager-q4x8p longhorn-manager "BalancerAttributes": null,
longhorn-manager-q4x8p longhorn-manager "Type": 0,
longhorn-manager-q4x8p longhorn-manager "Metadata": null
longhorn-manager-q4x8p longhorn-manager }. Err: connection error: desc = "transport: Error while dialing: dial tcp 10.42.10.18:8502: operation was canceled"
Because of this, Longhorn appears to be unstable: volumes mounted into Pods periodically experience an I/O error, which causes the Pod to restart.
I’ve deployed Longhorn 1.5.3 into a Rancher-managed RKE v1.27.6 cluster using Ubuntu 22.04 nodes. I’ve also reconfigured the taints and tolerations, as I only want one set of nodes’ disks to be used, and I want the UI / driver on the tools nodes. I’ve used the following values.yaml for this deployment:
defaultSettings:
  createDefaultDiskLabeledNodes: true
  taintToleration: "type=tools:NoSchedule; katonic.ai/node-pool:NoSchedule"
longhornManager:
  tolerations:
    - key: "type"
      operator: "Equal"
      value: "tools"
      effect: "NoSchedule"
    - key: "katonic.ai/node-pool"
      operator: "Exists"
      effect: "NoSchedule"
longhornDriver:
  tolerations:
    - key: "type"
      operator: "Equal"
      value: "tools"
      effect: "NoSchedule"
  nodeSelector:
    type: "tools"
longhornUI:
  tolerations:
    - key: "type"
      operator: "Equal"
      value: "tools"
      effect: "NoSchedule"
  nodeSelector:
    type: "tools"
I’ve also created two debug Pods on the same node and confirmed, using nc, that the two Pods can communicate with each other.
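Roughly what that check looked like (the pod IP and port are illustrative; both Pods were pinned to the affected node):

# Pod A: listen on an arbitrary port
nc -l -p 8502

# Pod B (same node): connect to Pod A's IP; typed lines arrive in Pod A
nc 10.42.10.19 8502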
Any help on this would be greatly appreciated.
Des