Persistent volumes are growing in size while idle

Hi Guys,

I’m observing some odd behaviour with 3 volumes mounted to 3 nodes running CockroachDB. Here is the thread on the CRDB forum with all the info:

What can drive this (and how to see/verify it)?

Have you checked whether the size of the path /var/lib/longhorn on the host has increased?

Doesn’t look like it is, here is the output:

$ du -ch /var/lib/longhorn/
40M /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v0.8.0
0 /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v0.8.1
0 /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0
0 /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.1
26M /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.2
26M /var/lib/longhorn/engine-binaries/rancher-mirrored-longhornio-longhorn-engine-v1.1.0
26M /var/lib/longhorn/engine-binaries/rancher-mirrored-longhornio-longhorn-engine-v1.1.1
116M /var/lib/longhorn/engine-binaries
0 /var/lib/longhorn/replicas
116M /var/lib/longhorn/
116M total

Have you been taking snapshots of these 3 volumes periodically?
Could you please check the snapshot and backup history?

No, there are no scheduled backups (there is no db schema yet) and no snapshots. Each volume is scaled back to 1 replica.

The actual size is calculated as the sum of all engine snapshots.
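As a toy illustration of that accounting (my own sketch, not Longhorn's actual code): if the actual size is the sum of the snapshot sizes plus the live head, a volume can keep growing on disk even while the filesystem inside it stays the same size.

```python
# Toy model of copy-on-write volume accounting (illustrative only,
# NOT Longhorn's implementation). Writes since the last snapshot
# accumulate in the live "head" of the volume.

def actual_size(snapshots, head):
    """Actual size = sum of all snapshot sizes + live head size."""
    return sum(snapshots) + head

# A workload that rewrites the same 1 GiB of data every day still
# grows the head, because overwritten blocks are not reclaimed:
snapshots = []
head = 0
for day in range(3):
    head += 1  # ~1 GiB of (over)writes lands in the head each day

print(actual_size(snapshots, head))  # 3 (GiB) used for ~1 GiB of live data
```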

The strange thing is that it looks like your Longhorn cluster was upgraded all the way from the oldest version, v0.8.0, up to v1.1.1:

$ du -ch /var/lib/longhorn/
40M /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v0.8.0
0 /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v0.8.1
0 /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0
0 /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.1
26M /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.2
26M /var/lib/longhorn/engine-binaries/rancher-mirrored-longhornio-longhorn-engine-v1.1.0
26M /var/lib/longhorn/engine-binaries/rancher-mirrored-longhornio-longhorn-engine-v1.1.1

Could you please check which longhorn-engine version the 3 volumes are using?
If possible, please make sure they are all upgraded to longhorn-engine v1.1.1.
After that, the old longhorn-engine binaries should be cleaned up on the host, if I remember correctly.
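One way to check the per-volume engine version is via the Longhorn volume CRDs (a sketch; the namespace and field path are assumptions, adjust them for your install and Longhorn version):

```shell
# List each Longhorn volume with the engine image it is currently using.
# Assumes Longhorn runs in the "longhorn-system" namespace and that
# .status.currentImage is populated on your version.
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,ENGINE:.status.currentImage
```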

The engine version was 1.1.0; it has now been upgraded to 1.1.1.
Will report back in a few days. Thanks!

I had a similar problem with a volume used by Prometheus. Since the volume changes constantly, the underlying Longhorn filesystem usage on the node increased steadily, while the data Prometheus actually used (i.e. its size) stayed roughly constant.

I had to enable frequent snapshots of that Longhorn volume (not necessarily backups) to trigger Longhorn's cleanup on that volume. I guess it consolidates the changes between “now” and the latest snapshot, merging changes written to the head of the volume and releasing disk space on the node's underlying filesystem.
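The consolidation idea can be sketched with a toy model (my own illustration, not Longhorn's code): if each snapshot is a set of written blocks, coalescing a snapshot into its child keeps only one copy of each overlapping block, which is why pruning snapshots can reclaim space on the node.

```python
# Toy model (NOT Longhorn's implementation): snapshots as sets of
# written block numbers. Merging a parent snapshot into its child
# deduplicates blocks that were rewritten in both.

def merge(parent, child):
    """Coalesce parent into child: overlapping blocks collapse to one copy."""
    return parent | child

snap_day1 = {0, 1, 2, 3}  # blocks written before the day-1 snapshot
snap_day2 = {1, 2, 3, 4}  # mostly the same blocks, rewritten on day 2

before = len(snap_day1) + len(snap_day2)  # block-copies kept on disk
after = len(merge(snap_day1, snap_day2))  # block-copies after coalescing
print(before, after)  # 8 5
```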


@manuel-koch Do you mean that taking frequent snapshots of a Longhorn volume reduces the amount of data on disk compared to keeping only 1 snapshot in the volume?

I guess the key is creating frequent snapshots so that Longhorn can consolidate recent changes on the volume instead of keeping many intermediate states around for a long time.
I.e. I create a snapshot once a day and keep at most 4 recent snapshots.
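For reference, a daily snapshot with a retention of 4 can be declared up front, e.g. as a recurring job in a Longhorn StorageClass. This is a hypothetical fragment for Longhorn v1.1.x; the `recurringJobs` parameter format has changed in later releases, so check the docs for your version:

```yaml
# Hypothetical StorageClass fragment (assumption: Longhorn v1.1.x syntax).
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-daily-snap
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"
  recurringJobs: '[{"name":"daily-snap", "task":"snapshot",
                    "cron":"0 3 * * *", "retain":4}]'
```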
