Soooo many practical Gluster/Convoy questions, help needed :)

Hello fellow Ranchers, I am stuck, and we’re already running production systems on Rancher that we might have to move back to “conventional” hosts if I don’t figure this out fairly soon. But I don’t want to …

So, I am trying to set up a system where the container data moves with the container when it changes hosts. But I have a couple of requirements, and questions, which I seem to be unable to answer for myself, so my last hope is that some people smarter than me are here (shameless charming attempt :wink: )

Let’s start.

  • An open Convoy ticket states that Convoy’s design is limited to one host. I just assume this is no longer the case with GlusterFS … right?

  • If I use EBS, does the storage move with the container when using Convoy? (If yes, has anyone done it yet? Just to exchange information?)

  • If I use the GlusterFS catalog item, what do I get? One volume only? Can I use it with several different container types? (Let’s say uploaded files for ownCloud, a wiki, and a ticketing system, for example.) I’m also new to Gluster, so this might be a stupid question.

  • Also, if I use the GlusterFS catalog item, can I back up the contained data? Or access it on the underlying host? And if yes, how? Has anybody done that? I still want separate backups, really, which is why I’d find EBS use charming.

  • Can Convoy snapshot GlusterFS mounts?

  • If I use an external GlusterFS cluster, how do I tell convoy about it?

  • Is there anyone who has set up a moving persistent container storage system with Rancher + “X” (whatever X is) with whom I can swap information?

So, that’s a lot, I know, but … HELP PLEASE! :wink:

Cheers,
Axel.

  • I’m not sure what you are asking in the first question, but GlusterFS is set up on 3 hosts, and the Convoy part is active on all the hosts. The limitation is one storage pool per environment.
  • The GlusterFS catalog item will create a 3-host GlusterFS cluster with replication on each host. You get only one Gluster volume, but you can put virtually all the Docker volumes you want inside it.
  • Yes, you can do backups, but I don’t know exactly how.

Actually, GlusterFS is really, really slow with Rancher, so I can’t recommend it. Better to use Convoy with NFS, for example: Convoy-gluster on specific storage hosts

Hello Seb2411, thanks for your input! Let me clarify:

  • The lead dev basically says in the ticket that Convoy does not care about storage across multiple hosts, AFAIU.

  • How? :slight_smile: Can I mount a “sub-volume”? Or a directory inside the volume? I don’t really want to put the files from several containers in the very same top-level … “directory”. (Which is what happens if I just use the volume, right?)

  • :smile: :smile: :smile: (and how do you know if you don’t know how?)

thanks a lot!

+1 for using Convoy NFS. Unless you are very familiar with optimising and operationally managing Gluster, I would stick with NFS. Whilst the Rancher catalogue provides a ‘quick start’ to get you up and running, this isn’t really an OOTB production-ready solution (nor does it claim to be).

There’s a lot more mature experience in setting up NFS; it’s pretty simple to look after, and there are plenty of comparable services on popular cloud platforms (for example, EFS on AWS is essentially a managed NFS v4 offering). If you really want your block storage to ‘follow’ your container, then you could look at something like Flocker.
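For what it’s worth, here is a minimal sketch of what consuming a Convoy NFS volume looks like once the convoy-nfs stack is running, using the plain Docker CLI; the volume name, mount path, and image are made up for illustration:

    # Hypothetical example: attach a named volume through the convoy-nfs
    # driver. Convoy creates the volume on the NFS share on first use if
    # it does not already exist.
    docker run -d \
      --volume-driver convoy-nfs \
      -v wiki-uploads:/var/www/wiki/uploads \
      my-wiki-image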

  1. Convoy in Rancher is limited to one storage pool at the present time. For most cases, there shouldn’t be any issue with this.
  2. Convoy essentially connects to storage on other servers. For example, I have a PostgreSQL database on host1, but GlusterFS is running on host2. Convoy provides the connection between them.
  3. GlusterFS provides one main volume that is used by the Convoy-GlusterFS image. This volume wraps the other volumes. I have GitLab running on my servers with gitlab-config, gitlab-data, and gitlab-logs volumes, all of which live under my GlusterFS volume. To connect to gitlab-config, I just set the volume to gitlab-config:/path/to/mount. If I want to create a new volume, I just do new-volume:/path/to/mount and set the volume driver to convoy-gluster (see the first sketch after this list).
  4. For data backup, I just connect another container to the volume and perform my sync operations via it. Often I’ll just do an s3 sync using awscli (see the second sketch after this list). If you want to connect to the underlying host, you do that as normal (in Rancher, by downloading the machine config and using the id_rsa key to SSH in). Once on the host, check the mounts to see where Gluster is mounted, and that will lead you to the data.
  5. According to the docs on Convoy, yes.
  6. You may have to do some extra configuration. I haven’t messed with it yet.
  7. I have one working… mostly. It is Convoy/GlusterFS on DigitalOcean using Rancher. It still has some hiccups (90% operator error), but it serves our needs well.
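To make point 3 concrete, here is a minimal sketch of attaching one of those per-application volumes through the convoy-gluster driver, using the GitLab example from above; the mount path is an assumption:

    # Hypothetical example based on the gitlab-config volume above.
    # Convoy resolves the named volume inside the single Gluster volume,
    # creating it on first use.
    docker run -d \
      --volume-driver convoy-gluster \
      -v gitlab-config:/etc/gitlab \
      gitlab/gitlab-ce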
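And a sketch of the backup pattern from point 4: mount the same named volume read-only in a throwaway container and sync it to S3. The image (any image whose entrypoint is the aws CLI will do), bucket name, and credentials are placeholders:

    # Hypothetical backup helper: mount the volume read-only and push its
    # contents to S3 with awscli. Bucket and credentials are placeholders.
    docker run --rm \
      --volume-driver convoy-gluster \
      -v gitlab-data:/backup:ro \
      -e AWS_ACCESS_KEY_ID=... \
      -e AWS_SECRET_ACCESS_KEY=... \
      mesosphere/aws-cli \
      s3 sync /backup s3://my-backup-bucket/gitlab-data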

Distributed storage is always a fun problem. You will tear your hair out. Test it out first before you lose data in production. And if you decide to test in production, may the high availability gods have mercy upon you.


Awesome, thanks for this long answer. I’ll try a couple of things out with Gluster now and see what happens. I might come back to you with a couple more questions, or a personal progress report :wink: .

To correct myself from earlier: your backups should be done by mounting the volumes, probably via Rancher or Docker. If you go straight to the host and back up the brick while GlusterFS is in distributed mode, you are going to have a bad time. I just finished having this bad time.

:smiley: awesome. I was wondering myself, but still didn’t get around to trying, thanks for the heads-up! (But I have automatic routing sorted out now - I emulated the OpenShift router in Rancher :wink: )

How can you ensure that the PostgreSQL storage is in a coherent state and that all the data is flushed to disk when you back it up via snapshotting?

You shouldn’t be doing file-level snapshotting of your PostgreSQL databases. If you wish to snapshot PostgreSQL at the file level, you must shut it down first before performing the operation. This is covered in the manual at https://www.postgresql.org/docs/9.5/static/backup-file.html. Instead, backups should be done via an SQL dump, continuous archiving, or just running a hot standby.
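For example, a logical backup with pg_dump is consistent even while PostgreSQL is running; the container name, user, and database below are placeholders:

    # Hypothetical example: pg_dump produces a consistent snapshot of one
    # database without stopping the server, unlike a file-level copy of a
    # live data directory.
    docker exec my-postgres pg_dump -U postgres mydb > mydb-backup.sql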

As far as I’m aware, PostgreSQL writes the data in a few steps. First, changes are buffered in memory. At commit time, the change records are flushed to the write-ahead log (WAL) on disk; later, a checkpoint writes the buffered pages to the database files, at which point the corresponding WAL can be recycled. Across a GlusterFS cluster, this poses no issue as the cluster writes to bricks immediately and confirms the writes.