Home lab environment (privately used) - overkill hardware, but it serves a purpose

Background info on myself and the environment that drives this thread.
I am a contractor working in cloud computing, mainly with Microsoft-based technology. I have always run some form of SuSE Linux (starting with 7.x 32-bit) for my home needs, where I have virtualized or used WINE to cover my Windows needs. My home lab is where I have been able to explore and expand my IT skills. Now, all these years later, with a son (starting college) who is following in my footsteps, I find myself rebuilding my lab environment to meet today's enterprise needs.

The physical hardware at our home includes the following:
Main household infra
  2x Dell R710 2U (Xen)
    dual quad-core processors @ 2.4 GHz
    16 GB RAM
    3x 300 GB 15k SAS drives
    1x 800 GB SAS SSD
    4x 1 GbE ports
    2x 10 Gb SFP ports
  1x Dell R610 1U (firewall/router)
    dual quad-core processors @ 2.2 GHz
    16 GB RAM
    2x 300 GB 10k SAS drives
    4x 1 GbE ports
  1x Dell T610 5U (storage)
    single quad-core processor @ 2.2 GHz
    12 GB RAM
    2x 250 GB SATA OS drives
    4x 2 TB SATA data drives
    2x 1 GbE ports
    2x 10 Gb SFP ports
  1x Netgear Layer 3 48-port 1G switch with 4 SFP ports

My son's setup in his room
  1x Dell R300 1U (firewall)
    single dual-core processor @ 2.4 GHz
    8 GB RAM
    2x 250 GB SATA drives
    2x 1 GbE ports
  1x MicroSystem 5U (Xen and storage)
    dual dual-core processors @ 1.8 GHz
    32 GB RAM
    8x 2 TB SATA drives
    2x 1 GbE ports
  1x Cisco Layer 3 48-port 1G switch with 4 SFP ports

1G Fiber between Household and Son’s switches

The VM network services include:
LDAP (I had this working back in the SLES 10 days but have never managed to set it up since; I need to figure this out again)
DNS
DHCP
PXE
NFS
Proxy
Nagios
Plex Media Server (this is the performance-taxing VM)

I expect my son's environment to be a bit more volatile as he goes through the learning process, breaking things and rebuilding. Thus this thread is about getting my environment set up and reliable, both as an example for him to follow and as something he can use when his is down.

Though I know it's not necessary, I create single-purpose VMs, mainly so that if I am performing some form of update and/or change, I only risk one service (e.g. DNS) and not the entire home environment. Again, this is way overkill, I know, but it's about concepts, not practicality. My firewall alone could host this entire environment without VMs, but that loses the essence of what I am trying to set up. As I go through setting up LDAP again and try integrating DNS into LDAP, if I make an error and need to rebuild, I am only touching the LDAP VM and maybe the DNS VM. Or, more likely, applying the last good snapshot and trying again (see the example below).
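For example, with LVM-backed disks (which I describe further down), rolling back a botched LDAP change might look something like this (volume names and sizes are made up):

[CODE]
# snapshot the LDAP VM's disk before a risky change
lvcreate -s -L 5G -n ldap01-pre-change /dev/vg_vms/ldap01-disk

# if the change goes wrong: shut the VM down, then merge the snapshot back
lvconvert --merge /dev/vg_vms/ldap01-pre-change
[/CODE]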

What I am looking for is failover between the Xen servers, to try to remove single points of failure. So as my son (or I) learn new skill sets that could harm the stability of a physical Xen server, we can migrate all the VMs to the other Xen server, rebuild the physical node, and then redistribute the VMs back across the two nodes. Not only do these two Xen servers host the household infrastructure (including DNS, DHCP, PXE, proxy, and Nagios), they will also be used for my son's home school lab environment, where he can continue his school studies at home.

  1. What is the Need?
    I need a virtualization environment with failover capability. Live VM migration is required; automatic failover would be nice, but is not needed.

  2. What situation am I trying to protect against? Disk failure, node failure, or both?
    I am looking to protect myself from both disk and node failure (more on this later).

  3. a. How many resources can you spend?
    Financially, I am tapped out. As much as I would love to buy two small SAS storage arrays, that's not in the budget. In terms of server resources, I can afford to spend a lot, as what I have is 100 times overkill for my needs.
    b. Would it be possible to split up storage and compute?
    The only way I see to split storage and compute is with two SAS storage arrays, and as said above, that's not in the budget at this time.
    c. Will you need to expand to three or more Xen nodes in the foreseeable future?
    There is no need to add a third Xen server, as two are already overkill for my computational needs; but two servers do meet my uptime requirements.

  4. How do you want to distribute your compute load?
    I would like to run the “Xen part” in active/active mode (that is running VMs on both nodes simultaneously).

  5. Plans for failover? Manual or automatic?
    In most cases I have no need for automatic failover. It would be a nice addition, but it is not needed.

  6. What am I looking for in this thread?
    I am starting this thread to open a discussion on how this technology can or should be used, as well as the difference between how it “should” be done versus how it “can” be done to meet non-mission-critical needs.

When I was running SLES 11 SP2, I had DRBD set up between two servers, which allowed me to manually migrate a VM between the nodes with no problems. DRBD was set up in active/active mode. This worked great and for the most part was reliable; I only had to recover from a split-brain situation once or twice, which went pretty smoothly. The difference now is that I want to make use of LVM. In the past, the VMs' hard disks were raw image files. This time, with LVM, I want to give the VG to Xen (I believe that's the correct term) and create logical volumes for each VM to use as hard disks. When I made the move to SLES 12 SP3, I tried this standalone and really liked the performance gain the VMs showed, which is why I am trying to run my media server needs as VMs.
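In rough terms, the layering I have in mind looks like this (device and volume names are just examples):

[CODE]
# the replicated DRBD device becomes the physical volume of a dedicated VG
pvcreate /dev/drbd0
vgcreate vg_vms /dev/drbd0

# one logical volume per VM, handed to the domU as its disk
lvcreate -L 40G -n dns01-disk vg_vms

# the domU's xl config then references the LV directly:
#   disk = [ 'phy:/dev/vg_vms/dns01-disk,xvda,w' ]
[/CODE]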

Some of the documentation I am trying to follow includes…
https://docs.linbit.com/docs/users-guide-8.4/#ch-lvm

One of the issues I run into is that when I try to work with these services on their own, SLES 12 gets upset and tells me I am missing the cluster roles it wants for managing the DRBD environment. For example, when enabling DRBD in Services, SLES 12 gets cranky and tells you that you need to install another service to manage DRBD. SLES 11 didn't care: you could set up DRBD, turn it on, and it worked.
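What I would prefer is to keep running DRBD standalone, roughly like this, without pulling in the whole cluster stack (the resource name r0 is just an example):

[CODE]
# initialise and bring up the resource on both nodes
drbdadm create-md r0
drbdadm up r0

# on the node that holds the initial data
drbdadm primary --force r0

# start the resources at boot via systemd instead of a cluster resource agent
systemctl enable drbd.service
[/CODE]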

Like I said, it's way overkill for our computational needs, but it serves a purpose in teaching tech skills to my son and myself. And this is just the start; let's see where this thread goes.

I know this is a novel.
Thanks for your interest and help,
John

Johnfm3 wrote:
[color=blue]

What I am looking for is failover between xen servers and trying to
remove single point of failure.[/color]

Hi John,

You’ve covered quite a bit and I expect you’ll get a number of opinions.

I’ll likely provide more feedback as time permits but, for now, I’d
like to address this one point.

These are two articles that express my point of view. You may find them
interesting.

THE INVERTED PYRAMID OF DOOM
http://www.smbitjournal.com/2013/06/the-inverted-pyramid-of-doom/

MAKING THE BEST OF YOUR INVERTED PYRAMID OF DOOM
http://www.smbitjournal.com/2015/10/making-the-best-of-your-inverted-pyramid-of-doom/


Kevin Boyle - Knowledge Partner
If you find this post helpful and are logged into the web interface,
please show your appreciation and click on the star below this post.
Thank you.

More documentation I am trying to follow…
https://www.suse.com/documentation/sle-ha-12/singlehtml/book_sleha_techguides/book_sleha_techguides.html

Again, it seems to instruct me to follow tasks which are not needed.

[QUOTE=KBOYLE;50731]
THE INVERTED PYRAMID OF DOOM
http://www.smbitjournal.com/2013/06/the-inverted-pyramid-of-doom/

MAKING THE BEST OF YOUR INVERTED PYRAMID OF DOOM
http://www.smbitjournal.com/2015/10/making-the-best-of-your-inverted-pyramid-of-doom/


[/QUOTE]

The architecture in your documentation is exactly what I am trying to avoid. As much as I would love to have two 60-drive SAS storage arrays, each connected to two storage nodes (becoming a huge mess, or mesh (lol), storage cluster) providing storage for all the Xen servers, it's way out of budget. And there are no small-drive-count SAS storage arrays; small meaning four drives per array, which would be more than enough to meet my needs.

Because the financial cost of doing it the right way is out of my range, my solution to this situation is as follows (assuming it holds up to scrutiny in this thread)…

Two Xen servers.

Each server has a mirrored pair of drives for the OS and a single drive for VM data storage. I'm unsure how I will use the SSD at this time; I'll compare its performance against the 15k SAS drives first.

The single data drive in each Xen server is replicated to the other via DRBD over the 10 Gb network (see the sketch after this list).

LVM is laid over the DRBD resource and given to Xen as a storage pool.

Xen can then carve out the needed logical volumes for the VMs to access directly.
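A rough sketch of the DRBD resource for that single data drive (host names, disk device and addresses are placeholders):

[CODE]
# /etc/drbd.d/r0.res
resource r0 {
    device    /dev/drbd0;
    disk      /dev/sdb;          # the single VM data drive in each R710
    meta-disk internal;
    net { protocol C; }

    on xen1 { address 10.10.10.1:7789; }
    on xen2 { address 10.10.10.2:7789; }
}
[/CODE]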

The reason for the 5U server is to store all my ISOs and install media and to act as the physical-layer PXE server used to build out all the physical machines. Both ports on its 10 Gb NIC have been bridged, and each port is connected to one of the Xen servers for DRBD replication. I will have a VM act as the build server for all VMs, holding a subset of the ISOs the 5U has.
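For reference, the bridge on the 5U box is just a standard SLES bridge config, along these lines (interface names and address are placeholders):

[CODE]
# /etc/sysconfig/network/ifcfg-br10 on the T610
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='10.10.10.5/24'
BRIDGE='yes'
BRIDGE_PORTS='eth4 eth5'
BRIDGE_STP='off'
BRIDGE_FORWARDDELAY='0'
[/CODE]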

I believe the R710s have PCIe 2.0 slots, capable of roughly 4 GB/s in an x8 slot. I will never make full use of the 10 Gb capability, but I believe it should suffice for my small environment.
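Rough numbers behind that (assuming the 10 Gb cards sit in x8 slots):

[CODE]
# PCIe 2.0: 5 GT/s per lane, ~500 MB/s usable per lane after 8b/10b encoding
#   x8 slot: 8 * 500 MB/s ≈ 4 GB/s
# 10 GbE:    10 Gbit/s / 8 ≈ 1.25 GB/s  -> well within what the slot can carry
[/CODE]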

In this situation, since storage is replicated, if I lose a single Xen host I should be able to bring up the “down” VMs on the remaining host, for example while updating firmware on a server node.

Johnfm3 wrote:
[color=blue]

Thanks for your interest and help[/color]

Hi John,

Have you made any progress with your quest?

I’m a bit surprised that others haven’t chimed in.

What you are attempting is what I would like to implement. At the
moment, though, I don’t have any spare equipment to “play” with.


Kevin Boyle - Knowledge Partner
If you find this post helpful and are logged into the web interface,
please show your appreciation and click on the star below this post.
Thank you.

[QUOTE=KBOYLE;50875]Johnfm3 wrote:
[color=blue]

Thanks for your interest and help[/color]

Hi John,

Have you made any progress with your quest?

[/QUOTE]

Due to my working hours, my free time is on the weekend, while most of my research gets done during the week. This weekend is taken up working on my Jeep, which my son needs to drive to college while his car is being repaired.

Last weekend I had a situation that demonstrates why I need this. I have two APC 600 battery backups and have split my infrastructure hardware between them. For an unknown reason, one of them failed after a new battery replacement, so I had to talk my son through starting up the VMs on the remaining Xen machine over the phone. That in itself is not a problem, but keep in mind issues such as the DHCP database not being synced: when clients come back for an IP renewal, the running DHCP server VM has no record of the previously issued leases. The same goes for the DNS records. Luckily, I am not set up with dynamic DNS yet.
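If I ever want the two DHCP VMs to keep their lease databases in sync, ISC dhcpd's failover peering is probably the route; roughly something like this, with made-up addresses:

[CODE]
# dhcpd.conf on the primary (the secondary uses a matching "secondary" declaration)
failover peer "home-dhcp" {
    primary;
    address 192.168.1.11;
    port 647;
    peer address 192.168.1.12;
    peer port 647;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 3600;
    split 128;
    load balance max seconds 3;
}

subnet 192.168.1.0 netmask 255.255.255.0 {
    pool {
        failover peer "home-dhcp";
        range 192.168.1.100 192.168.1.200;
    }
}
[/CODE]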

Thanks,
John

Hi John,

[QUOTE=Johnfm3;50733][…]
The Single Drive in each XEN server are replicated between each other via DRBD over 10gb network

LVM is laid over DRBD resource and given to XEN as a storage pool

XEN can carve out the needed partitions for the VMs to have direct access too.
[…]
[/QUOTE]

please keep in mind that LVM (the one without the “c” in its name :wink: ) is not cluster-aware, and hence changes made to the LVM configuration on one node won't be picked up by the other, despite the DRBD replication of the underlying disk.

From experience with a (partly) similar stack, I'd recommend creating LVs (one per VM) on each node and syncing each pair via an explicit DRBD resource (giving you a DRBD resource per VM). That way, you might even get away with an active/passive setup per DRBD resource, toggling “active” manually when switching the VM over to the other node.
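A per-VM resource might look roughly like this (names and addresses are, of course, placeholders):

[CODE]
# /etc/drbd.d/dns01.res -- one resource per VM, backed by a local LV on each node
resource dns01 {
    device    /dev/drbd10;
    disk      /dev/vg_local/dns01;
    meta-disk internal;
    net { protocol C; }

    on xen1 { address 10.10.10.1:7790; }
    on xen2 { address 10.10.10.2:7790; }
}
[/CODE]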

[QUOTE=Johnfm3;50733]Unsure how I will use the SSD at this time.[/QUOTE]

I had been using bcache for quite some time, with pretty good results. The “stack” was:

  • multiple HDDs in an MD-RAID6 configuration per node, as the “backing store”

  • two SSDs in an MD-RAID1 configuration per node, as the caching tier

  • bcache using these two MD devices to create a “fast” bcache0 device

  • LVM (with bcache0 as the only PV) per node

  • one LV per virtual machine, created on both Xen nodes

  • a DRBD resource per virtual machine, to keep its LV pair in sync

Of course, you can substitute a single device for the MD-RAID. By introducing bcache, all those small writes caused by even idle VMs were sent to the SSD cache first and hence completed quickly, giving a responsive user experience.
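In commands, that stack looked roughly like this (device names from memory; adjust to your hardware):

[CODE]
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[b-e]   # HDDs: backing store
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sd[f-g]   # SSDs: caching tier
make-bcache -B /dev/md0 -C /dev/md1                               # creates bcache0 with the cache attached
pvcreate /dev/bcache0
vgcreate vg_local /dev/bcache0
lvcreate -L 40G -n dns01 vg_local                                 # one LV per VM, repeated on both nodes
[/CODE]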

Unless you introduce cluster-aware file systems, you need to stick to “active/passive” setups, making sure you use each VM's LV on only a single node at a time; otherwise you'll wreck your data. I strongly suggest looking into the locking mechanisms provided by Xen.
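The manual switchover per VM then comes down to a handful of commands (resource and domain names are just examples):

[CODE]
# on the node currently running the VM
xl shutdown dns01
drbdadm secondary dns01

# on the node taking over
drbdadm primary dns01
xl create /etc/xen/dns01.cfg
[/CODE]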

Regards,
J