Cluster

Hi,

One of my customers suffered a power outage when a branch knocked down a
power line and blew up a transformer. The surge took out lots of stuff.

Their server lost 2 hard drives and the RAID controller. I’m blaming it
on the branch, even though I don’t know for certain what actually took them out.

They want a 3 node cluster to improve redundancy. This is a hardware
question. They don’t want a single point of failure (RAID controller).
I believe the easy answer is 3 individual computers. Are there other
configurations that will satisfy their need? Blade maybe?

They said they lost almost $1,000.00 an hour. The server went down
Tuesday and was finally functional Friday. They are a 24/7 operation.
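Doing the math, Tuesday to Friday is roughly three days, call it 72 hours, so at close to $1,000.00 an hour that one outage cost them somewhere around $70,000.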

Thanks

On 19.03.2012 23:29, Bob Crandell wrote:[color=blue]

Hi,

One of my customers suffered a power outage when a branch knocked down a
power line and blew up a transformer. The surge took out lots of stuff.

Their server lost 2 hard drives and the RAID controller. I’m blaming it
on the branch, even though I don’t know for certain what actually took them out.

They want a 3 node cluster to improve redundancy. This is a hardware
question. They don’t want a single point of failure (RAID controller).[/color]

There is no such thing as having no single point of failure. In a cluster, no
matter how many nodes, it’s the shared storage. Of course, that can be
“mirrored” too, but there is still some SPOF somewhere. Next time,
whatever Murphy finds will corrupt the data, and that corruption will mirror
to all online copies.
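
To put it in toy code terms: a mirror faithfully replicates whatever it is
given, good or bad. The sketch below is only an illustration (plain Python,
nothing Novell-specific), not any particular product:

[code]
# Toy Python sketch: a synchronous mirror writes the same data to every
# online copy - including a corrupted write from a failing controller.
replica_a = {}
replica_b = {}

def mirrored_write(block_id, data):
    """Write identically to all online copies, exactly as a mirror does."""
    for replica in (replica_a, replica_b):
        replica[block_id] = data

mirrored_write(7, b"good data")
mirrored_write(7, b"garbage from a dying RAID controller")

# Both copies now hold the corrupted block; mirroring offered no protection.
assert replica_a[7] == replica_b[7]
[/code]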
[color=blue]

They said they lost almost $1,000.00 an hour. The server went down
Tuesday and was finally functional Friday. They are a 24/7 operation.[/color]

They don’t have a single point of failure problem, but a massive
disaster recovery problem. Apparently, they have had no (working) DR
plan. Three days+ qualifies as “not working”.

CU,

Massimo Rosen
Novell Knowledge Partner
No emails please!
http://www.cfc-it.de

On Mon, 19 Mar 2012 22:56:44 +0000, Massimo Rosen wrote:
[color=blue]

On 19.03.2012 23:29, Bob Crandell wrote:[color=green]

Hi,

One of my customers suffered a power outage when a branch knocked down
a power line and blew up a transformer. The surge took out lots of
stuff.

Their server lost 2 hard drives and the RAID controller. I’m blaming
it on the branch, even though I don’t know for certain what actually took
them out.

They want a 3 node cluster to improve redundancy. This is a hardware
question. They don’t want a single point of failure (RAID controller).[/color]

There is no such thing as having no single point of failure. In a cluster, no
matter how many nodes, it’s the shared storage. Of course, that can be
“mirrored” too, but there is still some SPOF somewhere. Next time,
whatever Murphy finds will corrupt the data, and that corruption will mirror
to all online copies.[/color]
So a cluster consists of 2 or more computers and shared storage?
(Teaching moment.) I thought it could be done that way, or with each node
handling its own copy of the data. Well, that changes things.
[color=blue]
[color=green]

They said they lost almost $1,000.00 an hour. The server went down
Tuesday and was finally functional Friday. They are a 24/7 operation.[/color]

They don’t have a single point of failure problem, but a massive
disaster recovery problem. Apparently, they have had no (working) DR
plan. Three days+ qualifies as “not working”.[/color]
This is true.
[color=blue]

CU,[/color]

So if we were to start over from the beginning, it would be better to
build a server, clustered or not, take a snapshot once every 6 months
to a year, and keep replacement parts on hand in case of branches.
Yes? No?

On 20/03/12 00:44, Bob Crandell wrote:[color=blue]

On Mon, 19 Mar 2012 22:56:44 +0000, Massimo Rosen wrote:
[color=green]

On 19.03.2012 23:29, Bob Crandell wrote:[color=darkred]

Hi,

One of my customers suffered a power outage when a branch knocked down
a power line and blew up a transformer. The surge took out lots of
stuff.

Their server lost 2 hard drives and the RAID controller. I’m blaming
it on the branch, even though I don’t know for certain what actually took
them out.

They want a 3 node cluster to improve redundancy. This is a hardware
question. They don’t want a single point of failure (RAID controller).[/color]

There is no such thing as having no single point of failure. In a cluster, no
matter how many nodes, it’s the shared storage. Of course, that can be
“mirrored” too, but there is still some SPOF somewhere. Next time,
whatever Murphy finds will corrupt the data, and that corruption will mirror
to all online copies.[/color]
So a cluster consists of 2 or more computers and shared storage?
(Teaching moment.) I thought it could be done that way, or with each node
handling its own copy of the data. Well, that changes things.
[color=green]
[color=darkred]

They said they lost almost $1,000.00 an hour. The server went down
Tuesday and was finally functional Friday. They are a 24/7 operation.[/color]

They don’t have a single point of failure problem, but a massive
disaster recovery problem. Apparently, they have had no (working) DR
plan. Three days+ qualifies as “not working”.[/color]
This is true.
[color=green]

CU,[/color]

So if we were to start over from the beginning, it would be better to
build a server, clustered or not, take a snapshot once every 6 months
to a year, and keep replacement parts on hand in case of branches.
Yes? No?
[/color]

I would go for a solution where you have 2 clusters with data mirrored
at SAN level, or log shipping if it is a DB.
Each of the clusters should be in a different rack with a separate power
supply if they are in the same datacenter; a different datacenter, if
possible, would be better.
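
For the log-shipping option, the idea is simply that the primary keeps
copying closed log segments across to the standby as they are produced.
A rough sketch, with invented paths (a real DB engine has its own archiving
hooks for this):

[code]
# Minimal log-shipping sketch (Python): copy any closed log segment that has
# not been shipped yet from the primary's log directory to the standby's inbox.
# Both paths are hypothetical examples.
import shutil
from pathlib import Path

PRIMARY_LOGS = Path("/var/db/primary/logs")     # hypothetical
STANDBY_INBOX = Path("/mnt/standby/log_inbox")  # hypothetical, e.g. on the DR SAN

def ship_new_segments():
    for segment in sorted(PRIMARY_LOGS.glob("*.log")):
        target = STANDBY_INBOX / segment.name
        if not target.exists():
            shutil.copy2(segment, target)  # ship it, but do not apply it yet
            print("shipped", segment.name)

if __name__ == "__main__":
    ship_new_segments()  # e.g. run from cron every few minutes
[/code]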

You should have two different network paths to the different racks/
datacenters to allow for carrier/switch/power failure.

I agree that they need to have a DR plan worked out.

Without more info it is difficult to make a call.

Lance

On Tue, 20 Mar 2012 10:25:40 +0000, Lance Haig wrote:
[color=blue]

On 20/03/12 00:44, Bob Crandell wrote:[color=green]

On Mon, 19 Mar 2012 22:56:44 +0000, Massimo Rosen wrote:
[color=darkred]

On 19.03.2012 23:29, Bob Crandell wrote:

Hi,[/color][/color]
SNIP <[color=green]

So if we were to start over from the beginning, it would be better
to build a server, clustered or not, take a snapshot once every 6
months to a year, and keep replacement parts on hand in case of
branches. Yes? No?

[/color]
I would go for a solution where you have 2 clusters with data mirrored
at SAN level, or log shipping if it is a DB. Each of the clusters
should be in a different rack with a separate power supply if they are in
the same datacenter; a different datacenter, if possible, would be better.

You should have two different network paths to the different racks/
datacenters to allow for carrier/switch/power failure.

I agree that they need to have a DR plan worked out.

Without more info it is difficult to make a call.

Lance[/color]

At least I have a better understanding of what I think I know. Now I get
to see how much they really want to spend.

Thanks

Hi.

On 20.03.2012 11:25, Lance Haig wrote:[color=blue]

I would go for a solution where you have 2 clusters with data mirrored
at SAN level, or log shipping if it is a DB.[/color]

That is still a SPOF (several, in fact!). If the data gets corrupted for
whatever reason (a broken RAID controller, an OS running wild, user error),
the data is still lost on both sides.

CU,

Massimo Rosen
Novell Knowledge Partner
No emails please!
http://www.cfc-it.de

[color=blue]They said they lost almost $1,000.00 an hour. The server went down
Tuesday and was finally functional Friday. They are a 24/7 operation.[/color]

Everyone cries the blues when the system goes down, but if you hand them
a bill for the services and hardware required to prevent it, they
usually quiet down a good bit and decide it was more ‘darned inconvenient’
than some catastrophic loss.

Bob Crandell wrote:
[color=blue]

So if we were to start over from the beginning, it would be
better to build a server, clustered or not, take a snapshot once
every 6 months to a year, and keep replacement parts on hand in case
of branches. Yes? No?[/color]

It really depends on:
1). How quickly they need to be up and running again
2). How much data they are prepared to lose
3). How much they are willing to spend

With virtualisation you can build some pretty reasonable DR
solutions inexpensively to reduce the risk.

I’m just wrapping up a DR project and was frankly stunned at what we
could achieve for the money we spent.
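
As a very rough way of putting numbers on those three points (every figure
below is a made-up example, plug in your own), the snapshot interval and the
restore time translate into money fairly directly:

[code]
# Back-of-the-envelope DR sizing (Python). All inputs are hypothetical examples.
COST_PER_HOUR = 1000        # from this thread: almost $1,000.00 an hour
SNAPSHOT_INTERVAL_H = 24    # how often data is snapshotted/replicated (example)
RESTORE_TIME_H = 4          # how long a restore onto spare kit takes (example)

downtime_cost = RESTORE_TIME_H * COST_PER_HOUR
data_at_risk_h = SNAPSHOT_INTERVAL_H  # worst case: everything since the last snapshot

print("Worst-case downtime cost: $%d" % downtime_cost)
print("Plus up to %d hours of work to re-enter or write off" % data_at_risk_h)
[/code]

Compare that with the three-plus days they just had and the numbers usually
make the decision for you.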

On 20/03/12 16:03, Massimo Rosen wrote:[color=blue]

Hi.

On 20.03.2012 11:25, Lance Haig wrote:[color=green]

I would go for a solution where you have 2 clusters with data mirrored
at SAN level, or log shipping if it is a DB.[/color]

That is still a SPOF (several, in fact!). If the data gets corrupted for
whatever reason (a broken RAID controller, an OS running wild, user error),
the data is still lost on both sides.

CU,[/color]

Agreed completely.

We used to have our DB logs shipped over to the second DB server but not
imported, so if we had a DB corruption we would be able to import the logs
up to the corruption and then do the rest manually.
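
For anyone copying that setup, the useful part is the deliberate gap between
shipping and importing. Roughly like this, with the inbox path and the import
step as placeholders:

[code]
# Sketch of the standby side (Python): on demand, replay shipped log segments
# only up to a known-good point in time, then stop and finish manually.
from pathlib import Path

STANDBY_INBOX = Path("/mnt/standby/log_inbox")  # hypothetical

def import_segment(segment):
    # placeholder for the database's own log import/replay command
    print("importing", segment.name)

def recover_up_to(known_good_time):
    """Replay everything shipped before the corruption, nothing after it."""
    for segment in sorted(STANDBY_INBOX.glob("*.log")):
        if segment.stat().st_mtime <= known_good_time:
            import_segment(segment)
[/code]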

It saved my bacon twice.

Lance

Depending on the hardware involved, you could cluster two of the servers
and use the third as a “snapshot server” for routine “snaps” of the
clustered volumes.
If you have a SAN in place, even better – create a virtual environment
as a pseudo “DR” for the clustered servers. Of course you would need
some type of replication software and/or a backup solution in place to
replicate the clustered data over to the “DR” site.
You could use operating systems like SLES10/11 with OES2/OES11 for
your virtual “DR” site on your SAN, or, if a server is powerful enough,
it can provide the ‘muscle’ required to run multiple guests within
the host.
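
The routine “snaps” part is mostly naming and retention; here is a
bare-bones sketch, with the actual snapshot and delete commands left as
placeholders because they depend entirely on the storage in use:

[code]
# Sketch of routine snapshot naming and retention (Python). The snapshot and
# delete operations are placeholders for whatever the SAN/OS actually provides.
from datetime import date, timedelta

VOLUME = "VOL1"      # hypothetical clustered volume name
KEEP_DAYS = 14       # keep two weeks of daily snaps (example)

def snap_name(volume, day):
    return "%s_snap_%s" % (volume, day.strftime("%Y%m%d"))

def todays_snap():
    name = snap_name(VOLUME, date.today())
    # placeholder: invoke the storage's snapshot command for `name` here
    return name

def expired(existing_snaps):
    """Names older than the retention window; YYYYMMDD names sort lexically."""
    cutoff = snap_name(VOLUME, date.today() - timedelta(days=KEEP_DAYS))
    return [s for s in existing_snaps if s < cutoff]
[/code]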

Leroy Joseph
Visual Click Software
(eDirectory Management and Reporting)
‘eDirectory Management | DSRAZOR for eDirectory’
(http://www.visualclick.com/content/dsrazor-for-edirectory.htm)



On Tue, 20 Mar 2012 17:32:46 +0000, GofBorg wrote:
[color=blue][color=green]

They said they lost almost $1,000.00 an hour. The server went down
Tuesday and was finally functional Friday. They are a 24/7 operation.[/color]

Everyone cries the blues when the system goes down, but if you hand them
a bill for the services and hardware required to prevent it,
they usually quiet down a good bit and decide it was more ‘darned
inconvenient’ than some catastrophic loss.[/color]

Yeah. I passed along the advice I’m getting here and haven’t heard a peep.
Maybe they are trying to make up for last week.