Server help (raid died)

ERU

Hi all,

I’m looking for some advice and guidance as we contract ‘in’ a local IT company to sort IT issues.
Recently the server had a single RAID HDD failure, but when the ‘new’ HDD was put in to rebuild, it seems the Domain Controller (catalogue of all PCs and user accounts) also packed in… The server is set up with 2x HDDs in RAID for the Windows Server OS and 2x HDDs in RAID for data/virtual servers. The IT techs said they would let us know if the Domain Controller became usable again! Thereafter, the Domain Controller restore didn’t work...

They were unable to fix the Domain Controller. As such, they suggested a new Domain must be created, all (200 ish) user accounts recreated, all PCs and laptops visited manually and joined to the new Domain, and everyone’s user data moved to their new accounts. The latest news suggests the backup (held in another building) was a backup of the 'broken' array.

By now we were on the fourth day of ‘day rate’… We had no choice but to agree, as people couldn’t work.

The above was set up on their loan server, where they managed to restore the data from the other single working HDD. They could browse through the data etc, but the restore, and subsequent turning on of the virtual server, produced a lot of errors, so potentially the odd file might not open or might behave unexpectedly.

The latest is these two quotes, for a new server we now apparently need (despite buying a bespoke one from another IT company about 3 years ago)… I guess they are claiming the raid controller isn’t working? As a Domain Controller is just software?

Quote 1 (2 servers recommended by IT company - one for redundancy):
-HPE ProLiant ML110 Gen10 Performance - tower - Xeon Silver 4110 2.1 GHz - 16 GB (1,422.00 x 2 = £2,844.00)
-HPE SmartMemory - DDR4 - 16 GB (211.14 x 2 = £422.28)
-HPE Midline - hard drive - 1 TB - SATA 6Gb/s (121.99 x 8 = £975.92)
-Microsoft®WindowsServerSTDCORE 2016 Sngl Academic OLP 16Licenses NoLevel CoreLic (196.55 x 2 = £393.10)
@ £5576.76

Quote 2 (possibly best with some sort of cloud based service?):
-HPE ProLiant ML110 Gen10 Performance - tower - Xeon Silver 4110 2.1 GHz - 16 GB (1,422.00 x 1 = £1,422.00)
-HPE SmartMemory - DDR4 - 16 GB (211.14 x 1 = £211.14)
-HPE Midline - hard drive - 1 TB - SATA 6Gb/s (121.99 x 4 = £487.96)
-Microsoft®WindowsServerSTDCORE 2016 Sngl Academic OLP 16Licenses NoLevel CoreLic (196.55 x 1 = £196.55)
@ £2795.58
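
A quick check of the arithmetic on both quotes, as a rough sketch. The listed line items add up to £4,635.30 and £2,317.65, which is less than the quoted totals. The gap happens to match a flat £12.00 charge plus 20% VAT on both quotes, but that is only an assumption (neither figure is itemised in the quotes), so it is worth asking the supplier to spell out what the difference covers.

```python
from decimal import Decimal

# Unit prices and quantities copied from the two quotes above.
QUOTES = {
    "Quote 1": {
        "lines": [("1422.00", 2), ("211.14", 2), ("121.99", 8), ("196.55", 2)],
        "quoted_total": Decimal("5576.76"),
    },
    "Quote 2": {
        "lines": [("1422.00", 1), ("211.14", 1), ("121.99", 4), ("196.55", 1)],
        "quoted_total": Decimal("2795.58"),
    },
}

# ASSUMPTIONS, not stated anywhere in the quotes:
# a flat 12.00 extra charge (e.g. delivery) and 20% VAT on top.
ASSUMED_EXTRA = Decimal("12.00")
ASSUMED_VAT_MULTIPLIER = Decimal("1.20")


def line_item_sum(lines):
    """Sum unit price x quantity over all line items."""
    return sum(Decimal(price) * qty for price, qty in lines)


def with_assumed_extras(subtotal):
    """Apply the assumed flat charge and VAT to a subtotal."""
    return (subtotal + ASSUMED_EXTRA) * ASSUMED_VAT_MULTIPLIER


for name, quote in QUOTES.items():
    subtotal = line_item_sum(quote["lines"])
    gap = quote["quoted_total"] - subtotal
    matches = with_assumed_extras(subtotal) == quote["quoted_total"]
    print(f"{name}: items {subtotal}, quoted {quote['quoted_total']}, "
          f"gap {gap}, matches assumption: {matches}")
```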

What should I be worried about? If anything?
 
A Domain Controller is just a role being performed by a Windows server. You should normally have more than one (to provide redundancy in the event of failure), and they should be in different physical boxes so that things like an array failure or power failure don't bring down your network.

From your OP it sounds like your Domain Controller was a virtual server running on top of a physical host.
In which case - what was backing up your physical host, and did this include point in time snapshots of the Virtual machines running on it?

Are you an academic organisation? Because that's the type of licence they're selling you.
I'm concerned that you can't reuse your existing Windows Server licences, but it's been a long, long time since I dealt with Windows licensing, so I'm no expert.

Those are eye watering costs for 1TB hard drives, but I'm going to assume they include onsite 24/7 replacement or something. Otherwise you are paying purely for the HPE certified sticker on them.

There are tools available to recover from a Raid-5 or Raid-6 failure. They're slow, and you're probably more concerned about time to get back online at this point.
If you had Raid 1, then rebuilding should have been transparent (assuming they pulled the broken drive and not the good one!).

What does Quote 2 include that is "cloud based" ??
You need details before betting 200 [currently annoyed] users on that.

Good luck.
 
Ouch, a single DC for 200 users?!

It may be worth getting MS involved in getting the original DC up and running, depending on its current state, i.e. does it boot but fail due to errors, or have they lost the virtual disks completely? They're expensive, but they are bloody good.
Rebuilding a domain would be my very last choice, and these are not exactly the time scales/pressure you would want when setting one up from scratch.


Those are eye watering costs for 1TB hard drives, but I'm going to assume they include onsite 24/7 replacement or something. Otherwise you are paying purely for the HPE certified sticker on them.

Yet still cheaper than Dell with a normal NBD warranty. It's phenomenal just how much they must make on HDs in servers.
 
What does Quote 2 include that is "cloud based" ??
You need details before betting 200 [currently annoyed] users on that.
That was my suggestion. I will need a backup if I continue with one server.


The latest blurb I've received:

“In theory we can reuse the old server, but need to double check if we need to replace any parts in it. It may be that it is not useable or to replace the components it is not cost effective.

The old server could be used as an Active Directory and file replication partner. This would ensure there is a copy of the Domain if one of the servers fails – rebuilding the domain is what took 2 days.

Just a note, there is absolutely no guarantee the old server won't fail again in the near future though. You cannot make a copy of the domain and then restore the server away. Bringing it back online will cause serious issues to the domain, defeating the purpose of a replica.

Using the old server as a replication partner is a possibility but with limitations as mentioned.

Having 2 new servers as partners is a much safer and more reliable idea, plus they are covered for the next 3 years as well with a hardware warranty.

Perhaps it’s worth considering the current costs of this scenario vs. the cost of two servers and the impact it causes if this happens again.

I have attached both quotations for you again and can hold this pricing for 2 weeks. I may have to requote after that due to HP prices going up and down on a weekly basis. I will do my best to hold them for as long as possible though, it is normally a standard 7 days.”
 
Future worries include the backup software that was in use and a likely replacement called "Cobian", as well as how we can back up the "Active Directory".
 
Sounds like this IT company has taken over a turd to start with. Obviously backups of the virtual DC were not working, although you could find out whether the original DC was virtual and, if it was, what was backing up that virtual server. If they took over from another IT company, they should have confirmed this virtual DC was restorable in a disaster.

I would be worried with allowing this IT company to continue in supporting you if they couldn't restore the original virtual DC (if it was virtual) as they obviously never checked the backup was usable.

Those costs seem ok from the quotes, but I would want a low-level scope of works written up with exactly what they are going to do. I'm also not sure I agree with what they are saying; some of it doesn't make sense or is hard to understand.
 
Sounds like this IT company has taken over a turd to start with. Obviously backups of the virtual DC were not working, although you could find out whether the original DC was virtual and, if it was, what was backing up that virtual server. If they took over from another IT company, they should have confirmed this virtual DC was restorable in a disaster.

I would be worried with allowing this IT company to continue in supporting you if they couldn't restore the original virtual DC (if it was virtual) as they obviously never checked the backup was usable.

Those costs seem ok from the quotes, but I would want a low-level scope of works written up with exactly what they are going to do. I'm also not sure I agree with what they are saying; some of it doesn't make sense or is hard to understand.

I am certainly no server expert but I think the only bit that doesn't make sense is this bit

Just a note, there is absolutely no guarantee the old server won't fail again in the near future though. You cannot make a copy of the domain and then restore the server away. Bringing it back online will cause serious issues to the domain, defeating the purpose of a replica.


Having read it a few times, I think what they are saying is that if you use the old server as replication/backup there is a risk of future failure.
Where they have said you cannot "restore the server away", I think what they mean is you cannot just copy the domain, store the server away switched off, and then power it back on again if the new server fails for any reason.



As a side note, how do you go about protecting your domain against failure in an environment where running 2 servers isn't feasible due to costs (Small domain with a handful of users) ?
In my mind the best way is to run a 2nd VM in the same box, but off a different HDD array ?
If that is the best way, is there any way to make the replication server mostly dormant so that it only "kicks in" if the main DC fails?
 
Backups = Schrödinger's cat :D

Re a 2nd DC: if you don't want the hardware and you have a decentish firewall/internet connection that can do site-to-site VPNs, then it would be worth looking into setting up an Azure environment just to house the 2nd DC. Costs would be 100 or so a month, as you don't need a huge VM if it's just doing AD.

There's no point running 2 on the same box. A different RAID array may mitigate that, but it's still putting all your eggs in one basket.
 
Sounds like an ideal candidate for an Azure setup. Depending on the internet speeds, you can set up a DC in Azure for a relatively low running cost. Depending on your business requirements, you could even move all of your business to Azure (Office 365, SharePoint, Teams etc).

Who was managing your backups?

If the current IT company is being paid to manage your backups, then the inability to restore from them is their fault! However, if that's not the case and they are trying to recover from a DC failure without backups, then it's expensive and time consuming.
 
As a side note, how do you go about protecting your domain against failure in an environment where running 2 servers isn't feasible due to costs (Small domain with a handful of users) ?
In my mind the best way is to run a 2nd VM in the same box, but off a different HDD array ?
If that is the best way, is there any way to make the replication server mostly dormant so that it only "kicks in" if the main DC fails?

Reliable, consistent, and monitored backups. That's the ONLY way to protect any Domain properly (or any data in general). Even with multiple Domain Controllers, corruption can occur, user error can break things, and patches can corrupt things. They're all unlikely, I agree, but I've seen it happen multiple times, and the only surefire way to protect against it is good backups. I've seen so many companies put a low priority on backups and assume replicas and copies are the same thing... they often regret it in the end.
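
As a rough illustration of the "monitored" part, here is a minimal sketch of a freshness check that could run on a schedule and raise an alert. It assumes backups land as files in a single folder and that anything older than about a day is stale; the folder layout, the 26-hour threshold and the function names are all hypothetical. It only proves that a recent, non-empty backup file exists. It does not prove the backup restores, which still needs a real test restore.

```python
import time
from pathlib import Path

# Assumption: backups run nightly, so anything older than ~26 hours is stale.
MAX_AGE_HOURS = 26


def newest_backup(backup_dir):
    """Return the most recently modified file in backup_dir, or None."""
    files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime) if files else None


def check_backup(backup_dir, max_age_hours=MAX_AGE_HOURS, now=None):
    """Return (ok, message) describing the newest backup file."""
    now = time.time() if now is None else now
    latest = newest_backup(backup_dir)
    if latest is None:
        return False, "no backup files found"
    stat = latest.stat()
    if stat.st_size == 0:
        return False, f"{latest.name} is empty"
    age_hours = (now - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        return False, f"{latest.name} is {age_hours:.0f}h old"
    return True, f"{latest.name} looks recent ({age_hours:.1f}h old)"
```

Calling `check_backup` with the path of the backup folder returns a flag and a message that a scheduled task could email or log.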
 
As a side note, how do you go about protecting your domain against failure in an environment where running 2 servers isn't feasible due to costs (Small domain with a handful of users) ?
In my mind the best way is to run a 2nd VM in the same box, but off a different HDD array ?
If that is the best way, is there any way to make the replication server mostly dormant so that it only "kicks in" if the main DC fails?

A cheap Dell server with license can be had for well under £1k, it's probably worth that in business costs to save hassle in the future if the server goes down. Or when the server is upgraded, keep the old server as a second DC. I've re-purposed old servers as Veeam backup machines or second DCs several times.
 
If they're having to do a rebuild rather than just restore a backup, it's an opportunity to get things right. You are now discovering the cost of downtime. This should help you determine your budget.

I'm a great believer in business critical IT equipment being on-site. You don't want to be vulnerable to someone digging through a cable.

Now, I reckon you need four boxes. Two DCs, one fileserver, and one spare. The good news is that these do not have to be especially powerful: AD and fileserving are not demanding tasks. You want the spare so that if one server goes pop you can unplug the disks and RAM and plug them into the spare server. Having a cloud DC and a cloud backup of the fileserver is also a good idea.

BTW are you entitled to academic licensing rates? You don't want to find out the hard way.
 
Now, I reckon you need four boxes. Two DCs, one fileserver, and one spare. The good news is that these do not have to be especially powerful: AD and fileserving are not demanding tasks. You want the spare so that if one server goes pop you can unplug the disks and RAM and plug them into the spare server. Having a cloud DC and a cloud backup of the fileserver is also a good idea.

I'd virtualise the four servers onto two boxes, one running DC and fileserver, the other DC and Veeam. If the main server goes down, you can spin the fileserver up on the second box relatively easily and it gives you time to replace or fix the other server. A Veeam Essentials and VMware Essentials license wouldn't add too much to the initial total cost but in the event of problems they'd probably pay for themselves with the downtime saved.
 
I don't think that sort of virtualisation has a place in this scenario. ISTM that virtualisation is really more for a larger data centre; this is just going to be a small rack in a cupboard or corner. I would, however, consider virtualising each server on a 1:1 basis so it's easier to port across hardware, but I'd keep each server on separate physical hardware.
 
ISTM that virtualisation is really more for a larger data centre;

Not at all. The ease of management and reduced costs in the event of downtime make it worth doing in most cases. You would only be paying for two physical boxes instead of four, and you would also only need to license those two boxes with Server 2016 to get four VOSE, instead of having to license three physical servers. You'd be paying for some sort of backup solution anyway, so the cost of Veeam balances out; the only extra cost would be for VMware, which is offset by needing one fewer Server 2016 license and one fewer server.
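
The box and license counts above can be tallied as a small sketch. It assumes the usual Windows Server 2016 Standard terms, where a fully licensed host may run two virtual instances, and it treats the spare in the four-box design as unlicensed, as the post does. The function and field names are made up for illustration.

```python
# Assumption: each fully licensed Server 2016 Standard host may run 2 VMs.
VMS_PER_STANDARD_LICENSE = 2


def summarise(boxes, licensed_hosts, virtualised):
    """Return boxes bought, licenses needed and Windows instances that can run."""
    if virtualised:
        instances = licensed_hosts * VMS_PER_STANDARD_LICENSE
    else:
        instances = licensed_hosts  # one Windows install per physical server
    return {"boxes": boxes, "licenses": licensed_hosts,
            "windows_instances": instances}


# Four physical boxes: two DCs, one fileserver, one (unlicensed) spare.
physical = summarise(boxes=4, licensed_hosts=3, virtualised=False)
# Two virtualised hosts: DC + fileserver on one, DC + Veeam on the other.
virtual = summarise(boxes=2, licensed_hosts=2, virtualised=True)

print("physical:", physical)
print("virtual: ", virtual)
```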
 
ITYM three boxes as I advocated getting one as a spare.
Now, I reckon you need four boxes. Two DCs, one fileserver, and one spare.

You'd still be buying four boxes even if one was a spare.

But in this situation ISTM that minimising the number of boxes would lead to the OP getting into the same mess all over again.

Not really, as it seems they only had one DC, I'm recommending getting two.
 
Seems counter-intuitive to suggest a small business buys 4 physical servers with the intent of one being there 'just in case' one of the others outright dies, even more so if you've then got all this equipment sitting in the same physical location - how do you know whatever kills it isn't going to have taken the drives or RAID controller with it (or all the servers!)?

Street's two server suggestion sounds the most sensible. The single most important thing is that your backups work. If they'd worked in the first place you could have just restored everything once you'd rebuilt your RAID array within a reasonable time frame. You need to be asking this IT company how they propose backing up your new equipment, and how they can demonstrate to you on an on-going basis that what they are backing up is actually restorable in a disaster.
 
Seems counter-intuitive to suggest a small business buys 4 physical servers with the intent of one being there 'just in case' one of the others outright dies,

The OP is currently finding out the cost of not doing this.

even more so if you've then got all this equipment sitting in the same physical location - how do you know whatever kills it isn't going to have taken the drives or RAID controller with it (or all the servers!).

Then there are bigger problems - like a fire or the whole lot being stolen (and they came back 2 weeks later). That's why you have off-site backups.

Street's two server suggestion sounds the most sensible.

Yes, it does, from a technical perspective, but I have been there and done that and you need to look at the issue from a business perspective. Loss of business or downtime is the critical issue here. Businesses that suffer catastrophic IT failures have an increased chance of going out of business. Yes, the up-front costs are likely greater, but the OP will be glad when things go wrong again.
 
Yes, it does, from a technical perspective, but I have been there and done that and you need to look at the issue from a business perspective. Loss of business or downtime is the critical issue here. Businesses that suffer catastrophic IT failures have an increased chance of going out of business. Yes, the up-front costs are likely greater, but the OP will be glad when things go wrong again.

With two virtualised servers, there would be very minimal downtime. Simply spin up the fileserver backup on the second host while the first host is repaired or replaced. Domain services would never have any interruption and the fileserver could be back up and running fairly quickly without me having to even visit the site. When the first host is repaired/replaced, restore the DC backup onto it and move the fileserver back over.
 