
View Full Version : Major disaster today - I need your input
Major disaster today. The hard drive on my server failed today. So for the last 4 hours, the server has been down. My host Myles (prioritycolo.com) is down at the datacenter right now trying to solve the problem. But there are some major issues.
The first is that the main hard drive will not boot up again. It times out during bios, so the data on the drive is not retrievable.
Second issue. I have a backup hard drive that performs daily, weekly and monthly backups. Myles tried to access the files on that drive, but no luck. He got multiple errors when mounting the drive on a local machine. So it's not even an issue of getting into it. He can't even mount it.
So seemingly, I have no backups of any of the sites on the server.
This is just one major disaster. I would very much appreciate your advice.
Vito
ExtremeIS 10-27-2004, 01:48 PM This is horrible Vito. I hope you have off-site backups that you can pull from?
Has he just tried mounting either of the drives as slave on another system to see if he could access the data that way?
Perhaps worst case you could send it to a data recovery specialist?
I hope things get back up and running for you asap, I know how devastating a drive loss can be.
Steven 10-27-2004, 01:55 PM Been said time after time. Remote backups are a must.
Yes, it is horrible, Ryan.
No, there are no off site backups. Although I don't understand how that would be any different. As someone who is not well versed in server management, I must not understand the difference. My understanding was always that the backup can be to a separate drive in the same DC or it can be remote. Either way, it's on a separate hard drive, isn't it? What difference does the location make?
After attempting many different things, it looks like we will have to resort to a data recovery specialist.
Steve, as I said, given my lack of knowledge, I don't understand the benefit of remote backups vs local ones. Either way, I would prefer to hear advice on how to solve the current problem rather than an I-told-you-so type of comment. But thanks for the advice. I'll be sure to use it once this issue is resolved.
Vito
nickn 10-27-2004, 02:22 PM Originally posted by vito
Major disaster today. The hard drive on my server failed today. So for the last 4 hours, the server has been down. My host Myles (prioritycolo.com) is down at the datacenter right now trying to solve the problem. But there are some major issues.
The first is that the main hard drive will not boot up again. It times out during bios, so the data on the drive is not retrievable.
Have you tried to put a fresh drive in the primary drive slot and then put the bad drive in as slave? Oftentimes, even if it won't boot up, the /home and other important partitions are still fine.
Usually when we have this issue (they do happen) the first thing we do is put a brand new drive in and then boot with the BAD drive as secondary, and do our best to transfer the data from the bad to the good. This typically works fine. If this doesn't work, then we break out the backup drive and do restores from there.
Thanks, Nick. Yes, we tried all of what you suggested. No go.
Some days I just really hate computers.
Vito
tpetersen 10-27-2004, 02:37 PM Originally posted by thelinuxguy
Been said time after time. Remote backups are a must.
Testing your backups on a regular basis is another must. It's all good until you have to restore...
TP
Unknown_User 10-27-2004, 02:51 PM Originally posted by vito
Yes, it is horrible, Ryan.
No, there are no off site backups. Although I don't understand how that would be any different. As someone who is not well versed in server management, I must not understand the difference. My understanding was always that the backup can be to a separate drive in the same DC or it can be remote. Either way, it's on a separate hard drive, isn't it? What difference does the location make?
After attempting many different things, it looks like we will have to resort to a data recovery specialist.
Steve, as I said, given my lack of knowledge, I don't understand the benefit of remote backups vs local ones. Either way, I would prefer to hear advice on how to solve the current problem rather than an I-told-you-so type of comment. But thanks for the advice. I'll be sure to use it once this issue is resolved.
Vito
Vito,
Backing up remotely makes a lot of difference. Take your case for example: you backed up to a secondary drive, your primary failed, and you cannot mount your secondary drive. Now, "what if" you had also backed up to a remote location? All you would have needed to do is get a new server in and rsync the /home, /var and your config files (if needed).
Then you would not be in as much a panic as you are now.
I am sorry to hear about your problem, but it is advised that you back up to remote locations, and if you are worried about degradation in your server's performance when backing up so much, then do it in non-peak hours :)
Wish you the best of luck mate,
DislexiK
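For anyone wondering what the remote rsync run described above might look like, here is a minimal sketch in Python. It only builds and prints the commands, it does not run them. The host name backup@backup.example.com and the remote path /srv/backups/server1 are made-up placeholders, and the optional bandwidth cap is just one way to limit the load on the server.

```python
import shlex


def build_rsync_command(source_dir, remote_host, remote_root, bandwidth_kbps=None):
    """Build an rsync-over-SSH command that copies one directory to a remote box.

    remote_host and remote_root are hypothetical placeholders; remote_root
    is assumed to exist already on the backup machine.
    """
    # -a keeps permissions, owners and timestamps; -z compresses in transit.
    cmd = ["rsync", "-az", "-e", "ssh"]
    if bandwidth_kbps is not None:
        # Cap the transfer rate so the backup does not starve live traffic.
        cmd.append(f"--bwlimit={bandwidth_kbps}")
    # A trailing slash on the source copies the directory's contents.
    src = source_dir.rstrip("/") + "/"
    dest = f"{remote_host}:{remote_root.rstrip('/')}/{source_dir.strip('/')}/"
    return cmd + [src, dest]


if __name__ == "__main__":
    # Print (do not execute) the commands for the directories named in the post.
    for directory in ("/home", "/var"):
        command = build_rsync_command(
            directory, "backup@backup.example.com", "/srv/backups/server1"
        )
        print(shlex.join(command))
```

Running the printed commands from a scheduler during non-peak hours would match the advice above. Restoring onto a new server is the same command with source and destination swapped.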
BitOMagic 10-27-2004, 03:18 PM Sorry for the problems you are having.
ExtremeIS 10-27-2004, 03:21 PM No, there are no off site backups. Although I don't understand how that would be any different. As someone who is not well versed in server management, I must not understand the difference. My understanding was always that the backup can be to a separate drive in the same DC or it can be remote. Either way, it's on a separate hard drive, isn't it? What difference does the location make?
Off-site backups are essential for any site owner. The reason is that if there is a catastrophic loss of data due to a system malfunction, electrical problem, power supply surge, etc., the chances of the 2nd drive (normally the backup drive) being damaged too are very high.
I personally download all of my own sites (complete backups through cpanel) to my personal computer once a week (thank god for high speed internet). For my client backups I do monthly off-site backups to a remote computer at a separate physical location. I also frequently encourage my clients to make their own backups and have even started downloading some of the big sites on my servers and putting them on cd just in case.
In any case vito I hope you get this resolved soon, I hate days like this.
mikaelhg 10-27-2004, 03:21 PM One alternative is to send the hard drives to a data recovery service. That would probably get your customer data back, but would possibly also cost thousands of dollars.
Like many people mentioned, pros do off-site HD and tape backups for a reason.
Babushka99 10-27-2004, 03:29 PM Or even invest in a small NAS box. That can even do wonders for you. Nowadays, you can get a snap-on kind of a box for less than $1000 for about 1/3 to 1/2 TB. Well worth the investment if I may say so.
But like everyone has said, nothing like off-site back-ups!
terran11355@ 10-27-2004, 03:47 PM Originally posted by vito
Major disaster today. The hard drive on my server failed today. So for the last 4 hours, the server has been down. My host Myles (prioritycolo.com) is down at the datacenter right now trying to solve the problem. But there are some major issues.
The first is that the main hard drive will not boot up again. It times out during bios, so the data on the drive is not retrievable.
Second issue. I have a backup hard drive that performs daily, weekly and monthly backups. Myles tried to access the files on that drive, but no luck. He got multiple errors when mounting the drive on a local machine. So it's not even an issue of getting into it. He can't even mount it.
So seemingly, I have no backups of any of the sites on the server.
This is just one major disaster. I would very much appreciate your advice.
Vito
I just checked and your website www.prioritycolo.com is up now, so how did you do it?
any solutions to share with us?
Thanks
sean
sailor 10-27-2004, 03:48 PM vito - sorry to hear this - another thing not mentioned yet is hacking.
if you get compromised and they decide to erase your data - they will most likely erase your backup drive too.
if you have it backed up somewhere else - they will not know this and you can restore immediately.
coight 10-27-2004, 03:51 PM Sorry to hear of your problems Vito, I'm sure Myles and team will do everything possible to get you back up. Hard drive issues are not a fun thing to deal with :(
ExtremeIS 10-27-2004, 03:51 PM I just checked and your website www.prioritycolo.com is up now, so how did you do it?
any solutions to share with us?
Thanks
sean
I think this is where vito has his server; it is not his company.
Pheaton 10-27-2004, 03:51 PM Priority colo is the host's site, not vito's.
What a friggin nightmare.
Both drives are now in the hands of the blood sucking leeches - data recovery specialists. They have a set of "regular" prices. Then they have a set of "emergency" prices that are exponentially higher. And I mean exponentially higher.
Bloody leeches. They clearly exploit people in dire times. How can you possibly justify prices 10x regular rates just because you push the customer up in the queue? Bloody leeches. :mad:
Vito
Two-A-T 10-27-2004, 04:35 PM Originally posted by vito
...Either way, I would prefer to hear advice on how to solve the current problem rather than an I-told-you-so type of comment...
Everyone please honor Vito's request!
Vito, I am very sorry to hear about the HD failure. I was just getting ready to contact you and tell you your sites were down when I found this thread.
Please do not hesitate to let me know if there is any way I can help. I can't offer anything in the data recovery area but if there is anything else I can do, let me know.
I sincerely hope you recover quickly! The world is a better place with Vito and his services! :)
Thanks, Chuck. I appreciate it.
Not much can be done now since the drives are in the hands of the recovery specialists/bloodsucking leeches.
I think I'll take your suggestion from our phone session an hour ago. Not much more I can do at this point. So I'll just waddle over to my mini bar and pour myself a stiff vodka Martini.
Hic...
Vito
mikaelhg 10-27-2004, 06:48 PM What'd they quote you, if you don't mind me asking?
Let's just say it ranged from $500 (haha, not likely) to $15,000.
Vito
nickn 10-27-2004, 07:05 PM Are you sure it's not the motherboard? It seems weird both drives would die at the same time, and neither would work on a restore.
No it's not the motherboard. We installed a new hard drive and it performs properly.
Vito
porcupine 10-27-2004, 07:13 PM I just thought I'd post in here so people knew what was being done. Obviously Vito is a very important customer, and what he's going through simply shouldn't have happened. As any host knows, you set up customers on a best-fit scenario/solution, and that's what I thought we'd done at the time.
Basically it's a reasonably powerful cPanel server, with daily backups performed from the primary drive to the secondary; the secondary is then unmounted after backups to hide it from potential script kiddies. Considering our power is *very* heavily conditioned, we're in a secure facility, etc., we've always recommended this as the safest path to go.
I've always found that with remote backups on a Windows desktop, if someone's targeting your site (as Vito has been a victim of many times), they're far more likely to get into your home Windows box and work their way back with the help of your backups (and the passwords therein) than they are to get into a secured server where they have no access (at least IMHO). And the average script kiddie couldn't find an unmounted drive in most cases, nor would most maliciously clean out a backups folder for the fun of it (note: most).
Anyhow, Vito's primary drive began to fail audibly earlier this morning. I called him to discuss it and had a fairly relaxed tone, as he had a full complement of backups on the secondary drive. Upon swapping in a spare primary drive and prepping Fedora, we suddenly found that the IDE drive wouldn't mount, with no warning at all (SMART status indicated all was a-ok).
The drives in Vito's system were:
Maxtor Atlas 73.4GB u160 10k rpm SCSI - 1,200,000 hr. mtbf
Western Digital 80GB 7200 RPM / 8mb cache IDE - 500,000 hr. mtbf
If we assume those numbers are accurate (and ignore external factors), we can derive that the likelihood of losing both drives in the same day (as both would appear to have gone this morning) is roughly one in a billion (assuming I did my math correctly, you be the judge). (1,000,000,000 : 0.96...)
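For anyone who wants to check that figure, here is a rough sketch of the arithmetic in Python. It assumes the two drives fail independently and that each has a constant failure rate of 1/MTBF (the usual exponential model), which is a simplification and says nothing about shared causes. The function name is made up for illustration.

```python
import math

HOURS_PER_DAY = 24


def failure_probability(mtbf_hours, window_hours=HOURS_PER_DAY):
    """Chance that a drive fails within the window, assuming a constant
    failure rate of 1/MTBF (exponential model, an assumption)."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-window_hours / mtbf_hours)


scsi = failure_probability(1_200_000)  # Maxtor Atlas, rated MTBF from the post
ide = failure_probability(500_000)     # Western Digital, rated MTBF from the post
both = scsi * ide                      # independence assumed

print(f"SCSI fails on a given day: {scsi:.3e}")
print(f"IDE fails on a given day:  {ide:.3e}")
print(f"Both on the same day:      {both:.3e}")
```

Under those assumptions the per-day chances are about 24/1,200,000 and 24/500,000, and their product lands at about 0.96 in a billion, which agrees with the ratio quoted above.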
As with any solution (RAID, non-RAID, backups, etc.) you base everything on percentages and likelihoods. I thought this was a sure bet, but I've obviously been proven wrong. Needless to say, Vito should play the lottery tonight if ever.
For input on how to fix the problem (for anyone who has positive input, suggestions, etc., as it's always welcome in a situation like this), here's a synopsis of the troubleshooting:
- SCSI: loud, not grinding. The heads have not hit the platter, as it ran for a while before it died, whereas that would have killed it; probably a drive bearing or the motor going south. Tried swapping it, warm starts, different power supply, different controller, cable, system, etc. No dice, can't get it to detect in BIOS.
- IDE: I/O errors. Tried swapping to a different system (it was already a slave, notably), booting with just the Red Hat rescue CDs (Fedora and Red Hat 9), separate power supply, fsck, e2fsck, dd (to raw copy the data to another drive), all without any luck.
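For readers unfamiliar with the raw copy step in the last item: the idea is to read the failing drive block by block and keep going past unreadable areas, which is what dd does when given conv=noerror,sync. Below is a minimal Python sketch of the same idea, not what was actually run here. The function name and block size are illustrative assumptions.

```python
import os


def raw_copy(src_path, dst_path, block_size=64 * 1024):
    """Copy src to dst block by block, writing zeros for unreadable blocks.

    Returns the number of blocks that raised a read error.
    """
    bad_blocks = 0
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total bytes on the source
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                chunk = src.read(want)
            except OSError:
                # Unreadable area: note it and carry on with the next block.
                chunk = b""
                bad_blocks += 1
            # Pad short or failed reads so offsets in the image stay aligned.
            dst.write(chunk.ljust(want, b"\0"))
            offset += want
    return bad_blocks
```

On a Linux box the source would be the device node of the failing drive (for example a hypothetical /dev/sdb) and the destination an image file on a healthy disk; fsck and file recovery are then attempted on the image rather than on the dying drive.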
Both drives are currently at a local data recovery firm waiting for an estimate, suggestions are still obviously welcome, and if the data recovery firm can do anything within reason, PC will be covering a chunk of it, as we managed this server for Vito, and as I'm sure anyone can understand reading this, I feel as responsible as any of you would.
Anyhow, here's hoping. I'm wishing for good luck, as I'm sure anyone reading this is. Needless to say, this is going to have to bring our recommended backup policy under review, as this is what I've recommended to most users looking for simple and cost-effective backups that they don't have to "babysit".
porcupine 10-27-2004, 07:17 PM Oh incidentally, Adaptec 29160 Controller for the SCSI (PCI for anyone not familiar with the brand) and Onboard IDE controller for the IDE, Tyan Dual MP motherboard, and DDR ECC Registered RAM (in case anyone was considering ECC/fault).
Unknown_User 10-27-2004, 07:20 PM porcupine,
You seem like a good and respectable man. It's nice to know companies care about their clients and their clients' data as much as you do. To feel partially responsible would be enough, but feeling responsible - good man...
Vito, good luck, let us know the outcome
DislexiK
Yes, Myles is truly a top notch host who really puts out for his customers.
Thanks for the detailed post, Myles. I'm glad that you posted all the technical details. I could never have done that, and I am hopeful that someone in the WHT community can possibly shed some light on a solution. Although, I fear that it will now have to depend on the abilities of the data recovery folks.
Staggering over to the bar for his 4th Martini...
Vito
Darth 10-27-2004, 08:02 PM Hi, from here your site is loading :) maybe I missed a topic about your drives being restored?
Edit: maybe not, I notice the forum is a 404 :(
Myles installed a new hard drive. So I have uploaded my site files so that at least my visitors can see the site. However, we are still waiting for all the db files from the backup drive. Those are the ones that are difficult (impossible) to replace.
Vito
Two-A-T 10-27-2004, 08:08 PM vito said in a submissive tone:
I think I'll take your suggestion from our phone session an hour ago. Not much more I can do at this point.
Wish I was in a financial position to take you up on your suggestion regarding a specific type of business establishment in your area ;)
vito said on his way to the bar:
So I'll just waddle over to my mini bar and pour myself a stiff vodka Martini.
Hic...
Go for it!
It was good to speak with you again, Chuck. You helped me calm down, which was good. Although, the vodka helped me a lot more... ;)
At this point, it's a waiting game to see what we can retrieve from the drives. Not much more can be done.
Vito
Two-A-T 10-27-2004, 08:54 PM Originally posted by vito
It was good to speak with you again, Chuck. You helped me calm down, which was good.
Glad I could help :)
Don't hesitate to call any time!
Good luck in the restoration tomorrow, Vito and Myles.
The restoration guys up in TO do come highly recommended, even if highly strung.
Simon
nickn 10-27-2004, 10:00 PM Agreed, Good luck Vito. It definitely sounds like you guys exhausted every possible option.
I can only hope that we will be able to extract files from (one of the) drives tomorrow. Time now to get some sleep to prepare for a full day tomorrow.
Ah well, at least one thing is going right today. The Red Sox are leading 3-0 in the bottom of the 5th. Looks like they're gonna take the series tonight.
Vito
Two-A-T 10-28-2004, 08:21 AM Originally posted by vito
Staggering over to the bar for his 4th Martini...
HEY VITO...
DON'T FEEL LIKE :crap: , HAVE TO :puke: , OR HAVE A HANGOVER THIS MORNING, DO YOU????
Sorry, was I talking too loud? :D
I hope you have a much better day today and things start to get back to normal! :)
Thanks, Chuck. The day is starting off with a few problems already. While I wait for them to try to recover the data, I uploaded a couple of sites to the new drive, and set up my email for those sites. Email worked fine until about 2:30 this morning. But for some reason, now it is not being delivered. I can see in the Mail Queue in WHM that mail is sitting there (both incoming and outgoing), but it's not being delivered.
:(
Vito
Originally posted by vito
Thanks, Chuck. The day is starting off with a few problems already. While I wait for them to try to recover the data, I uploaded a couple of sites to the new drive, and set up my email for those sites. Email worked fine until about 2:30 this morning. But for some reason, now it is not being delivered. I can see in the Mail Queue in WHM that mail is sitting there (both incoming and outgoing), but it's not being delivered.
:(
Vito
If you need any free admin help feel free to ask vito.
Adam
BTW - you going to HostingCon?
Thanks for the offer, Adam. Myles is on top of it right now.
HostingCon? Yes, I plan on attending (perhaps exhibiting).
Vito
I'll be driving up there, if you don't plan to fly let me know :)
Adam
In all likelihood I'll be flying.
Vito
Pheaton 10-28-2004, 04:15 PM How's the recovery coming along vito?
Very slowly. Should know more in a few hours.
Vito
PHEW!!!
Well, we couldn't have asked for better results. We had taken the drives to one data recovery place and they were going to charge us $4000+ and they wanted 2-3 days. So we decided to grab the drives and take them elsewhere. We ended up at $2700 and same day service.
They were able to retrieve all the files. I just finished restoring all sites via WHM. I was very worried that the backup drive didn't have "current" backup. As it turns out, we were able to access backups from Oct 27. That's about as good as it gets. :D I had a suspicion that the backup drive may have been "broken" for some time, since it seemed implausible that 2 drives could go down within hours of each other. But as it turns out, based on the backups I was able to get off the backup drive, they did indeed go south within hours of each other. Amazing odds. :eek:
Myles, as expected, went the extra mile. He pretty much dedicated his entire day to driving around delivering/picking up the drives, negotiating prices, helping me restore the files. As we speak, I'm still on MSN with him (14 hours after we started our day together) as he sits at the datacenter double-checking everything. Very cool. What a guy. :agree:
After this experience, I feel there is a definite need to address what should be done to avoid such incidents, as well as other issues. Tomorrow, I plan on starting a thread about what it takes to have the "proper" server setup. This will include issues like local vs remote backup, RAID vs no RAID, firewall vs no firewall, and so on. Please look for the thread. I will truly appreciate your input. I trust Myles implicitly. But I also value the vast knowledge that you all have, so it would be great to get a bunch of people sharing their wisdom.
Vito
nickn 10-28-2004, 11:28 PM Glad it all worked out...definitely an expensive lesson though. :)
I'll just have to bump up the price of my tutorials for a few months to cover it... :D
Vito
Originally posted by vito
I'll just have to bump up the price of my tutorials for a few months to cover it... :D
Vito
For sale:
Set of tutorials finely crafted in the heart of Canada. Whisk yourself in to a technical dreamland with our stunning FTP tutorial set.
Asking price: $2700 or nearest offer.
Simon
Something like that, Simon. :emlaugh:
Vito
Great to hear that everything went well for you vito, good things do happen to good people (minus that $2700 bill)...which you could write off as a company expense on taxes ;)
porcupine 10-29-2004, 12:48 AM BTW for anyone who was curious (I don't think Vito will mind), the SCSI drive's bearing was going/going/going/gone and was rated at a 5% chance of recovery, and the price quote was $10,000 - $15,000 minimum. They noted that most 10k RPM SCSIs they find are considerably more difficult to recover, and often "just full of dust" because the heads crash into the platters and tear themselves apart at 10-15k RPM. Notably the heads had *not* crashed in this case (I was there for the drive's final moments, 2 feet away, and you can usually tell when that hits).
The IDE drive had corrupted its firmware. Apparently newer drives store the firmware information directly on the platter instead of on separate chips, and the heads can pass over it and damage it at random just like any other drive sectors. Notably the firmware is *not* duplicated in multiple areas of the disk, simply "2-3 copies in the same system sector area" according to the people I spoke with. Also notably, the drive manufacturers do not provide firmware upon request, or even for sale to the recovery centers, and change it every few weeks, which seems to be a key factor in getting data back (e.g. sheer luck) and in the costs involved. The firmware in question apparently acts like the camber settings on a car: it lets the factory producing the drive fine-tune where the heads park, read, seek, etc. to achieve maximum performance, and repair minor glitches without having to remanufacture units or change the production line. In this case we were in luck: they had the firmware on hand in their database (thank God), as this is/was obviously a very common drive.
Anyhow, in the end, the whole thing stunk, wasted a few days, and really sucked. The odds of two drives going out like that within what would appear to be a 6-12 hour period are just absolutely ridiculously low.
(Notably the drives were physically separated. We monitor/graph power internally on several devices, have done extensive checks of vito's power supply outputs, taken temperature readings [16-17 degrees Celsius ambient], and verified there were no input spikes or sags recorded system-wide, etc. etc. etc.)
Mirage-ISP 10-29-2004, 01:27 AM BTW for anyone who was curious (I don't think Vito will mind), the SCSI drive's bearing was going/going/going/gone and was rated at a 5% chance of recovery, and the price quote was $10,000 - $15,000 minimum.
Oh my God, my heart just skipped a couple of beats!
Sorry to hear this Vito
BRMatt 10-29-2004, 01:33 AM I'm glad you got everything working, Vito. When Chuck told me the news I about passed out. Good luck with everything Vito.
"Myles, as expected, went the extra mile. He pretty much dedicated his entire day to driving around delivering/picking up the drives, negotiating prices, helping me restore the files. As we speak, I'm still on MSN with him (14 hours after we started our day together) as he sits at the datacenter double-checking everything. Very cool. What a guy."
It's great to have people on your side when something like this happens. I have to agree that what he did for you was awesome!
:D Great job Myles!:D
nickn 10-29-2004, 02:35 AM Yep, while we all wish this didn't happen, I don't think it could've been in a better provider's hands when it did happen, most would've said "sorry, we'll throw your OS install in for free!"
ExtremeIS 10-29-2004, 02:45 AM Glad to see you're back online Vito. I'll be needing some new tutorials soon; I might have to wait till that $2700 special ends though.
Unknown_User 10-29-2004, 05:34 AM Glad to hear it all got sorted. It was pretty expensive, but if that company hadn't recovered the data for you, would you have been at a big loss?
Kind Regards
DislexiK
Stacie 10-29-2004, 09:15 AM Vito what company performed data recovery?
Critical Data Recovery Services - located in Toronto.
Vito
Perfecthost 10-29-2004, 09:20 AM I have been following this thread for a couple of days.
Vito, I am glad that you are back and have recovered your data.
Myles, there will be at least one company considering doing business with you in the future.;) Good job.
-Lamar
Pheaton 10-29-2004, 09:45 AM Originally posted by vito
Critical Data Recovery Services - located in Toronto.
Vito
Great! :D
Now I know where I need to go if something like this happens. :)
Originally posted by Pheaton
Great! :D
Now I know where I need to go if something like this happens. :) You can see what they're all about at http://www.cdrdatarecovery.com/
Vito
Babushka99 10-30-2004, 06:49 PM Vito,
Been following the thread closely. Good to know you're back up online.
Job well done to all those involved.
Babs.
Stacie 10-30-2004, 06:51 PM Thank you Vito.
Sheps 11-01-2004, 01:13 AM CDR is a good company. I once got a quote from them for my WD HD, but thankfully, I uh, fixed it myself... :)
MattF 11-01-2004, 05:43 AM Vito,
I'm glad to hear you were able to get everything back up and running, it must have been a difficult/tiring/stressful few days. We suffered a similar fate back in December with the backup drive failing too, so I have some idea of what you went through. Best of luck.
Yes, it wasn't a pleasant couple of days. I guess the kicker was both drives failing at the same time. Otherwise it wouldn't have been a major problem.
I'm just glad I'm not a web host. I don't know how all of you do it. The pressure would get to me.
Vito
elementip 11-03-2004, 06:56 AM It looks like all of this can be attributed to a manufacturing defect. If these drives came out of the same lot (serial numbers are close), I would expect that other drives in the same batch will begin to fail as well.
This situation would be next to impossible to predict. Was the data recovery as expensive as I think?
Yes it was pretty expensive. Suffice to say we are all in the wrong business. We should get into the data recovery business. :eek:
Nope, they were not from the same lot. They were 2 totally different makes, models, etc. Believe it or not, it comes down to freakishly insane odds of 2 drives failing within hours of each other.
Vito
dynamicnet 11-03-2004, 11:38 AM Greetings Vito:
Some tidbits:
1. RAID, especially hardware RAID can be very helpful.
While not a substitute for backup, RAID can save the day.
2. One hard lesson we learned early on is that any backup --- EVEN remote backup --- should be tested on a regular basis for the ability to restore.
It is better to test restoring individual files and directories on a regular basis and find failures that need to be corrected rather than finding out during an emergency that the recovery part of the backup and restore doesn't work.
Thank you.
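As one possible way to act on point 2, here is a minimal Python sketch of a restore test. It assumes the backup is a tar archive and that a manifest of SHA-256 checksums was recorded when the backup was made. Both the manifest format and the function names are assumptions for illustration, not how cPanel/WHM backups work.

```python
import hashlib
import tarfile


def sha256_of(fileobj, chunk_size=1 << 20):
    """Hash a file-like object in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_backup(archive_path, manifest):
    """Read every listed file back out of the archive and compare checksums.

    manifest maps archive member names to expected SHA-256 hex digests.
    Returns a list of problems; an empty list means the restore test passed.
    """
    problems = []
    seen = set()
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            seen.add(member.name)
            expected = manifest.get(member.name)
            if expected is None:
                continue  # not listed in the manifest, nothing to compare
            if sha256_of(tar.extractfile(member)) != expected:
                problems.append(f"checksum mismatch: {member.name}")
    for name in manifest:
        if name not in seen:
            problems.append(f"missing from archive: {name}")
    return problems
```

Run on a schedule against the most recent archive, a non-empty result (or an exception while opening the archive) is the early warning that the backup would not restore.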
webink 11-03-2004, 05:12 PM if you had RAID0, where there are 2 drives and one is an exact copy of the master drive, how would you restore it?
Would you make the good drive the new master and simply slide in a new formatted HD? How would you make it copy all the data over to the new HD? Is it automatically done or what?
I'm assuming we are using a RAID controller, not software.
--
besides that, having your clients make their own backups and save them to their home/work PCs would have saved a lot of frustration. I hope you can recover your data! :(
Netivex 11-03-2004, 06:26 PM If it makes you feel any better Vito, I ran into this exact situation last year. The primary drive in the server decided to up and fail (80GB WD 8MB Cache). Being that the drives were so cheap, we ended up swapping it out and put a Maxtor 80GB 8MB Cache in its place (quickest thing we could get our hands on). Anyway, the client also relied on the WHM backup feature w/ unmounting, and when we went to pull the backups, the drive didn't work; there were bad sectors / corruption everywhere. I never did get to the bottom of what had caused it. A part of me just thought it was a complete fluke and some extremely bad luck. But the other part of me has always thought that there is something very wrong with the cPanel backup feature... I haven't used it ever since.
dollah 11-03-2004, 06:50 PM Dear Vito,
I encountered the same problem as you: an HDD crash. It happened because the datacenter staff dropped it while taking out someone else's server below ours.
The BIOS could not detect the HDD and just hung.
So I used RLinux (for Windows) to extract the files from the corrupt HDD on my personal Windows machine. It worked, but took around 1 day to recover all the data.
Some files were MIA, but we were still able to get 97% of the data. For the other 3%, we just restored from an older backup and sent an apology email to the client.
Thank you
porcupine 11-03-2004, 07:56 PM Just A FYI guys,
RAID1 is redundancy, RAID0 is not: you lose a drive in a RAID0 array, you lose the array. RAID1 doesn't save you from multiple simultaneous drive failures (as it needs at least one functional drive to work from), nor does RAID5 (it needs time to rebuild to spares if available, or it fails).
Vito did get back 100% of his content AFAIK, and the CPanel backup feature works wonderfully in 99% of the cases. Here, the drive had physically failed (p-list table in the firmware was corrupted). Any backups truly suck when the media they're performed onto fails.
And webink, when a drive in a RAID1 array fails (assuming that's what you meant), it totally depends on the controller. Less expensive ones continue to operate, but need you to manually shut down the server, replace the drive, and then rebuild in BIOS. More expensive ones will require you to shut down the system, but once the new drive is in, will rebuild in the background while the server is running. Better ones yet allow you to replace drives while the system is 100% live, and rebuild the data in the background. And the next level above that has "hot spare" drives, which are used to rebuild a broken array in the background so you can simply replace the failed drive at your leisure.
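To make the RAID0 vs RAID1 distinction above concrete, here is a toy Python model. It is only an illustration: drives are plain lists of blocks, a failed drive is represented by None, and nothing here reflects how a real controller lays out data.

```python
def raid0_write(blocks, drives=2):
    """Stripe blocks round-robin across the drives (no redundancy)."""
    array = [[] for _ in range(drives)]
    for i, block in enumerate(blocks):
        array[i % drives].append(block)
    return array


def raid1_write(blocks, drives=2):
    """Mirror: every drive holds a full copy of the data."""
    return [list(blocks) for _ in range(drives)]


def raid0_read(array):
    """Reassemble a stripe set; one failed drive (None) loses everything."""
    if any(drive is None for drive in array):
        raise OSError("RAID0 array lost: a member drive failed")
    total = sum(len(drive) for drive in array)
    # Block i lives on drive i % n at position i // n.
    return [array[i % len(array)][i // len(array)] for i in range(total)]


def raid1_read(array):
    """Read from the first surviving mirror; fails only if all are gone."""
    for drive in array:
        if drive is not None:
            return list(drive)
    raise OSError("RAID1 array lost: every mirror failed")
```

Setting one drive to None leaves raid1_read working but makes raid0_read raise. Setting both to None, which is what happened to Vito's two drives, defeats the mirror as well, which is why RAID is not a substitute for backups.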
2uantuM 11-03-2004, 08:42 PM The fact that both drives failed within the same hour makes me wonder if your PSU could be having power fluctuations.
webink 11-03-2004, 09:45 PM yes, sorry, i meant RAID1 - disk mirroring - not RAID0 - disk striping.
FYI, striping provides no data redundancy, like what was said, but it increases performance by being able to read both drives as if they were one continuous drive.
porcupine 11-03-2004, 09:46 PM Originally posted by 2uantuM
The fact that both drives failed within the same hour makes me wonder if your PSU could be having power fluctuations.
Within a 12 hour period, but no, we tested the PSU and it seems to be fine. Besides, no amount of power fluctuation is going to blow a bearing (nor is a dead bearing going to affect power; the circuitry in the SCSI drive will detect a "stick" and shut down power to the platter motor).
Using the vernacular of Frank Barone from Everybody Loves Raymond, "Holy crap!!!"
Myles, who lives over an hour away from me, just left my home. He drove up and spent an hour and a half sitting in my family room with me explaining all the extra measures he plans to take to better protect my box from any future disasters. And at no additional cost to me.
UNREAL. I don't know about the rest of you. But in my mind, a host that puts in this amount of time/effort to take care of his customers deserves honorable mention. Myles, THANK YOU for the incredible customer service.
:beer:
Vito
nickn 11-09-2004, 10:13 PM RAID...$150
Offsite Backup...$75
60 mile trip....$10
Constant praise on WHT...priceless.
:beer:
Haha, well put, Nick. :emlaugh:
And the praise is well deserved.
Vito
porcupine 11-10-2004, 05:34 AM :blush: I try to do what I can. Just like any host, I work for the greater good of my customer base. Keeping the most people happy works well for everyone involved on so many levels (let's face it, happy customers are inexpensive customers; unhappy customers ultimately cost you a *lot* more).
Getting this sorted lets me get a good night's sleep too of course, which is really what's priceless (granted, coming from me at 4:30am that doesn't indicate it to most people, but this is early for me)!
Mirage-ISP 11-10-2004, 06:49 AM Originally posted by vito
Haha, well put, Nick. :emlaugh:
And the praise is well deserved.
Vito
It's in fact a refreshing change from all the host bashing I've been seeing lately.
And to you Vito, may any future freakishly insane odds ALWAYS be in your favor :)
StueyB 11-10-2004, 08:17 AM I'd buy a lottery ticket if I were you, Vito, you're bound to win!
kloch 11-10-2004, 05:16 PM Originally posted by 2uantuM
The fact that both drives failed within the same hour makes me wonder if your PSU could be having power fluctuations.
If you want to get even more out on a limb, there have been several major solar flare events this past week. This includes significant geomagnetic storms. I've heard people suspect solar storms for unusual increases in power supply failures, so there may be something to this.
Were these servers in Canada? Latitudes closer to the poles experience greater effects during solar storms.
Solar storm info graphs:
http://www.n3kl.org/sun/noaa.html
Edit: I just looked at the original post timestamp and it was 2004-10-27. There wasn't any major geomagnetic activity on that day:
ftp://ftp.sec.noaa.gov/pub/plots/2004_plots/kp/20041027_kp.gif
bizness 11-10-2004, 09:44 PM Listen to what I have done in the past....
Drive one might not be 100% dead.....
Do the following:
Find ANOTHER drive IDENTICAL to drive one.... NOW, there are two approaches...
First approach... if the drive spins up fine and doesn't have what is called the click of death sound, simply replace the controller board of the hard drive itself and it will probably fix your problem....
HOWEVER... if you do hear the click of death, you will need to swap the drive platters into another DRIVE.... NOTE.... once the switch is done, don't waste any time and copy the data from the drive to another NEW drive, as the old drive is a ticking time bomb: it was exposed to regular AIR and the hard disk cavity now contains dust and humidity... Humidity will cause oxidation of the platters and dust can cause craters.... Think of an asteroid hitting the earth kind of thing, where the asteroid is the dust and the earth is the platter....
NOTE: When you exchange platters, make sure you do it in a closed room under some type of hood to block any type of wind or breathing on it....
What I do to get around this is I have a chamber I made especially for these kinds of jobs, as I restore drives for doctors.... Simply go to Walmart and purchase a 10-20 gallon fish tank... then purchase some kitchen/bathroom gloves.... make a circular hole in the tank's top (acrylic) and attach the gloves to it in a fashion that you can put your hands through them.... seal the gloves to the tank by using some caulking and an acrylic O-ring... make another hole that you can seal a vacuum cleaner hose to, and attach and seal one to it.... finally, make sure that the seal of the tank is secure.... and voilà, you have an airless vacuum condition to proceed with care...
This is sorta the idea and what it's supposed to look like.... http://www.opri.net/assets/images/autogen/a_Vacuum_Hood.jpg
Good Luck.
porcupine 11-10-2004, 09:57 PM You've got guts, but read up. We ended up forking over the cash to a professional as Vito couldn't risk it :(. Needless to say, neither fix would have worked in this case (p-list table corruption on the actual platter, just needed to replace the firmware).
Swapping the circuitry *assumedly* would have dumped in a new firmware, with a new and revised p-list (if they do indeed get changed every 2 weeks or so, as we were told), which in turn would have led to the drive heads seeking slightly off the proper sectors, possibly/probably destroying the data in this case.
bizness 11-10-2004, 10:22 PM I have done it many times... The important part is to get a drive that is identical in terms of everything.... if you get an identical drive, the sector counts will be the same and there will be no misalignment... Have done this MULTIPLE times in this kind of situation... with no problems at all.
porcupine 11-10-2004, 10:29 PM Originally posted by bizness
I have done it many times... The important part is to get a drive that is identical in terms of everything.... if you get an identical drive, the sector counts will be the same and there will be no misalignment... Have done this MULTIPLE times in this kind of situation... with no problems at all.
Not to argue, as I know very little about how it's done (granted, I have an in-depth understanding of how they work). I was told the p-list table in the firmware was corrupted; this table is used to make minute adjustments to recover from small factory defects by fine-tuning the location of the heads, how far they sweep, etc.
The tech indicated this was stored on the platter in the firmware, and thus swapping the circuitry would do nothing. Based on that, they over-wrote the firmware with a version they had in their database pulled from an identical drive months ago, as they keep that on record, and it worked from there on in.