Web Hosting Talk

Web Hosting Talk (http://www.webhostingtalk.com/index.php)
-   Providers and Network Outages and Updates (http://www.webhostingtalk.com/forumdisplay.php?f=59)
-   -   Colo4 Service Entrance 2 issue (http://www.webhostingtalk.com/showthread.php?t=1072692)


freethought 08-10-2011 03:38 PM

Is the power out on both the A side and the B side, or is there a network issue because a device somewhere upstream of you only sources power from one of the independent feeds?

Tobarja 08-10-2011 03:39 PM

Quote:

Originally Posted by teh_lorax (Post 7620022)
If your server is down, it's because your host (or YOU) didn't opt for A/B power. It's an option for... well... when things like this happen!

Quote:

Originally Posted by teh_lorax (Post 7620035)
And you could have that uptime, IF you opted for A/B power. It's pretty simple.

Some customers are reporting being down even with A/B power.

Quote:

Originally Posted by tmax100 (Post 7620025)
Can you please tell me what happened to backup power system? Clients want explanation.

My understanding is that grid power and backup power both run into this ATS. The ATS is broken, so NO power is passing through it, grid or backup. Am I mistaken?

Quote:

Originally Posted by wynnyelle (Post 7620062)
Why isn't it mirrored for disaster recovery? You know, since this is a disaster.

Why didn't you mirror it to another datacenter for disaster recovery?

boskone 08-10-2011 03:40 PM

We can't tell if our machines are powered or not because, as with colo4.com itself until recently, there is no network route past the colo4 edge.

icoso 08-10-2011 03:42 PM

http://www.oncor.com/community/outages/#
https://maps.oncor.com/summary.asp

Formas 08-10-2011 03:43 PM

Quote:

Originally Posted by Colo4-Paul (Post 7620104)
Just wanted to give a quick update that was just given to the customers on site:


We have determined that the repairs for the ATS will take more time than anticipated, so we are putting into service a backup ATS that we have on-site. We are working with our power team to safely bring the replacement ATS into operation. We will update you as soon as we have an estimated time that the replacement ATS will be online.

Later, once we have repaired the main ATS, we will schedule an update window to transition from the temporary power solution. We will provide advance notice and timelines to minimize any disruption to your business.

Again, we apologize for the loss of connectivity and impact to your business. We are working diligently to get things back online for our customers.

LOL. Sorry for the LOL, Paul, but I read this same post at https://accounts.colo4.com/status/ about an hour ago.

Seems that an hour has passed and nothing was done.

Dedicatedone 08-10-2011 03:44 PM

We get it, everybody is angry. I, like the majority of the people following this thread, am following it to get updates from Colo4, not to read about your frustration. It's business: you can only hope for the best and plan for the worst. We are currently working on lighting up another facility in Toronto to provide data center redundancy for our cloud clients. I wish we already had this in place, but it happens; welcome to the tech world.

We're all in the same boat here, but let's please keep the posts to a minimum so we can concentrate on updates from Colo4. I hope everything is back up as soon as possible for all of us.

teh_lorax 08-10-2011 03:45 PM

Quote:

Originally Posted by boskone (Post 7620108)
We, for one, have full A/B - we've been down in dallas the whole time.

Suggesting having A/B will 'fix' this is ignorance of the facts.

It's likely due to scenarios like this:

Quote:

Originally Posted by UH-Matt (Post 7619935)
It is a shame for us as we have an original old rack in the old facility with only a handful of servers left in it.

We have a *new* rack in the new "unaffected" facility, and over 30 servers in this rack are down because apparently the network (which c4d setup) is a cross connect to our old rack, rather than its own network. So we have 30 servers in the unaffected building down and nothing that can be done which sucks quite a lot.


MetworkEnterprises 08-10-2011 03:45 PM

Quote:

Originally Posted by teh_lorax (Post 7620035)
And you could have that uptime, IF you opted for A/B power. It's pretty simple.

And now it seems that even though servers would have power if their owners opted and paid for A/B, colo4's own routers are not on A/B, so A/B customers would still... well... be down. Epic.

freethought 08-10-2011 03:46 PM

Quote:

Originally Posted by slade (Post 7620116)
My understanding is grid power and backup power run into this ATS. The ATS is broken, so NO power is passing through it grid or backup. Am I mistaken?

That's correct, the ATS acts as an automated switchover between the grid/utility/mains (whatever you want to call it) power and the backup power from the generator(s). It sits in front of the UPS and if the power on one input fails (or is no longer providing the right voltage and frequency etc. such as in a brownout) then it disconnects that input and switches over to the other input.

It has a couple of slightly complicated things to do: signal the auto-start panel for the generator, wait for the generator power output to stabilise, and then make sure that one power input is completely disengaged before the other is engaged, so that you don't short out hundreds of kilowatts or even several megawatts from two different power sources!
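The switchover sequence described above can be sketched as a toy controller loop. This is a rough illustration only; the nominal values, tolerance, and function names are all made up, not taken from any real ATS controller:

```python
import time

NOMINAL_V, NOMINAL_HZ = 480.0, 60.0  # illustrative three-phase service values
TOLERANCE = 0.10                     # accept +/-10% deviation (assumed threshold)

def in_spec(volts, hz):
    """True if a feed is within tolerance of nominal voltage and frequency."""
    return (abs(volts - NOMINAL_V) / NOMINAL_V <= TOLERANCE
            and abs(hz - NOMINAL_HZ) / NOMINAL_HZ <= TOLERANCE)

def transfer_sequence(utility, generator):
    """Break-before-make transfer, following the steps described above.

    `utility` and `generator` are callables returning (volts, hz) samples.
    Returns the ordered list of actions the controller would take.
    """
    actions = []
    if in_spec(*utility()):
        return ["stay on utility"]
    actions.append("signal generator auto-start panel")
    # Wait for the generator output to stabilise before touching any breaker.
    while not in_spec(*generator()):
        time.sleep(0)  # placeholder for a real polling delay
    # Critically: fully disengage one source before engaging the other,
    # so the two supplies are never paralleled.
    actions.append("open utility breaker")
    actions.append("close generator breaker")
    return actions

# Simulated samples: utility is in a brownout, generator comes up clean.
print(transfer_sequence(lambda: (380.0, 60.0), lambda: (480.0, 60.0)))
```

The break-before-make ordering is the whole point: the two appends at the end must never be swapped, which is exactly why bypassing or hot-swapping an ATS is dangerous.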

xolotl 08-10-2011 03:47 PM

Edit: Following Dedicatedone's lead, snarkiness removed...

boskone 08-10-2011 03:48 PM

The colo4 network devices at the edge / border and core of our range are offline / unreachable. Packets aren't getting anywhere near our racks.

teh_lorax 08-10-2011 03:49 PM

Quote:

Originally Posted by Formas (Post 7620130)
LOL. Sorry for the LOL, Paul, but I read this same post at accounts.colo4.com/status/ about an hour ago.

Seems that an hour has passed and nothing was done.

Yeah, I'm sure they're all sitting around playing Sporcle and laughing.

boskone 08-10-2011 03:51 PM

Power issues at this scale are both complex and lethal.

Everyone needs to take a deep breath and remember that colo4 are the experts, have everyone they need onsite and will fix this.

sunpost 08-10-2011 03:52 PM

Quote:

Originally Posted by Colo4-Paul (Post 7620104)
Just wanted to give a quick update that was just given to the customers on site:


We have determined that the repairs for the ATS will take more time than anticipated, so we are putting into service a backup ATS that we have on-site. We are working with our power team to safely bring the replacement ATS into operation. We will update you as soon as we have an estimated time that the replacement ATS will be online.

Later, once we have repaired the main ATS, we will schedule an update window to transition from the temporary power solution. We will provide advance notice and timelines to minimize any disruption to your business.

Again, we apologize for the loss of connectivity and impact to your business. We are working diligently to get things back online for our customers.

What is preventing you from by-passing the ATS, rather than taking the time to install a backup ATS?

UH-Matt 08-10-2011 03:53 PM

Quote:

Originally Posted by Colo4-Paul (Post 7620104)
We will update you as soon as we have an estimated time that the replacement ATS will be online..

Could we please get this estimate, even if it's a very rough estimate, so we have something to aim for? Is this likely to be 30 minutes, 300 minutes or 3000 minutes away?

boskone 08-10-2011 03:53 PM

An ATS can't really be bypassed without endangering the systems.

RDx321 08-10-2011 03:56 PM

Any updates on the ETA?

andryus 08-10-2011 03:57 PM

4 hours of downtime. Please, we need an ETA!

Xtrato 08-10-2011 04:00 PM

So they have:

Quote:

Thank you for your patience as we work to address the ATS issue with our #2 service entrance. We apologize for the situation and are working as quickly as possible to restore service.

We have determined that the repairs for the ATS will take more time than anticipated, so we are putting into service a backup ATS that we have on-site as part of our emergency recovery plan. We are working with our power team to safely bring the replacement ATS into operation. We will update you as soon as we have an estimated time that the replacement ATS will be online.

Later, once we have repaired the main ATS, we will schedule an update window to transition from the temporary power solution. We will provide advance notice and timelines to minimize any disruption to your business.

Again, we apologize for the loss of connectivity and impact to your business. We are working diligently to get things back online for our customers. Please expect another update within the hour.

mindnetcombr 08-10-2011 04:00 PM

I bet on 4 or 5 more hours of downtime, easy.

I have 23 servers offline.

Eleven2 Hosting 08-10-2011 04:00 PM

Why are they just putting us up for a short time and then going to take it back offline again? This makes no sense. Make the final fix now and don't create a second outage.

Dedicatedone 08-10-2011 04:02 PM

They don't have the equipment right now. Put something up now and get everybody online, then work on a permanent solution when you have a proper plan in place, not while you're in an emergency situation.

user5151 08-10-2011 04:03 PM

Quote:

Originally Posted by boskone (Post 7620151)
The colo4 network devices at the edge / border and core of our range are offline / unreachable. Packets aren't getting anywhere near our racks.

I'm glad we're not the only ones that see this... we've posted this twice in the forum to Paul, and to our colo4 ticket... 65.99.244.225 is a colo4 HSRP router in front of our firewalls and equipment in colo4, and that colo4 HSRP isn't responding to a ping. So it appears that at least this colo4 HSRP router, which is part of their premium managed services, isn't on their own A/B solution either.

Also... to clarify some confusion...

utility A ---\
              >--- ATS-A ---- your equipment
generator A -/

utility B ---\
              >--- ATS-B ---- your equipment
generator B -/
This is what colo4's been referencing... the building that is affected has two "service entrances", two points at which they deliver power into the affected building (their other building, with 4 utility entrance points, is not affected).

If you pay for A/B service, you have 2 circuits to your rack that are serviced separately by these A/B service entrances.

If you're not paying for A/B service, you may still have 2 circuits at your rack... but they will trace back to the same service entrance.... so with the service entrance's ATS down, you're completely down.

My problem is that even with A/B, it doesn't matter if the HSRP router (managed by colo4) isn't online either... my equipment could be powered up on the B power circuit, but you can't reach it, because colo4's HSRP router is down... presumably because IT isn't on an A/B service. Still no answer from them on this... but I'm apparently not the only colo4 customer who sees this same scenario, based on similar updates here.

Also - colo4 just updated their site:
----------------------------------------
Current Update

Thank you for your patience as we work to address the ATS issue with our #2 service entrance. We apologize for the situation and are working as quickly as possible to restore service.

We have determined that the repairs for the ATS will take more time than anticipated, so we are putting into service a backup ATS that we have on-site as part of our emergency recovery plan. We are working with our power team to safely bring the replacement ATS into operation. We will update you as soon as we have an estimated time that the replacement ATS will be online.

Later, once we have repaired the main ATS, we will schedule an update window to transition from the temporary power solution. We will provide advance notice and timelines to minimize any disruption to your business.

Again, we apologize for the loss of connectivity and impact to your business. We are working diligently to get things back online for our customers. Please expect another update within the hour.
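The failure mode described in the post above (a powered rack behind a dead shared gateway) can be modelled as two separate checks, power and reachability. A toy model, following the A/B description in the thread; the entrance states and feed labels are illustrative:

```python
# Which service entrances are live (entrance A's ATS has failed, per the thread).
ENTRANCE_UP = {"A": False, "B": True}

def powered(feeds):
    """A device stays up if any of its power circuits traces to a live entrance."""
    return any(ENTRANCE_UP[f] for f in feeds)

def reachable(rack_feeds, gateway_feeds):
    """Traffic only reaches a rack if BOTH the rack and the shared upstream
    gateway (e.g. a provider-managed HSRP router) have power."""
    return powered(rack_feeds) and powered(gateway_feeds)

# A+B rack behind a gateway wired to entrance A only: powered, but unreachable.
print(reachable(rack_feeds=["A", "B"], gateway_feeds=["A"]))       # False
# The same rack behind a gateway that is itself on A+B: reachable.
print(reachable(rack_feeds=["A", "B"], gateway_feeds=["A", "B"]))  # True
```

The point of the model: paying for A/B at the rack only helps if every shared device upstream of you is also on A/B.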

Xtrato 08-10-2011 04:04 PM

Quote:

Originally Posted by Eleven2 Hosting (Post 7620186)
Why are they just putting us up for a short time and then going to take it back offline again? This makes no sense. Make the final fix now and don't create a second outage.

I think doing the final fix right now would extend the outage... they're putting in place a temporary ATS to bring all the servers online. Perhaps they will schedule a maintenance window during slow traffic hours; peak hours like now are very, very inconvenient for everybody, especially someone like mindnetcombr with 23 servers offline...

SH-Sam 08-10-2011 04:07 PM

9 servers offline for me right now- got a flood of support tickets!

Hopefully this will be resolved soon!

wynnyelle 08-10-2011 04:08 PM

Quote:

Originally Posted by freethought (Post 7620103)
You simply can't provide complete redundancy on a single power line, there always has to be a device somewhere that is a single point of failure. The laws of physics demand it.

If you could provide full redundancy on a single feed, there would be no reason to have dual feeds.


They only have one power line?

user5151 08-10-2011 04:13 PM

Quote:

Originally Posted by wynnyelle (Post 7620197)
They only have one power line?

No, they have 6 power lines: 2 into this building and 4 into their other building (which isn't affected).

In the building that's affected, the ATS for ONE of those power connections (service entrances) has failed. So everyone whose racks draw power from that connection is down UNLESS they paid for an apparently optional/upgrade service to have 2 circuits at their rack, serviced SEPARATELY by the two separate power connections (service entrances).

service entrance = where the power company comes into the building.

freethought 08-10-2011 04:14 PM

Quote:

Originally Posted by wynnyelle (Post 7620197)
They only have one power line?

In this case I was referring to "power line" as the A-side supply in the Colo4 facility, since jumpoint was calling it a "line".

There are multiple power lines from the utility company to the facility itself. That isn't the issue here.

wynnyelle 08-10-2011 04:15 PM

Quote:

Originally Posted by user5151 (Post 7620210)
No, they have 6 power lines: 2 into this building and 4 into their other building (which isn't affected).

In the building that's affected, the ATS for ONE of those power connections (service entrances) has failed. So everyone whose racks draw power from that connection is down UNLESS they paid for an apparently optional/upgrade service to have 2 circuits at their rack, serviced SEPARATELY by the two separate power connections (service entrances).

service entrance = where the power company comes into the building.

Then they need to switch to the power lines that are working.

media r 08-10-2011 04:21 PM

Paul, could we please get another update? I'd just like something to send my customers to let them know progress is still being made. A simple status update would be just fine.

freethought 08-10-2011 04:21 PM

Quote:

Originally Posted by wynnyelle (Post 7620220)
Then they need to switch to the power lines that are working.

You can't just "switch" a megawatt of power around like that. The supporting power infrastructure is large and complex and not something that should be played around with on a whim without any proper planning, never mind the fact that working with this level of electricity means one slip and you're dead.

Customers that paid for full A+B redundancy still have power from the B side. Customers that opted to only take a single power feed from the A side will have to wait until the problem with the A side is repaired.

wynnyelle 08-10-2011 04:26 PM

Well I guess that explains how my host screwed up. They made a bad decision that's going to give me some serious food for thought.

boskone 08-10-2011 04:27 PM

That's false info.

Lots of hosts with A/B are down in the affected DC.

Colo4's own core network is also down (which, one would imagine, uses A+B power).

StevenMoir 08-10-2011 04:27 PM

Beginning of the end?
 
Servers have been down for more than 6 hours, and our customers are getting agitated. Is this the beginning of the end for all of us and Colo4?

Steve

soniceffect 08-10-2011 04:29 PM

Just saw a Twitter post from someone who got off the phone with colo4. Provisional ETA of 1800 CST (around 2.5 hours)... Ouch.

Garfieldcat5 08-10-2011 04:38 PM

Quote:

Originally Posted by wynnyelle (Post 7620004)
I already threatened to quit my webhost for this. He's said he's quitting colo4 now though so we shall see.

I'd think twice about that, they have a pretty stiff ETF.

DomineauX 08-10-2011 04:38 PM

Quote:

Originally Posted by StevenMoir (Post 7620255)
Servers are down for more than 6 hours; Our customers are getting agitated; Is this the beginning of the end for all of US and COLO4?

Steve

Should only be 4.5 hours down so far.

bear 08-10-2011 04:40 PM

Quote:

Originally Posted by DomineauX (Post 7620284)
Should only be 4.5 hours down so far.

Yup, our bells started ringing about 12:03 Eastern. 4:40 now.

wynnyelle 08-10-2011 04:44 PM

Doesn't matter, if they offered a solution and my host didn't take that solution then my host had no place advertising themselves to me as having any sort of proof against this kind of thing. Someone's to blame. I wish I really knew who so I could form a plan of action.

iwod 08-10-2011 04:45 PM

Quote:

Originally Posted by teh_lorax (Post 7619977)
Do you people who are screaming for an ETA really not understand how this stuff works? Really? You're IT people, right?

You make your FIRST post since joining in 2005... Welcome :D:D:D

I think most people would rather have Colo4 lie to them and say they need 8 hours to get it repaired than know nothing.

Although from a client/customer perspective, 8 hours of fixing and not telling them anything makes no difference... Those who leave will leave...

wynnyelle 08-10-2011 04:47 PM

Around noon is when we went down. It was shaping up to be a good, brisk day on my site too. It's become the year's worst disaster.

Patrick 08-10-2011 04:47 PM

Most people will stay. Let's be realistic here, it's not like Colo4 has power outages every month. Yes, there have been a few DDoS attacks in the last couple of months that affected network stability but hardly anything to worry about. People panic and freak out when they start losing money, understandable, and when it comes back online most people will move on and put this behind them.

iwod 08-10-2011 04:48 PM

Quote:

Originally Posted by soniceffect (Post 7619996)
As a customer of one of the webhosts on here I can understand your frustration. I have called my host about 10 times trying to get information and sent emails. However in the same respect as it has been said above, cant always give a timeframe because sometimes ya just dont know.

I think people should all just log into WHT; we get information here much faster :)

FideiCrux 08-10-2011 04:49 PM

I find it interesting that people are knocking the data-center for its redundancy, with complaints about clients leaving and businesses failing because they can't receive e-mail and such. Where are your redundancy plans?

Contingencies need to be planned for businesses as well, not just data-centers. Need your e-mail even when your data-center gets DDoS'd or the power goes out? Set up a backup domain.

Just as one can't control the weather but still has to make it into work when it snows (or not get paid for that day), plan for disasters. -.-

JDonovan 08-10-2011 04:50 PM

Downage
 
We have A/B power on multiple servers and they are all still down. We are in the older facility. Even if we didn't have this addon, Colo4's statement regarding this was a poor excuse.

We will be leaving Colo4 after this fiasco. There is no way an outage should last this long from a power failure; that's why you have backup plans. Acts of God are understandable, but not this.

There are some major clients who are affected by this right now. An example is Radiant Systems. Tens of thousands of restaurants have major parts of their systems down. Other major clients are experiencing the same thing.

Patrick 08-10-2011 04:50 PM

Quote:

Originally Posted by FideiCrux (Post 7620310)
I find it interesting that people are knocking the data-center for their redundancy. Complaints about clients leaving and businesses failing cause of not being able to receive e-mails and such. Where are your redundancy plans?

Contingencies need to be planned for in businesses as well, not just data-centers. Need your e-mail just in case your data-center gets DDoS'd or Power goes out. Setup a backup domain.

Just as one can't plan for the weather, make it into work when it snows for that day (thus not getting paid), plan for disasters. -.-

Take your logic and go elsewhere! This is WHT, logic often gets thrown out the window. :D

soniceffect 08-10-2011 04:50 PM

Mine is back up!

layer0 08-10-2011 04:50 PM

Quote:

Originally Posted by wynnyelle (Post 7620299)
Around noon is when we went down. It was shaping up to be a good, brisk day on my site too. It's become the year's worst disaster.

If a few hours of downtime is the year's worst disaster for you, then you really need to look into investing in geographic redundancy. To people that want to move - you can keep moving from provider to provider every time there's an outage, it's really not going to do you much good.

An event like this is a rare occurrence at Colo4 and not something I'd pack up and leave over.

FideiCrux 08-10-2011 04:53 PM

Quote:

Originally Posted by Patrick (Post 7620312)
Take your logic and go elsewhere! This is WHT, logic often gets thrown out the window. :D

WTB Logic!!!

ASIMichael 08-10-2011 04:55 PM

We have been a customer of colo4 for over 4 years now. We are HAPPY. This was a mechanical electric switch (ATS) failure. If customers have A-B service feeds electrically to their cabinets AND wired correctly to the servers (dual power supplies), they are working. I DO. I paid for it.

Folks don't think of WHAT IF.
asimb

wynnyelle 08-10-2011 04:55 PM

Quote:

Originally Posted by JDonovan (Post 7620311)
We have A/B power on multiple servers and they are all still down. We are in the older facility. Even if we didn't have this addon, Colo4's statement regarding this was a poor excuse.

We will be leaving Colo4 after this fiasco. There is no way an outage should last this long from a power failure, that's why you have backup plans. Acts of God are understandable but not this.

There are some major clients who are affected by this right now. An example is Radiant Systems. Tens of thousands of restaurants have major parts of their systems down. Other major clients are experiencing the same thing.

Now I'm wondering if it's my host or Colo4 who is at fault in this. I just want the truth.

Patrick 08-10-2011 04:57 PM

Quote:

Originally Posted by wynnyelle (Post 7620330)
now i'm wondering if it's my host or Colo4 who is at fault in this. I just want the truth.

What do you mean? It's Colo4's fault... your host doesn't control the power infrastructure there. Even if they paid for A/B dual power feeds, it doesn't mean they would have service by now... there are some people who have dual feeds that are offline.

FideiCrux 08-10-2011 04:57 PM

Wikipedia on High Availability...

Please note the following...

"Availability is usually expressed as a percentage of uptime in a given year."
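That percentage maps directly onto allowed downtime per year. A quick illustration; the 4.5-hour outage figure comes from earlier in the thread, the rest is plain arithmetic:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_min(availability_pct):
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% uptime -> {allowed_downtime_min(pct):.0f} min/year")

# A single 4.5-hour outage, by itself, caps the year's availability at:
outage_min = 4.5 * 60
print(f"{100 * (1 - outage_min / MINUTES_PER_YEAR):.3f}%")
```

In other words, today's outage alone already blows through a 99.99% SLA (about 53 minutes/year) and most of a 99.9% one (about 526 minutes/year).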

wynnyelle 08-10-2011 04:58 PM

That's right. I was going on what someone said before: that if my host or any client had paid for A/B, they would not be suffering this loss right now. Come to find out that isn't true.

nightmarecolo4 08-10-2011 05:01 PM

Paul, do you think we will be back online within the next 2 hours? Can you ask the engineers?

wynnyelle 08-10-2011 05:02 PM

Any word yet? We're waiting. And I'm moving past where I can keep waiting. I have to get my site up, however I do that.

MtnDude 08-10-2011 05:02 PM

On geographical redundancy
 
On a web server serving large amounts of static data, what is the best way to achieve geographical redundancy? Is round-robin DNS the way to go? What other alternatives exist? A load balancer would still be a single point of failure.
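Round-robin DNS is just multiple A records for one name, and clients pick among them and retry. A minimal client-side sketch of the failover behaviour it gives you; the IPs below are documentation addresses and the function name is made up:

```python
import socket

def first_reachable(addresses, port=80, timeout=2.0):
    """Try each address in DNS order; return the first that accepts a TCP
    connection, or None. Roughly the retry behaviour browsers apply to
    multi-A-record (round-robin) names."""
    for addr in addresses:
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return addr
        except OSError:
            continue
    return None

# In real use the address list would come from DNS, e.g.:
#   addresses = [ai[4][0] for ai in socket.getaddrinfo("example.com", 80)]
print(first_reachable(["203.0.113.10", "198.51.100.10"], timeout=0.5))
```

Note that plain round robin never removes a dead IP from the answer set; the failover comes entirely from client retries. Health-checked DNS with low TTLs, anycast, or a CDN in front are the usual ways to improve on it.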

bostongio 08-10-2011 05:05 PM

Use a CMS like WordPress that can host cached content on a cloud service like AWS. Very inexpensive, and it works automatically.

xnpu 08-10-2011 05:05 PM

Quote:

Originally Posted by MtnDude (Post 7620357)
On a web server serving large amounts of static data, what is the best way to achieve geographical redundancy? Is Round robin DNS the way to go? What other alternatives exist? A load balancer would still be a single point of failure.

I think you'd better open a separate topic for that.

cartika-andrew 08-10-2011 05:05 PM

I am not going to leave a provider over this - these things happen. We have been dealing with colo4 for years and have had good results for a long time. Stuff happens.

Having said this, I am a little upset at Paul over his comments here. This power outage is clearly something colo4 needs to deal with. Insinuating that customers are at fault for not having A+B power feeds is not reasonable. Firstly, Colo4's rates for A+B are not really in line with other facilities we work with. Secondly, we have A+B on some of our infrastructure with colo4, and some of it is up, yes, but some of it is also down. What this means is that some PDUs are serviced from the same power plant; so even though we have A+B protection on parts of our infrastructure, the feeds are coming from different PDUs (but the same power plant) and we are still down.

I just do not think it is appropriate to suggest that this may be colo4's customers' fault because they don't have A+B. This is something that should have been discussed with your customers, not publicly. I am now answering questions about where we have A+B and where we don't and why, and frankly, the issue is that colo4 lost a power plant. Let's try and remember that.


All times are GMT -4. The time now is 09:35 AM.

Powered by vBulletin
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
User Alert System provided by Advanced User Tagging (Pro) - vBulletin Mods & Addons Copyright © 2014 DragonByte Technologies Ltd.
© WebHostingTalk, 1998 - 2014. All Rights Reserved.