I've been calling their NOC for updates since about 7am. Our servers started having problems at about 3:45am, and at 6am we went into a complete outage. Up until then we were troubleshooting our own gear. When I called the NOC at about 7am, they said there was a peering problem with, or between, Rogers and Bell. The extent of the issue is unclear, but it's certainly an issue for DCC (and for us as well).
I've called in several times, and the last update I got was that they were trying to re-route. Since then I've noticed that services worked from my home Rogers connection but nowhere else (that was about 9am). Then about 10 minutes ago I couldn't connect from Rogers anymore, but I can from the US, so obviously there are still routing issues.
Do any other DCC customers have updates from their end they'd like to share?
I have many peeved clients calling me directly now, so the more info the better.
I'm at the data centre now; there are about 12 or more clients here, I think. DCC staff just told one of the guys that Bell injected a whole bunch of bad routes. No word yet on why it's not resolved, though.
After almost 18 years of managing BGP connections for ISPs, I've never seen this. This should be solvable quickly, and we should be getting updates.
The DCC people should have a customer communication plan in place for issues like this.
The fact that I cannot reach anyone for an update (nor reach their website or email, because their DNS is also hosted entirely internally) is bad news. Redundancy doesn't just mean multiple connections.
I started having connectivity problems around 2am, with full loss at 5am. According to my contract I am entitled to one day of service for each hour of downtime, so we are at roughly 11 days of service compensation so far.
Can anyone out there confirm that this is going to be resolved today?
Now that IS fascinating. My traceroutes from the US were showing hops with GT in the DNS name, and I immediately thought of GroupTelecom or something like that. But now that we appear to be back online, I don't see those hops anymore.
We are still checking. Some customers seem to be up, but this happened earlier today too, so we're not counting our chickens quite yet.
Another interesting issue: if this is a BGP problem, why has one of my ports on the DCC access switch bounced twice? It seems like someone is rebooting switches, but what would access-layer devices have to do with it? I still think they're making changes.
We have 2 sites and a VPS in the US for DR name services. Today, though, may have cost us our largest client. In the end we too need to beef up the other DC and get even more redundancy; at least our corporate services and email were up and running, so we could work today. Very frustrating. But the last place we left, over a year ago, had a 26-hour outage (4 hours of planned maintenance that went wrong) and no updates. I hope this will be a learning experience for EVERYONE who colos, and for the colos themselves.
Especially if this was due to a new upstream being brought online today (as per the press release). I'll hold judgement until I see the incident report and the rebate.