  1. #1

    Amazon EC2, Bitter Experience

    Hi,

    One of our clients has two Amazon instances running in the EU zone, both Windows 2003. Recently they had a hardware failure and the site was down for almost 5 days. Amazon won't update you even when it is a hardware failure on their side, and in order to get a response from them we need to purchase premium support. Very funny. Ticket response times were very poor. They advertise 99.95% uptime, but as per the SLA, hardware failures on individual instances are not covered. Has anybody had a similar experience?
    Ideamine Technologies
    http://www.ideaminetech.com, sales(a)ideaminetech.com
    Server Management|Outsourced Support|Web Development|Mobile Applications
    Skype: servernix|GTalk: servernix|AIM: servernix
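
To put the 99.95% figure next to the 5 days mentioned above, here is a rough sketch of the arithmetic. It assumes availability is measured over a 365-day year, which the thread does not confirm, and the function names are made up for illustration. As the post notes, the SLA does not cover individual instance failures, so this only shows the scale of the gap.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: availability measured over a 365-day year


def allowed_downtime_hours(sla_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime a given availability percentage permits over a period."""
    return period_hours * (1 - sla_percent / 100)


def achieved_availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability percentage actually achieved after a given amount of downtime."""
    return 100 * (1 - downtime_hours / period_hours)


budget = allowed_downtime_hours(99.95)   # downtime budget implied by 99.95%
actual = achieved_availability(5 * 24)   # availability after a 5-day outage
print(f"99.95% allows {budget:.2f} h/year; 5 days down gives {actual:.2f}%")
```

In other words, 99.95% over a year leaves a budget of a few hours, while a single 5-day outage on one instance drops that instance to roughly 98.6%.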

  2. #2
    Join Date
    Apr 2002
    Location
    Seattle, WA
    Posts
    947
    I have had issues with EBS and of course the big failure earlier this year. I'm not a fan; there are much better alternatives out there. Amazon has grown too big for their britches, so to speak.
    I <3 Linux Clusters

  3. #3
    Hi Brandon,

    And this was Amazon's reply after 6 hours:

    "I have taken a look at i-88f75bff and see that the underlying hardware is in a degraded state. As this instance is not an EBS-backed instance, you will need to launch a replacement instance from your most recent snapshot and terminate this instance. Unfortunately this can happen in an AWS environment, just as unforeseen hardware issues can occur in any physical environment. While we cannot give too many details about our architecture, degraded hardware essentially means that the underlying hardware, aka the host, may have some type of hardware issue, or other pending issue that is causing issues with your instance running on this host. Just as in any physical environment, such as a traditional datacenter, hardware failure is expected. The benefit to utilizing AWS is that we provide a scalable architecture that allows you to build redundant, fault tolerant systems. That said, while we provide the infrastructure, it is still up to our customers to architect their environment with these best practices in mind. "
    Ideamine Technologies
    http://www.ideaminetech.com, sales(a)ideaminetech.com
    Server Management|Outsourced Support|Web Development|Mobile Applications
    Skype: servernix|GTalk: servernix|AIM: servernix

  4. #4
    Join Date
    Jan 2011
    Location
    Canada
    Posts
    934
    "As this instance is not an EBS-backed instance..."

    That pretty much sums it up. Understand that when running in AWS, unless you are EBS-backed, you acknowledge that the VM can and will likely move to a different host when you stop the instance for any reason - whether it be a node failure, like you encountered, or even just by hitting stop.

    This also means you lose access to ALL your data that was on the old local HDD.

    I wouldn't say they're doing it wrong, but they are doing it differently from other providers, so care and planning do need to be involved when placing stuff on EC2 instances.

    If it is beyond your technical expertise to set up fault tolerance within AWS, you might want to consider Rackspace or a SAN-backed cloud provider. Those operate the closest to a traditional server.
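
One way to act on the EBS-backed point is to audit which instances still boot from local (instance-store) disk. The sketch below is only an illustration: it assumes you already have a dictionary shaped like an EC2 DescribeInstances response (`Reservations` > `Instances`, each with `InstanceId` and `RootDeviceType`), and the sample data, including the second instance ID, is hypothetical.

```python
from __future__ import annotations


def instance_store_backed(describe_response: dict) -> list[str]:
    """Return IDs of instances whose root device is not EBS-backed."""
    ids = []
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # RootDeviceType is "ebs" or "instance-store" in the EC2 API.
            if instance.get("RootDeviceType") != "ebs":
                ids.append(instance["InstanceId"])
    return ids


# Hypothetical sample in the shape of a DescribeInstances response.
sample = {
    "Reservations": [
        {"Instances": [{"InstanceId": "i-88f75bff", "RootDeviceType": "instance-store"}]},
        {"Instances": [{"InstanceId": "i-00000001", "RootDeviceType": "ebs"}]},
    ]
}

at_risk = instance_store_backed(sample)
print(at_risk)
```

Anything this returns loses its local disk when the host fails, so it needs a current snapshot or image to relaunch from.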

  5. #5
    We have won several clients recently that have had long outages with Amazon. In a way I guess we should be grateful, but the bigger picture is that they are the biggest player in the market yet they are starting to give cloud a bad name. The whole reason organisations are moving to the cloud is to take advantage of the increased reliability, so it's important for the industry that things are done properly.

    If you want reliability you need to look for cloud providers that publish the specifications & configuration of their platform and the service levels they have achieved. Unfortunately, most providers don't provide this level of information.
    ██ Enterprise Class Cloud Hosting And Disaster Recovery. SAN Replication.
    ██ VMware Hosting on HP Blades With NetApp or EqualLogic SAN Storage. 100% Guaranteed Uptime.
    ██ Build Your Own Virtual DataCentre In The Cloud. Fully Integrated With vCenter.
    ██ StratoGen Are An Authorised VMware Partner | StratoGen.net

  6. #6
    Join Date
    Mar 2010
    Location
    Germany
    Posts
    646
    Quote: Originally posted by Stratogen
    We have won several clients recently that have had long outages with Amazon. In a way I guess we should be grateful, but the bigger picture is that they are the biggest player in the market yet they are starting to give cloud a bad name. The whole reason organisations are moving to the cloud is to take advantage of the increased reliability, so it's important for the industry that things are done properly.

    If you want reliability you need to look for cloud providers that publish the specifications & configuration of their platform and the service levels they have achieved. Unfortunately, most providers don't provide this level of information.
    I don't think they're giving cloud a bad name. Coming from an HA environment where not a single data loss could be accepted in 20 years, I can say that cloud had *exactly* this bad rep with the people there, even before the failures happened. We chatted about what would have to change to bring matters forward in cloud environments, and so far none of the big providers are tackling those issues.
    For example, the "SAN" storage most cloud providers are proud of using is what we used for LAB systems!

    And in general, when you need good IT hosting and aren't prepared to run really *instanced* applications, you should go to someone specialized in HA hosting, i.e. like the guy I'm replying to, hehe.


    Amazon is just an online store!!!!!! They have done a very good job on their cloud platform, but it's not intended or fit for running mission critical apps. (and no, reddit and other sites that failed are not "mission critical" hehe)

    If someone has the budget to build his app around the cloud and it's all designed well enough to run on multiple different EC2-compatible clouds, then I'm quite sure this site will exceed anything that normal IT shops can do, but not by putting a few Windows servers on normal Amazon EC2 instances. (Not S3, not EBS, just local disk ... oh well.)
    Check out my SSD guides for Samsung, HGST (Hitachi Global Storage) and Intel!

  7. #7
    Quote: Originally posted by wartungsfenster
    I don't think they're giving cloud a bad name. Coming from an HA environment where not a single data loss could be accepted in 20 years, I can say that cloud had *exactly* this bad rep with the people there, even before the failures happened.
    Do you really think cloud has a bad name? The cloud is dominating hosting in 2011, so clearly a lot of people have bought into the idea, and there are real business benefits.

    I guess the issue here is not about cloud per se, but the fact a hosting provider like Amazon can have a 5 day outage. Any company with that amount of downtime is going to find it hard to win new customers.
    ██ Enterprise Class Cloud Hosting And Disaster Recovery. SAN Replication.
    ██ VMware Hosting on HP Blades With NetApp or EqualLogic SAN Storage. 100% Guaranteed Uptime.
    ██ Build Your Own Virtual DataCentre In The Cloud. Fully Integrated With vCenter.
    ██ StratoGen Are An Authorised VMware Partner | StratoGen.net

  8. #8
    Join Date
    Dec 2001
    Location
    Atlanta
    Posts
    4,419
    Quote: Originally posted by Stratogen
    Do you really think cloud has a bad name? The cloud is dominating hosting in 2011, so clearly a lot of people have bought into the idea, and there are real business benefits.

    I guess the issue here is not about cloud per se, but the fact a hosting provider like Amazon can have a 5 day outage. Any company with that amount of downtime is going to find it hard to win new customers.
    I would agree with this. But I would add that clients need to understand what they need and what they are getting from each service out there. Offerings like the ones you and I have are not comparable to Amazon's, but maybe not everyone needs them either. As long as people get what they need, they should be happy.


    If you need something more stable and secure for your business, then you may not want to look at Amazon unless you can tolerate issues from time to time. If it's just cheap, hourly usage and you can be down from time to time, then it's probably OK.
    Dedicated Servers
    WWW.NETDEPOT.COM
    Since 2000

  9. #9
    Join Date
    Jan 2011
    Location
    Canada
    Posts
    934
    Quote: Originally posted by Stratogen
    I guess the issue here is not about cloud per se, but the fact a hosting provider like Amazon can have a 5 day outage. Any company with that amount of downtime is going to find it hard to win new customers.
    For the record, the outage affected only a single region (US-East). Specifically, only one availability zone (AZ) in that region was down, while the other AZ suffered overloading/capacity problems as the majority of the affected users tried to spool up instances in the remaining AZ in that region. US-West was still running fine the whole time.

    I figured adding color was appropriate since I keep seeing the same regurgitated headline version of the event, with 'outage' used in the sense that the entire service was down or unavailable. In reality, it's more complicated and was analogous to a DC being down in the traditional sense.

    I'm sure that as an HA expert, you do set up systems that span multiple DCs and don't base your SLA on just one geographical area.


    I agree with wartungsfenster - reddit is not a mission critical app. To further this discussion, here's a quick link to High Scalability's big list of articles related to the incident. You'll find that quite a few people survived the 'outage' just fine.

    http://highscalability.com/blog/2011...on-outage.html

  10. #10
    Quote: Originally posted by tchen
    For the record, the outage affected only a single region (US-East). Specifically, only one availability zone (AZ) in that region was down, while the other AZ suffered overloading/capacity problems as the majority of the affected users tried to spool up instances in the remaining AZ in that region. US-West was still running fine the whole time.
    But surely it's not about which platform went down - it's the 5 days to fix it that will be of concern.
    ██ Enterprise Class Cloud Hosting And Disaster Recovery. SAN Replication.
    ██ VMware Hosting on HP Blades With NetApp or EqualLogic SAN Storage. 100% Guaranteed Uptime.
    ██ Build Your Own Virtual DataCentre In The Cloud. Fully Integrated With vCenter.
    ██ StratoGen Are An Authorised VMware Partner | StratoGen.net

  11. #11
    Join Date
    Jan 2011
    Location
    Canada
    Posts
    934
    Quote: Originally posted by Stratogen
    But surely it's not about which platform went down - it's the 5 days to fix it that will be of concern.
    Actually, it depends greatly on the platform. If I was on a dual-DC service, then yes, 5 days of exposure is way too much. On a multi-DC service where each region is set up to be fully independent of the others (and within which each AZ is designed to be independent of the others), the 5 days shouldn't even be a concern.

    And about the 5 days....

    http://aws.amazon.com/message/65648/

    * April 21 - outage occurs
    * April 22 - 97.8% of EBS volumes restored in 9 hrs, 13% of which were still in stuck mode due to backplane capacity issues.
    * April 23 - expanded backplane installed for EBS, unsticking EBS volumes.
    * April 24 - manual restoration of backups from S3 for the remaining 2.2% of volumes.

    Ultimately, 0.07% of the volumes could not be restored in a consistent state. There is the timeline. You can compare that against other outages and make your decision there. Of course, the '5-day' outage does make for a better cover story though.
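
The point above about independent regions and AZs can be made concrete with a small calculation. This is only a sketch under a strong assumption, namely that sites fail independently of each other, and the 99.95% per-site figure is reused from earlier in the thread purely as an illustrative input.

```python
def combined_availability(per_site: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up."""
    return 1 - (1 - per_site) ** sites


def expected_downtime_hours(availability: float, period_hours: float = 365 * 24) -> float:
    """Expected hours per period during which no site is up."""
    return period_hours * (1 - availability)


for n in (1, 2, 3):
    a = combined_availability(0.9995, n)
    print(f"{n} site(s): availability {a:.10f}, "
          f"about {expected_downtime_hours(a):.4f} h/year with everything down")
```

Each extra independent site multiplies the chance of total failure by the single-site failure probability, which is why one AZ being degraded for days matters far less to a deployment that spans several.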

  12. #12
    The biggest concern is the attitude towards this type of hardware issue. Amazon is selling servers just like mobile phones or TVs, and support is just like purchasing an extended warranty. They should realize that web hosting is more than that, and that the business is very critical for customers.
    Ideamine Technologies
    http://www.ideaminetech.com, sales(a)ideaminetech.com
    Server Management|Outsourced Support|Web Development|Mobile Applications
    Skype: servernix|GTalk: servernix|AIM: servernix

  13. #13
    Join Date
    Nov 2009
    Posts
    544
    Quote: Originally posted by ideamine
    The biggest concern is the attitude towards this type of hardware issue. Amazon is selling servers just like mobile phones or TVs, and support is just like purchasing an extended warranty. They should realize that web hosting is more than that, and that the business is very critical for customers.
    It is very disheartening to see a response like this from someone advertising themselves as a server manager for hire. One that lists Amazon's EC2 as their area of expertise at that...

    Why would you require a response from support to tell you that the server is down? Simple monitoring should have told you this. Hopefully you did not wait 6 hours to spin up another instance.

    It is probably not Amazon's attitude that should be in question here... You should realize what it takes to keep your customers' business online.
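
As a rough illustration of "monitor, then spin up another instance", here is a minimal sketch of the decision logic only. The class name, the threshold of three failed checks, and the `launch_replacement` callback are all assumptions for the example; a real setup would plug in an actual health check and an actual launch from the most recent snapshot.

```python
from typing import Callable


class FailoverMonitor:
    """Trigger a replacement once after N consecutive failed health checks."""

    def __init__(self, threshold: int, launch_replacement: Callable[[], None]):
        self.threshold = threshold
        self.launch_replacement = launch_replacement  # hypothetical callback
        self.consecutive_failures = 0
        self.triggered = False

    def record(self, healthy: bool) -> None:
        if healthy:
            # Any successful check resets the failure streak.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.triggered:
            self.triggered = True
            self.launch_replacement()


launched = []
monitor = FailoverMonitor(threshold=3, launch_replacement=lambda: launched.append("replacement"))

# Simulated health-check results: one blip, then a sustained failure.
for result in [True, False, True, False, False, False, False]:
    monitor.record(result)

print(launched)
```

Requiring several consecutive failures avoids replacing an instance over a single missed check, while still acting within minutes rather than waiting hours for a support reply.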

  14. #14
    Hi,

    Are you from Amazon?

    We are monitoring the servers and found that it was down after just 5 minutes. But if the server is entirely down due to a hardware issue on the web host's side, what should we do? How long do we have to wait for a response?
    Ideamine Technologies
    http://www.ideaminetech.com, sales(a)ideaminetech.com
    Server Management|Outsourced Support|Web Development|Mobile Applications
    Skype: servernix|GTalk: servernix|AIM: servernix

  15. #15
    Join Date
    Apr 2002
    Location
    Auckland - New Zealand
    Posts
    1,572
    Amazon provide a good platform with lots of functionality, bells and whistles, and they are perfect for those who know how to create clustered and failover applications.

    If you need support, then you can purchase it. For the most part, if you have something mission critical, it can live really quite happily on ec2, but you need to build in your HA yourself. They give you the keys, you are driving.
