  1. #1

    Amazon EC2, Bitter Experience

    Hi,

    One of our clients has two Amazon instances running in the EU zone, both Windows 2003. Recently they had a hardware failure and the site was down for almost 5 days. Amazon won't update you even when it is a hardware failure on their side, and in order to get a response from them we need to purchase premium support. Very funny. Ticket response times were very poor. They are providing 99.95% uptime, and as per the SLA, hardware failures on individual instances are not covered. Has anybody had a similar experience?
    Ideamine Technologies
    http://www.ideaminetech.com, sales(a)ideaminetech.com
    Server Management|Outsourced Support|Web Development|Mobile Applications
    Skype: servernix|GTalk: servernix|AIM: servernix

  2. #2
    Join Date
    Apr 2002
    Location
    Seattle, WA
    Posts
    955
    I have had issues with EBS and of course the big failure earlier this year. I'm not a fan, much better alternatives out there. Amazon has grown too big for their britches so to speak.
    I <3 Linux Clusters

  3. #3
    Hi Brandon,

    And this was Amazon's reply after 6 hours:

    "I have taken a look at i-88f75bff and see that the underlying hardware is in a degraded state. As this instance is not an EBS-backed instance, you will need to launch a replacement instance from your most recent snapshot and terminate this instance. Unfortunately this can happen in an AWS environment, just as unforeseen hardware issues can occur in any physical environment. While we cannot give too many details about our architecture, degraded hardware essentially means that the underlying hardware, aka the host, may have some type of hardware issue, or other pending issue that is causing issues with your instance running on this host. Just as in any physical environment, such as a traditional datacenter, hardware failure is expected. The benefit to utilizing AWS is that we provide a scalable architecture that allows you to build redundant, fault tolerant systems. That said, while we provide the infrastructure, it is still up to our customers to architect their environment with these best practices in mind. "

  4. #4
    Join Date
    Jan 2011
    Location
    Canada
    Posts
    934
    As this instance is not an EBS-backed instance...

    That pretty much sums it up. Understand that when running in AWS, unless you are EBS-backed, you acknowledge that the VM can and will likely move to a different host when you stop the instance for any reason - whether it be a node failure, like you encountered, or even just by hitting stop.

    This also means you lose access to ALL your data that was on the old local HDD.

    I wouldn't say they're doing it wrong, but they are doing it differently from other providers, so care and planning do need to be involved when placing stuff on EC2 instances.

    If it is beyond your technical expertise to set up fault tolerance within AWS, you might want to consider Rackspace or a SAN-backed cloud provider. Those operate the closest to a traditional server.

  5. #5
    We have won several clients recently that have had long outages with Amazon. In a way I guess we should be grateful but the bigger picture is that they are the biggest player in the market yet they are starting to give cloud a bad name. The whole reason organisations are moving to the cloud is to take advantage of the increased reliability so it's important for the industry that things are done properly.

    If you want reliability you need to look for cloud providers that publish the specifications & configuration of their platform and the service levels they have achieved. Unfortunately, most providers don't provide this level of information.
    ██ Enterprise Class Cloud Hosting And Disaster Recovery. SAN Replication.
    ██ VMware Hosting on HP Blades With NetApp or EqualLogic SAN Storage. 100% Guaranteed Uptime.
    ██ Build Your Own Virtual DataCentre In The Cloud. Fully Integrated With vCenter.
    ██ StratoGen Are An Authorised VMware Partner | StratoGen.net

  6. #6
    Join Date
    Mar 2010
    Location
    Germany
    Posts
    678
    Quote Originally Posted by Stratogen
    We have won several clients recently that have had long outages with Amazon. In a way I guess we should be grateful but the bigger picture is that they are the biggest player in the market yet they are starting to give cloud a bad name. The whole reason organisations are moving to the cloud is to take advantage of the increased reliability so it's important for the industry that things are done properly.

    If you want reliability you need to look for cloud providers that publish the specifications & configuration of their platform and the service levels they have achieved. Unfortunately, most providers don't provide this level of information.
    I don't think they're giving cloud a bad name - coming from a HA environment where not a single data loss could be accepted in 20 years, I can say that cloud had *exactly* this bad rep with the people there, even before the failures happened. We chatted about what would have to change to bring matters forward in cloud environments, and so far none of the big providers are tackling them.
    For example the "SAN" storage most cloud providers are proud of using is what we used for LAB systems!

    And in general, when you need good IT hosting and aren't prepared to run really *instanced* applications, you should go to someone specialized in HA hosting - i.e. like the guy I'm replying to hehe.


    Amazon is just an online store!!!!!! They have done a very good job on their cloud platform, but it's not intended or fit for running mission critical apps. (and no, reddit and other sites that failed are not "mission critical" hehe)

    If someone has the budget to build his app around the cloud and it's all well-enough designed to run on multiple different EC2-compatible clouds then I'm quite sure this site will exceed anything that normal IT shops can do - but not by putting a few Windows servers on normal Amazon EC2 instances (not S3, not EBS, just local disk ... oh well).
    Check out my SSD guides for Samsung, HGST (Hitachi Global Storage) and Intel!

  7. #7
    Quote Originally Posted by wartungsfenster
    I don't think they're giving cloud a bad name - coming from a HA environment where not a single data loss could be accepted in 20 years, I can say that cloud had *exactly* this bad rep with the people there, even before the failures happened.
    Do you really think cloud has a bad name? The cloud is dominating hosting in 2011 so clearly a lot of people have bought in to the idea, and there are real business benefits.

    I guess the issue here is not about cloud per se, but the fact a hosting provider like Amazon can have a 5 day outage. Any company with that amount of downtime is going to find it hard to win new customers.

  8. #8
    Join Date
    Dec 2001
    Location
    Atlanta
    Posts
    4,419
    Quote Originally Posted by Stratogen
    Do you really think cloud has a bad name? The cloud is dominating hosting in 2011 so clearly a lot of people have bought in to the idea, and there are real business benefits.

    I guess the issue here is not about cloud per se, but the fact a hosting provider like Amazon can have a 5 day outage. Any company with that amount of downtime is going to find it hard to win new customers.
    I would agree with this. But I would add further that clients need to understand what they need and what they are getting from each service out there. Offerings like you and I have are not comparable to an amazon - but maybe everyone does not need it either. As long as people get what they need they should be happy.


    If you need something more stable and secure for your business then you may not want to look at an Amazon unless you can suffer the issues from time to time. If it's just cheap, hourly usage and you can be down from time to time then it's probably OK.
    Dedicated Servers
    WWW.NETDEPOT.COM
    Since 2000

  9. #9
    Join Date
    Jan 2011
    Location
    Canada
    Posts
    934
    Quote Originally Posted by Stratogen
    I guess the issue here is not about cloud per se, but the fact a hosting provider like Amazon can have a 5 day outage. Any company with that amount of downtime is going to find it hard to win new customers.
    For the record, the outage affected only a single region (US-East). Specifically, only one availability zone (AZ) in that region was down, while the other AZ suffered overloading/capacity problems as the majority of the affected users tried to spool up instances in the remaining AZ in that region. US-West was still running fine the whole time.

    I figured adding color was appropriate since I keep seeing the same regurgitated headline version of the event and the use of 'outage' in the sense that the entire service was down or unavailable. In reality, it's more complicated and was analogous to a DC being down in the traditional sense.

    I'm sure as an HA expert, you do set up systems that span multiple DCs and don't base your SLA on just one geographical area.


    I agree with wartungsfenster - reddit is not a mission critical app. To further this discussion, here's a quick link to High Scalability's big list of articles related to the incident. You'll find that quite a few people survived the 'outage' just fine.

    http://highscalability.com/blog/2011...on-outage.html

  10. #10
    Quote Originally Posted by tchen
    For the record, the outage affected only a single region (US-East). Specifically, only one availability zone (AZ) in that region was down, while the other AZ suffered overloading/capacity problems as the majority of the affected users tried to spool up instances in the remaining AZ in that region. US-West was still running fine the whole time.
    But surely it's not about which platform went down - it's the 5 days to fix it that will be of concern.

  11. #11
    Join Date
    Jan 2011
    Location
    Canada
    Posts
    934
    Quote Originally Posted by Stratogen
    But surely it's not about which platform went down - it's the 5 days to fix it that will be of concern.
    Actually it depends greatly on the platform. If I was on a dual-DC service, then yes, 5 days exposure is way too much. On a multi-DC service where each region is set up to be fully independent from each other (and within which, each AZ is designed to be independent from each other) then the 5 days shouldn't even be a concern.

    And about the 5 days....

    http://aws.amazon.com/message/65648/

    * April 21 - outage occurs
    * April 22 - 97.8% of EBS volumes restored in 9 hrs, 13% of which were still stuck due to backplane capacity issues.
    * April 23 - expanded backplane installed for EBS, unsticking EBS vols.
    * April 24 - manual restoration of backups from S3 for the remaining 2.2% of volumes.

    Ultimately, 0.07% of the volumes could not be restored in a consistent state. There is the timeline. You can compare that against other outages and make your decision there. Of course, the '5-day' outage does make for a better cover story though.

  12. #12
    The biggest concern is the attitude towards this type of hardware issue. Amazon is selling servers just like mobile phones or TVs, and support is just like purchasing an extended warranty. They should realize that web hosting is more than that, and the business is very critical for customers.

  13. #13
    Join Date
    Nov 2009
    Posts
    544
    Quote Originally Posted by ideamine
    The biggest concern is the attitude towards this type of hardware issue. Amazon is selling servers just like mobile phones or TVs, and support is just like purchasing an extended warranty. They should realize that web hosting is more than that, and the business is very critical for customers.
    It is very disheartening to see a response like this from someone advertising themselves as a server manager for hire. One that lists Amazon's EC2 as their area of expertise at that...

    Why would you require a response from support to tell you that the server is down? Simple monitoring should have told you this. Hopefully you did not wait 6 hours to spin up another instance.

    It is probably not Amazon's attitude that should be in question here... You should realize what it takes to keep your customer's business online.

  14. #14
    Hi,

    Are you from Amazon?

    We are monitoring the servers and found that it was down after just 5 minutes. But if the server is entirely down due to a hardware issue on the web host's side, what should we do? How long do we have to wait for a response?

  15. #15
    Join Date
    Apr 2002
    Location
    Auckland - New Zealand
    Posts
    1,572
    Amazon provide a good platform, with lots of functionality, bells and whistles and they are perfect for those that know how to create clustered and fail over applications.

    If you need support, then you can purchase it. For the most part, if you have something mission critical, it can live really quite happily on ec2, but you need to build in your HA yourself. They give you the keys, you are driving.

  16. #16
    I second StevenG, Amazon rocks.

  17. #17
    Join Date
    Feb 2008
    Location
    Wilkes-Barre, PA
    Posts
    1,099
    Quote Originally Posted by Stratogen
    Do you really think cloud has a bad name? The cloud is dominating hosting in 2011 so clearly a lot of people have bought in to the idea, and there are real business benefits.

    I guess the issue here is not about cloud per se, but the fact a hosting provider like Amazon can have a 5 day outage. Any company with that amount of downtime is going to find it hard to win new customers.
    I think 90% of providers offer "cloud" hosting that isn't what most people think they're buying. When I think of cloud, I think of a COMPLETE fault tolerant system. Most "cloud" hosts are offering these little VPS containers that have the ability to expand with next to no fault tolerance, and they're calling them cloud. There is nothing cloud-like about that IMO.
    Last edited by Encrypted; 07-02-2011 at 12:20 PM.
    NEPA Fiber
    AS 394868 - Wilkes-Barre, PA
    █ Fiber Internet, Dedicated Servers, Colocation, Cloud
    99.99% Uptime Guarantee - 24/7/365 Support

  18. #18
    Join Date
    Dec 2001
    Location
    Atlanta
    Posts
    4,419
    Quote Originally Posted by Encrypted
    I think 90% of providers offer "cloud" hosting that isn't what most people think they're buying. When I think of cloud, I think of a COMPLETE fault tolerant system. Most "cloud" hosts are offering these little VPS containers that have the ability to expand with next to no fault tolerance, and they're calling them cloud. There is nothing cloud-like about that IMO.

    Unfortunately you are correct.

  19. #19
    Join Date
    Jan 2011
    Location
    Canada
    Posts
    934
    Quote Originally Posted by ideamine
    Hi,

    Are you from Amazon?

    We are monitoring the servers and found that it was down after just 5 minutes. But if the server is entirely down due to a hardware issue on the web host's side, what should we do? How long do we have to wait for a response?
    srfreeman, StevenG and I are all from Amazon. No, not really.

    Look, there are two ways this thread can continue.

    A) you take that attitude displayed above and ignore anyone who doesn't share your opinion that Amazon sucks. Or....

    B) recognize that you currently are deficient in knowledge about how EC2 works. Realize that an EC2 instance is ephemeral and can easily go away at any time for any number of reasons. This is widely documented in third party blogs, but it is also clearly laid out in the best practices from Amazon themselves.

    You need to build your own HA/failover as StevenG suggests. Considering you're using Windows instances, there's likely some legacy support issues preventing you from making any required changes to your application, so your best bet is to use EBS.

    Remember to snapshot your EBS volumes regularly to another AZ (preferably another region too) as the SLA for EBS isn't 100% either. It's high, but there is a very slim chance that an EBS volume can become unmountable. If you have a backup, then it's a moot point.
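
    A minimal sketch of that snapshot routine, assuming a current AWS SDK for Python (boto3), which postdates this thread. The function name, the `note` parameter and the example IDs are hypothetical. EBS snapshots are stored per region rather than per AZ, so the sketch only covers the cross-region copy. The two clients are passed in as arguments, so the function itself imports nothing.

```python
def snapshot_and_copy(src_ec2, dst_ec2, volume_id, src_region, note="scheduled backup"):
    """Snapshot one EBS volume, then copy that snapshot to a second region.

    src_ec2 and dst_ec2 are EC2 client objects for the source and the
    destination region, for example (assumption: boto3 is installed):
        src_ec2 = boto3.client("ec2", region_name="eu-west-1")
        dst_ec2 = boto3.client("ec2", region_name="us-west-1")
    """
    # Take the snapshot in the region where the volume lives.
    snap = src_ec2.create_snapshot(VolumeId=volume_id, Description=note)
    snapshot_id = snap["SnapshotId"]

    # A snapshot has to be complete before it can be copied.
    src_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # The copy is requested from the destination region's client.
    copy = dst_ec2.copy_snapshot(
        SourceRegion=src_region,
        SourceSnapshotId=snapshot_id,
        Description=note,
    )
    return snapshot_id, copy["SnapshotId"]
```

    Run from a scheduler (cron, Task Scheduler on Windows), this gives you a copy that survives the loss of the original region. Pruning old snapshots is left out here.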

  20. #20
    Join Date
    Dec 2001
    Location
    Atlanta
    Posts
    4,419
    Quote Originally Posted by tchen
    srfreeman, StevenG and I are all from Amazon. No, not really.

    Look, there are two ways this thread can continue.

    A) you take that attitude displayed above and ignore anyone who doesn't share your opinion that Amazon sucks. Or....

    B) recognize that you currently are deficient in knowledge about how EC2 works. Realize that an EC2 instance is ephemeral and can easily go away at any time for any number of reasons. This is widely documented in third party blogs, but it is also clearly laid out in the best practices from Amazon themselves.

    You need to build your own HA/failover as StevenG suggests. Considering you're using Windows instances, there's likely some legacy support issues preventing you from making any required changes to your application, so your best bet is to use EBS.

    Remember to snapshot your EBS volumes regularly to another AZ (preferably another region too) as the SLA for EBS isn't 100% either. It's high, but there is a very slim chance that an EBS volume can become unmountable. If you have a backup, then it's a moot point.
    Don't run mission critical or revenue producing business apps that you count on in EC2 and I think you will be fine with what you are getting for what you are paying.

    These points above are all good advice.

  21. #21
    Join Date
    Nov 2009
    Posts
    544
    Quote Originally Posted by ideamine
    Hi,

    Are you from Amazon?

    We are monitoring the servers and found that it was down after just 5 minutes. But if the server is entirely down due to a hardware issue on the web host's side, what should we do? How long do we have to wait for a response?
    Really, come now. A server manager...? If you find that a server is down, that would be the time to start up your backup server. This would be the case with any host, not just Amazon (though Amazon makes this easy to do.)

    After your customer is back online, if you are curious about why the original went down, check with support. The support response time is of no consequence.

    As the server manager, your response time in getting your customer back up is the only one that matters.

  22. #22
    As a paying customer, we have the right to know what happened to our servers. "The support response time is of no consequence." Support response time is important for a customer who is paying money for support. We have many client servers running on Rackspace Cloud and that is not the case there.

  23. #23
    And do you know why major websites like Foursquare, Reddit, Quora, Heroku and Hootsuite were offline for more than 24 hours? It is lack of knowledge from respective server managers?

  24. #24
    Join Date
    Jan 2011
    Location
    Canada
    Posts
    934
    Quote Originally Posted by ideamine
    And do you know why major websites like Foursquare, Reddit, Quora, Heroku and Hootsuite were offline for more than 24 hours? It is lack of knowledge from respective server managers?
    Yes.

    From Heroku's own incident report at http://status.heroku.com/incident/151

    In their (Heroku's) own words

    Failures at the IaaS layer will happen. It's Heroku's responsibility to shield our customers from this; part of our value proposition is to abstract away these concerns. We failed at this in a big way this weekend, and our engineers are even as we speak hard at work on architectural changes that will allow us to handle infrastructure outages of this magnitude with less or no disruption to our customers in the future.

    There are three major lessons about IaaS we've learned from this experience:

    1. Spreading across multiple availability zones in single region does not provide as much partitioning as we thought.

    2. Block storage is not a cloud-friendly technology.

    3. Continuous database backups for all.

    From Quora's own engineers...

    Our main database and slave were still both operating in the broken AZ (a mistake that is very clear to me now)
    - emphasis mine.


    To be fair, the only ones that could arguably be absolved of responsibility are those that did span two AZs but sadly only in the same US-East region. It DID take a lot of people by surprise (including Amazon) that EBS wasn't as isolated between those two AZs as envisioned/designed.

    As for the rest of the companies you mentioned above, do a Google search. It's easy to find their respective postmortems (or use the High Scalability link I posted). The professionals are quite forthcoming in acknowledging where they failed and need to bolster their knowledge.

  25. #25
    Join Date
    Nov 2009
    Posts
    544
    Quote Originally Posted by ideamine
    As a paying customer, we have the right to know what happened to our servers. "The support response time is of no consequence." Support response time is important for a customer who is paying money for support. We have many client servers running on Rackspace Cloud and that is not the case there.
    Ok, so you feel you have a right to know something. Using the same logic, your customers have a right to have their servers managed.

    It is really just a simple two step process:
    1. Notice server is down.
    2. Start backup server.

    Automating this process is something you as a professional should strive for.
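
    As a rough illustration of automating those two steps, here is a small watchdog sketch in Python using only the standard library. The names `is_up` and `watch`, the thresholds, and the `start_backup` callback are all hypothetical; what `start_backup` actually does (launch a replacement instance, repoint DNS, and so on) depends on your host and is deliberately left as a callable you supply.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=5.0):
    """Step 1: return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx or a malformed URL.
        return False


def watch(check, start_backup, max_failures=3, interval=60.0, sleep=time.sleep):
    """Step 2: poll check(); after max_failures consecutive failures,
    call start_backup() once and return whatever it returns."""
    failures = 0
    while True:
        if check():
            failures = 0  # a healthy answer resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                return start_backup()
        sleep(interval)


# Example wiring (not executed here; the URL and callback are placeholders):
#   watch(lambda: is_up("http://www.example.com/health"), launch_replacement)
```

    Requiring several consecutive failures before acting avoids launching a replacement over a single dropped packet. The watchdog should of course run somewhere other than the server it is watching.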

    Calling support (RackSpace, Amazon or any leasing host) to hear what you already know "Yep, server is down, we will get it back up as soon as possible." is not any form of management. It is a curiosity at best. You should not really be asking these types of questions or paying for these types of answers - another type of management...

    Why do you feel that "Support response time is important..." for a server down issue?

  26. #26
    Join Date
    Nov 2009
    Posts
    544
    Quote Originally Posted by ideamine
    And do you know why major websites like Foursquare, Reddit, Quora, Heroku and Hootsuite were offline for more than 24 hours? It is lack of knowledge from respective server managers?
    Yes, I do know. It appears that you do too.

    It may just be a typo but it is the answer.

    Surprises are the order of the day in this business. Design for system failure and your customers will never know a system failed. The only times your customers notice that you are alive is when you fail.

    After ~30 years in this industry I still find myself saying "Hmm, never seen one do that before." It is just curiosity that makes me wonder why.

  27. #27
    I agree with x86brandon, there are much better alternatives to Amazon. Personally, I'm a fan of Microsoft services. I've been using them for 2 months and I love it.

  28. #28
    Join Date
    Jun 2007
    Location
    Australia
    Posts
    819
    We have had a Debian instance running on EC2's US East Coast for ~3 months without any disruption or downtime.

    16:39:28 up 109 days, 9:52, 1 user, load average: 0.00, 0.00, 0.00


  29. #29
    Join Date
    Jan 2011
    Location
    Canada
    Posts
    934
    While I'm a fan of AWS, please don't mislead the OP and compare single server uptimes on specific EC2 instances.

    Unlike Rackspace or any other major VPS provider, they really don't go out of their way to ensure hardware uptime. SLA is on the service, not individual instances.

  30. #30
    One quick point of fact before I wade into this: During the recent Amazon East outage, the entire east coast infrastructure was inaccessible or compromised, and new servers and storage units could not be copied or deployed for at least 1 day and up to 4 days depending on who you were and in what zone you were running. tchen posted a link, but if you read the actual Amazon write-up, it is clear.

    I had a bunch of machines deployed there, and I know what kind of issues came up. Further, zones were said to be isolated from each other, and Amazon suggested (IMHO) that you don't need more redundancy than deploying in 2 zones in the same region, which turned out to be false, and this was the big issue: the surprise!

    Seems there are strong feelings in some of the posts above, and sorry for the data loss, but this is as simple as two principles:
    1) horses for courses
    2) be aware of your environment

    On #1, Amazon AWS has some great advantages, but thinking that AWS is just like regular hosting is a huge mistake. AWS offers S3 for redundant storage, and EBS volumes for working data, so if your instance data or availability was important, you should have chosen those horses. Choosing an ephemeral storage unit should only be done if you can simply bring up another instance to replace one that's down.

    On #2, if you take some time to research AWS, you will see that how you deploy servers and configure them *must* be different from a regular host if you're going to get the best out of the environment. For instance, you can set up Amazon load balancing and monitoring to instantly deploy a new server when yours stops functioning.

    In this case, saying that Amazon took 5 days to get back to you is like saying it took Dell 5 days to respond to your complaint that the machine doesn't turn on when it's unplugged.

    Ephemeral instances are just that: ephemeral. You should monitor them and have a machine set up to programmatically start another one if the first becomes unavailable.

    And that's the beauty of AWS and other cloud services: the instance itself becomes a programmable element, not an immutable foundation.

    As StevenG said: if you need HA, you have to build it in yourself.
    WholesaleBackup.com
    Online Backup Reseller
    Your Brand, Our Technology: Total Success.
    +1.800-624-9561

  31. #31
    Quote Originally Posted by srfreeman
    Ok, so you feel you have a right to know something. Using the same logic, your customers have a right to have their servers managed.

    It is really just a simple two step process:
    1. Notice server is down.
    2. Start backup server.

    Automating this process is something you as a professional should strive for.

    Calling support (RackSpace, Amazon or any leasing host) to hear what you already know "Yep, server is down, we will get it back up as soon as possible." is not any form of management. It is a curiosity at best. You should not really be asking these types of questions or paying for these types of answers - another type of management...

    Why do you feel that "Support response time is important..." for a server down issue?
    Amen.

  32. #32
    "While I'm a fan of AWS, please don't mislead the OP and compare single server uptimes on specific EC2 instances."

    As a paying customer I am only concerned with my server instance and its uptime, not other instances in the same zone. As per the SLA, in order to get 10% of the money back, all instances in the zone have to be down for more than 0.05%. It is the same as saying "if our datacenter is completely down, you will get 10% of your monthly money back, but not if your single instance is down 100% of the time", right?

  33. #33
    Join Date
    Nov 2009
    Posts
    544
    Quote Originally Posted by ideamine
    ...As a paying customer I am only concerned with my server instance and its uptime, not other instances in the same zone. As per the SLA, in order to get 10% of the money back, all instances in the zone have to be down for more than 0.05%. It is the same as saying "if our datacenter is completely down, you will get 10% of your monthly money back, but not if your single instance is down 100% of the time", right?
    Well... No. You really need to understand that the EC2 SLA has nothing to do with your individual instances nor even individual data centers. It has to do with the service availability on a regional level. Also, it is based on an annual percentage not monthly.

    The customer is the only one responsible for the availability of instances they create themselves. Effectively, you can write an SLA for each instance you create and reimburse yourself for any mistakes you make. Just make yourself happy... you are the creator.
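
    To put numbers on the 99.95% annual figure discussed in this thread, here is a small arithmetic sketch. The function names are made up for illustration, and the calculation is plain percentage arithmetic, not a reading of the SLA's exact measurement rules.

```python
def allowed_downtime_hours(uptime_percent, period_days=365):
    """Hours of unavailability a given uptime percentage permits over the period."""
    return period_days * 24 * (1 - uptime_percent / 100)


def uptime_percent(down_hours, period_days=365):
    """Uptime percentage over the period for a given number of hours down."""
    return 100 * (1 - down_hours / (period_days * 24))


# 99.95% over a 365-day year leaves roughly 4.4 hours of allowed downtime.
budget = allowed_downtime_hours(99.95)

# A single instance that is down for 5 days (120 hours) works out to
# roughly 98.6% uptime for that one instance over the year.
single_instance = uptime_percent(5 * 24)
```

    The second number only describes one instance. Since the SLA is measured on the regional service, a single failed instance does not count against the 99.95% at all, which is the point being made above.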

  34. #34
    By the way: Amazon elected to give large refunds for many affected customers during their outage.

  35. #35
    Join Date
    Apr 2011
    Location
    San Francisco, USA
    Posts
    195
    Quote Originally Posted by StevenG
    Amazon provide a good platform, with lots of functionality, bells and whistles and they are perfect for those that know how to create clustered and fail over applications.

    If you need support, then you can purchase it. For the most part, if you have something mission critical, it can live really quite happily on ec2, but you need to build in your HA yourself. They give you the keys, you are driving.
    Agreed, I have a friend who's building an app for his startup. He has 3 instances, not sure how they did the architecture, but I believe that the performance has been rock solid.

    To OP:
    I wish you all the best, if you think Amazon is not a good fit for you, find another reputable host. Try Rackspace or Joyent?
    Rg Enzon, Founder & Designer @ Play Technica
    Building user interfaces for humans.

  36. #36
    Join Date
    Nov 2009
    Posts
    544
    Quote Originally Posted by WholesaleBackup View Post
    By the way: Amazon elected to give large refunds for many affected customers during their outage.
    I'm sure they did; the well-publicized outage was regional in nature. Though I stopped reading the reports before the talk of refunds, I am sure they at least provided what the SLAs called for. Even 10% of some of the affected customers' bills would have been sizable amounts.

    Were there refunds given that were over and above?

  37. #37
    Hi,

    Amazon has an outage in the EU zone again

    http://status.aws.amazon.com

    Please refer to the Status History below for the prior day's entries

    Aug 8 9:58 AM PDT We have added capacity to EBS and are continuing to make progress in recovering EBS volumes and EC2 instances. We have now also enabled the ability to launch new instance-store backed EC2 instances in the impacted Availability Zone. Customers can still not create new EBS volumes or launch new EBS backed instances in the affected zone, but we anticipate enabling that soon. All functionality remains available in the other Availability Zones.

    Aug 8 12:32 PM PDT We have now enabled the ability to create new EBS backed instances and EBS volumes in the affected zone. API functionality is now fully restored to the affected Availability Zone. We remain focused on recovering the remaining instances and volumes and are continuing to make progress.

    Aug 8 3:11 PM PDT Separately, and independent from the power issue in the affected availability zone, we've discovered an error in the EBS software that cleans up unused snapshots. During a recent run of this EBS software in the EU-West Region, one or more blocks in a number of EBS snapshots were incorrectly deleted. The root cause was a software error that caused the snapshot references to a subset of blocks to be missed during the reference counting process. This process compares the blocks scheduled for deletion to the blocks referenced in customer snapshots. As a result of the software error, the EBS snapshot management system in the EU-West Region incorrectly thought some of the blocks were no longer being used and deleted them. We've addressed the error in the EBS snapshot system to prevent it from recurring. We have now also disabled all of the snapshots that contain these missing blocks.

    We are in the process of creating a copy of the affected snapshots where we've replaced the missing blocks with empty block(s). Customers can then create a volume from that copy and run a recovery tool on it (e.g. a file system recovery tool like fsck); in some cases this may restore normal volume operation. We will email affected customers as soon as we have the copy of their snapshot available. You can tell if you have a snapshot that has been affected via the DescribeSnapshots API or via the AWS Management Console. The status for the snapshot will be shown as "error." Alternately, if you have any older or more recent snapshots that were unaffected, you will be able to create a volume from those snapshots without error. We apologize for any potential impact it might have on customers' applications.

    Aug 8 4:26 PM PDT The process of creating copies of data from the affected snapshots is now complete. These snapshots can be identified via the Description field which you can see on the AWS console or via a DescribeSnapshots API call. The Description field contains "Recovery Snapshot snap-xxxx" where snap-xxxx is the id of the affected snapshot. Customers can create a volume from that copy and run a recovery tool on it (e.g. a file system recovery tool like fsck); in some cases this may restore normal volume operation. Alternately, if you have any older or more recent snapshots that were unaffected, you will be able to create a volume from those snapshots without error.
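
    (Not part of the AWS status text.) The two checks described in the entries above, finding snapshots whose status is "error" and finding the recovery copies by their Description, can be sketched as follows. This is an illustration under assumptions: it works on a dictionary shaped like a DescribeSnapshots response as returned by boto3 (`Snapshots`, `SnapshotId`, `State`, `Description`), it makes no AWS call, and the snapshot ids in the sample are made up.

```python
import re

# Matches the Description text quoted in the status update above.
RECOVERY_RE = re.compile(r"Recovery Snapshot (snap-[0-9a-f]+)")


def affected_snapshot_ids(response):
    """Ids of snapshots whose state is reported as 'error'."""
    return [
        snap["SnapshotId"]
        for snap in response.get("Snapshots", [])
        if snap.get("State") == "error"
    ]


def recovery_copies(response):
    """Map each affected snapshot id to the id of its recovery copy."""
    copies = {}
    for snap in response.get("Snapshots", []):
        match = RECOVERY_RE.search(snap.get("Description") or "")
        if match:
            copies[match.group(1)] = snap["SnapshotId"]
    return copies


# Hypothetical sample data; with boto3 the dictionary would come from
# boto3.client("ec2").describe_snapshots(OwnerIds=["self"]).
sample = {
    "Snapshots": [
        {"SnapshotId": "snap-1111aaaa", "State": "error", "Description": "nightly"},
        {"SnapshotId": "snap-2222bbbb", "State": "completed", "Description": "weekly"},
        {
            "SnapshotId": "snap-3333cccc",
            "State": "completed",
            "Description": "Recovery Snapshot snap-1111aaaa",
        },
    ]
}

print(affected_snapshot_ids(sample))
print(recovery_copies(sample))
```

    A volume created from a recovery copy would still need the file system check mentioned in the status update before it can be trusted.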

    Aug 8 10:01 PM PDT We have now recovered all of the volumes and instances that we could verify were fully consistent at the time of the power outage. For the remaining EBS volumes, we were unable to verify whether or not there were any in-flight writes that did not get consistently saved. As a result, we've now started the process of creating recovery snapshots for all of these EBS volumes that are still unavailable. As these recovery snapshots become available, we will put them in your account. This process is time consuming and we expect these recovery snapshots to start to show up in customers' accounts in the next 6-8 hours, but the process might take up to 24 hours to fully complete. We expect that a large portion of these volumes created from these recovery snapshots will be consistent, but customers will need to verify volume consistency by running a recovery tool on their new volume (e.g. a file system recovery tool like fsck). Volumes that are consistent should be usable by all applications. If your volume is inconsistent, your application's ability to use the volume will depend on how it handles the inconsistency. When we make meaningful progress on delivering recovery snapshots to affected customers, we will post an update here.

    2:53 AM PDT We are continuing to make steady progress in delivering recovery snapshots to affected customers' accounts. We will continue to post updates here.

    We are unable to create snapshots or launch a new instance. What is wrong with Amazon?

  38. #38
    A lot has been written in this thread already about whether Amazon is suitable for production servers. In my opinion, you really must build in some kind of resilience yourself by deploying servers on different nodes.
    ██ Enterprise Class Cloud Hosting And Disaster Recovery. SAN Replication.
    ██ VMware Hosting on HP Blades With NetApp or EqualLogic SAN Storage. 100% Guaranteed Uptime.
    ██ Build Your Own Virtual DataCentre In The Cloud. Fully Integrated With vCenter.
    ██ StratoGen Are An Authorised VMware Partner | StratoGen.net

  39. #39
    Our client's instances are EBS-backed and we have the data. But we are unable to take a snapshot of this volume as Amazon instructed, and only after that can we start a new instance in the US or Asia zone. We also can't create a new instance in the EU zone; we keep getting errors. Has anybody had a similar experience?

  40. #40
    Join Date
    Aug 2011
    Location
    Dub,Lon,Dal,Chi,NY,LA
    Posts
    1,821
    It seems issues in Dublin are still ongoing for many EC2 customers.

    As has been said previously, relying on single instances or even single regions seems to be a flawed approach, given the duration and severity of AWS issues.
