Tuesday, August 9, 2011

Amazon issues with EBS affect Ubuntu images in the EU-WEST region

Note: This blog post has been updated in-place.

We have received information from Amazon that the EBS snapshots for our released 10.04 images from 20110719 (ami-5c417128 and ami-52417126) were not affected. It seems that an API issue incorrectly marked them as affected. It was an error in our logic for associating snapshot IDs with AMIs that gave us the incorrect output. The only Ubuntu images that were affected were old daily builds and milestone releases. If you are interested in reading the original message, you can find it in the Ubuntu cloud-announce mailing list archives.
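For readers who want to run a similar check against their own images, here is a minimal sketch of associating snapshot IDs with AMIs. It is not the script we used. It assumes image records shaped like the entries returned by a DescribeImages call through a client such as boto3 (the `ImageId`, `BlockDeviceMappings`, `Ebs` and `SnapshotId` keys), and every ID in the sample data is a made-up placeholder.

```python
def amis_using_snapshots(images, affected_snapshot_ids):
    """Return {ami_id: [snapshot_id, ...]} for images backed by an affected snapshot.

    `images` is assumed to be a list of dicts shaped like DescribeImages
    entries; this is an illustration, not the logic Canonical ran.
    """
    affected = set(affected_snapshot_ids)
    result = {}
    for image in images:
        hits = []
        for mapping in image.get("BlockDeviceMappings", []):
            # Ephemeral (instance-store) mappings have no "Ebs" entry.
            snapshot_id = mapping.get("Ebs", {}).get("SnapshotId")
            if snapshot_id in affected:
                hits.append(snapshot_id)
        if hits:
            result[image["ImageId"]] = hits
    return result


# Hypothetical sample data: none of these IDs refer to real images or snapshots.
sample_images = [
    {"ImageId": "ami-aaaa1111",
     "BlockDeviceMappings": [{"DeviceName": "/dev/sda1",
                              "Ebs": {"SnapshotId": "snap-00000001"}}]},
    {"ImageId": "ami-bbbb2222",
     "BlockDeviceMappings": [{"DeviceName": "/dev/sda1",
                              "Ebs": {"SnapshotId": "snap-00000002"}},
                             {"DeviceName": "/dev/sdb",
                              "VirtualName": "ephemeral0"}]},
]

impacted = amis_using_snapshots(sample_images, ["snap-00000002"])
print(impacted)
```

Matching on the snapshot ID recorded in each block device mapping, rather than on names or descriptions, keeps the association tied to what the image actually boots from.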

This morning we received an automated email[1] from Amazon informing us of possible data loss in EBS snapshots in the EU-WEST-1 region. Our engineering team immediately began assessing the damage this might have caused to the EBS images that we publish for our users. We are working with Amazon to remediate customer impact and prevent any future outages.

A number of non-current daily builds and old alpha or beta images have been affected, but we hope that no one has been using these images in production; we are not planning corrective actions for these images. You can see the full list of AMIs affected at http://paste.ubuntu.com/662210/.

To have announcements of this type sent directly to your email, please subscribe to our ubuntu-cloud-announce mailing list at https://lists.ubuntu.com/mailman/listinfo/ubuntu-cloud-announce.

Our support services are available to help customers of the Ubuntu Advantage Cloud Guest program. Details about this program can be found at http://www.canonical.com/enterprise-services/ubuntu-advantage/cloud

[1] Email received from Amazon on Aug 9 2011 at 9:11 UTC


Hello,

We've discovered an error in the Amazon EBS software that cleans up unused snapshots. This has affected at least one of your snapshots in the EU-West Region.

During a recent run of this EBS software in the EU-West Region, one or more blocks in a number of EBS snapshots were incorrectly deleted. The root cause was a software error that caused the snapshot references to a subset of blocks to be missed during the reference counting process. This process compares the blocks scheduled for deletion to the blocks referenced in customer snapshots. As a result of the software error, the EBS snapshot management system in the EU-West Region incorrectly thought some of the blocks were no longer being used and deleted them. We've addressed the error in the EBS snapshot system to prevent it from recurring.

We have now disabled all of your snapshots that contain these missing blocks. You can determine which of your snapshots were affected via the AWS Management Console or the DescribeSnapshots API call. The status for any affected snapshots will be shown as "error."

We have created copies of your affected snapshots where we've replaced the missing blocks with empty blocks. You can create a new volume from these snapshot copies and run a recovery tool on it (e.g. a file system recovery tool like fsck); in some cases this may restore normal volume operation. These snapshots can be identified via the snapshot Description field which you can see on the AWS Management Console or via the DescribeSnapshots API call. The Description field contains "Recovery Snapshot snap-xxxx" where snap-xxx is the id of the affected snapshot. Alternately, if you have any older or more recent snapshots that were unaffected, you will be able to create a volume from those snapshots without error. For additional questions, you may open a case in our Support Center: https://aws.amazon.com/support/createCase

We apologize for any potential impact this might have on your applications.

Sincerely,
AWS Developer Support

This message was produced and distributed by Amazon Web Services LLC, 410 Terry Avenue North, Seattle, Washington 98109-5210
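
The email above names two things to look for in DescribeSnapshots output: affected snapshots with a status of "error", and recovery copies whose Description contains "Recovery Snapshot" followed by the ID of the affected snapshot. The sketch below shows one way to pair them up. It assumes snapshot records shaped like boto3 `describe_snapshots` entries, where the status appears under the `State` key; other clients may name the field differently. The IDs in the sample data are placeholders.

```python
import re

# Description format quoted in the email: "Recovery Snapshot snap-xxxx".
RECOVERY_RE = re.compile(r"Recovery Snapshot (snap-[0-9a-f]+)")


def affected_snapshots(snapshots):
    """Return the IDs of snapshots whose state is "error"."""
    return [s["SnapshotId"] for s in snapshots if s.get("State") == "error"]


def recovery_copies(snapshots):
    """Map each affected snapshot ID to the ID of its recovery copy."""
    copies = {}
    for s in snapshots:
        match = RECOVERY_RE.search(s.get("Description") or "")
        if match:
            copies[match.group(1)] = s["SnapshotId"]
    return copies


# Hypothetical sample data: these IDs do not refer to real snapshots.
sample_snapshots = [
    {"SnapshotId": "snap-11111111", "State": "error",
     "Description": "root volume"},
    {"SnapshotId": "snap-22222222", "State": "completed",
     "Description": "Recovery Snapshot snap-11111111"},
    {"SnapshotId": "snap-33333333", "State": "completed",
     "Description": ""},
]

broken = affected_snapshots(sample_snapshots)
copies = recovery_copies(sample_snapshots)
for snapshot_id in broken:
    print(snapshot_id, "->", copies.get(snapshot_id, "no recovery copy found"))
```

A volume created from a recovery copy still has empty blocks where data was lost, so, as the email says, a file system check such as fsck is the next step before trusting it.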

