Amazon CloudWatch can automatically recover an EC2 instance that fails a system status check. The key benefit is that the recovered instance is identical to the original: it keeps the same instance ID, private IP addresses, Elastic IP addresses, instance metadata, and EBS volumes, and it retains any auto-assigned public IPv4 address. Data held only in memory is lost, because the instance is restarted on different hardware.
Every EC2 instance is subject to two distinct types of status checks, both of which are reported to CloudWatch as metrics (StatusCheckFailed_System and StatusCheckFailed_Instance):
- System status checks: These identify AWS infrastructure issues, such as hardware failures, network connectivity loss, or power outages in the data center.
- Instance status checks: These identify software or configuration issues, such as corrupted file systems, incompatible kernels, or exhausted memory.
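To make the distinction between the two checks concrete, here is a minimal sketch that reads both from the EC2 API with boto3. The helper names classify_status and fetch_statuses are hypothetical. The sketch assumes boto3 is installed and AWS credentials are configured. Treating any status other than "impaired" as "no failing check" is a simplification.

```python
def classify_status(status_entry):
    """Say which check, if any, is failing for one InstanceStatuses entry.

    `status_entry` is one element of the "InstanceStatuses" list returned by
    the EC2 DescribeInstanceStatus API.
    """
    system = status_entry["SystemStatus"]["Status"]
    instance = status_entry["InstanceStatus"]["Status"]
    if system == "impaired":
        return "system"    # AWS infrastructure problem; auto recovery targets this
    if instance == "impaired":
        return "instance"  # software or configuration problem on the instance
    return "none"          # simplification: covers ok, initializing, etc.


def fetch_statuses(instance_ids, region):
    """Hypothetical helper: map each instance ID to its failing check type."""
    import boto3  # imported lazily so classify_status works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instance_status(
        InstanceIds=instance_ids,
        IncludeAllInstances=True,  # also report instances that are not running
    )
    return {
        entry["InstanceId"]: classify_status(entry)
        for entry in response["InstanceStatuses"]
    }
```

Only a result of "system" corresponds to the kind of failure that auto recovery addresses. An "instance" result usually needs a fix inside the operating system or the instance configuration.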
The auto recovery option specifically targets system status check failures. It automatically migrates an instance to a new physical host when a CloudWatch alarm on the StatusCheckFailed_System metric enters the ALARM state.
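The alarm described above could be created with boto3 roughly as follows. This is a sketch, not a definitive configuration. The function names and the alarm name are hypothetical. The threshold, period, and evaluation count (two consecutive one-minute periods) are illustrative assumptions. The action ARN assumes the standard "aws" partition.

```python
def build_recovery_alarm(instance_id, region):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    Evaluation settings are illustrative assumptions, not required values.
    """
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "AlarmDescription": "Recover the instance on a system status check failure",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # seconds per evaluation period (assumed)
        "EvaluationPeriods": 2,   # consecutive failing periods before ALARM (assumed)
        "Threshold": 1.0,         # the metric is 1 when the check fails, 0 otherwise
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 recover action; assumes the standard "aws" partition.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id, region):
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder can be inspected offline

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recovery_alarm(instance_id, region))
```

Keeping the parameters in a separate builder function makes it easy to review or log exactly what will be sent before any AWS call is made.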
Requirements and Considerations
- The instance must run in a VPC and be EBS-backed.
- The feature is available for most current instance types in all AWS Regions.
- Placement Groups: Recovered instances remain in their original placement group.
- Notifications: Add an Amazon SNS topic to the alarm's actions so that you are alerted as soon as a recovery is triggered.
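As a sketch of the notification recommendation: a CloudWatch alarm accepts several actions, so an SNS topic ARN can sit alongside the recover action in the AlarmActions list. The helper with_notification, the alarm name, the account ID, and the topic name below are all hypothetical placeholders.

```python
def with_notification(alarm_params, sns_topic_arn):
    """Return a copy of put_metric_alarm parameters with the SNS topic added.

    The input dict is left unchanged; the topic is added at most once.
    """
    updated = dict(alarm_params)
    actions = list(alarm_params.get("AlarmActions", []))
    if sns_topic_arn not in actions:
        actions.append(sns_topic_arn)
    updated["AlarmActions"] = actions
    return updated


# Hypothetical values for illustration only.
params = {
    "AlarmName": "auto-recover-i-0123456789abcdef0",
    "AlarmActions": ["arn:aws:automate:us-east-1:ec2:recover"],
}
topic = "arn:aws:sns:us-east-1:123456789012:ec2-recovery-alerts"
notified = with_notification(params, topic)
```

The resulting dictionary, once completed with the metric settings, would be passed to the CloudWatch put_metric_alarm call. When the alarm fires, CloudWatch runs the recover action and also publishes to the topic.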
For the most up-to-date configuration steps, see the official Amazon EC2 Instance Recovery documentation.
NOTICE: All thoughts/statements in this article are mine alone and do not represent those of Amazon or Amazon Web Services. Referenced AWS services are the property of AWS. While I strive for accuracy, I disclaim liability for any disruption caused by errors or omissions.