Coati is a service built on Amazon EC2 (EC2) that automates troubleshooting of your system. Signing up is simple, and you can start using Coati immediately afterward.
Coati monitors services on the operating systems running on your EC2 instances and restarts the services or instances when it detects a service failure. You can flexibly choose which instances and services to monitor from the GUI administration screen. Depending on the importance of each instance or service, you can exclude it from monitoring, or have Coati monitor it without performing recovery.
Coati can take over most of the troubleshooting work that engineers previously handled based on information from operation monitoring tools, which greatly reduces system operation costs.
Features
This section describes features of Coati.
- Automatic configuration
Coati automatically detects the EC2 instances to monitor and then starts monitoring the services running on them with only simple configuration. By default, all of your running EC2 instances and their services are monitored. Coati can monitor both Windows and Linux environments. (For details, see Supported Environment, described later.)
- Specifying monitoring targets
You can specify which EC2 instances and services to monitor.
- Monitoring (check)
Coati checks the status of the monitored services. Services configured to start automatically within the OS are monitored.
- Recovery (recover)
When Coati detects that a monitored service has stopped, it restarts the service. If the service still does not recover, Coati restarts the instance.
- Notification
Coati can send a notification to a designated email address when a failure is detected or recovered from.
- Reporting (to be implemented in the future)
Coati will collect system information at the time of a service failure or recovery and send it to an administrator as a report.
- Publishing API (to be implemented in the future)
Each function provided by Coati will be published as an API.
How Coati Works
Coati uses the API provided by AWS to automatically detect the EC2 instances to monitor. Then, using the RoleName set by the user and Run Command, Coati performs automatic service detection, failure monitoring, and recovery on instances where the AWS Systems Manager agent (SSM Agent) is installed.
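The flow above can be illustrated with a minimal sketch. It is not Coati's actual implementation: the function names are hypothetical, and it assumes clients shaped like the boto3 EC2 and SSM clients (`describe_instances` paginator, `send_command` with the `AWS-RunShellScript` document). The clients are passed in as arguments, so the sketch itself has no third-party imports.

```python
from typing import Any, List


def list_running_instance_ids(ec2_client: Any) -> List[str]:
    """Return the IDs of all running EC2 instances visible to the client.

    Assumes a boto3-style EC2 client (hypothetical helper, not Coati's code).
    """
    paginator = ec2_client.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    instance_ids: List[str] = []
    for page in pages:
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                instance_ids.append(instance["InstanceId"])
    return instance_ids


def send_status_check(ssm_client: Any, instance_id: str, service: str) -> str:
    """Send a systemd status check through Run Command; return the command ID.

    Assumes a boto3-style SSM client and a systemd-based Linux instance.
    """
    response = ssm_client.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [f"systemctl is-active {service}"]},
    )
    return response["Command"]["CommandId"]
```

With boto3 installed and credentials configured, the clients could be created with `boto3.client("ec2")` and `boto3.client("ssm")`. Run Command only reaches instances where the SSM Agent is installed and the instance role permits it.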
Coati’s Behavior
1. Resources Discovery
Coati recognizes newly added services through automatic resource detection, which runs every 30 minutes. If you choose the Resources Discovery button, Coati recognizes them immediately.
2. Basic operation of monitoring and recovery
Coati monitors the state of the services on the EC2 instances and attempts recovery if there is a failure. Service recovery is performed in the following two steps.
- Restarting services with commands
- Restarting instances
Coati first tries to recover the service with commands. If that fails, it restarts the instance. If recovery cannot be confirmed after the instance restart, Coati restarts the service once more. If the service still does not recover, Coati determines that it cannot be recovered by restarting, sends a notification email to the administrator, and excludes the service from monitoring.
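The escalation order described above can be sketched as a small function. This is an illustrative sketch only: the `Outcome` enum, the `recover` function, and the callback parameters are hypothetical names, not part of Coati. The health check and each action are injected as callables so the sequence can be followed without any AWS access.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    HEALTHY = "healthy"
    RECOVERED_BY_SERVICE_RESTART = "recovered_by_service_restart"
    RECOVERED_BY_INSTANCE_RESTART = "recovered_by_instance_restart"
    RECOVERED_BY_FINAL_SERVICE_RESTART = "recovered_by_final_service_restart"
    UNRECOVERABLE = "unrecoverable"


def recover(
    is_healthy: Callable[[], bool],
    restart_service: Callable[[], None],
    restart_instance: Callable[[], None],
    notify_admin: Callable[[], None],
    exclude_from_monitoring: Callable[[], None],
) -> Outcome:
    """Walk through the escalation steps in the order given in the text."""
    if is_healthy():
        return Outcome.HEALTHY

    # Step 1: restart the service with a command.
    restart_service()
    if is_healthy():
        return Outcome.RECOVERED_BY_SERVICE_RESTART

    # Step 2: restart the instance.
    restart_instance()
    if is_healthy():
        return Outcome.RECOVERED_BY_INSTANCE_RESTART

    # Step 3: restart the service once more after the instance restart.
    restart_service()
    if is_healthy():
        return Outcome.RECOVERED_BY_FINAL_SERVICE_RESTART

    # Give up: notify the administrator and stop monitoring the service.
    notify_admin()
    exclude_from_monitoring()
    return Outcome.UNRECOVERABLE
```

Keeping the actions as callbacks separates the escalation order from how each step is actually carried out on the instance.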
3. Monitoring and recovery of services
How the service status is checked and how the service is recovered depend on the monitored OS.
Below is a summary of the monitoring (service status checking) and recovery (restart) method for each OS.
| OS | Service management | Monitoring (status checking) | Recovery (restart) |
| --- | --- | --- | --- |
| RHEL6/CentOS6/Amazon Linux | SysVinit | service status command | service --full-restart command |
| RHEL7/CentOS7/Amazon Linux 2 | systemd | systemctl is-active command | systemctl restart command |
| Windows Server 2012 R2/2016 | Windows services | Get-Service cmdlet | Start-Service cmdlet |
If a monitoring (service status checking) or recovery (restart) command takes more than 30 seconds, it times out. A timeout during recovery (restart) is treated as a recovery failure.
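As a rough illustration of the per-OS commands and the 30-second limit, the sketch below maps each service manager to its check and restart command lines and runs a command locally with a timeout. The names `COMMANDS`, `build_command`, and `run_with_timeout` are hypothetical, and the exact argument placement (for example `service <name> --full-restart` and `-Name` for the cmdlets) is an assumption. Coati itself runs these commands remotely through Run Command, so the local `subprocess` call only demonstrates the timeout rule.

```python
import subprocess
from typing import Dict, List, Tuple

TIMEOUT_SECONDS = 30  # limit stated in the text

# Command templates per service manager (argument placement is an assumption).
COMMANDS: Dict[str, Dict[str, str]] = {
    "sysvinit": {
        "check": "service {name} status",
        "restart": "service {name} --full-restart",
    },
    "systemd": {
        "check": "systemctl is-active {name}",
        "restart": "systemctl restart {name}",
    },
    "windows": {
        "check": "Get-Service -Name {name}",
        "restart": "Start-Service -Name {name}",
    },
}


def build_command(manager: str, action: str, name: str) -> str:
    """Return the command line for a manager ('systemd', ...) and action."""
    return COMMANDS[manager][action].format(name=name)


def run_with_timeout(
    argv: List[str], timeout: float = TIMEOUT_SECONDS
) -> Tuple[bool, bool]:
    """Run argv locally and return (succeeded, timed_out).

    A timeout counts as a failure, mirroring the rule for recovery commands.
    """
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return (False, True)
    return (completed.returncode == 0, False)
```

For example, `build_command("systemd", "check", "httpd")` yields `systemctl is-active httpd`, where `httpd` is just a placeholder service name.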
Using Coati with Auto Recovery
Auto Recovery is an AWS feature that automatically recovers EC2 instances when a failure occurs on them, such as a hardware failure or another failure that requires repair by AWS.
We strongly recommend using Auto Recovery for monitored instances. Coati and Auto Recovery monitor and automatically recover from the following types of failure, respectively.
Coati: Failure of services running on EC2 instances
Auto Recovery: Hardware failures and AWS system failures
In this way, Coati and Auto Recovery monitor different layers. If a monitored instance becomes unavailable due to a hardware failure, Coati cannot recover it.
For this reason, we recommend using Coati and Auto Recovery together so that failures at each layer of the monitored targets are properly detected and recovered.
No special Coati settings are needed to use it together with Auto Recovery. If Auto Recovery is already set up for the instances you want to monitor, you can continue to use it as is. You can also set up Auto Recovery after setting up Coati.