What we learned about cloud security running a SaaS in AWS for 5 years - Part 5 - EC2 Instance Security

This is Part 5 of a multi-part series of posts on how we securely ran ThreatSim in AWS for 5 years and never lost a customer (that we know of) due to any cloud security concerns.

There are tons of resources for all kinds of host security. In this post, we focus on EC2-specific controls.

Patch Management

Description: Ensure security patches are applied in a timely manner.

Why it's important: Some operating systems allow automatic security updates to be applied. If possible, enable this feature. If the organization or application is sensitive to interruption, ensure that a process is in place for the application of security patches in a way that doesn't impact the availability of the system.

Vulnerability Management

Description: Ensure vulnerability management does not rely on external network connectivity into the VPC to assess the host.

Why it's important: Traditional vulnerability scanners operate over the network, scanning target networks for live hosts, and then scanning the ports and services found to be available. In EC2, it is likely that a significant number of hosts operate behind NAT gateways or on subnets that are not reachable from the scanner. In these cases it is helpful if the organization's vulnerability scanning solution is agent-based. This way, an instance can be launched from an AMI in any environment and the agent will initiate the connection out to the management host. This model simplifies deployment since it only requires allowing (usually) a single port out to the management host, which should be located on a network outside of the target network and AWS account.
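As a rough illustration of the "single port out" model, the sketch below builds one outbound security group rule in the shape that boto3's authorize_security_group_egress call accepts. The CIDR, port, and function names are hypothetical placeholders, not values from our environment, and the real port depends on the scanning product in use.

```python
import ipaddress


def agent_egress_rule(manager_cidr: str, port: int = 443) -> dict:
    """Build one outbound TCP rule allowing the agent to reach its manager.

    manager_cidr and port are illustrative assumptions; use whatever the
    scanning vendor documents.
    """
    # Raises ValueError early if the CIDR is malformed.
    ipaddress.ip_network(manager_cidr)
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            {"CidrIp": manager_cidr, "Description": "vuln agent to manager"}
        ],
    }


def apply_egress_rule(ec2_client, group_id: str, rule: dict) -> None:
    """Attach the rule to a security group using a boto3 EC2 client."""
    ec2_client.authorize_security_group_egress(
        GroupId=group_id, IpPermissions=[rule]
    )


# Example (requires boto3 and AWS credentials; IDs are made up):
#   import boto3
#   rule = agent_egress_rule("203.0.113.10/32", 8443)
#   apply_egress_rule(boto3.client("ec2"), "sg-0123456789abcdef0", rule)
```

Because the agent dials out, no inbound rule from the scanner is needed on the instance's security group.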

Immutable: Patch Management

Description: Ensure that AMI build processes incorporate OS updates.

Why it's important: When AMIs are built or "baked", ensure that the process or tools used incorporate OS updates so that the resulting EC2 instances are launched with a strong security posture.

Immutable: Security Controls

Description: Ensure that AMI build processes incorporate security controls.

Why it's important: If the organization uses Amazon Machine Images (AMIs), ensure that the AMI build process incorporates any installed security tools (e.g. HIDS, log forwarders, vulnerability scanning agents, etc.). This ensures that new EC2 instances launched from those AMIs include the security and operational tools.

Immutable Secure Secret Storage

Description: Do not store sensitive data (e.g. database connection strings, encryption keys, API keys, etc.) on AMIs.

Why it's important: Store sensitive data in an encrypted, central repository so that secrets are not stored within AMIs. For example, store the application's database connection information encrypted in a location that is called when the instance is launched. As the instance boots, use the EC2 instance's IAM role to call KMS to decrypt the secret file.
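A minimal sketch of the boot-time decryption step, assuming the encrypted secret is a base64-encoded JSON blob fetched from wherever it is stored (S3, a config service, etc.). The function name and the JSON layout are hypothetical; the only AWS call used is the KMS Decrypt operation via a boto3 client.

```python
import base64
import json


def load_db_config(kms_client, ciphertext_b64: str) -> dict:
    """Decrypt a base64-encoded, KMS-encrypted JSON blob into a dict.

    kms_client is expected to behave like boto3.client("kms"). On an EC2
    instance, boto3 picks up the instance's IAM role credentials on its own,
    so no access keys need to live on the AMI.
    """
    response = kms_client.decrypt(
        CiphertextBlob=base64.b64decode(ciphertext_b64)
    )
    # For this call, boto3 returns the decrypted bytes under "Plaintext".
    return json.loads(response["Plaintext"].decode("utf-8"))


# Example (requires boto3 and an IAM role allowed to call kms:Decrypt):
#   import boto3
#   config = load_db_config(boto3.client("kms"), encrypted_blob_from_storage)
#   connect(config["host"], config["user"], config["password"])
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, and keeps the plaintext in memory only.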

Clear-text Protocols

Description: Use of clear-text protocols should be the exception.

Why it's important: Wherever possible, do not use clear-text protocols to transfer data within AWS. Specifically, do not use clear-text protocols for any administrative access (e.g. telnet, etc.).
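One way to spot exceptions is to scan security group definitions for inbound rules that cover well-known clear-text ports. The sketch below works on dictionaries shaped like the output of boto3's describe_security_groups. The port list is our own illustrative assumption (only telnet is named above), and a port match is a hint to investigate, not proof of a clear-text service.

```python
# Illustrative list only; extend it to match the environment.
CLEAR_TEXT_PORTS = {21: "ftp", 23: "telnet", 80: "http"}


def clear_text_ingress(security_groups):
    """Return (group_id, port, protocol_name) for each inbound rule that
    covers a port in CLEAR_TEXT_PORTS."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            protocol = perm.get("IpProtocol")
            if protocol == "-1":
                # "-1" means all protocols and all ports.
                low, high = 0, 65535
            elif protocol == "tcp":
                low = perm.get("FromPort", 0)
                high = perm.get("ToPort", 65535)
            else:
                continue
            for port, name in sorted(CLEAR_TEXT_PORTS.items()):
                if low <= port <= high:
                    findings.append((group["GroupId"], port, name))
    return findings


# Example (requires boto3 and AWS credentials):
#   import boto3
#   groups = boto3.client("ec2").describe_security_groups()["SecurityGroups"]
#   for finding in clear_text_ingress(groups):
#       print(finding)
```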

Immutable Forensic Capability

Description: Ensure that forensic evidence is preserved, even when an immutable infrastructure is used.

Why it's important: Applications that are immutable (where compute instances are disposable and all application state is stored in a high-availability data store rather than on the EC2 instance storage) should use a system that preserves critical forensic information. For example, if an attacker compromises an EC2 instance and the instance is terminated as part of normal operations, security responders will still need to conduct hard drive forensics to recover artifacts such as stolen data, rootkits, exploits used, log messages, etc. One approach is to configure regular EBS snapshots of the running instances and store them for a reasonable period of time (e.g. 90 days). Given the low price of AWS storage, an approach like this is not cost-prohibitive.
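The snapshot-and-retain approach can be sketched roughly as follows. This is not the tooling we ran, just an illustration under a few assumptions: the helper names are hypothetical, the 90-day figure is the example retention from above, and the snapshot dictionaries follow the shape boto3's describe_snapshots returns (a "SnapshotId" and a timezone-aware "StartTime").

```python
from datetime import datetime, timedelta, timezone


def snapshot_volumes(ec2_client, volume_ids, description="forensic snapshot"):
    """Create one EBS snapshot per volume and return the new snapshot IDs."""
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id, Description=description
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids


def expired_snapshots(snapshots, now=None, retention_days=90):
    """Return IDs of snapshots older than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


# Example (requires boto3 and AWS credentials; the volume ID is made up):
#   import boto3
#   ec2 = boto3.client("ec2")
#   snapshot_volumes(ec2, ["vol-0123456789abcdef0"])
#   owned = ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
#   for snapshot_id in expired_snapshots(owned):
#       ec2.delete_snapshot(SnapshotId=snapshot_id)
```

Run on a schedule, this keeps a rolling window of disk images that outlive the instances they came from.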

Centralized Log Storage

Description: Forward OS and application logs to a central log collector.

Why it's important: Configure all capable devices within the production environment to forward application, OS, and other critical logs to a central log collection server. This may be a central syslog server, log aggregation solution (e.g. ELK Stack, Splunk, etc.) or a SIEM.
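For application logs, forwarding can be as simple as attaching a syslog handler from Python's standard library. The sketch below assumes a collector reachable at a hypothetical host and port. Note that the default transport here is plain UDP syslog, which is itself unencrypted, so a production setup would more likely hand logs to a local forwarder that ships them over TLS.

```python
import logging
import logging.handlers


def build_forwarding_logger(name, collector_host, collector_port=514):
    """Return a logger that sends records to a central syslog collector.

    collector_host and collector_port are placeholders for the
    organization's own log collection endpoint.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # With a (host, port) tuple, SysLogHandler uses UDP by default.
    handler = logging.handlers.SysLogHandler(
        address=(collector_host, collector_port)
    )
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Example:
#   log = build_forwarding_logger("billing-api", "logs.example.internal")
#   log.info("user login succeeded")
```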

Standard Build

Description: Follow a standard build process.

Why it's important: Implement a standard host build process so that only approved software packages are installed on the device. A standard build process ensures that changes to the script or template used to build AMIs are approved and tracked.
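A build pipeline can enforce the "only approved packages" rule with a simple allowlist comparison at the end of the bake. The sketch below is illustrative: the function names are hypothetical, and how the installed package list is gathered (dpkg, rpm, etc.) is left to the build tooling.

```python
def unapproved_packages(installed, approved):
    """Return installed package names that are not on the approved list."""
    return sorted(set(installed) - set(approved))


def check_build(installed, approved):
    """Fail the build if anything outside the allowlist is present."""
    extras = unapproved_packages(installed, approved)
    if extras:
        raise RuntimeError(
            "unapproved packages in image: " + ", ".join(extras)
        )


# Example:
#   check_build(installed=["openssh-server", "nginx", "nmap"],
#               approved=["openssh-server", "nginx"])
#   # raises RuntimeError naming "nmap"
```

Keeping the allowlist in version control alongside the build template gives the approval and change tracking described above.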

Immutable Security Controls

Description: Ensure that critical security event and forensic data is preserved beyond instance termination.

Why it's important: Ensure that system logs, process information, kernel modules, and other critical, security-relevant events are stored so that security investigations are still possible, even if the EC2 instance has been terminated. Some third-party solutions perform granular system process tracking, child/parent process tracking, loaded module tracking, binary hash logging, etc. These solutions send the data in real time to a central server. Controls such as these are invaluable in a situation where a security event is detected but the machine in scope of the event has been terminated. AMI build processes should incorporate these and other security controls (e.g. log forwarders, HIDS, vulnerability scanning agents, anti-virus, etc.) so that every new instance launches with them in place.

Time Synchronization

Description: Ensure that all EC2 instances use NTP to synchronize time from an accurate source.

Why it's important: Given that computers running in virtualized environments are more susceptible to clock drift, it is important to ensure that devices perform NTP synchronization on a frequent basis.
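To make "clock drift" concrete, the sketch below computes the standard NTP clock offset from the four timestamps of a single request/response exchange and flags an instance whose offset exceeds a threshold. The threshold value and function names are illustrative assumptions. In practice the NTP daemon on the host does this calculation itself, and a monitoring check would read the offset it reports.

```python
def ntp_offset(t0, t1, t2, t3):
    """Estimate how far the local clock is behind (+) or ahead (-) of the server.

    t0: client send time (client clock)
    t1: server receive time (server clock)
    t2: server transmit time (server clock)
    t3: client receive time (client clock)
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def drift_exceeds(offset_seconds, threshold_seconds=0.5):
    """True when the absolute offset is larger than the allowed threshold."""
    return abs(offset_seconds) > threshold_seconds


# Example: the client clock is 2 seconds behind, with 0.5 s delay each way.
#   offset = ntp_offset(t0=10.0, t1=12.5, t2=12.5, t3=11.0)
#   drift_exceeds(offset)  # True with the default 0.5 s threshold
```

Accurate clocks matter most when correlating log events from many instances during an investigation.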