In this article I will go over the fundamentals of EC2 Spot Instances: what we can gain from them, when to take advantage of them, and when to avoid them.

A Short Story Before…

Let’s say you’re visiting a city where you wish to stay for the night. You search for a hotel, and you find that the only one in the city is too expensive for you. Luckily, the hotel offers a 60% discount if you accept the following deal:
You can occupy any reserved room as long as the official guest of the room is absent. You can use the resources that this room provides (bed, shower, electricity etc.). However, whenever the official guest of the room comes back, you must leave the room, and the hotel will find you another reserved yet unoccupied room to stay in. If we assume that there is always a room whose guest is absent (and if we ignore the clear hygiene/security problems of sharing rooms with strangers), then you will always have a roof over your head, the only cost being the interruptions (when the guest of the room comes back).

If you understand this strategy, then you understand how EC2 Spot works. The hotel is AWS, the official guests are the customers paying full price, the discounted traveler is you, the rooms are EC2 computing resources, and the deal of sharing unused resources is EC2 Spot. Let’s now clearly define EC2 Spot.

Formal Definition of EC2 Spot

A Spot Instance is an instance that uses spare EC2 capacity that is available for less than the On-Demand price. Because Spot Instances enable you to request unused EC2 instances at steep discounts, you can lower your Amazon EC2 costs significantly.

Spot Instances are a cost-effective choice if you can be flexible about when your applications run and if your applications can be interrupted.

Spot Instances can be interrupted by Amazon EC2, with two minutes of notification, when Amazon EC2 needs the capacity back.
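To make the two-minute notification concrete, here is a minimal sketch of how a process running on a Spot Instance could check for it. It assumes the instance metadata service (IMDSv2) is reachable at its usual link-local address and that the notice is published under `spot/instance-action`; the function names `parse_instance_action` and `fetch_instance_action` are my own and not part of any AWS SDK.

```python
import json
import urllib.error
import urllib.request

# Link-local address of the EC2 instance metadata service.
IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body):
    """Turn the JSON body of spot/instance-action into (action, time)."""
    doc = json.loads(body)
    return doc["action"], doc["time"]


def fetch_instance_action(timeout=1.0):
    """Return (action, time) if an interruption is scheduled, else None.

    Only meaningful when called from inside an EC2 instance.
    """
    # IMDSv2: first obtain a short-lived session token.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    action_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption is currently scheduled
            return None
        raise
```

A worker would typically call `fetch_instance_action()` every few seconds and, as soon as it returns something other than `None`, stop accepting new work and save its state.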

The following are possible reasons for Spot instance interruption:

Capacity: Amazon EC2 can interrupt your Spot Instance when it needs the capacity back. EC2 reclaims your instance mainly to repurpose capacity, but it can also be due to host maintenance or hardware decommissioning.
Price: The Spot price is higher than the maximum price you specified. (That’s why specifying a maximum price, rather than leaving the default, tends to result in more interruptions.)
Constraints: If your Spot request includes a constraint such as a launch group or an Availability Zone group, the Spot Instances are terminated as a group when the constraint can no longer be met.

While running, Spot instances are the same as On-Demand machines.

The only differences are the following:

Spot offers no guarantee on how long your instance will be able to run. You can be interrupted in the middle of a process with a two-minute grace period.
Spot offers no guarantee on the availability of the instances you requested. That’s why you should specify several acceptable instance types rather than a single one.
Spot’s past performance is no guarantee of future results since interruption and capacity vary according to supply and demand.
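As an illustration of requesting a set of instance types rather than a single one, here is a hedged sketch that builds the kind of dictionary an Auto Scaling group accepts as its `MixedInstancesPolicy`. The helper name `spot_mixed_instances_policy`, the launch template name `my-arm-template` and the chosen instance types are assumptions made for the example, not values from this article.

```python
def spot_mixed_instances_policy(launch_template_name, instance_types):
    """Build a MixedInstancesPolicy dict that accepts several instance types."""
    if len(instance_types) < 2:
        raise ValueError("list several instance types to improve Spot availability")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,  # hypothetical name
                "Version": "$Latest",
            },
            # One override per instance type the workload can run on.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # 0% On-Demand above the base capacity means everything else is Spot.
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


policy = spot_mixed_instances_policy(
    "my-arm-template", ["t4g.medium", "t4g.large", "m6g.medium"]
)
```

With boto3, a dictionary of this shape would be passed as the `MixedInstancesPolicy` argument of the Auto Scaling client's `create_auto_scaling_group` call. The more instance types you can tolerate, the more capacity pools EC2 can draw from.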

Pricing

EC2 instances can cost you a lot… — https://twitter.com/gregocoder/status/1680952244198592513

The price of a Spot instance can be up to 90% less than the On-Demand price.

Spot Instance prices are set by Amazon EC2 and adjust gradually based on long-term trends in supply and demand for Spot Instance capacity. Since those prices are not set in stone, it’s recommended that you inform yourself about the current price of a Spot instance before running it.

When your Spot request is fulfilled, your Spot Instances launch at the current Spot price, not exceeding the On-Demand price. To give you an impression of the On-Demand to Spot price ratio, I have gathered here a table of the hourly prices in USD (February 2024) of the popular T4g instance types.

╔═══════════════╦═════════════════╦════════════╗
║ Instance Type ║ On-Demand Price ║ Spot Price ║
╠═══════════════╬═════════════════╬════════════╣
║ t4g.nano      ║ $0.0042         ║ $0.0029    ║
║ t4g.micro     ║ $0.0084         ║ $0.0038    ║
║ t4g.small     ║ $0.0168         ║ $0.0076    ║
║ t4g.medium    ║ $0.0336         ║ $0.0154    ║
║ t4g.large     ║ $0.0672         ║ $0.0314    ║
║ t4g.xlarge    ║ $0.1344         ║ $0.0593    ║
║ t4g.2xlarge   ║ $0.2688         ║ $0.1172    ║
╚═══════════════╩═════════════════╩════════════╝

We observe that the Spot instances are roughly 53% to 56% less expensive than the On-Demand instances for most sizes. The exception is t4g.nano, where the saving is only about 31%.
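The savings in the table above can be recomputed with a few lines of Python. This is only a worked example on the February 2024 figures listed in the table; the names `PRICES` and `discount_percent` are mine.

```python
# (On-Demand price, Spot price) per hour, copied from the table above.
PRICES = {
    "t4g.nano": (0.0042, 0.0029),
    "t4g.micro": (0.0084, 0.0038),
    "t4g.small": (0.0168, 0.0076),
    "t4g.medium": (0.0336, 0.0154),
    "t4g.large": (0.0672, 0.0314),
    "t4g.xlarge": (0.1344, 0.0593),
    "t4g.2xlarge": (0.2688, 0.1172),
}


def discount_percent(on_demand, spot):
    """Percentage saved by paying the Spot price instead of On-Demand."""
    return round((1 - spot / on_demand) * 100, 1)


for name, (on_demand, spot) in PRICES.items():
    print(f"{name:<12} {discount_percent(on_demand, spot):5.1f}% cheaper")
```

Keep in mind that these ratios are a snapshot: Spot prices move with supply and demand, so the same computation on another day or in another region can give different results.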

To consult the current Spot Instance prices, check this link: Amazon EC2 Spot Instances Pricing.
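Prices can also be consulted programmatically. The sketch below assumes a boto3 EC2 client is passed in as `ec2_client` and relies on its `describe_spot_price_history` call; the function name `latest_spot_prices` is my own, and pagination is deliberately ignored to keep the example short.

```python
from datetime import datetime, timezone


def latest_spot_prices(ec2_client, instance_types, product="Linux/UNIX"):
    """Return {(instance_type, availability_zone): price} with the newest price."""
    response = ec2_client.describe_spot_price_history(
        InstanceTypes=list(instance_types),
        ProductDescriptions=[product],
        # Starting "now" asks for the price currently in effect.
        StartTime=datetime.now(timezone.utc),
    )
    latest = {}
    for entry in response["SpotPriceHistory"]:
        key = (entry["InstanceType"], entry["AvailabilityZone"])
        # Keep only the most recent entry for each (type, zone) pair.
        if key not in latest or entry["Timestamp"] > latest[key][0]:
            latest[key] = (entry["Timestamp"], float(entry["SpotPrice"]))
    return {key: price for key, (_, price) in latest.items()}


# Possible usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   print(latest_spot_prices(ec2, ["t4g.nano", "t4g.micro"]))
```

Taking the client as a parameter keeps the function easy to test with a stub and makes the region choice explicit at the call site.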

When To Use Spot Instances

Spot instances are a good match for stateless, fault tolerant and flexible applications. I enumerate here some use cases where Spot Instances fit well.

Web servers/REST APIs: Web servers and REST APIs, especially those employing a microservices architecture, operate in a stateless manner, meaning they do not retain information about previous interactions with clients. This design ensures scalability and fault tolerance by allowing multiple instances of the same service to handle requests interchangeably without relying on shared state. In the event of a failure in one server instance, requests can seamlessly be rerouted to other healthy instances, minimizing downtime and maintaining service availability. This statelessness also simplifies horizontal scaling, as instances can be spun up or down without concern for preserving session state, making the system more adaptable to varying workloads.
Continuous Integration/Continuous Deployment (CI/CD): Spot Instances are advantageous for CI/CD workflows because they offer scalable compute resources at lower costs. CI/CD processes often require extensive computational power to build, test, and deploy applications efficiently. With Spot Instances, users can dynamically scale their compute capacity based on workload demands, enabling faster iteration cycles and reducing the overall cost of software development and deployment.
High-Performance Computing (HPC): Spot Instances are well-suited for HPC workloads, which typically involve complex simulations, scientific computations, and data analysis tasks. These tasks often require significant computational resources for processing large datasets or performing intricate calculations. Spot Instances provide access to high-performance compute clusters at lower prices, allowing researchers, engineers, and scientists to run computationally intensive workloads cost-effectively without sacrificing performance.
Non-critical EKS nodes: EKS clusters used for software development and end-to-end testing will typically be a perfect fit for Spot instances. Indeed, those clusters usually don’t have high-availability requirements because they are only used by the development and testing teams. Along with a good cluster auto-scaler such as Karpenter, Spot instances will allow you to provide flexible and powerful clusters for your developers at a reduced cost.
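What makes the use cases above a good match is that an interrupted unit of work can simply be picked up again. Here is a minimal sketch of that idea for a batch job: progress is checkpointed after every item, so a replacement instance resumes where the previous one stopped. The names `process_items` and `should_stop` are hypothetical, and the sketch assumes the checkpoint file lives on storage that survives the instance (a shared file system, for example).

```python
import json
import os
import tempfile


def process_items(items, checkpoint_path, handle, should_stop=lambda: False):
    """Process items in order, saving the next index after each one.

    Returns True when every item is done, False if stopped early.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            start = json.load(fh)["next_index"]

    for index in range(start, len(items)):
        if should_stop():  # e.g. an interruption notice was received
            return False
        handle(items[index])
        _save_checkpoint(checkpoint_path, index + 1)
    return True


def _save_checkpoint(path, next_index):
    # Write to a temporary file and rename it, so that being killed
    # mid-write cannot leave a corrupted checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump({"next_index": next_index}, fh)
    os.replace(tmp_path, path)
```

If the instance disappears between `handle` and the checkpoint write, that one item is processed again on restart, so `handle` should be safe to repeat.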

When Not to Use Spot Instances

Consider the little hotel deal that I described at the beginning of the article. If you are on an adventurous trip with long-time friends, this deal might seem like a good idea to save money. However, if you are booking for your wedding night, it might be reckless to opt for this kind of deal. The same goes for Spot: there are contexts in which you should not use it.

Spot Instances are not suitable for workloads that are inflexible, stateful, fault-intolerant, or tightly coupled between instance nodes. They’re also not recommended for workloads that are intolerant of occasional periods when the target capacity is not completely available.
(Source: EC2 Spot Best Practices)

Therefore, IT IS NOT RECOMMENDED to use Spot in the following situations:

Databases: Relational databases or other stateful data stores that require consistent and reliable access to storage and compute resources. These workloads typically rely on persistent storage and may not tolerate interruptions or fluctuations in resource availability.
Real-time Processing Systems: Applications that require low-latency processing or real-time data analysis, such as real-time streaming analytics or online gaming platforms. These workloads often demand consistent and predictable performance, which may be compromised by the variability in Spot Instance availability.
Highly Coupled Distributed Systems: Workloads that involve tightly coupled communication between instance nodes, such as tightly integrated microservices architectures or distributed databases with complex interdependencies. Spot Instances, with their potential for intermittent interruptions and variability, may disrupt the communication and coordination between these interconnected components.

Conclusion

That’s all for this short overview of EC2 Spot. You should now understand how Spot works, why it’s cheap (it uses spare EC2 capacity) and what risks it comes with (interruptions, dependence on supply and demand).

Check these links if you want a deeper understanding of Spot best practices and Spot interruptions.

Always remember, there is no free lunch. Savings on the cloud are a good thing, but they need to be planned (I would even say engineered) wisely and with a full understanding of the technology.

Fundamentals of EC2 Spot Instances