The intent of this article is to be a living document on the best methods for optimizing AWS Lambda architecture. This is not an exhaustive list. Feel free to add your suggestions in the comments and the article will be periodically updated.
Method 1: Don’t Use Lambda
The truth is, if you require consistent, low-latency performance, Lambda is probably not for you. Out of the box you can expect an API call to frequently take over 5 seconds to complete due to cold starts, and as your API grows it will error out once API Gateway’s unchangeable 29-second time limit expires.
AWS Lambda is not optimized for performance; it’s optimized for speed of delivery. It performs well if used correctly, but it still performs worse than almost every alternative, even when configured correctly. It scales well, but the scaling might not perform the way you want. Containerization using tools like AWS ECS, AWS Fargate, Docker, and Kubernetes will give you more customization options to scale the way you want and to almost any size, and AWS EC2 can take this even further. Unfortunately, these require much, much more setup and maintenance than a Lambda-based infrastructure, and therefore this is mentioned for posterity rather than as an actual recommendation for everyone.
Method 2: Provisioned Concurrency (recommended)
While it’s not public knowledge exactly what AWS uses under the hood, Lambda behaves similarly to Docker. When a Lambda function is invoked, it spins up a container (a Lambda function instance) and processes the request. If it can’t keep up with requests, it spins up more containers until it either can or it hits the service limit. Once Lambda detects it has more containers than needed, it begins killing them until it balances out or all containers have been killed. Enter Cold Starts.
A Cold Start is when a request needs to be processed and Lambda spins up a new container to process it. This process can be expected to take at least 2 seconds and likely around 5. It’s influenced by a combination of things, including Lambda’s own spin-up time, the size of the Lambda function, and the time it takes to initialize your Lambda function’s code. This cold start time chains with every Lambda function in the call stack, meaning a cold chain of more than 15 Lambda functions will almost guarantee an API Gateway failure, though failures can happen with far fewer.
Provisioned Concurrency is a configuration setting in Lambda that tells it, “No matter what you’re detecting, never have fewer than x containers active.” This means you won’t get cold starts for that Lambda function as long as you don’t need to scale beyond that number of containers. It’s important to note that Provisioned Concurrency is NOT CHEAP.
Note: API Gateway MUST BE CONFIGURED USING ALIASES if you use Provisioned Concurrency; otherwise cold starts will continue.
Note: this explanation of Provisioned Concurrency is actually a simplification. Even with Provisioned Concurrency, Lambda will eventually kill all the containers. What it actually does is save the memory of an initialized Lambda function so it can spin up new containers in a fraction of the time.
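As a minimal sketch of how you might configure this programmatically (assuming the Node.js aws-sdk v2, a hypothetical function named my-api-handler, and a published alias named live):

```javascript
// Minimal sketch: set Provisioned Concurrency on an alias with the AWS SDK v2.
// "my-api-handler" and the "live" alias are placeholders for your own setup.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda({ region: 'us-east-1' });

async function setProvisionedConcurrency() {
  await lambda.putProvisionedConcurrencyConfig({
    FunctionName: 'my-api-handler',
    Qualifier: 'live', // must be a published version or alias, never $LATEST
    ProvisionedConcurrentExecutions: 5, // keep at least 5 instances initialized
  }).promise();
}

setProvisionedConcurrency().catch(console.error);
```

Note that the setting is applied to an alias, which is also why API Gateway has to be pointed at the alias, as noted above.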
Method 3: Lambda Warmer (not recommended)
A similar pattern for preventing cold starts is called a Lambda warmer. A Lambda warmer is a tool that pings the API on a schedule (roughly once every 15 minutes) so that the Lambda function instances in the chain never get the opportunity to cool down. This has the advantage of keeping the whole chain ready without having to know anything about the implementation. Many frameworks (like Serverless) have plugins that support this.
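For illustration only, a minimal warmer sketch (the API URL is a placeholder, and the schedule itself would come from something like an EventBridge rule firing every 15 minutes):

```javascript
// Minimal sketch of a Lambda warmer: a scheduled function that pings the API
// so the whole chain behind it stays warm. The URL below is a placeholder.
const https = require('https');

exports.handler = async () => {
  await new Promise((resolve, reject) => {
    https
      .get('https://example.execute-api.us-east-1.amazonaws.com/prod/health', (res) => {
        res.resume(); // drain the body; we only care that the chain was exercised
        res.on('end', resolve);
      })
      .on('error', reject);
  });
};
```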
UPDATE: as an acquaintance recently proved to me, Lambda warmers DO NOT incur charges for keeping the Lambda function warm beyond the cost of the invocation itself. This makes them significantly cheaper than Provisioned Concurrency on paper. However, the unreliability and extra complexity of Lambda warmers will inevitably lead to paying more in developer maintenance hours.
Now for the drawbacks. This method is specifically NOT recommended. It leaves your monitoring systems with no idea how often your APIs are actually being used, what percentage of real requests actually fail, or how long real requests typically take for a customer. It is also a cop-out for not knowing your system and technology, as Provisioned Concurrency used correctly will give you more consistent results and clearer metrics.
Method 4: More Lambda RAM (recommended)
Setting a Lambda function’s RAM higher also increases its network speed and CPU performance. This means everything about the Lambda function, including the cold start time, is faster.
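As a minimal sketch (assuming the Node.js aws-sdk v2 and a placeholder function name):

```javascript
// Minimal sketch: raise a function's memory with the AWS SDK v2. CPU and
// network throughput scale with the memory setting, so everything speeds up.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda({ region: 'us-east-1' });

lambda
  .updateFunctionConfiguration({
    FunctionName: 'my-api-handler', // placeholder
    MemorySize: 1024, // in MB; try a few sizes and measure cost vs. latency
  })
  .promise()
  .catch(console.error);
```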
Method 5: Don’t/Do Use Concurrent Async Code
There is a trade-off to using concurrent asynchronous calls (like using Promise.all()), to the point where I strongly advise against ever using Promise.all(). The benefit is that when optimizing code, you can have network calls run concurrently, like a Gantt chart. In theory, if you have two calls that take five seconds each, doing both sequentially takes 10 seconds while doing them concurrently takes 5.
The problem with concurrent code in Lambda is logging. Unless done correctly (which is unintuitive), logs for things done concurrently in a Lambda function are unreadable. Lambda separates invocation logs into separate streams, but almost no one remembers that you need to do the equivalent manually in concurrent code, so the logs blend horribly. Avoid this pattern unless you need to squeeze out every drop of performance.
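As a rough sketch of the timing trade-off (the two helpers are hypothetical stand-ins for five-second network calls):

```javascript
// Hypothetical five-second network calls used to illustrate the trade-off.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const fetchUsers = async () => { await delay(5000); return ['alice', 'bob']; };
const fetchOrders = async () => { await delay(5000); return [{ id: 1 }]; };

// Sequential: roughly 10 seconds total, but the logs stay in order.
async function sequential() {
  const users = await fetchUsers();
  const orders = await fetchOrders();
  return { users, orders };
}

// Concurrent: roughly 5 seconds total, but anything the two calls log
// will interleave in the same CloudWatch stream.
async function concurrent() {
  const [users, orders] = await Promise.all([fetchUsers(), fetchOrders()]);
  return { users, orders };
}
```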
Method 6: API Gateway Alternatives
API Gateway, like Lambda, is designed for speed of delivery, not low latency. You can expect around 100ms of latency consistently added to every request that goes through API Gateway. Load balancers like ELB are lighter weight and therefore an attractive option for people squeezing performance out of a system, but this comes at the cost of the routing, caching, auth, etc. that API Gateway provides. This is a bit of an extreme choice, but it’s an option.
Method 7: API Gateway Caching (recommended)
“There are two hard things in computer science: cache invalidation and naming things.” API Gateway has built-in caching. Unless you have experience with caching, you are likely going to mess it up in the beginning, and your coworkers are almost guaranteed to break it at some point. That said, API Gateway caching can reduce response times to below 40ms, is simple, and is cheap. It’s highly recommended when possible.
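As a rough sketch of enabling it programmatically (assuming the Node.js aws-sdk v2; the REST API id and stage name are placeholders):

```javascript
// Minimal sketch: turn on stage-level caching for a REST API with the
// AWS SDK v2. "abc123" and "prod" are placeholders for your API and stage.
const AWS = require('aws-sdk');
const apigateway = new AWS.APIGateway({ region: 'us-east-1' });

apigateway
  .updateStage({
    restApiId: 'abc123',
    stageName: 'prod',
    patchOperations: [
      { op: 'replace', path: '/cacheClusterEnabled', value: 'true' },
      { op: 'replace', path: '/cacheClusterSize', value: '0.5' }, // GB
      // Cache every method for 5 minutes; tune per method/resource as needed.
      { op: 'replace', path: '/*/*/caching/enabled', value: 'true' },
      { op: 'replace', path: '/*/*/caching/ttlInSeconds', value: '300' },
    ],
  })
  .promise()
  .catch(console.error);
```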
Method 8: Cache with Redis/AWS ElastiCache
If you want to take caching to the next level, use Redis as a caching service. Redis is a free tool for key-value lookups with extremely low latency, and it scales well despite only using a single thread. AWS implemented a Redis-compatible service called ElastiCache for those who want something that works out of the box. It’s slightly more work than API Gateway caching and slightly more maintenance, but it’s extremely effective as a caching solution and an industry standard.
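As a minimal cache-aside sketch (assuming the ioredis client; the endpoint and the database helper are placeholders):

```javascript
// Minimal cache-aside sketch with ioredis. The host is a placeholder for an
// ElastiCache (or any Redis-compatible) endpoint; fetchFromDatabase stands
// in for whatever slow lookup you are fronting with the cache.
const Redis = require('ioredis');
const redis = new Redis({ host: 'my-cache.abc123.use1.cache.amazonaws.com' });

const fetchFromDatabase = async (id) => ({ id, name: 'example' }); // stand-in

async function getUser(id) {
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached); // cache hit: sub-millisecond lookup

  const user = await fetchFromDatabase(id); // cache miss: take the slow path
  await redis.set(`user:${id}`, JSON.stringify(user), 'EX', 300); // 5 min TTL
  return user;
}
```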
Method 9: Cache in the Environment (not recommended)
Now you’re wondering “Did I read that correctly?” Unfortunately, you did. Lambda provides you with the ability to store data in the container’s RAM that persists between invocations. This data is not shared between containers and dies when Lambda kills the container.
You can look into this more on your own, but there are tools and patterns for storing cached data in RAM. This is incredibly performant since there is no data transfer, but it requires a complete understanding of AWS Lambda to use correctly and a clear system so future engineers don’t break it. It’s not recommended and generally frowned upon.
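For illustration, a minimal sketch of the pattern (fetchConfig is a hypothetical slow call):

```javascript
// Minimal sketch of in-container caching: anything declared outside the
// handler survives between invocations of the SAME container and dies when
// Lambda kills that container. fetchConfig is a hypothetical slow call.
const fetchConfig = async () => ({ featureFlag: true }); // stand-in

let cachedConfig = null; // lives in the container's RAM across invocations
let cachedAt = 0;
const TTL_MS = 60 * 1000;

exports.handler = async () => {
  if (!cachedConfig || Date.now() - cachedAt > TTL_MS) {
    cachedConfig = await fetchConfig(); // only on cold start or stale cache
    cachedAt = Date.now();
  }
  return { statusCode: 200, body: JSON.stringify(cachedConfig) };
};
```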
Method 10: Measure Your Latency Correctly (recommended)
The first question you should ask when you decide to optimize your system is “Where is the latency coming from?” Here’s how you find out:
- Picture your system as a flow. It has a trigger that starts it. It has a point at which it’s complete.
- Start a timer at the trigger and stop it when complete. That time in seconds is your system latency.
- Pick a new point in your flow. It can be anywhere in the flow.
- Measure the time from the trigger to that point. The time in seconds is the latency of that portion of the flow.
- If you decide this portion is what you want to focus on, disregard the rest of the flow. Otherwise, disregard everything in the flow before the point you picked. What’s left is your new system flow.
- Repeat steps 2–5 until you’ve reached something atomic (something that cannot be divided further).
- This atomic piece of your flow is the thing you want to optimize. It’s usually a specific tech (e.g., a Postgres database, a Lambda authorizer, a Google Weather API call). Optimizing that tech usually requires experience, which you can get through trial-and-error experimentation or by simply searching for how other people optimized it. A sketch of the timing pattern follows this list.
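As a minimal sketch of the timing steps inside a handler (authorize and queryDatabase are hypothetical pieces of a flow):

```javascript
// Minimal sketch: time from the trigger to chosen points in the flow.
// authorize and queryDatabase stand in for real pieces of your system.
const authorize = async () => { /* ... */ };
const queryDatabase = async () => { /* ... */ };

exports.handler = async (event) => {
  const triggerTime = Date.now(); // start the timer at the trigger

  await authorize();
  console.log(`trigger -> auth: ${Date.now() - triggerTime}ms`); // portion latency

  await queryDatabase();
  console.log(`trigger -> auth + db: ${Date.now() - triggerTime}ms`);

  return { statusCode: 200 }; // the full elapsed time is your system latency
};
```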
Method 11: Fat Lambda Functions (recommended at method level)
When using API Gateway, usually each method/endpoint combination has its own Lambda function. This is the cleanest way to design a system using API Gateway with Lambda. It’s also the least optimized for speed. Some people seeking extra performance will group all the methods of an endpoint into a single Lambda function; that way, when the GET call happens, Lambda will have a warm container ready for the POST call. Some people take this even further and put the entire API in a single Lambda function, using something like Express for routing.
Grouping everything into a single Lambda function kills the benefits of clean logs, monitoring, testability, etc., but in return you get an API that is much more consistent. It’s not recommended unless you want to use Lambda as a monolith.
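As a minimal sketch of method-level grouping (both handlers below are placeholders):

```javascript
// Minimal sketch of a method-level "fat" Lambda function: one function serves
// every HTTP method for a /users endpoint, so a GET leaves a warm container
// ready for the next POST. listUsers and createUser are placeholders.
const listUsers = async () => ({ users: [] });
const createUser = async (body) => ({ created: JSON.parse(body || '{}') });

exports.handler = async (event) => {
  switch (event.httpMethod) {
    case 'GET':
      return { statusCode: 200, body: JSON.stringify(await listUsers()) };
    case 'POST':
      return { statusCode: 201, body: JSON.stringify(await createUser(event.body)) };
    default:
      return { statusCode: 405, body: 'Method Not Allowed' };
  }
};
```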
Method 12: Optimize Authorizer (recommended)
An easily forgotten piece of your architecture is your API’s Lambda authorizer. This Lambda function also has cold starts and can run into every problem Lambda runs into, but it is far easier to miss when looking for things to optimize.
Method 13: Direct Invocation (not recommended)
API Gateway is not the only way to invoke a Lambda function. One way to bypass it is to use the aws-sdk library’s Lambda invoke method. This is absolutely the fastest way to invoke a Lambda function, which makes it seductive.
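As a minimal sketch (assuming the Node.js aws-sdk v2 and a placeholder function name):

```javascript
// Minimal sketch: invoke a Lambda function directly with the AWS SDK v2,
// bypassing API Gateway entirely. "my-api-handler" is a placeholder.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda({ region: 'us-east-1' });

async function callDirectly() {
  const response = await lambda
    .invoke({
      FunctionName: 'my-api-handler',
      InvocationType: 'RequestResponse', // synchronous call
      Payload: JSON.stringify({ userId: 42 }),
    })
    .promise();
  // Nothing but convention enforces the shape of this payload.
  return JSON.parse(response.Payload.toString());
}

callDirectly().then(console.log).catch(console.error);
```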
The reason we organize calls into APIs is that doing so decouples them and provides clear contracts. People who invoke Lambda functions directly tend to run into every problem you’d expect from coupling two services directly: it’s unclear what the contract is, a name change in code breaks something, everything using the function has either no access or too much access to it, monitoring becomes less reliable, etc.
Please feel free to contribute in the comments and adjustments will be made based on feedback.