Saving My Pennies And Saving My Dimes
Reducing Costs By Serving Static Content From Amazon S3 Instead Of Lambda
by JamesQMurphy | December 23, 2019
DevOps engineers have a different perspective than developers, because they solve different problems. I know this because I've been in both roles, and I was reminded of the difference just last week. Thanks to the logging dashboard I described in my previous post, I realized that I had set up my API Gateway all wrong. From a developer's standpoint, it was totally fine. But when I looked at it from a DevOps perspective (and, more importantly, an account owner's perspective), I realized that my mistake could have cost me money.
All changes described in this blog post were made in releases v0.3.7 and v0.3.8.
Static Versus Dynamic Content
First, a quick recap of how I have my site set up. This is a simplified diagram; a full diagram is available on the site's About page. Web content can come from two places: The S3 bucket (static content only), and the Lambda Function (static or dynamic content):
By static content, I mean content that doesn't have to be generated specifically for each user -- images, style sheets, JavaScript files, etc. Static content can be (and should be) cached in the browser by supplying a `Cache-Control` response header. In a typical ASP.NET MVC application, the static content is placed inside the `wwwroot` folder, which makes deployment easier. Dynamic content, of course, includes content that changes depending on the request context, and includes most (if not all) of the HTML that comprises your site, as well as data returned from web services. It requires actual running code to generate the content on-the-fly.
Most ASP.NET MVC web applications will serve both static and dynamic content. To ease the demand on web servers, many large organizations will place the static content on CDN (content delivery network) type devices, and use some sort of network routing or reverse proxy to send the requests for the static content to the device. Using an S3 bucket is simply the cloud-based analog of this approach, and I described the basic setup in a previous post. However, I was approaching the problem from the wrong angle, as I'll describe later.
AWS Pricing
Let's compare the costs of the three services in the diagram (API Gateway, Amazon S3, and AWS Lambda). I calculated the prices of one million hits for each service in the "US East - Virginia" region (`us-east-1`), which is the region in which this site is hosted. My calculations are based on the basic options for each of the services, and the initial pricing tier (the rates go down a little when you hit certain levels). I've linked to each service's pricing page if you want further details. All prices mentioned are at the time of this writing (December 2019).
API Gateway: API Gateway REST API, which is what this site uses, charges per request and per GB transferred (source: AWS API Gateway Pricing). The per-request rate is $3.50 per one million hits. The data transfer rate, however, is a little harder to find. Way down at the bottom, under "Data transfer", the page states that "If you use external data transfers, you will be charged at the EC2 data transfer rate." This rate can be found here, and at the time of this writing, it's $0.09 per GB. The first GB each month is free, although the per-request rate still applies.
Amazon S3: The cost of standard S3 storage itself is pretty cheap... only $0.023 (that's less than three cents) per GB per month. Requests are also comparatively cheap: a million `GET` requests cost $0.40[1]. If we were serving the S3 content directly to the Internet, the data transfer rate would also be $0.09 per GB, but since I'm transferring it via another AWS service (API Gateway) in the same region, it costs nothing additional. There is a free tier, but it's only good for a year (source: Amazon Simple Storage Service Pricing).
Lambda: Here's where costs get interesting. AWS Lambda charges per request, per duration, and per GB of RAM allocated (source: AWS Lambda - Pricing). For the duration, let's assume the very best Lambda performance for static content and use the minimum billable processing time, which is 100 milliseconds. The Lambda function is currently allocated at 320MB of RAM (0.32 GB), so by using the rate of $0.000016667 per GB-second, a million requests would cost 1,000,000 × 0.1 sec × 0.32 GB × $0.000016667 per GB-sec = $0.53. Add on the flat per-request rate of $0.20 per million, and we arrive at $0.73 per million requests. Since I'm not using the provisioned concurrency feature, I qualify for the free tier, so the first million hits at these execution times wouldn't cost me anything in any given month.
Here are the prices summarized:
Service | Per Million Hits | Per GB Downloaded | Pricing Links |
---|---|---|---|
API Gateway | $3.50 | $0.09 (first GB free) | Per Hit/Per GB |
S3 | $0.40 | free | Pricing |
Lambda | $0.73 (first million free) | free | Pricing |
From the table above, we can see that the amount of data downloaded gets billed at the same rate, no matter if it comes from Lambda or S3. It's the per-request numbers that differ. Using the assumptions above, one million requests served from S3 cost $3.50 + $0.40 = $3.90, whereas the same million requests from Lambda (assuming 100 millisecond requests at 0.32 GB) cost $3.50 + $0.73 = $4.23. It may not seem like much of a difference, but remember that we assumed that all Lambda requests would only take 100ms. In reality, there are cold starts and other factors that might make the Lambda function take more time. After all, it's running an ASP.NET Core MVC application, so the request gets handled like all other requests.
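The arithmetic above can be reproduced with a short Python sketch. It uses only the rates quoted in this post (a December 2019 snapshot) and ignores the free tiers and data transfer. The function names are my own, and the 300 ms case is a hypothetical slower request added purely to show how the gap widens when Lambda takes longer than the minimum billable time.

```python
# Rates quoted in the post (us-east-1, December 2019); a snapshot, not current pricing.
API_GATEWAY_PER_MILLION = 3.50      # REST API requests
S3_GET_PER_MILLION = 0.40           # S3 GET requests
LAMBDA_PER_MILLION = 0.20           # flat Lambda per-request charge
LAMBDA_PER_GB_SECOND = 0.000016667  # Lambda duration charge


def lambda_cost(requests, seconds_per_request=0.1, memory_gb=0.32):
    """Lambda-only cost: duration charge plus flat request charge (free tier ignored)."""
    duration = requests * seconds_per_request * memory_gb * LAMBDA_PER_GB_SECOND
    flat = requests / 1_000_000 * LAMBDA_PER_MILLION
    return duration + flat


def cost_via_s3(requests):
    """API Gateway request charge plus S3 GET charge."""
    millions = requests / 1_000_000
    return millions * (API_GATEWAY_PER_MILLION + S3_GET_PER_MILLION)


def cost_via_lambda(requests, seconds_per_request=0.1, memory_gb=0.32):
    """API Gateway request charge plus the Lambda charges."""
    millions = requests / 1_000_000
    return millions * API_GATEWAY_PER_MILLION + lambda_cost(
        requests, seconds_per_request, memory_gb
    )


if __name__ == "__main__":
    n = 1_000_000
    print(f"Served from S3:             ${cost_via_s3(n):.2f}")
    print(f"Served from Lambda, 100 ms: ${cost_via_lambda(n):.2f}")
    # Hypothetical slower request, e.g. averaged over cold starts.
    print(f"Served from Lambda, 300 ms: ${cost_via_lambda(n, seconds_per_request=0.3):.2f}")
```

At 100 ms the two paths differ by about 33 cents per million requests. Because the duration charge scales linearly with execution time, every extra billed 100 ms adds roughly another 53 cents per million at this memory setting.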
In short, unless you somehow stay under the free tier, static content served from Amazon S3 is always cheaper. (And it's also always faster, by the way.) But there's another wrinkle: The million hits include all web requests, which means it includes any stray web request that comes to your site! Even a "404 NOT FOUND" response is more expensive when it comes from AWS Lambda!
But It's Only A Little Static Content, Right?
When I was first setting up this site back in June, I was focused on getting it up and running. I did make sure that I served most of my static content from S3, but not all of it. This is how I initially set up API Gateway:
Resource | Method | Mapped To |
---|---|---|
/ | GET | Lambda Function |
/blogimages | GET | S3 Bucket |
/dist | GET | S3 Bucket |
/images | GET | S3 Bucket |
Everything else | ANY | Lambda Function |
If the request was for anything that wasn't in the `blogimages`, `dist`, or `images` folder, the request would be handled by the Lambda function. Initially, I was okay with this, because there were only a few small static content files to worry about:
- `favicon.ico`, which would be cached by the browser[2]
- `robots.txt`, which would only be requested by bots (and the curious)
- `js/site.js`, which would also be cached by the browser
But I totally underestimated the number of "404 NOT FOUND" responses that could occur.
Giddy Up, Giddy Up, 404
In my last post I illustrated how I used CloudWatch Logs Insights to create a "Top 10 404 Responses" dashboard. The results were certainly eye-opening (well, at least to me they were):
Page | Count |
---|---|
/apple-touch-icon-precomposed.png | 22 |
/apple-touch-icon.png | 22 |
/apple-touch-icon-152x152-precomposed.png | 14 |
/apple-touch-icon-152x152.png | 14 |
/apple-touch-icon-120x120-precomposed.png | 8 |
/apple-touch-icon-120x120.png | 8 |
/wp-login.php | 4 |
/support/troubleshooting/china-status | 1 |
/support/troubleshooting | 1 |
/support/troubleshooting/china-status/ | 1 |
The majority of the 404 responses were requests for PNG files beginning with `apple-touch-icon`. These requests come from Apple's Safari web browser looking for Webpage Icons for your site. As you might guess from the requested file names, Safari looks for various sizes of the icon. If it doesn't find an icon, Safari will use a rather bland-looking Bananagrams-style tile containing the first letter of the title of your website:
I honestly would never have known about these icons if I hadn't looked at the logs. So I added the icons to the root of the `wwwroot` folder[3], and the result looks great:
The other 404 responses were bots just probing the site, looking for well-known avenues of attack. I understand the probe for `wp-login.php`, which is the login page for WordPress sites, but the "support/troubleshooting" requests were a surprise.
Map The Routes, Not The Content
The point is, there's nothing you really can do to stop all the 404 responses -- they're going to happen. What you want to ensure, however, is that they aren't costing you money. That's why I changed the way that API Gateway was set up. Now, I explicitly map the routes to AWS Lambda and assume everything else is static content. (I still need a resource mapped to the `/blogimages` folder, since blog images don't live in the same location as the source code.)
Resource | Method | Mapped To |
---|---|---|
/ | GET | Lambda Function |
/account | ANY | Lambda Function |
/admin | ANY | Lambda Function |
/blog | ANY | Lambda Function |
/blogimages | GET | S3 Bucket |
/home | ANY | Lambda Function |
Everything else | GET | S3 Bucket |
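To make the effect of this mapping concrete, here is a small Python model of the table. It is only an illustration of the routing decision, not how API Gateway is implemented; the function name `resolve_target` and the `"lambda"` / `"s3"` labels are hypothetical. Returning `None` stands for "no method is mapped", in which case no backend is invoked at all.

```python
# Hypothetical model of the routing table above; names are illustrative only.
LAMBDA_ROUTES = {"account", "admin", "blog", "home"}  # ANY method, subpaths included


def resolve_target(method, path):
    """Return the backend that would receive the request, or None if nothing is mapped."""
    if path == "/":
        # The root resource only has a GET method, and it points at Lambda.
        return "lambda" if method == "GET" else None

    first_segment = path.lstrip("/").split("/", 1)[0]
    if first_segment in LAMBDA_ROUTES:  # exact, case-sensitive match
        return "lambda"

    # /blogimages and "everything else" are both GET-only and both served from S3.
    return "s3" if method == "GET" else None


if __name__ == "__main__":
    for method, path in [
        ("GET", "/blog/some-post"),
        ("GET", "/apple-touch-icon.png"),
        ("GET", "/wp-login.php"),
        ("GET", "/Home"),
    ]:
        print(method, path, "->", resolve_target(method, path))
```

The stray requests from the 404 table (`/apple-touch-icon.png`, `/wp-login.php`) now fall through to S3 instead of starting the ASP.NET Core application. The `/Home` case shows the flip side: because the match is case-sensitive, a differently cased route also falls through to S3.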
This is what it looks like in the AWS console:
Re-doing the Nested Stacks
Previously, I only had two resources in the API calling the Lambda function -- the root resource (`/`) and the proxy resource underneath (`/{proxy+}`). Now, although the root resource still needs to call the Lambda function, the main proxy method (which is now a `GET` method) needs to call S3. In the CloudFormation template, I renamed the method to `TheProxyGetMethod` and pointed it to S3:
TheProxyGetMethod:
  Type: 'AWS::ApiGateway::Method'
  Properties:
    RestApiId: !Ref TheGatewayRestAPI
    ResourceId: !Ref TheProxyResource
    HttpMethod: GET
    AuthorizationType: NONE
    RequestParameters:
      method.request.path.proxy: true
    MethodResponses:
      - StatusCode: 200
        ResponseParameters:
          'method.response.header.Timestamp': true
          'method.response.header.Content-Length': true
          'method.response.header.Content-Type': true
          'method.response.header.Cache-Control': true
    Integration:
      Type: AWS
      IntegrationHttpMethod: GET
      Credentials: !GetAtt TheRoleForTheProxyGetMethod.Arn
      Uri: !Sub arn:aws:apigateway:${AWS::Region}:s3:path/${S3BucketForCodeParameter}/${S3BucketPathForStaticFilesParameter}/{fullpath}
      PassthroughBehavior: WHEN_NO_MATCH
      RequestParameters:
        integration.request.path.fullpath: 'method.request.path.proxy'
      IntegrationResponses:
        - StatusCode: 200
          ResponseParameters:
            'method.response.header.Timestamp': 'integration.response.header.Date'
            'method.response.header.Content-Length': 'integration.response.header.Content-Length'
            'method.response.header.Content-Type': 'integration.response.header.Content-Type'
            'method.response.header.Cache-Control': !Sub "'public, max-age=31536000'"
Because of this change, I could remove the resources that pointed at static content (`/dist` and `/images`), and instead specify resources that pointed at the dynamic content. Since I now had four new resources (one for each route) to point to the Lambda function, I defined a new stack template named `cf-apiGatewayToLambda.yaml`, which encapsulates all necessary elements to map an API Gateway resource to a Lambda function:
- The resource itself, mapped to the Lambda function
- An `ANY` method for the resource
- A proxy resource underneath the main resource, to handle all subpaths
- An `ANY` method for the proxy resource
- A Lambda Permission resource
The source code for the new stack template can be found here. Invoking the template in the main template is straightforward; here is how it is called for the `home` route:
TheHomeRouteResourceStack:
  Type: 'AWS::CloudFormation::Stack'
  Properties:
    TemplateURL: !Sub 'https://${S3BucketForCodeParameter}.s3.amazonaws.com/${S3BucketPathForStaticFilesParameter}/cf-apiGatewayToLambda.yaml'
    TimeoutInMinutes: 10
    Parameters:
      RestApiIdParameter: !Ref TheGatewayRestAPI
      ParentResourceIdParameter: !GetAtt TheGatewayRestAPI.RootResourceId
      ApiResourceNameParameter: home
      LambdaArnParameter: !GetAtt TheLambdaFunction.Arn
The same stack template is used for the `account`, `admin`, and `blog` routes.
Totally Un-Static
After I created Release 0.3.7, I realized that the Lambda function was no longer serving static content, so I could remove the call to `app.UseStaticFiles()`. Well, sort of. I couldn't remove it entirely, since I need it when the site runs locally on my desktop. But I could control whether or not it was used via configuration:
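// Serve files from wwwroot only when configured to (e.g. when running locally).
// string.Equals is null-safe, so a missing setting simply disables static files.
if (string.Equals(Configuration["UseStaticFiles"], "true", StringComparison.OrdinalIgnoreCase))
{
    app.UseStaticFiles();
}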
I made this change and released it as v0.3.8. The value defaults to `"true"` but is set to `"false"` when deployed to AWS Lambda.
Some Minor Drawbacks
There are some things to watch out for with this approach:
- Case now matters. API Gateway sees `/home` and `/Home` as two different routes; ASP.NET does not. However, I saw this as an opportunity to standardize on lowercase, since Google also sees them differently. As part of Release 0.3.7, I also made all routes lowercase.
- New routes will require a new nested stack. If I create a new route, I need to remember to create a new stack in the CloudFormation file. As a general rule, I'm not a fan of having to "remember" to do anything, and while it is conceivable to build a mechanism that links the two, it isn't worth it. Unlike the traditional approach, I don't have to worry about setting up a route or filtering rule in a networking device -- it's still a source code change, and it lives with the rest of the source code[4]. And if it does get forgotten, it will be caught in the DEV environment.
- Beware of exposing binary files. The deployment process I use simply unzips the `wwwroot` folder and uploads it to an S3 bucket. If your solution serves files from other locations, make sure you're not accidentally serving files that you shouldn't be, such as config files or DLLs. S3 doesn't discriminate... it will return whatever it is told to return.
Summary
For any type of cloud application, big or small, you need to watch costs -- and this is only feasible if you are also collecting the metrics. AWS is very up-front with their pricing and billing structure, and every service I use here can write to CloudWatch Logs. As long as you are gathering the information, you should have all the data you need to make an informed decision.
[1] Even though it's relatively inexpensive, be aware that this rate includes the requests that occur if you're just browsing the contents of your S3 buckets in the AWS console!
[2] There is also an issue when serving `.ico` files from Lambda functions via API Gateway -- they don't get decoded properly. Previous versions of the site were not serving the `favicon.ico` file properly, and as a result, the browsers were constantly re-requesting this file, even when the `Cache-Control` header was present. This issue was eliminated when I switched it over to serving it from S3, so I didn't pursue it any further. Check out this Stack Overflow question for more details.
[3] I could have used `<link rel="apple-touch-icon" href="...">` links on all pages to specify a different folder for the icons, but this wouldn't have completely eliminated the requests in the root folder. If somebody browsed directly to an image file on this site, there would be no accompanying `<link>` tag, and Safari would default back to looking for the icon in the root folder.
[4] You probably know this, but this is what DevOps folks call infrastructure as code, and this example illustrates why it's a good thing.