Level up your API Gateway Usage Plans with CloudWatch Metrics and Alarms

Rick O'Sullivan
AWS
February 12, 2024

TL;DR: API Gateway usage plans are a great way to set usage quotas on your APIs, but they lack the ability to emit telemetry against the configured quota. Thankfully, there's a neat solution we can build using Lambda, EventBridge and CloudWatch to extend your API reporting capabilities and address this shortcoming. If this sounds like something you need, read on…

What are Usage Plans?

Usage plans in Amazon API Gateway let you make your APIs available as product offerings for your customers. You can configure usage plans and API keys to allow customers to access selected APIs based on defined limits and quotas.

A throttling limit lets you configure an upper requests-per-second limit on your APIs to help protect them from being overwhelmed by too many requests. Throttles are applied on a best-effort basis and should be thought of as targets rather than guaranteed request ceilings.

A quota limit, on the other hand, sets the target maximum number of requests that can be submitted with a given API key within a specified time interval. You can also configure individual API methods to require API key authorization based on usage plan configuration.

This blog post will focus exclusively on quota limits.

But…?

The challenge with quota limits is that there is no built-in way to monitor and alert on usage in real time. API Gateway provides an ad hoc reporting capability to export usage data (CSV or JSON) from the console, but this isn't practical at scale or if you want to automate notifications.

The Solution

AWS services are building blocks, and each one can be driven through its API. Using this approach, we can combine Lambda with EventBridge and CloudWatch to interrogate our API Gateway usage plan data and publish it for reporting and alerting. Let's take a look at the solution.

Thankfully, it's a pretty straightforward approach. The solution uses an EventBridge schedule to periodically invoke a Lambda function, which retrieves the latest usage plan details and publishes the remaining quota for each associated API key as a CloudWatch metric. Once the metrics are published, we can attach CloudWatch alarms with custom thresholds, so each API key can be configured independently.

The Function

While the above sounds quite trivial, in practice building this was a little more involved, because not all the data we want comes from a single API. In fact, we need to make three distinct API calls to API Gateway (getUsagePlans, getUsage and getUsagePlanKey) and one API call to CloudWatch (PutMetricData) to make this work. Let's step through the sequence before digging in a little deeper.

Invocation

The first step is to trigger the Lambda function. There are a number of options available here; the simplest is an EventBridge schedule. This works well in our use case because we don't need to update our data after every API request, and an hourly schedule provides a nice balance between accuracy and cost. Setting this up is also very straightforward. In my example I am using the AWS Serverless Application Model (SAM), which supports this configuration under the Events property of the function resource:

Events:
    InvocationLevel:
      Type: Schedule
      Properties:
        Schedule: "rate(1 hour)"
        Name: API-UsagePlan-Collector
        Description: "Hourly Trigger to Collect API Gateway UsagePlan Metrics"
        Enabled: true
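For reference, here is a minimal sketch of a handler entry point for this schedule. It assumes the SAM Schedule event source, which delivers an EventBridge event with source set to aws.events and detail-type set to Scheduled Event. The isScheduledEvent helper is hypothetical, and the collection steps are left as a placeholder comment.

```javascript
// Minimal sketch of the Lambda entry point driven by the hourly schedule.
// isScheduledEvent is a hypothetical helper, not code from the original repo.
function isScheduledEvent(event) {
  return Boolean(
    event &&
      event.source === "aws.events" &&
      event["detail-type"] === "Scheduled Event"
  );
}

// In the deployed function this would be exported as exports.handler.
const handler = async (event) => {
  if (!isScheduledEvent(event)) {
    console.log("Ignoring invocation that did not come from the schedule");
    return { processed: false };
  }
  console.log("Scheduled collection run at", event.time);
  // ...fetch usage plans, read quotas and publish metrics here...
  return { processed: true };
};
```

The event check is optional; dropping it makes it easier to trigger a manual run from the Lambda console with an arbitrary test event.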

Get Usage Plans

Before we can get the quota values, we first need to determine which usage plans we want quota information from. To do this we make a request to the apigateway.getUsagePlans API using the AWS SDK for JavaScript to return the available usage plans.

const usagePlanData = await apigateway.getUsagePlans(params).promise();

This returns a response containing an array of usage plans within the account. Depending on your setup this could return quite a few, so we need a way to filter this list. So let's do that now…
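Before we filter, one caveat: getUsagePlans is paginated and returns a position token when more results are available. A sketch that follows the token until every plan has been collected might look like this. It assumes apigateway is an AWS SDK v2 APIGateway client passed in by the caller, and getAllUsagePlans is a hypothetical name.

```javascript
// Sketch: collect every usage plan by following the `position` pagination token.
// `apigateway` is assumed to be an AWS SDK v2 client, e.g. new AWS.APIGateway().
async function getAllUsagePlans(apigateway) {
  const items = [];
  let position;
  do {
    const params = { limit: 500 }; // 500 is the largest page size API Gateway accepts
    if (position) params.position = position;
    const page = await apigateway.getUsagePlans(params).promise();
    items.push(...(page.items || []));
    position = page.position; // undefined once the last page has been read
  } while (position);
  // Same shape as a single getUsagePlans response, so it can stand in for usagePlanData
  return { items };
}
```

Because the result has the same items shape as a single response, it can be handed straight to the filtering logic that follows.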

As with a lot of AWS resources, resource tags are a great way to label or group resources together, and this is exactly what we need here. On my usage plans I have configured a tag called ApplicationId, which has a value I can look for within my code to limit the usage plans we want to process.

In our example code below I set an environment variable on the Lambda function to match the resource tag value on the Usage Plan and conditionally check for this value as I iterate through the usage plans.

// Illustrative wrapper: the function name is not taken from the original repo
function filterUsagePlans(usagePlanData) {
  if (process.env.ApplicationId) {
    console.log(
      "Application configured to process only Usage Plans with the",
      process.env.ApplicationId + " Tag"
    );
    const ApplicationId = process.env.ApplicationId;
    const filteredUsagePlan = []; // Create a new array for the filtered usage plans
    usagePlanData.items.forEach((usagePlan) => {
      if (usagePlan.tags && usagePlan.tags.ApplicationId === ApplicationId) {
        filteredUsagePlan.push(usagePlan);
      }
    });
    return filteredUsagePlan;
  } else {
    console.log(
      "No Usage Plans filtering configured. Processing all Usage Plans"
    );
    // Send all usage plan items back if the "ApplicationId" environment variable is not set
    return usagePlanData.items;
  }
}

As you can see from the code above, this filtering is conditional. If the ApplicationId environment variable is set, only usage plans carrying a matching ApplicationId tag are returned to the main handler code, in a new list called filteredUsagePlan. If it isn't set, we assume you want to process every usage plan and return the full list.
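If your usage plans aren't tagged yet, you can add the tag in the console or programmatically. The sketch below assumes an AWS SDK v2 APIGateway client and the usage plan ARN format arn:aws:apigateway:{region}::/usageplans/{id} for the standard aws partition. The helper names and example values are hypothetical.

```javascript
// Sketch: tag a usage plan with ApplicationId so the collector can filter on it.
// Assumes the standard "aws" partition; helper names are hypothetical.
function usagePlanArn(region, usagePlanId) {
  return `arn:aws:apigateway:${region}::/usageplans/${usagePlanId}`;
}

async function tagUsagePlan(apigateway, region, usagePlanId, applicationId) {
  const params = {
    resourceArn: usagePlanArn(region, usagePlanId),
    tags: { ApplicationId: applicationId }, // must match the Lambda's ApplicationId variable
  };
  return apigateway.tagResource(params).promise();
}
```

For example, tagUsagePlan(new AWS.APIGateway(), "ap-southeast-2", "abc123", "my-app") would tag the (placeholder) plan abc123 so that a collector configured with ApplicationId=my-app picks it up.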

Get Quota Values

Now that we have a filtered list of usage plans, we want to confirm we have actually configured a quota. We do this using a simple for loop that checks each usage plan for a "quota" attribute, as shown below:

for (const usagePlan of GetAllUsagePlans) {
    console.log("****** Processing", usagePlan.name + " ******");
    if ("quota" in usagePlan) {
      // ...

Assuming this evaluates to true, we then pass this information to another API method, apigateway.getUsage, which returns the usage and remaining quota for each API key. This API requires the usage plan ID, which we have from the previous response, as well as a start and end date as YYYY-MM-DD strings, which we construct using the helper functions GeneralUtils.generateTimestamp and GeneralUtils.firstOfTheMonth:

// Create Date Ranges for Quota data
      const endDateRange =
        GeneralUtils.generateTimestamp("YYYY-MM-DD").toString();
      const startDateRange = GeneralUtils.firstOfTheMonth(
        usagePlan.quota.period,
        endDateRange
      );
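The GeneralUtils helpers aren't shown here, so as an illustration only, a self-contained stand-in might compute the start of the current quota period for DAY, WEEK and MONTH quotas and pass both dates to getUsage. Assumptions: dates are calculated in UTC, weeks start on Sunday, months start on the 1st, the quota offset is ignored, and apigateway is an AWS SDK v2 client. The names toYmd, quotaPeriodStart and getQuotaUsage are hypothetical.

```javascript
// Sketch only: a stand-in for the GeneralUtils date helpers (hypothetical names).
// Assumptions: UTC dates, weeks start on Sunday, months start on the 1st,
// and the usage plan's quota offset is ignored.
function toYmd(date) {
  return date.toISOString().slice(0, 10); // "YYYY-MM-DD"
}

function quotaPeriodStart(period, endDate) {
  const start = new Date(`${endDate}T00:00:00Z`);
  if (period === "WEEK") {
    start.setUTCDate(start.getUTCDate() - start.getUTCDay()); // back to Sunday
  } else if (period === "MONTH") {
    start.setUTCDate(1); // first day of the month
  }
  // For "DAY" quotas the period starts on endDate itself
  return toYmd(start);
}

async function getQuotaUsage(apigateway, usagePlan, now = new Date()) {
  const endDate = toYmd(now);
  const startDate = quotaPeriodStart(usagePlan.quota.period, endDate);
  return apigateway
    .getUsage({ usagePlanId: usagePlan.id, startDate, endDate })
    .promise();
}
```

Keeping the requested range inside the current quota period means the entries returned belong to the period the quota is currently counting against.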

Unfortunately, this is where we hit a snag. The response object maps each attached API key ID to its quota values, which isn't very useful unless we can map those IDs to a name or label. To obtain this information we pass each API key ID to the apigateway.getUsagePlanKey API, again using a for loop to iterate through all the key IDs. The code below demonstrates this in action:

// Check the response object contains quota items and extract the latest quota value.
      // UsagePlanData holds the getUsage response for the current usage plan.
      if (!UsagePlanData.items || Object.keys(UsagePlanData.items).length === 0) {
        console.log("No Quota value to send", usagePlan.name);
      } else {
        for (const keyId of Object.keys(UsagePlanData.items)) {
          // Process each API key associated with the usage plan.
          // Each daily entry is [used, remaining]: take the last day, then its remaining value.
          const quotaRemaining = UsagePlanData.items[keyId].pop().pop();
          const keyQuotaData = await UsagePlanUtils.getUsagePlanKey(
            usagePlan.id,
            keyId
          );
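For comparison, here is a self-contained sketch of the same step that avoids mutating the response: .pop() removes entries from the arrays it reads, whereas indexing the last element leaves the getUsage data intact. It assumes each daily entry is [usedQuota, remainingQuota] and that apigateway is an AWS SDK v2 client. The names latestRemaining and remainingQuotaByKey are hypothetical.

```javascript
// Sketch: pair each API key's latest remaining quota with its human-readable name.
// Each daily entry in usageData.items[keyId] is assumed to be [used, remaining].
function latestRemaining(dailyEntries) {
  if (!Array.isArray(dailyEntries) || dailyEntries.length === 0) {
    return undefined; // no usage recorded for this key in the date range
  }
  const [, remaining] = dailyEntries[dailyEntries.length - 1];
  return remaining;
}

async function remainingQuotaByKey(apigateway, usagePlanId, usageData) {
  const results = [];
  for (const [keyId, dailyEntries] of Object.entries(usageData.items || {})) {
    const remaining = latestRemaining(dailyEntries);
    if (remaining === undefined) continue; // nothing to publish for this key
    // getUsagePlanKey returns the key's id, type, value and name
    const key = await apigateway.getUsagePlanKey({ usagePlanId, keyId }).promise();
    results.push({ keyId, keyName: key.name, remaining });
  }
  return results;
}
```

If a plan has many keys, getUsagePlanKeys, which lists a plan's keys (including their names) page by page, could replace the per-key lookups with fewer calls.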

And with that, we should now have all the information we need to construct the Cloudwatch metric payload.

Put Metric Data

In the previous section we spent all our time interfacing with the API Gateway service, collecting information about our API key usage and quota values. In this stage we can finally push this information into CloudWatch so it's actionable.

The first thing we need to do is build a request payload for the CloudWatch PutMetricData API. As shown below, we need to pass in a few parameters. Let's take a look at each one:

const CWMetricparams = {
    MetricData: [
      {
        MetricName: keyQuotaData.name,
        Dimensions: [
          {
            Name: "Quota Remaining",
            Value: "Value",
          },
        ],
        Unit: "None",
        Value: quotaRemaining,
      },
    ],
    // Fall back to the environment variable when the plan has no tags or no ApplicationId tag
    Namespace: usagePlan.tags?.ApplicationId || process.env.CloudWatchNameSpace,
  };

Namespace: This is effectively a label to group or isolate your metrics. In our example we use the ApplicationId resource tag value, or an environment variable called CloudWatchNameSpace if the tag does not exist.

MetricName: This is how we identify each metric. In our case this is the name of the API key, as returned by getUsagePlanKey.

Value: This is the remaining quota for the current API key, as returned by the getUsage API.

Finally, with all the data correctly formatted, we can call CloudWatchUtils.putMetricData to publish our custom metric.
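As a sketch of this final step, the functions below build the same payload shown above and publish it. They assume cloudwatch is an AWS SDK v2 CloudWatch client passed in by the caller, use optional chaining so untagged usage plans fall back to the CloudWatchNameSpace environment variable, and throw if neither namespace is available. buildQuotaMetricParams and publishQuotaMetric are hypothetical names, not the repo's CloudWatchUtils code.

```javascript
// Sketch: build the PutMetricData payload described above and publish it.
function buildQuotaMetricParams(usagePlan, keyName, quotaRemaining, fallbackNamespace) {
  const namespace = usagePlan.tags?.ApplicationId || fallbackNamespace;
  if (!namespace) {
    // PutMetricData rejects requests without a namespace, so fail early and clearly
    throw new Error(`No CloudWatch namespace for usage plan ${usagePlan.id}`);
  }
  return {
    Namespace: namespace,
    MetricData: [
      {
        MetricName: keyName,
        Dimensions: [{ Name: "Quota Remaining", Value: "Value" }],
        Unit: "None",
        Value: quotaRemaining,
      },
    ],
  };
}

async function publishQuotaMetric(cloudwatch, usagePlan, keyName, quotaRemaining) {
  const params = buildQuotaMetricParams(
    usagePlan,
    keyName,
    quotaRemaining,
    process.env.CloudWatchNameSpace
  );
  return cloudwatch.putMetricData(params).promise();
}
```

Keeping the payload construction in a pure function makes it easy to unit test without calling CloudWatch.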

The Result

If all goes to plan, after 10–15 minutes you should see a new custom namespace appear under CloudWatch Metrics, containing a metric for every API key attached to each usage plan processed. In my example I have 4 customer API keys across 3 usage plans, which I've graphed using the Gauge widget.

From here you can create custom alarms (click the bell icon) for each metric to send notifications or invoke actions when the number of remaining API requests drops below your configured threshold, and/or use this data to build custom dashboards or reports for your application.
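Alarms can also be created programmatically. A minimal sketch, assuming an AWS SDK v2 CloudWatch client, the metric layout used in this post (the key name as MetricName plus the constant Quota Remaining dimension), and an SNS topic you already own for notifications. The alarm naming scheme, threshold and helper names are illustrative.

```javascript
// Sketch: alarm when a key's remaining quota drops below a threshold.
// The dimensions must match the published metric exactly for the alarm to find it.
function buildQuotaAlarmParams({ namespace, keyName, threshold, topicArn }) {
  return {
    AlarmName: `${namespace}-${keyName}-quota-low`,
    Namespace: namespace,
    MetricName: keyName,
    Dimensions: [{ Name: "Quota Remaining", Value: "Value" }],
    Statistic: "Minimum",
    Period: 3600, // one data point per hour from the scheduled collector
    EvaluationPeriods: 1,
    Threshold: threshold,
    ComparisonOperator: "LessThanThreshold",
    AlarmActions: [topicArn], // e.g. an SNS topic that emails the account team
  };
}

async function createQuotaAlarm(cloudwatch, options) {
  return cloudwatch.putMetricAlarm(buildQuotaAlarmParams(options)).promise();
}
```

Because each API key is its own metric, calling createQuotaAlarm once per key lets every customer have an independent threshold.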

The Code

In my example I am using Node.js for my Lambda function with the AWS SDK for JavaScript v2. If you'd like to look at my proof-of-concept code, you'll find it in my repo below.

GitHub: rickosaws/UsagePlanMonitor, a solution to monitor and alert on the use of API Gateway Usage Plan Quotas (github.com)

Conclusion

One of the really amazing things about using AWS services is that almost everything is accessible through an API or SDK. This gives you almost unlimited freedom to extend or build new solutions that address gaps or shortcomings in what is currently available. In our case we've used EventBridge, Lambda and CloudWatch to extend the API Gateway reporting capabilities for usage plans, making them far more useful from a monitoring and reporting perspective.

What will you build on AWS?

Author

Rick O'Sullivan

Rick is a Senior Media and Entertainment Solutions Architect based in Sydney, Australia. Rick spends his time working with Australia's largest Media and Publication customers helping them bring News, Sports and Drama to your home. Rick is a dedicated Father and Husband and, in his spare time, builds GenAI applications, plays a few instruments (poorly) and dabbles with DJ'ing and video editing.
