How nuanced rate limiting transforms your API and business

The most basic version of API rate limiting is like pouring a massive concrete dam across a river of incoming requests. You’ve successfully blocked the potential flood by allowing only a subset through as a small, artificial trickle. Mission accomplished, right?

Rate limiting can actually be a value-add in all sorts of nuanced ways. Stick with the basics and you will miss opportunities to improve availability, UX, fairness and even internal development velocity.

Now, instead of a dam, imagine a natural river as it meanders its way downstream. Water ends up in the same place eventually, but a river delivers endless knock-on benefits along the way, like providing wildlife habitats, replenishing aquifers and filtering pollutants.

Rate-limiting policies that are created and managed with the nuance of a river — not the binary of a dam — can be an enormous enabler for any API business. Let’s see what lies around the bend.

Nuanced security policy

Like any service or system exposed to the public internet, your APIs will be subject to automated malicious attacks. If you make particularly interesting data available on your API, like financial records, or become a tier-1 service for your customers, you’ll also become a tempting target for threat actors.

Naturally, your first line of defense should be DDoS (distributed denial of service) protection from a network-level provider. If that provider also supplies your API gateway, that’s a bonus — you will have also made some headway against tool sprawl.

But neither DDoS protection nor a dam-like rate-limiting approach can effectively deal with more advanced attacks or fraudulent usage. For those, you need an API gateway that helps you quickly write rate-limiting policies based on variables like authentication state, values present (or not) in headers, the location of the request and more.
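
To make this concrete, here’s a minimal sketch of attribute-aware limiting as Go middleware, assuming the standard golang.org/x/time/rate token-bucket package. The limits are illustrative placeholders, and a real gateway would express the same branching as declarative policy rather than hand-rolled code:

```go
package main

import (
	"net/http"

	"golang.org/x/time/rate"
)

// Illustrative tiers: anonymous traffic shares a tight budget, while
// authenticated callers share a looser one. Values are placeholders.
var (
	anonLimiter = rate.NewLimiter(rate.Limit(1), 5)    // ~1 req/s, burst of 5
	authLimiter = rate.NewLimiter(rate.Limit(50), 100) // ~50 req/s, burst of 100
)

// rateLimit picks a policy based on authentication state; a real gateway
// could also branch on geography, header values or request size.
func rateLimit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		limiter := anonLimiter
		if r.Header.Get("Authorization") != "" {
			limiter = authLimiter
		}
		if !limiter.Allow() {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})
	http.ListenAndServe(":8080", rateLimit(mux))
}
```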

With every nuanced layer of rate limiting, your API becomes more secure, available and performant for legitimate users.

Nuanced protection against accidents

Here’s a scenario that has played out on one too many unsuspecting API providers:

  1. After a few tough sprints, your team finally delivers an API to production.
  2. You notify your customers about the new functionality, some of whom have been waiting for months, as it’s a massive unblocker for their own business.
  3. Customers quickly provision new authentication tokens and start firing off test requests through services like Postman, which helps them validate which headers they’ll need and which data they’ll want to parse out from your API’s responses.
  4. They start to write scripts that use your API for automation.
  5. A simple bug in their automation code creates a loop that hammers your API without a backoff or kill switch (see the sketch after this list), inevitably degrading service for other customers or costing you dearly in excess compute.
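
For reference, here’s a minimal sketch of the client-side pattern that prevents step 5, written in Go: retries with exponential backoff, jitter and a hard cap on attempts. The starting delay and attempt limit are illustrative:

```go
package client

import (
	"fmt"
	"math/rand"
	"net/http"
	"time"
)

// fetchWithBackoff retries failed requests with exponential backoff and
// jitter, then gives up entirely: the "kill switch" the buggy loop lacked.
func fetchWithBackoff(url string, maxAttempts int) (*http.Response, error) {
	delay := 500 * time.Millisecond
	for attempt := 1; ; attempt++ {
		resp, err := http.Get(url)
		if err == nil && resp.StatusCode < 400 {
			return resp, nil
		}
		if err == nil {
			resp.Body.Close() // drain failed responses before retrying
		}
		if attempt >= maxAttempts {
			return nil, fmt.Errorf("giving up after %d attempts", attempt)
		}
		// Jitter prevents a fleet of clients from retrying in lockstep.
		time.Sleep(delay + time.Duration(rand.Int63n(int64(delay))))
		delay *= 2
	}
}
```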

You can’t always blame the customers here, either. The denial of service can just as often come from “inside the house,” with an eager developer rushing to merge a new feature before the current sprint ends.

The rate-limiting capabilities of any API gateway you’re seriously evaluating should let you create a catchall rate-limiting policy for general protection, plus more subtle limits that protect you from mistakes made by known, trusted customers. You may even want to create different limits for certain endpoints, like those used most often in onboarding, to strike a better balance between protecting your service and helping customers make it into production.
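
A minimal sketch of that layered approach, again assuming Go’s x/time/rate package; the paths and numbers are illustrative:

```go
package gateway

import (
	"net/http"

	"golang.org/x/time/rate"
)

// A catchall backstop for the whole API, plus tighter or looser policies
// for specific endpoints. All values are placeholders.
var (
	catchall = rate.NewLimiter(rate.Limit(100), 200)
	perPath  = map[string]*rate.Limiter{
		"/v1/reports": rate.NewLimiter(rate.Limit(2), 4),   // expensive to serve
		"/v1/signup":  rate.NewLimiter(rate.Limit(20), 40), // generous for onboarding
	}
)

// allow applies the endpoint-specific policy first, then the catchall.
func allow(r *http.Request) bool {
	if l, ok := perPath[r.URL.Path]; ok && !l.Allow() {
		return false
	}
	return catchall.Allow()
}
```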

Nuanced performance gains

Rate limiting inevitably adds some latency to requests and subsequent responses. You are, after all, inserting an algorithm and business logic into what was previously an “open” pipeline.

However, there are some ways you can claw back that latency based on how you deploy your API gateway, which affects where your rate-limiting algorithms are run. Be sure to:

  • Isolate rate limiting from your services: The compute required to analyze incoming requests and apply a rate-limiting algorithm should never steal cycles away from your API service. Separate the logic and processes so each operates consistently and adapts to however much traffic arrived in the most recent window (see the sketch after this list).
  • Distribute rate limiting widely: All API gateway operations and policies, especially rate limiting, should operate close to your users. If your users send requests from all around the world, your API gateway should follow suit — not only responding faster to legitimate requests, but also turning them gently away with a 429 response code when they’ve taken their API usage a step too far.
  • Unbind rate limiting from how or where your infrastructure works: Choosing between hosted and self-hosted is a good first step, but also consider the potential negatives of tightly binding an API gateway to your existing clouds, regions or infrastructure. Ideally, your API gateway shouldn’t care whether you’re running off a single VM in Sydney or a multicloud Kubernetes infrastructure with clusters in AWS and GCP: The gateway’s operations should be independent enough that it only needs an internal endpoint for tunneling traffic.
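
One common way to satisfy the first two points is to keep counters in a shared store rather than in the API process itself. Here’s a minimal sketch of a fixed-window counter, assuming Redis and the go-redis client, so every gateway replica, in any region, shares one view of a caller’s usage:

```go
package gateway

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// allowFixedWindow increments a per-caller counter scoped to the current
// window. Key naming and limits are illustrative; window must be >= 1s.
func allowFixedWindow(ctx context.Context, rdb *redis.Client, caller string,
	limit int64, window time.Duration) (bool, error) {
	key := fmt.Sprintf("rl:%s:%d", caller, time.Now().Unix()/int64(window.Seconds()))
	n, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}
	if n == 1 {
		// First hit in this window: expire the key along with the window.
		rdb.Expire(ctx, key, window)
	}
	return n <= limit, nil
}
```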

Instead of treating your self-hosted or on-premises API gateway as a necessary bottleneck, consider a hosted API gateway: distributed across an edge network’s many points of presence (PoPs), it turns rate limiting into an enabler for your API overall.

Nuanced pricing and go-to-market strategy

Rate limits also map naturally onto service tiers: subtle rate-limiting policies can quickly become a path to monetizing your API.

These days, most tools for developers and engineers operate on a usage-based or “pay-as-you-go” model rather than pricing by the number of seats. Billing based on usage, such as the volume of requests passing through an API gateway, lowers the barrier to entry for new users who want to build a proof of concept, validate your technology and carefully migrate to your service.

For example, GitHub carefully manages rate limiting as part of its pricing strategy:

  • Unauthenticated users: 60 requests per hour
  • Authenticated users, GitHub apps and OAuth apps: 5,000 requests per hour
  • Users and apps owned by a GitHub Enterprise Cloud organization: 15,000 requests per hour

If your API is in demand, and a small subset of users create large volumes of requests in short periods, a similar structure might be your best path toward monetization. Just be sure to spend some time on due diligence: Observe common usage of your APIs, from the most relevant and high-opportunity customers, to best craft your policies with value and fairness in mind.
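
In code, a tiered structure like GitHub’s can start as a simple lookup table; the plan names below are assumptions, not GitHub’s own:

```go
package pricing

// Hourly request allowances keyed by plan, mirroring GitHub's tiered
// structure. Unknown callers fall back to the most conservative limit.
var hourlyLimits = map[string]int{
	"unauthenticated": 60,
	"authenticated":   5000,
	"enterprise":      15000,
}

func limitFor(plan string) int {
	if limit, ok := hourlyLimits[plan]; ok {
		return limit
	}
	return hourlyLimits["unauthenticated"]
}
```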

Nuanced UX for your customers

From an end-user perspective, the 429 HTTP error code for rate limiting feels punitive — a slap on the wrist in exchange for finding value in your API and wanting to use it all the time?

Nuanced rate limiting can also improve how those same users interact with your API, even if they never see the impact firsthand.

For example, you can implement nuanced rate-limiting policies for different types of requests. You might be more lenient on the routes and HTTP methods most used during onboarding and while building a minimum viable product with your API, but harsher on those requested less often or only by specific users who don’t fit into your monetization strategy.

Alternatively, consider seasonal or event-based rate-limiting policies for times you know you’ll see a spike in load, to level the playing field during rush periods and move some API traffic to off-peak hours. Combine those with policies that filter requests based on country of origin for nuanced rate limiting that best matches your users’ on-peak and off-peak hours, no matter where they request your API from.

Finally, look into rate limiting as a mechanism for easing the UX around breaking changes or version management. Informing users of an upcoming change, then slowly throttling the “old” version through tighter rate limits, gives them the time and motivation to update their implementation before they start seeing other HTTP response codes, like a 410 Gone.
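
A minimal sketch of that wind-down as Go middleware: old-version routes get a shrinking token bucket plus an RFC 8594 Sunset header announcing the cutoff. The date, paths and limits are illustrative:

```go
package gateway

import (
	"net/http"
	"strings"
	"time"

	"golang.org/x/time/rate"
)

// v1Limiter starts generous and can be tightened release by release,
// nudging integrations toward /v2 before /v1 starts returning 410 Gone.
var v1Limiter = rate.NewLimiter(rate.Limit(5), 10)

func deprecateV1(next http.Handler) http.Handler {
	sunset := time.Date(2025, time.June, 1, 0, 0, 0, 0, time.UTC)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, "/v1/") {
			// RFC 8594 Sunset header: tells clients when /v1 disappears.
			w.Header().Set("Sunset", sunset.Format(http.TimeFormat))
			if !v1Limiter.Allow() {
				http.Error(w, "v1 is deprecated and throttled; migrate to /v2",
					http.StatusTooManyRequests)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}
```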

Of course, your team should also be closely monitoring the observability data created by your API gateway for trends around 429 responses. Who, when and how often are customers receiving that slap on the wrist? Create processes by which someone on your team — whether in sales, customer success or DevRel — reaches out to those affected by rate-limiting policies with educational information or a relevant pitch as to why they should consider upgrading to a higher service tier.
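
A minimal sketch of that feedback loop, assuming the Prometheus Go client and a hypothetical X-API-Key header that identifies the consumer:

```go
package gateway

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// tooManyRequests counts 429s per consumer so sales or customer success
// can see exactly who keeps hitting the limits.
var tooManyRequests = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "api_rate_limited_total",
		Help: "Requests rejected with 429, by consumer.",
	},
	[]string{"consumer"},
)

// statusRecorder lets middleware observe the status code a handler wrote.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

func observe429s(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		if rec.status == http.StatusTooManyRequests {
			tooManyRequests.WithLabelValues(r.Header.Get("X-API-Key")).Inc()
		}
	})
}
```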

Nuanced fairness in multitenant systems

In a single-tenant model, rate limiting as a basic dam works well: Your API gateway blocks attacks or mistakes from affecting the speed and reliability of your customer’s endpoint, then allows a manageable trickle of legitimate requests.

A multitenant approach has a higher bar for fairness: Every API user accessing a multitenant system must get a single-tenant experience. If User A fires off a thousand requests in a few seconds, you must prevent this “noisy neighbor” from negatively affecting the experience for User B or User C.

Implementing fairness in a multitenant system requires far more sophisticated rate-limiting policies. If you create a single policy for all users, you neglect those who have paid for access to your API. If your rate limit for enterprise-tier users is too generous, their usage will crowd out new users, who may then never upgrade from their free plans. If you can’t get this equation right, your API will be unfair and will lack the resiliency expected from a multitenant architecture.

The key is a configurable API gateway that lets you build rate limits based on every possible shred of data attached to requests, even down to geographic origin and body size.
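
A minimal sketch of per-tenant budgets, once more with Go’s x/time/rate: every tenant gets its own token bucket, so a noisy neighbor can exhaust only its own allowance. A production version would vary the rate by tier and evict idle entries:

```go
package gateway

import (
	"sync"

	"golang.org/x/time/rate"
)

// tenantLimiters lazily creates one token bucket per tenant.
type tenantLimiters struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
}

func newTenantLimiters() *tenantLimiters {
	return &tenantLimiters{limiters: make(map[string]*rate.Limiter)}
}

// allow looks up (or creates) the tenant's bucket. A flat illustrative
// limit keeps the sketch short.
func (t *tenantLimiters) allow(tenantID string) bool {
	t.mu.Lock()
	l, ok := t.limiters[tenantID]
	if !ok {
		l = rate.NewLimiter(rate.Limit(10), 20) // per-tenant budget
		t.limiters[tenantID] = l
	}
	t.mu.Unlock()
	return l.Allow()
}
```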

Nuanced internal developer experience

Aside from JSON Web Token (JWT) authentication, rate limiting is often the first stop for delivering an API to production. API businesses often work intensely on improving the “time to first call” developer experience (DX) for new users, and similarly, your “time to rate limiting” greatly affects any downstream success — or struggles.

You can use a developer-friendly implementation of rate limiting as a leading wedge, onboarding your teams to your API gateway’s configuration and empowering them to transfer what they’ve learned to other parts of API management.

Whether your business gives developers ownership of the entire API life cycle or has a DevOps or platform engineering team to handle publication and production operations, your DX improves with:

  1. Flexibility: When you refuse to settle for simple dam-like rate-limiting policies, you can fine-tune API usage based on request-time variables like who your users are, where they’re requesting from, the size of request bodies and more.
  2. Familiarity: Choose API gateways that use patterns most developers already understand, like YAML/JSON and Common Expression Language (CEL), to define custom policies.
  3. Environmental flexibility: Give developers and DevOps engineers plenty of opportunity to test rate-limiting policies on dev/staging versions of your API gateway before pushing to production.
  4. GitOps-readiness: Use API gateways that can pull policy directly from git repositories for better version control, code review and repeatability.
  5. Centralization: Define a businesswide set of security-centric and availability-centric guardrails, then let individuals layer in additional rate-limiting policies based on the requirements of each API.

Time to chart new waters in API rate limiting

A dam-like approach to rate limiting might solve your immediate problem around managing load, but in doing so, you also harm the ecosystem around your API.

Whether you’re a platform or DevOps engineer deploying a configurable API gateway on behalf of your API developer peers or an API developer suddenly responsible for the entire life cycle due to the “shift left”-ification of all things operations and security, take a moment to consider:

  • How can you better layer nuance into your rate-limiting practices?
  • Does your API gateway give you the variables, expressions, actions and DX necessary to act with nuance?

By layering in many rate-limiting policies in an API gateway that acts conditionally based on every facet of request data, you have a fresh opportunity to enrich your API’s ecosystem for both your business and every user downstream.

This post was originally published on The New Stack.

Joel Hans is a Senior Developer Educator at ngrok. He has plenty of strong thoughts about documentation, developer education, developer marketing, and more.