You’ve finally hit the top of Product Hunt. Traffic is surging, the "New User" Slack channel is a scrolling blur, and your database is holding steady. Then, the support tickets start trickling in. Existing customers—the ones paying your bills—can't log in. They aren't receiving their Multi-Factor Authentication (MFA) codes. You check the logs and find that your outbound email and SMS queues are backed up by 40,000 "Welcome to the Platform!" emails.
By the time the queue clears, those MFA codes have expired. Your critical security infrastructure has been effectively DDoS’d by your own marketing success. This disaster happens because most notification systems use naive, binary rate limiting. They treat every message as an equal citizen, failing to distinguish between a low-priority newsletter and a high-priority password reset. If your system is hard-capped at 100 messages per second, a burst of "Welcome" messages will swallow the bandwidth needed for the login flow. To build a resilient SaaS, you have to move past "Allow vs. Drop" and implement a priority-aware architecture.
Why Naive Rate Limiting for Notifications Breaks at Scale
Standard API gateways, Nginx configurations, or AWS WAF rules protect your compute resources by looking at request frequency from an IP or an API key. When a threshold is crossed, they return a 429 Too Many Requests. This is "Edge" rate limiting. While it stops your servers from melting, it lacks context. The gateway doesn't know about the intent of the payload.
The most common failure is the "Loudest Neighbor" problem. In a fragmented stack, you might have a background worker processing a batch of 50,000 transactional receipts. If that worker exhausts your global rate limit for AWS SES or Twilio, it creates a service-wide blackout. Your magic link requests, which share that same provider connection, get rejected because the neighboring process was too loud.
Returning a 429 to a background worker often makes the problem worse. Without a sophisticated retry strategy, workers enter a tight loop of unmanaged retries, hammering the bottleneck and preventing the recovery of the downstream service. In a notification context, a dropped request isn't just an error; it's a broken user journey. You can see how the Zyphr platform architecture avoids this by decoupling event triggers from delivery execution.
Architectural Patterns for Rate Limiting Notifications
Reliability is about making trade-offs during congestion. When bandwidth is scarce, you must decide which messages are allowed to fail and which are mission-critical. We categorize traffic into three tiers:
Tier 1 (Security): OTPs, Magic Links, Password Resets, and Account Lockout alerts. These require sub-second latency and 99.99% deliverability.
Tier 2 (Transactional): Order confirmations, billing alerts, and system-triggered notifications. These are important but can tolerate a 30-second delay.
Tier 3 (Informational/Marketing): New feature announcements, newsletters, and social engagement digests. These can be processed over minutes or hours.
To implement this, you need priority-aware queuing. Instead of one giant bucket, you route messages into tier-specific queues. Your workers drain these queues in strict priority order, ensuring Tier 1 is always empty before Tier 2 is even checked.
Here is a conceptual implementation using a sorted set to ensure critical messages bypass the backlog:
import Redis from 'ioredis';

const redis = new Redis();

type Priority = 'SECURITY' | 'TRANSACTIONAL' | 'MARKETING';

const priorityMap: Record<Priority, number> = {
  SECURITY: 100, // Highest priority
  TRANSACTIONAL: 50,
  MARKETING: 1
};

async function routeNotification(message: Record<string, unknown>, priority: Priority) {
  const payload = JSON.stringify({
    ...message,
    timestamp: Date.now(),
    priority: priorityMap[priority]
  });
  // The tier forms the high-order part of the score; subtracting the timestamp
  // makes older messages sort higher within the same tier (FIFO). Both terms
  // stay below Number.MAX_SAFE_INTEGER, so the ordering is exact.
  // Note: strict priority can starve Tier 3 under sustained Tier 1/2 load.
  const score = priorityMap[priority] * 1e13 - Date.now();
  await redis.zadd('notification_outbound_queue', score.toString(), payload);
}

// Worker logic: pop the highest score first. ZPOPMAX returns [member, score].
async function processQueue() {
  const [task] = await redis.zpopmax('notification_outbound_queue');
  if (task) {
    await deliver(JSON.parse(task));
  }
}

// Stub: deliver() hands the message to the downstream channel provider.
async function deliver(msg: unknown): Promise<void> { /* provider call */ }
By using a sorted set with weighted scores, you ensure that even if there are a million marketing emails, a single security code with a high priority score jumps to the front of the line.
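For example, using the sketch above, an OTP enqueued after a backlog of digests is still delivered first:

await routeNotification({ to: 'user@example.com', template: 'weekly-digest' }, 'MARKETING');
await routeNotification({ to: 'user@example.com', body: 'Your code is 482913' }, 'SECURITY');
// processQueue() now delivers the OTP before the digest.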
Choosing the Right Algorithm: Token Bucket vs. Sliding Window
How you calculate the limit is just as important as how you prioritize the queue.
Token Bucket is standard for many APIs. You have a "bucket" of tokens that refills at a steady rate. If a user has 5 tokens and sends 5 SMS messages, the bucket is empty, and they have to wait for the refill. This handles short bursts while maintaining a consistent average rate. The naive alternative, a fixed-window counter, suffers from "boundary bursts": a user exhausts their quota at the very end of one minute and again at the start of the next, effectively doubling their allowed throughput in a two-second window.
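Here is a minimal in-memory token bucket sketch. It assumes single-process state; across multiple workers you would back the counters with a shared store like Redis:

class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }

  tryConsume(count = 1): boolean {
    // Refill continuously based on elapsed time, capped at capacity.
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < count) return false;
    this.tokens -= count;
    return true;
  }
}

// A burst capacity of 5 SMS, refilling one token per minute.
const smsBucket = new TokenBucket(5, 1 / 60);
if (!smsBucket.tryConsume()) {
  // Throttled: re-queue with backoff rather than dropping the message.
}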
Sliding Window is more precise. Instead of fixed time blocks, it tracks the exact timestamp of every request over the last N seconds (the "sliding window log" variant). This prevents the 2x burst at window boundaries. In a distributed system, implementing it requires atomic operations to avoid race conditions.
Here is a sliding window log implemented as a Redis Lua script for atomicity:
-- KEYS[1]: the rate-limit key, e.g. "rl:user:123:password_reset"
-- ARGV[1]: current timestamp in ms
-- ARGV[2]: window size in ms
-- ARGV[3]: limit
-- ARGV[4]: unique request id, so two requests landing in the same
--          millisecond don't collapse into a single sorted-set member
local window_start = tonumber(ARGV[1]) - tonumber(ARGV[2])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', window_start)
local current_count = redis.call('ZCARD', KEYS[1])
if current_count < tonumber(ARGV[3]) then
  redis.call('ZADD', KEYS[1], ARGV[1], ARGV[4])
  redis.call('PEXPIRE', KEYS[1], ARGV[2])
  return 0 -- allowed
else
  return 1 -- rate limited
end
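Wiring this into the Node workers is straightforward with ioredis, which can register the script as a custom command. A sketch, assuming SLIDING_WINDOW_LUA holds the script above and randomUUID supplies the unique request id:

import Redis from 'ioredis';
import { randomUUID } from 'crypto';

const redis = new Redis();

// SLIDING_WINDOW_LUA contains the Lua script shown above.
declare const SLIDING_WINDOW_LUA: string;

redis.defineCommand('slidingWindow', { numberOfKeys: 1, lua: SLIDING_WINDOW_LUA });

async function allowPasswordReset(userId: string): Promise<boolean> {
  // 3 resets per 15 minutes, per user.
  const limited = await (redis as any).slidingWindow(
    `rl:user:${userId}:password_reset`,
    Date.now(),        // ARGV[1]: now
    15 * 60 * 1000,    // ARGV[2]: window in ms
    3,                 // ARGV[3]: limit
    randomUUID()       // ARGV[4]: unique member
  );
  return limited === 0;
}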
Using this script ensures that your rate-limiting logic stays consistent even when multiple worker nodes process requests simultaneously. At Zyphr, we use a hybrid approach to manage "Smoothing." If you send a burst of 10,000 emails, we don't drop 9,000 of them. We use the sliding window to identify the burst, then use the token bucket to "leak" those messages out at the maximum rate your downstream provider allows.
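To sketch the idea, assume the TokenBucket class from earlier is configured to the provider's ceiling and processQueue() pops the priority queue shown above; bursts accumulate in the queue and leak out at the sustained rate (the 14 messages per second is an arbitrary stand-in for your provider's limit):

// Assumes TokenBucket and processQueue() from the earlier sketches.
const providerBucket = new TokenBucket(50, 14); // illustrative: ~14 msgs/sec sustained

async function smoothDrain(): Promise<void> {
  while (true) {
    if (providerBucket.tryConsume()) {
      await processQueue(); // pops the highest-priority message, if any
    } else {
      await new Promise((resolve) => setTimeout(resolve, 50)); // wait for refill
    }
  }
}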
Dealing with Multi-Dimensional Constraints
A stable system requires protection at three distinct layers. If you only limit at the user level, your downstream provider might still ban you. If you only limit at the provider level, one bad actor can ruin the experience for everyone else.
The User Level: You must prevent a buggy useEffect hook or a malicious script from spamming your /send endpoint. This is where you implement strict sliding window limits (e.g., no more than 3 password resets per 15 minutes).
The Channel Level: Channels have different costs. You might allow 100 in-app notifications per hour—because they are inexpensive—but cap SMS at 5 per hour to control costs and prevent SMS pumping fraud.
The Provider Level: AWS SES, Twilio, and FCM have hard limits. If you exceed Twilio's throughput on a standard long code, they will queue and potentially drop your messages. You must map your internal governor to these external realities.
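A sketch of how the three layers compose: a send proceeds only when every dimension has headroom. Here checkLimit is assumed to wrap the sliding-window script from earlier, and every key and limit is illustrative:

// Assumed wrapper around the sliding-window Lua script: true = allowed.
declare function checkLimit(key: string, limit: number, windowMs: number): Promise<boolean>;

type Channel = 'email' | 'sms' | 'in_app';

// Illustrative caps; real values come from cost models and provider contracts.
const channelLimits: Record<Channel, { limit: number; windowMs: number }> = {
  in_app: { limit: 100, windowMs: 3_600_000 },
  email: { limit: 50, windowMs: 3_600_000 },
  sms: { limit: 5, windowMs: 3_600_000 }
};

async function canSend(userId: string, channel: Channel, provider: string): Promise<boolean> {
  const { limit, windowMs } = channelLimits[channel];
  const results = await Promise.all([
    checkLimit(`rl:user:${userId}`, 30, 60_000),                 // user level
    checkLimit(`rl:user:${userId}:${channel}`, limit, windowMs), // channel level
    checkLimit(`rl:provider:${provider}`, 100, 1_000)            // provider level
  ]);
  // The strictest layer wins. Note: each passing dimension records a hit even
  // if a later one fails; a production version would roll back or pre-check.
  return results.every(Boolean);
}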
This is where the fragmented stack—using separate vendors for Auth, Email, and SMS—falls apart. These services do not communicate. Your Auth provider doesn't know that your Email provider is currently being throttled due to a domain warm-up limit. You end up building a massive, brittle middleware "Traffic Controller" just to synchronize these states. This technical debt is the "Integration Tax" that slows down engineering teams.
Backpressure and the No-Drop Architecture
When a rate limit is hit, the message shouldn't simply disappear into an error log. It should transition into a "Pending-Throttle" state. This requires backpressure—the ability of the system to signal that it's full and gracefully buffer the overflow.
We use exponential backoff with jitter. If a provider returns a throttling error, we re-queue the message with an increasing delay. We add "jitter" (randomized variance) to the delay to prevent the "thundering herd" problem—where thousands of previously throttled messages all retry at the exact same millisecond when the window resets.
function getRetryDelay(attempt: number): number {
  const baseDelay = 1000; // 1 second
  const maxDelay = 60000; // 1 minute
  const exponentialDelay = Math.min(maxDelay, baseDelay * Math.pow(2, attempt));
  // Add up to 20% jitter so throttled messages don't all retry at once
  const jitter = exponentialDelay * 0.2 * Math.random();
  return exponentialDelay + jitter;
}
Furthermore, we implement circuit breakers for delivery. If a specific regional SMS gateway is failing at a rate higher than 50%, the circuit opens. We stop sending traffic to that provider and move those messages to a Dead Letter Queue (DLQ). This protects your system from wasting resources on a dead end and allows for manual or automated reconciliation once the provider is back online. Our webhook system never drops message events because it follows these exact principles of persistence and intelligent retries.
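A minimal circuit-breaker sketch over a rolling sample of delivery attempts; the thresholds are illustrative, and deliver and moveToDLQ are assumed stand-ins for the actual delivery and DLQ plumbing:

// Illustrative stand-ins for the actual delivery and DLQ plumbing.
declare function deliver(msg: object): Promise<void>;
declare function moveToDLQ(msg: object): Promise<void>;

class CircuitBreaker {
  private failures = 0;
  private total = 0;
  private openedAt: number | null = null;

  constructor(
    private failureThreshold = 0.5, // open above a 50% failure rate
    private minSamples = 20,        // don't judge on a tiny sample
    private cooldownMs = 30_000     // stay open this long before probing again
  ) {}

  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (Date.now() - this.openedAt > this.cooldownMs) {
      // Half-open: reset the window and let a probe request through.
      this.openedAt = null;
      this.failures = 0;
      this.total = 0;
      return false;
    }
    return true;
  }

  record(success: boolean): void {
    this.total += 1;
    if (!success) this.failures += 1;
    if (this.total >= this.minSamples && this.failures / this.total > this.failureThreshold) {
      this.openedAt = Date.now();
    }
  }
}

const smsGatewayBreaker = new CircuitBreaker();

async function sendViaSmsGateway(msg: object): Promise<void> {
  if (smsGatewayBreaker.isOpen()) {
    await moveToDLQ(msg); // park it for reconciliation instead of burning retries
    return;
  }
  try {
    await deliver(msg);
    smsGatewayBreaker.record(true);
  } catch (err) {
    smsGatewayBreaker.record(false);
    throw err;
  }
}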
How Zyphr Governs Delivery
The reason we unified Auth and Messaging into a single platform was to solve this governance problem. Because Zyphr manages the identity layer, it has semantic awareness of every event. When a zyphr.auth.sendMagicLink() call comes in, the system recognizes it as a Tier 1 Security event. It doesn't matter if you're currently sending a million-row newsletter; the Magic Link is routed to a dedicated high-priority lane with pre-reserved throughput.
You don't have to write custom Redis logic or manage complex backpressure state. You can configure these constraints directly in the SDK:
import { Zyphr } from '@zyphr/sdk';
const zyphr = new Zyphr(process.env.ZYPHR_API_KEY);
await zyphr.emails.send({
to: 'user@example.com',
template: 'quarterly-digest',
// Lower priority allows the system to 'smooth' this over a longer window
// to stay within provider limits without blocking OTPs.
priority: 'low',
retryConfig: {
maxAttempts: 5,
backoff: 'exponential'
}
});
By using a unified platform, you eliminate the glue code between your identity provider and your messaging gateway. There is one SDK, one dashboard for delivery logs, and one unified governor that ensures your marketing bursts don't kill your core product functionality.
Audit Your Critical Path
Take a look at your current notification architecture. If you sent 100,000 "New Feature" emails right now, what would happen to your password reset flow? If they share the same provider account, the same API key, or the same worker queue, you are at risk of a success disaster.
Identify your Tier 1 notifications today. Ensure they are isolated from your marketing traffic—either through separate provider accounts or priority-aware middleware. If you want to stop managing this infrastructure yourself, check our documentation to see how to migrate your high-priority flows to a system built for 99.99% deliverability.