go / limiter
I use a keyed token bucket limiter for per-client rate limiting without unbounded memory growth.
The problem
Rate limiting by key (per IP, per user, per API key) seems simple:
var buckets = make(map[string]*TokenBucket)

func Allow(key string) bool {
	bucket := buckets[key]
	if bucket == nil {
		bucket = NewTokenBucket(rate, burst)
		buckets[key] = bucket
	}
	return bucket.Allow()
}
But this map grows forever. Every unique IP that ever hits your server stays in memory. Under attack, this becomes a memory exhaustion vector.
The pattern
Track only the N most recently seen keys using an LRU cache. Assume untracked keys are well-behaved — they haven't been seen recently enough to be a problem.
package limiter

import (
	"sync"
	"time"

	lru "github.com/hashicorp/golang-lru/v2"
)

type Limiter[K comparable] struct {
	Size           int           // number of keys to track
	Max            int64         // tokens per bucket
	RefillInterval time.Duration // time to add one token
	Overdraft      int64         // extra tokens that can go negative

	mu    sync.Mutex
	cache *lru.Cache[K, *bucket]
}

type bucket struct {
	cur        int64
	lastUpdate time.Time
}
func (lm *Limiter[K]) Allow(key K) bool {
	lm.mu.Lock()
	defer lm.mu.Unlock()

	b := lm.getBucket(key)
	lm.refill(b, time.Now())

	if b.cur > 0 {
		b.cur--
		return true
	}
	if b.cur > -lm.Overdraft {
		b.cur-- // charge overdraft
	}
	return false
}
Key points:
- Bounded memory: the LRU evicts the least recently used keys once Size is reached.
- Per-key buckets: each key gets independent rate limiting.
- Overdraft/cooldown: abusive keys go into debt and must stop completely to recover.
Overdraft explained
Without overdraft, an abusive client sending 1000 req/sec against a 10 req/sec limit still gets 10 requests per second through: they consume each token the moment it appears.
With overdraft, exceeding the limit puts the bucket negative. The client must stop entirely until tokens refill past zero. If they keep hammering, they stay in debt forever.
// Without overdraft: abuser gets rate-limited throughput
// With Overdraft: 50, abuser must wait 5+ seconds of silence
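The cooldown effect is easy to check with a toy simulation. This sketch reimplements the bucket arithmetic inline (it does not import the limiter): a 10-token bucket refilled at 10 tokens/sec, against an abuser sending one request per millisecond for one second:

```go
package main

import "fmt"

// simulate runs one second of an abuser hammering at 1000 req/sec and
// reports how many requests got through and where the bucket ended up.
func simulate(overdraft int64) (allowed int, final int64) {
	const max = int64(10)
	cur := max
	for ms := 0; ms < 1000; ms++ {
		if ms > 0 && ms%100 == 0 && cur < max { // one token per 100ms
			cur++
		}
		if cur > 0 { // request allowed
			cur--
			allowed++
		} else if cur > -overdraft { // request denied, charged as debt
			cur--
		}
	}
	return allowed, cur
}

func main() {
	a, f := simulate(0)
	fmt.Printf("no overdraft: allowed=%d final=%d\n", a, f) // allowed=19 final=0
	a, f = simulate(50)
	fmt.Printf("overdraft 50: allowed=%d final=%d\n", a, f) // allowed=10 final=-50
}
```

Without overdraft, the abuser gets the 10-token burst plus every refilled token. With Overdraft: 50, they get only the initial burst and end the second at -50: five seconds of complete silence (50 refills at 10/sec) before a request succeeds again.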
Usage
var ipLimiter = &limiter.Limiter[netip.Addr]{
	Size:           10_000,                  // track 10K IPs
	Max:            100,                     // 100-request burst
	RefillInterval: limiter.QPSInterval(10), // 10 req/sec sustained
	Overdraft:      50,                      // cooldown penalty
}

func handler(w http.ResponseWriter, r *http.Request) {
	ip := getClientIP(r)
	if !ipLimiter.Allow(ip) {
		http.Error(w, "rate limited", http.StatusTooManyRequests)
		return
	}
	// handle request
}
Helper for queries-per-second:
func QPSInterval(qps float64) time.Duration {
	return time.Duration(float64(time.Second) / qps)
}
When to use
- API rate limiting per client, IP, or key
- Protection against abuse with unknown/unbounded key space
- When "rough" enforcement is acceptable — block outliers, ignore well-behaved clients that fell out of the LRU
When not to use
- Precise rate limiting where every request must be tracked
- Small, known set of keys (just use a map of buckets)
- Global rate limiting (use a single token bucket)
Sizing
Choose Size based on your expected cardinality:
- Web API with 10K daily active users: Size: 50_000
- Public endpoint open to the internet: Size: 100_000
The LRU ensures that active abusers stay tracked while inactive keys get evicted.