go / limiter
I use a keyed token bucket limiter for per-client rate limiting without unbounded memory growth.
The problem
Rate limiting by key (per IP, per user, per API key) seems simple:
var buckets = make(map[string]*TokenBucket)

func Allow(key string) bool {
	bucket := buckets[key]
	if bucket == nil {
		bucket = NewTokenBucket(rate, burst)
		buckets[key] = bucket
	}
	return bucket.Allow()
}
But this map grows forever. Every unique IP that ever hits your server stays in memory. Under attack, this becomes a memory exhaustion vector.
The pattern
Track only the N most recently seen keys using an LRU cache. Assume untracked keys are well-behaved — they haven't been seen recently enough to be a problem.
package limiter

import (
	"sync"
	"time"

	lru "github.com/hashicorp/golang-lru/v2"
)

type Limiter[K comparable] struct {
	Size           int           // number of keys to track
	Max            int64         // tokens per bucket
	RefillInterval time.Duration // time to add one token
	Overdraft      int64         // extra tokens that can go negative

	mu    sync.Mutex
	cache *lru.Cache[K, *bucket]
}

type bucket struct {
	cur        int64
	lastUpdate time.Time
}
func (lm *Limiter[K]) Allow(key K) bool {
	lm.mu.Lock()
	defer lm.mu.Unlock()

	b := lm.getBucket(key)
	lm.refill(b, time.Now())

	if b.cur > 0 {
		b.cur--
		return true
	}
	if b.cur > -lm.Overdraft {
		b.cur-- // charge overdraft
	}
	return false
}
Key points:
- Bounded memory: the LRU evicts the least recently used keys once Size is reached.
- Per-key buckets: each key gets independent rate limiting.
- Overdraft/cooldown: abusive keys go into debt and must stop completely to recover.
Overdraft explained
Without overdraft, an abusive client sending 1000 req/sec against a 10 req/sec limit still gets 10 requests per second through: they consume each token the moment it appears.
With overdraft, exceeding the limit puts the bucket negative. The client must stop entirely until tokens refill past zero. If they keep hammering, they stay in debt forever.
// Without overdraft: abuser gets rate-limited throughput
// With Overdraft: 50, abuser must wait 5+ seconds of silence
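The cooldown effect is easy to check with a toy simulation. This sketch reimplements the bucket arithmetic inline (it does not import the limiter): a 10-token bucket refilled at 10 tokens/sec, against an abuser sending one request per millisecond for one second:

```go
package main

import "fmt"

// simulate runs one second of an abuser hammering at 1000 req/sec and
// reports how many requests got through and where the bucket ended up.
func simulate(overdraft int64) (allowed int, final int64) {
	const max = int64(10)
	cur := max
	for ms := 0; ms < 1000; ms++ {
		if ms > 0 && ms%100 == 0 && cur < max { // one token per 100ms
			cur++
		}
		if cur > 0 { // request allowed
			cur--
			allowed++
		} else if cur > -overdraft { // request denied, charged as debt
			cur--
		}
	}
	return allowed, cur
}

func main() {
	a, f := simulate(0)
	fmt.Printf("no overdraft: allowed=%d final=%d\n", a, f) // allowed=19 final=0
	a, f = simulate(50)
	fmt.Printf("overdraft 50: allowed=%d final=%d\n", a, f) // allowed=10 final=-50
}
```

Without overdraft, the abuser gets the 10-token burst plus every refilled token. With Overdraft: 50, they get only the initial burst and end the second at -50: five seconds of complete silence (50 refills at 10/sec) before a request succeeds again.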
Usage
var ipLimiter = &limiter.Limiter[netip.Addr]{
	Size:           10_000,                  // track 10K IPs
	Max:            100,                     // 100-request burst
	RefillInterval: limiter.QPSInterval(10), // 10 req/sec sustained
	Overdraft:      50,                      // cooldown penalty
}

func handler(w http.ResponseWriter, r *http.Request) {
	ip := getClientIP(r)
	if !ipLimiter.Allow(ip) {
		http.Error(w, "rate limited", http.StatusTooManyRequests)
		return
	}
	// handle request
}
Helper for queries-per-second:
func QPSInterval(qps float64) time.Duration {
	return time.Duration(float64(time.Second) / qps)
}
When to use
- API rate limiting per client, IP, or key
- Protection against abuse with unknown/unbounded key space
- When "rough" enforcement is acceptable — block outliers, ignore well-behaved clients that fell out of the LRU
When not to use
- Precise rate limiting where every request must be tracked
- Small, known set of keys (just use a map of buckets)
- Global rate limiting (use a single token bucket)
Sizing
Choose Size based on your expected cardinality:
- Web API with 10K daily active users: Size: 50_000
- Public endpoint open to the internet: Size: 100_000
The LRU ensures that active abusers stay tracked while inactive keys get evicted.