API Cache + Rate Limiter
Build a small FastAPI service that demonstrates the two workhorse Redis patterns
together: cache-aside (with a TTL and explicit invalidation) on a deliberately
slow endpoint, and a sliding-window rate limiter as a FastAPI dependency. Both
are backed by the async redis.asyncio client, sharing one connection pool opened
in the app’s lifespan.
By the end you’ll watch a 1.5-second endpoint drop to single-digit milliseconds on a
cache hit, see a write blow the cache away, and get a 429 with a Retry-After
header when you hammer it past the limit.
What you’ll practice
- redis.asyncio.from_url + a connection pool opened in the FastAPI lifespan and shared via a dependency.
- Cache-aside: get → miss → load → SET ... EX → return, with Report serialized via Pydantic model_dump_json().
- Invalidation: DEL the key on write, after the “DB” mutation.
- A sliding-window rate-limit dependency using an atomic Lua script over a sorted set, returning 429 + Retry-After.
- A uv project wired with ruff and ty.
Requirements
- A GET /reports/{region} endpoint backed by a slow (simulated 1.5s) computation, cached in Redis with a 30-second TTL.
- A POST /reports/{region}/refresh endpoint that invalidates the cached report.
- A response that reports cached: true | false so you can see hits vs misses without reading logs.
- A sliding-window rate limiter (5 requests / 10 seconds, keyed by client IP) applied to the report endpoint as a dependency, returning 429 with Retry-After when exceeded.
- One shared async Redis client, opened and closed in the lifespan.
The worked solution
A single-package FastAPI app. Everything lives under app/: the lifespan + wiring in
main.py, the cache-aside logic in cache.py, and the limiter in ratelimit.py.
api-caching/
- pyproject.toml          uv project, deps, ruff + ty config
- app/
  - __init__.py
  - main.py               lifespan, Redis pool, routes
  - cache.py              cache-aside get + invalidate
  - ratelimit.py          sliding-window dependency
- docker-compose.yml      (optional) just Redis, if not using shared-infra
pyproject.toml
Three runtime deps: fastapi, uvicorn (the ASGI server), and redis with the
hiredis speedup. ruff and ty go in a dev group.

    [project]
    name = "api-caching"
    version = "0.1.0"
    requires-python = ">=3.13"
    dependencies = [
        "fastapi>=0.115",
        "uvicorn[standard]>=0.34",
        "redis[hiredis]>=5.2",
    ]

    [dependency-groups]
    dev = ["ruff>=0.8", "ty>=0.0.1"]

    [tool.ruff]
    target-version = "py313"

    [tool.ty.environment]
    python-version = "3.13"

app/cache.py
The cache-aside core. slow_report is the stand-in for an expensive aggregation
query — it sleeps 1.5 seconds so the cache win is obvious. get_report does the
classic dance: check Redis, return on hit, otherwise compute, store with a TTL, and
return. invalidate_report deletes the key.
The Report model serializes with Pydantic’s model_dump_json() and validates back
with model_validate_json(), so a malformed cache entry raises instead of returning
garbage.
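
Before the real module, the hit / miss / invalidate sequence described above can be tried without a Redis server. The following is a minimal sketch, not the project's code: the dict `_store`, the helper names `slow_load`, `get_cached`, and `invalidate`, and the shortened sleep are all assumptions made for illustration, with a plain dictionary standing in for Redis and its TTL.

```python
import asyncio
import time

TTL_SECONDS = 30
# Stand-in for Redis: key -> (expires_at, value). Hypothetical, in-memory only.
_store: dict[str, tuple[float, str]] = {}
loads = 0  # how many times the slow path actually ran


async def slow_load(region: str) -> str:
    """Hypothetical expensive computation (sleep shortened for the demo)."""
    global loads
    loads += 1
    await asyncio.sleep(0.01)
    return f"report for {region}"


async def get_cached(region: str, now=time.monotonic) -> tuple[str, bool]:
    """Cache-aside: get -> miss -> load -> store with TTL -> return."""
    key = f"report:{region}"
    entry = _store.get(key)
    if entry is not None and entry[0] > now():  # present and not expired: HIT
        return entry[1], True
    value = await slow_load(region)  # MISS: compute
    _store[key] = (now() + TTL_SECONDS, value)  # store with an expiry time
    return value, False


def invalidate(region: str) -> bool:
    """Drop the key so the next read repopulates it."""
    return _store.pop(f"report:{region}", None) is not None


async def demo() -> list[bool]:
    flags = []
    for _ in range(2):  # first read misses, second read hits
        _, cached = await get_cached("emea")
        flags.append(cached)
    invalidate("emea")  # a write blows the entry away
    _, cached = await get_cached("emea")  # so this read misses again
    flags.append(cached)
    return flags


result = asyncio.run(demo())
print(result, "slow loads:", loads)
```

The slow path runs only on the first read and on the read after invalidation, which is the behaviour the Redis-backed version below aims for.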

    import asyncio
    from datetime import datetime

    import redis.asyncio as redis
    from pydantic import BaseModel

    REPORT_TTL_SECONDS = 30


    class Report(BaseModel):
        region: str
        total_sales: float
        generated_at: datetime
        cached: bool = False


    async def slow_report(region: str) -> Report:
        """Stand-in for an expensive query; sleeps to make the cache win visible."""
        await asyncio.sleep(1.5)
        return Report(
            region=region,
            total_sales=round(len(region) * 1234.56, 2),
            generated_at=datetime.now(),
        )


    async def get_report(r: redis.Redis, region: str) -> Report:
        key = f"report:{region}"

        cached = await r.get(key)
        if cached is not None:  # --- cache HIT ---
            report = Report.model_validate_json(cached)
            report.cached = True
            return report

        report = await slow_report(region)  # --- cache MISS: compute ---
        await r.set(key, report.model_dump_json(), ex=REPORT_TTL_SECONDS)  # store w/ TTL
        report.cached = False
        return report


    async def invalidate_report(r: redis.Redis, region: str) -> bool:
        deleted = await r.delete(f"report:{region}")  # next read repopulates
        return deleted > 0

app/ratelimit.py
The sliding-window limiter from the module, packaged as a FastAPI dependency factory.
The Lua script runs atomically: prune old timestamps, count, reject-or-add. The
rate_limit(...) factory returns a dependency you attach to any route; on rejection
it raises HTTPException(429) with a Retry-After header computed from when the
oldest in-window request slides out.
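
The prune, count, reject-or-add steps are easier to follow on a plain list first. The sketch below mirrors them in pure Python under stated assumptions: `sliding_window_check` is a hypothetical helper, the sorted list of millisecond timestamps stands in for the sorted set, and the wait is rounded up to whole seconds (a choice made for this sketch). It has none of the atomicity the Lua script provides.

```python
import math


def sliding_window_check(
    timestamps: list[int], now_ms: int, window_ms: int, limit: int
) -> tuple[bool, int]:
    """Return (allowed, retry_after_seconds); retry_after is 0 when allowed.

    `timestamps` is mutated in place and kept in ascending order.
    """
    # 1. Prune: drop requests at or before the trailing edge of the window.
    timestamps[:] = [t for t in timestamps if t > now_ms - window_ms]
    # 2. Count what is left in the window.
    if len(timestamps) >= limit:
        # 3a. Reject: the slot frees when the oldest request slides out.
        wait_ms = timestamps[0] + window_ms - now_ms
        return False, max(1, math.ceil(wait_ms / 1000))
    # 3b. Accept: record this request.
    timestamps.append(now_ms)
    return True, 0


log: list[int] = []
# Six requests 100 ms apart against a limit of 5 per 10 s.
burst = [sliding_window_check(log, t, 10_000, 5) for t in (0, 100, 200, 300, 400, 500)]
# Exactly one window after the first request, its slot has been pruned.
later = sliding_window_check(log, 10_000, 10_000, 5)
print(burst, later)
```

The sixth request is rejected with a 10-second wait (9.5 s rounded up), and a request arriving at the 10-second mark is accepted because the first timestamp has left the window.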

    import math
    import time
    import uuid

    from fastapi import HTTPException, Request, status

    SLIDING_WINDOW_LUA = """
    -- KEYS[1] = zset key
    -- ARGV: now_ms, window_ms, limit, member, ttl_seconds
    local now = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local limit = tonumber(ARGV[3])

    redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)
    local count = redis.call('ZCARD', KEYS[1])
    if count >= limit then
        local oldest = redis.call('ZRANGE', KEYS[1], 0, 0, 'WITHSCORES')
        return {0, oldest[2]}
    end
    redis.call('ZADD', KEYS[1], now, ARGV[4])
    redis.call('EXPIRE', KEYS[1], ARGV[5])
    return {1, 0}
    """


    def rate_limit(limit: int, window: int):
        """Factory -> a FastAPI dependency enforcing `limit` requests per `window` seconds."""

        async def dependency(request: Request) -> None:
            r = request.app.state.redis
            client_id = request.client.host if request.client else "unknown"
            now_ms = int(time.time() * 1000)
            member = f"{now_ms}-{uuid.uuid4()}"  # unique per request

            allowed, oldest = await r.eval(
                SLIDING_WINDOW_LUA,
                1,
                f"rate:sliding:{client_id}",
                now_ms,
                window * 1000,
                limit,
                member,
                window,
            )

            if not allowed:
                # Retry-After = seconds until the oldest in-window request slides out,
                # rounded up so the client is never told to retry too early.
                oldest_ms = int(oldest) if oldest else now_ms
                retry_after = max(1, math.ceil((oldest_ms + window * 1000 - now_ms) / 1000))
                raise HTTPException(
                    status_code=status.HTTP_429_TOO_MANY_REQUESTS,
                    detail="Rate limit exceeded",
                    headers={"Retry-After": str(retry_after)},
                )

        return dependency

app/main.py
The wiring. The lifespan opens one redis.asyncio pool, pings it to fail fast, and
closes it on shutdown. get_redis hands the shared client to routes. The report
route depends on both the Redis client and the rate-limit dependency (5 req / 10 s by
IP); the refresh route invalidates.
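
The open, ping, serve, close ordering of the lifespan can be seen in isolation with stand-ins. This is only a sketch: `FakeRedis` and `run` are hypothetical, a `SimpleNamespace` plays the role of the FastAPI app, and no real connection is made. It shows that a failing ping stops startup before anything is served, and that the client is closed on the way out.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class FakeRedis:
    """Hypothetical stand-in exposing only ping() and aclose()."""

    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.closed = False

    async def ping(self) -> bool:
        if not self.up:
            raise ConnectionError("Redis is down")
        return True

    async def aclose(self) -> None:
        self.closed = True


@asynccontextmanager
async def lifespan(app, client: FakeRedis):
    app.state.redis = client  # one shared client for the whole app
    await client.ping()  # raises here, before the app starts serving
    yield  # the app handles requests while suspended here
    await client.aclose()  # runs at shutdown


async def run(client: FakeRedis) -> str:
    app = SimpleNamespace(state=SimpleNamespace())
    try:
        async with lifespan(app, client):
            return "served"
    except ConnectionError:
        return "refused to start"


healthy = FakeRedis()
outcomes = (asyncio.run(run(healthy)), asyncio.run(run(FakeRedis(up=False))))
print(outcomes, "closed:", healthy.closed)
```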

    from collections.abc import AsyncIterator
    from contextlib import asynccontextmanager
    from typing import Annotated

    import redis.asyncio as redis
    from fastapi import Depends, FastAPI, Request

    from app.cache import Report, get_report, invalidate_report
    from app.ratelimit import rate_limit


    @asynccontextmanager
    async def lifespan(app: FastAPI) -> AsyncIterator[None]:
        app.state.redis = redis.from_url(
            "redis://localhost:6379",
            decode_responses=True,
            max_connections=20,
        )
        await app.state.redis.ping()  # fail fast if Redis is down
        yield
        await app.state.redis.aclose()


    app = FastAPI(lifespan=lifespan)


    def get_redis(request: Request) -> redis.Redis:
        return request.app.state.redis


    RedisDep = Annotated[redis.Redis, Depends(get_redis)]


    @app.get("/reports/{region}", dependencies=[Depends(rate_limit(limit=5, window=10))])
    async def read_report(region: str, r: RedisDep) -> Report:
        return await get_report(r, region)


    @app.post("/reports/{region}/refresh")
    async def refresh_report(region: str, r: RedisDep) -> dict[str, bool]:
        return {"invalidated": await invalidate_report(r, region)}

Run it
You need Redis running first; it comes from the guide’s shared-infra Docker
Compose stack.

- Start Redis from the shared infra:

      cd shared-infra && docker compose up -d redis

  No shared-infra checkout handy? A one-off container works too:

      docker run -d --name redis -p 6379:6379 redis:8

- Scaffold the project and add dependencies:

      uv init api-caching && cd api-caching
      uv add fastapi "uvicorn[standard]" "redis[hiredis]"
      uv add --dev ruff ty

- Add the app/ package (__init__.py, main.py, cache.py, ratelimit.py) from the solution above, then lint and type-check:

      uv run ruff check .
      uv run ty check    # or use mypy instead, if you prefer it

- Run the server:

      uv run uvicorn app.main:app --reload --port 8000

See the cache hit/miss and the 429
- First call: a cache miss. It takes ~1.5s and returns "cached": false:

      curl -s -w "\n(%{time_total}s)\n" http://localhost:8000/reports/emea
      # {"region":"emea","total_sales":4938.24,...,"cached":false}
      # (1.51s)

- Second call: a cache hit. Near-instant, "cached": true:

      curl -s -w "\n(%{time_total}s)\n" http://localhost:8000/reports/emea
      # {"region":"emea",...,"cached":true}
      # (0.006s)

- Confirm the key and its TTL directly in Redis:

      docker exec -it redis redis-cli GET "report:emea"
      docker exec -it redis redis-cli TTL "report:emea"   # counts down from 30

- Invalidate it, then read again. Back to a slow miss:

      curl -s -X POST http://localhost:8000/reports/emea/refresh   # {"invalidated":true}
      curl -s -w "\n(%{time_total}s)\n" http://localhost:8000/reports/emea   # ~1.5s, cached:false

- Trip the rate limiter: 5 requests are allowed per 10s, the 6th gets 429. The limit is keyed by client IP, not by region, so GETs from the previous steps made in the last 10 seconds count too; wait 10 seconds first if you want a clean window:

      for i in $(seq 1 7); do
        echo "Request $i:"
        curl -s -o /dev/null -w "  HTTP %{http_code}  Retry-After: %header{retry-after}\n" \
          http://localhost:8000/reports/apac
      done
      # Requests 1-5: HTTP 200
      # Requests 6-7: HTTP 429  Retry-After: <seconds until the oldest request leaves the 10s window>