Tenant Resolution
Before any business logic runs, every request to a tenant worker (apps/server or
apps/auth) has to turn request.host into an organizationId. This is the
critical path — every single API call goes through it, so two things matter at once:
- Get it slow and you’ve taxed every endpoint in the system.
- Get it wrong and you leak one tenant’s data to another.
The shape of the solution is three small pieces: a pure classifier, a thin middleware, and a two-tier cache.
Hostname shapes
There are three valid host patterns, plus a hard rejection for everything else. Knowing which bucket a host falls into is the first decision the system makes.
| Pattern | Example | Where the org lives |
|---|---|---|
| Default subdomain | acme.app.example.com | organization.slug = "acme" |
| Tenant custom domain | app.acme.com | tenant_custom_hostnames.hostname (status active) |
| Apex (marketing) | app.example.com | No tenant — serves the marketing / find-your-team page |
| Admin host | admin.example.com | A different worker entirely; tenantMiddleware rejects it with 404 |
Slugs are constrained at org-creation time: a reserved-slug list, a Punycode
(xn--) reject, and the regex ^[a-z0-9](?:[a-z0-9-]{1,61}[a-z0-9])?$. That keeps
the subdomain space clean and predictable, which matters because the classifier below
trusts it.
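The three slug constraints compose into a single predicate. The sketch below is illustrative only: `isValidSlug` and the contents of `RESERVED_SLUGS` are assumptions made for the example, not the codebase's actual names or list. Only the regex is taken from the text above.

```typescript
// Hypothetical reserved list; the real one is not shown in this document.
const RESERVED_SLUGS: ReadonlySet<string> = new Set(["admin", "app", "www", "api"]);

// Regex quoted from the section above.
const SLUG_PATTERN = /^[a-z0-9](?:[a-z0-9-]{1,61}[a-z0-9])?$/;

function isValidSlug(slug: string): boolean {
  if (RESERVED_SLUGS.has(slug)) return false; // reserved-slug list
  if (slug.startsWith("xn--")) return false; // Punycode labels are rejected
  return SLUG_PATTERN.test(slug); // lowercase alphanumerics, inner hyphens only
}

console.log(isValidSlug("acme")); // true
console.log(isValidSlug("xn--caf-dma")); // false
console.log(isValidSlug("-acme")); // false
```

As written, the pattern accepts one-character slugs and slugs of 3 to 63 characters, but rejects two-character slugs such as ab, because the optional group needs at least two characters to match.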
The pure classifier
The classification logic lives in a single pure function, parseHostname, in
packages/shared/src/tenant.ts (or @repo/tenancy after the Phase C consolidation —
see Deep Modules). Keeping it pure
means it’s trivially testable and has no I/O on the hot path.

    export const WILDCARD_SUFFIX = ".app.example.com"; // env-configurable
    export const ADMIN_HOST = "admin.example.com";

    export function parseHostname(
      host: string,
      suffix: string,
      adminHost: string,
    ):
      | { kind: "subdomain"; slug: string }
      | { kind: "apex" }
      | { kind: "custom"; hostname: string }
      | { kind: "admin" }
      | { kind: "invalid" } {
      const normalized = host.toLowerCase().normalize("NFC").replace(/\.$/, "");
      if (!/^[a-z0-9.-]+$/.test(normalized)) return { kind: "invalid" };
      if (normalized.includes("*") || normalized.includes(" ")) return { kind: "invalid" };
      if (normalized === adminHost) return { kind: "admin" };
      if (normalized === suffix.slice(1)) return { kind: "apex" };
      if (normalized.endsWith(suffix)) {
        const slug = normalized.slice(0, -suffix.length);
        return slug.includes(".") ? { kind: "invalid" } : { kind: "subdomain", slug };
      }
      return { kind: "custom", hostname: normalized };
    }

The order of these checks is load-bearing. The admin host is matched first, before
any wildcard or custom-host classification — otherwise admin.example.com would fall
through to a custom tenant lookup and leak a 404 timing oracle for “is this an
admin host?” queries. The apex case is matched next, then the wildcard suffix, and
only then does anything unrecognized become a custom hostname to look up in the
database.
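To make the ordering concrete, the sketch below repeats parseHostname so that it runs on its own and classifies a handful of hosts. The sample hostnames and the `ParsedHost` alias are illustrative assumptions; the expected kinds follow from applying the checks in the order described.

```typescript
type ParsedHost =
  | { kind: "subdomain"; slug: string }
  | { kind: "apex" }
  | { kind: "custom"; hostname: string }
  | { kind: "admin" }
  | { kind: "invalid" };

// Same logic as the classifier above, repeated so the example is self-contained.
function parseHostname(host: string, suffix: string, adminHost: string): ParsedHost {
  const normalized = host.toLowerCase().normalize("NFC").replace(/\.$/, "");
  if (!/^[a-z0-9.-]+$/.test(normalized)) return { kind: "invalid" };
  if (normalized.includes("*") || normalized.includes(" ")) return { kind: "invalid" };
  if (normalized === adminHost) return { kind: "admin" };
  if (normalized === suffix.slice(1)) return { kind: "apex" };
  if (normalized.endsWith(suffix)) {
    const slug = normalized.slice(0, -suffix.length);
    return slug.includes(".") ? { kind: "invalid" } : { kind: "subdomain", slug };
  }
  return { kind: "custom", hostname: normalized };
}

const SUFFIX = ".app.example.com";
const ADMIN = "admin.example.com";

// Illustrative inputs paired with the kind the check order produces.
const samples: Array<[string, ParsedHost["kind"]]> = [
  ["acme.app.example.com", "subdomain"],
  ["ACME.app.example.com.", "subdomain"], // case and trailing dot are normalized
  ["app.example.com", "apex"],
  ["admin.example.com", "admin"],
  ["app.acme.com", "custom"],
  ["a.b.app.example.com", "invalid"], // nested labels under the wildcard
  ["localhost:3000", "invalid"], // ":" is outside the allowed character set
];

for (const [host, expected] of samples) {
  const actual = parseHostname(host, SUFFIX, ADMIN).kind;
  console.log(`${host} -> ${actual} (expected ${expected})`);
}
```

The last sample shows why a bare localhost:3000 never resolves to a tenant: the port separator fails the character check before any suffix or custom-host logic runs.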
Tenant middleware
The middleware is deliberately thin: classify the host, short-circuit the cases that
need no database, and otherwise resolve through the cache. It sets tenant on the
request context for everything downstream to read.

    export const tenantMiddleware = createMiddleware<AppEnv>(async (c, next) => {
      const rawHost = c.req.header("host");
      if (!rawHost) return c.text("Bad Request", 400);

      const parsed = parseHostname(rawHost, c.env.WILDCARD_SUFFIX, c.env.ADMIN_HOST);
      if (parsed.kind === "invalid" || parsed.kind === "admin") return c.notFound();
      if (parsed.kind === "apex") {
        c.set("tenant", null);
        return next();
      }

      const tenant = await resolveCached(parsed, c);
      if (!tenant) return c.notFound();
      c.set("tenant", tenant);
      return next();
    });

apps/auth runs the same middleware. It has to execute before the Better Auth
handler so that SSO discovery and the session hooks already see the tenant context
by the time they run.
Two things about the apex branch are easy to get wrong:
- It’s a legitimate request, not an error. The marketing / find-your-team page on
  app.example.com hits apps/server with tenant === null.
- So any tenant-scoped route must default-deny when c.var.tenant === null. The codebase keeps an explicit allowlist of routes valid on apex (the login picker and the public marketing endpoints), and everything else rejects.
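One way to express the default-deny rule is a guard that lets apex requests through only for allowlisted paths. This is a minimal sketch, not the codebase's implementation: `APEX_ALLOWLIST` and its paths, the `Tenant` shape, `guardTenantRoute`, and the choice of 404 as the rejection status are all hypothetical.

```typescript
// Hypothetical tenant shape; the real context type is not shown in this document.
type Tenant = { organizationId: string };

// Hypothetical allowlist of paths that may run with tenant === null (apex).
const APEX_ALLOWLIST: ReadonlySet<string> = new Set(["/login/picker", "/marketing"]);

type GuardResult = { allowed: true } | { allowed: false; status: 404 };

function guardTenantRoute(tenant: Tenant | null, path: string): GuardResult {
  // A resolved tenant may reach any tenant-scoped route.
  if (tenant !== null) return { allowed: true };
  // On apex, anything not explicitly allowlisted is denied.
  return APEX_ALLOWLIST.has(path) ? { allowed: true } : { allowed: false, status: 404 };
}

console.log(guardTenantRoute(null, "/login/picker").allowed); // true
console.log(guardTenantRoute(null, "/api/projects").allowed); // false
```

The point of this shape is that a new route is denied on apex unless someone adds it to the list, so forgetting to think about apex fails closed.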
Cache strategy
A Postgres lookup on every API call would be wasteful, so the result is cached. But this cache is security-sensitive: when a tenant is suspended or a slug is renamed, stale entries must clear quickly and everywhere. That constraint rules out the obvious single-tier choices.
The system uses two tiers, each playing to a different strength of Cloudflare’s edge:
- Cache API — a per-colo cache (a colo is one Cloudflare data center). Sub-millisecond reads in-colo, with strong same-colo consistency on invalidation. This is the fast read path.
- KV — Cloudflare’s eventually-consistent global key-value store. It holds one small, rarely-changing version counter.
The mental model: Cache API answers the lookup fast; KV is the kill switch that expires every colo’s cache at once. Fast read, plus one versioned lever for the rare broad invalidation.
Why not just use KV as the lookup cache directly? Its cross-colo eventual-consistency window (around 60 seconds) is too long for security-sensitive operations like tenant suspension — you can’t have a suspended tenant still resolving for a minute somewhere. The Cache API has no such lag in-colo, which is exactly what the hot path wants.
The cache key embeds that version prefix, which the admin worker bumps on a slug rename or a suspension:

    const version = (await c.env.CACHE.get("tenant_cache_version")) ?? "v0";
    const cacheKey = new Request(
      `https://tenant-cache.internal/${version}/${parsed.kind}/${
        parsed.kind === "subdomain" ? parsed.slug : parsed.hostname
      }`
    );

Bumping the version invalidates every cache entry under the old version, fleet-wide: each colo stops reading the old keys as soon as it observes the new version, and the orphaned entries age out through their TTL. It is a single coarse lever for the rare, broad invalidations.
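The effect of the version prefix can be seen in a small in-memory model. This is a sketch under stated assumptions: a Map stands in for one colo's Cache API, a plain variable stands in for the KV counter, and `cacheKeyFor`, `readTenant`, and `bumpVersion` are hypothetical names.

```typescript
type Lookup =
  | { kind: "subdomain"; slug: string }
  | { kind: "custom"; hostname: string };

// Builds the same URL shape as the cache key above, as a plain string.
function cacheKeyFor(version: string, parsed: Lookup): string {
  const id = parsed.kind === "subdomain" ? parsed.slug : parsed.hostname;
  return `https://tenant-cache.internal/${version}/${parsed.kind}/${id}`;
}

// Stand-ins: one colo's cache and the KV version counter.
const coloCache = new Map<string, string>();
let kvVersion = "v0";

function readTenant(parsed: Lookup): string | undefined {
  return coloCache.get(cacheKeyFor(kvVersion, parsed));
}

function bumpVersion(next: string): void {
  kvVersion = next; // old entries stay in the Map but are never addressed again
}

const acme: Lookup = { kind: "subdomain", slug: "acme" };
coloCache.set(cacheKeyFor(kvVersion, acme), "org_acme");
console.log(readTenant(acme)); // org_acme
bumpVersion("v1");
console.log(readTenant(acme)); // undefined: the v0 entry is orphaned
```

Nothing is deleted by the bump; the old entry simply becomes unreachable because no reader builds a v0 key any more.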
Resolve order on a miss
On the cold path the middleware walks a short, fixed sequence:
flowchart TB
Start["request host"] --> Cache["Cache API lookup<br/>(versioned key)"]
Cache -->|"hit (positive or negative)"| Return["return cached tenant"]
Cache -->|"miss"| Kind{"parsed.kind?"}
Kind -->|"subdomain"| QSub["SELECT organizations<br/>WHERE slug = ? AND deleted_at IS NULL"]
Kind -->|"custom"| QCust["SELECT tenant_custom_hostnames<br/>WHERE hostname = ? AND lifecycle_status = 'active'<br/>join organizations (deleted_at IS NULL)"]
QSub --> Put["ctx.waitUntil(cache.put)<br/>positive TTL 60s / negative TTL 5s"]
QCust --> Put
Put --> Return
The asymmetric TTLs are deliberate. A positive result is cached for 60 seconds; a negative (“tenant not found”) result for only 5 seconds. The short negative TTL means a freshly created tenant becomes visible within about 5 seconds in any colo that had previously cached the negative result, so creations propagate fast without making the common positive case chatty. A positive entry, by contrast, can outlive a deletion or suspension by up to 60 seconds unless it is invalidated explicitly, which is what the invalidation paths in the next section are for.
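The asymmetric TTLs can be modeled with an explicit clock. The sketch below is an illustration under assumptions: `TtlCache`, `resolve`, and the `lookup` callback are hypothetical, time is passed in as milliseconds rather than read from the system clock, and a Map replaces the Cache API.

```typescript
const POSITIVE_TTL_MS = 60 * 1000; // tenant found
const NEGATIVE_TTL_MS = 5 * 1000; // tenant not found

type Entry = { tenantId: string | null; expiresAt: number };

class TtlCache {
  private entries = new Map<string, Entry>();
  lookups = 0; // counts trips to the backing store

  // Returns the cached value if still fresh; otherwise queries and caches the result.
  resolve(key: string, now: number, lookup: (key: string) => string | null): string | null {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > now) return hit.tenantId;
    this.lookups += 1;
    const tenantId = lookup(key);
    const ttl = tenantId === null ? NEGATIVE_TTL_MS : POSITIVE_TTL_MS;
    this.entries.set(key, { tenantId, expiresAt: now + ttl });
    return tenantId;
  }
}

// Scenario: the tenant is created shortly after a first, negative lookup.
const cache = new TtlCache();
const store = new Map<string, string>();
const lookup = (key: string): string | null => store.get(key) ?? null;

const first = cache.resolve("acme", 0, lookup); // miss, cached as negative
store.set("acme", "org_acme"); // tenant is created
const stale = cache.resolve("acme", 2000, lookup); // still the cached negative
const fresh = cache.resolve("acme", 5000, lookup); // negative expired, re-queried
console.log(first, stale, fresh); // null null org_acme
```

The second call shows the cost of negative caching (a new tenant is briefly invisible), and the third shows why 5 seconds bounds that window.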
Cache invalidation
The Cache API is per-colo and per-worker. A cache.delete() in apps/admin’s
isolate does nothing for the matching entry in apps/server’s isolate, let alone in
another colo. Targeted invalidation therefore needs two cooperating mechanisms.
The first is service-binding RPC fan-out for invalidating a specific tenant. Each tenant worker exposes the operation on its entrypoint:

    // (apps/auth/src/entrypoint.ts is identical)
    async invalidateTenant(spec: { kind: "subdomain" | "custom"; key: string }) {
      await caches.default.delete(
        `https://tenant-cache.internal/${currentVersion}/${spec.kind}/${spec.key}`
      );
    }

    async bumpTenantCacheVersion() {
      await this.env.CACHE.put("tenant_cache_version", String(Date.now()), {
        expirationTtl: 86400,
      });
    }

The second is the admin worker, which orchestrates the fan-out across its peers:

    export async function invalidateTenant(env: AdminEnv, spec) {
      await Promise.all([
        env.API.invalidateTenant(spec),
        env.AUTH.invalidateTenant(spec),
      ]);
    }

The fan-out is asymmetric on purpose: apps/auth has no binding back to
apps/admin. When the auth worker itself triggers an invalidation — for example on a
session-version bump — it calls the existing apps/auth → apps/server.invalidateTenant(...)
binding, while the admin worker remains the orchestrator for global invalidations.
After Phase C this whole pattern lives behind @repo/tenancy’s Invalidator and
FanOutInvalidator types; see Deep Modules.
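The fan-out shape generalizes to any number of peers. The sketch below uses hypothetical names (`InvalidationSpec`, `TenantInvalidator`, `fanOut`, and the recording fakes); it does not reproduce the Invalidator or FanOutInvalidator types from @repo/tenancy, whose definitions are not shown here.

```typescript
type InvalidationSpec = { kind: "subdomain" | "custom"; key: string };

// Hypothetical interface: anything that can drop one tenant's cache entry.
interface TenantInvalidator {
  invalidateTenant(spec: InvalidationSpec): Promise<void>;
}

// Calls every peer concurrently; rejects if any peer rejects, as Promise.all does.
async function fanOut(peers: TenantInvalidator[], spec: InvalidationSpec): Promise<void> {
  await Promise.all(peers.map((peer) => peer.invalidateTenant(spec)));
}

// Recording fake standing in for a service binding such as env.API or env.AUTH.
function recordingPeer(name: string, log: string[]): TenantInvalidator {
  return {
    async invalidateTenant(spec: InvalidationSpec): Promise<void> {
      log.push(`${name}:${spec.kind}/${spec.key}`);
    },
  };
}

const log: string[] = [];
const done = fanOut(
  [recordingPeer("api", log), recordingPeer("auth", log)],
  { kind: "subdomain", key: "acme" },
);
done.then(() => console.log(log)); // both peers were called with the same spec
```

Because Promise.all rejects as soon as one peer fails, a caller that needs to know which peers succeeded could use Promise.allSettled instead; whether the codebase does so is not stated in this document.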
Local development
In local dev, window.location.host is localhost:3000, which matches neither the
wildcard suffix nor any custom host — so the middleware would 404. A dev-only header
provides an escape hatch, behind a deliberately strict two-factor gate:

    if (c.env.NODE_ENV === "development" && c.env.ALLOW_DEV_TENANT_HEADER === "true") {
      const devSlug = c.req.header("x-dev-tenant-slug");
      if (devSlug) {
        // resolve as if host were `${devSlug}.app.example.com`
      }
    }

Both conditions must hold: NODE_ENV (a wrangler var) and ALLOW_DEV_TENANT_HEADER
(a Cloudflare Secret that exists only in .dev.vars). A CI guard rejects any deploy
to staging or production where NODE_ENV !== "production", so the header can’t slip
into a real environment. The tenant SPA (apps/app) sends X-Dev-Tenant-Slug: acme
from VITE_DEV_TENANT_SLUG=acme in .env.development.
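The two-factor gate can be isolated into a pure helper, which makes it easy to test that the header is ignored outside development. `effectiveHost` and the `DevEnv` shape are hypothetical names for this sketch, and the suffix is passed in rather than read from configuration.

```typescript
type DevEnv = { NODE_ENV: string; ALLOW_DEV_TENANT_HEADER?: string };

// Returns the host the classifier should see. The dev header applies only
// when both gate conditions hold; otherwise the real host is used unchanged.
function effectiveHost(
  env: DevEnv,
  host: string,
  devSlugHeader: string | undefined,
  suffix: string,
): string {
  const gateOpen =
    env.NODE_ENV === "development" && env.ALLOW_DEV_TENANT_HEADER === "true";
  if (gateOpen && devSlugHeader) return `${devSlugHeader}${suffix}`;
  return host;
}

const suffix = ".app.example.com";
const devEnv: DevEnv = { NODE_ENV: "development", ALLOW_DEV_TENANT_HEADER: "true" };
const prodEnv: DevEnv = { NODE_ENV: "production", ALLOW_DEV_TENANT_HEADER: "true" };

console.log(effectiveHost(devEnv, "localhost:3000", "acme", suffix)); // acme.app.example.com
console.log(effectiveHost(prodEnv, "localhost:3000", "acme", suffix)); // localhost:3000
```

Feeding the synthesized host back through parseHostname, rather than trusting the header as a slug directly, means a malformed value such as one containing a dot is still rejected by the classifier.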
Tombstoned slugs
When an organization is soft-deleted, its slug must never be reissued. If it were, a future tenant with the same slug would inherit confusion across links, cookies, JWTs, and audit logs. The soft-delete therefore does two things in a single transaction (decisions D16 and D37):

    BEGIN;
    UPDATE organization
       SET deleted_at = now(), deleted_by = ?, slug = NULL
     WHERE id = ?;
    INSERT INTO reserved_slugs (slug, reason, organization_id)
    VALUES (?, 'deleted_org', ?);
    COMMIT;

Tenant creation then checks both the active organization.slug UNIQUE constraint and
the reserved_slugs table, so a tombstoned slug can never be handed out again.
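The two-table check can be illustrated with an in-memory model. This is a sketch only: `SlugRegistry` and its methods are hypothetical, and two Sets stand in for the organization.slug UNIQUE constraint and the reserved_slugs table.

```typescript
class SlugRegistry {
  private active = new Set<string>(); // stands in for live organization.slug values
  private reserved = new Set<string>(); // stands in for reserved_slugs

  // A slug can be issued only if it is neither active nor tombstoned.
  isAvailable(slug: string): boolean {
    return !this.active.has(slug) && !this.reserved.has(slug);
  }

  create(slug: string): boolean {
    if (!this.isAvailable(slug)) return false;
    this.active.add(slug);
    return true;
  }

  // Mirrors the soft-delete: release the live slug and tombstone it in one step.
  softDelete(slug: string): void {
    if (this.active.delete(slug)) this.reserved.add(slug);
  }
}

const registry = new SlugRegistry();
console.log(registry.create("acme")); // true
registry.softDelete("acme");
console.log(registry.create("acme")); // false: the slug is tombstoned
```

Checking only the active set would let the second create succeed, which is exactly the reissue the tombstone exists to prevent.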
The soft-delete invariant
Because organizations are never hard-deleted, every query against the organization
table must filter deleted_at IS NULL. Forgetting that filter is a quiet way to
resurrect a deleted tenant. The rule is documented in CLAUDE.md and enforced in code
review: every PR touching an organization query is checked for the filter. A future
Drizzle helper, activeOrganizations, may make the filter the default; until then it’s
applied by hand and verified at review.
Performance targets
These are the budgets the design holds itself to:
- Cache hit, warm isolate: under 1ms for the cache lookup itself; total middleware overhead under 2ms at p99.
- Cache miss, cold path: roughly 5–10ms for the KV version lookup plus the Hyperdrive query plus the cache write.
- Cache miss with a cold isolate: roughly 50–100ms, a one-time cost per isolate boot.