
Tenant Resolution

Before any business logic runs, every request to a tenant worker (apps/server or apps/auth) has to turn request.host into an organizationId. This is the critical path — every single API call goes through it, so two things matter at once:

  • Get it slow and you’ve taxed every endpoint in the system.
  • Get it wrong and you leak one tenant’s data to another.

The shape of the solution is three small pieces: a pure classifier, a thin middleware, and a two-tier cache.

There are three valid host patterns, plus a hard rejection for everything else. Knowing which bucket a host falls into is the first decision the system makes.

| Pattern | Example | Where the org lives |
|---|---|---|
| Default subdomain | acme.app.example.com | organization.slug = "acme" |
| Tenant custom domain | app.acme.com | tenant_custom_hostnames.hostname (status active) |
| Apex (marketing) | app.example.com | No tenant; serves the marketing / find-your-team page |
| Admin host | admin.example.com | A different worker entirely; tenantMiddleware rejects it with 404 |

Slugs are constrained at org-creation time: a reserved-slug list, a Punycode (xn--) reject, and the regex ^[a-z0-9](?:[a-z0-9-]{1,61}[a-z0-9])?$. That keeps the subdomain space clean and predictable, which matters because the classifier below trusts it.
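A minimal sketch of those creation-time checks, assuming the reserved list is a static set (the entries below are hypothetical samples, not the real list) and that the Punycode rule is a prefix test on the label. The regex is the one quoted above.

```typescript
// Hypothetical sample entries; the real reserved-slug list is not shown here.
const RESERVED_SLUGS: ReadonlySet<string> = new Set(["admin", "api", "app", "www"]);

// The slug regex exactly as documented above.
const SLUG_PATTERN = /^[a-z0-9](?:[a-z0-9-]{1,61}[a-z0-9])?$/;

type SlugCheck =
  | { ok: true }
  | { ok: false; reason: "punycode" | "pattern" | "reserved" };

function validateSlug(slug: string): SlugCheck {
  // Reject IDN (Punycode) labels up front; they would otherwise match the regex.
  if (slug.startsWith("xn--")) return { ok: false, reason: "punycode" };
  // Character set, length cap, and no leading or trailing hyphen.
  if (!SLUG_PATTERN.test(slug)) return { ok: false, reason: "pattern" };
  // Keep platform names out of the tenant namespace.
  if (RESERVED_SLUGS.has(slug)) return { ok: false, reason: "reserved" };
  return { ok: true };
}
```

The explicit Punycode check is not redundant: a label such as xn--bcher-kva satisfies the pattern, so without it the slug would be accepted.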

The classification logic lives in a single pure function, parseHostname, in packages/shared/src/tenant.ts (or @repo/tenancy after the Phase C consolidation — see Deep Modules). Keeping it pure means it’s trivially testable and has no I/O on the hot path.

packages/shared/src/tenant.ts
export const WILDCARD_SUFFIX = ".app.example.com"; // env-configurable
export const ADMIN_HOST = "admin.example.com";

export function parseHostname(
  host: string,
  suffix: string,
  adminHost: string,
):
  | { kind: "subdomain"; slug: string }
  | { kind: "apex" }
  | { kind: "custom"; hostname: string }
  | { kind: "admin" }
  | { kind: "invalid" } {
  // Lowercase, NFC-normalize, and drop a single trailing dot (FQDN form).
  const normalized = host.toLowerCase().normalize("NFC").replace(/\.$/, "");
  // Only ASCII letters, digits, dots, and hyphens (so a host:port value is invalid).
  if (!/^[a-z0-9.-]+$/.test(normalized)) return { kind: "invalid" };
  // Explicit guard, also covered by the character class above.
  if (normalized.includes("*") || normalized.includes(" ")) return { kind: "invalid" };
  // Order matters: admin first, then apex, then the wildcard suffix.
  if (normalized === adminHost) return { kind: "admin" };
  if (normalized === suffix.slice(1)) return { kind: "apex" };
  if (normalized.endsWith(suffix)) {
    const slug = normalized.slice(0, -suffix.length);
    // Exactly one non-empty label before the suffix.
    return slug === "" || slug.includes(".")
      ? { kind: "invalid" }
      : { kind: "subdomain", slug };
  }
  // Anything else is a candidate custom hostname, looked up in the database.
  return { kind: "custom", hostname: normalized };
}

The order of these checks is load-bearing. The admin host is matched first, before any wildcard or custom-host classification — otherwise admin.example.com would fall through to a custom tenant lookup and leak a 404 timing oracle for “is this an admin host?” queries. The apex case is matched next, then the wildcard suffix, and only then does anything unrecognized become a custom hostname to look up in the database.

The middleware is deliberately thin: classify the host, short-circuit the cases that need no database, and otherwise resolve through the cache. It sets tenant on the request context for everything downstream to read.

apps/server/src/middlewares/tenant-context.ts
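import { createMiddleware } from "hono/factory";
// parseHostname, resolveCached, and AppEnv are defined elsewhere in the codebase (imports omitted).

export const tenantMiddleware = createMiddleware<AppEnv>(async (c, next) => {
  const rawHost = c.req.header("host");
  if (!rawHost) return c.text("Bad Request", 400);

  const parsed = parseHostname(rawHost, c.env.WILDCARD_SUFFIX, c.env.ADMIN_HOST);
  // Invalid and admin hosts never reach a tenant lookup.
  if (parsed.kind === "invalid" || parsed.kind === "admin") return c.notFound();

  // Apex is legitimate: no tenant, and downstream routes must default-deny.
  if (parsed.kind === "apex") {
    c.set("tenant", null);
    return next();
  }

  // Subdomain or custom host: resolve through the two-tier cache.
  const tenant = await resolveCached(parsed, c);
  if (!tenant) return c.notFound();
  c.set("tenant", tenant);
  return next();
});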

apps/auth runs the same middleware. It has to execute before the Better Auth handler so that SSO discovery and the session hooks already see the tenant context by the time they run.

Two things about the apex branch are easy to get wrong:

  1. It’s a legitimate request, not an error. The marketing / find-your-team page on app.example.com hits apps/server with tenant === null.
  2. So any tenant-scoped route must default-deny when c.var.tenant === null. The codebase keeps an explicit allowlist of routes valid on apex — the login picker and the public marketing endpoints — and everything else rejects.
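The default-deny rule can be sketched as a pure predicate. The allowlisted paths and the Tenant shape below are hypothetical, since the real allowlist isn't reproduced here. Exact matching is used on purpose: prefix matching makes it easy to allow more than intended.

```typescript
interface Tenant {
  organizationId: string;
}

// Hypothetical paths standing in for the login picker and public marketing endpoints.
const APEX_ALLOWLIST: ReadonlySet<string> = new Set([
  "/api/login-picker",
  "/api/marketing/pricing",
]);

function isAllowed(tenant: Tenant | null, path: string): boolean {
  // A resolved tenant passes this gate; route-level auth still applies afterwards.
  if (tenant !== null) return true;
  // Apex (tenant === null): deny unless the path is explicitly allowlisted.
  return APEX_ALLOWLIST.has(path);
}
```

In a Hono app, a middleware mounted after tenantMiddleware would call this with c.var.tenant and c.req.path and reject the request when it returns false.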

A Postgres lookup on every API call would be wasteful, so the result is cached. But this cache is security-sensitive: when a tenant is suspended or a slug is renamed, stale entries must clear quickly and everywhere. That constraint rules out the obvious single-tier choices.

The system uses two tiers, each playing to a different strength of Cloudflare’s edge:

  • Cache API — a per-colo cache (a colo is one Cloudflare data center). Sub-millisecond reads in-colo, with strong same-colo consistency on invalidation. This is the fast read path.
  • KV — Cloudflare’s eventually-consistent global key-value store. It holds one small, rarely-changing version counter.

The mental model: Cache API answers the lookup fast; KV is the kill switch that expires every colo’s cache with a single write. Fast read, plus one versioned lever for the rare broad invalidation.

Why not just use KV as the lookup cache directly? Its cross-colo eventual-consistency window (around 60 seconds) is too long for security-sensitive operations like tenant suspension — you can’t have a suspended tenant still resolving for a minute somewhere. The Cache API has no such lag in-colo, which is exactly what the hot path wants.

The cache key embeds that version counter as a prefix; the admin worker bumps the counter on a slug rename or a suspension:

// Read the current version from KV and build a versioned Cache API key.
const version = (await c.env.CACHE.get("tenant_cache_version")) ?? "v0";
const cacheKey = new Request(
  `https://tenant-cache.internal/${version}/${parsed.kind}/${
    parsed.kind === "subdomain" ? parsed.slug : parsed.hostname
  }`
);

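Bumping the version invalidates every cache entry under the old version, fleet-wide. Each colo picks up the new value once KV propagates it (KV is eventually consistent, as noted above); from then on every read there builds a key under the new version and misses, while the old entries are never read again and simply age out. It is a single coarse lever for the rare, broad invalidations.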
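The effect of a bump can be seen with plain strings. This sketch assumes an in-memory Map standing in for one colo’s Cache API; tenantCacheKey is a hypothetical helper that reproduces the key layout from the snippet above.

```typescript
type CacheableHost =
  | { kind: "subdomain"; slug: string }
  | { kind: "custom"; hostname: string };

// Same layout as the middleware: https://tenant-cache.internal/<version>/<kind>/<key>
function tenantCacheKey(version: string, parsed: CacheableHost): string {
  const key = parsed.kind === "subdomain" ? parsed.slug : parsed.hostname;
  return `https://tenant-cache.internal/${version}/${parsed.kind}/${key}`;
}

// Hypothetical stand-in for one colo's cache.
const colo = new Map<string, string>();
const acme: CacheableHost = { kind: "subdomain", slug: "acme" };

// Entry written while the version was still "v0".
colo.set(tenantCacheKey("v0", acme), "org_123");

// bumpTenantCacheVersion writes String(Date.now()); every later lookup builds a
// key under that value, so the "v0" entry is never read again (nothing is deleted).
const bumped = String(Date.now());
const hitAfterBump = colo.get(tenantCacheKey(bumped, acme)); // undefined: cold path
```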

On the cold path the middleware walks a short, fixed sequence:

[Diagram: Resolve order (cold path)]

The asymmetric TTLs are deliberate. A positive result is cached for 60 seconds; a negative (“tenant not found”) result for only 5 seconds. The short negative TTL means a freshly created tenant becomes visible within about 5 seconds in any colo that had previously cached the negative result, so creations propagate quickly without making the common positive case chatty. The TTLs do not speed up deletions or suspensions: a positive entry can keep resolving for up to 60 seconds, which is why those events go through explicit invalidation instead.
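A sketch of that cold path with the asymmetric TTLs, under stated assumptions: TtlCache stands in for caches.default (which would take its TTL from a Cache-Control max-age on the stored Response), getVersion stands in for the KV read, and lookup stands in for the Postgres query. All names here are hypothetical, and only subdomain keys are shown.

```typescript
interface Tenant {
  organizationId: string;
}

const POSITIVE_TTL_MS = 60_000; // tenant found
const NEGATIVE_TTL_MS = 5_000; // tenant not found

type CacheRead<V> = { hit: true; value: V } | { hit: false };

class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  get(key: string): CacheRead<V> {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= this.now()) return { hit: false };
    return { hit: true, value: entry.value };
  }

  put(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

interface ResolveDeps {
  cache: TtlCache<Tenant | null>;
  getVersion: () => Promise<string | null>;
  lookup: (slug: string) => Promise<Tenant | null>;
}

async function resolveCached(slug: string, deps: ResolveDeps): Promise<Tenant | null> {
  // 1. Version counter, defaulting to "v0" like the real key builder.
  const version = (await deps.getVersion()) ?? "v0";
  const key = `https://tenant-cache.internal/${version}/subdomain/${slug}`;
  // 2. Cache read; a cached null is a negative hit and skips the database.
  const cached = deps.cache.get(key);
  if (cached.hit) return cached.value;
  // 3. Database lookup, then write back with a TTL chosen by the outcome.
  const tenant = await deps.lookup(slug);
  deps.cache.put(key, tenant, tenant !== null ? POSITIVE_TTL_MS : NEGATIVE_TTL_MS);
  return tenant;
}
```

Driving this with a fake clock makes the trade-off visible: a tenant created at t = 4 s is still hidden by the negative entry until t = 5 s, while a tenant removed from the database keeps resolving until its positive entry expires at the 60-second mark.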

The Cache API is per-colo and per-worker. A cache.delete() in apps/admin’s isolate does nothing for the matching entry in apps/server’s isolate, let alone in another colo. Targeted invalidation therefore needs two cooperating mechanisms.

The first is service-binding RPC fan-out for invalidating a specific tenant. Each tenant worker exposes the operation on its entrypoint:

apps/server/src/entrypoint.ts
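// (apps/auth/src/entrypoint.ts is identical)
// Methods on the worker's entrypoint class, callable by peers over service-binding RPC.
async invalidateTenant(spec: { kind: "subdomain" | "custom"; key: string }) {
  // Read the live version so the deleted key matches the one tenantMiddleware wrote.
  const currentVersion = (await this.env.CACHE.get("tenant_cache_version")) ?? "v0";
  await caches.default.delete(`https://tenant-cache.internal/${currentVersion}/${spec.kind}/${spec.key}`);
}
async bumpTenantCacheVersion() {
  // Any new value orphans every existing key; if this KV entry expires after
  // 24 hours, readers fall back to "v0".
  await this.env.CACHE.put("tenant_cache_version", String(Date.now()), { expirationTtl: 86400 });
}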

The second is the admin worker, which orchestrates the fan-out across its peers:

apps/admin/src/lib/cache-fanout.ts
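export async function invalidateTenant(
  env: AdminEnv,
  spec: { kind: "subdomain" | "custom"; key: string },
) {
  // Each peer must delete its own entry; a local cache.delete() here would not reach them.
  await Promise.all([
    env.API.invalidateTenant(spec),
    env.AUTH.invalidateTenant(spec),
  ]);
}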

The fan-out is asymmetric on purpose: apps/auth has no binding back to apps/admin. When the auth worker itself triggers an invalidation — for example on a session-version bump — it calls the existing apps/auth → apps/server.invalidateTenant(...) binding, while the admin worker remains the orchestrator for global invalidations. After Phase C this whole pattern lives behind @repo/tenancy’s Invalidator and FanOutInvalidator types; see Deep Modules.
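Why a local delete is not enough can be simulated with one Map per worker. FakeTenantWorker, TenantCachePeer, and invalidateEverywhere are hypothetical names, the version is fixed at "v0" for brevity, and the sketch models isolation between workers only, not between colos.

```typescript
type InvalidateSpec = { kind: "subdomain" | "custom"; key: string };

interface TenantCachePeer {
  invalidateTenant(spec: InvalidateSpec): Promise<void>;
}

class FakeTenantWorker implements TenantCachePeer {
  // Each instance owns its cache; nothing is shared between workers.
  readonly cache = new Map<string, string>();

  private keyFor(spec: InvalidateSpec): string {
    return `https://tenant-cache.internal/v0/${spec.kind}/${spec.key}`;
  }

  seed(spec: InvalidateSpec, organizationId: string): void {
    this.cache.set(this.keyFor(spec), organizationId);
  }

  async invalidateTenant(spec: InvalidateSpec): Promise<void> {
    this.cache.delete(this.keyFor(spec));
  }
}

// Same shape as the admin worker's fan-out: ask every peer to drop its own copy.
async function invalidateEverywhere(
  peers: readonly TenantCachePeer[],
  spec: InvalidateSpec,
): Promise<void> {
  await Promise.all(peers.map((peer) => peer.invalidateTenant(spec)));
}
```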

In local dev, window.location.host is localhost:3000, which matches neither the wildcard suffix nor any custom host — so the middleware would 404. A dev-only header provides an escape hatch, behind a deliberately strict two-factor gate:

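let rawHost = c.req.header("host");
// Dev-only escape hatch: both flags must be set.
if (c.env.NODE_ENV === "development" && c.env.ALLOW_DEV_TENANT_HEADER === "true") {
  const devSlug = c.req.header("x-dev-tenant-slug");
  if (devSlug) {
    // Resolve as if the host were `${devSlug}.app.example.com`;
    // parseHostname still validates the synthesized host.
    rawHost = `${devSlug}${c.env.WILDCARD_SUFFIX}`;
  }
}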

Both conditions must hold: NODE_ENV (a wrangler var) and ALLOW_DEV_TENANT_HEADER (a Cloudflare Secret that exists only in .dev.vars). A CI guard rejects any deploy to staging or production where NODE_ENV !== "production", so the header can’t slip into a real environment. The tenant SPA (apps/app) sends X-Dev-Tenant-Slug: acme from VITE_DEV_TENANT_SLUG=acme in .env.development.
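The gate reduces to a small pure function. DevGateEnv and effectiveHost are hypothetical names; the variable names match the ones above, and the sketch assumes the caller feeds the returned host into parseHostname as usual.

```typescript
interface DevGateEnv {
  NODE_ENV?: string;
  ALLOW_DEV_TENANT_HEADER?: string;
  WILDCARD_SUFFIX: string;
}

function effectiveHost(
  env: DevGateEnv,
  rawHost: string | undefined,
  devSlug: string | undefined,
): string | undefined {
  // Both factors are required: the wrangler var AND the .dev.vars secret.
  const gateOpen =
    env.NODE_ENV === "development" && env.ALLOW_DEV_TENANT_HEADER === "true";
  // Assumption: the result goes through parseHostname, so a malformed slug
  // is rejected there like any other bad host.
  if (gateOpen && devSlug) return `${devSlug}${env.WILDCARD_SUFFIX}`;
  // Otherwise the header is ignored entirely.
  return rawHost;
}
```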

When an organization is soft-deleted, its slug must never be reissued. If it were, a future tenant with the same slug could be confused with the deleted one across links, cookies, JWTs, and audit logs. The soft-delete therefore does two things in a single transaction (decisions D16 and D37):

  1. UPDATE organization SET deleted_at = now(), deleted_by = ?, slug = NULL WHERE id = ?
  2. INSERT INTO reserved_slugs (slug, reason, organization_id) VALUES (?, 'deleted_org', ?), binding the slug value read before step 1 cleared it

Tenant creation then checks both the active organization.slug UNIQUE constraint and the reserved_slugs table, so a tombstoned slug can never be handed out again.
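An in-memory sketch of the tombstone rule, with hypothetical names (OrgRow, SlugRegistry). It reads the slug before clearing it, which is the ordering the transaction needs, and checks both active slugs and tombstones on creation. The real version runs as the two SQL statements above inside one Postgres transaction.

```typescript
interface OrgRow {
  id: string;
  slug: string | null;
  deletedAt: Date | null;
  deletedBy: string | null;
}

class SlugRegistry {
  readonly orgs = new Map<string, OrgRow>();
  readonly reserved = new Map<string, { reason: string; organizationId: string }>();

  // Mirrors the two checks: active organization.slug and reserved_slugs.
  isSlugAvailable(slug: string): boolean {
    const takenByActive = Array.from(this.orgs.values()).some(
      (o) => o.deletedAt === null && o.slug === slug,
    );
    return !takenByActive && !this.reserved.has(slug);
  }

  create(id: string, slug: string): void {
    if (!this.isSlugAvailable(slug)) throw new Error(`slug unavailable: ${slug}`);
    this.orgs.set(id, { id, slug, deletedAt: null, deletedBy: null });
  }

  softDelete(id: string, deletedBy: string): void {
    const org = this.orgs.get(id);
    if (!org || org.deletedAt !== null || org.slug === null) return;
    // Tombstone first, using the slug before it is cleared.
    this.reserved.set(org.slug, { reason: "deleted_org", organizationId: id });
    org.deletedAt = new Date();
    org.deletedBy = deletedBy;
    org.slug = null;
  }
}
```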

Because organizations are never hard-deleted, every query against the organization table must filter deleted_at IS NULL. Forgetting that filter is a quiet way to resurrect a deleted tenant. The rule is documented in CLAUDE.md and enforced in code review: every PR touching an organization query is checked for the filter. A future Drizzle helper, activeOrganizations, may make the filter the default; until then it’s applied by hand and verified at review.

These are the budgets the design holds itself to:

  • Cache hit, warm isolate: under 1ms for the cache lookup itself; total middleware overhead under 2ms at p99.
  • Cache miss, cold path: roughly 5–10ms for the KV version lookup plus the Hyperdrive query plus the cache write.
  • Cache miss with a cold isolate: roughly 50–100ms, a one-time cost per isolate boot.