Deep Modules
A deep module is a small, simple interface that hides a lot of implementation — a narrow door into a big room. The term is John Ousterhout’s; the idea is the leverage: callers work against a few well-named methods instead of re-deriving the same rules from prose every time they touch the concept. The opposite is a shallow module — one that exposes almost as much interface as it has implementation, so it barely earns its keep (think a wrapper class whose methods map one-to-one onto a SQL query each).
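As a rough illustration of the difference, the sketch below puts a shallow wrapper next to a deep entry point. Every name in it (OrgRow, OrgQueries, findActiveOrg) is hypothetical and not taken from this system; it only shows where the rules end up living.

```typescript
type OrgRow = { id: string; slug: string; deletedAt: Date | null };

// Shallow: one method per query. Every caller still has to remember to
// normalize the slug and to filter out deleted rows.
class OrgQueries {
  private rows: OrgRow[];
  constructor(rows: OrgRow[]) {
    this.rows = rows;
  }
  selectBySlug(slug: string): OrgRow | undefined {
    return this.rows.find((r) => r.slug === slug);
  }
}

// Deep: one entry point that owns normalization and the deleted-row rule.
function findActiveOrg(rows: OrgRow[], rawSlug: string): OrgRow | null {
  const slug = rawSlug.trim().toLowerCase();
  const row = rows.find((r) => r.slug === slug && r.deletedAt === null);
  return row ?? null;
}
```

With the shallow class, a caller that forgets the deleted-row filter gets a deleted org back; with the deep function that mistake is not expressible.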
Six concepts in this system started out shallow and scattered: the same idea spread across several files, each consumer re-implementing the rules. This chapter consolidates each one behind a single entry point. Read every section as the same four beats:
- The problem — the concept lives in N files, and the rules drift.
- The module — one small typed interface, the only way in.
- What it hides — the implementation now behind that door.
- The test boundary — you test the small interface, not the internals.
The payoff is that every later consumer targets the deep boundary and never grows shallow again.
One of the six, @repo/tenancy, is already a runtime boundary that auth, server,
and custom-hostname flows all depend on, so it lands early (Phase A). The other
five are consolidations applied once the core multi-tenant behavior is stable
(Phase C).
Implementation order
The order is strict because each step builds on the one before it: schema before
the package that queries it, the token verifier before the service that reads
org.sessionVersion:
- The Phase 0 spike validates the Better Auth SSO schema and the Cloudflare hostname state model.
- Phase A schema migrations land.
- Phase A: @repo/tenancy (every worker uses it).
- Phase C: @repo/auth-tokens (the verifier-side helper).
- Phase C: authenticateOperator plus @repo/authorization/operator.
- Phase C: the customHostnameLifecycle service in apps/server.
- Phase C: ssoProviderRepository, with its Postgres view.
- Phase C: the tenantOperations service.
- Workers refactor: route handlers shrink to thin wrappers around the modules above.
Tenancy package
Tenant context is the first thing every request needs and the easiest thing to let
fragment. Spread across parseHostname, two tenant middlewares (one in apps/server,
one in apps/auth), cache-fanout.ts, and the bumpTenantCacheVersion RPC, it
becomes five files for one concept — and the cache-key shape leaks into invalidation
code in three separate workers. Pulling it into one package gives every worker a
single way to turn a host into an organization.
```ts
export type TenantContext = {
  organizationId: string;
  host: string;
  slug?: string;
  kind: "subdomain" | "custom";
  enforceSSO: boolean;
  sessionVersion: number;
};

export type ResolveDeps = {
  db: DrizzleClient;
  cache: Cache;
  kv: KVNamespace;
  wildcardSuffix: string;
  adminHost: string;
};

export async function resolveTenant(host: string, deps: ResolveDeps): Promise<TenantContext | null>;

// Asymmetric invalidator: workers that ONLY invalidate their own colo,
// vs the admin worker that fans out to peers.
export type Invalidator = {
  invalidateOwn(spec: { kind: "subdomain" | "custom"; key: string }): Promise<void>;
  bumpOwnVersion(): Promise<void>;
};

export type FanOutInvalidator = Invalidator & {
  fanOut(spec: { kind: "subdomain" | "custom"; key: string }): Promise<void>;
  fanOutBumpVersion(): Promise<void>;
};

// Workers create their own variant in their entrypoint:
export function createInvalidator(env: { CACHE: KVNamespace }): Invalidator;
export function createFanOutInvalidator(env: { CACHE: KVNamespace; API: AuthRpc; AUTH: AuthRpc }): FanOutInvalidator;
```

What the package hides behind that interface:
- The parseHostname rules: lowercase plus NFC normalization, the regex, reserved names, and the tombstone check.
- The Cache API key shape: the version prefix and the kind/key segments.
- The KV version-bump TTL.
- The deleted-org filter (WHERE deleted_at IS NULL).
- The admin-host exclusion.
- The reverse-lookup join paths: a subdomain resolves through organizations.slug; a custom hostname joins tenant_custom_hostnames to organizations.
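To make a few of the hidden rules concrete, here is a minimal sketch of hostname normalization and classification. The label regex, the reserved-name list, and the helper name classifyHost are assumptions for illustration; the real parseHostname rules, including the tombstone check, live inside the package and are not reproduced here.

```typescript
type HostKind =
  | { kind: "subdomain"; slug: string }
  | { kind: "custom"; hostname: string }
  | { kind: "rejected"; reason: "admin_host" | "reserved" | "invalid" };

// Hypothetical values: the real reserved list and label pattern are not given in this chapter.
const RESERVED = new Set(["www", "admin", "api"]);
const LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/;

function classifyHost(rawHost: string, wildcardSuffix: string, adminHost: string): HostKind {
  // Lowercase plus NFC normalization, as listed above.
  const host = rawHost.trim().toLowerCase().normalize("NFC");

  // Admin-host exclusion: the admin host never resolves to a tenant.
  if (host === adminHost) return { kind: "rejected", reason: "admin_host" };

  if (host.endsWith(wildcardSuffix)) {
    const slug = host.slice(0, host.length - wildcardSuffix.length);
    if (!LABEL.test(slug)) return { kind: "rejected", reason: "invalid" };
    if (RESERVED.has(slug)) return { kind: "rejected", reason: "reserved" };
    return { kind: "subdomain", slug };
  }

  // Anything else is treated as a candidate custom hostname.
  const labels = host.split(".");
  const wellFormed = labels.length >= 2 && labels.every((label) => LABEL.test(label));
  return wellFormed ? { kind: "custom", hostname: host } : { kind: "rejected", reason: "invalid" };
}
```

Under these assumptions, a call such as classifyHost("Acme.Example.app", ".example.app", "admin.example.app") yields a subdomain result with slug "acme", which resolveTenant would then look up through organizations.slug.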
The invalidator is split in two on purpose. apps/auth has no binding back to
apps/admin, so it gets the plain Invalidator (own-colo only) and reuses the
existing apps/auth → apps/server.invalidateTenant(...) binding for cross-server
invalidation. The admin worker gets the FanOutInvalidator, which calls both
apps/server and apps/auth. The asymmetry of the binding graph is reflected in
the type, so a worker can’t accidentally call a fan-out method it has no binding for.
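The split can be sketched with in-memory stand-ins. The two invalidator types repeat the interface above; the Peer type, the in-memory Set and version counter, both factory bodies, and the choice to clear the local entry during fan-out are hypothetical and exist only to show that fan-out is reachable solely through the richer type.

```typescript
type Spec = { kind: "subdomain" | "custom"; key: string };

type Invalidator = {
  invalidateOwn(spec: Spec): Promise<void>;
  bumpOwnVersion(): Promise<void>;
};

type FanOutInvalidator = Invalidator & {
  fanOut(spec: Spec): Promise<void>;
  fanOutBumpVersion(): Promise<void>;
};

// Hypothetical stand-in for an RPC binding to a peer worker.
type Peer = {
  invalidateTenant(spec: Spec): Promise<void>;
  bumpVersion(): Promise<void>;
};

function createInvalidator(local: Set<string>, version: { n: number }): Invalidator {
  return {
    async invalidateOwn(spec: Spec): Promise<void> {
      local.delete(`${spec.kind}:${spec.key}`);
    },
    async bumpOwnVersion(): Promise<void> {
      version.n += 1;
    },
  };
}

function createFanOutInvalidator(local: Set<string>, version: { n: number }, peers: Peer[]): FanOutInvalidator {
  const own = createInvalidator(local, version);
  return {
    ...own,
    // Assumption: fan-out clears the local entry first, then notifies every peer.
    async fanOut(spec: Spec): Promise<void> {
      await own.invalidateOwn(spec);
      await Promise.all(peers.map((peer) => peer.invalidateTenant(spec)));
    },
    async fanOutBumpVersion(): Promise<void> {
      await own.bumpOwnVersion();
      await Promise.all(peers.map((peer) => peer.bumpVersion()));
    },
  };
}

// Never called; it exists so the type checker demonstrates the asymmetry.
function plainWorkerCannotFanOut(inv: Invalidator): void {
  // @ts-expect-error fanOut is not part of the plain Invalidator type
  void inv.fanOut;
}
```

A worker that is handed only an Invalidator has no fanOut member to call, so the missing binding shows up at compile time instead of as a failed RPC.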
Test boundary: integration tests against an in-process Postgres with a mocked Cache and KV, asserting:
- Resolution by slug returns the expected org.
- Resolution by custom hostname joins correctly.
- Soft-deleted orgs return null.
- A cache miss writes a positive entry; a cache hit returns without touching the DB.
- The negative-cache TTL is short.
- A version bump invalidates entries written under the old version.
This one suite replaces four separate test files from the earlier design.
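The version-bump assertion is easier to picture with a key shape in view. In the sketch below, the key format v{version}:{kind}:{key}, the in-memory Map standing in for the Cache API, and the counter standing in for the KV-stored version are all assumptions; the package keeps its real key shape private.

```typescript
type Kind = "subdomain" | "custom";

class VersionedTenantCache {
  private version = 1; // stand-in for the version stored in KV
  private entries = new Map<string, string | null>(); // null marks a negative entry

  private keyFor(kind: Kind, key: string): string {
    return `v${this.version}:${kind}:${key}`;
  }

  // Returns undefined on a miss, null for a cached "no such tenant", or the org id.
  get(kind: Kind, key: string): string | null | undefined {
    return this.entries.get(this.keyFor(kind, key));
  }

  put(kind: Kind, key: string, orgId: string | null): void {
    this.entries.set(this.keyFor(kind, key), orgId);
  }

  // Bumping the version changes every future key, so entries written under
  // the old version become unreachable without being deleted one by one.
  bumpVersion(): void {
    this.version += 1;
  }
}
```

Because callers never build the key themselves, the prefix can change without touching invalidation code in any worker.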
Auth tokens package
JWT verification started life as a protocol written in prose: every consumer was
told to check aud, iss, org.host, org.id, and that the token’s sessionVersion is at least the value stored in the database.
No module owned it, so each new consumer re-implemented the same checks from an
English description — exactly the kind of duplication that drifts. The fix is to
make the protocol a module that returns a typed result.
```ts
export type AuthorizedClaims = {
  sub: string;
  email: string;
  roleSlugs: string[];
  platform: "web" | "mobile";
  org: { id: string; host: string; sessionVersion: number };
};

export type VerifyError =
  | { kind: "expired" }
  | { kind: "wrong_aud"; actual: string; expected: string }
  | { kind: "wrong_iss"; actual: string; expected: string }
  | { kind: "wrong_org"; actual: string; expected: string }
  | { kind: "wrong_host"; actual: string; expected: string }
  | { kind: "stale_session"; claim: number; current: number }
  | { kind: "bad_signature" };

export type VerifyOpts = {
  expectedHost: string;
  expectedOrgId: string;
  jwks: JWKSResolver;
};

// Stateful variant: for internal verifiers with DB access (uses the up-to-date sessionVersion).
export async function verifyTenantJwt(
  token: string,
  opts: VerifyOpts & { db: DrizzleClient },
): Promise<AuthorizedClaims | VerifyError>;

// Stateless variant: for external verifiers (the caller supplies the version they last saw).
export async function verifyTenantJwtStateless(
  token: string,
  opts: VerifyOpts & { expectedMinSessionVersion: number },
): Promise<AuthorizedClaims | VerifyError>;
```

The two variants exist because two kinds of verifier exist. An internal verifier in
another worker has DB access and can read the live sessionVersion, so it uses
verifyTenantJwt. An external, downstream service has no DB access, so it uses
verifyTenantJwtStateless and supplies the most recent version it saw.
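The claim checks that both variants share can be sketched as a pure function. Signature and expiry verification are left out, and the payload shape, the Expect options (including aud and iss, which VerifyOpts above does not list), the order of the checks, and the names checkClaims and isClaimError are assumptions; only the error kinds and the sessionVersion comparison come from the interface above.

```typescript
type Org = { id: string; host: string; sessionVersion: number };
type Payload = { aud: string; iss: string; org: Org };

type ClaimError =
  | { kind: "wrong_aud"; actual: string; expected: string }
  | { kind: "wrong_iss"; actual: string; expected: string }
  | { kind: "wrong_org"; actual: string; expected: string }
  | { kind: "wrong_host"; actual: string; expected: string }
  | { kind: "stale_session"; claim: number; current: number };

// Hypothetical expectations object; the stateful variant would read
// minSessionVersion from the DB, the stateless one from the caller.
type Expect = { aud: string; iss: string; host: string; orgId: string; minSessionVersion: number };

function checkClaims(p: Payload, e: Expect): Payload | ClaimError {
  if (p.aud !== e.aud) return { kind: "wrong_aud", actual: p.aud, expected: e.aud };
  if (p.iss !== e.iss) return { kind: "wrong_iss", actual: p.iss, expected: e.iss };
  if (p.org.id !== e.orgId) return { kind: "wrong_org", actual: p.org.id, expected: e.orgId };
  if (p.org.host !== e.host) return { kind: "wrong_host", actual: p.org.host, expected: e.host };
  // A token minted before the last version bump is stale.
  if (p.org.sessionVersion < e.minSessionVersion) {
    return { kind: "stale_session", claim: p.org.sessionVersion, current: e.minSessionVersion };
  }
  return p;
}

// Only the error branch carries a kind field, so callers can narrow on it.
function isClaimError(result: Payload | ClaimError): result is ClaimError {
  return "kind" in result;
}
```

A caller branches once on isClaimError and then switches on kind, instead of re-deriving the five checks from a prose description.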
Minting deliberately stays in Better Auth. Its jwt plugin keeps owning key
management and JWKS distribution — JWKS (JSON Web Key Set) being the public phone
book of signing keys: a verifier fetches it once, then validates token signatures
offline against it. This package only consumes tokens; the design just extends
definePayload to add the org claim. Replacing Better Auth’s mint side would
break its session helpers and force the team to manage JWKS by hand — a large
maintenance burden for no gain. Internal verifiers (other workers) and external
verifiers alike fetch JWKS from Better Auth’s /api/auth/jwks endpoint and cache it
with createRemoteJWKSet.
The test boundary is round-trip tests covering every claim combination and every failure mode — the coverage the prose protocol never had.
Tenant operations
Every operator-on-tenant mutation (create, suspend, restore, delete) has to coordinate four things: the DB writes, a dual-scope audit record, and a session-version bump inside one transaction, plus a cache invalidation after commit. Spread across many endpoint handlers, nothing structurally guarantees that the next endpoint someone writes remembers all four. A service that owns the coordination turns “remember four steps” into “call one method.”
```ts
type TenantOperator =
  | { kind: "global_admin"; admin: GlobalAdmin }
  | { kind: "system"; reason: string };

export class TenantOperations {
  constructor(private deps: { db; auditLogService; invalidator: FanOutInvalidator }) {}

  async create(payload: { slug; name; primaryAdminEmail }, by: TenantOperator): Promise<{ orgId; invitationId; hostedAt }>;
  async suspend(orgId: string, by: TenantOperator, reason: string): Promise<void>;
  async restore(orgId: string, by: TenantOperator): Promise<void>;
  async delete(orgId: string, by: TenantOperator): Promise<void>; // soft-delete + tombstone slug
}
```

Each method runs the same shape of transaction, with the version bump, session deletes, and slug tombstone switched on only for the operations that need them:

```text
db.transaction([
  DB writes (insert/update),
  dual-scope audit (createDualScope inside the tx),
  session-version bump (suspend / restore / delete only),
  session deletes (suspend / delete only),
  slug tombstone (delete only),
])
post-commit: invalidator.fanOut(spec) // + bump version on rename (rename deferred to v2)
```

The by parameter is a union because not every mutation is operator-initiated.
Billing-driven suspension and scheduled DPA-deletion run as the system, which needs
a typed actor that isn’t a global admin. The audit log already supports
actor_type: "system"; the union types the call site so a system mutation can’t
masquerade as an operator.
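One way to picture the coordination is an in-memory stand-in, sketched below for suspend only. The Store shape, the copy-then-commit transaction helper, the two-row audit format, and the rule that an empty reason fails the audit step are all hypothetical, and everything is synchronous for brevity; the real service runs against the database inside db.transaction. The sketch aims to show two properties: a failure inside the transaction discards every write, and invalidation runs only after a successful commit.

```typescript
type TenantOperator =
  | { kind: "global_admin"; admin: { id: string } }
  | { kind: "system"; reason: string };

type TenantStatus = "active" | "suspended";

type Store = {
  status: Map<string, TenantStatus>;
  sessionVersion: Map<string, number>;
  audit: string[];
};

// Hypothetical transaction helper: run fn against a copy, commit only if it does not throw.
function transaction(store: Store, fn: (tx: Store) => void): void {
  const tx: Store = {
    status: new Map(store.status),
    sessionVersion: new Map(store.sessionVersion),
    audit: [...store.audit],
  };
  fn(tx);
  store.status = tx.status;
  store.sessionVersion = tx.sessionVersion;
  store.audit = tx.audit;
}

function suspend(
  store: Store,
  fanOut: (orgId: string) => void,
  orgId: string,
  by: TenantOperator,
  reason: string,
): void {
  transaction(store, (tx) => {
    // 1. DB write.
    tx.status.set(orgId, "suspended");
    // 2. Session-version bump.
    tx.sessionVersion.set(orgId, (tx.sessionVersion.get(orgId) ?? 0) + 1);
    // 3. Dual-scope audit: one tenant-scoped row and one platform-scoped row.
    if (reason === "") throw new Error("audit record requires a reason");
    const actor = by.kind === "system" ? `system (${by.reason})` : `admin ${by.admin.id}`;
    tx.audit.push(`tenant ${orgId}: suspended by ${actor}: ${reason}`);
    tx.audit.push(`platform: ${actor} suspended ${orgId}`);
  });
  // 4. Post-commit only: this line is never reached if the transaction threw.
  fanOut(orgId);
}
```

The union on by does the typing work described above: a system-initiated call has to pass { kind: "system", reason } and cannot supply an admin.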
rename is deliberately absent. A slug rename invalidates SSO callback URLs
registered with external IdPs — the IdP holds an absolute URL pointing at the old
hostname, so the rename breaks SSO until the tenant updates their IdP config.
“Hide this from callers” is the wrong abstraction here, because the operator genuinely
has to coordinate with the tenant. v2 adds rename with an explicit operator runbook
rather than pretending it’s a transparent operation.
With the service in place, the admin endpoints shrink to wrappers:
```ts
suspendTenant.guard = [requireOperator("tenant.suspend")];
suspendTenant.handler = async (c) => {
  const { id } = c.req.param();
  const { reason } = await c.req.valid("json");
  await tenantOperations.suspend(id, { kind: "global_admin", admin: c.var.globalAdmin }, reason);
  return c.json({ ok: true });
};
```

A handful of lines after validation, and the four-piece coordination is invisible to the route. The test boundary lives on the service, not the handlers: each method is tested for its transactional invariants (partial-failure rollback, the dual audit row count, session-version monotonicity, and that post-commit invalidation is actually called).
Operator authentication
The admin worker’s identity boundary spans three things: verifying the Cloudflare Access JWT, the enrollment-token flow on first login, and DB-side activity tracking. Left as separate middleware snippets across several places, it’s hard to reason about as one boundary. Folding it into one function gives the admin worker a single “who is this operator?” call that returns a typed result.
```ts
type AuthFailure =
  | { kind: "missing_token" }
  | { kind: "invalid_token" }
  | { kind: "service_token" }
  | { kind: "enrollment_required" }
  | { kind: "deactivated" };

class JwksCache {
  async get(): Promise<ReturnType<typeof createRemoteJWKSet>>;
  reset(): void;
}

export async function authenticateOperator(
  c: AdminContext,
  deps: { jwks: JwksCache; db: DrizzleClient },
): Promise<{ ok: true; admin: GlobalAdmin } | { ok: false; failure: AuthFailure }>;
```

What it hides:
- CF Access JWT verification against the team JWKS.
- Service-token rejection.
- The first-login enrollment-token claim flow.
- The lastActiveAt ping.
- A stable failure mapping for the SPA (ENROLLMENT_REQUIRED, deactivated, invalid token).
The test boundary is one place: JWT success and failure, service-token rejection, the enrollment-token claim race, a deactivated user, and JWKS reset behavior are all exercised against this single interface.
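The stable failure mapping can be sketched as an exhaustive switch over AuthFailure. Apart from ENROLLMENT_REQUIRED, the HTTP statuses and error codes below are hypothetical, as is the name toSpaError; the point is that the never check makes the compiler flag any failure kind that is added later without a mapping.

```typescript
type AuthFailure =
  | { kind: "missing_token" }
  | { kind: "invalid_token" }
  | { kind: "service_token" }
  | { kind: "enrollment_required" }
  | { kind: "deactivated" };

// Hypothetical response shape consumed by the SPA.
type SpaError = { status: 401 | 403; code: string };

function toSpaError(failure: AuthFailure): SpaError {
  switch (failure.kind) {
    case "missing_token":
    case "invalid_token":
      return { status: 401, code: "INVALID_TOKEN" };
    case "service_token":
      return { status: 403, code: "SERVICE_TOKEN_REJECTED" };
    case "enrollment_required":
      return { status: 403, code: "ENROLLMENT_REQUIRED" };
    case "deactivated":
      return { status: 403, code: "DEACTIVATED" };
    default: {
      // If a new kind is added to AuthFailure, this assignment stops compiling.
      const unreachable: never = failure;
      throw new Error(`unmapped failure: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

A route would call authenticateOperator once, and on { ok: false } pass failure through a function like this one to build the response.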
Operator authorization
The earlier design answered “what can an operator do?” with four separate pieces:
the policies, a whereGlobalAdminRole builder, an adminBypassTenantIsolation
middleware, and a buildGlobalAdminPrincipal attribute layout. A reader couldn’t
answer “can support suspend a tenant?” without cross-referencing all four. A single
permission matrix, with the action type derived from it, makes the answer one
lookup.
```ts
import { createMiddleware } from "hono/factory";

export const OPERATOR_PERMISSIONS = {
  "tenant.create": ["super_admin", "support"],
  "tenant.suspend": ["super_admin", "support"],
  "tenant.restore": ["super_admin", "support"],
  "tenant.delete": ["super_admin"],
  "tenant.invite_admin": ["super_admin", "support"],
  "tenant.list": ["super_admin", "support", "read_only", "security"],
  "tenant.view": ["super_admin", "support", "read_only", "security"],
  "platform.view_audit_logs_global": ["super_admin", "support", "read_only", "security"],
  "platform.view_system_metrics": ["super_admin", "support", "read_only", "security"],
  "platform.manage_feature_flags": ["super_admin", "support"],
  "platform.manage_global_admins": ["super_admin"],
} as const satisfies Record<string, readonly GlobalAdminRole[]>;

// Type DERIVED from matrix keys, so there is no separate union to drift.
export type OperatorAction = keyof typeof OPERATOR_PERMISSIONS;

// Widened view of one matrix row. The `as const` tuples only accept their own
// literal members in .includes(), so passing an arbitrary GlobalAdminRole
// to them directly does not type-check.
const rolesFor = (action: OperatorAction): readonly GlobalAdminRole[] => OPERATOR_PERMISSIONS[action];

export const requireOperator = (action: OperatorAction) =>
  createMiddleware<AdminEnv>(async (c, next) => {
    const operator = c.get("globalAdmin");
    if (!operator) return c.text("Forbidden", 403);
    if (!rolesFor(action).includes(operator.role)) {
      return c.text("Forbidden", 403);
    }
    return next();
  });

export function canOperator(admin: GlobalAdmin, action: OperatorAction): boolean {
  return rolesFor(action).includes(admin.role);
}
```

The key move is that OperatorAction is keyof typeof OPERATOR_PERMISSIONS. Every
action in the matrix is automatically a valid action, and an unknown action is a
compile error — there’s no second union to keep in sync. The
as const satisfies Record<string, readonly GlobalAdminRole[]> clause keeps the
literal types while still checking the shape.
This matrix coexists with the existing whereGlobalAdminRole policy builder.
OPERATOR_PERMISSIONS is the source of truth for operator-only actions on the
tenant and platform resources. The whereGlobalAdminRole DSL builder remains for
the rare case where a global_admin touches an org-scoped resource through the
existing authorize("...") Hono adapter — for example, reading a tenant’s audit log
via the existing audit-logs route. In v1 that’s rare (the admin worker uses
requireOperator everywhere), but the builder stays for future extension.
The test boundary is a matrix test generated directly from OPERATOR_PERMISSIONS:
every (action, role) pair against its allow and deny case, mirroring the existing
__tests__/typed-actions.test.ts pattern.
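A matrix test of that shape might look like the sketch below. It uses a trimmed copy of the matrix and plain thrown errors instead of a test runner so that it stands alone; the helper name allPairs is hypothetical, canOperator takes a bare role here rather than a GlobalAdmin, and the role list is inferred from the roles that appear in the matrix above.

```typescript
const ROLES = ["super_admin", "support", "read_only", "security"] as const;
type GlobalAdminRole = (typeof ROLES)[number];

// Trimmed copy of the matrix, for illustration only.
const OPERATOR_PERMISSIONS = {
  "tenant.suspend": ["super_admin", "support"],
  "tenant.delete": ["super_admin"],
  "tenant.list": ["super_admin", "support", "read_only", "security"],
} as const satisfies Record<string, readonly GlobalAdminRole[]>;

type OperatorAction = keyof typeof OPERATOR_PERMISSIONS;

function canOperator(role: GlobalAdminRole, action: OperatorAction): boolean {
  const allowed: readonly GlobalAdminRole[] = OPERATOR_PERMISSIONS[action];
  return allowed.includes(role);
}

type Pair = { action: OperatorAction; role: GlobalAdminRole; expected: boolean };

// Every (action, role) pair, with the outcome the matrix says it should have.
function allPairs(): Pair[] {
  const pairs: Pair[] = [];
  const actions = Object.keys(OPERATOR_PERMISSIONS) as OperatorAction[];
  for (const action of actions) {
    const allowed: readonly GlobalAdminRole[] = OPERATOR_PERMISSIONS[action];
    for (const role of ROLES) {
      pairs.push({ action, role, expected: allowed.includes(role) });
    }
  }
  return pairs;
}

// The generated cases: adding a row to the matrix adds its allow and deny cases automatically.
for (const { action, role, expected } of allPairs()) {
  if (canOperator(role, action) !== expected) {
    throw new Error(`${role} on ${action}: expected ${expected}`);
  }
}
```

Because the cases are derived from the matrix, the suite cannot fall out of sync with it; a handful of hand-pinned cases (for example, support must not delete a tenant) would guard against mistakes in the matrix itself.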
SSO provider repository
The earlier design had a secrets.ts with two functions wrapping two SQL calls: the
textbook shallow module. Worse, plaintext IdP secrets were passed around freely, so a
single accidental log line could leak every tenant’s IdP credentials. The repository
makes the safe path the only path: reads never return plaintext, and plaintext is
reachable only inside a scoped closure.
```ts
// Co-located with the sso-config module, NOT a separate package.
export class SsoProviderRepository {
  constructor(private deps: { db: DrizzleClient; secretsKey: string }) {}

  // Reads NEVER return plaintext.
  async findByOrg(orgId: string): Promise<Omit<SsoProvider, "encryptedSecret">[]>;
  async findById(providerId: string): Promise<Omit<SsoProvider, "encryptedSecret"> | null>;

  // Plaintext access only via a scoped closure.
  async withDecryptedSecret<T>(providerId: string, fn: (secret: string) => Promise<T>): Promise<T>;

  async create(input: { orgId; providerId; issuer; clientId; clientSecret; ... }): Promise<SsoProvider>;
  async rotateSecret(providerId: string, newClientSecret: string): Promise<void>;
}
```

There’s a complication: Better Auth’s SSO plugin runs from
node_modules and reads provider rows straight from the database, so those reads can’t be intercepted. To encrypt at rest while keeping
Better Auth working, the underlying table stores ciphertext and a Postgres view
exposes plaintext for Better Auth alone:
```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Underlying table stores ciphertext.
ALTER TABLE sso_providers ADD COLUMN client_secret_encrypted bytea;
-- (Migrate existing plaintext into the encrypted column.)
ALTER TABLE sso_providers DROP COLUMN client_secret;

-- View exposes plaintext; Better Auth reads from the view.
CREATE VIEW sso_providers_decrypted AS
  SELECT id, ...,
         pgp_sym_decrypt(client_secret_encrypted, current_setting('app.sso_key')) AS client_secret
  FROM sso_providers;
```

The decryption key is supplied with SET LOCAL app.sso_key = '...', which Postgres
scopes to the current transaction rather than the whole session.
Better Auth’s adapter is configured to read from sso_providers_decrypted instead of
sso_providers, while the application’s own code reads the raw, encrypted
sso_providers table by default. withDecryptedSecret opens a connection, sets the
key inside a transaction, queries the view, hands the secret to the closure, and closes the
connection so the key can’t be reused.
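The closure-scoped access pattern can be sketched without a database. In the stand-in below, the reversible encode and decode helpers are placeholders for real encryption and provide no security, the row shape and class name are hypothetical, and the methods are synchronous for brevity where the real repository is async.

```typescript
type Row = { id: string; orgId: string; issuer: string; encryptedSecret: string };
type SafeProvider = Omit<Row, "encryptedSecret">;

// Placeholder transform, NOT encryption: it only makes the stored value
// differ from the plaintext so the example can show the boundary.
const encode = (s: string): string => s.split("").reverse().join("");
const decode = encode;

class InMemorySsoProviderRepository {
  private rows = new Map<string, Row>();

  create(input: { id: string; orgId: string; issuer: string; clientSecret: string }): SafeProvider {
    const row: Row = {
      id: input.id,
      orgId: input.orgId,
      issuer: input.issuer,
      encryptedSecret: encode(input.clientSecret),
    };
    this.rows.set(row.id, row);
    return this.strip(row);
  }

  // Reads never include the secret column, encrypted or not.
  findById(id: string): SafeProvider | null {
    const row = this.rows.get(id);
    return row ? this.strip(row) : null;
  }

  // The plaintext exists only as the argument to fn.
  withDecryptedSecret<T>(id: string, fn: (secret: string) => T): T {
    const row = this.rows.get(id);
    if (!row) throw new Error(`unknown provider ${id}`);
    return fn(decode(row.encryptedSecret));
  }

  private strip(row: Row): SafeProvider {
    return { id: row.id, orgId: row.orgId, issuer: row.issuer };
  }
}
```

Nothing stops fn from returning or logging the secret, so the closure narrows the exposure rather than eliminating it; what it does guarantee is that no ordinary read path hands plaintext to a caller by default.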
Custom hostname lifecycle
The earlier design split one concept across three pieces: cloudflare-api.ts, the
tenancy HTTP module, and the hostname-reconciler.ts cron handler. A single
service owns the whole lifecycle — add, verify, reconcile, remove — so the cron and
the HTTP routes call the same code path.
```ts
export class CustomHostnameLifecycle {
  constructor(private deps: { db; cfApi; auditLogService; emailService; invalidator: FanOutInvalidator }) {}

  async add(orgId: string, hostname: string, by: TenantOperator): Promise<TenantCustomHostname>;
  async verifyTxt(hostnameId: string): Promise<{ verified: boolean; errors?: string[] }>;
  async reconcile(hostnameId: string): Promise<{ statusChanged: boolean }>;
  async remove(hostnameId: string, by: TenantOperator): Promise<void>;

  // Called from cron; wraps reconcile() across all non-terminal rows.
  async reconcileAll(): Promise<{ scanned: number; updated: number }>;
}
```

It’s co-located in apps/server rather than promoted to a package because it has
three call sites and all three live in apps/server: the HTTP module routes, the
cron scheduled handler, and the admin worker’s support actions (which reach it via
ApiBinding RPC). One service, three consumers — no package needed. The full
state machine these methods drive is covered in
Custom Hostnames.
The test boundary is state-transition tests with a mocked CF API, and the reconciler
cron becomes a thin caller of reconcileAll().
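A state-transition test of that kind can be sketched with a mocked Cloudflare client. The status names, the CfApi shape, and the set of terminal states below are assumptions made for illustration (the real state machine is the one described in Custom Hostnames), and the code is synchronous for brevity.

```typescript
// Hypothetical statuses; the real ones are defined by the hostname state machine.
type Status = "pending_validation" | "active" | "failed" | "removed";
type HostnameRow = { id: string; hostname: string; status: Status };

// Hypothetical slice of the Cloudflare client that the reconciler needs.
type CfApi = { getStatus(hostname: string): Status };

// Assumption: rows in these states are never reconciled again.
const TERMINAL: readonly Status[] = ["failed", "removed"];

function reconcile(row: HostnameRow, cf: CfApi): { statusChanged: boolean } {
  const remote = cf.getStatus(row.hostname);
  if (remote === row.status) return { statusChanged: false };
  row.status = remote;
  return { statusChanged: true };
}

// Thin loop over reconcile() for every non-terminal row, as the cron would call it.
function reconcileAll(rows: HostnameRow[], cf: CfApi): { scanned: number; updated: number } {
  let scanned = 0;
  let updated = 0;
  for (const row of rows) {
    if (TERMINAL.includes(row.status)) continue;
    scanned += 1;
    if (reconcile(row, cf).statusChanged) updated += 1;
  }
  return { scanned, updated };
}
```

Because the HTTP route and the cron both go through reconcile, a test that drives the mock through each remote status covers both callers at once.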
Refactor effort per module
The consolidations are bounded work. Here’s the scope of each:
| Module | Scope | New code | Refactored code |
|---|---|---|---|
| @repo/tenancy | new package | ~400 LOC | tenant middlewares in 2 workers |
| @repo/auth-tokens | new package | ~200 LOC | downstream verifiers (none in v1; future-proof) |
| tenantOperations | new service in apps/server | ~300 LOC | 4 admin route handlers shrink to wrappers |
| authenticateOperator + @repo/authorization/operator | authn/authz extension | ~180 LOC | admin worker middleware + route guards |
| ssoProviderRepository | new module + Postgres view | ~250 LOC | sso-config module + Better Auth adapter config |
| customHostnameLifecycle | new service | ~400 LOC | 3 call sites refactor to wrappers |
After @repo/tenancy lands in Phase A, the remaining Phase C work is roughly two
weeks of focused refactoring.
Why deepen now, not later
The natural inclination is to ship Phases A and B first and consolidate later. Done
that way, the workers accumulate inconsistencies — different cache-key formats across
middlewares, different audit-emission patterns across routes — that only get harder to
unify after the fact. Moving @repo/tenancy into Phase A avoids the worst of that
drift, and Phase C finishes the remaining consolidations before the admin and
control-plane surface grows too broad to refactor cheaply.
The cost is paying for boundary design once, upfront. The win is that Phase A and B handlers can target the deep boundary from day one and never grow shallow.
Pitfalls when building deep modules
A few traps recur when consolidating like this:
- One AGENTS.md per package. A project convention: every new package gets one, scoped to that package’s role and conventions.
- No cyclic dependencies. @repo/tenancy imports schemas from @repo/db, so @repo/db must never import from @repo/tenancy. The same rule holds for @repo/auth-tokens.
- Refactor order is strict. Schema migrations come before the tenancy package; the tenancy package before tenant-operations; and @repo/auth-tokens before tenant-operations, because tenant-operations uses org.sessionVersion.
- AdminApiEntrypoint is the only route from apps/admin to apps/server. Don’t expose admin RPC methods on the existing ApiEntrypoint, which is also reachable from apps/auth.
- Test at the boundary, not inside it. The whole point of deepening is testability at the small interface, so resist the urge to write unit tests for every internal helper.