Deep Modules

A deep module is a small, simple interface that hides a lot of implementation — a narrow door into a big room. The term is John Ousterhout’s; the idea is the leverage: callers work against a few well-named methods instead of re-deriving the same rules from prose every time they touch the concept. The opposite is a shallow module — one that exposes almost as much interface as it has implementation, so it barely earns its keep (think a wrapper class whose methods map one-to-one onto a SQL query each).

Six concepts in this system started out shallow and scattered: the same idea spread across several files, each consumer re-implementing the rules. This chapter consolidates each one behind a single entry point. Read every section as the same four beats:

The problem — the concept lives in N files, and the rules drift.
The module — one small typed interface, the only way in.
What it hides — the implementation now behind that door.
The test boundary — you test the small interface, not the internals.

The payoff is that every later consumer targets the deep boundary and never grows shallow again.

One of the six, @repo/tenancy, is already a runtime boundary that auth, server, and custom-hostname flows all depend on, so it lands early (Phase A). The other five are consolidations applied once the core multi-tenant behavior is stable (Phase C).

Implementation order

The order is strict because each step builds on the one before it — schema before the package that queries it, the token verifier before the service that reads org.sessionVersion:

1. The Phase 0 spike validates the Better Auth SSO schema and the Cloudflare hostname state model.
2. Phase A schema migrations land.
3. Phase A: @repo/tenancy, which every worker uses.
4. Phase C: @repo/auth-tokens, the verifier-side helper.
5. Phase C: authenticateOperator plus @repo/authorization/operator.
6. Phase C: the customHostnameLifecycle service in apps/server.
7. Phase C: ssoProviderRepository, with its Postgres view.
8. Phase C: the tenantOperations service.
9. Workers refactor: route handlers shrink to thin wrappers around the modules above.

Tenancy package

Tenant context is the first thing every request needs and the easiest thing to let fragment. Spread across parseHostname, two tenant middlewares (one in apps/server, one in apps/auth), cache-fanout.ts, and the bumpTenantCacheVersion RPC, it becomes five files for one concept — and the cache-key shape leaks into invalidation code in three separate workers. Pulling it into one package gives every worker a single way to turn a host into an organization.

export type TenantContext = {
  organizationId: string;
  host: string;
  slug?: string;
  kind: "subdomain" | "custom";
  enforceSSO: boolean;
  sessionVersion: number;
};

export type ResolveDeps = {
  db: DrizzleClient;
  cache: Cache;
  kv: KVNamespace;
  wildcardSuffix: string;
  adminHost: string;
};

export async function resolveTenant(host: string, deps: ResolveDeps): Promise<TenantContext | null>;

// Asymmetric invalidator: workers that ONLY invalidate their own colo,
// vs the admin worker that fans out to peers.
export type Invalidator = {
  invalidateOwn(spec: { kind: "subdomain" | "custom"; key: string }): Promise<void>;
  bumpOwnVersion(): Promise<void>;
};

export type FanOutInvalidator = Invalidator & {
  fanOut(spec: { kind: "subdomain" | "custom"; key: string }): Promise<void>;
  fanOutBumpVersion(): Promise<void>;
};

// Workers create their own variant in their entrypoint:
export function createInvalidator(env: { CACHE: KVNamespace }): Invalidator;
export function createFanOutInvalidator(env: { CACHE: KVNamespace; API: AuthRpc; AUTH: AuthRpc }): FanOutInvalidator;

What the package hides behind that interface:

The parseHostname rules — lowercase plus NFC normalization, the regex, reserved names, and the tombstone check.
The Cache API key shape — the version prefix and the kind/key segments.
The KV version-bump TTL.
The deleted-org filter (WHERE deleted_at IS NULL).
The admin-host exclusion.
The reverse-lookup join paths: a subdomain resolves through organizations.slug; a custom hostname joins tenant_custom_hostnames to organizations.
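
As a rough illustration of the first two items, here is a minimal sketch of hostname parsing and a versioned cache key. The reserved-name list, the slug regex, the key layout, and the ParsedHost and cacheKey names are assumptions made for this example, not the package's actual rules. The tombstone check is omitted because it needs the database.

```typescript
// Hypothetical sketch: the regex, reserved names, and key layout are illustrative assumptions.
type ParsedHost =
  | { kind: "subdomain"; key: string } // key is the slug
  | { kind: "custom"; key: string }; // key is the full hostname

const RESERVED = new Set(["www", "admin", "api"]); // assumed list
const SLUG_RE = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/; // assumed slug shape

function parseHostname(
  rawHost: string,
  cfg: { wildcardSuffix: string; adminHost: string },
): ParsedHost | null {
  // Normalize first so lookups and cache keys ignore case and Unicode form.
  const host = rawHost.trim().toLowerCase().normalize("NFC");
  if (host === "" || host === cfg.adminHost) return null; // the admin host is never a tenant

  if (host.endsWith(cfg.wildcardSuffix)) {
    const slug = host.slice(0, host.length - cfg.wildcardSuffix.length);
    if (!SLUG_RE.test(slug) || RESERVED.has(slug)) return null;
    return { kind: "subdomain", key: slug };
  }
  return { kind: "custom", key: host };
}

// The version comes first, so bumping it orphans every older entry at once.
function cacheKey(version: number, parsed: ParsedHost): string {
  return `tenant:v${version}:${parsed.kind}:${parsed.key}`;
}
```

Because callers only ever see resolveTenant, any of these rules can change without touching a worker.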

The invalidator is split in two on purpose. apps/auth has no binding back to apps/admin, so it gets the plain Invalidator (own-colo only) and reuses the existing apps/auth → apps/server.invalidateTenant(...) binding for cross-worker invalidation. The admin worker gets the FanOutInvalidator, which calls both apps/server and apps/auth. The type mirrors the asymmetry of the binding graph, so a worker that holds only an Invalidator can’t call a fan-out method it has no binding for: the call does not compile.

Test boundary: integration tests against an in-process Postgres with a mocked Cache and KV, asserting:

Resolution by slug returns the expected org.
Resolution by custom hostname joins correctly.
Soft-deleted orgs return null.
A cache miss writes a positive entry; a cache hit returns without touching the DB.
The negative-cache TTL is short.
A version bump invalidates entries written under the old version.

This one suite replaces four separate test files from the earlier design.
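
The hit, miss, and version-bump assertions can be pictured with a small in-memory model. This is a sketch only: CachedResolver, its Map-backed cache, and the key layout are stand-ins invented here for the Cache API, KV, and Postgres, and the negative-cache TTL is not modeled.

```typescript
// Hypothetical in-memory stand-in for the cache, the version counter, and the DB.
type Org = { organizationId: string };

class CachedResolver {
  dbReads = 0; // counts simulated database lookups
  private version = 1;
  private readonly cache = new Map<string, Org | null>();
  private readonly orgsBySlug: Map<string, Org>;

  constructor(orgsBySlug: Map<string, Org>) {
    this.orgsBySlug = orgsBySlug;
  }

  resolve(slug: string): Org | null {
    const key = `v${this.version}:subdomain:${slug}`;
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached; // hit (positive or negative): no DB read
    this.dbReads += 1;
    const org = this.orgsBySlug.get(slug) ?? null;
    this.cache.set(key, org); // a null value is the negative entry
    return org;
  }

  // Old entries stay in storage but are never read again under the new version.
  bumpVersion(): void {
    this.version += 1;
  }
}
```

A test against this model would resolve the same slug twice, check that dbReads is 1, bump the version, resolve again, and check that dbReads is now 2.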

Auth tokens package

JWT verification started life as a protocol written in prose: every consumer was told to check aud, iss, org.host, and org.id, and to confirm that the token’s sessionVersion is at least the value currently stored in the database. No module owned it, so each new consumer re-implemented the same checks from an English description, which is exactly the kind of duplication that drifts. The fix is to make the protocol a module that returns a typed result.

export type AuthorizedClaims = {
  sub: string;
  email: string;
  roleSlugs: string[];
  platform: "web" | "mobile";
  org: { id: string; host: string; sessionVersion: number };
};

export type VerifyError =
  | { kind: "expired" }
  | { kind: "wrong_aud"; actual: string; expected: string }
  | { kind: "wrong_iss"; actual: string; expected: string }
  | { kind: "wrong_org"; actual: string; expected: string }
  | { kind: "wrong_host"; actual: string; expected: string }
  | { kind: "stale_session"; claim: number; current: number }
  | { kind: "bad_signature" };

export type VerifyOpts = {
  expectedHost: string;
  expectedOrgId: string;
  jwks: JWKSResolver;
};

// Stateful variant — for internal verifiers with DB access (uses the up-to-date sessionVersion).
export async function verifyTenantJwt(
  token: string,
  opts: VerifyOpts & { db: DrizzleClient },
): Promise<AuthorizedClaims | VerifyError>;

// Stateless variant — for external verifiers (the caller supplies the version they last saw).
export async function verifyTenantJwtStateless(
  token: string,
  opts: VerifyOpts & { expectedMinSessionVersion: number },
): Promise<AuthorizedClaims | VerifyError>;

The two variants exist because two kinds of verifier exist. An internal verifier in another worker has DB access and can read the live sessionVersion, so it uses verifyTenantJwt. An external, downstream service has no DB access, so it uses verifyTenantJwtStateless and supplies the most recent version it saw.
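
To show how a caller might consume the typed result, here is a minimal sketch of the org-related claim checks alone, run against an already-decoded payload. Signature, expiry, aud, and iss verification are deliberately left out, the types are trimmed copies of the ones above, and checkClaims and isVerifyError are names invented for this example rather than part of the package.

```typescript
// Trimmed copies of the package types, for a self-contained example.
type AuthorizedClaims = {
  sub: string;
  org: { id: string; host: string; sessionVersion: number };
};
type VerifyError =
  | { kind: "wrong_org"; actual: string; expected: string }
  | { kind: "wrong_host"; actual: string; expected: string }
  | { kind: "stale_session"; claim: number; current: number };

// Hypothetical helper: checks only the org claims of a decoded payload.
function checkClaims(
  claims: AuthorizedClaims,
  opts: { expectedHost: string; expectedOrgId: string; expectedMinSessionVersion: number },
): AuthorizedClaims | VerifyError {
  if (claims.org.id !== opts.expectedOrgId) {
    return { kind: "wrong_org", actual: claims.org.id, expected: opts.expectedOrgId };
  }
  if (claims.org.host !== opts.expectedHost) {
    return { kind: "wrong_host", actual: claims.org.host, expected: opts.expectedHost };
  }
  if (claims.org.sessionVersion < opts.expectedMinSessionVersion) {
    return {
      kind: "stale_session",
      claim: claims.org.sessionVersion,
      current: opts.expectedMinSessionVersion,
    };
  }
  return claims;
}

// AuthorizedClaims has no `kind` field, so its presence identifies an error.
function isVerifyError(result: AuthorizedClaims | VerifyError): result is VerifyError {
  return "kind" in result;
}
```

A caller narrows once with isVerifyError and then switches on kind, so every failure mode is handled by name instead of by parsing a message string.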

Minting deliberately stays in Better Auth. Its jwt plugin keeps owning key management and JWKS distribution. A JWKS (JSON Web Key Set) is the public phone book of signing keys: a verifier fetches and caches it, then validates token signatures locally instead of calling the issuer on every request. This package only consumes tokens; on the mint side, the design just extends definePayload to add the org claim. Replacing Better Auth’s mint side would break its session helpers and force the team to manage JWKS by hand, a large maintenance burden for no gain. Internal verifiers (other workers) and external verifiers alike fetch JWKS from Better Auth’s /api/auth/jwks endpoint and cache it with createRemoteJWKSet from the jose library.

The test boundary is round-trip tests covering every claim combination and every failure mode — the coverage the prose protocol never had.

Tenant operations

Every operator-on-tenant mutation (create, suspend, restore, delete) has to coordinate four things: the DB writes, a dual-scope audit record, and a session-version bump inside one transaction, plus a cache invalidation that runs only after commit. Spread across many endpoint handlers, nothing structurally guarantees that the next endpoint someone writes remembers all four. A service that owns the coordination turns “remember four steps” into “call one method.”

type TenantOperator = { kind: "global_admin"; admin: GlobalAdmin } | { kind: "system"; reason: string };

export class TenantOperations {
  constructor(private deps: { db; auditLogService; invalidator: FanOutInvalidator }) {}

  async create(payload: { slug; name; primaryAdminEmail }, by: TenantOperator): Promise<{ orgId; invitationId; hostedAt }>;
  async suspend(orgId: string, by: TenantOperator, reason: string): Promise<void>;
  async restore(orgId: string, by: TenantOperator): Promise<void>;
  async delete(orgId: string, by: TenantOperator): Promise<void>;  // soft-delete + tombstone slug
}

Each method runs the same shape of transaction, with the version bump, session deletes, and slug tombstone switched on only for the operations that need them:

db.transaction([
  DB writes (insert/update),
  dual-scope audit (createDualScope inside the tx),
  session-version bump (suspend / restore / delete only),
  session deletes (suspend / delete only),
  slug tombstone (delete only),
])
post-commit: invalidator.fanOut(spec)   // + bump version on rename (rename deferred to v2)
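
One way to picture that shape in runnable form is the sketch below, which uses a synchronous in-memory fake in place of the real database. FakeDb, its copy-then-commit transaction, the audit strings, and suspendOrg are all invented for illustration; a real db.transaction would be asynchronous and session deletes are not modeled.

```typescript
// Hypothetical in-memory model of "transaction, then post-commit invalidation".
type Org = { id: string; suspended: boolean; sessionVersion: number };
type State = { orgs: Map<string, Org>; audit: string[] };

class FakeDb {
  state: State = { orgs: new Map(), audit: [] };

  // Run fn against a copy of the state; keep the copy only if fn does not throw.
  transaction<T>(fn: (tx: State) => T): T {
    const tx: State = {
      orgs: new Map(
        Array.from(this.state.orgs, ([id, org]): [string, Org] => [id, { ...org }]),
      ),
      audit: this.state.audit.slice(),
    };
    const result = fn(tx); // a throw here leaves this.state untouched (rollback)
    this.state = tx; // commit
    return result;
  }
}

function suspendOrg(
  db: FakeDb,
  invalidate: (orgId: string) => void,
  orgId: string,
  reason: string,
): void {
  db.transaction((tx) => {
    const org = tx.orgs.get(orgId);
    if (!org) throw new Error(`unknown org ${orgId}`);
    org.suspended = true; // DB write
    tx.audit.push(`platform:suspend:${orgId}:${reason}`); // dual-scope audit, row 1
    tx.audit.push(`tenant:suspend:${orgId}:${reason}`); // dual-scope audit, row 2
    org.sessionVersion += 1; // session-version bump
  });
  invalidate(orgId); // post-commit only: never reached if the transaction threw
}
```

The ordering is the point: invalidation sits outside the transaction callback, so a rollback can never leave peers with a cache flush for a change that did not happen.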

The by parameter is a union because not every mutation is operator-initiated. Billing-driven suspension and scheduled DPA-deletion run as the system, which needs a typed actor that isn’t a global admin. The audit log already supports actor_type: "system"; the union types the call site so a system mutation can’t masquerade as an operator.
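
A minimal sketch of how the union might drive the audit actor follows. The AuditActor shape, the toAuditActor helper, the "global_admin" actor type, and the trimmed GlobalAdmin fields are assumptions for illustration; only actor_type: "system" comes from the description above.

```typescript
type GlobalAdmin = { id: string; email: string }; // trimmed, assumed fields
type TenantOperator =
  | { kind: "global_admin"; admin: GlobalAdmin }
  | { kind: "system"; reason: string };

// Hypothetical audit actor shape.
type AuditActor =
  | { actor_type: "global_admin"; actor_id: string }
  | { actor_type: "system"; reason: string };

function toAuditActor(by: TenantOperator): AuditActor {
  switch (by.kind) {
    case "global_admin":
      return { actor_type: "global_admin", actor_id: by.admin.id };
    case "system":
      return { actor_type: "system", reason: by.reason };
    default: {
      // Adding a third operator kind without handling it fails to compile here.
      const unreachable: never = by;
      throw new Error(`unhandled operator: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Because the system variant carries no admin field, a billing job has no way to construct a value that the audit log would record as a human operator.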

rename is deliberately absent. A slug rename invalidates SSO callback URLs registered with external IdPs — the IdP holds an absolute URL pointing at the old hostname, so the rename breaks SSO until the tenant updates their IdP config. “Hide this from callers” is the wrong abstraction here, because the operator genuinely has to coordinate with the tenant. v2 adds rename with an explicit operator runbook rather than pretending it’s a transparent operation.

With the service in place, the admin endpoints shrink to wrappers:

suspendTenant.guard = [requireOperator("tenant.suspend")];
suspendTenant.handler = async (c) => {
  const { id } = c.req.param();
  const { reason } = c.req.valid("json"); // Hono returns validated data synchronously
  await tenantOperations.suspend(id, { kind: "global_admin", admin: c.var.globalAdmin }, reason);
  return c.json({ ok: true });
};

The handler body is four statements, and the four-piece coordination is invisible to the route. The test boundary lives on the service, not the handlers: each method is tested for its transactional invariants, namely partial-failure rollback, the dual audit row count, session-version monotonicity, and that post-commit invalidation is actually called.

Operator authentication

The admin worker’s identity boundary spans three things: verifying the Cloudflare Access JWT, the enrollment-token flow on first login, and DB-side activity tracking. Left as separate middleware snippets across several places, it’s hard to reason about as one boundary. Folding it into one function gives the admin worker a single “who is this operator?” call that returns a typed result.

type AuthFailure =
  | { kind: "missing_token" }
  | { kind: "invalid_token" }
  | { kind: "service_token" }
  | { kind: "enrollment_required" }
  | { kind: "deactivated" };

class JwksCache {
  async get(): Promise<ReturnType<typeof createRemoteJWKSet>>;
  reset(): void;
}

export async function authenticateOperator(
  c: AdminContext,
  deps: { jwks: JwksCache; db: DrizzleClient },
): Promise<{ ok: true; admin: GlobalAdmin } | { ok: false; failure: AuthFailure }>;

What it hides:

CF Access JWT verification against the team JWKS.
Service-token rejection.
The first-login enrollment-token claim flow.
The lastActiveAt ping.
A stable failure mapping for the SPA (ENROLLMENT_REQUIRED, deactivated, invalid token).

The test boundary is one place: JWT success and failure, service-token rejection, the enrollment-token claim race, a deactivated user, and JWKS reset behavior are all exercised against this single interface.
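
The stable failure mapping could look something like the sketch below. The HTTP statuses and every code string other than ENROLLMENT_REQUIRED are assumptions chosen for the example, and toFailureResponse is a hypothetical helper, not part of the described interface.

```typescript
type AuthFailure =
  | { kind: "missing_token" }
  | { kind: "invalid_token" }
  | { kind: "service_token" }
  | { kind: "enrollment_required" }
  | { kind: "deactivated" };

type FailureResponse = { status: 401 | 403; code: string };

// Hypothetical mapping from a typed failure to what the SPA receives.
function toFailureResponse(failure: AuthFailure): FailureResponse {
  switch (failure.kind) {
    case "missing_token":
    case "invalid_token":
      return { status: 401, code: "INVALID_TOKEN" };
    case "service_token":
      return { status: 403, code: "SERVICE_TOKEN_NOT_ALLOWED" };
    case "enrollment_required":
      return { status: 403, code: "ENROLLMENT_REQUIRED" };
    case "deactivated":
      return { status: 403, code: "DEACTIVATED" };
    default: {
      // A new AuthFailure kind without a case here is a compile error.
      const unreachable: never = failure;
      throw new Error(`unhandled failure: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping the mapping in one exhaustive switch means the SPA contract changes in exactly one place when a failure kind is added.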

Operator authorization

The earlier design answered “what can an operator do?” with four separate pieces — the policies, a whereGlobalAdminRole builder, an adminBypassTenantIsolation middleware, and a buildGlobalAdminPrincipal attribute layout. A reader couldn’t answer “can support suspend a tenant?” without cross-referencing all four. A single permission matrix, with the action type derived from it, makes the answer one lookup.

export const OPERATOR_PERMISSIONS = {
  "tenant.create":         ["super_admin", "support"],
  "tenant.suspend":        ["super_admin", "support"],
  "tenant.restore":        ["super_admin", "support"],
  "tenant.delete":         ["super_admin"],
  "tenant.invite_admin":   ["super_admin", "support"],
  "tenant.list":           ["super_admin", "support", "read_only", "security"],
  "tenant.view":           ["super_admin", "support", "read_only", "security"],
  "platform.view_audit_logs_global": ["super_admin", "support", "read_only", "security"],
  "platform.view_system_metrics":    ["super_admin", "support", "read_only", "security"],
  "platform.manage_feature_flags":   ["super_admin", "support"],
  "platform.manage_global_admins":   ["super_admin"],
} as const satisfies Record<string, readonly GlobalAdminRole[]>;

// Type DERIVED from matrix keys — no separate union to drift.
export type OperatorAction = keyof typeof OPERATOR_PERMISSIONS;

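// `as const` narrows each row to a tuple of literal roles, so widen it before
// calling includes(); otherwise a GlobalAdminRole argument does not type-check.
const rolesFor = (action: OperatorAction): readonly GlobalAdminRole[] =>
  OPERATOR_PERMISSIONS[action];

export const requireOperator = (action: OperatorAction) =>
  createMiddleware<AdminEnv>(async (c, next) => {
    const operator = c.get("globalAdmin");
    if (!operator) return c.text("Forbidden", 403);
    if (!rolesFor(action).includes(operator.role)) {
      return c.text("Forbidden", 403);
    }
    return next();
  });

export function canOperator(admin: GlobalAdmin, action: OperatorAction): boolean {
  return rolesFor(action).includes(admin.role);
}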

The key move is that OperatorAction is keyof typeof OPERATOR_PERMISSIONS. Every action in the matrix is automatically a valid action, and an unknown action is a compile error — there’s no second union to keep in sync. The as const satisfies Record<string, readonly GlobalAdminRole[]> clause keeps the literal types while still checking the shape.

This matrix coexists with the existing whereGlobalAdminRole policy builder. OPERATOR_PERMISSIONS is the source of truth for operator-only actions on the tenant and platform resources. The whereGlobalAdminRole DSL builder remains for the rare case where a global_admin touches an org-scoped resource through the existing authorize("...") Hono adapter — for example, reading a tenant’s audit log via the existing audit-logs route. In v1 that’s rare (the admin worker uses requireOperator everywhere), but the builder stays for future extension.

The test boundary is a matrix test generated directly from OPERATOR_PERMISSIONS: every (action, role) pair against its allow and deny case, mirroring the existing __tests__/typed-actions.test.ts pattern.
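
A minimal sketch of such a generated matrix test is shown below, using a trimmed three-row copy of the matrix and plain data rather than any particular test runner. ALL_ROLES, the trimmed matrix, the role-based canOperator signature, and the cases array are assumptions made so the example is self-contained.

```typescript
type GlobalAdminRole = "super_admin" | "support" | "read_only" | "security";
const ALL_ROLES: readonly GlobalAdminRole[] = ["super_admin", "support", "read_only", "security"];

// Trimmed copy of the matrix, for illustration only.
const OPERATOR_PERMISSIONS = {
  "tenant.suspend": ["super_admin", "support"],
  "tenant.delete": ["super_admin"],
  "tenant.view": ["super_admin", "support", "read_only", "security"],
} as const satisfies Record<string, readonly GlobalAdminRole[]>;

type OperatorAction = keyof typeof OPERATOR_PERMISSIONS;

function canOperator(role: GlobalAdminRole, action: OperatorAction): boolean {
  // Widen the literal tuple so includes() accepts any role.
  const allowed: readonly GlobalAdminRole[] = OPERATOR_PERMISSIONS[action];
  return allowed.includes(role);
}

// One case per (action, role) pair; the expectation is read straight from the matrix.
const cases = (Object.keys(OPERATOR_PERMISSIONS) as OperatorAction[]).flatMap((action) =>
  ALL_ROLES.map((role) => ({
    action,
    role,
    expected: (OPERATOR_PERMISSIONS[action] as readonly GlobalAdminRole[]).indexOf(role) !== -1,
  })),
);
```

A runner would loop over cases and assert canOperator(role, action) === expected. Adding a row to the matrix adds its allow and deny cases with no further test code.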

SSO provider repository

The earlier design had a secrets.ts with two functions wrapping two SQL calls — the textbook shallow module. Worse, plaintext IdP secrets were passed around freely, so a single accidental log line could leak every tenant’s IdP credentials. The repository makes the safe path the only path: reads never return plaintext, and plaintext is reachable only inside a scoped closure.

// Co-located with the sso-config module, NOT a separate package.
export class SsoProviderRepository {
  constructor(private deps: { db: DrizzleClient; secretsKey: string }) {}

  // Reads NEVER return plaintext.
  async findByOrg(orgId: string): Promise<Omit<SsoProvider, "encryptedSecret">[]>;
  async findById(providerId: string): Promise<Omit<SsoProvider, "encryptedSecret"> | null>;

  // Plaintext access only via a scoped closure.
  async withDecryptedSecret<T>(providerId: string, fn: (secret: string) => Promise<T>): Promise<T>;

  async create(input: { orgId; providerId; issuer; clientId; clientSecret; ... }): Promise<SsoProvider>;
  async rotateSecret(providerId: string, newClientSecret: string): Promise<void>;
}

There’s a complication: Better Auth’s SSO plugin is third-party code (it ships in node_modules) that reads provider rows straight from the database, so those reads can’t be routed through the repository. To encrypt at rest while keeping Better Auth working, the underlying table stores ciphertext and a Postgres view exposes plaintext for Better Auth alone:

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Underlying table stores ciphertext.
ALTER TABLE sso_providers ADD COLUMN client_secret_encrypted bytea;
-- (Migrate existing plaintext into the encrypted column.)
ALTER TABLE sso_providers DROP COLUMN client_secret;

-- View exposes plaintext; Better Auth reads from the view.
CREATE VIEW sso_providers_decrypted AS
  SELECT id, ..., pgp_sym_decrypt(client_secret_encrypted, current_setting('app.sso_key')) AS client_secret
  FROM sso_providers;

The decryption key is provided per transaction with SET LOCAL app.sso_key = '...'. SET LOCAL only takes effect inside a transaction block, and the setting is discarded when that transaction ends. Better Auth’s adapter is configured to read from sso_providers_decrypted instead of sso_providers, while the application’s own code reads the raw, encrypted sso_providers table by default. withDecryptedSecret opens a connection, begins a transaction, sets the key, queries the view, hands the secret to the closure, and then ends the transaction and closes the connection so the key can’t be reused.
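
The scoped-closure idea can be sketched without a live database. In the example below, TxLike and the withTransaction parameter are stand-ins invented for illustration, and the function takes its dependencies as arguments instead of being a repository method. It uses Postgres's set_config(name, value, true), the function form of SET LOCAL, on the assumption that the key should be passed as a bound parameter rather than spliced into SQL text.

```typescript
// Hypothetical transaction handle; a real one would come from the DB client.
type TxLike = {
  query(sql: string, params: unknown[]): Promise<{ client_secret: string }[]>;
};

async function withDecryptedSecret<T>(
  withTransaction: <R>(fn: (tx: TxLike) => Promise<R>) => Promise<R>,
  secretsKey: string,
  providerId: string,
  fn: (secret: string) => Promise<T>,
): Promise<T> {
  return withTransaction(async (tx) => {
    // is_local = true scopes the setting to this transaction only.
    await tx.query("SELECT set_config('app.sso_key', $1, true)", [secretsKey]);
    const rows = await tx.query(
      "SELECT client_secret FROM sso_providers_decrypted WHERE id = $1",
      [providerId],
    );
    const row = rows[0];
    if (!row) throw new Error(`unknown provider ${providerId}`);
    // The plaintext exists only while fn runs; only fn's result escapes.
    return fn(row.client_secret);
  });
}
```

The design choice is that the secret is an argument to a callback, never a return value, so it cannot be stored on an object that later gets logged or serialized.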

Custom hostname lifecycle

The earlier design split one concept across three pieces: cloudflare-api.ts, the tenancy HTTP module, and the hostname-reconciler.ts cron handler. A single service owns the whole lifecycle — add, verify, reconcile, remove — so the cron and the HTTP routes call the same code path.

export class CustomHostnameLifecycle {
  constructor(private deps: { db; cfApi; auditLogService; emailService; invalidator: FanOutInvalidator }) {}

  async add(orgId: string, hostname: string, by: TenantOperator): Promise<TenantCustomHostname>;
  async verifyTxt(hostnameId: string): Promise<{ verified: boolean; errors?: string[] }>;
  async reconcile(hostnameId: string): Promise<{ statusChanged: boolean }>;
  async remove(hostnameId: string, by: TenantOperator): Promise<void>;

  // Called from cron — wraps reconcile() across all non-terminal rows.
  async reconcileAll(): Promise<{ scanned: number; updated: number }>;
}

It’s co-located in apps/server rather than promoted to a package because it has three call sites and all three run inside apps/server: the HTTP module routes, the cron scheduled handler, and the admin worker’s support actions (which reach it via ApiBinding RPC). One service, three consumers, no package needed. The full state machine these methods drive is covered in Custom Hostnames.

The test boundary is state-transition tests with a mocked CF API, and the reconciler cron becomes a thin caller of reconcileAll().
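
As a sketch of what “thin caller” means here, reconcileAll can be little more than a filtered loop over reconcile. The status names, the HostnameRow shape, and the isTerminal predicate below are hypothetical placeholders; the real states belong to the state machine described in Custom Hostnames, and error handling is left out.

```typescript
// Status names are placeholders, not the real state machine.
type HostnameRow = { id: string; status: string };

async function reconcileAll(
  rows: HostnameRow[],
  isTerminal: (row: HostnameRow) => boolean,
  reconcile: (row: HostnameRow) => Promise<{ statusChanged: boolean }>,
): Promise<{ scanned: number; updated: number }> {
  let scanned = 0;
  let updated = 0;
  for (const row of rows) {
    if (isTerminal(row)) continue; // terminal rows are never re-checked
    scanned += 1;
    const { statusChanged } = await reconcile(row);
    if (statusChanged) updated += 1;
  }
  return { scanned, updated };
}
```

With a mocked reconcile, a state-transition test only needs to assert the returned counts and which rows were visited.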

Refactor effort per module

The consolidations are bounded work. Here’s the scope of each:

Module	Scope	New code	Refactored code
`@repo/tenancy`	new package	~400 LOC	tenant middlewares in 2 workers
`@repo/auth-tokens`	new package	~200 LOC	downstream verifiers (none in v1; future-proof)
`tenantOperations`	new service in `apps/server`	~300 LOC	4 admin route handlers shrink to wrappers
`authenticateOperator` + `@repo/authorization/operator`	authn/authz extension	~180 LOC	admin worker middleware + route guards
`ssoProviderRepository`	new module + Postgres view	~250 LOC	sso-config module + Better Auth adapter config
`customHostnameLifecycle`	new service	~400 LOC	3 call sites refactor to wrappers

After @repo/tenancy lands in Phase A, the remaining Phase C work is roughly two weeks of focused refactoring.

Why deepen now, not later

The natural inclination is to ship Phases A and B first and consolidate later. Done that way, the workers accumulate inconsistencies — different cache-key formats across middlewares, different audit-emission patterns across routes — that only get harder to unify after the fact. Moving @repo/tenancy into Phase A avoids the worst of that drift, and Phase C finishes the remaining consolidations before the admin and control-plane surface grows too broad to refactor cheaply.

The cost is paying for boundary design once, upfront. The win is that Phase A and B handlers can target the deep boundary from day one and never grow shallow.

Pitfalls when building deep modules

A few traps recur when consolidating like this:

One AGENTS.md per package. A project convention: every new package gets one, scoped to that package’s role and conventions.
No cyclic dependencies. @repo/tenancy imports schemas from @repo/db, so @repo/db must never import from @repo/tenancy. The same rule holds for @repo/auth-tokens.
Refactor order is strict. Schema migrations come before the tenancy package; the tenancy package before tenant-operations; and @repo/auth-tokens before tenant-operations, because tenant-operations uses org.sessionVersion.
AdminApiEntrypoint is the only route from apps/admin to apps/server. Don’t expose admin RPC methods on the existing ApiEntrypoint, which is also reachable from apps/auth.
Test at the boundary, not inside it. The whole point of deepening is testability at the small interface — resist the urge to write unit tests for every internal helper.