Gotchas & Lessons

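This is a catalogue of traps caught while designing and reviewing the multi-tenant build, grouped by how they would have hurt: platform and library behaviors that contradict a first guess, security holes that would have been exploitable, and quiet correctness and build-order bugs.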

The tables are for scanning — skim them as a checklist when you touch the multi-tenancy surface. Below them, Lessons learned distils the durable takeaways, and the v2 backlog lists what we deferred.

These are platform and library behaviors that contradict a reasonable first guess. Each one would have surfaced as a crash, a silent drop, or a 404 rather than a clean error at design time.

| Gotcha | What it actually does | Where it bites |
|---|---|---|
| Cloudflare Queues are not pub/sub | Only one active consumer per queue. A “queue fan-out” design for cache invalidation silently loses messages or throws when a second consumer registers. | Cache invalidation |
| certificate_authority: "google" is Enterprise-only | API error 1459 on Pro/Business. | Custom hostname creation |
| custom_metadata is Enterprise-only | Silently dropped on Pro/Business, so a reverse lookup that relies on it fails. | Custom hostname creation |
| Wrong CNAME target | app.example.com.cdn.cloudflare.net is undocumented; the correct target is customers.example.com, a proxied CNAME on your zone. | Custom hostname onboarding |
| Better Auth’s hooks API takes a single middleware function | In 2026 it is one createAuthMiddleware(...) function, not the older { matcher, handler } array shape. | All Better Auth hook plugins |
| Request headers are immutable | Mutating c.req.raw.headers throws, so you cannot patch the incoming request and pass it to AUTH.fetch(c.req.raw). Build new Request(c.req.raw, { headers }) from a copied Headers object instead. | Auth worker proxy |
| generateIdForModel("tenantHostname") falls through to the ent_* prefix | The switch in ids.ts has no case for the new model, so it silently gets the default prefix. Call generatePrefixedCuid(ID_PREFIXES.tenantHostname) directly. | New-table ID generation |
| organization.metadata is text, not jsonb | It is a Better Auth-managed column, so JSON queries against it fail. | Storing the enforce_sso flag |
| audit_logs.actor_id had an FK to users | Operator gad_* IDs violate it. Drop the FK before writing operator actor IDs. | Admin worker audit |
| Better Auth accept-invitation requires an already-authenticated user | The endpoint takes an { invitationId } body for a signed-in user; it is not a user-creation endpoint. Bootstrapping a user from an invite needs custom orchestration. | Tenant admin onboarding |
| Better Auth createUser is not idempotent | It returns USER_ALREADY_EXISTS on a duplicate email, so the recovery path must catch the error and look up the existing user. | Accept-invite retry |
| sendInvitationEmail is not wired in createOrganizationPlugin | Out of the box, no invitation email is sent at all; it must be added. | Operator-led onboarding |
| An apps/app worker without a fetch handler returns 404 even with an ASSETS binding | The minimal (req, env) => env.ASSETS.fetch(req) handler is required. | Tenant SPA serving |
| Missing not_found_handling: "single-page-application" in the apps/app wrangler config | TanStack Router client-side routes (e.g. /dashboard) 404 on reload. | Tenant SPA |
| Custom Domain syntax is pattern: "admin.example.com" with custom_domain: true | No /* and no zone_name; the pattern: "admin.example.com/*" plus zone_name form is a route, not a Custom Domain. | apps/admin wrangler |
| secrets.required is now a real Wrangler config key | It is used for validation and type generation, but secret values are still managed with wrangler secret put. The assumption that Wrangler ignores it is outdated. | All worker wrangler files |
| Better Auth’s dynamic baseURL reads forwarded host/proto first | Raw proxying into the auth worker lets forwarded headers distort callback and trusted-origin behavior. | Auth worker proxy |
| The Turbo task is generate-openapi, not openapi:cache | The fabricated task name would fail the build pipeline. | Web app build |
| workers_dev: false was missing | The default name.account.workers.dev URL bypasses Cloudflare Access entirely. | Critical security exposure |
| There is no whereRole policy DSL builder | The codebase has whereOwner, whereTargetIsSelf, where(predicate), withRelation, and withOrgRole, but no whereRole. The per-role matrix is unimplementable until you add whereGlobalAdminRole. | Operator authorization |
| systemAdminRoles short-circuits condition evaluation | Adding global_admin to systemAdminRoles while also using whereGlobalAdminRole policies lets the bypass skip the per-role check, so every operator effectively becomes super_admin. Do not add global_admin to systemAdminRoles. | Operator authorization |
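A minimal sketch of the auth-worker proxy fix, assuming a Hono-style handler where the raw request is c.req.raw and the auth worker is the AUTH service binding (both from the table above). The helper name forwardableRequest is hypothetical, and so is the choice to overwrite x-forwarded-host and x-forwarded-proto with the already-validated tenant host instead of passing through whatever the client sent.

```typescript
// Hypothetical helper: copy the incoming headers instead of mutating them.
// On Workers, c.req.raw.headers is immutable and calling set() on it throws.
export function forwardableRequest(raw: Request, tenantHost: string): Request {
  const headers = new Headers(raw.headers);
  // Overwrite, rather than trust, the forwarded host/proto that Better Auth's
  // dynamic baseURL reads first (assumption: tenantHost is already validated).
  headers.set("x-forwarded-host", tenantHost);
  headers.set("x-forwarded-proto", "https");
  // Same method, URL, and body as the original; only the headers differ.
  return new Request(raw, { headers });
}
```

In the handler this would read roughly as return c.env.AUTH.fetch(forwardableRequest(c.req.raw, host)), where host has already passed tenant resolution.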

Security holes that would have been exploitable

Each row is an attack the naive design enabled and the mitigation that closed it. The parenthetical D-numbers point at the matching entry in the Decision Log; the full threat model lives in Security.

| Gotcha | Attack | Mitigation |
|---|---|---|
| Queue fan-out plus per-colo cache.delete | Invalidation does not reliably propagate (a queue has only one consumer, and cache.delete only clears the local data center), so a suspended tenant keeps serving traffic. | RPC fan-out plus KV cache versioning (D28) |
| Email fallback for first-login cfAccessSub | An attacker registers the same email at an IdP, races the first login, and silently takes over an operator account. | Enrollment-token model (D31) |
| admin.support.query as a bufferable audit event | An operator scrapes 1,000 tenants, queuing 1,000 events that may be lost on worker eviction. | Classified CRITICAL plus row cap plus rate limit (D33) |
| Tenant suspension did not revoke active sessions | A 1h–7d window in which the suspended tenant’s users keep operating on existing JWTs. | session_version bump plus session DELETE in the same transaction (D34) |
| INTERNAL_ADMIN_TOKEN shared secret with no clear injection point | A leak via logs or error stacks bypasses the organization.create gate, and there is no clear rotation mechanism. | Removed entirely; the service binding is the perimeter and the admin inserts orgs via Drizzle directly (D35) |
| audit_logs had no append-only invariant at the DB level | A super_admin who also holds DB credentials could mutate audit history. | A Postgres trigger raises on UPDATE/DELETE (D30) |
| accountLinking.allowDifferentEmails: true (a default in some Better Auth versions) | A tenant-controlled SSO IdP attaches a different email to an existing user, enabling cross-tenant takeover. | Set explicitly to false |
| provisionUser runs after token exchange | A confused-deputy attack: an Acme IdP response replayed at Globex’s callback creates a session for the wrong tenant. | ssoCallbackGuardPlugin runs before token exchange |
| trustedOrigins echo-back of the inbound host | If Host is ever spoofable, an attacker gets https://attacker.com marked as trusted. | A function validates the host against the tenant set |
| OIDC client secrets stored as plaintext in the DB | A backup leak compromises every tenant’s IdP integration. | pgcrypto plus a Postgres view plus log redaction (D13, D73) |
| Better Auth organization.create mounted publicly by default | An authenticated tenant user can create rogue orgs. | An unconditional before hook (D22, D35) |
| SameSite=strict does not isolate sibling subdomains | Tenant subdomains under the same registrable domain are still same-site, and strict cookies can interfere with OAuth/OIDC callback state. | Host-only cookies plus explicit origin/CSRF checks (D15) |
| Global JWT aud/iss with no org claim | A JWT minted on tenant A validates against tenant B’s downstream services. | Per-tenant aud/iss plus org.host, org.id, and sessionVersion claims (D12, D34) |
| disableSignUp: false (the current template default) | With operator-led onboarding, anyone could still sign up via Better Auth’s standard flow. | disableSignUp: true (D32) |
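The trustedOrigins mitigation can be sketched as a pure function, assuming the set of active tenant hosts is loaded elsewhere; the function name and the Set input are hypothetical. It only ever echoes back a host it already recognises. How a function like this is wired into Better Auth’s trustedOrigins option depends on the version you run.

```typescript
// Hypothetical helper: derive trusted origins from the inbound host, but only
// for hosts that belong to a known tenant. Anything else yields no origins.
export function trustedOriginsFor(host: string | null, tenantHosts: ReadonlySet<string>): string[] {
  if (!host) return [];
  // Normalise case and a trailing root dot; ports are not stripped, so
  // "acme.app.example.com:8443" is deliberately not a match.
  const normalized = host.toLowerCase().replace(/\.$/, "");
  return tenantHosts.has(normalized) ? [`https://${normalized}`] : [];
}
```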

These would not crash and are not exploitable — they are the quiet bugs that produce wrong behavior, sequential scans, or build-order failures.

| Gotcha | Issue |
|---|---|
| The cache API key format is duplicated across workers | All three workers must agree on a string format that lived nowhere as a single function. Centralized in @repo/tenancy (D51). |
| OpenAPI build chicken-and-egg | The web app’s code-gen depends on the worker’s openapi.cache.json, but nothing forced the worker’s generation to run first. Fixed with Turbo dependsOn: ["^generate-openapi"]. |
| Cross-package wrangler asset-directory reference | apps/admin’s wrangler config points at ../admin-ui/dist, so build order matters: apps/admin-ui#build must precede apps/admin#deploy. |
| The auth worker has no service binding back to the admin worker | Cache invalidation must be asymmetric: the admin fans out, while auth calls apps/auth → apps/server.invalidateTenant(...) instead. |
| Self-FKs on global_admins.created_by and deactivated_by | Drizzle’s circular self-reference pattern needs a (): AnyPgColumn return-type annotation. |
| Better Auth’s SSO plugin reads the provider table directly, from code inside node_modules | Those reads can’t be intercepted, so encryption coexists with them via the sso_providers_decrypted view (D73). |
| pgcrypto key set with SET LOCAL app.sso_key | The decryption key must not persist in connection state across requests. SET LOCAL scopes it to the current transaction, and withDecryptedSecret keeps it closure-scoped. |
| The apex host case is real | app.example.com is a legitimate request with no tenant. Routes that require a tenant must default-deny when c.var.tenant === null, with an allowlist of valid apex routes. |
| Reserved-slug enforcement missing at the DB level | A UNIQUE constraint on slug catches collisions, but format, length, and reserved names are not enforced. Add a CHECK constraint, or rely on app-layer validation plus the UNIQUE constraint. |
| parseHostname must explicitly reject admin.example.com | Otherwise it is classified as a custom tenant lookup and leaks a 404 timing oracle. |
| audit_logs needs (actor_type, created_at DESC) and (organization_id, created_at DESC) indexes | Cross-tenant operator queries and tenant-scoped audit views are common; without indexes they are sequential scans. |
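A sketch of the host classification behind the apex and admin-host rows, assuming tenant subdomains look like <slug>.app.example.com and that admin.example.com is the only reserved host (both assumptions; the ParsedHost type is hypothetical). The apex gets its own kind so routes can default-deny, and the admin host is rejected before it can reach the custom-hostname lookup.

```typescript
export type ParsedHost =
  | { kind: "apex" } // app.example.com: valid, but no tenant
  | { kind: "subdomain"; slug: string } // <slug>.app.example.com
  | { kind: "custom"; hostname: string } // needs a custom-hostname lookup
  | { kind: "reject" }; // reserved or malformed; never looked up

const APEX_HOST = "app.example.com"; // assumption
const RESERVED_HOSTS = new Set(["admin.example.com"]); // assumption
const SLUG_LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/; // exactly one DNS label

export function parseHostname(rawHost: string): ParsedHost {
  // Lowercase, drop any port, and drop a trailing root dot.
  const [hostOnly = ""] = rawHost.toLowerCase().split(":");
  const host = hostOnly.replace(/\.$/, "");
  // Checked first, so the admin host never reaches the custom lookup.
  if (RESERVED_HOSTS.has(host)) return { kind: "reject" };
  if (host === APEX_HOST) return { kind: "apex" };
  const suffix = `.${APEX_HOST}`;
  if (host.endsWith(suffix)) {
    const slug = host.slice(0, -suffix.length);
    // Nested labels such as a.b.app.example.com are not tenants.
    return SLUG_LABEL.test(slug) ? { kind: "subdomain", slug } : { kind: "reject" };
  }
  return host ? { kind: "custom", hostname: host } : { kind: "reject" };
}
```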

Lessons learned

If you skip the tables, read this section. Each lesson is one trap above, generalised into something you can carry to your own multi-tenant build: the kind of thing you wish someone had told you before you wrote the code, not after the incident.

Cache invalidation will be the hardest thing you build — design it first, not last. The intuitive answer (a queue everyone subscribes to) is the wrong primitive: Cloudflare Queues allow only one consumer, so a fan-out design silently drops messages. What worked was RPC fan-out plus KV cache versioning. And the workers aren’t symmetric — the admin fans out to everyone, but the auth worker has no binding back to admin, so it invalidates a different way. If a “cache invalidation” line item looks small on your plan, move it to the top.
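A minimal sketch of the RPC fan-out plus KV versioning shape. VersionStore and TenantCacheTarget are hypothetical stand-ins for a KV namespace and for workers reachable over service bindings, and the key format is illustrative. Because every cache key embeds the tenant’s current version, a bump orphans old entries everywhere once the KV write propagates (KV is eventually consistent), while the RPC fan-out handles the immediate in-memory copies.

```typescript
// Hypothetical stand-ins: a KV-like store and workers reachable over RPC.
export interface VersionStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}
export interface TenantCacheTarget {
  invalidateTenant(tenantId: string): Promise<void>;
}

const versionKey = (tenantId: string): string => `tenant-version:${tenantId}`;

// Every cache key embeds the tenant's current version, so a bump makes all
// older entries unreachable without deleting them colo by colo.
export async function tenantCacheKey(kv: VersionStore, tenantId: string): Promise<string> {
  const version = (await kv.get(versionKey(tenantId))) ?? "0";
  return `https://tenant-cache.internal/${tenantId}/v${version}`;
}

export async function invalidateTenant(
  kv: VersionStore,
  targets: readonly TenantCacheTarget[],
  tenantId: string,
): Promise<PromiseSettledResult<void>[]> {
  // Read-modify-write is acceptable for a sketch: concurrent bumps still change the version.
  const current = Number((await kv.get(versionKey(tenantId))) ?? "0");
  await kv.put(versionKey(tenantId), String(current + 1));
  // RPC fan-out: ask every worker to drop in-memory copies now; one failing
  // target is reported in the result instead of aborting the others.
  return Promise.allSettled(targets.map((t) => t.invalidateTenant(tenantId)));
}
```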

Pass tenant context as a typed RPC parameter, never as a header. The moment tenancy rides in an HTTP header, you have signed up to manage signature algorithms, replay protection, and downgrade attacks: a whole security surface you never wanted. A service-binding RPC call with a typed argument makes that entire class of bug unrepresentable. Prefer the boring typed call.
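A sketch of what “tenant as a typed argument” looks like, with hypothetical names throughout. In a real Worker the class would extend WorkerEntrypoint from cloudflare:workers and be called through a service binding; here it is a plain class so the sketch runs anywhere.

```typescript
// Hypothetical tenant context, resolved once at the edge and passed explicitly.
export interface TenantContext {
  readonly id: string;
  readonly host: string;
  readonly sessionVersion: number;
}

export interface TenantSettings {
  enforceSso: boolean;
}

export class TenantSettingsService {
  private readonly settings: ReadonlyMap<string, TenantSettings>;

  constructor(settings: ReadonlyMap<string, TenantSettings>) {
    this.settings = settings;
  }

  // The tenant is a required, typed argument: a call without one does not
  // compile, and there is no header for a client to forge or a callee to forget.
  getSettings(tenant: TenantContext): TenantSettings {
    const found = this.settings.get(tenant.id);
    if (!found) throw new Error(`unknown tenant ${tenant.id}`);
    return found;
  }
}
```

Over a binding, a caller would write something like await env.SERVER.getSettings(tenant), where SERVER is a hypothetical binding name; RPC calls across a service binding return promises.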

Per-host cookies are necessary but not sufficient. It is tempting to think subdomains isolate tenants. They don’t: a.example.com and b.example.com are same-site, so a strict-SameSite cookie does nothing to stop sibling-tenant confusion. The real boundary is host-only cookies plus an explicit origin/CSRF check on every mutation.
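Both halves of that boundary fit in a few lines. This sketch assumes a cookie named __Host-session and SameSite=Lax (Lax rather than Strict, given the OAuth callback interference noted in the security table); both are assumptions. The __Host- prefix makes the browser itself enforce host-only scoping: it refuses the cookie if it carries a Domain attribute, lacks Secure, or has a Path other than /.

```typescript
// Host-only session cookie: no Domain attribute, plus the __Host- prefix.
export function hostOnlySessionCookie(value: string, maxAgeSeconds: number): string {
  return `__Host-session=${encodeURIComponent(value)}; Path=/; Secure; HttpOnly; SameSite=Lax; Max-Age=${maxAgeSeconds}`;
}

const SAFE_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

// Explicit origin check for every state-changing request: the Origin must be
// exactly this host over https, so same-site sibling tenants fail it.
export function isAllowedMutation(method: string, originHeader: string | null, requestHost: string): boolean {
  if (SAFE_METHODS.has(method.toUpperCase())) return true;
  if (!originHeader) return false; // default-deny when no Origin was sent
  let origin: URL;
  try {
    origin = new URL(originHeader);
  } catch {
    return false; // e.g. the opaque origin "null"
  }
  return origin.protocol === "https:" && origin.host === requestHost.toLowerCase();
}
```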

One JWT check is never enough: scope tokens on five axes (aud, iss, org.id, org.host, and sessionVersion). A global aud alone, a global iss alone, or even both together let a token minted for tenant A validate against tenant B. Per-tenant aud/iss narrows it; the org claim pins the tenant; and sessionVersion is the part people forget. Without it you have no way to revoke, which is exactly what you need the day you suspend a tenant.
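A sketch of checking all five axes on a token whose signature has already been verified (for example by a JOSE library). The claim layout follows the security table; the https://<host> issuer format, the https://<host>/api audience format, and returning the first failing axis are assumptions.

```typescript
// Decoded claims, read only after signature verification has succeeded.
export interface TenantJwtClaims {
  iss?: string;
  aud?: string | string[];
  org?: { id?: string; host?: string };
  sessionVersion?: number;
}

// What the receiving service knows about the tenant it is serving right now.
export interface ExpectedTenant {
  id: string;
  host: string;
  sessionVersion: number; // current value, bumped on suspension
}

// Returns the first axis that fails, or null when all five match.
export function failedTenantAxis(claims: TenantJwtClaims, expected: ExpectedTenant): string | null {
  const audiences = Array.isArray(claims.aud) ? claims.aud : claims.aud ? [claims.aud] : [];
  if (claims.iss !== `https://${expected.host}`) return "iss";
  if (!audiences.includes(`https://${expected.host}/api`)) return "aud";
  if (claims.org?.id !== expected.id) return "org.id";
  if (claims.org?.host !== expected.host) return "org.host";
  // Any token minted before the last bump is dead, even if it has not expired.
  if (claims.sessionVersion !== expected.sessionVersion) return "sessionVersion";
  return null;
}
```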

For operator-led SaaS, turn self-signup off and mean it. Self-serve and operator-led onboarding don’t mix gracefully — leave the default disableSignUp: false in place and “anyone can sign up” quietly co-exists with your invite-only flow. Set disableSignUp: true and build the one onboarding path you actually want.
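A sketch of the relevant fragment, assuming email and password is the sign-in method you enable; only the signup-related keys are shown, and the object would be merged into the options passed to betterAuth(). Any other sign-in method you enable deserves the same explicit check.

```typescript
// Fragment of the options passed to betterAuth(); only signup-related keys shown.
export const signupOptions = {
  emailAndPassword: {
    enabled: true,
    // Explicitly off: the only way in is the operator-led invite flow.
    disableSignUp: true,
  },
};
```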

accept-invitation assumes the user already exists — it won’t create one. It’s designed for a signed-in user accepting an org invite, not for bootstrapping a brand-new account from an email link. If your invite is the account-creation moment, you write that orchestration yourself.
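A sketch of that orchestration with the auth layer hidden behind hypothetical UserPort and MembershipPort interfaces. It also handles the createUser retry trap from the first table. Where the USER_ALREADY_EXISTS code lives on the thrown error depends on how you call Better Auth, so isUserAlreadyExists is an assumption to adjust.

```typescript
// Hypothetical ports; in practice these would wrap Better Auth calls.
export interface UserPort {
  createUser(email: string, name: string): Promise<{ id: string }>;
  findUserByEmail(email: string): Promise<{ id: string } | null>;
}
export interface MembershipPort {
  addMember(orgId: string, userId: string, role: string): Promise<void>;
}
export interface Invite {
  orgId: string;
  email: string;
  name: string;
  role: string;
}

function isUserAlreadyExists(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "USER_ALREADY_EXISTS";
}

// createUser is not idempotent, so a retried accept treats
// USER_ALREADY_EXISTS as "look the user up" rather than as a failure.
async function ensureUser(users: UserPort, email: string, name: string): Promise<{ id: string }> {
  try {
    return await users.createUser(email, name);
  } catch (err) {
    if (!isUserAlreadyExists(err)) throw err;
    const existing = await users.findUserByEmail(email);
    if (!existing) throw err;
    return existing;
  }
}

export async function acceptInviteAndCreateUser(
  users: UserPort,
  members: MembershipPort,
  invite: Invite,
): Promise<{ id: string }> {
  const user = await ensureUser(users, invite.email, invite.name);
  // addMember should also tolerate "already a member" on retry (not shown).
  await members.addMember(invite.orgId, user.id, invite.role);
  return user;
}
```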

Run tenant-binding checks before the IdP code is exchanged, not after. provisionUser fires after token exchange, which is too late: a response meant for Acme, replayed at Globex’s callback, has already minted a session. The guard has to sit in front of the exchange.
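A sketch of the check that has to run before the code exchange. SsoLoginState and SsoProviderRecord are hypothetical shapes for what was recorded when login started and for the provider row. Wiring this into a Better Auth before hook, which is the role ssoCallbackGuardPlugin plays in the security table, is not shown.

```typescript
// Hypothetical state recorded when the login started.
export interface SsoLoginState {
  providerId: string;
  tenantId: string;
}
// Hypothetical provider row; each provider belongs to exactly one tenant.
export interface SsoProviderRecord {
  id: string;
  tenantId: string;
}

// Runs on the callback request before the authorization code is exchanged.
export function assertCallbackMatchesTenant(
  callbackTenantId: string | null, // tenant resolved from the callback host
  state: SsoLoginState | null,
  provider: SsoProviderRecord | null,
): void {
  if (!callbackTenantId || !state || !provider) throw new Error("sso: missing tenant, state, or provider");
  if (state.providerId !== provider.id) throw new Error("sso: provider mismatch");
  // An Acme response replayed at Globex's callback fails here, before any session exists.
  if (provider.tenantId !== callbackTenantId || state.tenantId !== callbackTenantId) {
    throw new Error("sso: callback host does not own this provider");
  }
}
```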

Treat account-linking defaults as hostile until you’ve pinned every one. A permissive default (allowDifferentEmails: true) lets a tenant-controlled IdP attach a different email to an existing user — a cross-tenant takeover. Set accountLinking.enabled and allowDifferentEmails explicitly, keep trustedProviders: [] empty, and approve linking by hand inside provisionUser.
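A sketch of pinning the linking defaults explicitly. Only the account-linking keys are shown; they sit under the account option passed to betterAuth(). Whether enabled should be true or false is a per-product call, so the value here is an assumption; the point is that every key is written down rather than inherited.

```typescript
// Fragment of the options passed to betterAuth(); only account-linking keys shown.
export const accountLinkingOptions = {
  account: {
    accountLinking: {
      enabled: true, // written down explicitly; the value itself is a product decision
      allowDifferentEmails: false, // an IdP may never attach a different email
      trustedProviders: [] as string[], // no provider links automatically
    },
  },
};
```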

Re-check the hooks API against the version you’re on. In 2026 a hook is a single createAuthMiddleware(...) function, not the older { matcher, handler } array. Library shapes drift between majors; a tutorial from last year will compile against your types and then misbehave.

You don’t need Enterprise for v1 — but you do need to know which line items are gated. Pro/Business covers the v1 build. custom_metadata, certificate_authority: "google", and wildcard custom certs are Enterprise-only, and the cruel part is how they fail: custom_metadata is silently dropped, certificate_authority throws API error 1459. Confirm the plan tier of every feature you lean on before you depend on it.
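A sketch of failing loudly instead of letting Cloudflare reject or silently drop an Enterprise-only field. The body targets the custom hostnames endpoint (POST /zones/{zone_id}/custom_hostnames); the Plan type, the option names, and knowing your plan tier in code are assumptions.

```typescript
export type Plan = "pro" | "business" | "enterprise";

// Body for POST /zones/{zone_id}/custom_hostnames (only the fields used here).
export interface CustomHostnameBody {
  hostname: string;
  ssl: { method: "http"; type: "dv"; certificate_authority?: "google" };
  custom_metadata?: Record<string, string>;
}

// Throws on Enterprise-only options for lower plans instead of hitting
// API error 1459 (certificate_authority) or a silent drop (custom_metadata).
export function buildCustomHostnameBody(
  hostname: string,
  plan: Plan,
  opts: { googleCa?: boolean; metadata?: Record<string, string> } = {},
): CustomHostnameBody {
  const body: CustomHostnameBody = { hostname, ssl: { method: "http", type: "dv" } };
  if (opts.googleCa) {
    if (plan !== "enterprise") throw new Error('certificate_authority "google" is Enterprise-only');
    body.ssl.certificate_authority = "google";
  }
  if (opts.metadata) {
    if (plan !== "enterprise") throw new Error("custom_metadata is Enterprise-only");
    body.custom_metadata = opts.metadata;
  }
  return body;
}
```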

Verify domain ownership on your side before you ask the CA for a cert. HTTP DCV alone won’t let a competitor steal a cert for your customer’s domain — they can’t serve the challenge — but they can burn your cert quota by requesting hostnames you’ll never activate. A TXT pre-verification step on your side closes the squatting window.
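A sketch of the TXT pre-verification step. The _tenant-verify. record name is hypothetical, and lookupTxt uses Cloudflare’s DNS-over-HTTPS JSON endpoint as one possible resolver; any resolver works. Only once isOwnershipVerified returns true would you create the custom hostname and ask the CA for a certificate.

```typescript
// Hypothetical record name the customer must create before onboarding.
export function verificationRecordName(hostname: string): string {
  return `_tenant-verify.${hostname.toLowerCase().replace(/\.$/, "")}`;
}

// TXT data often comes back wrapped in quotes; compare the unquoted value.
export function isOwnershipVerified(txtValues: readonly string[], expectedToken: string): boolean {
  return txtValues.some((value) => value.replace(/^"|"$/g, "") === expectedToken);
}

// One possible resolver: Cloudflare's DNS-over-HTTPS JSON API.
export async function lookupTxt(name: string): Promise<string[]> {
  const url = `https://cloudflare-dns.com/dns-query?name=${encodeURIComponent(name)}&type=TXT`;
  const res = await fetch(url, { headers: { accept: "application/dns-json" } });
  if (!res.ok) throw new Error(`DoH lookup failed with status ${res.status}`);
  const body = (await res.json()) as { Answer?: { type: number; data: string }[] };
  // Record type 16 is TXT; other answers in the chain (e.g. CNAMEs) are skipped.
  return (body.Answer ?? []).filter((a) => a.type === 16).map((a) => a.data);
}
```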

Point the CNAME at a proxied CNAME on your own zone. Not a worker URL, and not the ...cdn.cloudflare.net form (which is undocumented and wrong). The correct target is a proxied record on your zone, e.g. customers.example.com.

Webhooks exist for SSL and hostname state — know they’re there before you reach for polling. v1 polls every 60 seconds, which is simple and fine; just don’t mistake polling for the only option when activation latency starts to matter.

workers_dev: false is one line, and forgetting it un-protects everything. Access binds protection at the hostname level, so the default name.account.workers.dev URL sails right past your Access policy. The auth panel you carefully gated is reachable, unauthenticated, on a URL you didn’t think about.

You can’t verify MFA inside the worker — there’s no amr claim. Don’t try to assert “this user did MFA” from the Access JWT; rely on the Access policy and IdP-side enforcement instead.

Reject service tokens on human-only panels. A service-token JWT has a different shape (type: "app", common_name set, no email); if your admin panel assumes a human, an automated token can walk in. Check the shape and refuse it.
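A sketch of the shape check, assuming the JWT’s signature, aud, and iss have already been verified. It rejects on common_name and on a missing email rather than on type alone; the interface lists only the fields this check reads.

```typescript
// Only the fields this check reads from a verified Access JWT payload.
export interface AccessJwtPayload {
  common_name?: string;
  email?: string;
  sub?: string;
}

// Human-only panels: refuse anything shaped like a service token.
export function isHumanAccessIdentity(payload: AccessJwtPayload): boolean {
  // Service-token JWTs carry common_name.
  if (payload.common_name) return false;
  // Human identities carry an email; its absence is treated as non-human.
  return typeof payload.email === "string" && payload.email.length > 0;
}
```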

payload.sub isn’t guaranteed stable — and the obvious workaround is a takeover hole. Falling back to email for first-login identity is the natural fix and also the exact vector an attacker uses to race-register an operator’s email at an IdP. Bind first login with an enrollment token instead.
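A sketch of the enrollment-token model under stated assumptions: the token reaches the operator out of band, only its SHA-256 hash is stored, it expires after 24 hours (an arbitrary choice), and it can be redeemed once to bind the Access sub. PendingEnrollment, issueEnrollment, and redeemEnrollment are hypothetical names. The node:crypto import assumes Node or a Worker with Node.js compatibility enabled.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

export interface PendingEnrollment {
  operatorId: string; // e.g. a gad_* ID
  tokenHash: Buffer; // never store the raw token
  expiresAt: number; // epoch milliseconds
  boundSub?: string; // set once, on first successful login
}

const sha256 = (s: string): Buffer => createHash("sha256").update(s).digest();

export function issueEnrollment(operatorId: string, now: number, ttlMs = 24 * 60 * 60 * 1000) {
  const token = randomBytes(32).toString("base64url");
  const record: PendingEnrollment = { operatorId, tokenHash: sha256(token), expiresAt: now + ttlMs };
  return { token, record };
}

// Binds the Access sub only when the presented token matches; email is never consulted.
export function redeemEnrollment(record: PendingEnrollment, presented: string, sub: string, now: number): boolean {
  if (record.boundSub !== undefined || now > record.expiresAt) return false; // single use, unexpired
  // Both digests are 32 bytes, so timingSafeEqual's equal-length requirement holds.
  const ok = timingSafeEqual(sha256(presented), record.tokenHash);
  if (ok) record.boundSub = sub;
  return ok;
}
```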

Land your core runtime boundary early; deferring it is technical debt that compounds. @repo/tenancy isn’t a “nice refactor for later” — it’s a first-order runtime boundary every worker depends on, so it has to ship in Phase A. The genuinely deep modules that can wait should wait (Phase C); the trick is telling the two apart.

Decide your dependency direction once and enforce it. @repo/tenancy imports schemas from @repo/db, which makes the reverse import dangerously easy to add by reflex — and that creates a cycle. Pick the arrow (db never imports tenancy) and hold the line.
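One way to hold the line is a check in CI. This sketch assumes @repo/db lives under packages/db/ and that source files are already read into memory; both are assumptions, and findReverseImports is a hypothetical name. ESLint’s no-restricted-imports rule, scoped to that package, is the off-the-shelf alternative.

```typescript
export interface SourceFile {
  path: string;
  text: string;
}

// Matches `from "@repo/tenancy..."`, `import "@repo/tenancy..."`, and
// `import("@repo/tenancy...")`, but not look-alikes such as @repo/tenancy-extra.
const TENANCY_IMPORT = /(?:from\s+|import\s*\(?\s*)["']@repo\/tenancy["'\/]/;

// Returns the db-package files that import tenancy, i.e. the forbidden arrow.
export function findReverseImports(files: readonly SourceFile[]): string[] {
  return files
    .filter((f) => f.path.startsWith("packages/db/") && TENANCY_IMPORT.test(f.text))
    .map((f) => f.path);
}
```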

Test deep modules at the boundary, not at every internal helper. The whole point of a deep module is a small interface over a lot of behavior; test that interface and resist the urge to pin down every private function — those tests just make refactoring painful.

Honor the project conventions that are easy to forget. Each new package needs an AGENTS.md. Small, mechanical, skipped exactly when you’re moving fast.

v2 backlog

These never made it into a locked decision; they are deferred to a later version or to implementation-time follow-up.

  1. Custom-hostname allowlist refresh strategy — per-request DB lookup versus in-memory with a 5-minute refresh. Leaning toward in-memory.
  2. Webhook integration for hostname state changes, replacing or supplementing the 60-second polling.
  3. Queue-driven hostname reconciler, once sustained pending hostnames exceed ~6,000.
  4. Tenant-scoped user identity, to close passkey/two-factor cross-tenant data bleed.
  5. enforce_sso default — auto-flip to true on the first verified SSO provider, or always require explicit opt-in. Leaning opt-in.
  6. In-app TOTP for operators, as defense in depth above Cloudflare Access MFA.
  7. Customer notification on operator access — some SaaS notify the customer when a support engineer reads their data. v1 ships the audit log only; v2 considers email.
  8. Approval workflow for destructive operator actions (delete tenant, deactivate a super_admin). v1 ships single-actor; v2 may add a two-person rule.
  9. Operator activity dashboard — surface lastActiveAt and daily action counts.
  10. Per-tenant feature flags — the schema is not designed yet; v1 ships scaffolding.
  11. MFA verification in the worker via the Cloudflare Access Identity API.
  12. Logout UX in the admin panel — proxy /cdn-cgi/access/logout for an explicit sign-out, versus relying on a browser bookmark.
  13. Marketing / find-your-team page on the apex (D76) — v1 ships a static page; v2 adds the backend lookup.
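A sketch of the in-memory option from item 1, assuming a hypothetical load function that returns the current custom hostnames from the database; HostnameAllowlist is also a hypothetical name. Concurrent refreshes collapse into one load. The trade-off shows up directly: each isolate holds its own copy, so a newly activated hostname can be refused for up to five minutes.

```typescript
export class HostnameAllowlist {
  private hosts: ReadonlySet<string> = new Set<string>();
  private loadedAt = -Infinity; // forces a load on first use
  private inflight: Promise<void> | null = null;
  private readonly load: () => Promise<string[]>;
  private readonly ttlMs: number;

  constructor(load: () => Promise<string[]>, ttlMs = 5 * 60 * 1000) {
    this.load = load;
    this.ttlMs = ttlMs;
  }

  async has(host: string, now = Date.now()): Promise<boolean> {
    if (now - this.loadedAt >= this.ttlMs) await this.refresh(now);
    return this.hosts.has(host.toLowerCase());
  }

  // Concurrent callers share one in-flight load instead of each hitting the DB.
  private refresh(now: number): Promise<void> {
    let pending = this.inflight;
    if (!pending) {
      pending = this.load()
        .then((list) => {
          this.hosts = new Set(list.map((h) => h.toLowerCase()));
          this.loadedAt = now;
        })
        .finally(() => {
          this.inflight = null;
        });
      this.inflight = pending;
    }
    return pending;
  }
}
```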