Custom Hostnames
A tenant on a default subdomain (acme.app.example.com) is fine for getting started,
but customers eventually want the product to live on a domain they own, like
app.acme.com. That means provisioning a TLS certificate for a hostname you don’t
control, tracking its issuance over minutes-to-days, and doing it without letting one
tenant burn the platform’s shared certificate quota. Cloudflare for SaaS handles the
hostname and certificate machinery, and because the stack already runs on Workers the
integration is a natural fit. The work that’s left is a clean onboarding flow, an
internal lifecycle state machine, and a reconciler that keeps the database honest
about what Cloudflare actually did.
What Cloudflare for SaaS gives you
Cloudflare for SaaS is bundled into the Pro and
Business plans with no separate SKU. Each zone
includes 100 custom hostnames, with $0.10/hostname overage and a hard ceiling of
50,000 outside Enterprise. You provision a hostname with a single API call —
POST /zones/{zone_id}/custom_hostnames — and Cloudflare auto-issues the SSL
certificate via Domain Control Validation (DCV — how the CA confirms the domain is
really under your control before it signs a cert for it).
DCV comes in a few flavors. HTTP DCV is the simplest: Cloudflare validates the moment
the tenant’s CNAME (the DNS alias pointing app.acme.com at the platform) goes live,
with no extra DNS record for the tenant to manage. That’s the method this design uses.
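Two request fields look tempting but will bite you on Pro/Business:
- certificate_authority: Enterprise-only, so leave it off the request body and let Cloudflare pick the CA.
- custom_metadata: treat it as unavailable unless the account is explicitly entitled, and look hostnames up by cf_hostname_id or hostname instead.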
The onboarding flow
The naive approach is to call Cloudflare the instant a tenant types a hostname. That’s exactly the abuse channel to avoid (see Why TXT pre-verification below). Instead the flow is two-phase. First the tenant proves they control the domain on the platform’s side by adding a TXT record only the real owner could place; that’s the TXT pre-verification gate. Only after that gate passes does the platform touch Cloudflare at all: it registers the hostname and asks the tenant to CNAME.
- Tenant adds a TXT record (platform-side verification). The tenant submits the hostname:

  Add a pending hostname

      POST /api/tenancy/hostnames
      Content-Type: application/json

      { "hostname": "app.acme.com" }

  A per-org rate limit applies: at most 10 pending hostnames at a time, and 50 per day. The server inserts a tenant_custom_hostnames row with lifecycle_status: "awaiting_txt", verification_verified_at: NULL, and cf_hostname_id: NULL. No Cloudflare API call happens yet. The response carries a per-org verification_token (a cuid) and the record to add:

  Verification instructions surfaced to the tenant

      Add a DNS TXT record:
        _app-example-verify.app.acme.com -> <verification_token>
      Then click "Verify" to continue.

- The platform verifies the TXT. The tenant clicks “Verify”:

  Verify the TXT record

      POST /api/tenancy/hostnames/{id}/verify

  The server resolves the TXT record over DNS-over-HTTPS (https://cloudflare-dns.com/dns-query). On a match it sets verification_verified_at = now() and writes a hostname.verified audit event (CRITICAL, dual-scope).

- The platform registers the hostname with Cloudflare. Only after verification does the server call the Cloudflare API. Note that HTTP DCV (method: "http") is what makes the later CNAME-only flow work, and certificate_authority is deliberately absent:

  apps/server: register the custom hostname

      await fetch(`https://api.cloudflare.com/client/v4/zones/${zoneId}/custom_hostnames`, {
        method: "POST",
        headers: { Authorization: `Bearer ${cfApiToken}`, "Content-Type": "application/json" },
        body: JSON.stringify({
          hostname: "app.acme.com",
          ssl: { method: "http", type: "dv", settings: { min_tls_version: "1.2" } },
        }),
      });

  The row is updated with the returned cf_hostname_id, lifecycle_status: "pending_cloudflare", and the raw Cloudflare validation fields. The tenant is now shown the CNAME to create:

  CNAME instructions surfaced to the tenant

      Create a CNAME record:
        app.acme.com -> customers.example.com
      Expect ~2-5 minutes of TLS errors during initial cert issuance after the
      CNAME is live. To eliminate downtime, you can pre-validate via the
      /.well-known/pki-validation/ token shown below.

- The reconciler tracks status. A cron on apps/server runs every 60 seconds. It polls non-terminal hostnames and stores the raw Cloudflare validation state separately from the internal lifecycle status. See The reconciler below.

- Notification on activation. When Cloudflare reports status === "active" and the row was previously not active, the reconciler emits a hostname.activated audit (CRITICAL, dual-scope) and sends a HostnameVerifiedEmail to the org’s admins, exactly once.
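The per-org limits from the first step (at most 10 pending hostnames at a time, 50 per day) could be checked before the row is inserted. The following is a minimal sketch, not the platform's actual code: HostnameRow and checkPendingLimits are hypothetical names, "pending" is assumed to mean awaiting_txt or pending_cloudflare, and "per day" is assumed to be a rolling 24-hour window over all rows the org created.

```typescript
// Hypothetical row shape: only the fields this check needs.
type HostnameRow = {
  lifecycleStatus: string;
  createdAt: Date;
};

type LimitResult =
  | { ok: true }
  | { ok: false; reason: "too_many_pending" | "daily_limit" };

const MAX_PENDING = 10; // from the text: at most 10 pending at a time
const MAX_PER_DAY = 50; // from the text: at most 50 per day
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumption: "pending" covers both pre-activation lifecycle states.
const PENDING_STATES = new Set(["awaiting_txt", "pending_cloudflare"]);

function checkPendingLimits(orgRows: HostnameRow[], now: Date): LimitResult {
  const pending = orgRows.filter((r) => PENDING_STATES.has(r.lifecycleStatus)).length;
  if (pending >= MAX_PENDING) return { ok: false, reason: "too_many_pending" };

  // Assumption: rolling 24-hour window, counting every row regardless of status.
  const cutoff = now.getTime() - DAY_MS;
  const createdRecently = orgRows.filter((r) => r.createdAt.getTime() > cutoff).length;
  if (createdRecently >= MAX_PER_DAY) return { ok: false, reason: "daily_limit" };

  return { ok: true };
}
```

Counting deleted rows toward the daily limit is a deliberate choice in this sketch: it stops a script from cycling create and delete to sidestep the cap.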
Why TXT pre-verification
Here’s the subtlety: HTTP DCV already protects the certificate. An attacker who
submits { hostname: "app.competitor.com" } can never finish issuance, because the
competitor’s server never serves the validation challenge. So why add a TXT gate at all?
Because the registration itself is the resource being abused. Every
POST /zones/{zone_id}/custom_hostnames consumes a slot in the platform’s shared,
zone-wide hostname quota and counts toward Cloudflare’s abuse heuristics — regardless of
whether the cert ever issues. One script firing thousands of competitor hostnames could
exhaust the quota or get the whole zone flagged, hurting every legitimate tenant.
The TXT record is a cheaper proof of control that runs entirely on the platform’s side,
before a single Cloudflare slot is spent. An attacker can’t place
_app-example-verify.app.competitor.com in a zone they don’t own, so they never get
past the gate — and Cloudflare is never touched on their behalf.
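The gate itself is a small DNS lookup. Here is a rough sketch of how the check might look against the JSON form of Cloudflare's DNS-over-HTTPS resolver; verifyTxt, txtMatchesToken, and the other helper names are hypothetical, and the response type covers only the fields used here.

```typescript
// Subset of the DNS-over-HTTPS JSON response that this check reads.
type DohAnswer = { name: string; type: number; TTL: number; data: string };
type DohResponse = { Status: number; Answer?: DohAnswer[] };

const TXT_RECORD_TYPE = 16; // DNS record type number for TXT

// Record name from the text: _app-example-verify.<hostname>
function verificationRecordName(hostname: string): string {
  return `_app-example-verify.${hostname.toLowerCase().replace(/\.$/, "")}`;
}

// TXT data arrives quoted; long values may be split into several quoted chunks.
function unquoteTxt(data: string): string {
  const chunks = data.match(/"((?:[^"\\]|\\.)*)"/g);
  if (!chunks) return data.trim();
  return chunks.map((c) => c.slice(1, -1)).join("");
}

// True only when the lookup succeeded and some TXT answer equals the token.
function txtMatchesToken(response: DohResponse, expectedToken: string): boolean {
  if (response.Status !== 0) return false; // non-zero means NXDOMAIN, SERVFAIL, etc.
  return (response.Answer ?? [])
    .filter((a) => a.type === TXT_RECORD_TYPE)
    .some((a) => unquoteTxt(a.data) === expectedToken);
}

// Network wrapper; failures are treated as "not verified yet", never as success.
async function verifyTxt(hostname: string, expectedToken: string): Promise<boolean> {
  const url = new URL("https://cloudflare-dns.com/dns-query");
  url.searchParams.set("name", verificationRecordName(hostname));
  url.searchParams.set("type", "TXT");
  const res = await fetch(url.toString(), { headers: { accept: "application/dns-json" } });
  if (!res.ok) return false;
  return txtMatchesToken((await res.json()) as DohResponse, expectedToken);
}
```

Keeping the match logic in a pure function separate from the fetch makes the gate easy to unit-test without touching DNS.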
The CNAME target
Tenants CNAME to customers.example.com, a proxied CNAME on the platform’s zone that
points at the fallback origin — the single backend Cloudflare for SaaS routes every
custom hostname to when no per-hostname origin is configured. It’s set once at the zone
level in the Cloudflare for SaaS config, so every tenant domain lands on the same
Workers app.
The internal lifecycle
Cloudflare’s own validation states are noisy and its timeouts are not infinite: a
hostname that never validates moves through Moved and is eventually Deleted after a
7-day backoff. So the database keeps the raw Cloudflare state in cf_status /
cf_ssl_status and maps it onto a small internal lifecycle enum that the rest of the
system reasons about. Decoupling the two is what lets the product survive Cloudflare
renaming or adding states.
stateDiagram-v2
  [*] --> awaiting_txt: row created (no CF call)
  awaiting_txt --> pending_cloudflare: TXT verified + registered with CF
  pending_cloudflare --> active: CF reports active
  pending_cloudflare --> error: caa_error / validation failure
  error --> active: tenant fixes DNS, CF revalidates
  pending_cloudflare --> moved: CF validation times out
  moved --> deleted: CF deletes after 7-day backoff
  active --> deleted: tenant DELETEs the hostname
  active --> moved: CF detaches the hostname
  deleted --> [*]
The terminal-ish states are active (working), moved (Cloudflare detached it but
hasn’t deleted yet), deleted (tombstoned in the database — never hard-deleted, for
audit/history), and error (a recoverable validation failure the tenant can fix).
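The mapping from raw Cloudflare state to the internal enum can be a small pure function. The source does not show the mapCloudflareStatus helper that the reconciler calls, so the version below is only one possible shape: the raw status strings it matches ("active", "moved", "deleted", "blocked") and the optional second parameter are assumptions.

```typescript
type LifecycleStatus =
  | "awaiting_txt"
  | "pending_cloudflare"
  | "active"
  | "moved"
  | "deleted"
  | "error";

// Assumption: these raw status strings are what Cloudflare reports.
// Anything unrecognized falls through to pending_cloudflare, so a renamed or
// newly added Cloudflare state never produces an invalid internal status.
function mapCloudflareStatus(
  cfStatus: string,
  verificationErrors: string[] = [],
): LifecycleStatus {
  switch (cfStatus) {
    case "active":
      return "active";
    case "moved":
      return "moved";
    case "deleted":
      return "deleted";
    case "blocked":
      return "error";
    default:
      // Still validating: surface a recoverable error if Cloudflare reported one.
      return verificationErrors.length > 0 ? "error" : "pending_cloudflare";
  }
}
```

awaiting_txt never appears as an output because that state exists only before Cloudflare has been called.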
The reconciler
Polling is the source of truth. The reconciler is a scheduled handler that picks up
every registered-but-not-terminal hostname, asks Cloudflare for its current state, and
writes the mapped lifecycle status plus the raw Cloudflare fields back into the row in
a transaction. A null response from Cloudflare means the hostname was deleted after
its backoff, which tombstones the row.
// Phase A standalone; folded into customHostnameLifecycle.reconcileAll() in Phase C.
import { and, eq, isNotNull, notInArray, sql } from "drizzle-orm";
// Project-local imports (withDrizzleClient, cfApi, auditLogService, AUDIT_EVENTS,
// tenantCustomHostnames, mapCloudflareStatus) are omitted here.

export default {
  async scheduled(event, env, ctx) {
    await withDrizzleClient(env, async (db) => {
      const rows = await db.select().from(tenantCustomHostnames)
        .where(and(
          isNotNull(tenantCustomHostnames.cfHostnameId),
          notInArray(tenantCustomHostnames.lifecycleStatus, ["active", "deleted"]),
        ))
        // Oldest-reconciled first (never-reconciled rows lead), so the 100-row
        // cap cannot starve the tail of the queue.
        .orderBy(sql`${tenantCustomHostnames.lastReconciledAt} asc nulls first`)
        .limit(100);

      for (const row of rows) {
        const cfState = await cfApi.getCustomHostname(env, row.cfHostnameId);
        await db.transaction(async (tx) => {
          if (cfState === null) {
            // CF deleted after 7-day backoff: tombstone our row
            await tx.update(tenantCustomHostnames)
              .set({ lifecycleStatus: "deleted" })
              .where(eq(tenantCustomHostnames.id, row.id));
            await auditLogService.create({
              event: AUDIT_EVENTS.HOSTNAME.DELETED.event,
              actorType: "system",
              targetType: "hostname",
              targetId: row.id,
              metadata: { hostname: row.hostname, reason: "cf_deleted_after_backoff" },
            }, tx);
            return;
          }

          // Store the mapped lifecycle status alongside the raw Cloudflare fields.
          await tx.update(tenantCustomHostnames).set({
            lifecycleStatus: mapCloudflareStatus(cfState.status),
            cfStatus: cfState.status,
            cfSslStatus: cfState.ssl.status,
            verificationErrors: [
              ...(cfState.verification_errors ?? []),
              ...(cfState.ssl.validation_errors ?? []),
            ],
            lastReconciledAt: new Date(),
          }).where(eq(tenantCustomHostnames.id, row.id));

          // Audit the transition into active only once.
          if (cfState.status === "active" && row.lifecycleStatus !== "active") {
            await auditLogService.create({
              event: AUDIT_EVENTS.HOSTNAME.ACTIVATED.event,
              actorType: "system",
              targetType: "hostname",
              targetId: row.id,
              metadata: { hostname: row.hostname },
            }, tx);
          }
        });
      }
    }, { waitUntil: (p) => ctx.waitUntil(p) });
  },
};
A few operational details that aren’t obvious from the code:
- Cron and Hyperdrive. Wrap the handler body in withDrizzleClient(...) exactly like a request handler. placement.mode: "smart" does not apply to scheduled handlers, so you accept the latency to the Hyperdrive pool’s region.
- Trace sampling. The server worker samples traces at 1% by default. For the scheduled handler, bump that to 100%: cron runs are rare (1440/day) and traced runs are the only forensics you get. Add per-row structured logs as a second layer.
Webhooks as a latency optimization
Cloudflare for SaaS offers webhooks for hostname/SSL state changes: validation, issuance, deployment, deletion, and renewal. The v1 design stays on polling as the source of truth; webhooks are a low-risk latency optimization to consider if activation notifications need to arrive sooner than the next 60-second scan. Either way, the reconciler remains the durability backstop.
The Cloudflare API token
A single token, stored as a Cloudflare Secret (CLOUDFLARE_API_TOKEN), drives all of
this. Its scopes are deliberately narrow: Zone:Read,
SSL and Certificates:Edit, and Custom Hostnames:Edit — on the SaaS zone only, never
account-wide. The runbook rotates it quarterly.
Failure modes worth designing for
A few real-world conditions need explicit handling in the UI and the reconciler.
The TLS error window
When a tenant flips their CNAME, traffic immediately reaches the platform while the certificate is still being issued, so the browser sees a TLS handshake error for roughly 2-5 minutes. Two mitigations are surfaced in the admin UI, and tenants choose:
- The UI warns up-front during onboarding, and most tenants accept the brief errors.
- Optional pre-validation via /.well-known/pki-validation/{token}, served on the tenant’s existing origin before they flip DNS, eliminates the window entirely.
CAA records block issuance
If the tenant’s apex zone has CAA records that don’t permit pki.goog or
letsencrypt.org, issuance silently fails and Cloudflare returns a caa_error in
verification_errors. The UI surfaces the required records:
Add to your DNS:
  acme.com  CAA 0 issue "pki.goog"
  acme.com  CAA 0 issue "letsencrypt.org"
Tenant on another CDN
A tenant already fronted by Fastly or Akamai may have DNS obfuscation that breaks DCV; the hostname won’t validate. Document this so support recognizes it.
Apex tenant domains
Apex domains like acme.com itself are not supported in v1: apex proxying for
tenant-owned domains requires Cloudflare Enterprise BYOIP. Tenants must use a subdomain
such as app.acme.com or acme-portal.acme.com.
API rate limit
The Cloudflare API rate limit is 1200 requests per 5 minutes globally. At 6,000 pending hostnames the reconciler stays within budget, but bursts could trip it, so the API wrapper uses exponential backoff.
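The budget claim can be checked with a little arithmetic, and the backoff is a few lines of code. The sketch below assumes the reconciler is the only caller and uses the 100-row batch cap and 60-second interval described earlier; withBackoff, backoffDelayMs, and the default delays are hypothetical, and retrying only on HTTP 429 is an assumption about how the limit surfaces.

```typescript
// Requests the reconciler can issue in one rate-limit window, given its batch cap.
function requestsPerWindow(batchSize: number, intervalSec: number, windowSec: number): number {
  return batchSize * Math.floor(windowSec / intervalSec);
}

// Minutes for one full sweep of all pending hostnames at that batch cap.
function fullSweepMinutes(pending: number, batchSize: number, intervalSec: number): number {
  return (Math.ceil(pending / batchSize) * intervalSec) / 60;
}

// Exponential backoff delay: base * 2^attempt, capped. Jitter is omitted for clarity.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: retries while the response status is 429, then gives up
// and returns the last response so the caller can decide what to do.
async function withBackoff<T extends { status: number }>(
  request: () => Promise<T>,
  maxRetries = 5,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await request();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await sleep(backoffDelayMs(attempt));
  }
}
```

Under those assumptions, requestsPerWindow(100, 60, 300) is 500, well under the 1200-request ceiling, and fullSweepMinutes(6000, 100, 60) is 60: at 6,000 pending hostnames each one is polled roughly once an hour.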
Deletion
Deletion never hard-deletes the row; the history is kept for audit. It is also guarded so a tenant can’t accidentally lock themselves out:
DELETE /api/tenancy/hostnames/{id}
The service guard refuses if removing this hostname would leave the org with no access path: that is, there’s no other custom hostname and enforceSSO is configured for a host other than the default subdomain. Otherwise it:
- Calls DELETE on the Cloudflare API.
- Sets lifecycle_status = 'deleted' (the row is tombstoned, not removed).
- Writes a hostname.deleted audit event (CRITICAL, dual-scope).
- Invalidates the cache via service-binding fan-out: the positive cache for the hostname is purged in both apps/server and apps/auth, the same mechanism covered in Tenant Resolution.
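The lockout guard reduces to a two-condition check. The sketch below is illustrative only: DeleteGuardInput and canDeleteHostname are hypothetical names, and it assumes "other custom hostname" means another hostname of the same org that is currently usable.

```typescript
// Hypothetical inputs: the guard only needs to know what else can reach the org.
type DeleteGuardInput = {
  hostnameToDelete: string;
  usableCustomHostnames: string[]; // the org's custom hostnames that currently work
  defaultSubdomain: string; // e.g. "acme.app.example.com"
  enforceSsoHost: string | null; // host enforceSSO is configured for, or null if unset
};

function canDeleteHostname(input: DeleteGuardInput): boolean {
  const others = input.usableCustomHostnames.filter((h) => h !== input.hostnameToDelete);
  const ssoElsewhere =
    input.enforceSsoHost !== null && input.enforceSsoHost !== input.defaultSubdomain;
  // Refuse only when both conditions from the text hold at once.
  return !(others.length === 0 && ssoElsewhere);
}
```

Running the guard before the Cloudflare DELETE call keeps a refused request from leaving Cloudflare and the database out of step.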
The schema
The table is the single record of both the platform’s view (the lifecycle status,
verification token and timestamp) and Cloudflare’s raw view (cf_status,
cf_ssl_status, verification_errors). The composite index on
(lifecycle_status, last_reconciled_at) is what makes the reconciler’s “oldest
non-terminal rows first” query cheap.
import { index, jsonb, pgTable, text, timestamp, varchar } from "drizzle-orm/pg-core";
// Project-local helpers (generatePrefixedCuid, createdAt, updatedAt, organizations) are imported elsewhere.

export const tenantCustomHostnames = pgTable("tenant_custom_hostnames", {
  id: varchar({ length: 255 }).primaryKey().$defaultFn(() => generatePrefixedCuid("tnh")),
  organizationId: text().notNull().references(() => organizations.id, { onDelete: "cascade" }),
  hostname: text().notNull().unique(),
  cfHostnameId: text().unique(), // NULL until the hostname is registered with Cloudflare
  lifecycleStatus: text({
    enum: ["awaiting_txt", "pending_cloudflare", "active", "moved", "deleted", "error"],
  }).notNull().default("awaiting_txt"),
  cfStatus: text(), // raw Cloudflare hostname status
  cfSslStatus: text(), // raw Cloudflare SSL status
  verificationErrors: jsonb().$type<string[]>().notNull().default([]),
  verificationToken: text().notNull(),
  verificationVerifiedAt: timestamp({ withTimezone: true }),
  lastReconciledAt: timestamp({ withTimezone: true }),
  createdAt: createdAt(),
  updatedAt: updatedAt(),
}, (t) => [
  index("tch_organization_id_idx").on(t.organizationId),
  index("tch_status_reconciled_idx").on(t.lifecycleStatus, t.lastReconciledAt),
]);
The full schema and the phased migration order live in Schema & Migrations.
Gotchas worth remembering
- The create response may not include validation_records. To surface pre-validation tokens, do a delayed follow-up GET rather than assuming the POST body contains them.
- Validation timeout is 7 days, not infinite. Cloudflare moves a hostname through Moved and later Deleted if validation never completes, so the database must keep the raw Cloudflare state and map it into a separate internal lifecycle.
- Tenants on another CDN won’t validate. Fastly/Akamai DNS obfuscation breaks DCV.
- The API rate limit is 1200/5min globally. Fine at 6,000 pending hostnames, but bursts can trip it, so use exponential backoff in the API wrapper.
- No certificate_authority field on the request body: it’s Enterprise-only. Don’t send it even when you think the default should be Google; let Cloudflare pick.
- No custom_metadata field either: treat it as unavailable unless the account is explicitly entitled. Look up by cf_hostname_id or hostname.
- Wildcard custom certs (e.g. *.acme.com) are Enterprise-only. Only single hostnames in v1.
- The CNAME target is customers.example.com (proxied on the platform zone), not the worker URL.