Debugging & Profiling
This is the developer’s “something is wrong” loop: a program that crashes, hangs, corrupts shared state, or burns CPU and memory you can’t explain. Go ships strong first-party tooling for all four — a debugger (dlv), the race detector (-race), the profiler (pprof), execution tracing (go tool trace), and runtime knobs (GODEBUG).
To make this concrete we’ll lean on a real distributed system: multigres (a set of small Go services that front PostgreSQL). It builds flag-driven profiling on top of runtime/pprof and net/http/pprof, and runs the race detector as a first-class make target — a good case study in how a production codebase wires diagnostics into every service.
Throughout, keep two categories separate: what is generic Go (delve, GODEBUG, go tool pprof/trace) versus what a system bolts on top (the --pprof / --pprof-http flags, a make test-race target, an end-to-end profiling harness). The generic tools transfer everywhere; the wiring is one team’s choices.
What to run when
Section titled “What to run when”When a symptom shows up, the hardest part is picking the right tool. This table is the map for the rest of the page.
| Symptom | Tool | Entry point |
|---|---|---|
| Logic bug, want to step through code | dlv (generic) | dlv debug ./go/cmd/multigateway |
| Test fails intermittently / shared-state corruption | race detector | make test-race |
| Flaky test (timing-dependent) | repeat runs | go test -count=10 |
| High CPU, want hot functions | CPU profile | --pprof cpu or /debug/pprof/profile |
| Growing memory / allocation churn | heap / allocs profile | --pprof mem=heap or /debug/pprof/heap |
| Hang, deadlock, “where are my goroutines?“ | goroutine dump | /debug/pprof/goroutine?debug=2 or SIGQUIT |
| Lock contention | mutex / block profile | --pprof mutex / --pprof block |
| Scheduler latency, GC pauses, where time goes | execution trace | --pprof trace then go tool trace |
| GC behavior / scheduler internals | runtime knob | GODEBUG=gctrace=1 (generic) |
| Microbenchmark + profile a hot package | benchmarks | go test -bench ... -cpuprofile/-memprofile |
1. Stepping with delve (dlv)
Section titled “1. Stepping with delve (dlv)”Delve is the standard Go debugger. It isn’t vendored or wrapped here — there’s no dlv invocation in the Makefile and nothing under tools/. You install it yourself and point it at the binaries under go/cmd/:
# Install (once), generic Go toolinggo install github.com/go-delve/delve/cmd/dlv@latest
# Debug a service: dlv builds it and drops you at a promptdlv debug github.com/multigres/multigres/go/cmd/multigateway -- --pprof cpu
# Debug a single test (build + run under the debugger)dlv test ./go/common/parser/ -- -test.run TestParseSimpleSelect
# Attach to an already-running process by PIDdlv attach <pid>Inside the prompt the verbs you reach for most are break <pkg>.<func> / break file.go:NN, continue, next, step, print <expr>, goroutines, goroutine <id>, and stack. The goroutines command is the debugger’s analogue of a goroutine dump (Section 4) — useful when a service is wedged.
Because these services are launched with Cobra and viperutil flags (see cmd & cobra and config & viperutil), pass service flags after -- so dlv doesn’t consume them.
2. The race detector (-race)
Section titled “2. The race detector (-race)”A data race is two goroutines touching the same memory, at least one of them writing, with no happens-before relationship ordering them. The semantics — why this is undefined behavior, not just “a bug” — are in sync & the memory model. The race detector is how you find them: it instruments memory accesses at runtime and reports any pair it observes racing.
A production codebase typically exposes this as a first-class target. Here it’s a one-liner in the Makefile:
test-race: ## Run tests with race detection. go test -short -v -race ./...make test-race # whole suite under the race detector (short tests)Reading the report
Section titled “Reading the report”A race report has three stacks. Learn to read them in this order:
==================WARNING: DATA RACEWrite at 0x00c0000b4010 by goroutine 8: main.(*counter).inc() .../counter.go:21 +0x44 <- the WRITE: who mutated, and where
Previous read at 0x00c0000b4010 by goroutine 7: main.(*counter).value() .../counter.go:26 +0x3c <- the conflicting READ on the SAME address
Goroutine 8 (running) created at: main.run() .../main.go:14 +0x88 <- WHERE the racing goroutine was spawned==================- The address (
0x00c0...) is the same in both stacks — that’s the shared memory. Often a struct field; map it back to afile:line. - Write stack vs. read/write stack: identify the two operations. At least one is a write. Both
file:lines point at the unsynchronized accesses. - “created at” stack: tells you which
go func()launched the offending goroutine — invaluable when the goroutine is anonymous and the stack alone doesn’t say who started it. Tie this to the goroutine model in concurrency.
The fix is always to establish happens-before between the two accesses: guard the field with a sync.Mutex/RWMutex, replace it with sync/atomic, or hand ownership through a channel so only one goroutine touches it. Which one is appropriate is the subject of sync & the memory model — the report tells you where, that module tells you how.
3. pprof: CPU / heap / goroutine / block / mutex
Section titled “3. pprof: CPU / heap / goroutine / block / mutex”This is where the system adds real machinery. The shape worth internalizing: a running process exposes profiles, you pull one, go tool pprof turns it into a report or a call graph you can read.
flowchart LR SVC["Running service<br/>(runtime/pprof + net/http/pprof)"] SVC -->|"--pprof mode"| FILE["mode.pprof file<br/>(written on shutdown)"] SVC -->|"/debug/pprof/..."| HTTP["HTTP profile download"] FILE --> TOOL["go tool pprof"] HTTP --> TOOL TOOL --> TOP["top / list (text)"] TOOL --> WEB["web / -http (call graph, flamegraph)"]
There are two independent mechanisms, controlled by two separate flags, and they’re easy to confuse:
| Flag | Type | Default | What it does |
|---|---|---|---|
--pprof <mode> | string slice | empty (off) | File-based profiler: writes <mode>.pprof to disk, flushed on shutdown |
--pprof-http | bool | on | Exposes net/http/pprof endpoints (/debug/pprof/...) on the HTTP mux |
Both flags are defined in go/common/servenv/servenv.go:
httpPprof: viperutil.Configure(reg, "pprof-http", viperutil.Options[bool]{ Default: true, FlagName: "pprof-http", ...}),pprofFlag: viperutil.Configure(reg, "pprof", viperutil.Options[[]string]{ Default: []string{}, FlagName: "pprof", ...}),They’re wired into every service’s boot during init — RegisterCommonHTTPEndpoints → HTTPRegisterPprofProfile → pprofInit — so any service built on the shared servenv package (multigateway, multipooler, multiorch, and the rest) gets profiling for free. See service anatomy for where this sits in the lifecycle.
3a. HTTP endpoints (--pprof-http)
Section titled “3a. HTTP endpoints (--pprof-http)”HTTPRegisterPprofProfile registers the standard handlers, gated on the flag:
func (sv *ServEnv) HTTPRegisterPprofProfile() { if !sv.httpPprof.Get() { return } sv.HTTPHandleFunc("/debug/pprof/", pprof.Index) sv.HTTPHandleFunc("/debug/pprof/cmdline", pprof.Cmdline) sv.HTTPHandleFunc("/debug/pprof/profile", pprof.Profile) sv.HTTPHandleFunc("/debug/pprof/symbol", pprof.Symbol) sv.HTTPHandleFunc("/debug/pprof/trace", pprof.Trace)}pprof.Index also serves the always-registered profiles (heap, allocs, goroutine, block, mutex, threadcreate) under /debug/pprof/<name>. What each endpoint means:
| Endpoint | Meaning | Blocks? |
|---|---|---|
/debug/pprof/profile?seconds=N | CPU profile over an N-second window | yes, ~N s |
/debug/pprof/heap | live (in-use) memory snapshot | no |
/debug/pprof/allocs | cumulative allocations since process start | no |
/debug/pprof/goroutine | goroutine stack dump (add ?debug=2 for full stacks) | no |
/debug/pprof/trace | runtime execution trace | yes |
# CPU profile (10-second window)go tool pprof http://<addr>/debug/pprof/profile?seconds=10
# Heap snapshotgo tool pprof http://<addr>/debug/pprof/heap3b. File-based profiler (--pprof <mode>)
Section titled “3b. File-based profiler (--pprof <mode>)”The --pprof flag drives a self-contained profiler in go/common/servenv/pprof.go, modeled on github.com/pkg/profile. The flag string maps to a mode, with comma-separated sub-flags. Each supported mode maps to a runtime API:
--pprof value | runtime call | default rate |
|---|---|---|
cpu | pprof.StartCPUProfile | — |
mem / mem=heap | runtime.MemProfileRate; pprof.Lookup("heap") | 4096 |
mem=allocs | runtime.MemProfileRate; pprof.Lookup("allocs") | 4096 |
mutex | runtime.SetMutexProfileFraction | 1 |
block | runtime.SetBlockProfileRate | 1 |
trace | trace.Start | — |
threads | pprof.Lookup("threadcreate") | — |
goroutine | pprof.Lookup("goroutine") | — |
# CPU profile to a temp dir, flushed on shutdownbin/multigateway --pprof cpu ...
# Heap profile with an explicit sampling rate and output dirbin/multigateway --pprof mem=heap,rate=4096,path=/tmp/mgprof ...
# Mutex contention, sample every eventbin/multipooler --pprof mutex,rate=1 ...
# Defer start until the first SIGUSR1bin/multigateway --pprof cpu,waitSig ...The lifecycle, from pprofInit:
- Profiling starts at init unless
waitSigis set. - A
SIGUSR1handler toggles profiling on/off live — send the signal once to stop, again to start, and so on. WithwaitSig, the firstSIGUSR1starts it. - A stop closure is registered on graceful shutdown (
OnTerm), so the.pproffile is written and flushed when the service exits cleanly.
# First signal starts profiling, second stops itkill -USR1 <pid>3c. Analyzing with go tool pprof
Section titled “3c. Analyzing with go tool pprof”go tool pprof /tmp/profileXXXX/cpu.pprofInside the interactive prompt: top (hottest functions by self time), top -cum (by cumulative time), list <Func> (annotated source with per-line cost), web (SVG call graph, needs Graphviz), peek <Func> (callers/callees). For memory profiles, -inuse_space (live) vs -alloc_space (cumulative) selects the sample type. You can also launch the browser UI:
go tool pprof -http=:8080 cpu.pprof4. Goroutine dumps & panics
Section titled “4. Goroutine dumps & panics”When a service hangs rather than crashes, you need to see what every goroutine is doing.
curl 'http://<addr>/debug/pprof/goroutine?debug=2'?debug=2 prints human-readable stacks; each goroutine shows its state (running, chan receive, sync.Mutex.Lock, select, IO wait) and how long it’s been blocked. Scan for many goroutines stuck on the same channel or lock — that’s your deadlock or contention point. The goroutine states, and why they park, map directly onto the concurrency module.
For a process with no HTTP endpoint (or one too wedged to serve it), send SIGQUIT:
kill -QUIT <pid> # dumps ALL goroutine stacks to stderr, then exitsThis is the same signal the Go runtime treats as “print every goroutine stack and abort” — exactly what you want from a hung process. The e2e setup here uses SIGQUIT deliberately as a “kill now, skip the graceful checkpoint” signal.
Panics
Section titled “Panics”An unrecovered panic prints the panicking goroutine’s stack and exits non-zero. The frame just below panic(...) is the actual failure site; frames above it are runtime machinery. When recovering, capture the stack at panic time with runtime/debug.Stack() — the codebase does this in its single-flight cache (stack := debug.Stack()) to attach the originating stack to a recovered panic, so the context isn’t lost after recover() unwinds.
5. Benchmarks with profiling
Section titled “5. Benchmarks with profiling”Benchmarks (testing.B, b.Loop(), b.ReportAllocs(); covered in testing) double as profiling drivers — the same -cpuprofile/-memprofile flags that work on tests work on benchmarks.
The canonical bench here pits the goyacc-generated SQL parser (ParseSQL, see parser, lexer & AST) against pg-query-go:
func BenchmarkMultigresParser(b *testing.B) { queries := loadPostgresTestQueries(b) b.ReportAllocs() for b.Loop() { // Go 1.24+ benchmark loop idiom for _, query := range queries { asts, err := ParseSQL(query) ... } }}b.ReportAllocs() adds allocs/op and B/op columns; for b.Loop() is the modern replacement for for i := 0; i < b.N; i++.
# Run just the parser benchmarks with allocation stats and profilesgo test -run=^$ -bench=BenchmarkMultigresParser -benchmem \ -cpuprofile cpu.out -memprofile mem.out ./go/common/parser/
# Then analyzego tool pprof cpu.outgo tool pprof -alloc_space mem.outWhole-cluster profiling under load
Section titled “Whole-cluster profiling under load”For realistic profiles you want the services profiled while serving traffic, not a microbenchmark. The e2e harness does this, gated by env vars:
RUN_BENCHMARKS=1 CAPTURE_PPROF=1 \SYSBENCH_CLIENTS=1,8,32 SYSBENCH_DURATION=60 \go test -v -timeout 30m -run TestSysBench \ ./go/test/endtoend/queryserving/benchmarking/RUN_BENCHMARKS=1— required; the test self-skips otherwise.CAPTURE_PPROF=1— capture a CPU profile from multigateway and the primary multipooler.CAPTURE_HEAP=1— also capture heap / allocs / goroutine snapshots.
Output lands as gzip-compressed protobuf under /tmp/.../pprof/<scenario>/<target>/cpu-<service>.pb.gz, readable directly by go tool pprof file.pb.gz. This is the real answer to “profile a live cluster”: pull profiles over the HTTP endpoints from running services while a load generator drives them.
6. Execution traces (go tool trace)
Section titled “6. Execution traces (go tool trace)”A trace records every scheduler event — goroutine start/stop, GC, syscalls, network blocking — giving a timeline that CPU profiles (which only sample) cannot. You can emit one two ways: --pprof trace (file profiler, writes trace.pprof) or the /debug/pprof/trace HTTP endpoint.
# File-based: capture, then open the trace UI in a browserbin/multigateway --pprof trace ... # writes trace.pprof on shutdowngo tool trace trace.pprof
# Over HTTP (5-second window):curl -o trace.out 'http://<addr>/debug/pprof/trace?seconds=5'go tool trace trace.outgo tool trace opens a web UI with goroutine analysis, scheduler latency, network/sync blocking profiles, and per-goroutine timelines. Reach for it when CPU/heap profiles say “time is being spent” but not why — e.g. goroutines blocked waiting on the scheduler, GC pauses stalling the query path, or lock convoys.
7. GODEBUG (generic runtime knob)
Section titled “7. GODEBUG (generic runtime knob)”GODEBUG is a Go runtime environment variable that toggles diagnostic output and behavior. It’s not specific to any codebase — you set it in the environment when you run any Go binary:
# Log every GC cycle: heap sizes, pause times, CPU fractionGODEBUG=gctrace=1 bin/multigateway ...
# Log scheduler state every 1000msGODEBUG=schedtrace=1000 bin/multipooler ...
# Add per-P detail to the scheduler traceGODEBUG=schedtrace=1000,scheddetail=1 bin/multipooler ...Other useful keys: asyncpreemptoff=1 (disable async preemption when investigating tight-loop hangs), madvdontneed=1, and inittrace=1 (package init timing). These are diagnostic aids you layer on top of any binary; they pair well with go tool trace and GC-related heap profiles.
Checkpoints
Section titled “Checkpoints”You understand this page when you can:
- Explain why
dlvis a generic external tool here, and rundlv debug ./go/cmd/multigateway/dlv test ./go/common/parser/. - Run
make test-raceand, given aWARNING: DATA RACEreport, point to the write stack, the conflicting read/write stack, and the goroutine-creation stack — and say which synchronization fix applies. - Name the two profiling flags (
--pprof <mode>file-based,--pprof-httpHTTP endpoints), say which is on by default, and where they’re wired. - Map each
--pprofmode to itsruntimecall and know the default rates (mem 4096, mutex 1, block 1). - Capture a CPU profile both ways and analyze it with
go tool pprof(top,list). - Get a goroutine dump from a hung service via
/debug/pprof/goroutine?debug=2orSIGQUIT. - Profile the parser benchmark with
-cpuprofile/-memprofile, and explain why there’s nomake bench.
Exercises
Section titled “Exercises”- Read a race. Run
make test-race(go test -short -v -race ./…). If a race surfaces, identify the read stack, the write stack, and the “created at” stack; map each frame to afile:lineand describe the fix in terms of the memory model module. - File profiler. Build multigateway and start it with
—pprof cpu, then again with—pprof mem=heap,rate=4096. Find the printed.pprofpath in the startup log, then rungo tool pprof <file>and trytopandlist. - HTTP profiler. Confirm
—pprof-httpis enabled for your build, hit/debug/pprof/in a browser, and pullgo tool pprof http://<addr>/debug/pprof/profile?seconds=10. Compare withgo tool pprof http://<addr>/debug/pprof/heap. - Parser benchmark. Run
go test -run=^$ -bench=BenchmarkMultigresParser -benchmem -cpuprofile cpu.out -memprofile mem.out ./go/common/parser/, then analyzecpu.out. RunBenchmarkPgQueryGotoo and compareallocs/op. - Trace. Start a service with
—pprof trace, exercise it, shut it down, then opentrace.pprofwithgo tool traceand inspect the scheduler-latency and goroutine views. - Live toggle. Start a service with
—pprof cpu,waitSig, sendkill -USR1 <pid>to start, exercise it, sendSIGUSR1again to stop; confirm the file only appears after profiling stops. - Goroutine dump. Capture
curl ‘http://<addr>/debug/pprof/goroutine?debug=2’from a service and identify which goroutines are parked on channels vs. mutexes. Then sendSIGQUITto a process and read the stack dump it prints.
Continue to modules, deps & codegen for managing dependencies and generated files — the parser benchmark above measures generated parser code. Or head back to the tooling track start.
See also
Section titled “See also”- testing workflow —
make test-raceand flaky-test detection. - build & make — Makefile targets and building the binaries you attach
--pprofto. - sync & the memory model — happens-before and the data-race definition behind every
-racereport. - concurrency — the goroutine model behind goroutine dumps.
- testing —
testing.B,b.Loop(),b.ReportAllocs(),-bench/-benchmem. - context —
context.WithTimeoutused by the profile-capture harness. - service anatomy — the lifecycle where
pprofInit/HTTPRegisterPprofProfilehook in. - parser, lexer & AST — the goyacc parser the benchmark measures.
- glossary — pprof, race detector, goroutine dump, GODEBUG.
- idioms & gotchas — profiling and race gotchas.