Synchronization & the Memory Model
Go gives you two ways to coordinate goroutines: channels (hand off ownership of data) and shared memory protected by synchronization (mutexes, atomics). Unlike Java or C#, Go has no synchronized keyword, no implicit per-object monitor, and no lock-by-default. Nothing protects a field unless you write the protection yourself.
The flip side is that the rules for when a write becomes visible are precise and worth internalizing early — getting them wrong is undefined behavior, not just a stale read. Throughout we’ll point at a real distributed-systems codebase, multigres (“Vitess for Postgres”), where every one of these primitives shows up in production code.
This chapter builds on concurrency (goroutines and channels) and pointers, values & memory (why these types travel by pointer).
The data race, and why it is undefined behavior
A data race is: two goroutines access the same memory location, at least one of them writes, and there is no synchronization ordering the accesses. That is the whole definition. It does not require the accesses to be “at the same time” in any wall-clock sense.

    // RACE: two goroutines, one writes, no synchronization.
    var counter int
    go func() { counter++ }() // read-modify-write
    go func() { counter++ }() // read-modify-write

It is tempting to think the worst case is “we lose an increment.” In Go you should treat a data race as undefined behavior. The compiler and CPU are allowed to reorder, cache, and tear a write: split a multi-word write so another goroutine sees half-old, half-new bytes. A racing program can observe values that no sequential interleaving could ever produce, or crash outright. So do not reason about racy code as “eventually consistent”; reason about it as “broken.”
The practical tool is the race detector, run via the test toolchain:
    go test -short -v -race ./...

-race instruments every memory access and reports the two stacks involved when an unsynchronized access pair occurs. It only catches races on code paths that actually execute, so it is necessary but not sufficient: it cannot prove the absence of races. Run it under realistic concurrency. How to read its output lives in tooling/debugging & profiling and tooling/testing workflow.
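As a minimal sketch of the repair, the racy counter above can be made race-free with a mutex around the increment and a WaitGroup for the join. The names countSafely and workers are illustrative assumptions, not taken from the codebase; run under -race, this version should produce no report.

```go
package main

import (
	"fmt"
	"sync"
)

// countSafely starts `workers` goroutines that each increment a shared
// counter once. The mutex orders the increments; the WaitGroup orders the
// final read after every increment. (Hypothetical helper for illustration.)
func countSafely(workers int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // read-modify-write, now inside the critical section
			mu.Unlock()
		}()
	}
	wg.Wait() // every Done happens-before Wait returns
	return counter
}

func main() {
	fmt.Println(countSafely(1000))
}
```

Removing the Lock/Unlock pair brings the race back, and the race detector reports it as long as the test or program actually runs that path.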
Happens-before: the only thing the memory model guarantees
The memory model is a set of “happens-before” edges. A write W is guaranteed visible to a read R only if there is a chain of happens-before edges from W to R. The edges you will use in practice:
| Primitive | The edge it creates |
|---|---|
| Mutex | An Unlock happens-before any subsequent Lock of the same mutex. |
| Atomics | An atomic Store happens-before any atomic Load that observes that stored value. |
| Channels | A send happens-before the corresponding receive completes; a close happens-before a receive that returns zero because the channel is closed. |
| Once | The completion of f in once.Do(f) happens-before the return of any once.Do(...) call on that Once. |
| WaitGroup | The Done calls happen-before the Wait that they release. |
| Goroutine start | The go statement happens-before the goroutine begins. |
Everything else is unordered. Two writes with no connecting edge can be seen in either order, or not at all. This is why you cannot “just” set a flag in one goroutine and spin-read it in another — without an edge, the reader may never see the new value.
Concretely, a mutex hands a write across goroutines like this: the unlock on the writer is what makes its store visible to whoever locks next.
    sequenceDiagram
        participant W as Writer goroutine
        participant M as sync.Mutex
        participant R as Reader goroutine
        W->>M: Lock()
        W->>W: counter = 42
        W->>M: Unlock()
        Note over M: Unlock happens-before<br/>the next Lock
        R->>M: Lock()
        R->>R: read counter (sees 42)
        R->>M: Unlock()
Every primitive in this chapter exists to create one of these edges. The glossary has the formal phrasings of happens-before, data race, and CAS.
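To make the edge idea concrete, here is a small sketch that replaces a spin-read flag with a channel close. The function name publish is a hypothetical helper; the point is that the plain write to result is ordered before the read only because the close and the receive connect them.

```go
package main

import "fmt"

// publish writes a value in one goroutine and reads it in another.
// The close(done) happens-before the receive that observes the closed
// channel, so the read of result is ordered after the write.
func publish() int {
	var result int
	done := make(chan struct{})

	go func() {
		result = 42 // plain, unsynchronized-looking write
		close(done) // creates the happens-before edge
	}()

	<-done        // returns only once the channel is closed
	return result // ordered after the write via the edge above
}

func main() {
	fmt.Println(publish())
}
```

If the receive on done were replaced by a loop polling an ordinary bool, there would be no edge at all and the program would contain a data race.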
sync.Mutex: embed it, protect named fields
The dominant pattern in Go is: put an unexported mu sync.Mutex as a field, document exactly which fields it guards, and have every method Lock/defer Unlock around them. The zero value of a Mutex is an unlocked, ready-to-use mutex: there is no constructor and no initialization step.

    type Counter struct {
        mu sync.Mutex // guards n
        n  int
    }

    func (c *Counter) Inc()       { c.mu.Lock(); defer c.mu.Unlock(); c.n++ }
    func (c *Counter) Value() int { c.mu.Lock(); defer c.mu.Unlock(); return c.n }

defer c.mu.Unlock() immediately after Lock() is the idiom: the unlock survives early returns and panics, so you cannot accidentally leave the mutex held. Note the receiver is *Counter, not Counter: a method on a value receiver would lock a copy of the mutex, protecting nothing (more on copying below).
The real thing
A connection-state struct in the codebase is the clean whole-file case study. It leads with the mutex and a one-line contract, then every accessor follows the same shape:

    // All methods are thread-safe.
    type ConnectionState struct {
        // mu protects all mutable fields in this struct.
        mu sync.Mutex

        User               string
        Settings           *Settings
        PreparedStatements map[string]*query.PreparedStatement
    }

    func (s *ConnectionState) GetUser() string {
        if s == nil {
            return ""
        }
        s.mu.Lock()
        defer s.mu.Unlock()
        return s.User
    }

Three things to absorb:
- The comment names what mu guards. “mu protects all mutable fields in this struct” is not decoration: it is the contract that tells a future reader that touching User, Settings, or PreparedStatements without holding mu is a bug. The pairing of lock and data is the only thing keeping the invariant; Go will not check it for you.
- The nil-receiver guard comes before the lock. if s == nil { return "" } runs first. Reverse the order and s.mu.Lock() would dereference a nil pointer and panic. This guard recurs in nearly every method so callers holding a possibly-nil *ConnectionState can call methods uniformly.
- Map mutation is protected too. Go maps are not safe for concurrent write/read. The runtime detects concurrent map writes on a best-effort basis and crashes the process with fatal error: concurrent map writes, which you will often see without even needing -race. The mutex is what makes the map safe.
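The three points above can be sketched in a reduced form. The registry type, its fields, and the fill helper are hypothetical stand-ins, not the codebase's ConnectionState; the sketch only shows the shape: documented mutex, nil guard first, map touched only under the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a struct like ConnectionState:
// one mutex, documented as guarding every mutable field.
type registry struct {
	mu    sync.Mutex // protects stmts
	stmts map[string]string
}

// Put stores a statement under name. The nil check runs before any field access.
func (r *registry) Put(name, sql string) {
	if r == nil {
		return
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.stmts == nil {
		r.stmts = make(map[string]string)
	}
	r.stmts[name] = sql
}

// Len reports how many statements are stored; a nil receiver reads as empty.
func (r *registry) Len() int {
	if r == nil {
		return 0
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.stmts)
}

// fill writes n distinct keys from n goroutines and returns the final size.
func fill(n int) int {
	r := &registry{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.Put(fmt.Sprintf("stmt%d", i), "SELECT 1")
		}(i)
	}
	wg.Wait()
	return r.Len()
}

func main() {
	var missing *registry // methods are still callable on a nil pointer
	fmt.Println(fill(100), missing.Len())
}
```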
Narrow the critical section: hold the lock only around the mutation
defer Unlock is the safe default, but sometimes you want the lock held for as short a window as possible: long enough to mutate the shared fields, then released before slow or blocking work. An action-lock helper does exactly this: the semaphore (a potentially blocking call) is taken outside the mutex, and the mutex is held only to bump two counters.

    // Try to acquire the semaphore (may block).
    if err := al.sema.Acquire(ctx, 1); err != nil {
        return ctx, mterrors.Wrap(err, "failed to acquire action lock")
    }

    // Generate a unique ID for this acquisition.
    al.mu.Lock()
    lockID := al.nextID
    al.nextID++
    al.currentID = lockID
    al.mu.Unlock()

Release follows the same discipline: it reads currentID under the lock, validates it outside the lock, then re-takes the lock to clear state before releasing the semaphore last.
Note al.sema is a *semaphore.Weighted from golang.org/x/sync/semaphore — an external module, not the standard library. There is no sync.Semaphore. The struct here legitimately mixes three coordination tools at once: a mutex for the counters, the weighted semaphore for mutual exclusion of the action, and an atomic.Bool (covered below) for the released flag.
sync.RWMutex: only when reads vastly dominate
RWMutex adds RLock/RUnlock for readers that may proceed concurrently with each other, while Lock/Unlock give a writer exclusive access. Use it only for read-mostly state.

A server-environment struct uses one for readiness checks, keeping a plain Mutex for one-shot init state and a separate RWMutex only for the ready-check slice:

    mu           sync.Mutex
    inited       bool
    listeningURL url.URL
    readyMu      sync.RWMutex
    readyChecks  []func() error

The reason: the /ready HTTP endpoint can be polled frequently by many load balancers and orchestrators at once, and each handler only reads the slice. The writer takes the full Lock to append; the readers take RLock:

    sv.readyMu.RLock()
    checks := sv.readyChecks
    sv.readyMu.RUnlock()
    for _, check := range checks {
        // ... call each check without holding the lock
    }

Notice the reader copies the slice header out under RLock and then runs the (potentially slow) checks with the lock released. This is the narrow-critical-section discipline again.
The naming convention readyMu next to readyChecks is deliberate: the lock name echoes the data it guards so the pairing is obvious at the field level.
sync.Once: run something exactly once
Once.Do(f) runs f exactly once across all goroutines, ever. Concurrent callers block until the first f returns, then return without re-running it. The zero value is ready to use.

    var (
        once     sync.Once
        instance *Client
    )

    func Get() *Client {
        once.Do(func() { instance = newClient() })
        return instance
    }

In the codebase, a pooler record uses a Once to start a background publisher goroutine exactly once even if Register is called repeatedly:

    func (r *poolerRecord) Register(parent context.Context, alarm func(string)) {
        r.registerOnce.Do(func() {
            ctx, cancel := context.WithCancel(parent)
            r.publisherMu.Lock()
            r.publisherCancel = cancel
            r.publisherMu.Unlock()
            r.publisherWG.Go(func() { r.runPublisher(ctx) })
            // ... kick off initial registration retry loop
        })
    }

The doc comment for Register says “Idempotent.” The Once is what makes that true. Without it, a second Register would spawn a second publisher goroutine and a second cancel function, leaking the first. The happens-before guarantee matters here too: everything the Do body did (storing publisherCancel, launching the goroutine) is visible to any later caller whose Do returns immediately.
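A small sketch that checks the exactly-once property under contention. The starter type and startConcurrently helper are hypothetical; the counter exists only to observe how many times the Do body ran.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// starter is a hypothetical type whose Start must be idempotent.
type starter struct {
	once   sync.Once
	starts atomic.Int64 // counts how many times the body actually ran
}

// Start may be called from any number of goroutines; the body runs once.
func (s *starter) Start() {
	s.once.Do(func() { s.starts.Add(1) })
}

// startConcurrently calls Start from n goroutines and reports the run count.
func startConcurrently(n int) int64 {
	s := &starter{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Start()
		}()
	}
	wg.Wait()
	return s.starts.Load()
}

func main() {
	fmt.Println(startConcurrently(50))
}
```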
sync.WaitGroup: fan out, then join
A WaitGroup counts outstanding goroutines. Add(n) raises the counter, each goroutine calls Done() (usually deferred) to decrement, and Wait() blocks until the counter reaches zero.
Classic form
A recovery loop is the textbook fan-out/join: process each shard’s problems in parallel, then wait for all of them.

    var wg sync.WaitGroup
    for _, shardProblems := range problemsByShard {
        wg.Add(1)
        go func(problems []types.Problem) {
            defer wg.Done()
            re.processShardProblems(ctx, problems[0].ShardKey, problems)
        }(shardProblems)
    }
    wg.Wait()

The func(problems []types.Problem) { ... }(shardProblems) passes the loop variable as an argument. Before Go 1.22 the loop variable was shared across iterations, so capturing shardProblems by closure could leave every goroutine looking at the last value; passing it as an argument was the fix. As of Go 1.22 (for modules whose go.mod declares go 1.22 or later) each iteration gets a fresh variable, so this is now belt-and-suspenders rather than strictly required. It is still perfectly readable, so leaving it is fine.
Modern form: wg.Go (Go 1.25)
Go 1.25 added WaitGroup.Go, which fuses Add(1), go, and Done() into one call:

    r.publisherWG.Go(func() {
        r.runPublisher(ctx)
    })

This is equivalent to wg.Add(1); go func() { defer wg.Done(); r.runPublisher(ctx) }() but much harder to misuse: you cannot forget the Done, and the Add is guaranteed to precede the goroutine. Shutdown then calls r.publisherWG.Wait() after cancelling the context, so it blocks until the publisher has fully stopped. Prefer wg.Go for new code; recognize the classic form because it is still everywhere.
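Here is a runnable fan-out/join sketch. It uses the classic Add/Done form on purpose so that it also builds on toolchains older than Go 1.25; the squareAll function is a made-up example, not code from the codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll squares each input in its own goroutine and joins with Wait.
// Each goroutine writes only its own slot of out, so no mutex is needed;
// the Done/Wait edge makes every write visible after Wait returns.
func squareAll(in []int) []int {
	out := make([]int, len(in))
	var wg sync.WaitGroup
	for i, v := range in {
		wg.Add(1) // Add happens in the spawning goroutine, before `go`
		go func(i, v int) {
			defer wg.Done()
			out[i] = v * v
		}(i, v)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4}))
}
```

On Go 1.25 or later the loop body could shrink to a single wg.Go call with the same behavior.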
sync/atomic: lock-free single-word access
The sync/atomic package provides operations that load, store, and read-modify-write a single memory word atomically, without a lock. Since Go 1.19 it offers typed wrappers (atomic.Int64, atomic.Int32, atomic.Uint64, atomic.Bool, atomic.Pointer[T]) with methods .Load, .Store, .Swap, and .CompareAndSwap, plus .Add on the integer types. The zero value is ready (it reads as zero, false, or nil).

    type Stats struct {
        hits atomic.Int64
    }

    func (s *Stats) Hit()         { s.hits.Add(1) }
    func (s *Stats) Count() int64 { return s.hits.Load() }

Expose atomics through methods, never the field

A connection pool makes its whole metrics struct out of atomic counters, unexported and reachable only through Load() accessors:

    type Metrics struct {
        maxLifetimeClosed atomic.Int64
        getCount          atomic.Int64
        waitCount         atomic.Int64
        waitTime          atomic.Int64
        // ...
    }

    func (m *Metrics) GetCount() int64         { return m.getCount.Load() }
    func (m *Metrics) WaitTime() time.Duration { return time.Duration(m.waitTime.Load()) }
    // ... one Load() accessor per field

Note WaitTime stores nanoseconds as an int64 and wraps the load in time.Duration. A time.Duration is an int64, so it travels through an atomic.Int64 cleanly. Each counter is independent, so atomics are the right tool: no two of them need to change as a unit.
Mutex for grouped invariants, atomics for independent counters
The contrast is sharp in a heartbeat reader:

    lagMu          sync.Mutex
    lastKnownLag   time.Duration
    lastKnownTime  time.Time
    lastKnownError error

    reads      atomic.Int64
    readErrors atomic.Int64

reads and readErrors are standalone counters, so they are atomics. But lastKnownLag, lastKnownTime, and lastKnownError form a triple that must be consistent together: the lag value, the time it was measured, and any error must all reflect the same heartbeat read. So they share one Mutex.
Compare-and-swap loops
CompareAndSwap(old, new) writes new only if the current value still equals old, returning whether it succeeded. It is the building block for lock-free read-modify-write: load the old value, compute the new one, try to swap, and retry if someone else changed it underneath you.

A demand tracker keeps a concurrent maximum this way, with no mutex at all:

    for {
        old := d.buckets[currentIdx].Load()
        if sampled <= old {
            break
        }
        if d.buckets[currentIdx].CompareAndSwap(old, sampled) {
            break
        }
    }

Read this carefully. If sampled is not bigger than what is there, we are done. Otherwise we attempt to store sampled, but only if the bucket still holds the old we read. If another goroutine bumped it in between, the CAS fails, we loop, re-Load the new old, and re-decide. Why not just if sampled > old { Store(sampled) }? Because between the Load and the Store, another goroutine could store a larger value, which our Store would then clobber, losing the higher max. The CAS loop closes that window.
atomic.Pointer[T]: lock-free read-mostly snapshots (copy-on-write)
atomic.Pointer[T] swaps a whole pointer atomically. Combined with treating the pointed-to value as immutable after publication, it gives you a read-lock-free alternative to RWMutex: readers Load() the pointer and use the value with no lock; a writer builds a brand-new value and Stores the new pointer (copy-on-write). Nobody ever mutates the value in place.
A cancel manager shows this next to a Mutex doing the opposite, in the same struct:
    // prefixCache maps PID prefix to gateway gRPC address. Replaced atomically
    // on cache miss or periodic refresh; reads are lock-free.
    prefixCache atomic.Pointer[map[uint32]string]

    // clientsMu protects clients.
    clientsMu sync.Mutex
    clients   map[string]*gatewayConn

The contrast is the whole lesson:

- prefixCache is a swap-whole-snapshot cache. Readers Load() it without locking. The map it points to is never mutated after being stored: a refresh builds a fresh map and stores its pointer.
- clients is mutated in place (entries added on demand), so it needs the clientsMu mutex.
The same technique drives the lock-free accessors on the pooler record’s desired atomic.Pointer[...] field:
    func (r *poolerRecord) Type() clustermetadatapb.PoolerType { return r.desired.Load().Type }
    func (r *poolerRecord) Hostname() string                   { return r.desired.Load().Hostname }

Mutation goes through a Mutate method, which clones the proto, applies changes to the clone, then stores the new pointer, never touching the published value.
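A minimal copy-on-write sketch along these lines. Everything named here (snapshot, record, writeMu, newRecord) is hypothetical, and a plain struct copy stands in for cloning a proto. The writer-side mutex is one way to keep concurrent Mutate calls from overwriting each other's changes; how the codebase serializes its writers is not shown in the excerpt.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// snapshot is a hypothetical value type; it is never modified after Store.
type snapshot struct {
	Hostname string
	Port     int
}

type record struct {
	writeMu sync.Mutex               // serializes writers only; readers never take it
	desired atomic.Pointer[snapshot] // current immutable snapshot
}

func newRecord(s snapshot) *record {
	r := &record{}
	r.desired.Store(&s)
	return r
}

// Hostname is a lock-free read of the current snapshot.
func (r *record) Hostname() string { return r.desired.Load().Hostname }

// Port is a lock-free read of the current snapshot.
func (r *record) Port() int { return r.desired.Load().Port }

// Mutate copies the current snapshot, applies fn to the copy, and publishes it.
func (r *record) Mutate(fn func(*snapshot)) {
	r.writeMu.Lock()
	defer r.writeMu.Unlock()
	next := *r.desired.Load() // clone; the published value stays untouched
	fn(&next)
	r.desired.Store(&next)
}

func main() {
	r := newRecord(snapshot{Hostname: "pooler-1", Port: 5432})
	old := r.desired.Load() // a reader holding the previous snapshot
	r.Mutate(func(s *snapshot) { s.Port = 6432 })
	fmt.Println(old.Port, r.Port(), r.Hostname())
}
```

A reader that loaded the old pointer keeps seeing a consistent old value, which is exactly why nothing may write through a loaded pointer.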
CompareAndSwap(nil, ...) as a do-this-once guard
A pool uses a close atomic.Pointer[chan struct{}] both as an open/closed flag and as a one-shot guard:

    func (pool *Pool[C]) open() {
        closeChan := make(chan struct{})
        if !pool.close.CompareAndSwap(nil, &closeChan) {
            // already open
            return
        }
        // ... first opener proceeds
    }

If close is still nil, we atomically install our close channel and proceed; if some other goroutine got there first, the CAS fails and we bail out. This is a sync.Once-like guard built from a single atomic, with the bonus that the stored value (the channel) is what the rest of the code uses to signal shutdown.
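The guard can be exercised in isolation. In this sketch the pool type is a hypothetical reduction, and open returns a bool (an assumption added here) so that the number of winners can be counted.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool is a hypothetical reduction: a nil pointer means "not open yet".
type pool struct {
	close atomic.Pointer[chan struct{}]
}

// open reports whether this call was the one that opened the pool.
func (p *pool) open() bool {
	closeChan := make(chan struct{})
	return p.close.CompareAndSwap(nil, &closeChan)
}

// countOpeners races n goroutines on open and counts how many won.
func countOpeners(n int) int64 {
	p := &pool{}
	var winners atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if p.open() {
				winners.Add(1)
			}
		}()
	}
	wg.Wait()
	return winners.Load()
}

func main() {
	fmt.Println(countOpeners(20))
}
```

One difference from sync.Once is worth keeping in mind: a caller that loses the CAS returns immediately and does not wait for the winner to finish its setup.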
Never copy a struct that contains a sync or atomic type
This deserves its own section because it is the most common Go concurrency bug after the bare data race.

    type Counter struct {
        mu sync.Mutex
        n  int
    }

    func bad(c Counter) { c.mu.Lock() } // c is a COPY: this locks a different mutex

Copying a sync.Mutex, RWMutex, WaitGroup, Once, or any atomic.* value duplicates its internal state and silently breaks it: two copies of a mutex protect nothing, a copied WaitGroup loses its counter, a copied atomic is a separate variable. go vet’s copylocks analyzer flags this statically, before the code ever runs (and golangci-lint runs it; see tooling/lint & format).
This is why these structs almost always travel by pointer. Every connection-state method has a pointer receiver; the demand tracker, pooler record, pool, reader, and cancel manager are all passed and stored as pointers. It is also why the demand tracker indexes its []atomic.Int64 in place instead of ranging over copies. See pointers, values & memory for the deeper treatment of receiver choice and copylocks.
Lock ordering: preventing the AB/BA deadlock
When one goroutine holds lock A and waits for B while another holds B and waits for A, both block forever. The defense is discipline: whenever code holds nested locks, always acquire them in the same documented order everywhere.
A discovery struct writes the hierarchy straight into the comments:
    // State (protected by mu)
    // Lock order: acquire this BEFORE CellPoolerDiscovery.mu
    mu              sync.Mutex
    cellWatchers    map[string]*CellPoolerDiscovery
    lastCellRefresh time.Time

    // Listeners for pooler changes (protected by listenersMu)
    listenersMu sync.Mutex
    listeners   []PoolerChangeListener

The parent discovery’s mu must be taken before any child’s mu. Because every code path obeys the same order, the cycle that causes a deadlock can never form. There is no language feature enforcing this. The comment is the enforcement, so it is your job to keep it true.
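A sketch of a parent/child pair that follows one documented order. The parent and child types, their fields, and the methods are all hypothetical; the only thing carried over from the excerpt is the rule itself: parent lock first, then child lock.

```go
package main

import (
	"fmt"
	"sync"
)

// child and parent are hypothetical types.
// Lock order: parent.mu BEFORE child.mu, in every code path.
type child struct {
	mu      sync.Mutex // protects poolers
	poolers int
}

type parent struct {
	mu       sync.Mutex // protects watchers; acquire BEFORE any child.mu
	watchers map[string]*child
}

// addPooler follows the documented order: parent lock first, then child lock.
func (p *parent) addPooler(cell string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	c, ok := p.watchers[cell]
	if !ok {
		c = &child{}
		p.watchers[cell] = c
	}
	c.mu.Lock()
	c.poolers++
	c.mu.Unlock()
}

// total also takes parent.mu first, so it cannot form a cycle with addPooler.
func (p *parent) total() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	sum := 0
	for _, c := range p.watchers {
		c.mu.Lock()
		sum += c.poolers
		c.mu.Unlock()
	}
	return sum
}

// run adds n poolers concurrently (reading totals as it goes) and returns the final total.
func run(n int) int {
	p := &parent{watchers: make(map[string]*child)}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			p.addPooler(fmt.Sprintf("cell%d", i%3))
			_ = p.total()
		}(i)
	}
	wg.Wait()
	return p.total()
}

func main() {
	fmt.Println(run(30))
}
```

A method that locked a child first and then reached back for parent.mu would reintroduce the AB/BA cycle, and nothing in the compiler would object.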
Channels vs. mutexes: which to reach for
Go’s slogan is “Do not communicate by sharing memory; instead, share memory by communicating.” That is guidance, not law. Both tools are first-class, and real code uses both, often in the same struct.
Use channels when you are handing off ownership of a value, building a pipeline, or signaling an event. The publisher in the pooler record uses a size-1 buffered wakeup chan struct{} to signal “there is work to publish” without accumulating duplicate signals — a non-blocking send schedules at most one pending wakeup. The data itself lives behind an atomic.Pointer. So the signal is a channel and the state is an atomic, side by side.
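The coalescing wakeup signal can be sketched in a few lines. The notifier type and its methods are hypothetical names; the technique is the one described above, a buffered channel of size 1 combined with a non-blocking send.

```go
package main

import "fmt"

// notifier is a hypothetical wrapper around a size-1 wakeup channel.
type notifier struct {
	wakeup chan struct{}
}

func newNotifier() *notifier {
	return &notifier{wakeup: make(chan struct{}, 1)}
}

// notify schedules a wakeup without blocking. If one is already pending the
// send is skipped, so repeated calls coalesce into a single signal.
func (n *notifier) notify() {
	select {
	case n.wakeup <- struct{}{}:
	default: // a wakeup is already queued
	}
}

// pending reports how many wakeups are queued (always 0 or 1).
func (n *notifier) pending() int { return len(n.wakeup) }

func main() {
	n := newNotifier()
	n.notify()
	n.notify()
	n.notify()
	fmt.Println(n.pending()) // three notifies, one pending wakeup
	<-n.wakeup               // the consumer drains it and goes to work
	fmt.Println(n.pending())
}
```

The consumer would normally receive from wakeup inside a select alongside ctx.Done(), then Load the latest state from the atomic pointer.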
Use a mutex (or atomic) + fields when you are protecting shared mutable state with a simple invariant: a counter, a map, a cache, a small consistent group of fields. Forcing that through a channel (a goroutine that owns the state and serves requests over a channel) would be more code and slower for no benefit.
Rules of thumb:
| Access pattern | Reach for |
|---|---|
| One owner producing values for others to consume | channel |
| Many goroutines updating one independent counter | atomic |
| Many goroutines reading/writing a shared map or a group of fields with an invariant | mutex |
| Many readers, rare writers, value treatable as immutable | atomic.Pointer copy-on-write |
| “Do this exactly once” | sync.Once (or CompareAndSwap(nil, ...)) |
There is no purity test here. The skill is matching the primitive to the access pattern — exactly what a single struct combining Once + WaitGroup + atomic.Pointer + Mutex demonstrates.
Checkpoints
- Why is a Go data race “undefined behavior” rather than just a possibly-stale read?

  Answer: Because the compiler and CPU may reorder, cache, and tear (split) memory operations when no happens-before edge constrains them. A racing program can observe values no sequential interleaving could produce, or crash. You must not reason about it as “eventually consistent”. Only proper synchronization makes it safe, and -race helps you find the places where it is missing.

- In the connection-state struct, why must if s == nil { return ... } come before s.mu.Lock()?

  Answer: s.mu is a field on the pointer s. If s is nil, evaluating s.mu.Lock() dereferences a nil pointer and panics. The nil guard lets callers invoke methods on a possibly-nil pointer uniformly, so it must run before any field access, including the lock.

- Why does the demand tracker use a CAS loop instead of if sampled > old { bucket.Store(sampled) }?

  Answer: Between the Load of old and the Store, another goroutine could store an even larger value; a plain Store would clobber it and lose the higher max. CompareAndSwap(old, sampled) only writes if the bucket is still old, retrying otherwise, so concurrent updates cannot lose the maximum.

- When is a single atomic.Pointer the right choice over three separate atomics for a lag/time/error triple?

  Answer: Whenever the alternative is three separate atomics. Those are never correct for the triple: they are not atomic together, so a reader could mix fields from different reads. Either use one mutex, or store all three inside one immutable struct behind a single atomic.Pointer and swap the whole struct copy-on-write. Both make the triple change as one unit.
Exercises
- In a thread-safe struct of your own, list every method that takes the lock and confirm each uses a pointer receiver. Then explain why returning a *Settings pointer is not sufficient to safely mutate that Settings afterward, and how a Clone under the lock avoids the problem.
- Take the CAS loop above. On paper, trace two concurrent callers where a naive if sampled > old { Store } loses the higher value, then explain why the bucket slice is []atomic.Int64 indexed in place rather than ranged over.
- Find two uses of atomic.Pointer[T] in any concurrent codebase. For each, decide whether the pointed-to value is treated as immutable after Store, and describe the bug that appears if a reader mutated the loaded value in place.
- Compare the two WaitGroup styles: classic Add(1)/go/defer Done()/Wait() versus wg.Go(...). State in one sentence what wg.Go fuses, and one footgun it removes.
- Explain why a lag/time/error triple shares one mutex while independent counters are atomic.Int64. Construct a concrete interleaving where replacing the mutex with three atomics lets a reader observe an inconsistent triple.
- Read a test-race target in any Makefile. Explain what -race instruments, why it is necessary-but-not-sufficient, and why the target often combines -race with -short.
See also
- Concurrency: goroutines and channels; the channels-vs-mutexes guidance ties back here.
- Context: an atomic.Bool released flag carried as a context value, and cancellation driving a goroutine’s lifecycle.
- Pointers, Values & Memory: why mutex/atomic-containing structs must use pointer receivers and must not be copied (copylocks).
- Types, Structs & Methods: method-set and receiver-type background for the accessor-method-over-atomic-field idiom.
- Debugging & Profiling: the race detector in practice.
- Lint & Format: go vet / copylocks.
- Idioms & Gotchas: copylocks, lock ordering, copy-on-write atomic.Pointer.
- Glossary: happens-before, data race, CAS.