Pointers, Values & Memory
Go has no hidden references and no manual memory management, but it does have a precise model of what gets copied when. Getting that model right is the difference between code that’s fast and correct and code that mysteriously aliases, leaks, or corrupts data. We’ll build the model from the ground up — value semantics, then the “reference-like” header types (slices, maps, strings) — using a real distributed-systems codebase, multigres (“Vitess for Postgres”), whose SQL lexer, parser, and wire codec lean hard on these details for performance.
If you haven’t met defined types and value-vs-pointer receivers yet, Types, Structs & Methods covers them. New to the repo? Start at the orientation.
Value semantics: every assignment is a copy
Go has no hidden references. Every assignment and every function argument is a copy of the value. For a struct, “the value” is all its fields laid out contiguously, so copying a struct copies every field.
```go
type Point struct{ X, Y int }

func move(p Point) { p.X = 99 } // mutates a COPY

func main() {
	a := Point{1, 2}
	b := a  // b is an independent copy
	b.X = 5 // a.X is still 1
	move(a) // a.X is still 1
}
```

This is the same model as C structs or C# structs, not Java/Python objects, where a variable holds a reference. To mutate the caller’s value, or to avoid copying a large struct, you pass a pointer.
```go
func move(p *Point) { p.X = 99 } // p.X is shorthand for (*p).X

a := Point{1, 2}
move(&a) // a.X is now 99
```

Why this matters for receivers
A method’s receiver follows the same rules. A value receiver operates on a copy; a pointer receiver operates on the original. The codebase makes this distinction deliberately. In its sqltypes package, Value uses value receivers. Value is a slice type, so a value is just a small 3-word header (more below). Result and Row are larger structs that also need nil-checking, so they use pointer receivers:
```go
// Value represents a nullable column value.
// nil means NULL, []byte{} means empty string.
type Value []byte

func (v Value) IsNull() bool {
	return v == nil
}

func (r *Result) ToProto() *query.QueryResult {
	if r == nil {
		return nil
	}
	/* ... */
}

func (r *Row) ToProto() *query.Row {
	if r == nil {
		return nil
	}
	/* ... */
}
```

Value has a value receiver for two reasons. Copying it is cheap, because it copies the header and not the bytes. And a Value is meant to behave like a primitive.

*Result and *Row are pointer receivers for two reasons:

1. They are larger structs you don’t want to copy on every call.
2. The methods need to handle a nil receiver gracefully. The pattern if r == nil { return nil } is legal on a pointer receiver (see Errors for nil-receiver methods).

A value receiver of struct type can never be nil. Calling such a method through a nil *Result would panic on the implicit dereference before the method body runs. Value is different only because its underlying type is a slice, whose zero value is nil, and that is exactly what IsNull checks.
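Here is a minimal, self-contained sketch of the nil-receiver difference. Row, Len, and Width are hypothetical stand-ins, not the real sqltypes API.

- The pointer-receiver method Len can run on a nil *Row and check for it.
- The value-receiver method Width, called through the same nil pointer, panics while dereferencing to build its copy.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for a larger struct such as sqltypes.Row.
type Row struct {
	Values [][]byte
}

// Len has a pointer receiver, so it can be called on a nil *Row and guard against it.
func (r *Row) Len() int {
	if r == nil {
		return 0
	}
	return len(r.Values)
}

// Width has a value receiver: calling it through a nil *Row must first
// dereference the pointer to make the copy, which panics.
func (r Row) Width() int {
	return len(r.Values)
}

// callWidth reports whether calling the value-receiver method panicked.
func callWidth(r *Row) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	_ = r.Width()
	return false
}

func main() {
	var r *Row                // nil pointer
	fmt.Println(r.Len())      // 0: the nil check runs inside the method
	fmt.Println(callWidth(r)) // true: nil dereference while copying the receiver
}
```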
What is “reference-like”? slices, maps, channels, strings
Some built-in types are headers: small values that are, or contain, a pointer to backing storage. Copying the header copies the pointer, so the copy shares the backing data. These types are slices, maps, channels, and strings. They’re still copied by value (the header is), but the thing they point at is shared. For strings the sharing is invisible, because the bytes are immutable.
This is the single most important mental model in this chapter: a slice variable is not the array; it is a (ptr, len, cap) triple that points at an array.
```go
s := []int{1, 2, 3}
t := s    // copies the 3-word header; t and s share the same backing array
t[0] = 99 // s[0] is now 99 too
```

Contrast with arrays, which are genuine values (covered later).
Slice internals: ptr, len, cap
A slice header is three machine words pointing into a separate backing array:
```mermaid
flowchart LR
  subgraph Header["slice header (3 words)"]
    ptr["ptr"]
    len["len = 3"]
    cap["cap = 5"]
  end
  subgraph Array["backing array (len 5)"]
    a0["[0]"]
    a1["[1]"]
    a2["[2]"]
    a3["[3] (unused, within cap)"]
    a4["[4] (unused, within cap)"]
  end
  ptr --> a0
```
- ptr: points at the slice’s first element, s[0]. For a subslice this need not be the start of the backing array.
- len: the number of elements you can index (s[0]..s[len-1]).
- cap: the number of elements from ptr to the end of the backing array.
Subslicing creates a new header that points into the same backing array:
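```go
s := []byte("hello world")
sub := s[6:11] // sub == "world", shares s's backing array
sub[0] = 'W'   // s is now "hello World": aliasing!
```

len(sub) is 5. cap(sub) is cap(s) - 6, which covers everything from offset 6 to the end of s’s backing array. That is at least 5, but it can be more. The runtime may round the allocation for a []byte conversion up to a size class, so cap(s) is not guaranteed to be 11. No bytes were copied: sub’s ptr is just &s[6].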
The real high-perf user: the lexer’s scan buffer
The SQL lexer scans by indexing and subslicing a single backing buffer, never copying per-token. Its ParseContext holds the source as a []byte built once from the input string:
```go
scanBuf []byte // the string being scanned (for lexer)
// ...
scanBuf:    []byte(input),
scanBufLen: len(input),
```

The lexer then reads scanBuf[pos] by index and takes subslices like scanBuf[startPos:scanPos] to identify tokens. Subslicing is just header arithmetic with no copy, so finding a token’s boundaries costs no allocation. Bytes get copied only where code deliberately turns a token into a string, as GetCurrentText does below. The query path is latency-sensitive, so this matters (see Parser, Lexer, AST & Codegen).
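Here is a toy illustration of the same technique. It assumes nothing about the real lexer beyond what is described above. The hypothetical scanWords walks a []byte by index and returns each word as a subslice of the original buffer. The result slice itself is allocated, but no token bytes are copied.

```go
package main

import "fmt"

// scanWords is a hypothetical mini-scanner: it splits buf on ASCII spaces and
// returns each word as a subslice of buf (header arithmetic only, no byte copies).
func scanWords(buf []byte) [][]byte {
	var words [][]byte
	start := -1
	for pos := 0; pos < len(buf); pos++ {
		if buf[pos] == ' ' {
			if start >= 0 {
				words = append(words, buf[start:pos])
				start = -1
			}
		} else if start < 0 {
			start = pos
		}
	}
	if start >= 0 {
		words = append(words, buf[start:])
	}
	return words
}

func main() {
	buf := []byte("SELECT id FROM t")
	words := scanWords(buf)
	fmt.Printf("%q\n", words) // ["SELECT" "id" "FROM" "t"]
	// A token's first byte is the very same memory as the matching byte of buf.
	fmt.Println(&words[1][0] == &buf[7]) // true: "id" starts at offset 7
}
```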
Subslice aliasing in the wire codec — the gotcha that bites
RowFromProto decodes a protobuf row whose values are concatenated into one big []byte (pr.Values) plus a Lengths array. It carves each column out as a subslice of that one buffer:
```go
func RowFromProto(pr *query.Row) *Row {
	if pr == nil {
		return nil
	}
	values := make([]Value, len(pr.Lengths))
	offset := 0
	for i, length := range pr.Lengths {
		switch length {
		case -1:
			values[i] = nil // NULL
		case 0:
			values[i] = []byte{} // empty string, not NULL
		default:
			values[i] = pr.Values[offset : offset+int(length)]
			offset += int(length)
		}
	}
	return &Row{Values: values}
}
```

The same pattern shows up when decoding bind parameters. Two consequences follow directly from slice internals:
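- Every non-empty Value here aliases the one pr.Values backing array. This has three effects:
  - Writing to a column’s bytes also writes into pr.Values.
  - Each subslice’s cap runs to the end of that buffer. An append to one Value that fits within its capacity therefore overwrites the bytes of the columns after it.
  - Retaining one tiny Value keeps the entire pr.Values buffer alive, because the garbage collector frees a backing array only when no slice references it. That’s a classic memory-retention trap.
- It is fast on purpose. Decoding a row of N columns does one make([]Value, N) and zero per-column byte copies.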
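The append hazard is easy to reproduce. This sketch uses a hypothetical columns helper that carves a 6-byte buffer into two 3-byte columns, the way RowFromProto does. With capped set, it uses Go’s full slice expression buf[lo:hi:max], which limits each column’s capacity so append must reallocate instead of writing into its neighbor. Whether the real codebase needs this depends on whether any caller appends to a decoded Value; the sketch only demonstrates the mechanism.

```go
package main

import "fmt"

// columns carves buf into two 3-byte columns. When capped is true it uses the
// full slice expression so each column's cap equals its len.
func columns(buf []byte, capped bool) (c0, c1 []byte) {
	if capped {
		return buf[0:3:3], buf[3:6:6]
	}
	return buf[0:3], buf[3:6]
}

func main() {
	// Two-index subslices: c0's cap runs to the end of the shared buffer.
	c0, c1 := columns([]byte("abcxyz"), false)
	c0 = append(c0, '!')                // fits within cap: writes the byte c1 starts at
	fmt.Println(string(c0), string(c1)) // abc! !yz

	// Full slice expressions: cap == len, so append must copy to a new array.
	s0, s1 := columns([]byte("abcxyz"), true)
	s0 = append(s0, '!')                // cap exceeded: s0 moves to fresh storage
	fmt.Println(string(s0), string(s1)) // abc! xyz
}
```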
The defensive copy: “return copy for safety”
When the lexer exposes its internal buffer to outside callers, it must not hand out an aliasing slice, since a caller could mutate the lexer’s state through it. So it does an explicit make+copy:
```go
func (ctx *ParseContext) GetScanBuf() []byte {
	// Return copy for safety
	buf := make([]byte, len(ctx.scanBuf))
	copy(buf, ctx.scanBuf)
	return buf
}
```

If this returned ctx.scanBuf directly, a caller writing buf[0] = 'x' would scribble on the lexer’s live scan buffer mid-parse. Returning a copy breaks the aliasing.
append: growth, reallocation, and predictable aliasing
append adds elements to a slice. It behaves differently depending on whether the new elements fit:

- If they fit within the existing capacity (len + n <= cap), append writes them into the existing backing array and returns a header with a longer len. That mutates shared storage in place.
- If they don’t fit, append allocates a new, larger backing array, copies the old elements, and returns a header pointing at the new array. From then on, writes through the new slice are invisible to the old aliases, and vice versa.
```go
a := make([]int, 2, 4) // len 2, cap 4
b := append(a, 7)      // fits in cap: b shares a's array, b[2] == 7
a = a[:3]              // a now also sees the 7: same backing array

c := append(b, 1, 2, 3) // overflows cap 4: c gets a NEW array
c[0] = 99               // a and b are UNAFFECTED: different backing array now
```

Preallocate with make to avoid growth churn
ToProto knows the exact final size up front. It allocates once with make([]byte, 0, totalLen), giving length 0 and capacity totalLen, and then appends without ever reallocating:
```go
lengths := make([]int64, len(r.Values))
var totalLen int
for i, v := range r.Values {
	if v == nil {
		lengths[i] = -1
	} else {
		lengths[i] = int64(len(v))
		totalLen += len(v)
	}
}

values := make([]byte, 0, totalLen)
for _, v := range r.Values {
	if v != nil {
		values = append(values, v...) // spread one slice into another
	}
}
```

Two things to read carefully:
- make([]byte, 0, totalLen): the third argument is capacity. The slice is empty (len 0) but has room for totalLen bytes, so the append loop never grows the array. Without the capacity hint, append would reallocate a logarithmic number of times as the data grows, copying the old contents each time. Small slices roughly double on each growth; large ones grow by a smaller factor.
- append(values, v...): the ... spreads the elements of v as individual arguments. append(values, v) with no dots would be a type error here. v is a Value (a []byte underneath), and values wants byte elements.
Note the two make shapes side by side: make([]int64, len(r.Values)) (len == N, for index assignment lengths[i] = ...) versus make([]byte, 0, totalLen) (len 0, cap N, for append). Pick the shape by how you’ll fill it — index-assign needs length; append needs capacity.
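This sketch shows what the capacity hint buys. The hypothetical countGrowths helper counts how many appends had to move to a new backing array, detected as a change in cap. The exact counts depend on the runtime’s growth policy and size-class rounding, so the only hard claim is that the preallocated version never reallocates.

```go
package main

import "fmt"

// countGrowths appends n bytes to dst one at a time and reports how many
// appends had to allocate a new backing array (observed as a change in cap).
func countGrowths(dst []byte, n int) int {
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(dst)
		dst = append(dst, byte(i))
		if cap(dst) != before {
			growths++
		}
	}
	return growths
}

func main() {
	const n = 10000
	fmt.Println("no hint:", countGrowths(nil, n))                  // several regrowths
	fmt.Println("with hint:", countGrowths(make([]byte, 0, n), n)) // 0
}
```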
When the exact size isn’t known, the lexer uses cap as an upper-bound estimate — decoding a Unicode escape can’t produce more bytes than the input:
```go
result := make([]byte, 0, len(input))
// ...
result = append(result, []byte(string(cp))...) // rune -> string -> []byte (UTF-8)
```

The expression works in three steps. string(cp), where cp is a rune, produces the UTF-8 encoding of that codepoint as a string. []byte(...) of it gives the bytes. ... spreads them. (More on rune-to-string below.)
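An alternative exists, though it is not what the lexer does. Since Go 1.18, unicode/utf8.AppendRune appends a rune’s UTF-8 encoding straight onto a []byte, skipping the intermediate string. The helpers viaString and viaAppendRune are hypothetical. The sketch checks that both produce the same bytes across 1- to 4-byte encodings.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// viaString mirrors the lexer's form: rune -> string -> []byte, then spread.
func viaString(dst []byte, cp rune) []byte {
	return append(dst, []byte(string(cp))...)
}

// viaAppendRune encodes the rune straight into dst (Go 1.18+).
func viaAppendRune(dst []byte, cp rune) []byte {
	return utf8.AppendRune(dst, cp)
}

func main() {
	// 1-, 2-, 3- and 4-byte UTF-8 encodings.
	cps := []rune{'A', '\u00e9', '\u20ac', '\U0001F600'}
	var a, b []byte
	for _, cp := range cps {
		a = viaString(a, cp)
		b = viaAppendRune(b, cp)
	}
	fmt.Println(string(a) == string(b), len(a)) // true 10
}
```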
copy() and the deep-clone idiom
copy(dst, src) copies min(len(dst), len(src)) elements and returns that count. It does not grow dst. To clone a slice into independent storage, make the destination the right length, then copy:
```go
src := []int{1, 2, 3}
dst := make([]int, len(src))
copy(dst, src) // dst is independent; mutating dst won't touch src
```

The generated AST cloner shows this in its purest form, preserving the nil/non-nil distinction (see nil-vs-empty below):
```go
func CloneSliceOfOid(n []Oid) []Oid {
	if n == nil {
		return nil
	}
	res := make([]Oid, len(n))
	copy(res, n)
	return res
}
```

copy does a shallow copy. For []Oid, a slice of plain values, that is also a full deep copy, because the values are the data. For a slice of pointers, though, copy would duplicate the pointers and leave both slices pointing at the same elements. The cloner handles that case with a per-element deep clone instead:
```go
func CloneSliceOfRefOfDefElem(n []*DefElem) []*DefElem {
	if n == nil {
		return nil
	}
	res := make([]*DefElem, len(n))
	for i, x := range n {
		res[i] = CloneRefOfDefElem(x) // recurse into each pointed-at node
	}
	return res
}
```

nil slice vs empty slice — load-bearing here
A nil slice (var s []byte or []byte(nil)) has ptr == nil, len == 0, cap == 0. An empty slice ([]byte{} or make([]byte, 0)) has a non-nil ptr with len == 0 and cap == 0. Both have length zero, so ranging over either does nothing and append works on either. For most code the difference is invisible. It shows up only where code explicitly compares s == nil.
In sqltypes the difference is semantic and must be preserved: a nil Value is SQL NULL; an empty Value ([]byte{}) is the empty string ''. Look back at RowFromProto — the -1 and 0 cases are kept distinct on purpose:
```go
case -1:
	values[i] = nil // NULL
case 0:
	values[i] = []byte{} // empty string, not NULL
```

Value.IsNull() is literally v == nil. Suppose you “helpfully normalized” a nil slice to []byte{} anywhere in this path. You’d turn every SQL NULL into an empty string, which is a correctness bug.
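This self-contained sketch shows why normalization breaks NULL handling. Value and IsNull are re-declared exactly as shown earlier. normalize is a hypothetical “cleanup” function of the kind the text warns against.

```go
package main

import "fmt"

// Value mirrors sqltypes.Value: nil means NULL, []byte{} means ''.
type Value []byte

func (v Value) IsNull() bool { return v == nil }

// normalize is a hypothetical "cleanup" that maps nil to an empty slice.
// It is exactly the kind of change that silently turns NULL into ''.
func normalize(v Value) Value {
	if v == nil {
		return Value{}
	}
	return v
}

func main() {
	null, empty := Value(nil), Value{}
	fmt.Println(len(null) == len(empty))       // true: both have length 0
	fmt.Println(null.IsNull(), empty.IsNull()) // true false
	fmt.Println(normalize(null).IsNull())      // false: NULL became ''
}
```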
Maps: reference-like, must be made, comma-ok, no order
A map value is a pointer to a runtime hash table. Copying a map variable copies the pointer, so both names see the same table. The zero value of a map is nil. A nil map is readable (every lookup returns the zero value) but not writable (a write panics).
```go
var m map[string]int // nil map
_ = m["missing"]     // ok: returns 0 (zero value), no panic
m["x"] = 1           // PANIC: assignment to entry in nil map

m = make(map[string]int) // now writable
m["x"] = 1               // ok
```

The codebase builds its keyword lookup table in an init() using make with a capacity hint:
```go
var keywordLookupMap map[string]*KeywordInfo

// ...

func init() {
	keywordLookupMap = make(map[string]*KeywordInfo, len(Keywords))
	for i := range Keywords {
		keywordLookupMap[Keywords[i].Name] = &Keywords[i]
	}
}
```

The second argument, len(Keywords), tells the runtime how many entries to expect. The runtime can then allocate enough space up front instead of growing and rehashing the table as entries arrive. It’s a hint, not a limit: the map can still grow past it.
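This small sketch demonstrates the two map facts above using only built-in behavior. Copying a map variable shares one table, and writing to a nil map panics while reading does not. writePanics is a hypothetical helper that recovers only to make the panic observable.

```go
package main

import "fmt"

// writePanics reports whether assigning into m panics (it does when m is nil).
func writePanics(m map[string]int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m["x"] = 1
	return false
}

func main() {
	a := make(map[string]int, 4) // 4 is a size hint, not a limit
	b := a                       // copies the map value: same underlying table
	b["k"] = 42
	fmt.Println(a["k"]) // 42: a sees b's write

	var nilMap map[string]int
	fmt.Println(nilMap["missing"])                   // 0: reading a nil map is fine
	fmt.Println(writePanics(nilMap), writePanics(a)) // true false
}
```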
Taking the address of a slice element, not a loop variable
Notice &Keywords[i], the address of element i in the slice’s backing array. This is deliberate. The tempting alternative is wrong in spirit:
```go
// WRONG mental model:
for _, k := range Keywords {
	keywordLookupMap[k.Name] = &k // address of the loop copy, not of Keywords[i]
}
```

k is a copy of each element, so &k takes the address of that copy. Before Go 1.22, the same k variable was reused across iterations, so every stored pointer aliased the last value, a notorious bug. In modules declaring go 1.22 or later, k is per-iteration, so this form no longer has the last-value bug. Even so, &k still points at a copy, not at Keywords[i].

Using &Keywords[i] gets a stable pointer into the slice’s storage. The map then holds pointers to the real keyword records, which is exactly what you want when value identity matters.
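This sketch makes the difference concrete. It uses a hypothetical keyword type and a hypothetical buildMaps helper. One map is built from &k and one from &table[i], and then the table is updated. Only the second map observes the update, because only it points into the slice’s backing array. The output is the same before and after Go 1.22. Under the old loop semantics, the &k map would additionally hold the same pointer for every name.

```go
package main

import "fmt"

// keyword is a hypothetical stand-in for KeywordInfo.
type keyword struct {
	Name     string
	Reserved bool
}

// buildMaps returns one map built from &k (loop copy) and one from &table[i].
func buildMaps(table []keyword) (byCopy, byIndex map[string]*keyword) {
	byCopy = make(map[string]*keyword, len(table))
	for _, k := range table {
		byCopy[k.Name] = &k // address of the loop variable, a copy of the element
	}
	byIndex = make(map[string]*keyword, len(table))
	for i := range table {
		byIndex[table[i].Name] = &table[i] // address inside table's backing array
	}
	return byCopy, byIndex
}

func main() {
	table := []keyword{{"select", true}, {"from", true}, {"name", false}}
	byCopy, byIndex := buildMaps(table)

	table[2].Reserved = true // update the real record

	fmt.Println(byCopy["name"].Reserved)      // false: points at a stale copy
	fmt.Println(byIndex["name"].Reserved)     // true: same memory as table[2]
	fmt.Println(byIndex["name"] == &table[2]) // true
}
```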
The comma-ok lookup
Because a missing key returns the zero value, you can’t distinguish “absent” from “present but zero” by the value alone. The two-result form does:
```go
v, ok := m["k"] // ok is false if absent
```

A settings cache uses it, combining the lookup with pointer identity for deduplication:
```go
if elem, ok := c.cache[key]; ok {
	// Move to front (most recently used)
	c.lru.MoveToFront(elem)
	c.hits++
	return elem.Value.(*cacheEntry).settings
}
```

Its GetOrCreate returns the same *Settings pointer for identical inputs, and that is a documented contract. It works because the cache stores pointers and, on a hit, hands back the stored one instead of building a new value. Since pointers carry identity, two callers with the same inputs get pointer-equal results.
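Here is a minimal sketch of that identity contract, assuming only what the snippet above shows. settingsCache, Settings, and newSettingsCache are hypothetical. The sketch leaves out the LRU bookkeeping and any locking a concurrent cache would need. On a hit it returns the stored pointer, so equal inputs yield pointer-equal results.

```go
package main

import "fmt"

// Settings is a hypothetical immutable configuration value.
type Settings struct {
	Key string
}

// settingsCache is a hypothetical, non-concurrent interning cache.
type settingsCache struct {
	cache map[string]*Settings
	hits  int
}

func newSettingsCache() *settingsCache {
	return &settingsCache{cache: make(map[string]*Settings)}
}

// GetOrCreate returns the same *Settings pointer for the same key.
func (c *settingsCache) GetOrCreate(key string) *Settings {
	if s, ok := c.cache[key]; ok { // comma-ok: "absent" vs "present"
		c.hits++
		return s
	}
	s := &Settings{Key: key}
	c.cache[key] = s
	return s
}

func main() {
	c := newSettingsCache()
	a := c.GetOrCreate("search_path=public")
	b := c.GetOrCreate("search_path=public")
	d := c.GetOrCreate("search_path=app")
	fmt.Println(a == b, a == d, c.hits) // true false 1
}
```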
Strings vs []byte: immutability and conversion cost
A Go string is a 2-word header (ptr, len) over immutable bytes; a []byte is the 3-word mutable slice you already know:
```mermaid
flowchart LR
  subgraph S["string header (2 words)"]
    sptr["ptr"]
    slen["len"]
  end
  subgraph B["[]byte header (3 words)"]
    bptr["ptr"]
    blen["len"]
    bcap["cap"]
  end
  imm["immutable bytes"]
  mut["mutable backing array"]
  sptr --> imm
  bptr --> mut
```
Because one is immutable and the other mutable, converting between them allocates a new backing array and copies the bytes in the general case:
```go
s := "hello"
b := []byte(s)  // allocates len(s) bytes, copies them (b is mutable, independent)
s2 := string(b) // allocates again, copies back
```

The lexer pays this conversion exactly once, with scanBuf: []byte(input) at setup, and then scans by index over the mutable buffer. That’s the right place to pay: one copy at setup, zero in the hot loop.
GetCurrentText does the reverse at a controlled point — subslice (cheap, aliases) then convert to string (one copy, makes the result safe to retain):
```go
func (ctx *ParseContext) GetCurrentText(startPos int) string {
	if startPos < 0 || startPos > ctx.scanPos {
		return ""
	}
	return string(ctx.scanBuf[startPos:ctx.scanPos])
}
```

The subslice aliases the live buffer (no copy), but string(...) copies those bytes into a fresh immutable string. The returned token text is therefore independent of the lexer. It stays correct even if the buffer is later modified. It also doesn’t keep the whole scan buffer reachable, as a retained subslice would.
The zero-allocation string(b) == "literal" optimization
There’s one important exception to “conversion allocates.” The Go compiler recognizes the pattern string(byteSlice) == stringValue and lowers it to a direct byte comparison (memcmp) with no allocation: the temporary string is never materialized on the heap. The lexer documents and relies on this in its hot scanning loop:
```go
// HasPrefixAtScanPos reports whether scanBuf at the current scan position
// starts with needle. needle is taken as a string so callers don't have to
// allocate a []byte: `string(byteSlice) == stringLit` lowers to a direct
// memcmp in the Go compiler with no heap allocation.
func (ctx *ParseContext) HasPrefixAtScanPos(needle string) bool {
	pos := ctx.scanPos
	if len(ctx.scanBuf)-pos < len(needle) {
		return false
	}
	return string(ctx.scanBuf[pos:pos+len(needle)]) == needle
}
```

Avoid the conversion entirely when nothing changes
normalizeKeywordCase lowercases a keyword for lookup, but most keywords are already lowercase. It scans first and returns the original string untouched if there’s nothing to change. That costs zero allocations on the common path; it allocates only when it must rewrite bytes:
```go
func normalizeKeywordCase(s string) string {
	hasUpper := false
	for i := 0; i < len(s); i++ {
		if s[i] >= 'A' && s[i] <= 'Z' {
			hasUpper = true
			break
		}
	}
	// If no uppercase, return original string (avoid allocation)
	if !hasUpper {
		return s
	}
	// Need to convert case - create new string
	result := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		ch := s[i]
		if ch >= 'A' && ch <= 'Z' {
			ch += 'a' - 'A' // PostgreSQL ASCII-only conversion
		}
		result[i] = ch
	}
	return string(result)
}
```

The allocation avoided on the fast path is the make([]byte, len(s)) plus the final string(result) copy. Returning s directly costs nothing because a string is immutable, so handing back the same header is safe.
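One way to confirm that the fast paths really allocate nothing is testing.AllocsPerRun, which also works outside test files. The sketch has three parts:

- It re-declares normalizeKeywordCase as shown above.
- It adds hasPrefixAt, a hypothetical simplified HasPrefixAtScanPos that takes the buffer as a parameter.
- The package-level sinks keep results observable.

Slow-path counts can vary with compiler version, so the sketch asserts zero only for the two fast paths.

```go
package main

import (
	"fmt"
	"testing"
)

func normalizeKeywordCase(s string) string {
	hasUpper := false
	for i := 0; i < len(s); i++ {
		if s[i] >= 'A' && s[i] <= 'Z' {
			hasUpper = true
			break
		}
	}
	if !hasUpper {
		return s // fast path: hand back the same string header
	}
	result := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		ch := s[i]
		if ch >= 'A' && ch <= 'Z' {
			ch += 'a' - 'A'
		}
		result[i] = ch
	}
	return string(result)
}

// hasPrefixAt is a hypothetical, simplified HasPrefixAtScanPos.
func hasPrefixAt(buf []byte, pos int, needle string) bool {
	if len(buf)-pos < len(needle) {
		return false
	}
	return string(buf[pos:pos+len(needle)]) == needle // compared without allocating
}

var (
	sinkStr  string
	sinkBool bool
)

func main() {
	buf := []byte("select * from t")
	fast := testing.AllocsPerRun(1000, func() { sinkStr = normalizeKeywordCase("select") })
	slow := testing.AllocsPerRun(1000, func() { sinkStr = normalizeKeywordCase("SELECT") })
	cmp := testing.AllocsPerRun(1000, func() { sinkBool = hasPrefixAt(buf, 0, "select") })
	fmt.Println(fast, slow > 0, cmp) // 0 true 0
}
```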
byte vs rune
Indexing a string (s[i]) gives a byte (uint8), not a character. Ranging over a string (for i, r := range s) decodes runes (int32 Unicode codepoints), and i is the byte offset of each rune. A multi-byte UTF-8 character is several bytes but one rune.

Conversions to string follow the same split:

- string(someRune) UTF-8-encodes that codepoint. The lexer’s []byte(string(cp)) above uses this.
- string(someByteSlice) copies the bytes into a new string.
- string(rune(65)) is "A" (the codepoint), not "65". Because that surprises people, go vet flags string(x) conversions from integer types other than rune and byte.
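A short demonstration of bytes versus runes on a string containing one 2-byte character. runeOffsets is a hypothetical helper that records where each rune starts.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the byte offset at which each rune of s starts.
func runeOffsets(s string) []int {
	var offs []int
	for i := range s { // ranging over a string steps rune by rune
		offs = append(offs, i)
	}
	return offs
}

func main() {
	s := "h\u00e9llo" // "héllo": é (U+00E9) is 2 bytes in UTF-8

	fmt.Println(len(s), utf8.RuneCountInString(s)) // 6 5: bytes vs runes
	fmt.Println(s[1])                              // 195: the byte 0xC3, half of é
	fmt.Println(runeOffsets(s))                    // [0 1 3 4 5]
	fmt.Println(string(rune(65)))                  // A
}
```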
strings.Builder: cheap incremental building, never copy it
Concatenating with + in a loop allocates a new string on every iteration, because strings are immutable. strings.Builder accumulates into an internal growable []byte. Its .String() method hands back the accumulated bytes as a string without copying them again:
```go
var b strings.Builder
for i := 0; i < 3; i++ {
	b.WriteString("x") // allocates only when the internal buffer must grow
}
s := b.String() // "xxx"; String() does not copy the buffer
```

The SQL-rendering code threads a *strings.Builder through helper functions so writes accumulate in the caller’s builder:
```go
func appendJsonReturning(b *strings.Builder, output *JsonOutput) {
	// ...
	b.WriteString(" RETURNING ")
	// ...
}
```

The parameter is *strings.Builder (pointer), not strings.Builder (value), for two reasons. Passing a value would:

1. write into a copy the caller never sees, losing the output, and
2. panic at runtime if the caller’s builder already holds data. A Builder records its own address on its first write, and any later write through a copy fails with “strings: illegal use of non-zero Builder copied by value”.
For the same reason, ParseContext holds a strings.Builder as a value field (not a pointer). That is exactly why ParseContext is only ever used as *ParseContext: every method is func (ctx *ParseContext) ...:
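```go
literalBuf strings.Builder // accumulates literal values
```

Suppose a ParseContext were ever copied by value. The copy’s literalBuf would share the original’s buffer and still carry the original’s address, so the first write through the copy of a non-empty builder would panic. Using ParseContext only through a pointer means its methods never make such a copy.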
Arrays vs slices
An array has a fixed length that’s part of its type: [3]int and [4]int are different, incompatible types. Arrays are genuine values, so assigning or passing one copies all elements.
```go
var a [3]int
b := a   // copies all 3 ints
b[0] = 9 // a[0] is still 0

// A slice over an array does NOT copy:
s := a[:] // s is a slice header pointing into a's storage
s[0] = 9  // a[0] is now 9
```

You’ll rarely declare arrays directly here; slices are the workhorse. The relevant connection has two parts. First, []byte(input) builds a slice over a fresh backing array, and that is the allocation. Second, a string is an immutable byte sequence, conceptually array-backed, which is why changing its contents requires copying it into a []byte.
make vs new
Two allocation builtins, easy to confuse:
- make(T, ...): only for slices, maps, and channels. It initializes the internal structure (slice header and backing array, map table, channel buffer) and returns an initialized value of type T, not a pointer.
- new(T): for any type. It allocates a zeroed T and returns a *T pointing at it.
```go
s := make([]int, 0, 8)    // a usable, empty slice with cap 8
m := make(map[string]int) // a usable, writable map
p := new(int)             // p is *int, *p == 0
```

make is everywhere in this chapter’s examples. new is rare for reference types and usually a mistake there:
```go
bad := new([]byte)     // bad is *[]byte pointing at a NIL slice: almost never useful
*bad = append(*bad, 1) // works, but you wanted `b := make([]byte, 0, n)` instead
```

Escape analysis: stack vs heap
Go decides automatically whether each value lives on the stack (cheap, freed when the function returns) or the heap (managed by the GC). The compiler’s escape analysis asks: can this value’s lifetime exceed the function? If a value’s address can be reached after the function returns, it must escape to the heap.
Common things that force a heap escape:
- Returning a pointer to a local: func f() *T { var x T; return &x }. Here x outlives f, so it is heap-allocated. The exception is when f is inlined into a caller where the pointer does not escape further. Unlike C, this is safe in Go, with no dangling pointer, but it still costs a heap allocation.
- Storing a value in an interface (interface boxing): assigning a concrete value to an any or other interface variable typically moves it to the heap, because the interface holds a pointer to its data. See Interfaces & Composition.
- Closures capturing a variable by reference: if a closure outlives the function and mutates a captured local, that local escapes.
- Slices and maps whose size isn’t known at compile time (make([]T, n) with a dynamic n) generally heap-allocate the backing array.
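This sketch illustrates the first rule, using hypothetical names payload, byPointer, and byValue. The //go:noinline directives stop the compiler from inlining the calls, since inlining could let the pointer stay on the caller’s stack. testing.AllocsPerRun then counts heap allocations per call. Building this file with -gcflags=-m should report the local in byPointer as moved to heap.

```go
package main

import (
	"fmt"
	"testing"
)

// payload is a hypothetical 64-byte value type.
type payload struct{ buf [64]byte }

var (
	sinkPtr *payload // package-level sinks keep results observable
	sinkVal payload
)

// byPointer returns the address of a local, so p must outlive the call.
//
//go:noinline
func byPointer() *payload {
	var p payload
	return &p
}

// byValue returns a copy; p itself never needs to leave the stack.
//
//go:noinline
func byValue() payload {
	var p payload
	return p
}

func main() {
	ptrAllocs := testing.AllocsPerRun(100, func() { sinkPtr = byPointer() })
	valAllocs := testing.AllocsPerRun(100, func() { sinkVal = byValue() })
	fmt.Println(ptrAllocs, valAllocs) // 1 0 with the current gc compiler
}
```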
The alloc-avoidance code you saw earlier — normalizeKeywordCase returning the original string, HasPrefixAtScanPos keeping string(b) == ... inline — is written against this model: the goal is to keep the hot path from producing heap garbage that the GC must later collect.
You cannot tell from reading source whether a given value escapes — it depends on the compiler. You observe it by asking the compiler to print its decisions:
```sh
go build -gcflags=-m ./go/common/parser/...
```

This prints lines like ([]byte)(input) escapes to heap and moved to heap: x, each tagged with the source position. Higher verbosity (-gcflags='-m -m') explains why. For allocation counts in benchmarks, go test -bench=. -benchmem reports allocs/op. See Debugging & Profiling.
Defined types vs aliases (memory-relevant recap)
One more distinction interacts with everything above. type Value []byte declares a defined (named) type with its own method set. It is not []byte for method and interface purposes, even though it has the same memory layout. type TokenType = int (note the =) is a pure alias: TokenType and int are the same type, fully interchangeable, sharing one identity and one method set.
```go
// This is now just an alias for int to maintain compatibility during transition
type TokenType = int
type Value []byte
```

You can pass a TokenType anywhere an int is wanted and vice versa, because they are the same type. Value and []byte are distinct types, so you cannot call Value’s methods (IsNull, SQLLiteral) on a bare []byte without the explicit conversion Value(b).

Plain assignment between them does work without a conversion. []byte is an unnamed type with the same underlying type, which is why RowFromProto can store pr.Values[...] and []byte{} directly into a []Value.

The alias is a refactoring shim; the named type is a real abstraction carrying behavior. Full treatment in Types, Structs & Methods.
Checkpoints
- After RowFromProto returns, you keep a single 4-byte Value from a result whose pr.Values was 2 MB. How much memory stays reachable, and why?
  Answer: All 2 MB. The 4-byte Value is a subslice (pr.Values[offset:offset+4]) whose ptr points into the 2 MB backing array. The GC keeps the entire backing array alive as long as any slice references it. To release the rest, copy the 4 bytes out with make+copy.
- Why does Value use value receivers while Result and Row use pointer receivers?
  Answer: Value is a 3-word slice header, cheap to copy, and behaves like a primitive, so value receivers are fine. Result and Row are larger structs, and pointer receivers avoid copying them on every call. Their methods are also nil-checked (if r == nil { return nil }), which only works on a pointer receiver.
- Why is string(ctx.scanBuf[pos:pos+len(needle)]) == needle reliably allocation-free, while s := string(ctx.scanBuf[...]); return s == needle may allocate?
  Answer: The compiler special-cases string(b) == stringValue when the conversion feeds directly into the comparison. It lowers this to a memcmp without materializing the temporary string. Assigning the conversion to a variable creates a real string, which means copying the bytes. Whether that copy lands on the heap then depends on escape analysis and length (gc can use a small stack buffer for short strings that don’t escape), so the zero-allocation guarantee is gone.
- You write var m map[string]int then m["x"] = 1. What happens, and what’s the fix?
  Answer: It panics: “assignment to entry in nil map.” A nil map is readable (lookups return the zero value) but not writable. Initialize it first with m = make(map[string]int), optionally with a capacity hint.
Exercises
- In go/common/sqltypes/sqltypes.go, trace one byte from RowFromProto back to where pr.Values originates (look at Row.ToProto). List every []byte/Value that aliases the same backing array after RowFromProto returns, and argue whether retaining one small Value keeps the whole buffer alive.
- Grep go/common/parser/context.go for Return copy for safety (two hits). For each method, write the concrete aliasing bug that would occur if it returned ctx.scanBuf (or ctx.scanBuf[ctx.scanPos:]) directly instead of a make+copy.
- In keywords.go’s init(), explain why &Keywords[i] is used rather than &k from for _, k := range Keywords. What capacity does the make(...) request, and what does the runtime do differently because of it?
- Pick any three CloneSliceOf* functions in go/common/parser/ast/ast_clone.go. Classify each as a shallow make+copy (slice of values) or a per-element deep clone (slice of pointers), and explain why slices of pointers need the per-element form to fully break aliasing.
- Read normalizeKeywordCase and HasPrefixAtScanPos. For each, name the exact allocation a naive implementation would incur and the precise source construct that avoids it.