Data Structures & Functional Style
Python’s data model is small and sharp: four built-in collections, one
transformation idiom (the comprehension), and a pair of standard-library modules
(itertools, functools) that turn loops into pipelines. There’s no Lodash to
reach for and no for i := range boilerplate — the language gives you most of it.
This module maps Python’s collections onto TS arrays/objects/Map/Set and Go slices/maps, then covers the moves that make Python feel like Python: comprehensions, dataclasses, lazy generators, and the functional standard library.
Core collections
Section titled “Core collections”Mental model
Section titled “Mental model”| Need | TypeScript | Go | Python |
|---|---|---|---|
| Ordered, indexed, growable | Array<T> | []T (slice) | list |
| Key → value | Map / object / Record | map[K]V | dict |
| Unique members | Set<T> | map[T]struct{} | set |
| Fixed, heterogeneous record | tuple type [A, B] | struct (roughly) | tuple |
| Hashable, frozen set | ReadonlySet (type-only) | — | frozenset |
Two things trip up TS/Go devs immediately:
listis the only “array”. There’s no separate array type;listis a dynamic array (amortized O(1) append), not a linked list despite the name.dictpreserves insertion order (guaranteed since 3.7) and is the backbone of Python — objects, kwargs, JSON, and namespaces are all dicts under the hood.
const nums: number[] = [1, 2, 3];nums.push(4); // [1, 2, 3, 4]nums[0]; // 1nums.slice(1, 3); // [2, 3]nums.includes(2); // trueconst doubled = nums.map(n => n * 2);nums := []int{1, 2, 3}nums = append(nums, 4) // [1 2 3 4]nums[0] // 1nums[1:3] // [2 3]// no built-in `includes` or `map` — write a loopnums: list[int] = [1, 2, 3]nums.append(4) # [1, 2, 3, 4]nums[0] # 1nums[1:3] # [2, 3] — slices are copiesnums[-1] # 4 — negative indexing2 in nums # True — O(n) membershipdoubled = [n * 2 for n in nums] # comprehension, not .map()Slicing is Python’s superpower and has no clean TS/Go analog: nums[1:3],
nums[::-1] (reverse), nums[::2] (every other). Slices always return a new list.
const user = new Map<string, number>([["alice", 30]]);user.set("bob", 25);user.get("alice"); // 30user.get("carol"); // undefineduser.has("bob"); // true
// or a plain object / Recordconst cfg: Record<string, string> = { host: "localhost" };user := map[string]int{"alice": 30}user["bob"] = 25v, ok := user["alice"] // 30, true_, ok = user["carol"] // 0, false (comma-ok)delete(user, "bob")user: dict[str, int] = {"alice": 30}user["bob"] = 25user["alice"] # 30user["carol"] # KeyError! — unlike Go's zero valueuser.get("carol") # None — the safe accessoruser.get("carol", 0) # 0 — with default"bob" in user # True — checks keys, O(1)del user["bob"]for name, age in user.items(): ...set and frozenset
Section titled “set and frozenset”const ids = new Set<number>([1, 2, 3]);ids.add(3); // no-op, already presentids.has(2); // trueids.delete(1);// set algebra: not built-in, you write loopsids := map[int]struct{}{1: {}, 2: {}, 3: {}}ids[4] = struct{}{} // add_, ok := ids[2] // membershipdelete(ids, 1)// no set algebra in stdlibids: set[int] = {1, 2, 3} # {} is an empty DICT — use set() for empty setids.add(3) # no-op2 in ids # True, O(1)ids.discard(1) # remove if present (no error if absent)
# set algebra is built into the languagea, b = {1, 2, 3, 4}, {3, 4, 5, 6}a | b # union {1,2,3,4,5,6}a & b # intersection {3,4}a - b # difference {1,2}a ^ b # symmetric {1,2,5,6}
# frozenset: immutable + hashable, so it can be a dict key or set memberseen: set[frozenset[str]] = {frozenset({"a", "b"})}A tuple is an immutable, fixed-shape sequence — Python’s lightweight “struct.”
TS models this with a tuple type [string, number]; Go uses a struct or multiple
return values.
point: tuple[float, float] = (1.0, 2.0)x, y = point # unpacking (destructuring)rgb = (255, 128, 0)rgb[0] # 255 — indexable, but no item assignment
# Tuples are the natural return type for "multiple values" (like Go's name, err)def divmod_(a: int, b: int) -> tuple[int, int]: return a // b, a % b # the parens are optional
q, r = divmod_(17, 5) # 3, 2
# Single-element tuple needs the trailing comma — (x,) not (x)singleton = (42,)Use a tuple for a fixed, heterogeneous record where positions have meaning and
the value won’t change. For a named fixed record, jump straight to a
dataclass (below) or a NamedTuple.
When to use which
Section titled “When to use which”| Use | When |
|---|---|
list | ordered, mutable, you append/iterate; the default sequence |
tuple | fixed-size, immutable record; dict keys; multiple return values |
dict | key → value lookup; the default mapping |
set | membership tests, dedup, set algebra; order doesn’t matter |
frozenset | a set that must be hashable (dict key / set member) |
collections: the specialized containers
Section titled “collections: the specialized containers”The collections module has four workhorses you’ll reach for constantly.
from collections import defaultdict, Counter, deque, namedtuple
# defaultdict — auto-creates missing values; kills the "init if absent" danceby_dept: defaultdict[str, list[str]] = defaultdict(list)for name, dept in employees: by_dept[dept].append(name) # no KeyError, no setdefault
# Counter — multiset / frequency map with a tally APIcounts = Counter("mississippi") # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})counts.most_common(2) # [('i', 4), ('s', 4)]words = Counter(["a", "b", "a"]) # Counter({'a': 2, 'b': 1})
# deque — O(1) appends/pops at BOTH ends (list is O(n) at the front)queue: deque[int] = deque([1, 2, 3])queue.appendleft(0) # cheap, unlike list.insert(0, x)queue.popleft() # FIFO queuering = deque(maxlen=100) # bounded ring buffer — old items drop off
# namedtuple — a tuple with named fields (lightweight, for quick records)Point = namedtuple("Point", ["x", "y"])p = Point(1, 2)p.x, p[0] # 1, 1 — both workCounter and defaultdict replace the most common hand-rolled map[K]int /
map[K][]V loops from Go and the reduce-into-an-object pattern from TS. For
named records prefer a dataclass over namedtuple unless you specifically want
tuple behavior (immutability + positional access).
Comprehensions
Section titled “Comprehensions”This is the central Python idiom. A comprehension is map/filter folded into
one expression that reads left-to-right. Once it clicks, you’ll write far fewer
explicit loops than in Go and reach for it before .map().filter() chains.
List comprehension
Section titled “List comprehension”const nums = [1, 2, 3, 4, 5, 6];const evenSquares = nums .filter(n => n % 2 === 0) .map(n => n * n);// [4, 16, 36]nums := []int{1, 2, 3, 4, 5, 6}var evenSquares []intfor _, n := range nums { if n%2 == 0 { evenSquares = append(evenSquares, n*n) }}nums = [1, 2, 3, 4, 5, 6]even_squares = [n * n for n in nums if n % 2 == 0]# [4, 16, 36]Read it as: expression for item in iterable if condition. The
TS chain allocates an intermediate array between filter and map; the
comprehension does it in one pass.
dict and set comprehensions
Section titled “dict and set comprehensions”names = ["alice", "bob", "charlie"]
# dict comprehension — { key: value for ... }lengths = {name: len(name) for name in names}# {'alice': 5, 'bob': 3, 'charlie': 7}
# invert a dictby_length = {v: k for k, v in lengths.items()}
# set comprehension — { value for ... }first_letters = {name[0] for name in names} # {'a', 'b', 'c'}
# filter a dict downprices = {"widget": 9.99, "gadget": 24.99, "doohickey": 4.99}cheap = {k: v for k, v in prices.items() if v < 10}The TS equivalent of a dict comprehension is
Object.fromEntries(arr.map(...)) — clunkier and a separate function call.
Nested and conditional
Section titled “Nested and conditional”# Nested loops — the for clauses read in the same order as nested forsmatrix = [[1, 2, 3], [4, 5, 6]]flat = [x for row in matrix for x in row] # [1, 2, 3, 4, 5, 6]
# Conditional EXPRESSION (ternary) in the output position — note: this `if/else`# is part of the value, distinct from the trailing filter `if`labels = ["even" if n % 2 == 0 else "odd" for n in nums]
# Build a grid (nested comprehension produces nested lists)grid = [[r * c for c in range(3)] for r in range(3)]The walrus operator :=
Section titled “The walrus operator :=”The walrus operator assigns and returns in one expression — useful in a comprehension to avoid computing something twice:
import json
raw = ['{"id": 1}', "not json", '{"id": 2}']
def try_parse(s: str) -> dict | None: try: return json.loads(s) except json.JSONDecodeError: return None
# Without walrus you'd call try_parse twice (once to filter, once to keep)parsed = [obj for s in raw if (obj := try_parse(s)) is not None]# [{'id': 1}, {'id': 2}]This is the same pattern as if ((m = regex.exec(s)) !== null) in JS — bind the
result of an expensive call to a name so you can both test it and use it.
Dataclasses
Section titled “Dataclasses”A dataclass is Python’s answer to a TS interface/class or a Go struct:
declare the fields with type hints, and the decorator writes __init__,
__repr__, and __eq__ for you. It’s the right tool for plain internal data.
The basics
Section titled “The basics”interface User { name: string; age: number; email: string;}const u: User = { name: "Alice", age: 30, email: "a@b.com" };
// or a class with a constructorclass Point { constructor(public x: number, public y: number) {}}type User struct { Name string Age int Email string}u := User{Name: "Alice", Age: 30, Email: "a@b.com"}from dataclasses import dataclass
@dataclassclass User: name: str age: int email: str
u = User(name="Alice", age=30, email="a@b.com")u # User(name='Alice', age=30, email='a@b.com') — free __repr__u == User("Alice", 30, "a@b.com") # True — free __eq__ (value equality)Mutable defaults and field(default_factory=...)
Section titled “Mutable defaults and field(default_factory=...)”Plain defaults work like any language. But there’s a famous Python footgun: a
mutable default ([], {}) is shared across all instances. Dataclasses
detect this and force you to use field(default_factory=...).
from dataclasses import dataclass, field
@dataclassclass Order: id: str items: list[str] = field(default_factory=list) # fresh list per instance tags: set[str] = field(default_factory=set) priority: int = 0 # immutable default is finefrozen, slots, kw_only
Section titled “frozen, slots, kw_only”from dataclasses import dataclass
@dataclass(frozen=True, slots=True, kw_only=True)class Config: host: str port: int = 8080 debug: bool = False
c = Config(host="localhost", port=5432)# c.port = 5433 # FrozenInstanceError — frozen makes it immutablehash(c) # frozen dataclasses are hashable → usable as dict keys| Option | Effect | Maps to |
|---|---|---|
frozen=True | immutable; instances become hashable | readonly fields / Go value semantics |
slots=True | __slots__ — less memory, faster attrs, no stray attrs | Go struct’s fixed layout |
kw_only=True | all fields must be passed by keyword | TS object literal; named args |
order=True | generates <, <=, >, >= (by field tuple) | sortable structs |
eq=True | value equality (default True) | Go == on comparable structs |
slots=True is close to free and worth defaulting to for data-heavy classes — it
trades the ability to monkey-patch attributes for lower memory and faster access.
kw_only=True is great for classes with many fields where positional args would
be ambiguous.
__post_init__ and ordering
Section titled “__post_init__ and ordering”from dataclasses import dataclass, field
@dataclass(order=True)class Task: # sort_index drives ordering; we hide it from the constructor + repr sort_index: int = field(init=False, repr=False) name: str priority: int
def __post_init__(self) -> None: # runs after the generated __init__ — derive fields, validate, normalize self.sort_index = -self.priority
tasks = [Task("deploy", 1), Task("hotfix", 9), Task("docs", 2)]tasks.sort() # highest priority first, via sort_index[t.name for t in tasks] # ['hotfix', 'docs', 'deploy']__post_init__ is where you do derived fields and light validation — the rough
equivalent of logic in a TS constructor body or a Go constructor function. For
real validation with good errors, that’s Pydantic’s job.
Generators & iterators
Section titled “Generators & iterators”A generator is a function that pauses and resumes, yielding values lazily
instead of building a list. This is identical in spirit to JS function* and very
close to consuming a Go channel or a Go 1.23+ range-over-func iterator. The win is
memory: you process a billion records without holding a billion in RAM.
function* countUp(n: number): Generator<number> { for (let i = 0; i < n; i++) { yield i; }}for (const x of countUp(3)) console.log(x); // 0, 1, 2// Go 1.23+ range-over-func iteratorfunc countUp(n int) func(yield func(int) bool) { return func(yield func(int) bool) { for i := 0; i < n; i++ { if !yield(i) { return } } }}for x := range countUp(3) { fmt.Println(x) } // 0, 1, 2from collections.abc import Iterator
def count_up(n: int) -> Iterator[int]: for i in range(n): yield i # pause here, resume on next request
for x in count_up(3): print(x) # 0, 1, 2Lazy evaluation and generator expressions
Section titled “Lazy evaluation and generator expressions”A generator expression is a comprehension with () instead of [] — same
syntax, but lazy and memory-flat.
# List comprehension: builds the whole list in memory (eager)squares_list = [n * n for n in range(1_000_000)] # ~8 MB allocated now
# Generator expression: computes one value at a time (lazy)squares_gen = (n * n for n in range(1_000_000)) # ~0 bytes until consumed
# Passing a genexp straight into a consumer — no intermediate list, no extra ()total = sum(n * n for n in range(1_000_000))first_big = next(n * n for n in range(1_000_000) if n * n > 100) # 121itertools — the lazy combinators
Section titled “itertools — the lazy combinators”itertools is the lazy, composable toolkit for iterators. Everything here returns
an iterator, so chains stay memory-flat.
import itertools as it
# chain — concatenate iterables lazily (like [...a, ...b] but lazy)list(it.chain([1, 2], [3, 4], [5])) # [1, 2, 3, 4, 5]
# islice — slice an iterator without building a list (works on infinite gens)list(it.islice(count_up(1_000_000), 3)) # [0, 1, 2]
# pairwise (3.10+) — sliding window of 2; great for diffs/deltaslist(it.pairwise([1, 2, 4, 7])) # [(1,2), (2,4), (4,7)]
# accumulate — running totals (like a scan / cumulative reduce)list(it.accumulate([1, 2, 3, 4])) # [1, 3, 6, 10]
# batched (3.12+) — fixed-size chunks; perfect for batched DB inserts / API callslist(it.batched(range(7), 3)) # [(0,1,2), (3,4,5), (6,)]
# groupby — group CONSECUTIVE equal keys (sort first if you want global groups!)data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]for key, group in it.groupby(data, key=lambda t: t[0]): print(key, [v for _, v in group])# a [1, 2] b [3] a [4] <- note 'a' appears twice: groupby is consecutive-onlyComparison: laziness across languages
Section titled “Comparison: laziness across languages”| Concept | TypeScript | Go | Python |
|---|---|---|---|
| Lazy producer | function* generator | channel / range-over-func | generator (yield) |
| Inline lazy expr | — (no genexp) | — | (x for x in ...) |
| Combinator library | iter-helpers / RxJS | manual | itertools |
| Backpressure model | pull (iterator protocol) | push (channel send/recv) | pull (iterator protocol) |
Python and JS share a pull model (the consumer calls next()); Go channels
are a push/communicate model. For pure data transformation, the pull model is
simpler — no goroutine to manage, no channel to close.
functools
Section titled “functools”functools holds the higher-order helpers. The standouts for backend work are
the caching decorators and partial.
from functools import reduce, partial, cache, lru_cache, cached_property, singledispatch
# reduce — fold a sequence to one value. Python prefers sum()/comprehensions, but# reduce is there for non-trivial accumulation (e.g. composing functions)product = reduce(lambda acc, n: acc * n, [1, 2, 3, 4], 1) # 24
# partial — pre-bind arguments (like .bind() in JS, or a closure)def fetch(base_url: str, path: str) -> str: ...fetch_api = partial(fetch, "https://api.example.com")fetch_api("/users") # base_url already bound
# cache (3.9+) — unbounded memoization; lru_cache(maxsize=N) for a bounded cache@cachedef fib(n: int) -> int: return n if n < 2 else fib(n - 1) + fib(n - 2)fib(100) # instant — memoized; without @cache, exponentialfib.cache_info() # CacheInfo(hits=..., misses=..., ...)
@lru_cache(maxsize=1024)def expensive_lookup(key: str) -> dict: ...cached_property computes a value once on first access and stores it on the
instance — ideal for derived fields that are expensive but stable:
from dataclasses import dataclassfrom functools import cached_property
@dataclassclass Dataset: rows: list[dict]
@cached_property def summary(self) -> dict[str, float]: # computed once, cached on the instance thereafter return {"count": len(self.rows), "mean": sum(r["v"] for r in self.rows) / len(self.rows)}singledispatch gives you function overloading by the type of the first argument
— the Pythonic alternative to a chain of isinstance checks:
from functools import singledispatch
@singledispatchdef to_json(value: object) -> str: raise TypeError(f"no encoder for {type(value)}")
@to_json.registerdef _(value: int) -> str: return str(value)
@to_json.registerdef _(value: list) -> str: return "[" + ", ".join(to_json(v) for v in value) + "]"
to_json([1, 2, 3]) # "[1, 2, 3]"Functional style
Section titled “Functional style”Python is multi-paradigm, leaning toward comprehensions over map/filter
and expressions over explicit loops where it reads well.
map/filter vs comprehensions — prefer comprehensions
Section titled “map/filter vs comprehensions — prefer comprehensions”nums = [1, 2, 3, 4, 5, 6]
# map/filter exist, but require a lambda and read inside-outdoubled = list(map(lambda n: n * 2, nums))evens = list(filter(lambda n: n % 2 == 0, nums))
# Comprehensions are the idiomatic choice — clearer, no lambda, no list() wrapdoubled = [n * 2 for n in nums]evens = [n for n in nums if n % 2 == 0]map/filter earn their place when you’re passing an existing named function
(map(str.upper, names)) or feeding a lazy iterator into another consumer.
Otherwise, comprehension.
sorted with key=, any/all
Section titled “sorted with key=, any/all”const users = [{ name: "Bob", age: 25 }, { name: "Al", age: 30 }];const byAge = [...users].sort((a, b) => a.age - b.age);const byNameLen = [...users].sort((a, b) => a.name.length - b.name.length);const anyAdult = users.some(u => u.age >= 18);const allAdults = users.every(u => u.age >= 18);sort.Slice(users, func(i, j int) bool { return users[i].Age < users[j].Age })anyAdult := falsefor _, u := range users { if u.Age >= 18 { anyAdult = true; break } }users = [{"name": "Bob", "age": 25}, {"name": "Al", "age": 30}]
# sorted() returns a NEW list; .sort() mutates in place. key= takes a function.by_age = sorted(users, key=lambda u: u["age"])by_name_len = sorted(users, key=lambda u: len(u["name"]))oldest_first = sorted(users, key=lambda u: u["age"], reverse=True)
# multi-key sort: return a tuple (sorts by age, then name)by_age_name = sorted(users, key=lambda u: (u["age"], u["name"]))
# operator.itemgetter / attrgetter are faster, lambda-free key functionsfrom operator import itemgetterby_age = sorted(users, key=itemgetter("age"))
any_adult = any(u["age"] >= 18 for u in users) # short-circuits, lazyall_adults = all(u["age"] >= 18 for u in users)any/all over a generator expression short-circuit — they stop at the first
hit/miss, so any(is_valid(x) for x in huge_stream) won’t scan the whole stream.
Unpacking, *args, **kwargs
Section titled “Unpacking, *args, **kwargs”# Sequence unpacking with a "rest" via the starfirst, *middle, last = [1, 2, 3, 4, 5] # 1, [2, 3, 4], 5a, b = b, a # swap, no temp
# Merge/spread (3.5+): like {...a, ...b} and [...x, ...y]merged = {**defaults, **overrides} # later winscombined = [*list_a, *list_b]
# *args / **kwargs — variadic positional + keyword argsdef log(level: str, *args: object, **kwargs: object) -> None: print(level, args, kwargs)
log("INFO", "started", 1, 2, service="api", region="us")# INFO ('started', 1, 2) {'service': 'api', 'region': 'us'}
# Spread a list/dict INTO a call (the inverse)def make_user(name: str, age: int) -> dict: ...fields = {"name": "Alice", "age": 30}make_user(**fields) # spread dict as keyword argsStructural pattern matching
Section titled “Structural pattern matching”match/case (3.10+) can destructure while it matches — far more powerful
than a switch. It shines on shaped data like parsed JSON or events:
def handle(event: dict) -> str: match event: case {"type": "click", "x": int(x), "y": int(y)}: return f"click at {x},{y}" case {"type": "key", "key": str(k)}: return f"key {k}" case {"type": t}: return f"unknown event {t}" case _: return "malformed"This is structural destructuring + matching in one construct — closer to Rust’s
match than to a C-style switch. (Covered in depth in
Module 02.)
Files, enumerate, zip
Section titled “Files, enumerate, zip”pathlib for filesystem work
Section titled “pathlib for filesystem work”Use pathlib.Path for all filesystem work — it’s the object-oriented, modern API.
(You’ll occasionally see os.path string-juggling in older code; pathlib
replaces it.)
from pathlib import Path
data_dir = Path("data")csv_file = data_dir / "sales.csv" # `/` joins paths — no os.path.joincsv_file.exists() # boolcsv_file.suffix # '.csv'csv_file.stem # 'sales'
text = csv_file.read_text(encoding="utf-8") # one-liner readcsv_file.write_text("a,b,c\n") # one-liner write
# Glob and iterate — lazy, recursive with **for path in data_dir.glob("**/*.csv"): print(path)
# Streaming a big file line-by-line (the file object is itself an iterator)with csv_file.open() as f: for line in f: # lazy — one line in memory at a time process(line)enumerate and zip
Section titled “enumerate and zip”names = ["alice", "bob", "charlie"]scores = [85, 92, 78]
# enumerate — index + value. NEVER write `for i in range(len(names))`for i, name in enumerate(names): print(f"{i}: {name}")for i, name in enumerate(names, start=1): # 1-based numbering print(f"{i}. {name}")
# zip — pair up parallel iterables (built into the language, unlike TS)for name, score in zip(names, scores): print(f"{name}: {score}")
# zip + dict comprehension = build a lookup from two listsscoreboard = {name: score for name, score in zip(names, scores)}
# zip(*matrix) transposes — a classic trickmatrix = [[1, 2, 3], [4, 5, 6]]rows = list(zip(*matrix)) # [(1, 4), (2, 5), (3, 6)]Practice
Section titled “Practice”Put generators, itertools, and dataclasses together into one lazy pipeline —
parse, filter, group, aggregate — without ever holding the full dataset in memory.