Skip to content

Data Structures & Functional Style

Python’s data model is small and sharp: four built-in collections, one transformation idiom (the comprehension), and a pair of standard-library modules (itertools, functools) that turn loops into pipelines. There’s no Lodash to reach for and no for i := range boilerplate — the language gives you most of it.

This module maps Python’s collections onto TS arrays/objects/Map/Set and Go slices/maps, then covers the moves that make Python feel like Python: comprehensions, dataclasses, lazy generators, and the functional standard library.

NeedTypeScriptGoPython
Ordered, indexed, growableArray<T>[]T (slice)list
Key → valueMap / object / Recordmap[K]Vdict
Unique membersSet<T>map[T]struct{}set
Fixed, heterogeneous recordtuple type [A, B]struct (roughly)tuple
Hashable, frozen setReadonlySet (type-only)frozenset

Two things trip up TS/Go devs immediately:

  • list is the only “array”. There’s no separate array type; list is a dynamic array (amortized O(1) append), not a linked list despite the name.
  • dict preserves insertion order (guaranteed since 3.7) and is the backbone of Python — objects, kwargs, JSON, and namespaces are all dicts under the hood.
const nums: number[] = [1, 2, 3];
nums.push(4); // [1, 2, 3, 4]
nums[0]; // 1
nums.slice(1, 3); // [2, 3]
nums.includes(2); // true
const doubled = nums.map(n => n * 2);

Slicing is Python’s superpower and has no clean TS/Go analog: nums[1:3], nums[::-1] (reverse), nums[::2] (every other). Slices always return a new list.

const user = new Map<string, number>([["alice", 30]]);
user.set("bob", 25);
user.get("alice"); // 30
user.get("carol"); // undefined
user.has("bob"); // true
// or a plain object / Record
const cfg: Record<string, string> = { host: "localhost" };
const ids = new Set<number>([1, 2, 3]);
ids.add(3); // no-op, already present
ids.has(2); // true
ids.delete(1);
// set algebra: not built-in, you write loops

A tuple is an immutable, fixed-shape sequence — Python’s lightweight “struct.” TS models this with a tuple type [string, number]; Go uses a struct or multiple return values.

point: tuple[float, float] = (1.0, 2.0)
x, y = point # unpacking (destructuring)
rgb = (255, 128, 0)
rgb[0] # 255 — indexable, but no item assignment
# Tuples are the natural return type for "multiple values" (like Go's name, err)
def divmod_(a: int, b: int) -> tuple[int, int]:
return a // b, a % b # the parens are optional
q, r = divmod_(17, 5) # 3, 2
# Single-element tuple needs the trailing comma — (x,) not (x)
singleton = (42,)

Use a tuple for a fixed, heterogeneous record where positions have meaning and the value won’t change. For a named fixed record, jump straight to a dataclass (below) or a NamedTuple.

UseWhen
listordered, mutable, you append/iterate; the default sequence
tuplefixed-size, immutable record; dict keys; multiple return values
dictkey → value lookup; the default mapping
setmembership tests, dedup, set algebra; order doesn’t matter
frozenseta set that must be hashable (dict key / set member)

The collections module has four workhorses you’ll reach for constantly.

from collections import defaultdict, Counter, deque, namedtuple
# defaultdict — auto-creates missing values; kills the "init if absent" dance
by_dept: defaultdict[str, list[str]] = defaultdict(list)
for name, dept in employees:
by_dept[dept].append(name) # no KeyError, no setdefault
# Counter — multiset / frequency map with a tally API
counts = Counter("mississippi") # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
counts.most_common(2) # [('i', 4), ('s', 4)]
words = Counter(["a", "b", "a"]) # Counter({'a': 2, 'b': 1})
# deque — O(1) appends/pops at BOTH ends (list is O(n) at the front)
queue: deque[int] = deque([1, 2, 3])
queue.appendleft(0) # cheap, unlike list.insert(0, x)
queue.popleft() # FIFO queue
ring = deque(maxlen=100) # bounded ring buffer — old items drop off
# namedtuple — a tuple with named fields (lightweight, for quick records)
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
p.x, p[0] # 1, 1 — both work

Counter and defaultdict replace the most common hand-rolled map[K]int / map[K][]V loops from Go and the reduce-into-an-object pattern from TS. For named records prefer a dataclass over namedtuple unless you specifically want tuple behavior (immutability + positional access).

This is the central Python idiom. A comprehension is map/filter folded into one expression that reads left-to-right. Once it clicks, you’ll write far fewer explicit loops than in Go and reach for it before .map().filter() chains.

const nums = [1, 2, 3, 4, 5, 6];
const evenSquares = nums
.filter(n => n % 2 === 0)
.map(n => n * n);
// [4, 16, 36]

Read it as: expression for item in iterable if condition. The TS chain allocates an intermediate array between filter and map; the comprehension does it in one pass.

names = ["alice", "bob", "charlie"]
# dict comprehension — { key: value for ... }
lengths = {name: len(name) for name in names}
# {'alice': 5, 'bob': 3, 'charlie': 7}
# invert a dict
by_length = {v: k for k, v in lengths.items()}
# set comprehension — { value for ... }
first_letters = {name[0] for name in names} # {'a', 'b', 'c'}
# filter a dict down
prices = {"widget": 9.99, "gadget": 24.99, "doohickey": 4.99}
cheap = {k: v for k, v in prices.items() if v < 10}

The TS equivalent of a dict comprehension is Object.fromEntries(arr.map(...)) — clunkier and a separate function call.

# Nested loops — the for clauses read in the same order as nested fors
matrix = [[1, 2, 3], [4, 5, 6]]
flat = [x for row in matrix for x in row] # [1, 2, 3, 4, 5, 6]
# Conditional EXPRESSION (ternary) in the output position — note: this `if/else`
# is part of the value, distinct from the trailing filter `if`
labels = ["even" if n % 2 == 0 else "odd" for n in nums]
# Build a grid (nested comprehension produces nested lists)
grid = [[r * c for c in range(3)] for r in range(3)]

The walrus operator assigns and returns in one expression — useful in a comprehension to avoid computing something twice:

import json
raw = ['{"id": 1}', "not json", '{"id": 2}']
def try_parse(s: str) -> dict | None:
try:
return json.loads(s)
except json.JSONDecodeError:
return None
# Without walrus you'd call try_parse twice (once to filter, once to keep)
parsed = [obj for s in raw if (obj := try_parse(s)) is not None]
# [{'id': 1}, {'id': 2}]

This is the same pattern as if ((m = regex.exec(s)) !== null) in JS — bind the result of an expensive call to a name so you can both test it and use it.

A dataclass is Python’s answer to a TS interface/class or a Go struct: declare the fields with type hints, and the decorator writes __init__, __repr__, and __eq__ for you. It’s the right tool for plain internal data.

interface User {
name: string;
age: number;
email: string;
}
const u: User = { name: "Alice", age: 30, email: "a@b.com" };
// or a class with a constructor
class Point {
constructor(public x: number, public y: number) {}
}

Mutable defaults and field(default_factory=...)

Section titled “Mutable defaults and field(default_factory=...)”

Plain defaults work like any language. But there’s a famous Python footgun: a mutable default ([], {}) is shared across all instances. Dataclasses detect this and force you to use field(default_factory=...).

from dataclasses import dataclass, field
@dataclass
class Order:
id: str
items: list[str] = field(default_factory=list) # fresh list per instance
tags: set[str] = field(default_factory=set)
priority: int = 0 # immutable default is fine
from dataclasses import dataclass
@dataclass(frozen=True, slots=True, kw_only=True)
class Config:
host: str
port: int = 8080
debug: bool = False
c = Config(host="localhost", port=5432)
# c.port = 5433 # FrozenInstanceError — frozen makes it immutable
hash(c) # frozen dataclasses are hashable → usable as dict keys
OptionEffectMaps to
frozen=Trueimmutable; instances become hashablereadonly fields / Go value semantics
slots=True__slots__ — less memory, faster attrs, no stray attrsGo struct’s fixed layout
kw_only=Trueall fields must be passed by keywordTS object literal; named args
order=Truegenerates <, <=, >, >= (by field tuple)sortable structs
eq=Truevalue equality (default True)Go == on comparable structs

slots=True is close to free and worth defaulting to for data-heavy classes — it trades the ability to monkey-patch attributes for lower memory and faster access. kw_only=True is great for classes with many fields where positional args would be ambiguous.

from dataclasses import dataclass, field
@dataclass(order=True)
class Task:
# sort_index drives ordering; we hide it from the constructor + repr
sort_index: int = field(init=False, repr=False)
name: str
priority: int
def __post_init__(self) -> None:
# runs after the generated __init__ — derive fields, validate, normalize
self.sort_index = -self.priority
tasks = [Task("deploy", 1), Task("hotfix", 9), Task("docs", 2)]
tasks.sort() # highest priority first, via sort_index
[t.name for t in tasks] # ['hotfix', 'docs', 'deploy']

__post_init__ is where you do derived fields and light validation — the rough equivalent of logic in a TS constructor body or a Go constructor function. For real validation with good errors, that’s Pydantic’s job.

A generator is a function that pauses and resumes, yielding values lazily instead of building a list. This is identical in spirit to JS function* and very close to consuming a Go channel or a Go 1.23+ range-over-func iterator. The win is memory: you process a billion records without holding a billion in RAM.

function* countUp(n: number): Generator<number> {
for (let i = 0; i < n; i++) {
yield i;
}
}
for (const x of countUp(3)) console.log(x); // 0, 1, 2

A generator expression is a comprehension with () instead of [] — same syntax, but lazy and memory-flat.

# List comprehension: builds the whole list in memory (eager)
squares_list = [n * n for n in range(1_000_000)] # ~8 MB allocated now
# Generator expression: computes one value at a time (lazy)
squares_gen = (n * n for n in range(1_000_000)) # ~0 bytes until consumed
# Passing a genexp straight into a consumer — no intermediate list, no extra ()
total = sum(n * n for n in range(1_000_000))
first_big = next(n * n for n in range(1_000_000) if n * n > 100) # 121

itertools is the lazy, composable toolkit for iterators. Everything here returns an iterator, so chains stay memory-flat.

import itertools as it
# chain — concatenate iterables lazily (like [...a, ...b] but lazy)
list(it.chain([1, 2], [3, 4], [5])) # [1, 2, 3, 4, 5]
# islice — slice an iterator without building a list (works on infinite gens)
list(it.islice(count_up(1_000_000), 3)) # [0, 1, 2]
# pairwise (3.10+) — sliding window of 2; great for diffs/deltas
list(it.pairwise([1, 2, 4, 7])) # [(1,2), (2,4), (4,7)]
# accumulate — running totals (like a scan / cumulative reduce)
list(it.accumulate([1, 2, 3, 4])) # [1, 3, 6, 10]
# batched (3.12+) — fixed-size chunks; perfect for batched DB inserts / API calls
list(it.batched(range(7), 3)) # [(0,1,2), (3,4,5), (6,)]
# groupby — group CONSECUTIVE equal keys (sort first if you want global groups!)
data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
for key, group in it.groupby(data, key=lambda t: t[0]):
print(key, [v for _, v in group])
# a [1, 2] b [3] a [4] <- note 'a' appears twice: groupby is consecutive-only
ConceptTypeScriptGoPython
Lazy producerfunction* generatorchannel / range-over-funcgenerator (yield)
Inline lazy expr— (no genexp)(x for x in ...)
Combinator libraryiter-helpers / RxJSmanualitertools
Backpressure modelpull (iterator protocol)push (channel send/recv)pull (iterator protocol)

Python and JS share a pull model (the consumer calls next()); Go channels are a push/communicate model. For pure data transformation, the pull model is simpler — no goroutine to manage, no channel to close.

functools holds the higher-order helpers. The standouts for backend work are the caching decorators and partial.

from functools import reduce, partial, cache, lru_cache, cached_property, singledispatch
# reduce — fold a sequence to one value. Python prefers sum()/comprehensions, but
# reduce is there for non-trivial accumulation (e.g. composing functions)
product = reduce(lambda acc, n: acc * n, [1, 2, 3, 4], 1) # 24
# partial — pre-bind arguments (like .bind() in JS, or a closure)
def fetch(base_url: str, path: str) -> str: ...
fetch_api = partial(fetch, "https://api.example.com")
fetch_api("/users") # base_url already bound
# cache (3.9+) — unbounded memoization; lru_cache(maxsize=N) for a bounded cache
@cache
def fib(n: int) -> int:
return n if n < 2 else fib(n - 1) + fib(n - 2)
fib(100) # instant — memoized; without @cache, exponential
fib.cache_info() # CacheInfo(hits=..., misses=..., ...)
@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> dict: ...

cached_property computes a value once on first access and stores it on the instance — ideal for derived fields that are expensive but stable:

from dataclasses import dataclass
from functools import cached_property
@dataclass
class Dataset:
rows: list[dict]
@cached_property
def summary(self) -> dict[str, float]:
# computed once, cached on the instance thereafter
return {"count": len(self.rows), "mean": sum(r["v"] for r in self.rows) / len(self.rows)}

singledispatch gives you function overloading by the type of the first argument — the Pythonic alternative to a chain of isinstance checks:

from functools import singledispatch
@singledispatch
def to_json(value: object) -> str:
raise TypeError(f"no encoder for {type(value)}")
@to_json.register
def _(value: int) -> str:
return str(value)
@to_json.register
def _(value: list) -> str:
return "[" + ", ".join(to_json(v) for v in value) + "]"
to_json([1, 2, 3]) # "[1, 2, 3]"

Python is multi-paradigm, leaning toward comprehensions over map/filter and expressions over explicit loops where it reads well.

map/filter vs comprehensions — prefer comprehensions

Section titled “map/filter vs comprehensions — prefer comprehensions”
nums = [1, 2, 3, 4, 5, 6]
# map/filter exist, but require a lambda and read inside-out
doubled = list(map(lambda n: n * 2, nums))
evens = list(filter(lambda n: n % 2 == 0, nums))
# Comprehensions are the idiomatic choice — clearer, no lambda, no list() wrap
doubled = [n * 2 for n in nums]
evens = [n for n in nums if n % 2 == 0]

map/filter earn their place when you’re passing an existing named function (map(str.upper, names)) or feeding a lazy iterator into another consumer. Otherwise, comprehension.

const users = [{ name: "Bob", age: 25 }, { name: "Al", age: 30 }];
const byAge = [...users].sort((a, b) => a.age - b.age);
const byNameLen = [...users].sort((a, b) => a.name.length - b.name.length);
const anyAdult = users.some(u => u.age >= 18);
const allAdults = users.every(u => u.age >= 18);

any/all over a generator expression short-circuit — they stop at the first hit/miss, so any(is_valid(x) for x in huge_stream) won’t scan the whole stream.

# Sequence unpacking with a "rest" via the star
first, *middle, last = [1, 2, 3, 4, 5] # 1, [2, 3, 4], 5
a, b = b, a # swap, no temp
# Merge/spread (3.5+): like {...a, ...b} and [...x, ...y]
merged = {**defaults, **overrides} # later wins
combined = [*list_a, *list_b]
# *args / **kwargs — variadic positional + keyword args
def log(level: str, *args: object, **kwargs: object) -> None:
print(level, args, kwargs)
log("INFO", "started", 1, 2, service="api", region="us")
# INFO ('started', 1, 2) {'service': 'api', 'region': 'us'}
# Spread a list/dict INTO a call (the inverse)
def make_user(name: str, age: int) -> dict: ...
fields = {"name": "Alice", "age": 30}
make_user(**fields) # spread dict as keyword args

match/case (3.10+) can destructure while it matches — far more powerful than a switch. It shines on shaped data like parsed JSON or events:

def handle(event: dict) -> str:
match event:
case {"type": "click", "x": int(x), "y": int(y)}:
return f"click at {x},{y}"
case {"type": "key", "key": str(k)}:
return f"key {k}"
case {"type": t}:
return f"unknown event {t}"
case _:
return "malformed"

This is structural destructuring + matching in one construct — closer to Rust’s match than to a C-style switch. (Covered in depth in Module 02.)

Use pathlib.Path for all filesystem work — it’s the object-oriented, modern API. (You’ll occasionally see os.path string-juggling in older code; pathlib replaces it.)

from pathlib import Path
data_dir = Path("data")
csv_file = data_dir / "sales.csv" # `/` joins paths — no os.path.join
csv_file.exists() # bool
csv_file.suffix # '.csv'
csv_file.stem # 'sales'
text = csv_file.read_text(encoding="utf-8") # one-liner read
csv_file.write_text("a,b,c\n") # one-liner write
# Glob and iterate — lazy, recursive with **
for path in data_dir.glob("**/*.csv"):
print(path)
# Streaming a big file line-by-line (the file object is itself an iterator)
with csv_file.open() as f:
for line in f: # lazy — one line in memory at a time
process(line)
names = ["alice", "bob", "charlie"]
scores = [85, 92, 78]
# enumerate — index + value. NEVER write `for i in range(len(names))`
for i, name in enumerate(names):
print(f"{i}: {name}")
for i, name in enumerate(names, start=1): # 1-based numbering
print(f"{i}. {name}")
# zip — pair up parallel iterables (built into the language, unlike TS)
for name, score in zip(names, scores):
print(f"{name}: {score}")
# zip + dict comprehension = build a lookup from two lists
scoreboard = {name: score for name, score in zip(names, scores)}
# zip(*matrix) transposes — a classic trick
matrix = [[1, 2, 3], [4, 5, 6]]
rows = list(zip(*matrix)) # [(1, 4), (2, 5), (3, 6)]

Put generators, itertools, and dataclasses together into one lazy pipeline — parse, filter, group, aggregate — without ever holding the full dataset in memory.