Data Structures & Functional Style

Python’s data model is small and sharp: four built-in collections, one transformation idiom (the comprehension), and a pair of standard-library modules (itertools, functools) that turn loops into pipelines. There’s no Lodash to reach for and no for i := range boilerplate — the language gives you most of it.

This module maps Python’s collections onto TS arrays/objects/Map/Set and Go slices/maps, then covers the moves that make Python feel like Python: comprehensions, dataclasses, lazy generators, and the functional standard library.

Core collections

Mental model

Need	TypeScript	Go	Python
Ordered, indexed, growable	`Array<T>`	`[]T` (slice)	`list`
Key → value	`Map` / object / `Record`	`map[K]V`	`dict`
Unique members	`Set<T>`	`map[T]struct{}`	`set`
Fixed, heterogeneous record	tuple type `[A, B]`	struct (roughly)	`tuple`
Hashable, frozen set	`ReadonlySet` (type-only)	—	`frozenset`

Two things trip up TS/Go devs immediately:

list is the only “array”. There’s no separate array type; list is a dynamic array (amortized O(1) append), not a linked list despite the name.
dict preserves insertion order (guaranteed since 3.7) and is the backbone of Python — objects, kwargs, JSON, and namespaces are all dicts under the hood.

list

const nums: number[] = [1, 2, 3];
nums.push(4);              // [1, 2, 3, 4]
nums[0];                   // 1
nums.slice(1, 3);          // [2, 3]
nums.includes(2);          // true
const doubled = nums.map(n => n * 2);

nums := []int{1, 2, 3}
nums = append(nums, 4)     // [1 2 3 4]
nums[0]                    // 1
nums[1:3]                  // [2 3]
// no built-in `includes` or `map` — write a loop

nums: list[int] = [1, 2, 3]
nums.append(4)             # [1, 2, 3, 4]
nums[0]                    # 1
nums[1:3]                  # [2, 3]  — slices are copies
nums[-1]                   # 4       — negative indexing
2 in nums                  # True    — O(n) membership
doubled = [n * 2 for n in nums]   # comprehension, not .map()

Slicing is Python’s superpower and has no clean TS/Go analog: nums[1:3], nums[::-1] (reverse), nums[::2] (every other). Slices always return a new list.

dict

const user = new Map<string, number>([["alice", 30]]);
user.set("bob", 25);
user.get("alice");          // 30
user.get("carol");          // undefined
user.has("bob");            // true

// or a plain object / Record
const cfg: Record<string, string> = { host: "localhost" };

user := map[string]int{"alice": 30}
user["bob"] = 25
v, ok := user["alice"]      // 30, true
_, ok = user["carol"]       // 0, false  (comma-ok)
delete(user, "bob")

user: dict[str, int] = {"alice": 30}
user["bob"] = 25
user["alice"]               # 30
user["carol"]               # KeyError! — unlike Go's zero value
user.get("carol")           # None — the safe accessor
user.get("carol", 0)        # 0 — with default
"bob" in user               # True — checks keys, O(1)
del user["bob"]
for name, age in user.items():
    ...

set and frozenset

const ids = new Set<number>([1, 2, 3]);
ids.add(3);                 // no-op, already present
ids.has(2);                 // true
ids.delete(1);
// set algebra: not built-in, you write loops

ids := map[int]struct{}{1: {}, 2: {}, 3: {}}
ids[4] = struct{}{}         // add
_, ok := ids[2]             // membership
delete(ids, 1)
// no set algebra in stdlib

ids: set[int] = {1, 2, 3}   # {} is an empty DICT — use set() for empty set
ids.add(3)                  # no-op
2 in ids                    # True, O(1)
ids.discard(1)              # remove if present (no error if absent)

# set algebra is built into the language
a, b = {1, 2, 3, 4}, {3, 4, 5, 6}
a | b                       # union        {1,2,3,4,5,6}
a & b                       # intersection {3,4}
a - b                       # difference   {1,2}
a ^ b                       # symmetric    {1,2,5,6}

# frozenset: immutable + hashable, so it can be a dict key or set member
seen: set[frozenset[str]] = {frozenset({"a", "b"})}

tuple

A tuple is an immutable, fixed-shape sequence — Python’s lightweight “struct.” TS models this with a tuple type [string, number]; Go uses a struct or multiple return values.

point: tuple[float, float] = (1.0, 2.0)
x, y = point                       # unpacking (destructuring)
rgb = (255, 128, 0)
rgb[0]                             # 255 — indexable, but no item assignment

# Tuples are the natural return type for "multiple values" (like Go's name, err)
def divmod_(a: int, b: int) -> tuple[int, int]:
    return a // b, a % b           # the parens are optional

q, r = divmod_(17, 5)              # 3, 2

# Single-element tuple needs the trailing comma — (x,) not (x)
singleton = (42,)

Use a tuple for a fixed, heterogeneous record where positions have meaning and the value won’t change. For a named fixed record, jump straight to a dataclass (below) or a NamedTuple.

When to use which

Use	When
`list`	ordered, mutable, you append/iterate; the default sequence
`tuple`	fixed-size, immutable record; dict keys; multiple return values
`dict`	key → value lookup; the default mapping
`set`	membership tests, dedup, set algebra; order doesn’t matter
`frozenset`	a set that must be hashable (dict key / set member)

`collections`: the specialized containers

The collections module has four workhorses you’ll reach for constantly.

from collections import defaultdict, Counter, deque, namedtuple

# defaultdict — auto-creates missing values; kills the "init if absent" dance
by_dept: defaultdict[str, list[str]] = defaultdict(list)
for name, dept in employees:
    by_dept[dept].append(name)     # no KeyError, no setdefault

# Counter — multiset / frequency map with a tally API
counts = Counter("mississippi")    # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
counts.most_common(2)              # [('i', 4), ('s', 4)]
words = Counter(["a", "b", "a"])   # Counter({'a': 2, 'b': 1})

# deque — O(1) appends/pops at BOTH ends (list is O(n) at the front)
queue: deque[int] = deque([1, 2, 3])
queue.appendleft(0)                # cheap, unlike list.insert(0, x)
queue.popleft()                    # FIFO queue
ring = deque(maxlen=100)           # bounded ring buffer — old items drop off

# namedtuple — a tuple with named fields (lightweight, for quick records)
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
p.x, p[0]                          # 1, 1 — both work

Counter and defaultdict replace the most common hand-rolled map[K]int / map[K][]V loops from Go and the reduce-into-an-object pattern from TS. For named records prefer a dataclass over namedtuple unless you specifically want tuple behavior (immutability + positional access).

Comprehensions

This is the central Python idiom. A comprehension is map/filter folded into one expression that reads left-to-right. Once it clicks, you’ll write far fewer explicit loops than in Go and reach for it before .map().filter() chains.

List comprehension

const nums = [1, 2, 3, 4, 5, 6];
const evenSquares = nums
  .filter(n => n % 2 === 0)
  .map(n => n * n);
// [4, 16, 36]

nums := []int{1, 2, 3, 4, 5, 6}
var evenSquares []int
for _, n := range nums {
    if n%2 == 0 {
        evenSquares = append(evenSquares, n*n)
    }
}

nums = [1, 2, 3, 4, 5, 6]
even_squares = [n * n for n in nums if n % 2 == 0]
# [4, 16, 36]

Read it as: expression for item in iterable if condition. The TS chain allocates an intermediate array between filter and map; the comprehension does it in one pass.

dict and set comprehensions

names = ["alice", "bob", "charlie"]

# dict comprehension — { key: value for ... }
lengths = {name: len(name) for name in names}
# {'alice': 5, 'bob': 3, 'charlie': 7}

# invert a dict
by_length = {v: k for k, v in lengths.items()}

# set comprehension — { value for ... }
first_letters = {name[0] for name in names}     # {'a', 'b', 'c'}

# filter a dict down
prices = {"widget": 9.99, "gadget": 24.99, "doohickey": 4.99}
cheap = {k: v for k, v in prices.items() if v < 10}

The TS equivalent of a dict comprehension is Object.fromEntries(arr.map(...)) — clunkier and a separate function call.

Nested and conditional

# Nested loops — the for clauses read in the same order as nested fors
matrix = [[1, 2, 3], [4, 5, 6]]
flat = [x for row in matrix for x in row]        # [1, 2, 3, 4, 5, 6]

# Conditional EXPRESSION (ternary) in the output position — note: this `if/else`
# is part of the value, distinct from the trailing filter `if`
labels = ["even" if n % 2 == 0 else "odd" for n in nums]

# Build a grid (nested comprehension produces nested lists)
grid = [[r * c for c in range(3)] for r in range(3)]

The walrus operator `:=`

The walrus operator assigns and returns in one expression — useful in a comprehension to avoid computing something twice:

import json

raw = ['{"id": 1}', "not json", '{"id": 2}']

def try_parse(s: str) -> dict | None:
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return None

# Without walrus you'd call try_parse twice (once to filter, once to keep)
parsed = [obj for s in raw if (obj := try_parse(s)) is not None]
# [{'id': 1}, {'id': 2}]

This is the same pattern as if ((m = regex.exec(s)) !== null) in JS — bind the result of an expensive call to a name so you can both test it and use it.

Dataclasses

A dataclass is Python’s answer to a TS interface/class or a Go struct: declare the fields with type hints, and the decorator writes __init__, __repr__, and __eq__ for you. It’s the right tool for plain internal data.

The basics

interface User {
  name: string;
  age: number;
  email: string;
}
const u: User = { name: "Alice", age: 30, email: "a@b.com" };

// or a class with a constructor
class Point {
  constructor(public x: number, public y: number) {}
}

type User struct {
    Name  string
    Age   int
    Email string
}
u := User{Name: "Alice", Age: 30, Email: "a@b.com"}

from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int
    email: str

u = User(name="Alice", age=30, email="a@b.com")
u                       # User(name='Alice', age=30, email='a@b.com') — free __repr__
u == User("Alice", 30, "a@b.com")   # True — free __eq__ (value equality)

Mutable defaults and `field(default_factory=...)`

Plain defaults work like any language. But there’s a famous Python footgun: a mutable default ([], {}) is shared across all instances. Dataclasses detect this and force you to use field(default_factory=...).

from dataclasses import dataclass, field

@dataclass
class Order:
    id: str
    items: list[str] = field(default_factory=list)   # fresh list per instance
    tags: set[str] = field(default_factory=set)
    priority: int = 0                                 # immutable default is fine

frozen, slots, kw_only

from dataclasses import dataclass

@dataclass(frozen=True, slots=True, kw_only=True)
class Config:
    host: str
    port: int = 8080
    debug: bool = False

c = Config(host="localhost", port=5432)
# c.port = 5433        # FrozenInstanceError — frozen makes it immutable
hash(c)                # frozen dataclasses are hashable → usable as dict keys

Option	Effect	Maps to
`frozen=True`	immutable; instances become hashable	`readonly` fields / Go value semantics
`slots=True`	`__slots__` — less memory, faster attrs, no stray attrs	Go struct’s fixed layout
`kw_only=True`	all fields must be passed by keyword	TS object literal; named args
`order=True`	generates `<`, `<=`, `>`, `>=` (by field tuple)	sortable structs
`eq=True`	value equality (default `True`)	Go `==` on comparable structs

slots=True is close to free and worth defaulting to for data-heavy classes — it trades the ability to monkey-patch attributes for lower memory and faster access. kw_only=True is great for classes with many fields where positional args would be ambiguous.

`__post_init__` and ordering

from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    # sort_index drives ordering; we hide it from the constructor + repr
    sort_index: int = field(init=False, repr=False)
    name: str
    priority: int

    def __post_init__(self) -> None:
        # runs after the generated __init__ — derive fields, validate, normalize
        self.sort_index = -self.priority

tasks = [Task("deploy", 1), Task("hotfix", 9), Task("docs", 2)]
tasks.sort()                       # highest priority first, via sort_index
[t.name for t in tasks]            # ['hotfix', 'docs', 'deploy']

__post_init__ is where you do derived fields and light validation — the rough equivalent of logic in a TS constructor body or a Go constructor function. For real validation with good errors, that’s Pydantic’s job.

Generators & iterators

A generator is a function that pauses and resumes, yielding values lazily instead of building a list. This is identical in spirit to JS function* and very close to consuming a Go channel or a Go 1.23+ range-over-func iterator. The win is memory: you process a billion records without holding a billion in RAM.

yield

function* countUp(n: number): Generator<number> {
  for (let i = 0; i < n; i++) {
    yield i;
  }
}
for (const x of countUp(3)) console.log(x);  // 0, 1, 2

// Go 1.23+ range-over-func iterator
func countUp(n int) func(yield func(int) bool) {
    return func(yield func(int) bool) {
        for i := 0; i < n; i++ {
            if !yield(i) {
                return
            }
        }
    }
}
for x := range countUp(3) { fmt.Println(x) }  // 0, 1, 2

from collections.abc import Iterator

def count_up(n: int) -> Iterator[int]:
    for i in range(n):
        yield i                    # pause here, resume on next request

for x in count_up(3):
    print(x)                       # 0, 1, 2

Lazy evaluation and generator expressions

A generator expression is a comprehension with () instead of [] — same syntax, but lazy and memory-flat.

# List comprehension: builds the whole list in memory (eager)
squares_list = [n * n for n in range(1_000_000)]      # ~8 MB allocated now

# Generator expression: computes one value at a time (lazy)
squares_gen = (n * n for n in range(1_000_000))       # ~0 bytes until consumed

# Passing a genexp straight into a consumer — no intermediate list, no extra ()
total = sum(n * n for n in range(1_000_000))
first_big = next(n * n for n in range(1_000_000) if n * n > 100)   # 121

itertools — the lazy combinators

itertools is the lazy, composable toolkit for iterators. Everything here returns an iterator, so chains stay memory-flat.

import itertools as it

# chain — concatenate iterables lazily (like [...a, ...b] but lazy)
list(it.chain([1, 2], [3, 4], [5]))            # [1, 2, 3, 4, 5]

# islice — slice an iterator without building a list (works on infinite gens)
list(it.islice(count_up(1_000_000), 3))        # [0, 1, 2]

# pairwise (3.10+) — sliding window of 2; great for diffs/deltas
list(it.pairwise([1, 2, 4, 7]))                # [(1,2), (2,4), (4,7)]

# accumulate — running totals (like a scan / cumulative reduce)
list(it.accumulate([1, 2, 3, 4]))              # [1, 3, 6, 10]

# batched (3.12+) — fixed-size chunks; perfect for batched DB inserts / API calls
list(it.batched(range(7), 3))                  # [(0,1,2), (3,4,5), (6,)]

# groupby — group CONSECUTIVE equal keys (sort first if you want global groups!)
data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
for key, group in it.groupby(data, key=lambda t: t[0]):
    print(key, [v for _, v in group])
# a [1, 2]   b [3]   a [4]   <- note 'a' appears twice: groupby is consecutive-only

Comparison: laziness across languages

Concept	TypeScript	Go	Python
Lazy producer	`function*` generator	channel / range-over-func	generator (`yield`)
Inline lazy expr	— (no genexp)	—	`(x for x in ...)`
Combinator library	iter-helpers / RxJS	manual	`itertools`
Backpressure model	pull (iterator protocol)	push (channel send/recv)	pull (iterator protocol)

Python and JS share a pull model (the consumer calls next()); Go channels are a push/communicate model. For pure data transformation, the pull model is simpler — no goroutine to manage, no channel to close.

functools

functools holds the higher-order helpers. The standouts for backend work are the caching decorators and partial.

from functools import reduce, partial, cache, lru_cache, cached_property, singledispatch

# reduce — fold a sequence to one value. Python prefers sum()/comprehensions, but
# reduce is there for non-trivial accumulation (e.g. composing functions)
product = reduce(lambda acc, n: acc * n, [1, 2, 3, 4], 1)   # 24

# partial — pre-bind arguments (like .bind() in JS, or a closure)
def fetch(base_url: str, path: str) -> str: ...
fetch_api = partial(fetch, "https://api.example.com")
fetch_api("/users")                # base_url already bound

# cache (3.9+) — unbounded memoization; lru_cache(maxsize=N) for a bounded cache
@cache
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)
fib(100)                           # instant — memoized; without @cache, exponential
fib.cache_info()                   # CacheInfo(hits=..., misses=..., ...)

@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> dict: ...

cached_property computes a value once on first access and stores it on the instance — ideal for derived fields that are expensive but stable:

from dataclasses import dataclass
from functools import cached_property

@dataclass
class Dataset:
    rows: list[dict]

    @cached_property
    def summary(self) -> dict[str, float]:
        # computed once, cached on the instance thereafter
        return {"count": len(self.rows), "mean": sum(r["v"] for r in self.rows) / len(self.rows)}

singledispatch gives you function overloading by the type of the first argument — the Pythonic alternative to a chain of isinstance checks:

from functools import singledispatch

@singledispatch
def to_json(value: object) -> str:
    raise TypeError(f"no encoder for {type(value)}")

@to_json.register
def _(value: int) -> str:
    return str(value)

@to_json.register
def _(value: list) -> str:
    return "[" + ", ".join(to_json(v) for v in value) + "]"

to_json([1, 2, 3])                 # "[1, 2, 3]"

Functional style

Python is multi-paradigm, leaning toward comprehensions over map/filter and expressions over explicit loops where it reads well.

map/filter vs comprehensions — prefer comprehensions

nums = [1, 2, 3, 4, 5, 6]

# map/filter exist, but require a lambda and read inside-out
doubled = list(map(lambda n: n * 2, nums))
evens = list(filter(lambda n: n % 2 == 0, nums))

# Comprehensions are the idiomatic choice — clearer, no lambda, no list() wrap
doubled = [n * 2 for n in nums]
evens = [n for n in nums if n % 2 == 0]

map/filter earn their place when you’re passing an existing named function (map(str.upper, names)) or feeding a lazy iterator into another consumer. Otherwise, comprehension.

sorted with key=, any/all

const users = [{ name: "Bob", age: 25 }, { name: "Al", age: 30 }];
const byAge = [...users].sort((a, b) => a.age - b.age);
const byNameLen = [...users].sort((a, b) => a.name.length - b.name.length);
const anyAdult = users.some(u => u.age >= 18);
const allAdults = users.every(u => u.age >= 18);

sort.Slice(users, func(i, j int) bool { return users[i].Age < users[j].Age })
anyAdult := false
for _, u := range users { if u.Age >= 18 { anyAdult = true; break } }

users = [{"name": "Bob", "age": 25}, {"name": "Al", "age": 30}]

# sorted() returns a NEW list; .sort() mutates in place. key= takes a function.
by_age = sorted(users, key=lambda u: u["age"])
by_name_len = sorted(users, key=lambda u: len(u["name"]))
oldest_first = sorted(users, key=lambda u: u["age"], reverse=True)

# multi-key sort: return a tuple (sorts by age, then name)
by_age_name = sorted(users, key=lambda u: (u["age"], u["name"]))

# operator.itemgetter / attrgetter are faster, lambda-free key functions
from operator import itemgetter
by_age = sorted(users, key=itemgetter("age"))

any_adult = any(u["age"] >= 18 for u in users)    # short-circuits, lazy
all_adults = all(u["age"] >= 18 for u in users)

any/all over a generator expression short-circuit — they stop at the first hit/miss, so any(is_valid(x) for x in huge_stream) won’t scan the whole stream.

Unpacking, *args, **kwargs

# Sequence unpacking with a "rest" via the star
first, *middle, last = [1, 2, 3, 4, 5]      # 1, [2, 3, 4], 5
a, b = b, a                                 # swap, no temp

# Merge/spread (3.5+): like {...a, ...b} and [...x, ...y]
merged = {**defaults, **overrides}          # later wins
combined = [*list_a, *list_b]

# *args / **kwargs — variadic positional + keyword args
def log(level: str, *args: object, **kwargs: object) -> None:
    print(level, args, kwargs)

log("INFO", "started", 1, 2, service="api", region="us")
# INFO ('started', 1, 2) {'service': 'api', 'region': 'us'}

# Spread a list/dict INTO a call (the inverse)
def make_user(name: str, age: int) -> dict: ...
fields = {"name": "Alice", "age": 30}
make_user(**fields)                          # spread dict as keyword args

Structural pattern matching

match/case (3.10+) can destructure while it matches — far more powerful than a switch. It shines on shaped data like parsed JSON or events:

def handle(event: dict) -> str:
    match event:
        case {"type": "click", "x": int(x), "y": int(y)}:
            return f"click at {x},{y}"
        case {"type": "key", "key": str(k)}:
            return f"key {k}"
        case {"type": t}:
            return f"unknown event {t}"
        case _:
            return "malformed"

This is structural destructuring + matching in one construct — closer to Rust’s match than to a C-style switch. (Covered in depth in Module 02.)

Files, enumerate, zip

pathlib for filesystem work

Use pathlib.Path for all filesystem work — it’s the object-oriented, modern API. (You’ll occasionally see os.path string-juggling in older code; pathlib replaces it.)

from pathlib import Path

data_dir = Path("data")
csv_file = data_dir / "sales.csv"          # `/` joins paths — no os.path.join
csv_file.exists()                          # bool
csv_file.suffix                            # '.csv'
csv_file.stem                              # 'sales'

text = csv_file.read_text(encoding="utf-8")        # one-liner read
csv_file.write_text("a,b,c\n")                     # one-liner write

# Glob and iterate — lazy, recursive with **
for path in data_dir.glob("**/*.csv"):
    print(path)

# Streaming a big file line-by-line (the file object is itself an iterator)
with csv_file.open() as f:
    for line in f:                          # lazy — one line in memory at a time
        process(line)

enumerate and zip

names = ["alice", "bob", "charlie"]
scores = [85, 92, 78]

# enumerate — index + value. NEVER write `for i in range(len(names))`
for i, name in enumerate(names):
    print(f"{i}: {name}")
for i, name in enumerate(names, start=1):   # 1-based numbering
    print(f"{i}. {name}")

# zip — pair up parallel iterables (built into the language, unlike TS)
for name, score in zip(names, scores):
    print(f"{name}: {score}")

# zip + dict comprehension = build a lookup from two lists
scoreboard = {name: score for name, score in zip(names, scores)}

# zip(*matrix) transposes — a classic trick
matrix = [[1, 2, 3], [4, 5, 6]]
rows = list(zip(*matrix))                   # [(1, 4), (2, 5), (3, 6)]

Practice

Put generators, itertools, and dataclasses together into one lazy pipeline — parse, filter, group, aggregate — without ever holding the full dataset in memory.

Data Pipeline Build a lazy stream-processing pipeline over records with generators, itertools, and dataclasses — compose, don't materialize.