Data Pipeline
Build a CSV data processing pipeline that reads sales data, transforms it with
chained collection operations, and prints a summary report. This is the exercise
where Kotlin’s collection operators start to feel like a small query language:
the map/filter/groupBy chaining here replaces what you’d write with
array.reduce in TypeScript or a for loop in Go.
What you’ll practice
- Chaining collection operations: map, filter, groupBy, sortedBy, sumOf
- Sequences for efficient large-data processing
- Scope functions (let, also, run)
- Destructuring in loops and lambdas
- buildList / buildMap for constructing results
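Two items on this list, sequences and buildList, are easiest to see in a small standalone example. The following is a minimal sketch, not part of the worked solution; firstLargeQuantities and reportLines are hypothetical helper names invented for illustration. It shows that a sequence stops pulling elements once take is satisfied, and that buildList assembles a read-only list with ordinary control flow.

```kotlin
// Hypothetical helper: returns the first `limit` quantities that are at least
// `minimum`, plus how many input elements had to be inspected to find them.
fun firstLargeQuantities(quantities: List<Int>, minimum: Int, limit: Int): Pair<List<Int>, Int> {
    var inspected = 0
    val result = quantities
        .asSequence()                  // lazy: elements flow through one at a time
        .onEach { inspected++ }        // count every element that is actually pulled
        .filter { it >= minimum }
        .take(limit)                   // stop pulling once `limit` matches are found
        .toList()
    return result to inspected
}

// Hypothetical helper: buildList lets you add elements conditionally and
// returns a read-only List when the block ends.
fun reportLines(quantities: List<Int>): List<String> = buildList {
    add("count: ${quantities.size}")
    quantities.maxOrNull()?.let { add("largest: $it") }
}

fun main() {
    val quantities = listOf(10, 5, 3, 20, 15, 8, 12, 7, 25, 30)
    val (found, inspected) = firstLargeQuantities(quantities, minimum = 15, limit = 2)
    println(found)      // [20, 15]
    println(inspected)  // 5: the remaining five elements are never touched
    println(reportLines(quantities)) // [count: 10, largest: 30]
}
```

With a plain List the filter step would visit all ten elements before take ran; for ten rows that makes no practical difference, so the laziness only pays off on large inputs.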
Requirements
The program reads CSV rows with the columns
date,region,product,quantity,unit_price and produces a report with five
sections.
- Parse each CSV line into a Sale data class (skip the header, ignore malformed rows).
- Print the totals: total revenue, total order count, average order value.
- Break revenue down by region, sorted by revenue descending.
- Rank the top products by revenue, with their total units.
- Show the daily trend: revenue per date, in date order.
- List high-value orders (revenue over $100).
Example input
    date,region,product,quantity,unit_price
    2024-01-15,North,Widget,10,9.99
    2024-01-15,South,Gadget,5,24.99
    2024-01-16,North,Gadget,3,24.99
    2024-01-16,East,Widget,20,9.99
    2024-01-17,South,Widget,15,9.99
    2024-01-17,North,Doohickey,8,4.99
    2024-01-18,East,Gadget,12,24.99
    2024-01-18,West,Widget,7,9.99
    2024-01-19,North,Widget,25,9.99
    2024-01-19,South,Doohickey,30,4.99

Expected output
    === Sales Report ===

    Total Revenue: $1,458.65
    Total Orders: 10
    Average Order Value: $145.87

    --- Revenue by Region ---
     East  : $499.68 (2 orders)
     North : $464.54 (4 orders)
     South : $424.50 (3 orders)
     West  : $69.93 (1 orders)

    --- Top Products ---
     1. Widget     - $769.23 (77 units)
     2. Gadget     - $499.80 (20 units)
     3. Doohickey  - $189.62 (38 units)

    --- Daily Trend ---
     2024-01-15: $224.85
     2024-01-16: $274.77
     2024-01-17: $189.77
     2024-01-18: $369.81
     2024-01-19: $399.45

    --- High Value Orders (> $100) ---
     2024-01-18, East: 12x Gadget = $299.88
     2024-01-19, North: 25x Widget = $249.75
     2024-01-16, East: 20x Widget = $199.80
     2024-01-17, South: 15x Widget = $149.85
     2024-01-19, South: 30x Doohickey = $149.70
     2024-01-15, South: 5x Gadget = $124.95

The worked solution
A single-module Gradle project with one Kotlin file. No serialization library this time: the data is plain CSV, parsed by hand.

data-pipeline/
- build.gradle.kts (deps + build config)
- settings.gradle.kts (project name)
- src/
  - main/
    - kotlin/com/example/datapipeline/
      - Main.kt (the whole pipeline)
    - resources/
      - sales.csv (sample data for the stretch goal)
build.gradle.kts
The simplest possible build: the JVM plugin to compile, application so
./gradlew run works, and nothing else. There’s no runtime dependency because the
parsing is all standard-library String operations.

    plugins {
        kotlin("jvm") version "2.1.0"
        application
    }

    group = "com.example"
    version = "1.0-SNAPSHOT"

    repositories {
        mavenCentral()
    }

    dependencies {
        testImplementation(kotlin("test"))
    }

    tasks.test {
        useJUnitPlatform()
    }

    application {
        mainClass.set("com.example.datapipeline.MainKt")
    }

settings.gradle.kts sets the project name:

    rootProject.name = "data-pipeline"

The data classes
Three small data classes model the domain. Sale is the parsed row; the two
summary types hold the aggregated results. The interesting line is the computed
property revenue: it’s not a stored field, it’s recalculated on every access
(get()), the Kotlin equivalent of a TypeScript getter.
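To make the stored-versus-computed distinction concrete, here is a small sketch. LineItem and revenueAtCreation are hypothetical names used only for this illustration, and quantity is declared var here (unlike in Sale) so the difference becomes observable.

```kotlin
// Hypothetical class for illustration; not part of the worked solution.
class LineItem(var quantity: Int, val unitPrice: Double) {
    // Initializer: evaluated once at construction and stored in a backing field.
    val revenueAtCreation: Double = quantity * unitPrice

    // Getter: no backing field, re-evaluated on every read.
    val revenue: Double
        get() = quantity * unitPrice
}

fun main() {
    val item = LineItem(quantity = 2, unitPrice = 1.5)
    item.quantity = 4
    println(item.revenueAtCreation) // 3.0 (frozen at construction time)
    println(item.revenue)           // 6.0 (reflects the updated quantity)
}
```

Because Sale is immutable, both forms would return the same number there; the getter simply avoids storing a value that can always be derived.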
    package com.example.datapipeline

    data class Sale(
        val date: String,
        val region: String,
        val product: String,
        val quantity: Int,
        val unitPrice: Double
    ) {
        val revenue: Double
            get() = quantity * unitPrice
    }

    data class RegionSummary(
        val region: String,
        val totalRevenue: Double,
        val orderCount: Int
    )

    data class ProductSummary(
        val product: String,
        val totalRevenue: Double,
        val totalUnits: Int
    )

Parsing the CSV
Two functions, both built from collection operators. parseCsvLine returns
Sale? (a nullable type) and pairs the Elvis operator with an early return,
?: return null, so a bad number short-circuits the whole row to null. parseCsv
then chains the cleanup: drop(1) skips the header, filter removes blank lines,
and mapNotNull parses each line and drops any that came back null in one step.
That mapNotNull is the idiom to reach for whenever a TS dev would write
.map(...).filter(Boolean).
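The two idioms can be tried in isolation with a smaller row type. This is a sketch under assumptions: Point and parsePoint are hypothetical stand-ins for Sale and parseCsvLine, chosen so the example fits in a few lines.

```kotlin
// Hypothetical two-field row type used only for this illustration.
data class Point(val x: Int, val y: Int)

fun parsePoint(text: String): Point? {
    val parts = text.split(",")
    if (parts.size != 2) return null
    // `?: return null` leaves the whole function as soon as one field fails to parse.
    val x = parts[0].trim().toIntOrNull() ?: return null
    val y = parts[1].trim().toIntOrNull() ?: return null
    return Point(x, y)
}

fun main() {
    val rows = listOf("1,2", "oops,3", "4", "5, 6")

    // Two passes: build a List<Point?>, then strip the nulls.
    val twoSteps = rows.map { parsePoint(it) }.filterNotNull()
    // One pass: transform and drop nulls together.
    val oneStep = rows.mapNotNull { parsePoint(it) }

    println(oneStep)             // [Point(x=1, y=2), Point(x=5, y=6)]
    println(oneStep == twoSteps) // true
}
```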
    fun parseCsvLine(line: String): Sale? {
        val parts = line.split(",")
        if (parts.size != 5) return null
        return Sale(
            date = parts[0].trim(),
            region = parts[1].trim(),
            product = parts[2].trim(),
            quantity = parts[3].trim().toIntOrNull() ?: return null,
            unitPrice = parts[4].trim().toDoubleOrNull() ?: return null
        )
    }

    fun parseCsv(csv: String): List<Sale> {
        return csv.lines()
            .drop(1) // skip header
            .filter { it.isNotBlank() }
            .mapNotNull { parseCsvLine(it) }
    }

Generating the report
This is the heart of the exercise: five independent collection pipelines, each one reading top-to-bottom like a description of what you want, not how to loop.
    fun generateReport(sales: List<Sale>) {
        val totalRevenue = sales.sumOf { it.revenue }
        val totalOrders = sales.size
        val avgOrderValue = if (totalOrders > 0) totalRevenue / totalOrders else 0.0

        println("=== Sales Report ===")
        println()
        println("Total Revenue: $${"%,.2f".format(totalRevenue)}")
        println("Total Orders: $totalOrders")
        println("Average Order Value: $${"%,.2f".format(avgOrderValue)}")

        // Revenue by region
        println()
        println("--- Revenue by Region ---")
        sales
            .groupBy { it.region }
            .map { (region, regionSales) ->
                RegionSummary(
                    region = region,
                    totalRevenue = regionSales.sumOf { it.revenue },
                    orderCount = regionSales.size
                )
            }
            .sortedByDescending { it.totalRevenue }
            .forEach { (region, revenue, count) ->
                println(" %-6s: $%,.2f (%d orders)".format(region, revenue, count))
            }

        // Top products
        println()
        println("--- Top Products ---")
        sales
            .groupBy { it.product }
            .map { (product, productSales) ->
                ProductSummary(
                    product = product,
                    totalRevenue = productSales.sumOf { it.revenue },
                    totalUnits = productSales.sumOf { it.quantity }
                )
            }
            .sortedByDescending { it.totalRevenue }
            .forEachIndexed { index, (product, revenue, units) ->
                println(" ${index + 1}. %-10s - $%,.2f (%d units)".format(product, revenue, units))
            }

        // Daily trend
        println()
        println("--- Daily Trend ---")
        sales
            .groupBy { it.date }
            .mapValues { (_, daySales) -> daySales.sumOf { it.revenue } }
            .toSortedMap()
            .forEach { (date, revenue) ->
                println(" $date: $${"%,.2f".format(revenue)}")
            }

        // High value orders
        println()
        println("--- High Value Orders (> \$100) ---")
        sales
            .filter { it.revenue > 100.0 }
            .sortedByDescending { it.revenue }
            .forEach { sale ->
                println(" ${sale.date}, ${sale.region}: ${sale.quantity}x ${sale.product} = $${"%,.2f".format(sale.revenue)}")
            }
    }

A few things to notice if you’re coming from TS/Go:
- sumOf { it.revenue } adds up a projection in one call: no accumulator variable, no reduce with a seed. There are typed overloads (sumOf over Int vs Double), so the result type matches what your lambda returns.
- groupBy { it.region } returns a Map<String, List<Sale>>. From there .map transforms each entry into a summary object, exactly the group-then-aggregate move you’d hand-roll with a map[string][]Sale in Go.
- Destructuring in the lambda: .forEach { (region, revenue, count) -> … } unpacks a RegionSummary straight into three named parameters because it’s a data class. .map { (region, regionSales) -> … } does the same for a Map.Entry.
- mapValues rewrites only the values of a map, leaving the keys; toSortedMap reorders by key so the daily trend comes out in date order for free (ISO dates sort lexicographically).
- forEachIndexed hands you the position alongside the element; that’s where the 1., 2., 3. ranking comes from.
- The format strings (%-6s, %,.2f, %d) are Java’s String.format reached via the .format extension: left-justified padding (spaces added on the right), thousands separators, and fixed decimals for the aligned columns.
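The group-then-aggregate and destructuring points above can be checked on a miniature data set. This is a sketch only: Order, RegionTotal, totalsByRegion, and totalsByDay are hypothetical names, and the amounts are integers so the arithmetic is easy to follow by hand.

```kotlin
// Hypothetical miniature of the report's data model.
data class Order(val day: String, val region: String, val amount: Int)
data class RegionTotal(val region: String, val total: Int, val count: Int)

fun totalsByRegion(orders: List<Order>): List<RegionTotal> =
    orders
        .groupBy { it.region }                  // Map<String, List<Order>>
        .map { (region, group) ->               // destructures each Map.Entry
            RegionTotal(region, group.sumOf { it.amount }, group.size)
        }
        .sortedByDescending { it.total }

fun totalsByDay(orders: List<Order>): Map<String, Int> =
    orders
        .groupBy { it.day }
        .mapValues { (_, group) -> group.sumOf { it.amount } } // keys untouched
        .toSortedMap()                          // ISO dates sort as plain strings

fun main() {
    val orders = listOf(
        Order("2024-01-02", "North", 30),
        Order("2024-01-01", "South", 50),
        Order("2024-01-02", "South", 20),
    )
    // Data classes destructure in for loops as well as in lambdas.
    for ((region, total, count) in totalsByRegion(orders)) {
        println("$region: $total ($count orders)")
    }
    // South: 70 (2 orders)
    // North: 30 (1 orders)
    println(totalsByDay(orders)) // {2024-01-01=50, 2024-01-02=50}
}
```

Destructuring is positional, so (region, total, count) follows the declaration order of the data class properties rather than their names.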
Wiring it together
main holds the sample CSV in a trimIndent-ed raw string, parses it, and reports.
The one idiom worth calling out is .also { … }: it runs a side effect (the
“Parsed N records” log) and returns the original value untouched, so it slots into
the middle of an expression without breaking the chain.
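For contrast with let, which does replace the value, here is a minimal sketch. sortedWithLog and describe are hypothetical helpers written for this illustration, not functions from the worked solution.

```kotlin
// also: the lambda's result is ignored and the receiver flows on unchanged.
fun sortedWithLog(input: List<Int>, log: MutableList<String>): List<Int> =
    input
        .also { log.add("got ${it.size} numbers") } // side effect only
        .sorted()

// let: the lambda's result becomes the value of the whole expression.
fun describe(sorted: List<Int>): String =
    sorted.let { "min=${it.first()}, max=${it.last()}" }

fun main() {
    val log = mutableListOf<String>()
    val numbers = sortedWithLog(listOf(3, 1, 2), log)
    println(numbers)           // [1, 2, 3]
    println(describe(numbers)) // min=1, max=3
    println(log)               // [got 3 numbers]
}
```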
    fun main() {
        val csvData = """
            date,region,product,quantity,unit_price
            2024-01-15,North,Widget,10,9.99
            2024-01-15,South,Gadget,5,24.99
            2024-01-16,North,Gadget,3,24.99
            2024-01-16,East,Widget,20,9.99
            2024-01-17,South,Widget,15,9.99
            2024-01-17,North,Doohickey,8,4.99
            2024-01-18,East,Gadget,12,24.99
            2024-01-18,West,Widget,7,9.99
            2024-01-19,North,Widget,25,9.99
            2024-01-19,South,Doohickey,30,4.99
        """.trimIndent()

        val sales = parseCsv(csvData)
            .also { println("Parsed ${it.size} sales records\n") }

        generateReport(sales)
    }

Run it
- Build the project:
    ./gradlew build
- Run it (the sample data is embedded, so no input is needed):
    ./gradlew run