Observability

Production systems need three pillars of observability: logging, metrics, and tracing. They also need structured error handling that’s explicit about failure modes. This module covers all of these in Kotlin/JVM, mapped to what you already know from TypeScript and Go.

A quick map of which tool plays which role in each ecosystem:

| Concept | TypeScript | Go | Kotlin/JVM |
|---|---|---|---|
| Logging library | winston / pino | slog / zap / zerolog | SLF4J + Logback |
| Structured logging | pino (JSON by default) | slog (structured by default) | logstash-logback-encoder |
| Logging wrapper | | slog stdlib | kotlin-logging (io.github.oshai) |
| Metrics | prom-client | prometheus/client_golang | Micrometer |
| Metrics endpoint | custom /metrics | promhttp.Handler() | Spring Actuator /actuator/prometheus |
| Tracing | @opentelemetry/sdk-node | go.opentelemetry.io/otel | opentelemetry-java |
| Health checks | custom /health | custom or framework | Spring Actuator /actuator/health |
| Error handling | Error subclasses / neverthrow | error interface, fmt.Errorf | Result, sealed classes, exceptions |

The dependencies that show up across this module:

build.gradle.kts
dependencies {
    // Logging
    implementation("ch.qos.logback:logback-classic:1.5.12")
    implementation("io.github.oshai:kotlin-logging-jvm:7.0.3")
    implementation("net.logstash.logback:logstash-logback-encoder:8.0")

    // Metrics (Micrometer + Prometheus)
    implementation("io.micrometer:micrometer-registry-prometheus:1.14.2")

    // Spring Boot (includes most of the above)
    implementation("org.springframework.boot:spring-boot-starter-actuator")

    // OpenTelemetry
    implementation(platform("io.opentelemetry:opentelemetry-bom:1.44.1"))
    implementation("io.opentelemetry:opentelemetry-api")
    implementation("io.opentelemetry:opentelemetry-sdk")
    implementation("io.opentelemetry:opentelemetry-exporter-otlp")
}

Kotlin sits between TypeScript (exceptions everywhere) and Go (explicit error returns). You have three approaches:

| Approach | When to use |
|---|---|
| Exceptions | Truly exceptional, unrecoverable situations (OOM, broken connections, programmer errors) |
| kotlin.Result | Wrapping a single operation that may fail |
| Sealed class hierarchies | Domain errors with distinct failure modes (the recommended approach) |

The same “user not found” problem, the idiomatic way in each language:

// Option 1: Throw
function getUser(id: string): User {
  const user = db.find(id);
  if (!user) throw new NotFoundError(`User ${id} not found`);
  return user;
}

// Option 2: neverthrow / fp-ts Result
function getUser(id: string): Result<User, AppError> {
  const user = db.find(id);
  if (!user) return err(new NotFoundError(`User ${id}`));
  return ok(user);
}

Key differences:

  • The sealed class Result<T> makes failure a value, like Go’s error return — but the compiler tracks the exact failure cases via the sealed hierarchy.
  • Unlike Go’s untyped error, each AppError subtype carries its own typed fields (NotFound(resource, id), Validation(field, message)).
  • Unlike TypeScript’s throw, the failure modes are visible in the return type, not hidden behind try/catch.
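
The Kotlin side of that comparison can be sketched as follows. The AppError hierarchy is not spelled out in this module, so the field names below are inferred from how the subtypes are used elsewhere (NotFound(resource, id), Validation(field, message), and so on). User, users, and describe are hypothetical stand-ins for illustration.

```kotlin
// Sketch of the domain error hierarchy. Field names are inferred from how
// the subtypes are used elsewhere in this module; adjust to your codebase.
sealed class AppError {
    data class NotFound(val resource: String, val id: String) : AppError()
    data class Validation(val field: String, val message: String) : AppError()
    data class Conflict(val message: String) : AppError()
    data class Unauthorized(val reason: String) : AppError()
    data class Internal(val cause: Throwable) : AppError()
}

// Minimal sealed Result: failure is a value, not a thrown exception.
sealed class Result<out T> {
    data class Success<T>(val value: T) : Result<T>()
    data class Failure(val error: AppError) : Result<Nothing>()
}

// Hypothetical domain type and in-memory "database" for illustration.
data class User(val id: String, val name: String)

val users = mapOf("1" to User("1", "Alice"))

fun getUser(id: String): Result<User> {
    val user = users[id] ?: return Result.Failure(AppError.NotFound("User", id))
    return Result.Success(user)
}

// No `else` branch: the compiler rejects this `when` if a case is missing.
fun describe(result: Result<User>): String = when (result) {
    is Result.Success -> "Found ${result.value.name}"
    is Result.Failure -> when (val e = result.error) {
        is AppError.NotFound -> "${e.resource} ${e.id} not found"
        is AppError.Validation -> "${e.field}: ${e.message}"
        is AppError.Conflict -> e.message
        is AppError.Unauthorized -> e.reason
        is AppError.Internal -> "Internal error"
    }
}
```

Adding a new AppError subtype later turns every such `when` into a compile error until the new case is handled, which is the property Go's untyped error cannot give you.
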

// BAD: Callers don't know what can go wrong
fun createUser(name: String, email: String): User {
    if (name.isBlank()) throw IllegalArgumentException("Name required")
    if (!email.contains("@")) throw IllegalArgumentException("Invalid email")
    if (userRepo.existsByEmail(email)) throw ConflictException("Email taken")
    return userRepo.save(User(name = name, email = email))
}

// GOOD: The return type tells the full story
fun createUser(name: String, email: String): Result<User> {
    if (name.isBlank()) return Result.Failure(AppError.Validation("name", "Name is required"))
    if (!email.contains("@")) return Result.Failure(AppError.Validation("email", "Invalid email"))
    if (userRepo.existsByEmail(email)) return Result.Failure(AppError.Conflict("Email already taken"))
    val user = userRepo.save(User(name = name, email = email))
    return Result.Success(user)
}

The sealed class approach gives you:

  • Exhaustive when: the compiler forces you to handle every error case.
  • No hidden control flow: no try/catch guessing games.
  • Self-documenting: the function signature tells you what can fail.
  • Composable: easy to map, flatMap, and chain results.

Adding combinators turns Result<T> into something you can chain like a neverthrow or fp-ts result:

sealed class Result<out T> {
    data class Success<T>(val value: T) : Result<T>()
    data class Failure(val error: AppError) : Result<Nothing>()

    // Transform the success value; failures pass through untouched
    fun <R> map(transform: (T) -> R): Result<R> = when (this) {
        is Success -> Success(transform(value))
        is Failure -> this
    }

    // Chain another fallible step; the first failure short-circuits
    fun <R> flatMap(transform: (T) -> Result<R>): Result<R> = when (this) {
        is Success -> transform(value)
        is Failure -> this
    }

    fun getOrNull(): T? = when (this) {
        is Success -> value
        is Failure -> null
    }

    fun getOrElse(default: () -> @UnsafeVariance T): T = when (this) {
        is Success -> value
        is Failure -> default()
    }

    fun onSuccess(action: (T) -> Unit): Result<T> {
        if (this is Success) action(value)
        return this
    }

    fun onFailure(action: (AppError) -> Unit): Result<T> {
        if (this is Failure) action(error)
        return this
    }
}

class TaskService(
private val taskRepo: TaskRepository,
private val userRepo: UserRepository
) {
fun createTask(userId: String, title: String, description: String): Result<Task> {
// Validate
if (title.isBlank()) {
return Result.Failure(AppError.Validation("title", "Title is required"))
}
if (title.length > 200) {
return Result.Failure(AppError.Validation("title", "Title must be under 200 chars"))
}
// Check user exists
val user = userRepo.findById(userId)
?: return Result.Failure(AppError.NotFound("User", userId))
// Create
val task = taskRepo.save(
Task(title = title, description = description, assignedTo = user.id)
)
return Result.Success(task)
}
fun completeTask(taskId: String, userId: String): Result<Task> {
val task = taskRepo.findById(taskId)
?: return Result.Failure(AppError.NotFound("Task", taskId))
if (task.assignedTo != userId) {
return Result.Failure(AppError.Unauthorized("Only the assignee can complete this task"))
}
if (task.completed) {
return Result.Failure(AppError.Conflict("Task is already completed"))
}
val updated = taskRepo.save(task.copy(completed = true))
return Result.Success(updated)
}
}
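
The service above returns early on each failure. The same checks can also be chained with the combinators, as in this sketch. It repeats a trimmed-down Result so the block stands alone. nonBlank, maxLength, and normalizeTitle are hypothetical helpers, not part of the module's code.

```kotlin
sealed class AppError {
    data class Validation(val field: String, val message: String) : AppError()
}

sealed class Result<out T> {
    data class Success<T>(val value: T) : Result<T>()
    data class Failure(val error: AppError) : Result<Nothing>()

    fun <R> map(transform: (T) -> R): Result<R> = when (this) {
        is Success -> Success(transform(value))
        is Failure -> this
    }

    fun <R> flatMap(transform: (T) -> Result<R>): Result<R> = when (this) {
        is Success -> transform(value)
        is Failure -> this
    }
}

// Hypothetical validation steps; each returns a Result instead of throwing.
fun nonBlank(title: String): Result<String> =
    if (title.isBlank()) Result.Failure(AppError.Validation("title", "Title is required"))
    else Result.Success(title.trim())

fun maxLength(title: String, limit: Int = 200): Result<String> =
    if (title.length > limit) Result.Failure(AppError.Validation("title", "Title must be at most $limit chars"))
    else Result.Success(title)

// flatMap chains fallible steps; map transforms the success value.
// The first Failure short-circuits the rest of the chain.
fun normalizeTitle(raw: String): Result<String> =
    nonBlank(raw)
        .flatMap { maxLength(it) }
        .map { title -> title.replaceFirstChar { c -> c.uppercase() } }
```

Early returns tend to read better when each step needs different local state. Chaining pays off when the steps are small, reusable functions.
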

Mapping Result to HTTP responses (Spring Boot)

A single when over the sealed Result turns domain outcomes into HTTP status codes — no scattered try/catch in your controllers:

src/main/kotlin/com/example/TaskController.kt
import org.springframework.http.ResponseEntity
import org.springframework.http.HttpStatus
import org.springframework.web.bind.annotation.*
@RestController
@RequestMapping("/api/tasks")
class TaskController(private val taskService: TaskService) {
@PostMapping
fun createTask(@RequestBody request: CreateTaskRequest): ResponseEntity<Any> {
return when (val result = taskService.createTask(request.userId, request.title, request.description)) {
is Result.Success -> ResponseEntity.status(HttpStatus.CREATED).body(result.value)
is Result.Failure -> result.error.toResponse()
}
}
@PatchMapping("/{id}/complete")
fun completeTask(
@PathVariable id: String,
@RequestHeader("X-User-Id") userId: String
): ResponseEntity<Any> {
return when (val result = taskService.completeTask(id, userId)) {
is Result.Success -> ResponseEntity.ok(result.value)
is Result.Failure -> result.error.toResponse()
}
}
}
// Extension function to map AppError to HTTP responses
fun AppError.toResponse(): ResponseEntity<Any> = when (this) {
is AppError.NotFound -> ResponseEntity.status(HttpStatus.NOT_FOUND)
.body(ErrorResponse("NOT_FOUND", "$resource with id $id not found"))
is AppError.Validation -> ResponseEntity.status(HttpStatus.BAD_REQUEST)
.body(ErrorResponse("VALIDATION_ERROR", "$field: $message"))
is AppError.Conflict -> ResponseEntity.status(HttpStatus.CONFLICT)
.body(ErrorResponse("CONFLICT", message))
is AppError.Unauthorized -> ResponseEntity.status(HttpStatus.FORBIDDEN)
.body(ErrorResponse("UNAUTHORIZED", reason))
is AppError.Internal -> ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(ErrorResponse("INTERNAL_ERROR", "An internal error occurred"))
}
data class ErrorResponse(val code: String, val message: String)

Kotlin has a built-in Result type that wraps success or exception. It’s useful for simple cases:

fun parseAge(input: String): kotlin.Result<Int> = runCatching {
val age = input.toInt()
require(age in 0..150) { "Age out of range: $age" }
age
}
fun main() {
parseAge("25")
.onSuccess { println("Age: $it") }
.onFailure { println("Error: ${it.message}") }
val age = parseAge("abc").getOrDefault(-1)
println("Parsed: $age") // -1
// Map / recover
val result = parseAge("25")
.map { it * 365 } // 25 * 365
.getOrElse { 0 }
println("Days: $result") // 9125
}

When to use kotlin.Result vs a sealed class:

  • kotlin.Result — wrapping a single fallible operation (parsing, I/O).
  • Sealed class Result<T> — domain errors with distinct types (your service layer).

Follow this rule (similar to Go’s philosophy):

| Situation | Use |
|---|---|
| Expected failure (user input, not found, conflict) | Result / sealed class |
| Programmer error (null deref, index OOB) | Let it crash (exception) |
| Infrastructure failure (DB down, network error) | Exception at boundary, catch and wrap in Result at service layer |
| Library API design | Result for operations that commonly fail |

// DO: Expected failures return Result
fun findUser(id: String): Result<User> { /* ... */ }
fun validateEmail(email: String): Result<String> { /* ... */ }

// DON'T: Don't throw for expected failures
fun findUser(id: String): User {
    return repo.find(id) ?: throw NotFoundException("...") // Bad
}

// OK: Infrastructure exceptions -- catch at boundary
fun findUser(id: String): Result<User> {
    return try {
        val user = repo.find(id) // May throw DB exception
            ?: return Result.Failure(AppError.NotFound("User", id))
        Result.Success(user)
    } catch (e: Exception) {
        Result.Failure(AppError.Internal(e))
    }
}
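
The "catch at the boundary, wrap in Result" rule can be factored into a small helper, sketched below. It uses kotlin.Result internally via runCatching and converts the outcome into the sealed Result. The helper name catching, the AppError shape, and loadUser are assumptions made for illustration.

```kotlin
sealed class AppError {
    data class NotFound(val resource: String, val id: String) : AppError()
    data class Internal(val cause: Throwable) : AppError()
}

sealed class Result<out T> {
    data class Success<T>(val value: T) : Result<T>()
    data class Failure(val error: AppError) : Result<Nothing>()
}

// Hypothetical helper: run a block that may throw and convert the outcome
// into the domain Result. kotlin.Result is only used internally here.
inline fun <T> catching(block: () -> T): Result<T> =
    runCatching(block).fold(
        onSuccess = { Result.Success(it) },
        onFailure = { Result.Failure(AppError.Internal(it)) }
    )

data class User(val id: String, val name: String)

// Stand-in for a repository call that may throw on infrastructure failure.
fun loadUser(id: String): User? = when (id) {
    "boom" -> throw IllegalStateException("DB down")
    "1" -> User("1", "Alice")
    else -> null
}

fun findUser(id: String): Result<User> {
    val loaded = catching { loadUser(id) }
    return when (loaded) {
        is Result.Failure -> loaded
        // A null row is an expected failure, not an infrastructure one.
        is Result.Success -> loaded.value?.let { Result.Success(it) }
            ?: Result.Failure(AppError.NotFound("User", id))
    }
}
```

One caveat: runCatching catches every Throwable, including coroutine cancellation, so inside suspend functions an explicit try/catch on Exception that rethrows CancellationException is usually the safer choice.
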

SLF4J is the facade (like an interface) and Logback is the implementation. This separation means you can swap implementations without changing application code, much as Go's slog separates the Logger front end from pluggable Handler back ends.

[Diagram: JVM logging pipeline]

The same “create user, log it, log failures” loop in each ecosystem:

import winston from 'winston';
const logger = winston.createLogger({
level: 'info',
format: winston.format.json(),
transports: [new winston.transports.Console()],
});
logger.info('Server started', { port: 8080 });
logger.error('Failed to connect', { error: err.message });

Key differences:

  • SLF4J uses {} placeholders (not string interpolation) so the message is only formatted if the level is enabled.
  • The convention is one logger per class via LoggerFactory.getLogger(Foo::class.java).
  • Passing the exception as the last argument (logger.error("...", name, e)) logs the full stack trace.
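
A Kotlin version of that loop, written against the plain SLF4J API, might look like the sketch below. UserService, User, and the fixed id are hypothetical. The only library calls used are LoggerFactory.getLogger and the standard info/error overloads.

```kotlin
import org.slf4j.LoggerFactory

// Hypothetical domain type and service, for illustration only.
data class User(val id: String, val name: String)

class UserService {
    // One logger per class, named after the class.
    private val logger = LoggerFactory.getLogger(UserService::class.java)

    fun createUser(name: String): User? {
        logger.info("Creating user: name={}", name)
        return try {
            require(name.isNotBlank()) { "Name required" }
            val user = User(id = "u-1", name = name)
            // {} placeholders are only formatted if INFO is enabled.
            logger.info("User created: id={}, name={}", user.id, user.name)
            user
        } catch (e: IllegalArgumentException) {
            // The exception goes last, after the placeholder arguments,
            // so the full stack trace is logged.
            logger.error("Failed to create user: name={}", name, e)
            null
        }
    }
}
```
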

| Level | When to use | Example |
|---|---|---|
| TRACE | Very detailed debugging | logger.trace("Parsing token: {}", token) |
| DEBUG | Developer debugging info | logger.debug("Cache miss for key: {}", key) |
| INFO | Normal operations | logger.info("Server started on port {}", port) |
| WARN | Something unexpected but handled | logger.warn("Retry attempt {} for {}", attempt, url) |
| ERROR | Something failed | logger.error("Database connection failed", exception) |

Create src/main/resources/logback.xml:

src/main/resources/logback.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<!-- Console output (for development) -->
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<!-- File output with rotation -->
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>logs/app.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<fileNamePattern>logs/app.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
<maxFileSize>100MB</maxFileSize>
<maxHistory>30</maxHistory>
<totalSizeCap>3GB</totalSizeCap>
</rollingPolicy>
<encoder>
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<!-- Set log levels per package -->
<logger name="com.example" level="DEBUG"/>
<logger name="org.springframework" level="INFO"/>
<logger name="org.hibernate.SQL" level="DEBUG"/>
<!-- Root level -->
<root level="INFO">
<appender-ref ref="CONSOLE"/>
<appender-ref ref="FILE"/>
</root>
</configuration>

In Spring Boot, configure logging in application.yml (which overrides logback defaults):

application.yml
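logging:
  level:
    root: INFO
    com.example: DEBUG
    org.springframework.web: INFO
    org.hibernate.SQL: DEBUG
  pattern:
    console: "%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n"
  file:
    name: logs/app.log
  # Spring Boot 2.4+ rolling policy keys
  # (older versions used logging.file.max-size / logging.file.max-history)
  logback:
    rollingpolicy:
      max-file-size: 100MB
      max-history: 30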

For per-environment behavior, layer the config files like this:

  • src/main/resources/
    • logback-spring.xml (Spring-aware Logback config)
    • application.yml (default config)
    • application-dev.yml (dev overrides)
    • application-prod.yml (prod overrides)

logback-spring.xml with Spring profiles:

src/main/resources/logback-spring.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<include resource="org/springframework/boot/logging/logback/defaults.xml"/>
<!-- Development: human-readable console -->
<springProfile name="dev">
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%clr(%d{HH:mm:ss.SSS}){faint} %clr(%-5level) %clr(%logger{36}){cyan} - %msg%n</pattern>
</encoder>
</appender>
<root level="DEBUG">
<appender-ref ref="CONSOLE"/>
</root>
</springProfile>
<!-- Production: JSON structured logging -->
<springProfile name="prod">
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
</appender>
<root level="INFO">
<appender-ref ref="CONSOLE"/>
</root>
</springProfile>
</configuration>

Structured logging outputs JSON instead of plain text, making logs machine-parseable for tools like ELK, Loki, or Datadog.

# Plain text (hard to parse)
2024-01-15 10:23:45.123 INFO UserService - User created: id=123, name=Alice
# Structured JSON (machine-parseable)
{"timestamp":"2024-01-15T10:23:45.123Z","level":"INFO","logger":"UserService","message":"User created","userId":"123","userName":"Alice"}
build.gradle.kts
dependencies {
implementation("net.logstash.logback:logstash-logback-encoder:8.0")
}

logback.xml for JSON output:

src/main/resources/logback.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="net.logstash.logback.encoder.LogstashEncoder">
<includeMdcKeyName>requestId</includeMdcKeyName>
<includeMdcKeyName>userId</includeMdcKeyName>
</encoder>
</appender>
<root level="INFO">
<appender-ref ref="CONSOLE"/>
</root>
</configuration>

Adding context with MDC (Mapped Diagnostic Context)

MDC attaches key-value context to all log messages in the current thread. This is how you correlate logs for a single request — the JVM equivalent of Go’s slog.With(...) or Node’s AsyncLocalStorage.

logger := slog.With("requestId", requestId, "userId", userId)
logger.Info("Processing request")

Key differences:

  • Go threads context explicitly via context.Context; MDC is thread-local, so you call MDC.put at the start of the request and MDC.clear in a finally block.
  • Once set, every log line on that thread automatically includes requestId, method, and path — no need to pass the logger around.
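
On the JVM side, the put-then-clear pattern can be sketched as a small wrapper. withRequestContext and the logger name are hypothetical. In a real service this logic would typically live in a servlet filter or interceptor.

```kotlin
import org.slf4j.LoggerFactory
import org.slf4j.MDC

val logger = LoggerFactory.getLogger("RequestHandler")

// Hypothetical helper: set request context, run the handler, always clean up.
fun <T> withRequestContext(
    requestId: String,
    method: String,
    path: String,
    handler: () -> T
): T {
    MDC.put("requestId", requestId)
    MDC.put("method", method)
    MDC.put("path", path)
    try {
        // Every log line on this thread now carries the three fields.
        logger.info("Processing request")
        return handler()
    } finally {
        // Threads are pooled; leftover MDC values would leak into the next request.
        MDC.clear()
    }
}
```
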

Now all logs within this request automatically include the MDC fields:

{
"timestamp": "2024-01-15T10:23:45.123Z",
"level": "INFO",
"logger": "com.example.UserService",
"message": "User created",
"requestId": "abc-123",
"method": "POST",
"path": "/api/users"
}

For adding fields to specific log statements (not thread-wide):

import net.logstash.logback.argument.StructuredArguments.*
import org.slf4j.LoggerFactory
class OrderService {
private val logger = LoggerFactory.getLogger(OrderService::class.java)
fun processOrder(orderId: String, total: Double) {
logger.info(
"Processing order",
keyValue("orderId", orderId),
keyValue("total", total),
keyValue("currency", "USD")
)
// Output: {"message":"Processing order","orderId":"abc","total":99.99,"currency":"USD"}
}
}

kotlin-logging provides an idiomatic Kotlin wrapper around SLF4J. It’s like using slog in Go instead of the raw log package.

build.gradle.kts
dependencies {
implementation("io.github.oshai:kotlin-logging-jvm:7.0.3")
}
import io.github.oshai.kotlinlogging.KotlinLogging
// Create logger -- one per file (not per class)
private val logger = KotlinLogging.logger {}
class UserService(private val repo: UserRepository) {
fun createUser(name: String, email: String): User {
logger.info { "Creating user: name=$name, email=$email" }
val user = repo.save(User(name = name, email = email))
logger.info { "User created: id=${user.id}" }
return user
}
fun deleteUser(id: String) {
logger.debug { "Deleting user: id=$id" }
try {
repo.delete(id)
logger.info { "User deleted: id=$id" }
} catch (e: Exception) {
logger.error(e) { "Failed to delete user: id=$id" }
}
}
}
// SLF4J: message is always evaluated (even if debug is disabled)
logger.debug("Processing items: count=${expensiveCount()}")
// kotlin-logging: lambda is only evaluated if debug is enabled
logger.debug { "Processing items: count=${expensiveCount()}" }

The lambda-based API avoids unnecessary string concatenation when the log level is disabled. This matters in hot paths.
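
The effect is easy to see with a toy logger. TinyLogger below is a hypothetical stand-in, not kotlin-logging's implementation. It only shows why passing a lambda defers the cost of building the message.

```kotlin
// Toy logger for illustration only (not the kotlin-logging implementation).
class TinyLogger(var debugEnabled: Boolean) {
    val lines = mutableListOf<String>()

    // Eager: the caller has already built the string before this runs.
    fun debug(message: String) {
        if (debugEnabled) lines += message
    }

    // Lazy: the lambda runs only when the level is enabled.
    fun debug(message: () -> String) {
        if (debugEnabled) lines += message()
    }
}

var expensiveCalls = 0

fun expensiveCount(): Int {
    expensiveCalls++
    return 42
}

// Returns how many times expensiveCount() ran with DEBUG disabled.
fun demo(): Int {
    expensiveCalls = 0
    val logger = TinyLogger(debugEnabled = false)
    logger.debug("count=${expensiveCount()}")    // evaluated, then discarded
    logger.debug { "count=${expensiveCount()}" } // never evaluated
    return expensiveCalls
}
```

With DEBUG disabled, demo() reports a single call: the eager overload paid for the message, the lambda overload did not.
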

import io.github.oshai.kotlinlogging.KotlinLogging
import net.logstash.logback.argument.StructuredArguments.keyValue
private val logger = KotlinLogging.logger {}
fun processPayment(orderId: String, amount: Double) {
logger.atInfo {
message = "Payment processed"
payload = mapOf(
"orderId" to orderId,
"amount" to amount,
"currency" to "USD"
)
}
}

MDC is thread-local, but coroutines can switch threads. Use MDCContext to preserve MDC across coroutine suspension points:

import kotlinx.coroutines.*
import kotlinx.coroutines.slf4j.MDCContext
import org.slf4j.MDC
import io.github.oshai.kotlinlogging.KotlinLogging
private val logger = KotlinLogging.logger {}
suspend fun handleRequest(requestId: String) {
MDC.put("requestId", requestId)
// MDCContext copies MDC to the coroutine context
withContext(MDCContext()) {
logger.info { "Starting request processing" }
// Even after suspension, MDC is preserved
val result = withContext(Dispatchers.IO + MDCContext()) {
logger.info { "Fetching from database" } // requestId still in MDC
fetchFromDatabase()
}
logger.info { "Request processed: result=$result" }
}
}

Add the dependency:

build.gradle.kts
dependencies {
implementation("org.jetbrains.kotlinx:kotlinx-coroutines-slf4j:1.9.0")
}

Micrometer is a metrics facade (like SLF4J for logging). It supports multiple backends: Prometheus, Datadog, New Relic, CloudWatch, etc. You write code once and switch backends via configuration.

[Diagram: Micrometer metrics pipeline]

The same labeled counter, three ways:

// prom-client
const counter = new promClient.Counter({
name: 'http_requests_total',
help: 'Total HTTP requests',
labelNames: ['method', 'path', 'status'],
});
counter.inc({ method: 'GET', path: '/api/users', status: 200 });

Key differences:

  • Micrometer “tags” are Prometheus “labels”; dotted metric names (http.requests.total) are normalized to underscores at the scrape endpoint.
  • You build meters against a MeterRegistry rather than a global; the registry is what gets wired to a backend.
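
A Micrometer version of the same labeled counter might look like this sketch. It uses SimpleMeterRegistry (an in-memory registry from micrometer-core) so that it runs without a backend. recordRequest and demo are hypothetical names.

```kotlin
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Records one request against the counter identified by name + tags.
fun recordRequest(registry: MeterRegistry, method: String, path: String, status: Int) {
    registry.counter(
        "http.requests.total",
        "method", method,
        "path", path,
        "status", status.toString()
    ).increment()
}

fun demo(): Double {
    // SimpleMeterRegistry keeps meters in memory; swap in a
    // PrometheusMeterRegistry to expose them for scraping.
    val registry = SimpleMeterRegistry()
    recordRequest(registry, "GET", "/api/users", 200)
    recordRequest(registry, "GET", "/api/users", 200)
    // Looking up the same name + tags returns the same counter.
    return registry.counter(
        "http.requests.total",
        "method", "GET", "path", "/api/users", "status", "200"
    ).count()
}
```
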

| Type | Purpose | Example |
|---|---|---|
| Counter | Monotonically increasing value | Total requests, errors, items processed |
| Gauge | Value that goes up and down | Active connections, queue size, memory usage |
| Timer | Duration + count of events | Request latency, DB query time |
| Distribution Summary | Distribution of values | Request/response sizes |
| Histogram | Bucketed distribution (enabled on a Timer or Distribution Summary via publishPercentileHistogram()) | Request latency buckets for percentiles |

Standalone Micrometer setup (without Spring Boot)

import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.prometheusmetrics.PrometheusConfig
import io.micrometer.prometheusmetrics.PrometheusMeterRegistry
fun main() {
// Create a Prometheus registry
val registry: PrometheusMeterRegistry = PrometheusMeterRegistry(PrometheusConfig.DEFAULT)
// Register metrics
val requestCount = registry.counter("app.requests.total", "endpoint", "/api/tasks")
val activeConnections = registry.gauge("app.connections.active", java.util.concurrent.atomic.AtomicInteger(0))
// Simulate some activity
requestCount.increment()
requestCount.increment()
activeConnections?.set(5)
// Scrape metrics in Prometheus text format
println(registry.scrape())
// Output:
// # HELP app_requests_total
// # TYPE app_requests_total counter
// app_requests_total{endpoint="/api/tasks"} 2.0
// # HELP app_connections_active
// # TYPE app_connections_active gauge
// app_connections_active 5.0
}
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Counter
class TaskService(private val registry: MeterRegistry) {
private val tasksCreated = Counter.builder("tasks.created.total")
.description("Total number of tasks created")
.register(registry)
private val tasksFailed = Counter.builder("tasks.failed.total")
.description("Total number of task creation failures")
.register(registry)
fun createTask(title: String): Task {
try {
val task = repo.save(Task(title = title))
tasksCreated.increment()
return task
} catch (e: Exception) {
tasksFailed.increment()
throw e
}
}
}
import io.micrometer.core.instrument.Gauge
import java.util.concurrent.atomic.AtomicInteger
class ConnectionPool(registry: MeterRegistry) {
private val activeConnections = AtomicInteger(0)
private val pendingRequests = AtomicInteger(0)
init {
Gauge.builder("pool.connections.active", activeConnections) { it.toDouble() }
.description("Number of active connections")
.register(registry)
Gauge.builder("pool.requests.pending", pendingRequests) { it.toDouble() }
.description("Number of pending connection requests")
.register(registry)
}
fun acquire(): Connection {
pendingRequests.incrementAndGet()
try {
val conn = pool.borrow()
activeConnections.incrementAndGet()
return conn
} finally {
pendingRequests.decrementAndGet()
}
}
fun release(conn: Connection) {
pool.returnObject(conn)
activeConnections.decrementAndGet()
}
}
import io.micrometer.core.instrument.Timer
import io.micrometer.core.instrument.MeterRegistry
class UserRepository(private val registry: MeterRegistry) {
private val queryTimer = Timer.builder("db.query.duration")
.description("Database query execution time")
.tag("table", "users")
.publishPercentiles(0.5, 0.95, 0.99) // p50, p95, p99
.publishPercentileHistogram()
.register(registry)
fun findById(id: String): User? {
return queryTimer.record<User?> {
// Actual DB query
jdbcTemplate.queryForObject(
"SELECT * FROM users WHERE id = ?",
userRowMapper,
id
)
}
}
}
import io.micrometer.core.instrument.Timer
import kotlin.system.measureTimeMillis
class AsyncUserRepository(private val registry: MeterRegistry) {
private val queryTimer = Timer.builder("db.query.duration")
.tag("table", "users")
.register(registry)
suspend fun findById(id: String): User? {
val startTime = System.nanoTime()
try {
return suspendingQuery("SELECT * FROM users WHERE id = ?", id)
} finally {
val duration = System.nanoTime() - startTime
queryTimer.record(java.time.Duration.ofNanos(duration))
}
}
}
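
The manual start/stop pattern above can be pulled into a reusable helper. The sketch below is an assumption, not a Micrometer API: timed takes a callback that receives the elapsed nanoseconds, so it has no library dependency. Because it is inline, the block may contain suspending calls when timed is invoked from a suspend function.

```kotlin
// Hypothetical helper: measures `block` and hands the elapsed nanoseconds to
// `record`, even when `block` throws.
inline fun <T> timed(record: (elapsedNanos: Long) -> Unit, block: () -> T): T {
    val start = System.nanoTime()
    try {
        return block()
    } finally {
        record(System.nanoTime() - start)
    }
}
```

With a Micrometer Timer, the callback would be something like `{ queryTimer.record(java.time.Duration.ofNanos(it)) }`, which keeps the repository method down to a single expression.
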
import io.micrometer.core.instrument.DistributionSummary
class PayloadMetrics(registry: MeterRegistry) {
private val requestSize = DistributionSummary.builder("http.request.size")
.description("HTTP request body size in bytes")
.baseUnit("bytes")
.publishPercentiles(0.5, 0.95, 0.99)
.register(registry)
fun recordRequestSize(sizeBytes: Long) {
requestSize.record(sizeBytes.toDouble())
}
}
class OrderMetrics(private val registry: MeterRegistry) {
fun recordOrderPlaced(amount: Double, currency: String) {
registry.counter(
"orders.placed.total",
"currency", currency
).increment()
registry.summary(
"orders.amount",
"currency", currency
).record(amount)
}
fun recordOrderFulfillmentTime(durationMs: Long) {
registry.timer("orders.fulfillment.duration")
.record(java.time.Duration.ofMillis(durationMs))
}
fun trackInventoryLevel(productId: String, level: () -> Double) {
Gauge.builder("inventory.level", level)
.tag("productId", productId)
.register(registry)
}
}
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Timer
import jakarta.servlet.Filter
import jakarta.servlet.FilterChain
import jakarta.servlet.ServletRequest
import jakarta.servlet.ServletResponse
import jakarta.servlet.http.HttpServletRequest
import jakarta.servlet.http.HttpServletResponse
import org.springframework.stereotype.Component
@Component
class MetricsFilter(private val registry: MeterRegistry) : Filter {
override fun doFilter(request: ServletRequest, response: ServletResponse, chain: FilterChain) {
val httpRequest = request as HttpServletRequest
val httpResponse = response as HttpServletResponse
val sample = Timer.start(registry)
try {
chain.doFilter(request, response)
} finally {
sample.stop(
Timer.builder("http.server.requests")
.tag("method", httpRequest.method)
.tag("uri", normalizeUri(httpRequest.requestURI))
.tag("status", httpResponse.status.toString())
.register(registry)
)
}
}
private fun normalizeUri(uri: String): String {
// Replace path parameters with placeholders for lower cardinality
return uri.replace(Regex("/\\d+"), "/{id}")
}
}
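
A standalone sketch of the normalization step makes the cardinality point concrete. This variant is an assumption rather than the filter's exact code: it rewrites only whole path segments, and it also treats canonical UUIDs as IDs.

```kotlin
// Matches a path segment that is all digits or a canonical UUID.
val idSegment = Regex(
    "(?<=/)(\\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})(?=/|\$)"
)

// Collapses ID-like segments so each route maps to one "uri" tag value.
fun normalizeUri(uri: String): String = idSegment.replace(uri, "{id}")
```

For example, `/api/users/42/orders/7` becomes `/api/users/{id}/orders/{id}`, while `/api/v1/tasks` and `/reports/2024-summary` are left alone because `v1` and `2024-summary` are not purely numeric segments.
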

Spring Boot Actuator provides production-ready features out of the box: health checks, metrics, info, and more.

build.gradle.kts
dependencies {
implementation("org.springframework.boot:spring-boot-starter-actuator")
implementation("io.micrometer:micrometer-registry-prometheus")
}
application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
  endpoint:
    health:
      show-details: always
  metrics:
    tags:
      application: task-api
    distribution:
      percentiles-histogram:
        http.server.requests: true
      percentiles:
        http.server.requests: 0.5, 0.95, 0.99

| Endpoint | Purpose |
|---|---|
| /actuator/health | Health check (UP/DOWN) |
| /actuator/info | Application info |
| /actuator/prometheus | Prometheus scrape endpoint |
| /actuator/metrics | List all metrics |
| /actuator/metrics/{name} | Get specific metric |

curl http://localhost:8080/actuator/health
{
"status": "UP",
"components": {
"db": {
"status": "UP",
"details": {
"database": "PostgreSQL",
"validationQuery": "isValid()"
}
},
"diskSpace": {
"status": "UP",
"details": {
"total": 499963174912,
"free": 389537574912,
"threshold": 10485760
}
},
"redis": {
"status": "UP",
"details": {
"version": "7.2.4"
}
}
}
}
curl http://localhost:8080/actuator/prometheus
# HELP http_server_requests_seconds Duration of HTTP server request handling
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{method="GET",status="200",uri="/api/tasks"} 42
http_server_requests_seconds_sum{method="GET",status="200",uri="/api/tasks"} 1.234
http_server_requests_seconds{method="GET",status="200",uri="/api/tasks",quantile="0.95"} 0.045
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.2345678E7
import io.micrometer.core.instrument.MeterRegistry
import org.springframework.stereotype.Service
import java.util.concurrent.atomic.AtomicLong

@Service
class TaskService(
    private val taskRepo: TaskRepository,
    private val registry: MeterRegistry // Auto-injected by Spring
) {
    // Gauges hold only a weak reference to the object they sample, so register
    // them once against long-lived state instead of passing a fresh value per call.
    private val totalTasks = registry.gauge("tasks.total", AtomicLong(0))!!
    private val completedTasks = registry.gauge("tasks.completed", AtomicLong(0))!!

    fun createTask(request: CreateTaskRequest): Task {
        val timer = registry.timer("task.creation.duration")
        return timer.recordCallable {
            val task = taskRepo.save(request.toEntity())
            registry.counter("tasks.created", "priority", request.priority.name).increment()
            task
        }!!
    }

    fun getTaskStats(): TaskStats {
        val total = taskRepo.count()
        val completed = taskRepo.countByCompleted(true)
        // Update the values that the registered gauges read from
        totalTasks.set(total)
        completedTasks.set(completed)
        return TaskStats(total = total, completed = completed)
    }
}
import org.springframework.boot.actuate.health.Health
import org.springframework.boot.actuate.health.HealthIndicator
import org.springframework.stereotype.Component
@Component
class ExternalApiHealthIndicator(
private val externalApiClient: ExternalApiClient
) : HealthIndicator {
override fun health(): Health {
return try {
val response = externalApiClient.ping()
if (response.isSuccessful) {
Health.up()
.withDetail("externalApi", "reachable")
.withDetail("responseTime", "${response.durationMs}ms")
.build()
} else {
Health.down()
.withDetail("externalApi", "unhealthy")
.withDetail("status", response.statusCode)
.build()
}
} catch (e: Exception) {
Health.down()
.withDetail("externalApi", "unreachable")
.withDetail("error", e.message)
.build()
}
}
}

Distributed tracing tracks a request across multiple services. OpenTelemetry is the vendor-neutral standard.

A single trace ID threads through every service the request touches, with each service contributing its own span:

[Diagram: A trace across three services]

| Term | Meaning |
|---|---|
| Trace | End-to-end request journey |
| Span | Single unit of work within a trace |
| Trace ID | Unique ID shared by all spans in a trace |
| Span ID | Unique ID for a single span |
| Parent Span ID | Links child spans to parent spans |
| Baggage | Key-value pairs propagated across services |

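
To make the IDs concrete: between services, the trace ID and the caller's span ID usually travel in a W3C `traceparent` HTTP header of the form `version-traceid-spanid-flags`. The parser below is a simplified illustration that handles version 00 only. TraceParent and parseTraceParent are hypothetical names, and in practice the OpenTelemetry SDK does this for you.

```kotlin
// Parsed form of a W3C `traceparent` header value (version 00 only).
data class TraceParent(val traceId: String, val parentSpanId: String, val sampled: Boolean)

// 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags.
val traceParentPattern = Regex("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")

fun parseTraceParent(header: String): TraceParent? {
    val match = traceParentPattern.matchEntire(header.trim()) ?: return null
    val (traceId, spanId, flags) = match.destructured
    // All-zero IDs are invalid per the spec.
    if (traceId.all { it == '0' } || spanId.all { it == '0' }) return null
    // Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(traceId, spanId, sampled = (flags.toInt(16) and 1) == 1)
}
```

The receiving service starts its own span with a fresh span ID, keeps the trace ID, and records the incoming span ID as its parent.
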
build.gradle.kts
dependencies {
implementation(platform("io.opentelemetry:opentelemetry-bom:1.44.1"))
implementation("io.opentelemetry:opentelemetry-api")
implementation("io.opentelemetry:opentelemetry-sdk")
implementation("io.opentelemetry:opentelemetry-exporter-otlp")
implementation("io.opentelemetry:opentelemetry-semconv:1.29.0-alpha")
}

A parent span wraps the operation; child spans nest under it via makeCurrent():

import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.trace.Span
import io.opentelemetry.api.trace.StatusCode
import io.opentelemetry.api.trace.Tracer
import io.opentelemetry.context.Context
class OrderService(private val openTelemetry: OpenTelemetry) {
private val tracer: Tracer = openTelemetry.getTracer("order-service", "1.0.0")
fun processOrder(orderId: String): Order {
val span = tracer.spanBuilder("process-order")
.setAttribute("order.id", orderId)
.startSpan()
return try {
span.makeCurrent().use {
// Child span for validation
validateOrder(orderId)
// Child span for payment
processPayment(orderId)
// Child span for notification
sendConfirmation(orderId)
val order = Order(orderId, status = "completed")
span.setAttribute("order.status", "completed")
order
}
} catch (e: Exception) {
span.setStatus(StatusCode.ERROR, e.message ?: "Unknown error")
span.recordException(e)
throw e
} finally {
span.end()
}
}
private fun validateOrder(orderId: String) {
val span = tracer.spanBuilder("validate-order")
.startSpan()
try {
span.makeCurrent().use {
// validation logic
span.addEvent("Validation passed")
}
} finally {
span.end()
}
}
private fun processPayment(orderId: String) {
val span = tracer.spanBuilder("process-payment")
.setAttribute("payment.provider", "stripe")
.startSpan()
try {
span.makeCurrent().use {
// payment logic
span.addEvent("Payment captured", io.opentelemetry.api.common.Attributes.of(
io.opentelemetry.api.common.AttributeKey.stringKey("payment.id"), "pay_123"
))
}
} finally {
span.end()
}
}
private fun sendConfirmation(orderId: String) {
val span = tracer.spanBuilder("send-confirmation")
.startSpan()
try {
span.makeCurrent().use {
// email logic
}
} finally {
span.end()
}
}
}
import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.api.common.Attributes
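// ServiceAttributes replaces the deprecated ResourceAttributes class; it ships in
// the io.opentelemetry.semconv:opentelemetry-semconv artifact
import io.opentelemetry.semconv.ServiceAttributes
fun configureOpenTelemetry(): OpenTelemetry {
val resource = Resource.getDefault().merge(
Resource.create(
Attributes.of(
ServiceAttributes.SERVICE_NAME, "task-api",
ServiceAttributes.SERVICE_VERSION, "1.0.0"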
)
)
)
val spanExporter = OtlpGrpcSpanExporter.builder()
.setEndpoint("http://localhost:4317") // OTLP collector endpoint
.build()
val tracerProvider = SdkTracerProvider.builder()
.setResource(resource)
.addSpanProcessor(BatchSpanProcessor.builder(spanExporter).build())
.build()
val openTelemetry = OpenTelemetrySdk.builder()
.setTracerProvider(tracerProvider)
.buildAndRegisterGlobal()
// Shutdown hook to flush remaining spans
Runtime.getRuntime().addShutdownHook(Thread {
tracerProvider.shutdown()
})
return openTelemetry
}

The easiest approach is the OpenTelemetry Java agent, which instruments the application at startup with no code changes:

Terminal window
# Download the agent
curl -L -o opentelemetry-javaagent.jar \
https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar
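# Run your app with the agent
# (agent 2.x defaults to OTLP over HTTP on port 4318, so select gRPC explicitly for port 4317)
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=task-api \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -jar your-app.jar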

The agent auto-instruments:

  • Spring MVC / WebFlux
  • JDBC (all queries)
  • HTTP clients (OkHttp, Apache HttpClient)
  • Kafka producer/consumer
  • Redis (Jedis, Lettuce)
  • gRPC
build.gradle.kts
dependencies {
implementation("io.micrometer:micrometer-tracing-bridge-otel")
implementation("io.opentelemetry:opentelemetry-exporter-otlp")
}
application.yml
management:
  tracing:
    sampling:
      probability: 1.0 # 100% sampling (use lower in production)
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces

OpenTelemetry context is thread-local, similar to MDC. For coroutines, propagate the context:

import io.opentelemetry.context.Context
import io.opentelemetry.extension.kotlin.asContextElement
import kotlinx.coroutines.*
suspend fun processOrderAsync(orderId: String) {
val span = tracer.spanBuilder("process-order-async").startSpan()
try {
// Attach the span to the current context and carry that context into the coroutine
withContext(Context.current().with(span).asContextElement()) {
val result = async(Dispatchers.IO) {
// Context is preserved here
fetchOrderDetails(orderId)
}
result.await()
}
} finally {
span.end()
}
}

Add the Kotlin extension:

build.gradle.kts
dependencies {
implementation("io.opentelemetry:opentelemetry-extension-kotlin")
}
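To see why thread-local context needs explicit propagation at all, here is a standard-library-only sketch. `currentTraceId` is a hypothetical stand-in for a thread-local context slot, not OpenTelemetry's real storage: a value set on one thread is invisible on another unless it is captured and restored, which is roughly the job `asContextElement()` does each time a coroutine resumes on a thread.

```kotlin
import kotlin.concurrent.thread

// Hypothetical stand-in for a thread-local context slot
val currentTraceId = ThreadLocal<String?>()

// Sets the value on the calling thread, then reads it from a different thread.
fun traceIdSeenOnOtherThread(): String? {
    currentTraceId.set("trace-abc")
    var seen: String? = "unset"
    thread { seen = currentTraceId.get() }.join()
    currentTraceId.remove()
    return seen
}

// Captures the value on the calling thread and restores it on the worker thread.
fun traceIdCarriedExplicitly(): String? {
    currentTraceId.set("trace-abc")
    val captured = currentTraceId.get()
    var seen: String? = null
    thread {
        currentTraceId.set(captured)
        try {
            seen = currentTraceId.get()
        } finally {
            currentTraceId.remove() // do not leak the value into reused threads
        }
    }.join()
    currentTraceId.remove()
    return seen
}

fun main() {
    println(traceIdSeenOnOtherThread()) // null: the worker thread has its own empty slot
    println(traceIdCarriedExplicitly()) // trace-abc
}
```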
| Probe | Purpose | Spring Boot Endpoint |
| --- | --- | --- |
| Liveness | “Is the process alive?” | /actuator/health/liveness |
| Readiness | “Can it accept traffic?” | /actuator/health/readiness |
| Startup | “Has it finished starting?” | /actuator/health/liveness (with startup config) |
application.yml
management:
  endpoint:
    health:
      probes:
        enabled: true
      show-details: always
      group:
        liveness:
          include: livenessState
        readiness:
          include: readinessState, db, redis
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
k8s-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: task-api
spec:
  template:
    spec:
      containers:
        - name: task-api
          image: task-api:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 5
            failureThreshold: 30 # 30 * 5s = 150s max startup time
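The `30 * 5s = 150s` comment generalizes to the other probes. As a rough sketch (the `Probe` type and `secondsUntilFailure` are hypothetical names, and the estimate ignores `timeoutSeconds`), the time a container gets before a probe is declared failed is about the initial delay plus the period times the failure threshold:

```kotlin
// Hypothetical model of the three probe settings used in the manifest above
data class Probe(
    val initialDelaySeconds: Int,
    val periodSeconds: Int,
    val failureThreshold: Int
)

// Rough upper bound on how long a probe can keep failing before Kubernetes acts on it
fun Probe.secondsUntilFailure(): Int =
    initialDelaySeconds + periodSeconds * failureThreshold

fun main() {
    val startup = Probe(initialDelaySeconds = 0, periodSeconds = 5, failureThreshold = 30)
    val liveness = Probe(initialDelaySeconds = 10, periodSeconds = 15, failureThreshold = 3)
    val readiness = Probe(initialDelaySeconds = 5, periodSeconds = 10, failureThreshold = 3)

    println(startup.secondsUntilFailure())   // 150
    println(liveness.secondsUntilFailure())  // 55
    println(readiness.secondsUntilFailure()) // 35
}
```

Liveness and readiness probes only begin once the startup probe has succeeded, so their budgets count from that point rather than from container start.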
import org.springframework.boot.actuate.health.Health
import org.springframework.boot.actuate.health.HealthIndicator
import org.springframework.stereotype.Component
@Component("database")
class DatabaseHealthIndicator(
private val dataSource: javax.sql.DataSource
) : HealthIndicator {
override fun health(): Health {
return try {
dataSource.connection.use { conn ->
conn.prepareStatement("SELECT 1").use { stmt ->
stmt.executeQuery()
}
}
Health.up()
.withDetail("database", "PostgreSQL")
.withDetail("status", "connected")
.build()
} catch (e: Exception) {
Health.down(e)
.withDetail("database", "PostgreSQL")
.withDetail("error", e.message)
.build()
}
}
}
@Component("redis")
class RedisHealthIndicator(
private val redisTemplate: org.springframework.data.redis.core.StringRedisTemplate
) : HealthIndicator {
override fun health(): Health {
return try {
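// use {} closes the connection after the ping; a missing reply counts as a failure
val pong = redisTemplate.connectionFactory?.connection?.use { it.ping() }
?: error("Redis did not answer PING")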
Health.up()
.withDetail("redis", "connected")
.withDetail("ping", pong)
.build()
} catch (e: Exception) {
Health.down(e)
.withDetail("redis", "disconnected")
.build()
}
}
}

Non-Spring health checks (Ktor / plain Kotlin)

import io.ktor.server.application.*
import io.ktor.server.response.*
import io.ktor.server.routing.*
import io.ktor.http.*
import kotlinx.serialization.Serializable
@Serializable
data class HealthResponse(
val status: String,
val checks: Map<String, HealthCheck>
)
@Serializable
data class HealthCheck(
val status: String,
val details: Map<String, String> = emptyMap()
)
fun Application.configureHealthRoutes(
dataSource: javax.sql.DataSource,
redis: redis.clients.jedis.JedisPool
) {
routing {
get("/health") {
val dbHealth = checkDatabase(dataSource)
val redisHealth = checkRedis(redis)
val overallStatus = if (dbHealth.status == "UP" && redisHealth.status == "UP") "UP" else "DOWN"
val response = HealthResponse(
status = overallStatus,
checks = mapOf("db" to dbHealth, "redis" to redisHealth)
)
val httpStatus = if (overallStatus == "UP") HttpStatusCode.OK else HttpStatusCode.ServiceUnavailable
call.respond(httpStatus, response)
}
get("/health/live") {
call.respond(HttpStatusCode.OK, mapOf("status" to "UP"))
}
get("/health/ready") {
val dbHealth = checkDatabase(dataSource)
val status = if (dbHealth.status == "UP") HttpStatusCode.OK else HttpStatusCode.ServiceUnavailable
call.respond(status, mapOf("status" to dbHealth.status))
}
}
}
private fun checkDatabase(dataSource: javax.sql.DataSource): HealthCheck {
return try {
dataSource.connection.use { it.prepareStatement("SELECT 1").execute() }
HealthCheck("UP", mapOf("type" to "postgresql"))
} catch (e: Exception) {
HealthCheck("DOWN", mapOf("error" to (e.message ?: "unknown")))
}
}
private fun checkRedis(pool: redis.clients.jedis.JedisPool): HealthCheck {
return try {
pool.resource.use { it.ping() }
HealthCheck("UP")
} catch (e: Exception) {
HealthCheck("DOWN", mapOf("error" to (e.message ?: "unknown")))
}
}
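The `/health` route above combines two checks with a hand-written `&&`. As more dependencies are added, it may be simpler to run a map of named probes and derive the overall status from the results. The sketch below uses only the standard library; `CheckResult` and `runChecks` are hypothetical names, not Ktor APIs.

```kotlin
// Hypothetical result type, mirroring the shape of HealthCheck above
data class CheckResult(
    val status: String,
    val details: Map<String, String> = emptyMap()
)

// Runs every probe; a probe that throws is reported as DOWN with its error message.
// The overall status is UP only if every individual check is UP.
fun runChecks(checks: Map<String, () -> Unit>): Pair<String, Map<String, CheckResult>> {
    val results = checks.mapValues { (_, probe) ->
        try {
            probe()
            CheckResult("UP")
        } catch (e: Exception) {
            CheckResult("DOWN", mapOf("error" to (e.message ?: "unknown")))
        }
    }
    val overall = if (results.values.all { it.status == "UP" }) "UP" else "DOWN"
    return overall to results
}

fun main() {
    val (overall, results) = runChecks(
        mapOf(
            "db" to { /* e.g. run SELECT 1 */ },
            "redis" to { throw IllegalStateException("connection refused") }
        )
    )
    println(overall)          // DOWN
    println(results["redis"]) // CheckResult(status=DOWN, details={error=connection refused})
}
```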

The standard pull-based pipeline: your app exposes metrics, Prometheus scrapes them, Grafana queries Prometheus for dashboards.

[Diagram: Prometheus + Grafana stack]
docker-compose.yml
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SPRING_PROFILES_ACTIVE=prod
      - SPRING_DATASOURCE_URL=jdbc:postgresql://postgres:5432/taskdb
      - SPRING_DATASOURCE_USERNAME=app
      - SPRING_DATASOURCE_PASSWORD=app
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
    depends_on:
      - postgres
      - redis
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: taskdb
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  prometheus:
    image: prom/prometheus:v2.54.1
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=7d'
  grafana:
    image: grafana/grafana:11.4.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_SECURITY_ADMIN_USER=admin
    volumes:
      - grafana-data:/var/lib/grafana
      - ./config/grafana/provisioning:/etc/grafana/provisioning
volumes:
  pgdata:
  prometheus-data:
  grafana-data:
config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'task-api'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['app:8080']
        labels:
          application: 'task-api'
          environment: 'docker'
config/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
# Request rate (requests per second)
rate(http_server_requests_seconds_count[5m])
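# 95th percentile latency (aggregate buckets by le before taking the quantile;
# requires histogram buckets, e.g. management.metrics.distribution.percentiles-histogram.http.server.requests=true)
histogram_quantile(0.95, sum by (le) (rate(http_server_requests_seconds_bucket[5m])))
# Error rate (share of 5xx responses); both sides are summed so their label sets match
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
/ sum(rate(http_server_requests_seconds_count[5m]))
# JVM heap usage (summed across heap memory pools)
sum(jvm_memory_used_bytes{area="heap"}) / sum(jvm_memory_max_bytes{area="heap"})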
# Active threads
jvm_threads_live_threads
# Custom: tasks created per minute
rate(tasks_created_total[1m]) * 60
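
The 95th percentile query returns an estimate, not an exact value: Prometheus only knows how many observations fell into each bucket. The sketch below is a simplified, standard-library-only illustration of the linear interpolation that `histogram_quantile` is documented to perform on classic histograms. `Bucket` and `estimateQuantile` are hypothetical names, and edge cases such as negative bounds and native histograms are ignored.

```kotlin
// One cumulative histogram bucket: how many observations were <= upperBound
data class Bucket(val upperBound: Double, val cumulativeCount: Double)

// Simplified estimate of the q-quantile from cumulative buckets.
// Assumes non-negative bounds and that the last bucket is +Inf.
fun estimateQuantile(q: Double, buckets: List<Bucket>): Double {
    require(q in 0.0..1.0) { "q must be between 0 and 1" }
    val sorted = buckets.sortedBy { it.upperBound }
    require(sorted.isNotEmpty() && sorted.last().upperBound == Double.POSITIVE_INFINITY) {
        "the highest bucket must be +Inf"
    }
    val total = sorted.last().cumulativeCount
    if (total == 0.0) return Double.NaN

    val rank = q * total
    val index = sorted.indexOfFirst { it.cumulativeCount >= rank }

    // The quantile falls into the +Inf bucket: report the highest finite bound
    if (index == sorted.lastIndex) {
        return if (sorted.size > 1) sorted[sorted.size - 2].upperBound else Double.NaN
    }

    val lower = if (index == 0) 0.0 else sorted[index - 1].upperBound
    val below = if (index == 0) 0.0 else sorted[index - 1].cumulativeCount
    val inBucket = sorted[index].cumulativeCount - below
    if (inBucket == 0.0) return lower

    // Assume observations are spread evenly inside the bucket
    return lower + (sorted[index].upperBound - lower) * ((rank - below) / inBucket)
}

fun main() {
    val buckets = listOf(
        Bucket(0.1, 50.0),
        Bucket(0.5, 90.0),
        Bucket(1.0, 100.0),
        Bucket(Double.POSITIVE_INFINITY, 100.0)
    )
    println(estimateQuantile(0.95, buckets)) // 0.75
}
```

The 95th observation of 100 lands halfway through the 0.5s to 1.0s bucket, so the estimate is 0.75s regardless of where the real observations sat inside that bucket. This is why bucket boundaries should bracket the latencies you care about.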

Full observability with the OpenTelemetry Collector


For production setups, use the OpenTelemetry Collector to receive, process, and export telemetry:

docker-compose.yml (additions)
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.114.0
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
      - "8889:8889" # Prometheus metrics exporter
    volumes:
      - ./config/otel-collector.yml:/etc/otelcol-contrib/config.yaml
  jaeger:
    image: jaegertracing/all-in-one:1.62
    ports:
      - "16686:16686" # Jaeger UI
      - "14268:14268" # Jaeger collector
    environment:
      - COLLECTOR_OTLP_ENABLED=true
config/otel-collector.yml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
| Concern | TypeScript | Go | Kotlin/JVM |
| --- | --- | --- | --- |
| Error handling | throw / neverthrow Result | error interface, explicit returns | Sealed class Result hierarchies |
| Logging | winston / pino | slog / zap | SLF4J + Logback + kotlin-logging |
| Structured logging | pino JSON | slog structured | logstash-logback-encoder |
| Log context | AsyncLocalStorage | context.Context | MDC (+ MDCContext for coroutines) |
| Metrics | prom-client | prometheus/client_golang | Micrometer + Prometheus |
| Tracing | @opentelemetry/sdk-node | go.opentelemetry.io/otel | OpenTelemetry Java/Kotlin |
| Health checks | custom endpoint | custom endpoint | Spring Actuator |
| Metrics dashboard | Grafana | Grafana | Grafana |

Put the three pillars to work — wire up a real observability stack, then tighten your service layer’s error handling.