CLIs with Cobra
What you will learn: how a real Go codebase builds its command-line binaries with Cobra — the two distinct binary shapes (service daemons vs. subcommand tools), how command trees are wired with AddCommand, how pflag flags bind to viper config, the lifecycle hooks (PersistentPreRunE/PreRunE/RunE), and how a leaf subcommand actually reaches a gRPC service or a topology store.
Prerequisites: packages and imports (the package split and blank imports), interfaces & composition (struct embedding, struct-literal construction), errors (RunE returns wrapped errors instead of exiting), context (cmd.Context()), and stdlib & idioms (closures as handlers, the dependency-carrier struct). This builds on architecture & request flow, which is where these binaries sit.
We’ll use multigres (“Vitess for Postgres”) as the running example — every binary it ships is a clean illustration of one of two Cobra patterns.
The two binary shapes
Everything you run lives under go/cmd/<name>/main.go. There are exactly two shapes, and knowing which one you’re looking at is the first thing to figure out.
- go/cmd/
  - multigateway/ main.go: service daemon (single root command, no subcommands)
    - …
  - multipooler/ main.go: service daemon
    - …
  - multiorch/ main.go: service daemon
    - …
  - multiadmin/ main.go: service daemon
    - …
  - portpoolserver/ main.go: service daemon
    - …
  - multigres/ main.go + command/: subcommand tool (a tree of subcommands)
    - …
  - pgctld/ main.go + command/: hybrid, a subcommand tree whose server leaf is a daemon
    - …
The two shapes differ on every axis that matters:
| | Service daemon | Subcommand tool |
|---|---|---|
| Factory | CreateMultiGatewayCommand() (*cobra.Command, *Service) | GetRootCommand() *cobra.Command |
| Subcommands | none (Args: cobra.NoArgs) | a tree wired with AddCommand |
| Flags | service.RegisterFlags(cmd.Flags()) | per-command, plus a root PersistentFlags() |
| Config load | in PreRunE → service.CobraPreRunE | in the root’s PersistentPreRunE |
| What RunE does | starts a long-running server | does one thing and returns |
The main.go pattern (subcommand tool)
A tool’s main.go is deliberately tiny: build a command, execute it, set the exit code.

```go
func main() {
	root := command.GetRootCommand()
	if err := root.Execute(); err != nil {
		os.Exit(1) //nolint:forbidigo // main() is allowed to call os.Exit
	}
}
```

Three things to notice:

- Execute() drives everything. Cobra parses os.Args, walks the command tree to find the invoked subcommand, runs the persistent and local hooks, then runs that command’s RunE. Any error from any of those bubbles back here.
- os.Exit lives only in main. Command handlers return errors; main translates a non-nil error into exit code 1. The //nolint:forbidigo comment documents that os.Exit is banned everywhere else by the linter, and that ban is what forces handlers to use RunE. (See errors for why returning errors beats exiting.)
- All the wiring is in a factory. main knows nothing about the command tree; GetRootCommand owns it.
GetRootCommand: the factory + dependency-carrier struct
The factory builds the whole tree. The state the commands need is hung off a struct so every handler closure and subcommand can reach it. This is the dependency-carrier idiom: one struct of fields wired once, with methods and closures used as handlers (see stdlib & idioms).

```go
// MultigresCommand holds the configuration for multigres commands
type MultigresCommand struct {
	reg       *viperutil.Registry
	vc        *viperutil.ViperConfig
	telemetry *telemetry.Telemetry
}

func GetRootCommand() *cobra.Command {
	reg := viperutil.NewRegistry()
	mc := &MultigresCommand{
		reg:       reg,
		vc:        viperutil.NewViperConfig(reg),
		telemetry: telemetry.NewTelemetry(),
	}
	// ... build root, attach hooks, AddCommand subtrees ...
}
```

The root cobra.Command carries identity (Use/Short/Long) plus the lifecycle hooks:
```go
// span and err are assigned with "=" below, so both are declared in the
// enclosing scope (declarations elided); span is shared by the two hooks.
root := &cobra.Command{
	Use:   "multigres",
	Short: "The command-line companion for managing Multigres clusters",
	PersistentPreRunE: func(cmd *cobra.Command, args []string) error {
		// Silence usage for application errors, but allow it for flag errors.
		// This runs after flag parsing, so flag errors still show usage.
		cmd.SilenceUsage = true

		viper.SetConfigName("multigres")
		if _, err := mc.vc.LoadConfig(mc.reg); err != nil {
			return err
		}
		span, err = mc.telemetry.InitForCommand(cmd, "multigres-cli", true)
		return err
	},
	PersistentPostRunE: func(cmd *cobra.Command, args []string) error {
		span.End()
		// ... shut telemetry down with a timeout ...
	},
}
```

After construction, three more steps complete the root before subcommands are attached:
```go
mc.vc.RegisterFlags(root.PersistentFlags()) // declare + bind config-path/config-name on the root

root.SetOut(os.Stdout) // cobra defaults command output to STDERR; we want STDOUT
root.SetErr(os.Stderr)

AddClusterCommand(root, mc) // nested: multigres cluster <leaf>
AddTopoCommands(root, mc)   // flat: multigres getgateways / createclustermetadata
AddPoolerCommands(root, mc) // flat: multigres getpoolerstatus
```

The hook lifecycle, and the SilenceUsage trick
Cobra runs hooks in this order for the actually-invoked command chain.

```mermaid
flowchart TB
    PRE["PersistentPreRunE<br/>(nearest ancestor that defines one)<br/>root: load config + telemetry"]
    PREE["PreRunE<br/>(the invoked command)<br/>per-command validation"]
    RUN["RunE<br/>(the invoked command)<br/>the real work"]
    POST["PersistentPostRunE<br/>root: span.End + telemetry shutdown"]
    PRE --> PREE --> RUN --> POST
```
Two subtle but load-bearing facts:
- SilenceUsage is set inside PersistentPreRunE, not at construction. Flag-parse errors happen before any hook runs, so a bad flag still prints the usage block. Once parsing succeeds and the hook runs, SilenceUsage = true means a later application error prints just the error, not a wall of usage text.
- A child that defines its own PersistentPreRunE overrides the parent’s; Cobra does not merge them. The tools here deliberately define persistent hooks only on the root, so every subcommand inherits config loading and telemetry init. Add a subcommand with its own PersistentPreRunE and you’d silently lose config loading.
AddCommand wiring: nested vs. flat
The root calls three Add* helpers. Each one decides the tree shape by how many AddCommand hops it makes.
Nested — multigres cluster init
A nested group builds an intermediate command, registers its leaves on it, then attaches the group to the root:

```go
func AddClusterCommand(root *cobra.Command, mc *MultigresCommand) {
	clusterCmd := &cobra.Command{
		Use:   "cluster",
		Short: "Manage cluster lifecycle",
	}

	cluster.AddInitCommand(clusterCmd)
	cluster.AddStartCommand(clusterCmd)
	cluster.AddStopCommand(clusterCmd)
	// ... ~9 more ...

	root.AddCommand(clusterCmd)
}
```

That’s two hops: root → clusterCmd → initCmd. clusterCmd itself has no RunE; it’s just a grouping node, so multigres cluster with no leaf prints help.
Flat — multigres getgateways
A flat group attaches each leaf directly to the root: one hop, no grouping node.

```go
func AddTopoCommands(root *cobra.Command, mc *MultigresCommand) {
	root.AddCommand(topo.AddGetCellCommand())
	root.AddCommand(topo.AddGetGatewaysCommand())
	root.AddCommand(topo.AddGetPoolersCommand())
	root.AddCommand(topo.CreateClusterMetadataCommand())
	// ... more flat leaves ...
}
```

Put together, the tree mirrors the directory layout under command/:
- multigres/ (root, command/root.go)
  - cluster/ (group, command/cluster.go)
    - init (leaf, command/cluster/init.go)
    - start / stop / status (leaves, command/cluster/*.go)
  - getgateways (flat leaf, command/topo/getgateways.go)
  - getpoolers / getcell / … (flat leaves, command/topo/*.go)
  - createclustermetadata (flat leaf, command/topo/createclustermetadata.go)
  - getpoolerstatus (flat leaf, command/pooler/status.go)
The directories command/cluster, command/topo, command/pooler, and command/admin are separate Go packages, each leaf in its own file. The admin package holds the shared gRPC client reused by the topo and pooler leaves.
Flags: declare, then bind to viper
Cobra uses pflag, a POSIX-style superset of the stdlib flag package. There are two flag scopes:

- cmd.Flags(): local to that command.
- root.PersistentFlags(): inherited by every descendant.
At execution time Cobra merges inherited flags into cmd.Flags(), so a leaf can read config-path (declared on the root) via cmd.Flags().GetStringSlice("config-path") even though it never declared it.
Declaration
Plain pflag calls declare flags with a name, default, and help string. Shorthands use the P suffix, and there are typed variants for every kind:

```go
root.PersistentFlags().StringP("pg-database", "D", pc.pgDatabase.Default(), "...")
root.PersistentFlags().IntP("timeout", "t", pc.timeout.Default(), "...")
root.PersistentFlags().StringSlice("pg-initdb-sql-files", pc.pgInitdbSQLFiles.Default(), "...")
```

Required flags use MarkFlagRequired:

```go
_ = cmd.MarkFlagRequired("cell")
_ = cmd.MarkFlagRequired("service-id")
```

Binding to viper
A flag’s value and its config binding are separate steps. This codebase uses viperutil.Value[T] objects (configured via viperutil.Configure(...)) and ties them to flags with viperutil.BindFlags. A leaf shows the full sequence: configure the typed values, build the command, declare the matching pflags, then bind.

```go
func AddInitCommand(clusterCmd *cobra.Command) {
	reg := viperutil.NewRegistry()
	icmd := &initCmd{
		provisioner: viperutil.Configure(reg, "provisioner", viperutil.Options[string]{
			Default:  "local",
			FlagName: "provisioner",
		}),
		// backupPath, backupURL, region similarly ...
	}

	cmd := &cobra.Command{Use: "init", /* ... */ RunE: icmd.runInit}

	cmd.Flags().String("provisioner", icmd.provisioner.Default(), "...")
	cmd.Flags().String("backup-path", icmd.backupPath.Default(), "...")
	// ... declare the rest first ...

	viperutil.BindFlags(cmd.Flags(), icmd.provisioner, icmd.backupPath, icmd.backupURL, icmd.region) // <-- bind LAST

	clusterCmd.AddCommand(cmd)
}
```

Once bound, icmd.provisioner.Get() returns the value with precedence flag > env > config > default. (The full Configure/Registry/precedence mechanics live in config & viperutil; here we only care about the wiring order.)
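The "declare first, bind last" order can be illustrated without viperutil. The sketch below is a hypothetical stand-in built on the stdlib flag package: mustBind is not viperutil.BindFlags, it only mimics the contract stated in this page's checkpoints, namely that binding to a flag that does not exist yet panics.

```go
package main

import (
	"flag"
	"fmt"
)

// mustBind is a hypothetical stand-in for a binder with a "flag must already
// be declared" contract. It panics when a named flag is missing from fs.
func mustBind(fs *flag.FlagSet, names ...string) {
	for _, n := range names {
		if fs.Lookup(n) == nil {
			panic(fmt.Sprintf("flag %q is not defined on flag set %q", n, fs.Name()))
		}
	}
}

// bindPanics reports whether binding panics, with or without a prior declaration.
func bindPanics(declareFirst bool) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	fs := flag.NewFlagSet("init", flag.ContinueOnError)
	if declareFirst {
		fs.String("provisioner", "local", "provisioner to use")
	}
	mustBind(fs, "provisioner")
	return false
}

func main() {
	fmt.Println("declare then bind panics:", bindPanics(true))
	fmt.Println("bind without declare panics:", bindPanics(false))
}
```

Failing loudly at construction time turns a wiring mistake into an immediate startup failure rather than a flag that is silently ignored.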
How a subcommand reaches a service
A leaf’s RunE is where it talks to the rest of the system. There are two paths.
Path 1 — gRPC via the shared admin client
The cleanest example reads a flag, builds the client, calls the RPC with cmd.Context(), and renders the response as JSON:

```go
func runGetGateways(cmd *cobra.Command, args []string) error {
	cellsFlag, err := cmd.Flags().GetString("cells")
	// ... split into []string cells ...

	client, err := admin.NewClient(cmd)
	if err != nil {
		return err
	}
	defer client.Close()

	response, err := client.GetGateways(cmd.Context(), &multiadminpb.GetGatewaysRequest{Cells: cells})
	if err != nil {
		return fmt.Errorf("failed to get gateways: %w", err)
	}

	jsonData, err := json.MarshalIndent(response, "", " ")
	// ...
	cmd.Print(string(jsonData))
	return nil
}
```

The bridge from CLI to a gRPC stub is the admin package’s Conn. It embeds the generated client interface (struct embedding; see interfaces & composition) so client.GetGateways(...) resolves directly to the embedded MultiAdminServiceClient:
```go
type Conn struct {
	multiadminpb.MultiAdminServiceClient // embedded: promotes all RPC methods onto Conn
	conn *grpc.ClientConn
}

func NewClient(cmd *cobra.Command) (*Conn, error) {
	addr, err := GetServerAddress(cmd)
	if err != nil {
		return nil, err
	}
	conn, err := grpccommon.NewClient(addr,
		grpccommon.WithDialOptions(grpc.WithTransportCredentials(insecure.NewCredentials())))
	// ...
	return &Conn{
		MultiAdminServiceClient: multiadminpb.NewMultiAdminServiceClient(conn),
		conn:                    conn,
	}, nil
}
```

GetServerAddress resolves the address with the same flag-or-config precedence theme: the --admin-server flag wins, otherwise it reads multigres.yaml from a --config-path directory and computes localhost:<grpcPort> from the local provisioner config. The gRPC dial mechanics are covered in gRPC & protobuf.
A second client leaf adds two idioms worth copying — a per-RPC timeout layered on cmd.Context() (see context) and protojson to render enums as readable names:
```go
ctx, cancel := context.WithTimeout(cmd.Context(), 10*time.Second)
defer cancel()
response, err := client.GetPoolerStatus(ctx, &multiadminpb.GetPoolerStatusRequest{
	PoolerId: &clustermetadatapb.ID{Cell: cell, Name: serviceID},
})
// ...
marshaler := protojson.MarshalOptions{Indent: " ", UseProtoNames: true}
```

Path 2 — straight to the topology store
Not every leaf goes through the admin service. The cluster-bootstrap leaf opens the topology store directly and writes metadata:

```go
ts, err := topoclient.OpenServer("etcd", globalTopoRoot, []string{globalTopoAddress}, topoclient.NewDefaultTopoConfig())
if err != nil {
	return fmt.Errorf("failed to connect to topology server: %w", err)
}
defer ts.Close()
// ... createCell / provisionDatabase ...
```

This is the “operator bootstrap” path: there’s no running gateway or admin service yet, so it talks to etcd directly. This leaf also reads its flags with raw cmd.Flags().GetString(...) and MarkFlagRequired(...), with no viper binding at all.
The service-daemon shape
A daemon has no command tree, no persistent flags, and no GetRootCommand. It has just a factory that returns the command and the service object, with flags and config delegated to the service:

```go
func CreateMultiGatewayCommand() (*cobra.Command, *multigateway.MultiGateway) {
	mg := multigateway.NewMultiGateway()

	cmd := &cobra.Command{
		Use:  constants.ServiceMultigateway, // "multigateway"; name stays in sync via constants
		Args: cobra.NoArgs,
		PreRunE: func(cmd *cobra.Command, args []string) error {
			return mg.CobraPreRunE(cmd) // load config + register reload watchers
		},
		RunE: func(cmd *cobra.Command, args []string) error {
			return run(cmd.Context(), mg) // start the daemon
		},
	}

	mg.RegisterFlags(cmd.Flags()) // service owns its flags
	return cmd, mg
}

func run(ctx context.Context, mg *multigateway.MultiGateway) error {
	if err := mg.Init(ctx); err != nil {
		return err
	}
	return mg.RunDefault()
}
```

The other daemons follow the same pattern (CreateMultiPoolerCommand, CreateMultiOrchCommand, and so on). The Use: value comes from a shared constants package, so the binary name and the cobra command name never drift apart.
Where do the flags come from? Not from a persistent root — from the service’s own RegisterFlags, which declares the pflags, binds them, then delegates to sub-components:
```go
func (mg *MultiGateway) RegisterFlags(fs *pflag.FlagSet) {
	fs.String("cell", mg.cell.Default(), "cell to use")
	fs.Int("pg-port", mg.pgPort.Default(), "PostgreSQL protocol listen port")
	// ... many more: Duration / Bool / Uint64 ...
	viperutil.BindFlags(fs, mg.cell, mg.serviceID, mg.pgPort /* ... */)
	mg.senv.RegisterFlags(fs) // servenv adds its flags too
	mg.grpcServer.RegisterFlags(fs)
	mg.topoConfig.RegisterFlags(fs)
}
```

And config loading? It is not a PersistentPreRunE. It is CobraPreRunE, which bottoms out in a shared servenv implementation:
```go
func (sv *ServEnv) CobraPreRunE(cmd *cobra.Command) error {
	ch := make(chan struct{})
	viperutil.NotifyConfigReload(sv.reg, ch) // hot-reload watcher
	go func() {
		for range ch { /* log new settings */ }
	}()

	watchCancel, err := sv.vc.LoadConfig(sv.reg)
	if err != nil {
		return fmt.Errorf("%s: failed to read in config: %w", cmd.Name(), err)
	}
	sv.OnTerm(watchCancel)
	return nil
}
```

So the service-daemon flow is Execute → PreRunE (CobraPreRunE: load config + watchers) → RunE (run → Init → RunDefault). What Init/RunDefault/servenv actually do once RunE fires is the subject of service anatomy.
pgctld, the hybrid
pgctld is a subcommand tool (it has a GetRootCommand), but one of its leaves, server, is itself a daemon. The root declares all the persistent flags and binds them in one big BindFlags call, then wires the subcommands:

```go
AddServerCommand(root, pc) // the daemon leaf (parallels the service binaries)
AddInitCommand(root, pc)
AddStartCommand(root, pc)
// ... stop / restart / status / version / reload
```

Most leaves use a sub-struct plus a createCommand helper, with the work split so that PreRunE validates and RunE does the work:
```go
func (s *PgCtlStartCmd) createCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use: "start",
		PreRunE: func(cmd *cobra.Command, args []string) error {
			return s.pgCtlCmd.validateInitialized(cmd, args)
		},
		RunE: s.runStart,
	}
	return cmd
}
```

pgctld also shows a flag alias via SetGlobalNormalizationFunc, rewriting the deprecated --init-db-sql-file to --pg-initdb-sql-files.
Plugin registration via blank imports
Each service cmd directory has a tiny file that does nothing but import a package for its side effects:

```go
package main

import (
	_ "github.com/multigres/multigres/go/common/plugins/topo"
)
```

The blank _ import runs the package’s init() to register a topo backend, without creating a direct code dependency. This is the import-for-side-effects idiom, and it is how topoclient.OpenServer("etcd", ...) later finds the etcd backend without cmd/ importing it explicitly.
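The registration half of that idiom is roughly a name-to-factory map filled in by init(). The sketch below collapses what would normally be three packages (registry, plugin, main) into one file so it runs on its own; Backend, Register, Open, and etcdBackend are hypothetical names, not the topoclient API.

```go
package main

import "fmt"

// Backend is a hypothetical topology backend interface.
type Backend interface {
	Name() string
}

// factories is the package-level registry that plugins add themselves to.
var factories = map[string]func() Backend{}

// Register is meant to be called from a plugin's init(); duplicates are a bug.
func Register(name string, f func() Backend) {
	if _, dup := factories[name]; dup {
		panic(fmt.Sprintf("backend %q registered twice", name))
	}
	factories[name] = f
}

// Open looks a backend up by name, as a caller that never imports the plugin would.
func Open(name string) (Backend, error) {
	f, ok := factories[name]
	if !ok {
		return nil, fmt.Errorf("no backend registered for %q", name)
	}
	return f(), nil
}

type etcdBackend struct{}

func (etcdBackend) Name() string { return "etcd" }

// In a real layout this init would live in its own plugin package and would
// run only because some main package blank-imports that package.
func init() {
	Register("etcd", func() Backend { return etcdBackend{} })
}

func main() {
	b, err := Open("etcd")
	fmt.Println(b.Name(), err)

	_, err = Open("consul")
	fmt.Println(err)
}
```

If the blank import is left out, the plugin's init never runs and the lookup fails at run time with a "not registered" error rather than at compile time.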
Checkpoints
Why is SilenceUsage = true set inside PersistentPreRunE instead of when the command is constructed?

Flag-parse errors happen before any hook runs, so leaving usage visible at construction time means a bad flag still prints the helpful usage block. Setting SilenceUsage only after parsing succeeds (inside the hook) means application errors print just the error, while flag errors still show usage. Setting it at construction would hide usage even on bad flags.

You add a new subcommand and give it its own PersistentPreRunE. What breaks?

Config loading and telemetry init silently stop happening for that command. Cobra does not merge persistent hooks: a child’s PersistentPreRunE overrides the parent’s. The tools rely on only the root defining persistent hooks so every leaf inherits config-load. Define PreRunE on the leaf instead, or call the root’s logic explicitly.

Why does multigres getgateways work but multigres init fail?

getgateways is wired flat onto the root by AddTopoCommands (one AddCommand hop). init is wired nested under the cluster group by AddClusterCommand, so its real path is multigres cluster init. There is no top-level init.

Why must viperutil.BindFlags be called after the cmd.Flags().String(…) declarations?

BindFlags panics if a Value maps to a flag that isn’t yet defined on the flag set. The flags must exist first, so every command declares its pflags, then calls BindFlags last.

Continue to config & viperutil for the full mechanics behind viperutil.Configure, Value[T], BindFlags, the Registry, and the flag > env > config > default precedence this page only sketched at the command layer.