CLIs with Cobra

What you will learn: how a real Go codebase builds its command-line binaries with Cobra — the two distinct binary shapes (service daemons vs. subcommand tools), how command trees are wired with AddCommand, how pflag flags bind to viper config, the lifecycle hooks (PersistentPreRunE/PreRunE/RunE), and how a leaf subcommand actually reaches a gRPC service or a topology store.

Prerequisites: packages and imports (the package split and blank imports), interfaces & composition (struct embedding, struct-literal construction), errors (RunE returns wrapped errors instead of exiting), context (cmd.Context()), and stdlib & idioms (closures as handlers, the dependency-carrier struct). This builds on architecture & request flow, which is where these binaries sit.

We’ll use multigres (“Vitess for Postgres”) as the running example — every binary it ships is a clean illustration of one of two Cobra patterns.

The two binary shapes

Everything you run lives under go/cmd/<name>/main.go. There are two shapes (plus pgctld, a hybrid that combines them), and knowing which one you’re looking at is the first thing to figure out.

go/cmd/
- multigateway/ main.go: service daemon (single root command, no subcommands)
  - …
- multipooler/ main.go: service daemon
  - …
- multiorch/ main.go: service daemon
  - …
- multiadmin/ main.go: service daemon
  - …
- portpoolserver/ main.go: service daemon
  - …
- multigres/ main.go + command/: subcommand tool (a tree of subcommands)
  - …
- pgctld/ main.go + command/: hybrid, a subcommand tree whose server leaf is a daemon
  - …

The two shapes differ on every axis that matters:

| | Service daemon | Subcommand tool |
| --- | --- | --- |
| Factory | `CreateMultiGatewayCommand() (*cobra.Command, *multigateway.MultiGateway)` | `GetRootCommand() *cobra.Command` |
| Subcommands | none (`Args: cobra.NoArgs`) | a tree wired with `AddCommand` |
| Flags | `service.RegisterFlags(cmd.Flags())` | per-command, plus a root `PersistentFlags()` |
| Config load | in `PreRunE` → `service.CobraPreRunE` | in the root’s `PersistentPreRunE` |
| What `RunE` does | starts a long-running server | does one thing and returns |

The `main.go` pattern (subcommand tool)

A tool’s main.go is deliberately tiny — build a command, execute it, set the exit code:

func main() {
  root := command.GetRootCommand()
  if err := root.Execute(); err != nil {
    os.Exit(1) //nolint:forbidigo // main() is allowed to call os.Exit
  }
}

Three things to notice:

- Execute() drives everything. Cobra parses os.Args, walks the command tree to find the invoked subcommand, runs the persistent and local hooks, then runs that command’s RunE. Any error from any of those bubbles back here.
- os.Exit lives only in main. Command handlers return errors; main translates a non-nil error into exit code 1. The //nolint:forbidigo comment documents that os.Exit is banned everywhere else by the linter; that ban is what forces handlers to use RunE. (See errors for why returning errors beats exiting.)
- All the wiring is in a factory. main knows nothing about the command tree; GetRootCommand owns it.

`GetRootCommand`: the factory + dependency-carrier struct

The factory builds the whole tree. The state the commands need is hung off a struct so every handler closure and subcommand can reach it — this is the dependency-carrier idiom: one struct of fields wired once, with methods and closures used as handlers (see stdlib & idioms).

// MultigresCommand holds the configuration for multigres commands
type MultigresCommand struct {
  reg       *viperutil.Registry
  vc        *viperutil.ViperConfig
  telemetry *telemetry.Telemetry
}

func GetRootCommand() *cobra.Command {
  reg := viperutil.NewRegistry()
  mc := &MultigresCommand{
    reg:       reg,
    vc:        viperutil.NewViperConfig(reg),
    telemetry: telemetry.NewTelemetry(),
  }
  // ... build root, attach hooks, AddCommand subtrees ...
}

The root cobra.Command carries identity (Use/Short/Long) plus the lifecycle hooks:

root := &cobra.Command{
  Use:   "multigres",
  Short: "The command-line companion for managing Multigres clusters",
  PersistentPreRunE: func(cmd *cobra.Command, args []string) error {
    // Silence usage for application errors, but allow it for flag errors.
    // This runs after flag parsing, so flag errors still show usage.
    cmd.SilenceUsage = true

    viper.SetConfigName("multigres")
    if _, err := mc.vc.LoadConfig(mc.reg); err != nil {
      return err
    }
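    // span is assigned here, not declared: it lives outside the closure
    // so that PersistentPostRunE below can end it.
    span, err = mc.telemetry.InitForCommand(cmd, "multigres-cli", true)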
    return err
  },
  PersistentPostRunE: func(cmd *cobra.Command, args []string) error {
    span.End()
    // ... shut telemetry down with a timeout ...
  },
}

After construction, three more steps complete the root before subcommands are attached:

mc.vc.RegisterFlags(root.PersistentFlags())   // declare + bind config-path/config-name on the root

root.SetOut(os.Stdout)   // cmd.Print/Println fall back to STDERR when no output writer is set; we want STDOUT
root.SetErr(os.Stderr)

AddClusterCommand(root, mc)   // nested:  multigres cluster <leaf>
AddTopoCommands(root, mc)     // flat:    multigres getgateways / createclustermetadata
AddPoolerCommands(root, mc)   // flat:    multigres getpoolerstatus

The hook lifecycle, and the `SilenceUsage` trick

Cobra runs hooks in this order for the actually-invoked command chain.

Cobra hook order

flowchart TB
PRE["PersistentPreRunE<br/>(nearest ancestor that defines one)<br/>root: load config + telemetry"]
PREE["PreRunE<br/>(the invoked command)<br/>per-command validation"]
RUN["RunE<br/>(the invoked command)<br/>the real work"]
POST["PersistentPostRunE<br/>root: span.End + telemetry shutdown"]
PRE --> PREE --> RUN --> POST

Two subtle but load-bearing facts:

- SilenceUsage is set inside PersistentPreRunE, not at construction. Flag-parse errors happen before any hook runs, so a bad flag still prints the usage block. Once parsing succeeds and the hook runs, SilenceUsage = true means a later application error prints just the error, not a wall of usage text.
- A child that defines its own PersistentPreRunE overrides the parent’s: by default Cobra runs only the nearest one and does not merge them (the global cobra.EnableTraverseRunHooks switch opts in to running every ancestor’s hook). The tools here deliberately define persistent hooks only on the root, so every subcommand inherits config loading and telemetry init. Add a subcommand with its own PersistentPreRunE and you’d silently lose config loading.

`AddCommand` wiring: nested vs. flat

The root calls three Add* helpers. Each one decides the tree shape by how many AddCommand hops it makes.

Nested — `multigres cluster init`

A nested group builds an intermediate command, registers its leaves on it, then attaches the group to the root:

func AddClusterCommand(root *cobra.Command, mc *MultigresCommand) {
  clusterCmd := &cobra.Command{
    Use:   "cluster",
    Short: "Manage cluster lifecycle",
  }

  cluster.AddInitCommand(clusterCmd)
  cluster.AddStartCommand(clusterCmd)
  cluster.AddStopCommand(clusterCmd)
  // ... ~9 more ...

  root.AddCommand(clusterCmd)
}

That’s two hops: root → clusterCmd → initCmd. clusterCmd itself has no RunE; it’s just a grouping node, so multigres cluster with no leaf prints help.

Flat — `multigres getgateways`

A flat group attaches each leaf directly to the root — one hop, no grouping node:

func AddTopoCommands(root *cobra.Command, mc *MultigresCommand) {
  root.AddCommand(topo.AddGetCellCommand())
  root.AddCommand(topo.AddGetGatewaysCommand())
  root.AddCommand(topo.AddGetPoolersCommand())
  root.AddCommand(topo.CreateClusterMetadataCommand())
  // ... more flat leaves ...
}

Put together, the tree mirrors the directory layout under command/:

multigres/ (root, command/root.go)
- cluster/ (group, command/cluster.go)
  - init (leaf, command/cluster/init.go)
  - start / stop / status (leaves, command/cluster/*.go)
- getgateways (flat leaf, command/topo/getgateways.go)
- getpoolers / getcell / … (flat leaves, command/topo/*.go)
- createclustermetadata (flat leaf, command/topo/createclustermetadata.go)
- getpoolerstatus (flat leaf, command/pooler/status.go)

The directories command/cluster, command/topo, command/pooler, and command/admin are separate Go packages, each leaf in its own file. The admin package holds the shared gRPC client reused by the topo and pooler leaves.

Flags: declare, then bind to viper

Cobra uses pflag, a drop-in replacement for the stdlib flag package that implements POSIX/GNU-style flags (--long names and single-letter shorthands). There are two flag scopes:

- cmd.Flags(): local to that command.
- root.PersistentFlags(): inherited by every descendant.

At execution time Cobra merges inherited flags into cmd.Flags(), so a leaf can read config-path (declared on the root) via cmd.Flags().GetStringSlice("config-path") even though it never declared it.

Declaration

Plain pflag calls declare flags with a name, default, and help string. Shorthands use the P suffix, and there are typed variants for every kind:

root.PersistentFlags().StringP("pg-database", "D", pc.pgDatabase.Default(), "...")
root.PersistentFlags().IntP("timeout", "t", pc.timeout.Default(), "...")
root.PersistentFlags().StringSlice("pg-initdb-sql-files", pc.pgInitdbSQLFiles.Default(), "...")

Required flags use MarkFlagRequired:

_ = cmd.MarkFlagRequired("cell")
_ = cmd.MarkFlagRequired("service-id")

Binding to viper

A flag’s value and its config binding are separate steps. This codebase uses viperutil.Value[T] objects (configured via viperutil.Configure(...)) and ties them to flags with viperutil.BindFlags. A leaf shows the full sequence — configure the typed values, build the command, declare the matching pflags, then bind:

func AddInitCommand(clusterCmd *cobra.Command) {
  reg := viperutil.NewRegistry()
  icmd := &initCmd{
    provisioner: viperutil.Configure(reg, "provisioner", viperutil.Options[string]{
      Default: "local", FlagName: "provisioner",
    }),
    // backupPath, backupURL, region similarly ...
  }

  cmd := &cobra.Command{Use: "init", /* ... */ RunE: icmd.runInit}

  cmd.Flags().String("provisioner", icmd.provisioner.Default(), "...")
  cmd.Flags().String("backup-path", icmd.backupPath.Default(), "...")
  // ... declare the rest first ...

  viperutil.BindFlags(cmd.Flags(), icmd.provisioner, icmd.backupPath,
    icmd.backupURL, icmd.region)   // <-- bind LAST

  clusterCmd.AddCommand(cmd)
}

Once bound, icmd.provisioner.Get() returns the value with precedence flag > env > config > default. (The full Configure/Registry/precedence mechanics live in config & viperutil; here we only care about the wiring order.)
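As a rough mental model of that precedence, the sketch below resolves a key by walking layers in priority order. It is not how viperutil is implemented; the source type, the resolve function, and all sample values are hypothetical and use only the standard library.

```go
package main

import "fmt"

// source is one configuration layer: it reports whether the key was set there.
type source func(key string) (string, bool)

// fromMap adapts a plain map into a source.
func fromMap(m map[string]string) source {
	return func(key string) (string, bool) {
		v, ok := m[key]
		return v, ok
	}
}

// resolve returns the first value found, walking sources from highest
// to lowest priority, and falls back to def when no layer sets the key.
func resolve(key, def string, sources ...source) string {
	for _, s := range sources {
		if v, ok := s(key); ok {
			return v
		}
	}
	return def
}

func main() {
	flags := fromMap(map[string]string{})
	env := fromMap(map[string]string{"provisioner": "kubernetes"})
	config := fromMap(map[string]string{"provisioner": "docker", "region": "us-east"})

	fmt.Println(resolve("provisioner", "local", flags, env, config)) // env beats config
	fmt.Println(resolve("region", "none", flags, env, config))       // only config sets it
	fmt.Println(resolve("backup-path", "/tmp", flags, env, config))  // nothing sets it
}
```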

Gotcha — BindFlags must come AFTER the pflag declarations

BindFlags panics if a Value maps to a flag that isn’t defined on the flag set:

// This function will panic if any of the values was configured to map to a flag
// which is not defined on the flag set.
func BindFlags(fs *pflag.FlagSet, values ...value.Registerable) { ... }

That panic is why every command declares its cmd.Flags().String(...) lines first and calls BindFlags(...) last. Reorder them and the binary panics on startup.

How a subcommand reaches a service

A leaf’s RunE is where it talks to the rest of the system. There are two paths.

Path 1 — gRPC via the shared admin client

The cleanest example reads a flag, builds the client, calls the RPC with cmd.Context(), and renders the response as JSON:

func runGetGateways(cmd *cobra.Command, args []string) error {
  cellsFlag, err := cmd.Flags().GetString("cells")
  // ... split into []string cells ...

  client, err := admin.NewClient(cmd)
  if err != nil {
    return err
  }
  defer client.Close()

  response, err := client.GetGateways(cmd.Context(),
    &multiadminpb.GetGatewaysRequest{Cells: cells})
  if err != nil {
    return fmt.Errorf("failed to get gateways: %w", err)
  }

  jsonData, err := json.MarshalIndent(response, "", "  ")
  // ...
  cmd.Print(string(jsonData))
  return nil
}

The bridge from CLI to a gRPC stub is the admin package’s Conn. It embeds the generated client interface (struct embedding; see interfaces & composition) so client.GetGateways(...) resolves directly to the embedded MultiAdminServiceClient:

type Conn struct {
  multiadminpb.MultiAdminServiceClient   // embedded: promotes all RPC methods onto Conn
  conn *grpc.ClientConn
}

func NewClient(cmd *cobra.Command) (*Conn, error) {
  addr, err := GetServerAddress(cmd)
  if err != nil {
    return nil, err
  }
  conn, err := grpccommon.NewClient(addr,
    grpccommon.WithDialOptions(grpc.WithTransportCredentials(insecure.NewCredentials())))
  // ...
  return &Conn{
    MultiAdminServiceClient: multiadminpb.NewMultiAdminServiceClient(conn),
    conn:                    conn,
  }, nil
}

GetServerAddress resolves the address with the same flag-or-config precedence theme: the --admin-server flag wins, otherwise it reads multigres.yaml from a --config-path directory and computes localhost:<grpcPort> from the local provisioner config. The gRPC dial mechanics are covered in gRPC & protobuf.

A second client leaf adds two idioms worth copying — a per-RPC timeout layered on cmd.Context() (see context) and protojson to render enums as readable names:

ctx, cancel := context.WithTimeout(cmd.Context(), 10*time.Second)
defer cancel()
response, err := client.GetPoolerStatus(ctx, &multiadminpb.GetPoolerStatusRequest{
  PoolerId: &clustermetadatapb.ID{Cell: cell, Name: serviceID},
})
// ...
marshaler := protojson.MarshalOptions{Indent: "  ", UseProtoNames: true}

Path 2 — straight to the topology store

Not every leaf goes through the admin service. The cluster-bootstrap leaf opens the topology store directly and writes metadata:

ts, err := topoclient.OpenServer("etcd", globalTopoRoot, []string{globalTopoAddress},
  topoclient.NewDefaultTopoConfig())
if err != nil {
  return fmt.Errorf("failed to connect to topology server: %w", err)
}
defer ts.Close()
// ... createCell / provisionDatabase ...

This is the “operator bootstrap” path — there’s no running gateway or admin service yet, so it talks to etcd directly. This leaf also reads its flags with raw cmd.Flags().GetString(...) and MarkFlagRequired(...) — no viper binding at all.

The service-daemon shape

A daemon has no command tree, no persistent flags, no GetRootCommand — just a factory that returns the command and the service object, with flags and config delegated to the service:

func CreateMultiGatewayCommand() (*cobra.Command, *multigateway.MultiGateway) {
  mg := multigateway.NewMultiGateway()

  cmd := &cobra.Command{
    Use:  constants.ServiceMultigateway,   // "multigateway" — name stays in sync via constants
    Args: cobra.NoArgs,
    PreRunE: func(cmd *cobra.Command, args []string) error {
      return mg.CobraPreRunE(cmd)          // load config + register reload watchers
    },
    RunE: func(cmd *cobra.Command, args []string) error {
      return run(cmd.Context(), mg)        // start the daemon
    },
  }

  mg.RegisterFlags(cmd.Flags())                // service owns its flags
  return cmd, mg
}

func run(ctx context.Context, mg *multigateway.MultiGateway) error {
  if err := mg.Init(ctx); err != nil {
    return err
  }
  return mg.RunDefault()
}

The other daemons follow the same pattern (CreateMultiPoolerCommand, CreateMultiOrchCommand, and so on). The Use: value comes from a shared constants package, so the binary name and the cobra command name never drift apart.

Where do the flags come from? Not from a persistent root — from the service’s own RegisterFlags, which declares the pflags, binds them, then delegates to sub-components:

func (mg *MultiGateway) RegisterFlags(fs *pflag.FlagSet) {
  fs.String("cell", mg.cell.Default(), "cell to use")
  fs.Int("pg-port", mg.pgPort.Default(), "PostgreSQL protocol listen port")
  // ... many more: Duration / Bool / Uint64 ...
  viperutil.BindFlags(fs, mg.cell, mg.serviceID, mg.pgPort /* ... */)
  mg.senv.RegisterFlags(fs)        // servenv adds its flags too
  mg.grpcServer.RegisterFlags(fs)
  mg.topoConfig.RegisterFlags(fs)
}

And config loading? Not a PersistentPreRunE — it’s CobraPreRunE, which bottoms out in a shared servenv implementation:

func (sv *ServEnv) CobraPreRunE(cmd *cobra.Command) error {
  ch := make(chan struct{})
  viperutil.NotifyConfigReload(sv.reg, ch)   // hot-reload watcher
  go func() { for range ch { /* log new settings */ } }()

  watchCancel, err := sv.vc.LoadConfig(sv.reg)
  if err != nil {
    return fmt.Errorf("%s: failed to read in config: %w", cmd.Name(), err)
  }
  sv.OnTerm(watchCancel)
  return nil
}

So the service-daemon flow is Execute → PreRunE (CobraPreRunE: load config + watchers) → RunE (run → Init → RunDefault). What Init/RunDefault/servenv actually do once RunE fires is the subject of service anatomy.

pgctld, the hybrid

pgctld is a subcommand tool (it has a GetRootCommand), but one of its leaves — server — is itself a daemon. The root declares all the persistent flags and binds them in one big BindFlags call, then wires the subcommands:

AddServerCommand(root, pc)   // the daemon leaf (parallels the service binaries)
AddInitCommand(root, pc)
AddStartCommand(root, pc)
// ... stop / restart / status / version / reload

Most leaves use a sub-struct plus a createCommand helper, with the PreRunE validates / RunE does work split:

func (s *PgCtlStartCmd) createCommand() *cobra.Command {
  cmd := &cobra.Command{
    Use: "start",
    PreRunE: func(cmd *cobra.Command, args []string) error {
      return s.pgCtlCmd.validateInitialized(cmd, args)
    },
    RunE: s.runStart,
  }
  return cmd
}

pgctld also shows a flag alias via SetGlobalNormalizationFunc, rewriting the deprecated --init-db-sql-file to --pg-initdb-sql-files.

Plugin registration via blank imports

Each service cmd directory has a tiny file that does nothing but import a package for its side effects:

package main

import (
  _ "github.com/multigres/multigres/go/common/plugins/topo"
)

The blank _ import runs the package’s init() to register a topo backend without main referencing any of the package’s identifiers; this is the import-for-side-effects idiom. It is how topoclient.OpenServer("etcd", ...) later finds the etcd backend even though no code in cmd/ calls into the backend package directly.
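The registration mechanism behind that idiom looks roughly like the following single-file sketch. Register, Open, and Factory are hypothetical names, and the factory returns a string instead of a real connection; in a real codebase the init function would live in the plugin package and run because of the blank import.

```go
package main

import "fmt"

// Factory builds a backend handle; a string stands in for a real connection.
type Factory func(addr string) string

// factories is the registry that plugin packages fill in at init time.
var factories = map[string]Factory{}

// Register adds a backend under name and panics on duplicates.
func Register(name string, f Factory) {
	if _, dup := factories[name]; dup {
		panic("backend registered twice: " + name)
	}
	factories[name] = f
}

// In a real plugin this init sits in its own package and runs only
// because some main package blank-imports it.
func init() {
	Register("etcd", func(addr string) string { return "etcd://" + addr })
}

// Open looks a backend up by name; callers never import the plugin directly.
func Open(name, addr string) (string, error) {
	f, ok := factories[name]
	if !ok {
		return "", fmt.Errorf("no backend registered for %q", name)
	}
	return f(addr), nil
}

func main() {
	fmt.Println(Open("etcd", "localhost:2379"))
	fmt.Println(Open("consul", "localhost:8500"))
}
```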

Checkpoints

Why is SilenceUsage = true set inside PersistentPreRunE instead of when the command is constructed?

Flag-parse errors happen before any hook runs, so leaving usage visible at construction time means a bad flag still prints the helpful usage block. Setting SilenceUsage only after parsing succeeds (inside the hook) means application errors print just the error, while flag errors still show usage. Setting it at construction would hide usage even on bad flags.

You add a new subcommand and give it its own PersistentPreRunE. What breaks?

Config loading and telemetry init silently stop happening for that command. Cobra does not merge persistent hooks — a child’s PersistentPreRunE overrides the parent’s. The tools rely on only the root defining persistent hooks so every leaf inherits config-load. Define PreRunE on the leaf instead, or call the root’s logic explicitly.

Why does multigres getgateways work but multigres init fail?

getgateways is wired flat onto the root by AddTopoCommands (one AddCommand hop). init is wired nested under the cluster group by AddClusterCommand, so its real path is multigres cluster init. There is no top-level init.

Why must viperutil.BindFlags be called after the cmd.Flags().String(…) declarations?

BindFlags panics if a Value maps to a flag that isn’t yet defined on the flag set. The flags must exist first, so every command declares its pflags, then calls BindFlags last.

Continue to config & viperutil for the full mechanics behind viperutil.Configure, Value[T], BindFlags, the Registry, and the flag > env > config > default precedence this page only sketched at the command layer.
