
CLIs with Cobra

What you will learn: how a real Go codebase builds its command-line binaries with Cobra — the two distinct binary shapes (service daemons vs. subcommand tools), how command trees are wired with AddCommand, how pflag flags bind to viper config, the lifecycle hooks (PersistentPreRunE/PreRunE/RunE), and how a leaf subcommand actually reaches a gRPC service or a topology store.

Prerequisites: packages and imports (the package split and blank imports), interfaces & composition (struct embedding, struct-literal construction), errors (RunE returns wrapped errors instead of exiting), context (cmd.Context()), and stdlib & idioms (closures as handlers, the dependency-carrier struct). This builds on architecture & request flow, which is where these binaries sit.

We’ll use multigres (“Vitess for Postgres”) as the running example — every binary it ships is a clean illustration of one of two Cobra patterns.


Everything you run lives under go/cmd/<name>/main.go. There are exactly two shapes, and knowing which one you’re looking at is the first thing to figure out.

  • go/cmd/
    • multigateway/main.go: service daemon (single root command, no subcommands)
    • multipooler/main.go: service daemon
    • multiorch/main.go: service daemon
    • multiadmin/main.go: service daemon
    • portpoolserver/main.go: service daemon
    • multigres/ (main.go + command/): subcommand tool (a tree of subcommands)
    • pgctld/ (main.go + command/): hybrid, a subcommand tree whose server leaf is a daemon

The two shapes differ on every axis that matters:

| | Service daemon | Subcommand tool |
|---|---|---|
| Factory | CreateMultiGatewayCommand() (*cobra.Command, *Service) | GetRootCommand() *cobra.Command |
| Subcommands | none (Args: cobra.NoArgs) | a tree wired with AddCommand |
| Flags | service.RegisterFlags(cmd.Flags()) | per-command, plus a root PersistentFlags() |
| Config load | in PreRunE → service.CobraPreRunE | in the root’s PersistentPreRunE |
| What RunE does | starts a long-running server | does one thing and returns |

A tool’s main.go is deliberately tiny — build a command, execute it, set the exit code:

go/cmd/multigres/main.go
func main() {
	root := command.GetRootCommand()
	if err := root.Execute(); err != nil {
		os.Exit(1) //nolint:forbidigo // main() is allowed to call os.Exit
	}
}

Three things to notice:

  1. Execute() drives everything. Cobra parses os.Args, walks the command tree to find the invoked subcommand, runs the persistent and local hooks, then runs that command’s RunE. Any error from any of those bubbles back here.
  2. os.Exit lives only in main. Command handlers return errors; main translates a non-nil error into exit code 1. The //nolint:forbidigo comment documents that os.Exit is banned everywhere else by the linter — that ban is what forces handlers to use RunE. (See errors for why returning errors beats exiting.)
  3. All the wiring is in a factory. main knows nothing about the command tree; GetRootCommand owns it.

GetRootCommand: the factory + dependency-carrier struct


The factory builds the whole tree. The state the commands need is hung off a struct so every handler closure and subcommand can reach it — this is the dependency-carrier idiom: one struct of fields wired once, with methods and closures used as handlers (see stdlib & idioms).

go/cmd/multigres/command/root.go
// MultigresCommand holds the configuration for multigres commands
type MultigresCommand struct {
	reg       *viperutil.Registry
	vc        *viperutil.ViperConfig
	telemetry *telemetry.Telemetry
}

func GetRootCommand() *cobra.Command {
	reg := viperutil.NewRegistry()
	mc := &MultigresCommand{
		reg:       reg,
		vc:        viperutil.NewViperConfig(reg),
		telemetry: telemetry.NewTelemetry(),
	}
	// ... build root, attach hooks, AddCommand subtrees ...
}

The root cobra.Command carries identity (Use/Short/Long) plus the lifecycle hooks:

go/cmd/multigres/command/root.go
root := &cobra.Command{
	Use:   "multigres",
	Short: "The command-line companion for managing Multigres clusters",
	PersistentPreRunE: func(cmd *cobra.Command, args []string) error {
		// Silence usage for application errors, but allow it for flag errors.
		// This runs after flag parsing, so flag errors still show usage.
		cmd.SilenceUsage = true
		viper.SetConfigName("multigres")
		if _, err := mc.vc.LoadConfig(mc.reg); err != nil {
			return err
		}
		// span is declared in the enclosing function (not shown) so that
		// PersistentPostRunE below can end it.
		span, err = mc.telemetry.InitForCommand(cmd, "multigres-cli", true)
		return err
	},
	PersistentPostRunE: func(cmd *cobra.Command, args []string) error {
		span.End()
		// ... shut telemetry down with a timeout ...
	},
}

After construction, three more steps complete the root before subcommands are attached:

go/cmd/multigres/command/root.go
mc.vc.RegisterFlags(root.PersistentFlags()) // declare + bind config-path/config-name on the root
root.SetOut(os.Stdout) // cobra defaults command output to STDERR; we want STDOUT
root.SetErr(os.Stderr)
AddClusterCommand(root, mc) // nested: multigres cluster <leaf>
AddTopoCommands(root, mc) // flat: multigres getgateways / createclustermetadata
AddPoolerCommands(root, mc) // flat: multigres getpoolerstatus

The hook lifecycle, and the SilenceUsage trick


Cobra runs hooks in this order for the actually-invoked command chain.

Cobra hook order: PersistentPreRunE → PreRunE → RunE → PostRunE → PersistentPostRunE

Two subtle but load-bearing facts:

  • SilenceUsage is set inside PersistentPreRunE, not at construction. Flag-parse errors happen before any hook runs, so a bad flag still prints the usage block. Once parsing succeeds and the hook runs, SilenceUsage = true means a later application error prints just the error, not a wall of usage text.
  • A child that defines its own PersistentPreRunE overrides the parent’s. By default Cobra runs only the nearest persistent hook in the chain and does not merge them (the global cobra.EnableTraverseRunHooks switch, added in Cobra 1.8, changes this). The tools here deliberately define persistent hooks only on the root, so every subcommand inherits config loading and telemetry init. Add a subcommand with its own PersistentPreRunE and you’d silently lose config loading.

The root calls three Add* helpers. Each one decides the tree shape by how many AddCommand hops it makes.

A nested group builds an intermediate command, registers its leaves on it, then attaches the group to the root:

go/cmd/multigres/command/cluster.go
func AddClusterCommand(root *cobra.Command, mc *MultigresCommand) {
	clusterCmd := &cobra.Command{
		Use:   "cluster",
		Short: "Manage cluster lifecycle",
	}
	cluster.AddInitCommand(clusterCmd)
	cluster.AddStartCommand(clusterCmd)
	cluster.AddStopCommand(clusterCmd)
	// ... ~9 more ...
	root.AddCommand(clusterCmd)
}

That’s two hops: root → clusterCmd → initCmd. clusterCmd itself has no RunE; it’s just a grouping node, so multigres cluster with no leaf prints help.

A flat group attaches each leaf directly to the root — one hop, no grouping node:

go/cmd/multigres/command/topo.go
func AddTopoCommands(root *cobra.Command, mc *MultigresCommand) {
	root.AddCommand(topo.AddGetCellCommand())
	root.AddCommand(topo.AddGetGatewaysCommand())
	root.AddCommand(topo.AddGetPoolersCommand())
	root.AddCommand(topo.CreateClusterMetadataCommand())
	// ... more flat leaves ...
}

Put together, the tree mirrors the directory layout under command/:

  • multigres/ (root, command/root.go)
    • cluster/ (group, command/cluster.go)
      • init (leaf, command/cluster/init.go)
      • start / stop / status (leaves, command/cluster/*.go)
    • getgateways (flat leaf, command/topo/getgateways.go)
    • getpoolers / getcell / … (flat leaves, command/topo/*.go)
    • createclustermetadata (flat leaf, command/topo/createclustermetadata.go)
    • getpoolerstatus (flat leaf, command/pooler/status.go)

The directories command/cluster, command/topo, command/pooler, and command/admin are separate Go packages, each leaf in its own file. The admin package holds the shared gRPC client reused by the topo and pooler leaves.


Cobra uses pflag, a POSIX-style superset of the stdlib flag package. There are two flag scopes:

  • cmd.Flags(): local to that command.
  • root.PersistentFlags(): inherited by every descendant.

At execution time Cobra merges inherited flags into cmd.Flags(), so a leaf can read config-path (declared on the root) via cmd.Flags().GetStringSlice("config-path") even though it never declared it.

Plain pflag calls declare flags with a name, default, and help string. Shorthands use the P suffix, and there are typed variants for every kind:

go/cmd/pgctld/command/root.go
root.PersistentFlags().StringP("pg-database", "D", pc.pgDatabase.Default(), "...")
root.PersistentFlags().IntP("timeout", "t", pc.timeout.Default(), "...")
root.PersistentFlags().StringSlice("pg-initdb-sql-files", pc.pgInitdbSQLFiles.Default(), "...")

Required flags use MarkFlagRequired:

go/cmd/multigres/command/pooler/status.go
_ = cmd.MarkFlagRequired("cell")
_ = cmd.MarkFlagRequired("service-id")

A flag’s value and its config binding are separate steps. This codebase uses viperutil.Value[T] objects (configured via viperutil.Configure(...)) and ties them to flags with viperutil.BindFlags. A leaf shows the full sequence — configure the typed values, build the command, declare the matching pflags, then bind:

go/cmd/multigres/command/cluster/init.go
func AddInitCommand(clusterCmd *cobra.Command) {
	reg := viperutil.NewRegistry()
	icmd := &initCmd{
		provisioner: viperutil.Configure(reg, "provisioner", viperutil.Options[string]{
			Default: "local", FlagName: "provisioner",
		}),
		// backupPath, backupURL, region similarly ...
	}
	cmd := &cobra.Command{Use: "init", /* ... */ RunE: icmd.runInit}
	cmd.Flags().String("provisioner", icmd.provisioner.Default(), "...")
	cmd.Flags().String("backup-path", icmd.backupPath.Default(), "...")
	// ... declare the rest first ...
	viperutil.BindFlags(cmd.Flags(), icmd.provisioner, icmd.backupPath,
		icmd.backupURL, icmd.region) // <-- bind LAST
	clusterCmd.AddCommand(cmd)
}

Once bound, icmd.provisioner.Get() returns the value with precedence flag > env > config > default. (The full Configure/Registry/precedence mechanics live in config & viperutil; here we only care about the wiring order.)
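The precedence rule itself is simple to picture. The sketch below is a stdlib-only illustration of "first layer that has the key wins", not how viperutil or viper is implemented; the `source` type, `fromMap`, and `resolve` are hypothetical names.

```go
package main

import "fmt"

// source is one configuration layer that may or may not supply a key.
type source func(key string) (string, bool)

// fromMap adapts a plain map into a layer.
func fromMap(m map[string]string) source {
	return func(key string) (string, bool) {
		v, ok := m[key]
		return v, ok
	}
}

// resolve returns the value from the first layer that has the key, checking
// layers in the order given, and falls back to def when none do.
func resolve(key, def string, layers ...source) string {
	for _, layer := range layers {
		if v, ok := layer(key); ok {
			return v
		}
	}
	return def
}

func main() {
	flags := fromMap(map[string]string{})                      // no flag passed
	env := fromMap(map[string]string{"provisioner": "docker"}) // set in the environment
	file := fromMap(map[string]string{"provisioner": "k8s"})   // set in the config file

	// Order encodes precedence: flag > env > config > default.
	fmt.Println(resolve("provisioner", "local", flags, env, file))
}
```

Here the environment layer wins because no flag was passed; adding the key to `flags` would override it, and removing it from every layer would yield the default.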


A leaf’s RunE is where it talks to the rest of the system. There are two paths.

Path 1 — gRPC via the shared admin client


The cleanest example reads a flag, builds the client, calls the RPC with cmd.Context(), and renders the response as JSON:

go/cmd/multigres/command/topo/getgateways.go
func runGetGateways(cmd *cobra.Command, args []string) error {
	cellsFlag, err := cmd.Flags().GetString("cells")
	// ... split into []string cells ...
	client, err := admin.NewClient(cmd)
	if err != nil {
		return err
	}
	defer client.Close()
	response, err := client.GetGateways(cmd.Context(),
		&multiadminpb.GetGatewaysRequest{Cells: cells})
	if err != nil {
		return fmt.Errorf("failed to get gateways: %w", err)
	}
	jsonData, err := json.MarshalIndent(response, "", " ")
	// ...
	cmd.Print(string(jsonData))
	return nil
}

The bridge from CLI to a gRPC stub is the admin package’s Conn. It embeds the generated client interface (struct embedding; see interfaces & composition) so client.GetGateways(...) resolves directly to the embedded MultiAdminServiceClient:

go/cmd/multigres/command/admin/client.go
type Conn struct {
	multiadminpb.MultiAdminServiceClient // embedded: promotes all RPC methods onto Conn
	conn *grpc.ClientConn
}

func NewClient(cmd *cobra.Command) (*Conn, error) {
	addr, err := GetServerAddress(cmd)
	if err != nil {
		return nil, err
	}
	conn, err := grpccommon.NewClient(addr,
		grpccommon.WithDialOptions(grpc.WithTransportCredentials(insecure.NewCredentials())))
	// ...
	return &Conn{
		MultiAdminServiceClient: multiadminpb.NewMultiAdminServiceClient(conn),
		conn:                    conn,
	}, nil
}
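The embedding trick works with any interface, which makes it easy to try without gRPC. In this stdlib-only sketch, `StatusClient`, `fakeClient`, and this `Conn` are hypothetical stand-ins for the generated client interface and the admin wrapper.

```go
package main

import "fmt"

// StatusClient stands in for a generated gRPC client interface.
type StatusClient interface {
	GetStatus(name string) (string, error)
}

// fakeClient is a trivial implementation used in place of a real stub.
type fakeClient struct{}

func (fakeClient) GetStatus(name string) (string, error) {
	return name + ": ok", nil
}

// Conn embeds the interface, so GetStatus is promoted onto Conn, and adds
// its own resource management alongside.
type Conn struct {
	StatusClient
	closed bool
}

// Close releases whatever the wrapper owns; here it only flips a flag.
func (c *Conn) Close() error {
	c.closed = true
	return nil
}

// NewConn wires the embedded client once, like a constructor around a dial.
func NewConn() *Conn {
	return &Conn{StatusClient: fakeClient{}}
}

func main() {
	c := NewConn()
	defer c.Close()

	// No forwarding method was written: the call resolves to the embedded client.
	status, err := c.GetStatus("pooler-1")
	fmt.Println(status, err)
}
```

The wrapper gets every method of the embedded interface for free and only has to write the methods it adds, such as `Close`.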

GetServerAddress resolves the address with the same flag-or-config precedence theme: the --admin-server flag wins, otherwise it reads multigres.yaml from a --config-path directory and computes localhost:<grpcPort> from the local provisioner config. The gRPC dial mechanics are covered in gRPC & protobuf.

A second client leaf adds two idioms worth copying — a per-RPC timeout layered on cmd.Context() (see context) and protojson to render enums as readable names:

go/cmd/multigres/command/pooler/status.go
ctx, cancel := context.WithTimeout(cmd.Context(), 10*time.Second)
defer cancel()
response, err := client.GetPoolerStatus(ctx, &multiadminpb.GetPoolerStatusRequest{
	PoolerId: &clustermetadatapb.ID{Cell: cell, Name: serviceID},
})
// ...
marshaler := protojson.MarshalOptions{Indent: " ", UseProtoNames: true}

Not every leaf goes through the admin service. The second path is direct topology access: the cluster-bootstrap leaf opens the topology store itself and writes metadata:

go/cmd/multigres/command/topo/createclustermetadata.go
ts, err := topoclient.OpenServer("etcd", globalTopoRoot, []string{globalTopoAddress},
	topoclient.NewDefaultTopoConfig())
if err != nil {
	return fmt.Errorf("failed to connect to topology server: %w", err)
}
defer ts.Close()
// ... createCell / provisionDatabase ...

This is the “operator bootstrap” path — there’s no running gateway or admin service yet, so it talks to etcd directly. This leaf also reads its flags with raw cmd.Flags().GetString(...) and MarkFlagRequired(...) — no viper binding at all.


A daemon has no command tree, no persistent flags, no GetRootCommand — just a factory that returns the command and the service object, with flags and config delegated to the service:

go/cmd/multigateway/main.go
func CreateMultiGatewayCommand() (*cobra.Command, *multigateway.MultiGateway) {
	mg := multigateway.NewMultiGateway()
	cmd := &cobra.Command{
		Use:  constants.ServiceMultigateway, // "multigateway"; name stays in sync via constants
		Args: cobra.NoArgs,
		PreRunE: func(cmd *cobra.Command, args []string) error {
			return mg.CobraPreRunE(cmd) // load config + register reload watchers
		},
		RunE: func(cmd *cobra.Command, args []string) error {
			return run(cmd.Context(), mg) // start the daemon
		},
	}
	mg.RegisterFlags(cmd.Flags()) // service owns its flags
	return cmd, mg
}

func run(ctx context.Context, mg *multigateway.MultiGateway) error {
	if err := mg.Init(ctx); err != nil {
		return err
	}
	return mg.RunDefault()
}

The other daemons follow the same pattern (CreateMultiPoolerCommand, CreateMultiOrchCommand, and so on). The Use: value comes from a shared constants package, so the binary name and the cobra command name never drift apart.

Where do the flags come from? Not from a persistent root — from the service’s own RegisterFlags, which declares the pflags, binds them, then delegates to sub-components:

go/services/multigateway/init.go
func (mg *MultiGateway) RegisterFlags(fs *pflag.FlagSet) {
	fs.String("cell", mg.cell.Default(), "cell to use")
	fs.Int("pg-port", mg.pgPort.Default(), "PostgreSQL protocol listen port")
	// ... many more: Duration / Bool / Uint64 ...
	viperutil.BindFlags(fs, mg.cell, mg.serviceID, mg.pgPort /* ... */)
	mg.senv.RegisterFlags(fs) // servenv adds its flags too
	mg.grpcServer.RegisterFlags(fs)
	mg.topoConfig.RegisterFlags(fs)
}

And config loading? Not a PersistentPreRunE — it’s CobraPreRunE, which bottoms out in a shared servenv implementation:

go/common/servenv/servenv.go
func (sv *ServEnv) CobraPreRunE(cmd *cobra.Command) error {
	ch := make(chan struct{})
	viperutil.NotifyConfigReload(sv.reg, ch) // hot-reload watcher
	go func() {
		for range ch { /* log new settings */
		}
	}()
	watchCancel, err := sv.vc.LoadConfig(sv.reg)
	if err != nil {
		return fmt.Errorf("%s: failed to read in config: %w", cmd.Name(), err)
	}
	sv.OnTerm(watchCancel)
	return nil
}

So the service-daemon flow is Execute → PreRunE (CobraPreRunE: load config + watchers) → RunE (run → Init → RunDefault). What Init/RunDefault/servenv actually do once RunE fires is the subject of service anatomy.

pgctld is a subcommand tool (it has a GetRootCommand), but one of its leaves — server — is itself a daemon. The root declares all the persistent flags and binds them in one big BindFlags call, then wires the subcommands:

go/cmd/pgctld/command/root.go
AddServerCommand(root, pc) // the daemon leaf (parallels the service binaries)
AddInitCommand(root, pc)
AddStartCommand(root, pc)
// ... stop / restart / status / version / reload

Most leaves use a sub-struct plus a createCommand helper, and split the work so that PreRunE validates and RunE does the work:

go/cmd/pgctld/command/start.go
func (s *PgCtlStartCmd) createCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use: "start",
		PreRunE: func(cmd *cobra.Command, args []string) error {
			return s.pgCtlCmd.validateInitialized(cmd, args)
		},
		RunE: s.runStart,
	}
	return cmd
}

pgctld also shows a flag alias via SetGlobalNormalizationFunc, rewriting the deprecated --init-db-sql-file to --pg-initdb-sql-files.


Each service cmd directory has a tiny file that does nothing but import a package for its side effects:

go/cmd/multigateway/plugin_topo.go
package main

import (
	_ "github.com/multigres/multigres/go/common/plugins/topo"
)

The blank _ import runs the package’s init() to register a topo backend, without creating a direct code dependency — the import-for-side-effects idiom. This is how topoclient.OpenServer("etcd", ...) later finds the etcd backend without cmd/ importing it explicitly.
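The mechanism behind this is a registry that backends fill in from their own `init()`. The sketch below compresses the idea into one file for readability; in a real layout the `init()` would live in the plugin package pulled in by the blank import. `Factory`, `Register`, `Open`, and the `memory` backend are hypothetical names and are not the topoclient API.

```go
package main

import "fmt"

// Factory builds a backend handle for an address; a string stands in for a
// real connection type.
type Factory func(addr string) string

// backends is the registry that lookup-by-name consults.
var backends = map[string]Factory{}

// Register is what a plugin package calls from its init().
func Register(name string, f Factory) {
	backends[name] = f
}

// Open looks a backend up by name, failing if nothing registered it.
func Open(name, addr string) (string, error) {
	f, ok := backends[name]
	if !ok {
		return "", fmt.Errorf("no backend registered for %q", name)
	}
	return f(addr), nil
}

// In a real project this init would sit in the plugin package, and a blank
// import of that package is what causes it to run.
func init() {
	Register("memory", func(addr string) string { return "memory://" + addr })
}

func main() {
	fmt.Println(Open("memory", "local"))
	fmt.Println(Open("etcd", "localhost:2379")) // never registered in this sketch
}
```

Dropping the blank import in the real layout has the same effect as the second call here: the lookup fails at runtime because nothing registered that name.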


Why is SilenceUsage = true set inside PersistentPreRunE instead of when the command is constructed?
Flag-parse errors happen before any hook runs, so leaving usage visible at construction time means a bad flag still prints the helpful usage block. Setting SilenceUsage only after parsing succeeds (inside the hook) means application errors print just the error, while flag errors still show usage. Setting it at construction would hide usage even on bad flags.

You add a new subcommand and give it its own PersistentPreRunE. What breaks?
Config loading and telemetry init silently stop happening for that command. By default Cobra does not merge persistent hooks: a child’s PersistentPreRunE overrides the parent’s. The tools rely on only the root defining persistent hooks so every leaf inherits config loading. Define PreRunE on the leaf instead, or call the root’s logic explicitly.

Why does multigres getgateways work but multigres init fail?
getgateways is wired flat onto the root by AddTopoCommands (one AddCommand hop). init is wired nested under the cluster group by AddClusterCommand, so its real path is multigres cluster init. There is no top-level init.

Why must viperutil.BindFlags be called after the cmd.Flags().String(…) declarations?
BindFlags panics if a Value maps to a flag that isn’t yet defined on the flag set. The flags must exist first, so every command declares its pflags, then calls BindFlags last.

Continue to config & viperutil for the full mechanics behind viperutil.Configure, Value[T], BindFlags, the Registry, and the flag > env > config > default precedence this page only sketched at the command layer.