Making agents useful without handing them the keys

The honest objection to giving coding agents real authority is not "they will write bad code." It is "they will do something irreversible at 2am while I am not watching, because the friction between intent and effect is too low." That is a solvable problem, but only if the answer is more specific than "be careful."

The working stance I have landed on is read-broad, write-narrow. Let the agent inspect anything in a repo, plan deeply, draft aggressively. Gate the small set of operations that mutate shared state, publish externally, expose services, or are expensive to undo. The leverage stays high; the blast radius stays bounded.

What "the keys" actually means

The phrase sounds dramatic but the list is short. In day-to-day work the things worth gating are:

  • destructive shell on the local filesystem (rm -rf on paths above the project, dd, mkfs, fork bombs);
  • git operations that overwrite history that has already been shared (push --force without --force-with-lease, reset --hard on a remote-tracked branch, force-deleting branches);
  • access to secrets — both reading them (cat .env, printenv) and exfiltrating them (curl -d @.env, scp .env remote:);
  • anything that talks to production, paid APIs, or other people (kubectl apply on prod, money transfers, posting in shared chats);
  • publishing — package releases, public deploys, anything that ends up on archive.org if the rollback is delayed.

Nothing on that list requires intelligence to gate. They are pattern-matchable, and the cost of a hook that exits non-zero with an explanatory message is approximately nothing.

Where to give the agent more rope

The complement is just as important. The agent should not need to ask permission to:

  • read any file in the working repo, including ones with sensitive-sounding names that are actually templates (.env.example, secrets.example.yml);
  • run tests, formatters, linters, type checkers — even ones that touch a local database or spin up containers;
  • create branches, commit, stash, write to scratch directories, generate files;
  • edit any source file, including config and infrastructure-as-code, as long as the change is reversible by a normal git operation.

If the agent needs to ask before doing those things, the friction will eat all of the leverage and you will end up doing the work yourself, slower, while still typing "yes."

The two hooks that actually carry the weight

A pre-tool gate on shell commands blocks the destructive pattern set above before the command runs. The pattern matching is deliberately conservative — it errs toward false positives that the human resolves, not false negatives that the agent silently executes. The hook ships its reason to the agent so the agent can correct itself instead of looping on the same blocked command.

A pre-tool gate on file access blocks reads, writes, and shell-level exfiltration of well-known secret files. Templates and examples are allowed through explicitly. The default is "deny unknown sensitive-looking paths, log the attempt."

Both are short shell scripts. Both are version-controlled with the rest of the project setup. Neither requires the agent's cooperation to work, which is the only property that matters.

Stop hooks and the "done" claim

The more interesting hook runs at the end of a task, not the beginning. A Stop hook can refuse to let the agent declare success while the test suite is red, while the type checker disagrees, or while a build is broken. This converts "I'm done" from an assertion into an inspection. The agent can still talk about what it tried; it just cannot close the loop with a green checkmark while reality is yellow.

This single rule changes the experience of working with an agent more than any prompt-engineering trick. You stop reviewing the agent's confidence and start reviewing its output, because the cheap form of "done" is no longer available to it.

What stays human-owned

The agent picks neither the project, nor the framing, nor the public posture. Positioning decisions, what is worth showing, what should remain private, the privacy stance, the legal posture, and every public deployment go through a human. Not because the agent cannot draft these — it can — but because the costs of these decisions are not technical and the agent has no skin in any of them.

The cost of skipping all of this

The boundary work is not theoretical. The failure modes you are trying to avoid are not exotic — they are the same lost work, leaked secrets, broken builds, and silent regressions that every engineering team has been managing forever. Agents do not invent new categories of mistake; they just operate at typing speed. The interventions are the same interventions that work on humans (least privilege, audit trails, confirmation before destructive operations) and the cost of installing them once is small relative to the cost of a single bad afternoon.

Read broadly. Write narrowly. Gate the small set of operations whose blast radius is larger than the local repo. Refuse to let "done" mean "the agent stopped typing." That is most of the discipline. The rest is taste.

The same boundary logic appears in public-signal-private-surfaces and what SCADA practice teaches about AI systems.