Every agent action passes through one decision point — authorized on the way in, contained on the way out, written to a tamper-evident log. No task, no tool.
An AI agent with tools can read, write, delete, spend, and reach the network on your behalf — usually with nothing between its intent and your system. That's fine until it isn't.
"My agent deleted my whole drive." One bad path, no blast-radius check, no undo.
A web page or document says "ignore your instructions" — and the agent obeys.
Secrets and context leave through a tool call no one reviewed.
You install a skill pack. You have no idea what it can actually do.
Starfish isn't a wrapper that asks nicely. It's an isolated Policy Decision Point (PDP) that brackets every transport (file I/O, shell, network, MCP) on the way in and the way out. Nothing reaches your system except through it, and it defaults to deny.
The agent never receives a raw handle. File I/O, shell, network, and MCP calls all sit behind a proxy that hands each proposed call to the PDP first; no code path reaches a transport without passing through it.
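As an illustration of the proxy pattern described above, here is a minimal Python sketch. The names `DecisionPoint`, `GovernedTransport`, and `PolicyDenied` are hypothetical and are not Starfish's API. The sketch only shows the deny-by-default check sitting in front of the real handlers. It does not model process isolation, which Python attribute privacy alone cannot provide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


class PolicyDenied(Exception):
    """Raised when the decision point refuses a proposed call."""


@dataclass
class DecisionPoint:
    # Hypothetical policy store: an explicit allow-list of
    # (agent, transport, action). Anything absent is denied.
    allowed: Set[Tuple[str, str, str]] = field(default_factory=set)

    def decide(self, agent: str, transport: str, action: str) -> bool:
        return (agent, transport, action) in self.allowed


class GovernedTransport:
    """Proxy that owns the real handlers; the agent only gets this object."""

    def __init__(self, pdp: DecisionPoint, agent: str, transport: str,
                 handlers: Dict[str, Callable[..., object]]) -> None:
        self._pdp = pdp
        self._agent = agent
        self._transport = transport
        self._handlers = handlers

    def call(self, action: str, *args: object) -> object:
        handler = self._handlers.get(action)
        # Ask the decision point first; unknown or unlisted actions are denied.
        allowed = self._pdp.decide(self._agent, self._transport, action)
        if handler is None or not allowed:
            raise PolicyDenied(f"{self._agent}: {self._transport}.{action} denied")
        return handler(*args)
```

With an allow-list containing only `("worker", "file", "read")`, a `read` call goes through to its handler while a `delete` call raises `PolicyDenied`, even though a delete handler is registered.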
Each entry stores a hash of (prev_hash + payload). Editing or dropping any record breaks every link after it, and --verify recomputes the chain to find the break. Roots can be anchored to an external notary.
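The chaining rule above can be sketched in a few lines of Python. This is an assumption-laden illustration, not Starfish's on-disk format: SHA-256, canonical JSON payloads, the all-zero genesis value, and the function names `append` and `verify` are all choices made for the example.

```python
import hashlib
import json

# Assumed starting value for the first entry's prev_hash.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Add an entry whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({
        "prev_hash": prev,
        "payload": payload,
        "hash": entry_hash(prev, payload),
    })


def verify(log: list) -> int:
    """Recompute the chain; return the index of the first bad entry, or -1."""
    prev = GENESIS
    for index, entry in enumerate(log):
        recomputed = entry_hash(prev, entry["payload"])
        if entry["prev_hash"] != prev or entry["hash"] != recomputed:
            return index
        prev = entry["hash"]
    return -1
```

Note that a chain on its own cannot reveal that its most recent entries were truncated, since no later link exists to break. Comparing against a root stored outside the log covers that case.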
The PDP and its signed manifest load before any transport is wired. If the manifest doesn't verify, Starfish boots into safe mode and denies everything; there is no degraded "open" state.
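A fail-closed boot of this kind might look like the following sketch. It makes several assumptions: an HMAC-SHA256 tag stands in for the operator's signature (a real deployment would more likely use an asymmetric signature), the manifest is JSON with a hypothetical `allow` list of `[transport, action]` pairs, and `boot`, `SafeModePDP`, and `ManifestPDP` are illustrative names.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest_bytes: bytes, key: bytes) -> str:
    # Stand-in for an operator signature: HMAC-SHA256 over the raw bytes.
    return hmac.new(key, manifest_bytes, hashlib.sha256).hexdigest()


class SafeModePDP:
    """Denies every request; used whenever verification fails."""

    def decide(self, *_args: str) -> bool:
        return False


class ManifestPDP:
    """Allows only the (transport, action) pairs listed in the manifest."""

    def __init__(self, rules) -> None:
        self._rules = {tuple(rule) for rule in rules}

    def decide(self, transport: str, action: str) -> bool:
        return (transport, action) in self._rules


def boot(manifest_bytes: bytes, signature: str, key: bytes):
    """Return a decision point before any transport is wired up."""
    expected = sign_manifest(manifest_bytes, key)
    # Constant-time comparison; any mismatch falls through to safe mode.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return SafeModePDP()
    try:
        manifest = json.loads(manifest_bytes)
        return ManifestPDP(manifest["allow"])
    except (ValueError, KeyError, TypeError):
        # A verified but malformed manifest also fails closed.
        return SafeModePDP()
```

Every failure path returns the deny-all object, so there is no branch in which an unverified manifest yields a permissive decision point.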
Eight controls that don't depend on the agent's cooperation.
A single choke point. If no policy explicitly allows it, it doesn't happen.
Append-only, tamper-evident record of every decision — allow and deny.
Per-agent visibility and write scopes; escape attempts, including those made through symlinks, are denied and flagged.
Soft-warn, hard-pause budgets so a runaway agent can't run up a bill.
Every skill/tool is risk-rated before it enters the registry. The only door in.
No unbacked word: a claim such as "tests pass" is blocked unless the action that backs it is on the record.
Operator-signed manifest; tampered config boots into safe mode and denies all.
Batch audit roots to a notary/ledger for institutions that need it. Off by default.
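Two of the controls listed above, path scopes and budgets, lend themselves to a short sketch. The code below is an illustration under assumptions, not Starfish's implementation: `within_scope` and `Budget` are hypothetical names, scope is modeled as a single root directory, and cost is an abstract number.

```python
from pathlib import Path


def within_scope(root: Path, candidate: str) -> bool:
    """True only if the candidate resolves to a location inside root.

    resolve() follows symlinks and collapses "..", so a link that points
    outside the root is judged by its real target, not by its name.
    """
    root_real = root.resolve()
    target = (root_real / candidate).resolve()
    return target == root_real or root_real in target.parents


class Budget:
    """Soft limit warns; hard limit refuses the charge outright."""

    def __init__(self, soft_limit: float, hard_limit: float) -> None:
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.spent = 0.0

    def charge(self, cost: float) -> str:
        if self.spent + cost > self.hard_limit:
            # Hard pause: the spend is not recorded and must not proceed.
            return "pause"
        self.spent += cost
        return "warn" if self.spent > self.soft_limit else "ok"
```

Resolving the path before comparing is the design choice that matters here: a plain string-prefix check would accept `scope/link/secret` even when `link` points elsewhere on disk.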
Governance contains blast radius — it is not a force field. Knowing exactly where the boundary sits is the point. Pair Starfish with the controls below it in the stack.
Starfish governs what the agent does — its tool calls — not how the model reasons. A jailbroken model still can't act outside policy, but the prompt itself isn't "sanitized."
It runs in your environment. For genuinely hostile code, keep it inside a container or VM — Starfish decides intent, the OS enforces the floor.
Injection still happens. What changes: the injected action hits a deny-by-default boundary, external data is tainted on egress, and the attempt is on the record — contained, not invisible.
The one legitimately ungoverned action is a human launching the system. Starfish constrains agents, not the person who holds the keys — that's a deliberate trust boundary, not a gap.
Starfish ships a small governed crew. Each has a narrow mandate — and no one is above the rules, including the captain.
Delegates and sequences. Holds no special tool powers — privilege is a role, not a bypass.
Breaks missions into tasks and drafts plans for approval.
The only door into the registry. Vets and risk-rates every new capability — and gatekeeps secrets.
Read-only sweeps, reconciled against deterministic counters. Reports; never blocks.
Evidence → claims → governed knowledge, provenance first.
The only agent allowed cleanup — soft, file-level, reversible. Hard rules block system files.
Does the work inside a write-scoped worktree; high-risk acts go to you for go/no-go.
Final authority. The one legitimate ungoverned action is a human launching the system.
Claude, OpenAI, Gemini, OpenRouter, or a local model — the rules don't change. Your API key is sealed in the OS keychain and is never placed in a request object, a log, an audit entry, or a skill's reach.
Free for personal and commercial use under Apache-2.0. Install the CLI, initialize a governed root, then bring any existing skill pack under governance.
A fail-closed governance core, a vetting intake, a hash-chained audit log, and an optional desktop mission-control app — GCS Starfish.