The 1.0 candidate: ten releases of hardening
After the desktop app worked end to end, we ran a full code audit against the trusted base and then spent ten releases closing its findings - every release gated green (typecheck, the full test suite, a dependency-direction lint, and the CLI bundle) before it was committed. The final suite is 420 tests across 83 files. Nothing in this arc lowers the deny-by-default, fail-closed floor.
Hardening the core
The first four releases took the audit's findings to closure. Boundary containment now normalizes case and Unicode form before comparing, so a case-varied path or a denied subtree still matches on Windows and macOS. The policy enforcement points re-check secret paths at execution time instead of trusting the decision alone. The sidecar builds its pending records from a strict allowlist, caps request bodies, checks the Host header, and writes its token file with locked-down permissions.
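To make the containment point concrete, here is a minimal sketch of a normalized, segment-wise containment check. The names `canonicalSegments` and `isWithin` are hypothetical and not the project's API, and `toLowerCase()` stands in for whatever case folding the real implementation uses.

```typescript
// Hypothetical helpers for illustration; not the project's actual API.

/** Split a path into comparable segments: unify separators, apply NFC, fold case, resolve "." and "..". */
function canonicalSegments(p: string): string[] {
  const out: string[] = [];
  const parts = p.replace(/\\/g, "/").normalize("NFC").toLowerCase().split("/");
  for (const seg of parts) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      out.pop(); // lexical resolution only; symlinks are out of scope for this sketch
      continue;
    }
    out.push(seg);
  }
  return out;
}

/** True when `candidate` is `root` itself or lies beneath it. */
function isWithin(root: string, candidate: string): boolean {
  const r = canonicalSegments(root);
  const c = canonicalSegments(candidate);
  if (c.length < r.length) return false;
  // Compare whole segments, so "/data/secrets-old" never matches "/data/secrets".
  return r.every((seg, i) => seg === c[i]);
}
```

Comparing whole segments rather than raw string prefixes is the design choice that matters here: a plain `startsWith` would treat a sibling directory with a longer name as contained.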
Then the audit log itself. A torn final line - a partial write before a crash - is now healed on recovery and flagged, rather than throwing on boot; mid-file corruption is treated as tamper. A persisted head anchor catches tail truncation that a hash chain alone can't see, and the log rotates into chained segments so it verifies end to end across a reboot. Outbound calls to internal, loopback, or cloud-metadata hosts are denied by default, and the catastrophic-shell denylist got a bypass corpus.
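The recovery rules for the audit log can be sketched as follows. This is an illustration under stated assumptions, not the shipped format: the JSON-lines layout, the `Entry` and `HeadAnchor` shapes, and the rule that the anchor is persisted after each successful append are all invented for the example. It uses `createHash` from Node's `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shapes; the real on-disk format is not described in this post.
interface Entry { prev: string; payload: string; hash: string; }
interface HeadAnchor { count: number; hash: string; }
type Verdict = "ok" | "healed-torn-tail" | "tampered" | "truncated";

const GENESIS = "0".repeat(64);
const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

/** Append one entry and return the anchor that would be persisted alongside the log. */
function append(log: string[], payload: string): HeadAnchor {
  const last = log.length === 0 ? null : parseEntry(log[log.length - 1]);
  const prev = last === null ? GENESIS : last.hash;
  const hash = sha256(prev + "\n" + payload);
  log.push(JSON.stringify({ prev, payload, hash }));
  return { count: log.length, hash };
}

function parseEntry(line: string): Entry | null {
  try {
    return JSON.parse(line) as Entry;
  } catch {
    return null;
  }
}

function verify(lines: string[], anchor: HeadAnchor): Verdict {
  let prev = GENESIS;
  let count = 0;
  let torn = false;
  let anchorSeen = anchor.count === 0 && anchor.hash === GENESIS;
  for (let i = 0; i < lines.length; i++) {
    const e = parseEntry(lines[i]);
    if (e === null) {
      // Only the final line may be a partial write; anything earlier is corruption.
      if (i === lines.length - 1) { torn = true; break; }
      return "tampered";
    }
    if (e.prev !== prev || e.hash !== sha256(e.prev + "\n" + e.payload)) return "tampered";
    prev = e.hash;
    count++;
    if (count === anchor.count && e.hash === anchor.hash) anchorSeen = true;
  }
  // A shortened log still chains correctly, so only the anchor can reveal it.
  if (count < anchor.count) return "truncated";
  if (!anchorSeen) return "tampered";
  return torn ? "healed-torn-tail" : "ok";
}
```

The anchor check is what the hash chain cannot do on its own: dropping the last N entries leaves a perfectly valid, shorter chain.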
Becoming embeddable
The bigger shift is that governance is now a surface you can drop into someone else's stack. @starfish/sdk runs the engine in-process; starfish serve is a loopback sidecar that gates tool calls for any language with server-assigned identity, so proposer never equals approver even over HTTP; and one sidecar can govern multiple roots with hard per-root isolation and per-root operator sets. A live SSE stream feeds a themeable dashboard whose payloads are redacted and scoped, and the engine never enters the browser bundle.
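One way to picture "proposer never equals approver" with server-assigned identity is the sketch below. The `Gate` class, its method names, and the token-to-identity map are hypothetical; the point is only that identity is derived from the caller's token on the server side, never read from the request, and that operator sets are looked up per root.

```typescript
// Hypothetical in-memory model; not the sidecar's actual API or wire protocol.
type Identity = string;

interface Pending { root: string; tool: string; proposer: Identity; }

class Gate {
  private tokens: Map<string, Identity>;
  private operators: Map<string, Set<Identity>>; // root -> who may approve there
  private pending = new Map<number, Pending>();
  private nextId = 1;

  constructor(tokens: Map<string, Identity>, operators: Map<string, Set<Identity>>) {
    this.tokens = tokens;
    this.operators = operators;
  }

  /** Identity comes from the token; callers never state who they are. */
  private whoIs(token: string): Identity | null {
    return this.tokens.get(token) ?? null;
  }

  /** Record a tool call awaiting approval; returns null for an unknown token. */
  propose(token: string, root: string, tool: string): number | null {
    const proposer = this.whoIs(token);
    if (proposer === null) return null;
    const id = this.nextId++;
    this.pending.set(id, { root, tool, proposer });
    return id;
  }

  /** Approve a pending call. Every failure path returns false (fail closed). */
  approve(token: string, id: number): boolean {
    const approver = this.whoIs(token);
    const p = this.pending.get(id);
    if (approver === null || p === undefined) return false;
    if (approver === p.proposer) return false; // no self-approval
    const ops = this.operators.get(p.root);
    if (ops === undefined || !ops.has(approver)) return false; // per-root isolation
    this.pending.delete(id);
    return true;
  }
}
```

Because the approver is resolved from the token, a client cannot approve its own proposal by claiming a different name in the request body.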
Operators also got a way to reason about policy: starfish policy explain gives the human-readable first-match reason for any request, and simulate is a dry run that flags any change that would widen access - while always stating that no policy edit can weaken the hard floors.
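The idea behind explain and simulate can be sketched in a few lines. Everything here is an assumption made for illustration: the `Rule` shape, prefix matching on paths, and the `.env` hard floor are invented, and the real policy language is not shown in this post.

```typescript
// Hypothetical policy model for illustration only.
type Effect = "allow" | "deny";
interface Rule { id: string; effect: Effect; tool: string; pathPrefix: string; }
interface Req { tool: string; path: string; }
interface Decision { effect: Effect; reason: string; }

// Stand-in for a hard floor: checked before any rule, so no policy edit can reach it.
const hitsHardFloor = (req: Req): boolean => req.path.endsWith("/.env");

/** First matching rule wins; no match means deny by default. */
function explain(rules: Rule[], req: Req): Decision {
  if (hitsHardFloor(req)) return { effect: "deny", reason: "hard floor: secret path" };
  for (const r of rules) {
    // Simplified prefix match; a real check would compare normalized path segments.
    if ((r.tool === "*" || r.tool === req.tool) && req.path.startsWith(r.pathPrefix)) {
      return { effect: r.effect, reason: `first match: rule ${r.id}` };
    }
  }
  return { effect: "deny", reason: "no rule matched: deny by default" };
}

/** Dry run: return the sample requests that flip from deny to allow under the proposed rules. */
function simulate(current: Rule[], proposed: Rule[], samples: Req[]): Req[] {
  return samples.filter(
    (s) => explain(current, s).effect === "deny" && explain(proposed, s).effect === "allow",
  );
}
```

A widening check like this is only as good as its sample requests, so a real dry run would need a representative set of requests or a symbolic comparison of the rule sets.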
Freezing 1.0
Release engineering moved to CI: a secret scan and the full gate run on every push, and a tagged release publishes with npm provenance and an SBOM. The public API and the wire protocol are frozen behind conformance tests that act as the semver gate, documented alongside a control mapping to SOC 2, ISO 27001, and the EU AI Act. What remains for 1.0 is an independent security review and counsel sign-off on the commercial and trademark terms - deliberately the parts a machine shouldn't rubber-stamp.