yothere splits cleanly into two halves that never blur: an interface that decides when you should look, and a brain that does the actual work. This page walks the architecture from that split down to the files on disk and the wire protocol between them.
The mental model
yothere is the interface and the attention router. It spawns work, advances it, ranks the fleet, and decides when to interrupt you. It never reads your calendar, writes a draft, or sends a message itself.
A brain does all of that. The brain is harness-agnostic: local Claude Code, a WebSocket daemon like ubob, or a remote stack at a wss:// endpoint. yothere knows nothing about a brain’s internals; it only speaks the Brain Protocol to it.
The payoff of this split: a fleet of background agents is only useful if you know which one needs you now. yothere is an attention router for the human, not a work router for the agents. Its defaults follow from that: pull-on-glance, one rate-limited nudge, never a firehose, and never an interruption of your focus thread.
The core loop
You hail a task. yothere spawns a thread. The runner advances it (the brain works). If the brain blocks on a decision, yothere pings you; your reply resumes the thread. Otherwise it finishes. Repeat until done.
Text version of the diagram: Hail, then Spawn thread, then Runner advances while the brain works. If the thread is Blocked, yothere Pings you, You reply, and the thread Resumes, feeding back into the runner. When nothing blocks, the thread reaches Done.
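The loop above can be read as a small state machine. The following is a minimal sketch, not yothere's implementation: the event names (block, reply, finish) are hypothetical, and only three of the thread states are modeled.

```python
from enum import Enum


class State(Enum):
    WORKING = "working"
    BLOCKED = "blocked"
    DONE = "done"


# Allowed transitions in the loop described above.
# The event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (State.WORKING, "block"): State.BLOCKED,   # brain needs a decision, you get pinged
    (State.BLOCKED, "reply"): State.WORKING,   # your reply resumes the thread
    (State.WORKING, "finish"): State.DONE,     # nothing blocked, the thread completes
}


def step(state, event):
    """Apply one event, rejecting transitions the loop does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.value} on {event!r}") from None


def run(events):
    """Replay events for a freshly hailed thread, which starts out working."""
    state = State.WORKING
    for event in events:
        state = step(state, event)
    return state
```

For example, run(["block", "reply", "finish"]) walks one thread through a single ping and reply before it reaches done.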
Threads are files
Everything about a thread lives in a directory on disk. Its live state is a status.json written atomically (an atomic rename), so a reader never sees a torn file. Its session id lets a brain resume its own context turn after turn. There is no database and no message bus in the hot path; the filesystem is the source of truth.
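A common way to get the atomic-rename guarantee is to write a temporary file in the same directory and then rename it over status.json. The sketch below assumes that pattern; the function names and the fields inside the status dict are hypothetical, not yothere's actual schema.

```python
import json
import os
import tempfile


def write_status(thread_dir, status):
    """Write status.json so that readers see the old file or the new one, never a mix."""
    os.makedirs(thread_dir, exist_ok=True)
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=thread_dir, prefix=".status-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(status, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, os.path.join(thread_dir, "status.json"))  # atomic swap
    except BaseException:
        # On any failure, remove the temp file if it is still there.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def read_status(thread_dir):
    """Read the current status; a concurrent writer cannot leave this file half-written."""
    with open(os.path.join(thread_dir, "status.json"), encoding="utf-8") as f:
        return json.load(f)
```

Because os.replace swaps the directory entry in one step, a reader that opens status.json mid-write still gets the previous complete version.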
This matters because yothere runs the same work on two execution substrates:
- In-session. A governed agent tool running inside a host application. It can spawn threads, read the board, and reply, but it draws a hard line at outward actions and blocks for your approval.
- Headless. The runner loop running as a background service, advancing threads on its own and enforcing the cost caps.
The contract between the two is the files, not the process. Both write through the same thread store (thread status) and the same cost ledger. So the in-session tool and the headless runner can operate on one fleet without a shared process, a shared socket, or a lock dance beyond the per-thread worker lock. Start a thread in-session at your desk and the headless runner picks it up on its next tick.
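The source does not say how the per-thread worker lock is implemented. One file-based way to get that effect, shown here purely as an illustration, is an exclusive-create lock file inside the thread directory; the file name worker.lock and both function names are hypothetical.

```python
import os


def try_acquire(thread_dir, name="worker.lock"):
    """Try to claim a thread; return True only for the single caller that wins."""
    path = os.path.join(thread_dir, name)
    try:
        # O_EXCL makes creation fail if the file already exists, so only one process succeeds.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for debugging
    return True


def release(thread_dir, name="worker.lock"):
    """Drop the claim; releasing an unheld lock is a no-op."""
    try:
        os.unlink(os.path.join(thread_dir, name))
    except FileNotFoundError:
        pass
```

With a lock like this, an in-session tool and a headless runner need no shared process to avoid advancing the same thread twice. A production version would also need a policy for locks left behind by crashed workers, which this sketch omits.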
Brains and the Brain Protocol
A brain implements Brain Protocol v1. The transport is a WebSocket carrying JSON-RPC 2.0 text frames (one JSON object per frame). yothere opens a fresh connection per turn and closes it when the turn ends, which makes barge-in structurally clean: closing the socket aborts the turn server-side.
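To make the framing concrete, here is a minimal sketch of building and classifying JSON-RPC 2.0 text frames. The envelope fields (jsonrpc, id, method, params, result, error) come from the JSON-RPC 2.0 specification; the helper names are hypothetical and the WebSocket transport itself is left out.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique per connection


def request(method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single text frame."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


def classify(frame):
    """Tell requests, notifications, and responses apart by their envelope."""
    msg = json.loads(frame)
    if "method" in msg:
        # A notification is a request without an id; it expects no reply.
        return "request" if "id" in msg else "notification"
    if "result" in msg or "error" in msg:
        return "response"
    raise ValueError("not a JSON-RPC 2.0 message")
```

Since each frame carries exactly one JSON object, a client can call json.loads on every incoming text frame without any extra delimiting.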
The methods yothere calls on a brain:
- hello. A handshake. yothere learns the brain’s name, protocol version, and capability flags, and gates on the version.
- streamSubscribe. Opens a streaming channel for the next prompt and returns a subscription id (consumed by exactly one turn).
- turn (also accepted as prompt). Runs one turn. The key parameter is conversation_id, the join key for context continuity: the brain owns context server-side keyed by this id, so the same id across turns and reconnects means the same conversation.
- sessionStatus (optional). Reads a conversation’s state without running a turn, so a client with no local thread files (a phone, a web cockpit) can render the fleet by asking the brain directly.
- cancel and close. Abort an in-flight turn, and tear down a conversation. Both are best-effort; yothere also just closes the socket.
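As an illustration of the hello handshake, the sketch below checks a hello result against the protocol version a client supports. The result field names (name, protocol_version, capabilities) are assumptions made for this example; the real field names are defined by Brain Protocol v1, not here.

```python
SUPPORTED_PROTOCOL = 1  # Brain Protocol v1


def gate_hello(result, supported=SUPPORTED_PROTOCOL):
    """Return (name, capabilities, warning); warning is None when the versions match.

    The keys read from `result` are hypothetical placeholders.
    """
    name = result.get("name", "unknown")
    capabilities = set(result.get("capabilities", []))
    version = result.get("protocol_version")
    warning = None
    if version != supported:
        warning = f"brain {name!r} speaks protocol {version!r}, expected {supported}"
    return name, capabilities, warning
```

A caller would log the warning and then decide, based on the capability flags, which optional features such as status or cost events it can rely on.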
During a turn the brain streams back JSON-RPC notification frames (method: "stream"):
- delta (required). Token text appended to the running answer.
- progress (optional). Tool activity, so the cockpit shows the brain working, not stalled.
- status (optional). The thread state (working, blocked, needs_input, done). This is what drives the attention router. A brain that never emits status still advances, but its threads cannot signal “I’m blocked, come look”.
- cost (optional). A spend report. Because yothere cannot meter tokens on the brain’s side, the cost cap is advisory for remote brains and relies on honest cost events.
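A client can fold these events into a running view of the turn. The sketch below assumes a params layout that the text above does not specify: an event field naming the kind, plus text, note, state, usd, and done fields. Treat all of those names as hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TurnView:
    answer: str = ""                               # built up from delta events
    activity: list = field(default_factory=list)   # progress notes, in arrival order
    state: str = "working"                         # last reported thread state
    spend: float = 0.0                             # sum of reported cost events
    done: bool = False


def apply(view, frame):
    """Fold one stream notification into the view; ignore anything else."""
    msg = json.loads(frame)
    if msg.get("method") != "stream":
        return view
    params = msg.get("params", {})
    kind = params.get("event")  # hypothetical field name
    if kind == "delta":
        view.answer += params.get("text", "")
    elif kind == "progress":
        view.activity.append(params.get("note", ""))
    elif kind == "status":
        view.state = params["state"]
    elif kind == "cost":
        view.spend += float(params.get("usd", 0))
    if params.get("done") is True:
        view.done = True
    return view
```

Only delta is required, so a view that never sees a status event simply stays at its initial state while the answer keeps growing.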
The turn ends on the first done: true stream frame or the id-bearing result answering the turn request. The smallest valid brain accepts streamSubscribe and prompt, then emits a delta and a terminal done; the bundled echo_brain is exactly that and is the conformance fixture. Full details and a minimal implementation are in Agent onboarding.
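The shape of such a minimal brain can be sketched without any transport: a function that takes one request frame and returns the frames to send back. This is not the bundled echo_brain. The params layout (text, event, done) and the subscription result field are assumptions for illustration, and the WebSocket server around it is omitted.

```python
import json


def _stream(params):
    """Build a stream notification frame (no id, so no reply is expected)."""
    return json.dumps({"jsonrpc": "2.0", "method": "stream", "params": params})


def echo_handle(frame):
    """Handle one request frame and return the list of frames to send back."""
    req = json.loads(frame)
    method, req_id = req.get("method"), req.get("id")
    if method == "streamSubscribe":
        # Hypothetical result shape: a subscription id for the next turn.
        return [json.dumps({"jsonrpc": "2.0", "id": req_id, "result": {"subscription": "sub-1"}})]
    if method in ("prompt", "turn"):
        text = req.get("params", {}).get("text", "")
        return [
            _stream({"event": "delta", "text": text}),   # echo the prompt back
            _stream({"done": True}),                     # terminal stream frame
            json.dumps({"jsonrpc": "2.0", "id": req_id, "result": {"ok": True}}),
        ]
    # -32601 is the JSON-RPC 2.0 code for "Method not found".
    return [json.dumps({"jsonrpc": "2.0", "id": req_id,
                        "error": {"code": -32601, "message": "Method not found"}})]
```

A client that treats either the done frame or the id-bearing result as the end of the turn will stop after the second frame here and never needs the third.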
The hello gate warns on mismatch.
The runner and the attention router
The runner is the headless advance engine, one tick roughly every 30 seconds. Each tick it reaps finished workers, sweeps stale threads to stuck, checks the cost caps, asks the attention router which threads to advance (bounded by max concurrency and a per-thread cooldown), spawns one worker subprocess per chosen thread, and coalesces any new “needs your eyes” transitions into a single rate-limited nudge.
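Two pieces of the tick lend themselves to a short sketch: picking threads under a concurrency bound and a cooldown, and coalescing transitions into one rate-limited nudge. Everything concrete below is hypothetical, including the default numbers, the dict fields, and the oldest-first ordering, which stands in for the attention router's real ranking.

```python
def choose_threads(threads, running, now, max_concurrency=2, cooldown=60.0):
    """Pick thread ids to advance this tick, respecting free slots and the cooldown."""
    free_slots = max(0, max_concurrency - len(running))
    ready = [
        t for t in threads
        if t["id"] not in running
        and t["state"] == "working"
        and now - t["last_advanced"] >= cooldown
    ]
    # Placeholder ordering: least recently advanced first, id as a stable tie-break.
    ready.sort(key=lambda t: (t["last_advanced"], t["id"]))
    return [t["id"] for t in ready[:free_slots]]


class Nudger:
    """Coalesce 'needs your eyes' transitions into at most one nudge per interval."""

    def __init__(self, min_interval=300.0):
        self.min_interval = min_interval
        self.pending = set()
        self.last_sent = None

    def offer(self, thread_ids, now):
        """Record new transitions; return one combined message, or None if rate-limited."""
        self.pending.update(thread_ids)
        if not self.pending:
            return None
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return None  # keep accumulating until the interval has passed
        message = "needs your eyes: " + ", ".join(sorted(self.pending))
        self.pending.clear()
        self.last_sent = now
        return message
```

Threads that become blocked during a quiet period are not dropped; they ride along in the next nudge that the rate limit allows.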
The attention router is the deterministic core the runner leans on. It is offline-first (no model call, no online learning), so it is fully unit-testable and never stalls on a network hop. It scores the fleet, recommends one focus thread, and protects a pinned focus so the thread you chose to concentrate on is never bumped. Its full behavior and the cost-cap numbers are in Core concepts.
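To show what a deterministic, offline scorer with a protected pin can look like, here is a toy version. The weights, the waiting-time bonus, and the field names are invented for this sketch; the router's actual scoring is described in Core concepts.

```python
# Hypothetical weights: states that need a human outrank states that do not.
WEIGHTS = {"blocked": 100.0, "needs_input": 80.0, "working": 10.0, "done": 0.0}


def score(thread, now):
    """Pure function of the thread and the clock: no model call, no network."""
    waited = max(0.0, now - thread["since"])
    return WEIGHTS.get(thread["state"], 0.0) + min(waited, 3600.0) / 60.0


def recommend_focus(threads, now, pinned=None):
    """Recommend one thread id; a pinned focus that still exists is never bumped."""
    if pinned is not None and any(t["id"] == pinned for t in threads):
        return pinned
    if not threads:
        return None
    # Highest score wins; the id breaks ties so the result is fully deterministic.
    best = min(threads, key=lambda t: (-score(t, now), t["id"]))
    return best["id"]
```

Because the clock is passed in as an argument, the same inputs always give the same recommendation, which is what makes a router of this kind straightforward to unit-test.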
Where it runs
By default yothere runs entirely on your machine, on your own Claude subscription. State lives under ~/.yothere (YOTHERE_HOME), with a legacy ~/.relay used as a fallback. The runner is a background service; the cockpit is served locally on 127.0.0.1:8767. Nothing leaves the box that you did not send.
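The state-directory lookup can be sketched as follows. The precedence shown here is an assumption, since the text above does not spell out the exact rules: YOTHERE_HOME wins if set, then ~/.yothere, with ~/.relay used only when it exists and ~/.yothere does not. The function name is hypothetical.

```python
import os
from pathlib import Path


def resolve_home(env=None, home=None):
    """Resolve the state directory under the assumed precedence described above."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    override = env.get("YOTHERE_HOME")
    if override:
        return Path(override).expanduser()
    primary = home / ".yothere"
    legacy = home / ".relay"
    # Fall back to the legacy directory only when it is the one that actually exists.
    if not primary.exists() and legacy.exists():
        return legacy
    return primary
```

Passing env and home as arguments keeps the function testable without touching the real environment or home directory.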
Two run modes exist, selected by configuration:
- Local single-user (the default). No login. The cockpit and its live stream are reachable over loopback and your tailnet; the dangerous surface stays behind a bearer gate.
- Hosted multi-tenant. A login is required and each account is scoped to its own isolated home, so two logins see two separate fleets with no cross-tenant read.
The knobs for both modes live in Configuration.