by Patrick Hofmann

Blocking Was the Bug

Last weekend I thought my grant-secured shell was done. On Sunday I built a notification for the one remaining weak point. On Monday, during the first real end-to-end test, I found a whole set of problems — and realized that one of them wasn't a bug but a design decision I had made backwards. An article about hypotheses that all shared the same wrong assumption, and the moment you realize you're debugging at the wrong level.

OpenApe · Systems Design · Human in the Loop · Building in Public · Debugging

On Saturday I thought the shell was done.

I had spent the weekend rebuilding ape-shell from an argv-rewriting wrapper into a real interactive shell: persistent bash over a pty bridge, marker-based prompt detection, grant integration directly in the REPL, per-session audit logging, install as login shell. Seven milestones, ten pull requests. The test suite was green, I was using it as my login shell in production, and I was convinced the thing was ready.

On Sunday I noticed one more thing that creaked architecturally. When an AI agent on the other end of the Telegram connection sends a command that needs a grant, the shell blocks in a polling loop waiting for approval. The user sees nothing about this in Telegram, because nobody tells them that something is waiting. So I built a notification component: when ape-shell enters the wait state, it calls a configured shell command, and that command can do whatever it wants — Telegram bot, macOS notification, say, ntfy, anything. Fire-and-forget, detached, ten-second kill timeout. Six unit tests plus one E2E test, committed Sunday evening. I thought: now the user also knows when to approve.
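A fire-and-forget hook of this kind can be sketched in a few lines of Node. This is a minimal illustration under assumptions, not ape-shell's actual code: the function name notifyPending, the use of sh -c, and the POSIX process-group kill are all hypothetical choices.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: runs the user-configured notification command
// without ever making the caller wait for it. POSIX only.
export function notifyPending(command: string, killAfterMs = 10_000): number | undefined {
  const child = spawn("sh", ["-c", command], {
    detached: true,  // child leads its own process group
    stdio: "ignore", // no pipes that could keep the parent attached
  });

  // Fire-and-forget: a failing notifier must never break the shell.
  child.on("error", () => {});

  // Kill the whole process group if the notifier hangs.
  const timer = setTimeout(() => {
    if (child.pid !== undefined) {
      try {
        process.kill(-child.pid, "SIGKILL"); // negative pid targets the group
      } catch {
        // the group is already gone, nothing to do
      }
    }
  }, killAfterMs);

  // Do not kill a recycled pid after the notifier has exited on its own.
  child.on("exit", () => clearTimeout(timer));

  // Neither the timer nor the child may keep the parent process alive.
  timer.unref();
  child.unref();

  return child.pid;
}
```

Because the timer is unref'd, the kill only fires while the calling process is still running, which matches the wait state described above.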

On Monday I used the whole thing end-to-end for the first time: openclaw on my machine at home, ape-shell as the login shell of the openclaw user, commands via Telegram, me sitting at the table with my phone in hand, two meters from the server. The setup was supposed to work in the simplest configuration before I tested anything harder.

What I found was a series of very different problems. One of them was fundamental enough that in the end I didn't fix a bug. I reversed a design decision I had made backwards and reworked half the model.

What Came Up During the First Test

  • My agent doesn't even use the shell for simple file operations. openclaw has built-in tools for reading, writing, editing — those run directly, never going through exec. ape-shell protects a layer that the agent doesn't enter for many tasks anyway. This isn't a bug, it's a category mismatch in my own security model.
  • Cache hits on already-approved grants run silently. When the agent executes a command for which the current session already has an approval, the behavior from outside is indistinguishable from "there is no shell at all": no ack, no log, nothing.
  • The approval flow itself was invisible. You'd see "Requesting grant for: ...", then "Approve at: ...", click in the browser, and then the command output just appeared. The one line "Grant approved, continuing" that makes the state flip visible was missing.
  • apes grants list from inside the REPL broke. If you type apes <subcommand> inside an interactive ape-shell session, you get ape-shell: unsupported invocation. Self-inspection impossible. The reason turned out to be very unpleasant — more on that shortly.
  • The Silent Agent Block. The core symptom: agent says in Telegram "please approve the grant," I approve in the browser, then nothing happens. After a while I have to type "confirmed" in the chat so the agent continues. The loop doesn't close on its own.
  • The REPL can enter an unrecoverable state, with no way from inside to figure out what's broken or to repair anything. Made worse by the broken apes grants list, because the same failure also took out apes whoami, my last resort.
  • The diagnosis paradox. All my tools to inspect the shell live inside the shell. If the shell is broken, the diagnosis is broken too.

Most of these are visible UX and observability holes. Fixable. One — the Silent Agent Block — was something else.

My Hypotheses Were All Wrong, for the Same Reason

I got stuck on the Silent Agent Block because it felt like the hardest one. I wrote down several hypotheses, each with a reproduction test and a derived fix.

One: the LLM doesn't poll. When exec in openclaw returns to the agent after about half a minute with "Command still running, session X, use process tool for follow-up," the LLM should issue a process(action=poll, sessionId=X, timeout=...) as its next step. In my case it didn't — it passed the "please approve" message to Telegram and ended its turn. When the grant later arrived and the background process terminated, openclaw correctly called requestHeartbeatNow("exec-event"), but the agent still didn't wake up because it found nothing new in the user message queue. Fix: stronger hint in the yield result that forces the LLM to schedule the poll.

Another: the heartbeat wake targets the wrong session. Maybe the session key of the backgrounded exec run isn't identical to the session key of the Telegram-bound agent session — then the wake fires into the void. Fix: repair session key mapping.

And a third: ape-shell doesn't terminate cleanly after approval. Maybe the grant wait loop ran through to approval, but the shell child stayed in a state afterward that openclaw can't see as "exit." Fix: correct exit semantics in the grant dispatcher.

I wrote the plan, looked at it, and then realized the hypotheses all do the same thing. They ask how to get the waiting process to correctly wake up the agent. None of them asks why there's waiting in the first place. They take "shell blocks until grant is approved" as given and try to fix the waking afterward.

This is where I spent a while looking in the wrong corner before the penny dropped. Blocking was the bug. Not the timeout, not the heartbeat, not the session key. The wait itself.

Why Waiting Was the Wrong Primitive

My original design felt like normal bash: you fire a command, it runs, it terminates, you get an exit code. The grant flow inserted itself as a "step before execution" that the shell synchronously waits through. For a human at the terminal, that's correct. The human sits there, clicks the approval URL, comes back, sees the output. A second, maybe five. No problem.

For an AI agent that communicates with me via Telegram while I'm potentially somewhere else entirely, that's exactly the wrong semantics. The agent just sent a message, "please approve the grant." I'm somewhere: in the same room, in another room, in a conversation, in another task. I might approve immediately, or in two minutes, or this evening. In the meantime the agent should not block. It should be able to handle other requests, respond to other users, work on parallel tasks. The waiting isn't just cosmetically unpleasant. It's the wrong primitive for an asynchronous human-in-the-loop.

The right default for a grant-secured command is fire, announce, exit 0. apes run fires the grant request, prints the ID and approval URL to stdout, fires the configured notification out-of-band, and exits immediately with exit code 0. The agent does other things. I approve in the browser when I get around to it. Later the agent calls apes grants run <id> and retrieves the actual command result.

Two steps instead of one, yes — but two steps with a natural handoff point where the agent doesn't have to wait and the human doesn't have to be punctual.

This landed last night as @openape/apes@0.9.0 on npm. apes run and ape-shell -c are now non-blocking by default. Blocking remains available as opt-in: --wait on the command line or APE_WAIT=1 as an environment variable. CI scripts that still want to wait for the actual command's exit code can do exactly that with the flag. The interactive REPL (ape-shell without -c) stays unchanged, because there an actual human sits at the prompt who can and should wait.
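The opt-in can be pictured as a small pure function. The sketch below is an assumption about how such a check might look, not the code in the release: shouldWait is a hypothetical name, only the exact value "1" is treated as enabling APE_WAIT, and flags after a "--" separator are assumed to belong to the wrapped command.

```typescript
// Hypothetical helper: decides whether a one-shot invocation should block
// until the grant is approved. Non-blocking is the default.
export function shouldWait(
  argv: readonly string[],
  env: Record<string, string | undefined>,
): boolean {
  // Only inspect the arguments before "--". Everything after it is the
  // wrapped command, whose own flags must not be misread.
  const separator = argv.indexOf("--");
  const ownArgs = separator === -1 ? argv : argv.slice(0, separator);

  return ownArgs.includes("--wait") || env.APE_WAIT === "1";
}
```

Under these assumptions, apes run --wait -- curl https://example.com would block, while apes run -- curl --wait would not.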

Together with the pending-grant notification from the day before, this creates a new pattern:

  1. Agent fires apes run -- curl https://example.com
  2. apes creates the grant, prints grant ID, approval URL and execution hint, calls the notification command, exit 0
  3. Agent continues working
  4. I see the notification on my phone, approve in the browser
  5. When the agent is ready, it calls apes grants run <id> and gets the output
  6. No step in this blocks anything
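The steps above can be modeled in memory to make the handoff visible. This is a toy sketch, not the apes implementation: GrantStore and its methods are hypothetical, and the behavior of running a still-pending grant (return at once with a non-zero exit code) is an assumption the source does not specify.

```typescript
type Grant = { id: string; command: string[]; status: "pending" | "approved" };
type Result = { exitCode: number; output: string };

// Hypothetical in-memory model of "fire, announce, exit 0".
export class GrantStore {
  private grants = new Map<string, Grant>();
  private nextId = 1;

  // Steps 1 and 2: create the grant, announce it, return immediately.
  request(command: string[], notify: (grant: Grant) => void): Result & { grantId: string } {
    const grant: Grant = { id: `g${this.nextId++}`, command, status: "pending" };
    this.grants.set(grant.id, grant);
    notify(grant); // out-of-band announcement, never awaited
    return { exitCode: 0, output: `grant ${grant.id} requested`, grantId: grant.id };
  }

  // Step 4: the human approves whenever they get around to it.
  approve(id: string): void {
    const grant = this.grants.get(id);
    if (grant) grant.status = "approved";
  }

  // Step 5: the agent fetches the real result once it is ready.
  // Assumption: a pending grant yields an error instead of a wait.
  run(id: string, exec: (command: string[]) => string): Result {
    const grant = this.grants.get(id);
    if (!grant) return { exitCode: 1, output: "unknown grant" };
    if (grant.status !== "approved") return { exitCode: 1, output: "grant still pending" };
    return { exitCode: 0, output: exec(grant.command) };
  }
}
```

No method in this model loops or sleeps. Each call returns with whatever is known at that moment, which is the property the redesign is after.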

The Silent Agent Block problem isn't fixed by this. It no longer exists. Nothing can hang silently in a wait, because the default path no longer waits.

What Else Was Fixed, in Order of Learning Value

In parallel with the 0.9.0 redesign, I fixed the other issues in a release that landed earlier: @openape/apes@0.8.0, three PRs.

The broken apes grants list in the REPL. The root cause wasn't what I expected. I had assumed an argv parsing bug or a wrong dispatch rule and was prepared to debug the REPL command handler. The actual reason was a leaking environment marker. ape-shell internally sets APES_SHELL_WRAPPER=1 so the CLI knows "I was invoked as ape-shell." This env var was then inherited through the pty bridge by the bash child, and from there by every apes subcommand called within it. The nested apes saw the marker, thought it was an ape-shell invocation, couldn't find the subcommand args in the ape-shell argv schema, and threw unsupported invocation. The fix is one line: destructure the marker out of the environment before that environment is passed to bash at pty spawn. The hard part wasn't the fix. It was that I was looking at the wrong level: dispatch logic instead of environment inheritance.
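The one-line idea can be shown with object rest destructuring. Only the variable name APES_SHELL_WRAPPER comes from the text above. The function name envForBashChild is hypothetical, and how the result reaches the pty spawn call is left open.

```typescript
// Hypothetical helper: builds the environment for the bash child without
// the wrapper marker, so nested apes calls do not inherit it.
export function envForBashChild(
  parentEnv: Record<string, string | undefined>,
): Record<string, string | undefined> {
  // The rest pattern copies every variable except the named one.
  // _marker is intentionally unused.
  const { APES_SHELL_WRAPPER: _marker, ...childEnv } = parentEnv;
  return childEnv;
}
```

The parent keeps its own marker, because destructuring copies rather than mutates. Only the environment handed to the child is clean.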

The invisible cache hits and approvals are fixed by two deliberate consola.info lines in the grant dispatcher. Trivial fixes. I had simply forgotten them when I first built the dispatcher, because I was thinking about function, not observability. A working system is not the same as an observable system, and you almost always notice this only at the moment you want to observe.

REPL recovery and the external health probe come as a few new meta-commands (:help, :status, :reset) in the REPL, plus a new subcommand, apes health, that runs standalone from any shell and outputs the complete auth and config state. The latter gets me around the diagnosis paradox: I can now check from outside whether my shell is healthy without having to enter the potentially broken REPL.
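One way to picture the meta-commands is a check that runs before a line is handed to bash. The sketch below is an assumption about the routing, not the shipped code: parseMetaCommand is a hypothetical name, and only exact matches are intercepted so that bash's own ":" builtin keeps working.

```typescript
const META_COMMANDS = ["help", "status", "reset"] as const;
type MetaCommand = (typeof META_COMMANDS)[number];

// Hypothetical helper: returns the meta-command name if the REPL should
// handle the line itself, or null if the line belongs to bash.
export function parseMetaCommand(line: string): MetaCommand | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith(":")) return null;

  const name = trimmed.slice(1);
  // Exact match only: ": > file" or ":" alone are valid bash and pass through.
  return (META_COMMANDS as readonly string[]).includes(name) ? (name as MetaCommand) : null;
}
```

Handling these lines in the REPL process itself is what lets them work when the bash child behind the pty is the part that is stuck.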

The shell bypass I deliberately didn't fix. More on that in a moment.

All of this is 0.8.0, with some shipping hurdles along the way, but live in the end.

What I Take Away

What I can't stop thinking about is the moment when I had written down several hypotheses and none of them were right — because they all shared the same assumption.

This is the kind of mistake you make when you take an existing design as given and only search for bugs within it. The hypotheses were locally well-reasoned — each one, had it been correct, would have led to a clean fix. But locally correct isn't enough when the wrong assumption sits one level above.

If during debugging you need multiple parallel hypotheses that all presuppose the same default behavior, stop and ask whether that default behavior is even correct. Multiple simultaneous hypotheses are a stronger signal for an architecture problem than for a bug. A bug usually has exactly one plausible cause. An architecture problem has several — and each one locally looks like a bug.


Open ending. The agent bypasses my grant-secured shell entirely for many simple operations, because it has built-in tools that work directly on the filesystem. This isn't a bug. It's a structural property of modern tool-based agent frameworks, and it doesn't disappear by fixing another loop. It raises the question of what a grant-secured shell is even worth when the agent no longer needs a shell for most actions.

I don't have a good answer to this question yet. It's not solvable by a release. It's the topic of the coming weeks, and I'm honestly not sure whether it leads to a feature decision, an architecture decision, or an entirely different product. I still prefer it to a problem where I already know what I'm going to do — because it teaches me something while I think about it.

If any of you have encountered a similar pattern while debugging — where you realized your hypotheses all rested on the same wrong assumption — feel free to write to me. I'm currently collecting examples, not for another post, but because I want to understand the pattern better.


@openape/apes@0.9.0 has been live on npm since last night. The code is at github.com/openape-ai/openape. The previous two blog articles tell how the shell came about and how the pty bridge that gives it life works.