Agents

The Batch API is terrible for one agent. It might be great for a fleet.

What does an agent harness feel like when every model turn goes through Anthropic’s Batch API instead of the synchronous endpoint?

Batches are 50% off. For anyone burning real money on agents (eval suites, background subagents, anything that runs unattended), half-price tokens are the kind of number that makes you stop and squint. The trade is latency: batches are asynchronous, with up to a 24-hour processing window.
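To make the trade concrete, here is a rough back-of-the-envelope sketch. Only the 50% discount comes from the text above. The function name, the per-turn cost, the fleet size, and the per-turn batch latency are all made-up inputs, and the model assumes every agent turn waits for one full batch round trip while agents run in parallel.

```python
def fleet_estimate(num_agents, turns_per_agent, cost_per_turn_usd,
                   batch_latency_hours, batch_discount=0.5):
    """Estimate spend and wall-clock time for a fleet of agents on batches.

    All inputs are hypothetical. batch_discount=0.5 reflects the 50% figure
    quoted above; batch_latency_hours is an assumed per-turn wait.
    """
    sync_total = num_agents * turns_per_agent * cost_per_turn_usd
    batch_total = sync_total * (1 - batch_discount)
    # Turns inside one agent are sequential, so latency stacks per turn.
    # Agents run side by side, so fleet size does not add wall-clock time.
    wall_clock_hours = turns_per_agent * batch_latency_hours
    return {
        "sync_usd": sync_total,
        "batch_usd": batch_total,
        "saved_usd": sync_total - batch_total,
        "wall_clock_hours": wall_clock_hours,
    }


# Illustrative numbers only: 200 agents, 30 turns each, $0.04 per turn,
# and an assumed one-hour wait per batched turn.
estimate = fleet_estimate(200, 30, 0.04, batch_latency_hours=1.0)
print(estimate)
```

Under these assumptions the wall-clock figure is the same for one agent or two hundred, while the savings scale with the fleet. That is the whole argument in two lines of arithmetic.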

[Read More]

#AI #Agents #Anthropic #Batch API #LunaRoute #AgentSH

AI finding more bugs is a good thing

Anthropic’s Mythos announcement set off exactly the reaction you would expect: if models can now find serious bugs at scale, software security must be about to get much worse.

I think that reaction mixes up two things.

A bug is not automatically a vulnerability.

And even a vulnerability is not automatically exploitable.

That distinction matters even more in the agent era. AI is getting better at surfacing weird behavior in code. It can push edge cases, strange paths, and awkward combinations much faster than most human teams. That is not bad news. That is visibility.

[Read More]

#AI #Agents #Security #AgentSH #Watchtower

I Asked Codex to Reverse Engineer My Webcam

Codex reverse engineering a webcam

I have an Anker PowerConf C200 webcam.

It is a solid little webcam, and one of the nice things about it is that it supports a few genuinely useful settings, especially Field of View. You can change how wide the camera frames you, which is great if you want a tighter shot or a wider view of the room.

There was just one problem: I run Linux. Yes, really. And I have for a long time.

[Read More]

#AI #Agents #Codex #OpenAI #Linux #Reverse Engineering

Your Agent Can Run printenv (and Your Runtime Can't Stop It)

Tool calls are the postcard. Process trees are the trip.

Work-Bench’s post is one of the clearer attempts to name what’s happening: a new “agent runtime” layer built to execute, constrain, observe, and improve agent work at scale.

I mostly agree with that framing. Where I think it stops short is inside their “Constrain” pillar.

They explicitly define “Constrain” as “two things: identity and permissions.” That’s correct - but incomplete once you accept the premise of agents: they execute arbitrary code, spawn subprocess trees, and interact with the OS in ways that don’t map cleanly to “API permission checks.”
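A small sketch of why the process tree matters, using only Python's standard library. The variable name `DEMO_SECRET` and the helper `run_tool_call` are hypothetical. The point is that a harness sees one command, while a grandchild that command spawns still inherits the environment, which is the same thing `printenv` would show.

```python
import os
import subprocess
import sys

# Grandchild: reads a variable from the environment it inherited.
GRANDCHILD = "import os; print(os.environ.get('DEMO_SECRET', 'missing'))"

# Child: the single command a harness would log. It quietly spawns the
# grandchild, which never appears as its own tool call.
CHILD = (
    "import subprocess, sys; "
    f"subprocess.run([sys.executable, '-c', {GRANDCHILD!r}], check=True)"
)


def run_tool_call(secret):
    """Run the child as one 'tool call' and return whatever the tree printed."""
    env = dict(os.environ, DEMO_SECRET=secret)  # hypothetical secret in the env
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The value printed comes from the grandchild, two levels down.
    print(run_tool_call("s3cr3t"))
```

An identity or permission check on the outer command has nothing to say about what the grandchild reads, which is the gap described above.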

It Was the Shell, Damn It: Why I Built AgentSH

I had a rule in my control file: never run database migrations without explicit approval. The agent followed it perfectly - until it didn’t. During a long debugging session, it decided the schema was the root cause, wrote a forty-line inline Python script, connected directly to the database, and altered the table. It never “ran a migration.” It just spoke SQL through a different channel. The table was altered, the script was gone, and my harness log showed “executed python command.”

[Read More]

#AI #Agents #Security #Shell #AgentSH #LLM #Execution Layer Security #DevOps