---
title: "Compact Before You Switch: My First Pi Extension"
description: "Switching to a smaller-context model mid-conversation can silently truncate everything you've loaded. My first Pi extension stops and compacts on the big model first - so you keep your work."
doc_version: "1"
last_updated: "2026-06-15"
date: 2026-06-15
tags: [AI, Agents, Agent Harness, Pi, Multi-Model, TypeScript]
canonical: "https://eran.sandler.co.il/post/2026-06-15-compact-before-you-switch-my-first-pi-extension/"
---

![The pi-compact-before-switch extension catching a switch from MiniMax M3 to GLM-5.1 before the context overflows](/img/pi-compact-before-switch.jpg)

[Pi](https://pi.dev) lets you switch models mid-conversation. That's the feature. You start a session on one model, then switch to another with `/model` or by cycling with Ctrl+P - whichever model fits the next stretch of work best. Most of the time it just works.

Here's the time it doesn't.

You're deep in a session on MiniMax M3 and its roughly 1M-token context window. Your conversation has grown to 162,000 tokens - fine, plenty of headroom. Then you switch to GLM-5.1 because it's the model you want for the next part of the job. You're thinking about the model, not its context window - and GLM-5.1 tops out at 128,000 tokens. Your conversation no longer fits. Many providers won't warn you about this; they truncate silently. The model on the other side of the switch is now missing the front of your conversation, and you won't find out until it starts confidently forgetting things you told it ten minutes ago.

That's the exact failure my first Pi extension exists to prevent.

[`pi-compact-before-switch`](https://github.com/erans/pi-compact-before-switch) does one thing: when you switch to a model whose context window is smaller than what you currently have loaded, it stops and asks first. The screenshot above is it doing its job - catching a switch from MiniMax M3 (a 1,048,576-token window) down to GLM-5.1 (128,000) with 162,241 tokens already in play.

If you say yes, it reverts to the outgoing model - the big one, which can still see everything - compacts the conversation there, and *then* completes the switch to the smaller model. You arrive on the new model with a context that actually fits. If you cancel, it reverts the switch and leaves you exactly where you were. Nothing is lost either way.
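As a rough sketch, that ask, revert, compact, switch sequence could look like the following. The `Harness` interface and its method names are placeholders invented for illustration, not Pi's actual extension API, and the real extension may order or structure these steps differently.

```typescript
// Hypothetical harness surface for illustration; NOT Pi's actual extension API.
interface Harness {
  confirm(message: string): Promise<boolean>;
  setModel(modelId: string): Promise<void>;
  compact(): Promise<void>;
}

type Outcome = "switched" | "cancelled";

// Assumed to run after the user has already picked `toId`, so the session
// is sitting on the smaller model when this starts.
async function compactBeforeSwitch(
  h: Harness,
  fromId: string,
  toId: string,
): Promise<Outcome> {
  const ok = await h.confirm(
    `Compact on ${fromId} before switching to ${toId}?`,
  );

  // Either answer begins by going back to the outgoing (larger) model.
  await h.setModel(fromId);
  if (!ok) return "cancelled";

  // Compact where the whole conversation is still visible...
  await h.compact();
  // ...then finish the switch the user asked for.
  await h.setModel(toId);
  return "switched";
}
```

The ordering is the point: compaction runs while the large-window model is active, so the summary is built from the full conversation rather than from whatever survived truncation.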

It's deliberately narrow. It only fires when all of these are true:

- You switched via `/model`. Ctrl+P cycling and session restore pass through silently - no nagging.
- The target window is smaller than the source window.
- Your current context, plus a 16K-token reserve, would not fit in the target window.

If you're switching *up*, or you've got room to spare, you never see it. The whole point is to stay invisible until the one moment it matters.
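The three conditions reduce to a small predicate. This is a minimal sketch, not the extension's source: the `SwitchEvent` shape and the source labels are made up for the example, and reading "16K" as exactly 16,000 tokens (rather than 16,384) is an assumption.

```typescript
// Assumption: "16K" taken as 16,000 tokens; the extension may use 16,384.
const RESERVE_TOKENS = 16_000;

// Hypothetical labels for how a model switch was triggered.
type SwitchSource = "command" | "cycle" | "restore";

// Hypothetical event shape, invented for this sketch.
interface SwitchEvent {
  source: SwitchSource;   // "command" stands in for `/model`
  fromWindow: number;     // context window of the outgoing model, in tokens
  toWindow: number;       // context window of the target model, in tokens
  contextTokens: number;  // tokens currently loaded in the conversation
}

function shouldPrompt(e: SwitchEvent): boolean {
  // Ctrl+P cycling and session restore pass through silently.
  if (e.source !== "command") return false;
  // Switching up, or sideways, can never overflow.
  if (e.toWindow >= e.fromWindow) return false;
  // Fire only when the context plus the reserve no longer fits the target.
  return e.contextTokens + RESERVE_TOKENS > e.toWindow;
}

// The switch from the screenshot: 162,241 tokens, 1,048,576 -> 128,000.
const screenshotCase: SwitchEvent = {
  source: "command",
  fromWindow: 1_048_576,
  toWindow: 128_000,
  contextTokens: 162_241,
};
console.log(shouldPrompt(screenshotCase)); // true
```

With the same two models and only 50,000 tokens loaded, the predicate returns `false` and the switch goes through untouched.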

A few things I cared about while building it:

- **No configuration.** There's nothing to set up and nothing to tune. The default behavior is the whole product: either it's what you want, or you uninstall it.
- **It can't wedge you.** If a compact ever hangs, a 30-second guard expires and `/model` goes back to working normally. A safety feature that can lock you out of switching models is worse than the problem it solves.
- **It's small and tested.** 31 tests on Node's built-in `node:test`, run straight off TypeScript with `--experimental-strip-types`. No build step.
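One common way to build a guard like the 30-second one is to race the work against a timer. The sketch below shows that general technique; `withTimeout` is a hypothetical helper, and the extension's actual guard may be implemented differently (for example, as a flag that expires).

```typescript
// Hypothetical helper: resolve with the work's result, or with "timeout"
// if the work has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const expiry = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });

  // Whichever settles first wins; always clear the timer afterwards so a
  // finished compaction does not leave a pending timeout behind.
  return Promise.race([work, expiry]).finally(() => clearTimeout(timer));
}

// Usage sketch: treat a timeout as "give up and let `/model` work normally".
async function guardedCompact(compact: () => Promise<void>): Promise<boolean> {
  const result = await withTimeout(compact().then(() => "done" as const), 30_000);
  return result === "done";
}
```

Note that the race does not cancel the hung compaction; it only stops waiting for it, which is enough to keep model switching usable.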

If you use Pi:

```
pi install npm:pi-compact-before-switch
```

It registers on your next session start. MIT licensed, source on [GitHub](https://github.com/erans/pi-compact-before-switch).

A note on how it came together: I wrote this first extension with Pi itself, using MiniMax M3 - the same model holding the context in the screenshot above. Pi auto-loads any `.ts` file you drop into `extensions/`, no manifest required, so the whole thing took about an hour - barely more than writing the code.

I keep coming back to the same idea about agent harnesses: the model is a replaceable component, and the session is the thing worth protecting. This extension is that idea shrunk down to a single sharp edge. Switching models shouldn't quietly cost you your context. Compact first, then switch.