"It's Just a Skill File" (Famous Last Words)


Skills like those on skills.sh (tiny text “how-to” files that steer an agent toward a task) feel harmless because they’re just instructions.

But that’s exactly why they can become an attack vector.

A skill file is basically executable intent:

it sets the agent’s assumptions (“trust this source”)
it defines the workflow (“run these steps”)
it can nudge boundaries (“skip confirmations”, “always do X”)

The tricky part is: this attack doesn’t have to come from external prompt injection at all.
Skills often live inside your environment (repo, dotfiles, shared templates, internal skill packs). If a malicious or compromised skill gets into that internal distribution path, it arrives with a “trusted” label by default.

And unlike one-off injections, skills can persist:

used across multiple projects
copied forward by templates
installed once and reused
surviving long after the original context is gone, with no obvious “reinstall” moment

So if skills are pulled from a repo, shared internally, copied from the internet, or composed dynamically… you’ve created a high-trust injection point that can quietly outlive the project that introduced it.

This isn’t theoretical:

Cisco’s team analyzed “community skills” for personal agents and showed how a skill can embed behavior that looks a lot like malware-by-instructions (data exfiltration patterns, unsafe actions, etc.)
Research explicitly calls out “Agent Skills” (markdown/text skill files) as enabling a new class of prompt injections, because attackers can hide malicious instructions inside long skill content or referenced scripts
Another large-scale study scanned tens of thousands of skills and found a meaningful fraction with vulnerability patterns, with skills that bundle executable scripts more likely to be risky
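
To make “vulnerability patterns” concrete, here is a rough sketch of what a first-pass scan over skill text could look like. The patterns and labels below are my own hypothetical examples, not the rules used in any of the studies above, and a regex pass like this will miss anything phrased creatively. Treat it as a tripwire for review, not a verdict.

```python
import re

# Hypothetical red-flag patterns, chosen only for illustration.
# A real scanner would need a far richer (and regularly updated) rule set.
RED_FLAGS = {
    "skips confirmation": re.compile(
        r"\b(skip|bypass|disable)\b.{0,40}\b(confirm\w*|approval|review)\b", re.I
    ),
    "pipes download to shell": re.compile(
        r"\b(curl|wget)\b[^\n|]*\|\s*(ba|z)?sh\b", re.I
    ),
    "touches credentials": re.compile(
        r"(\.ssh/|\.aws/credentials|\.env\b|API[_-]?KEY)", re.I
    ),
    "hides from user": re.compile(
        r"\b(do not|don't|never)\b.{0,30}\b(tell|mention|show|inform)\b.{0,20}\buser\b",
        re.I,
    ),
}


def scan_skill(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) for every red-flag pattern found in the skill text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RED_FLAGS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

Running `scan_skill` over every skill file in CI and routing any non-empty result to a human reviewer is one cheap way to surface the obvious cases before an agent ever reads them.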

Takeaway: if your agent treats skill text as authoritative, then skill text is part of your security boundary.

Guardrails I’m adopting:

Treat skills like code: PR review, ownership, diff alerts
Pin to known commits / provenance (ideally signed)
Make skills immutable at runtime (no self-modifying “update your skill” loops)
Keep skill instructions separate from retrieved/untrusted content
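
A minimal sketch of the pinning and immutability guardrails, assuming skills live in one directory and that a reviewed manifest of SHA-256 digests is kept somewhere the agent cannot write to. The function names and the manifest shape are hypothetical. Hash pinning is a stand-in here for real provenance: signed commits or signed artifacts would be stronger.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_skills(skill_dir: Path, pinned: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every file matches its pin.

    `pinned` maps a POSIX-style path relative to skill_dir to the expected digest.
    """
    problems = []
    on_disk = {
        p.relative_to(skill_dir).as_posix(): p
        for p in skill_dir.rglob("*")
        if p.is_file()
    }
    for name, expected in pinned.items():
        path = on_disk.get(name)
        if path is None:
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"modified: {name}")
    # Files nobody reviewed (for example a bundled script) are also a failure.
    for name in sorted(on_disk.keys() - pinned.keys()):
        problems.append(f"unpinned: {name}")
    return problems


def load_skill(skill_dir: Path, name: str, pinned: dict[str, str]) -> str:
    """Load one skill's text, refusing if anything in the directory has drifted."""
    problems = verify_skills(skill_dir, pinned)
    if problems:
        raise PermissionError("skill verification failed: " + "; ".join(problems))
    return (skill_dir / name).read_text(encoding="utf-8")
```

The point of failing closed is that a skill edited at runtime, or a new script dropped next to it, stops the load instead of quietly becoming part of the agent’s instructions. Updating a skill then means updating the manifest through the same PR review as any other code change.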

We learned “config is code.”
Now it’s “prompts are code”… and skills are attack surface.

How are you governing skills in your agent stack today?