its-just-another-skill-file

Skills like those shared on skills.sh (tiny text “how-to” files that steer an agent toward a task) feel harmless because they’re just instructions.

But that’s exactly why they can become an attack vector.

A skill file is basically executable intent:

  • it sets the agent’s assumptions (“trust this source”)
  • it defines the workflow (“run these steps”)
  • it can nudge boundaries (“skip confirmations”, “always do X”)

The tricky part: this attack doesn’t have to arrive through external prompt injection at all.
Skills often live inside your environment (repo, dotfiles, shared templates, internal skill packs). If a malicious or compromised skill gets into that internal distribution path, it arrives with a “trusted” label by default.

And unlike one-off injections, skills can persist:

  • used across multiple projects
  • copied forward by templates
  • installed once and reused
  • left active long after the original context is gone, with no obvious “reinstall” moment

So if skills are pulled from a repo, shared internally, copied from the internet, or composed dynamically… you’ve created a high-trust injection point that can quietly outlive the project that introduced it.
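
One way to see how far a skill has spread is to inventory every copy on disk and group the copies by content hash. The sketch below is a minimal illustration, not a vetted tool: the function name inventory_skills and the default file name SKILL.md are assumptions, so adjust the pattern to whatever your skill files are actually called.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def inventory_skills(roots: list[Path], pattern: str = "SKILL.md") -> dict[str, list[Path]]:
    """Group every file named `pattern` under `roots` by its SHA-256 digest.

    `pattern` is an assumed naming convention; pass your own if it differs.
    """
    copies: dict[str, list[Path]] = defaultdict(list)
    for root in roots:
        # rglob walks the whole tree, so nested template copies are found too.
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                copies[digest].append(path)
    return dict(copies)
```

Calling inventory_skills([Path("projects")]) on a hypothetical projects directory returns one entry per distinct skill body. A digest with many paths is a skill that was copied forward unchanged; a digest with a single path is a variant nobody else shares, which is worth a look.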

This isn’t theoretical:

  • Cisco’s team analyzed “community skills” for personal agents and showed how a skill can embed behavior that looks a lot like malware-by-instructions (data exfiltration patterns, unsafe actions, etc.)
  • Research explicitly calls out “Agent Skills” (markdown/text skill files) as enabling a new class of prompt injections, because attackers can hide malicious instructions inside long skill content or referenced scripts
  • Another large-scale study scanned tens of thousands of skills and found a meaningful fraction with vulnerability patterns, with skills that bundle executable scripts more likely to be risky

Takeaway: if your agent treats skill text as authoritative, then skill text is part of your security boundary.

Guardrails I’m adopting:

  • Treat skills like code: PR review, ownership, diff alerts
  • Pin to known commits / provenance (ideally signed)
  • Make skills immutable at runtime (no self-modifying “update your skill” loops)
  • Keep skill instructions separate from retrieved/untrusted content
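
The pinning and immutability guardrails above can be sketched as a loader that refuses any skill whose bytes differ from what was reviewed. This is a minimal sketch under stated assumptions: the manifest format (a JSON file mapping file name to SHA-256 digest), the SkillIntegrityError class, and the load_pinned_skill function are all hypothetical, and a hash check is weaker than signed provenance because it only helps if the manifest itself goes through review.

```python
import hashlib
import json
from pathlib import Path


class SkillIntegrityError(Exception):
    """Raised when a skill is unlisted or no longer matches its pinned hash."""


def load_pinned_skill(skill_path: Path, manifest_path: Path) -> str:
    """Return the skill text only if it matches the reviewed manifest.

    Assumed manifest format: {"<file name>": "<sha256 hex digest>", ...}
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    expected = manifest.get(skill_path.name)
    if expected is None:
        raise SkillIntegrityError(f"{skill_path.name} is not in the manifest")

    # Read once and hash those same bytes, so the text handed to the agent
    # is exactly the text that was checked.
    data = skill_path.read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise SkillIntegrityError(f"{skill_path.name} changed since review")
    return data.decode("utf-8")
```

If the agent only ever receives skill text through a loader like this, a runtime “update your skill” edit fails the check on the next load instead of silently taking effect. Updating a skill then means changing the manifest, which is a diff someone can review.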

We learned “config is code.”
Now it’s “prompts are code”… and skills are attack surface.

How are you governing skills in your agent stack today?