
Skills like skills.sh (tiny text “how-to” files that steer an agent toward a task) feel harmless because they’re just instructions.
But that’s exactly why they can become an attack vector.
A skill file is basically executable intent:
- it sets the agent’s assumptions (“trust this source”)
- it defines the workflow (“run these steps”)
- it can nudge boundaries (“skip confirmations”, “always do X”)
The tricky part is: this attack doesn’t have to come from external prompt injection at all.
Skills often live inside your environment (repo, dotfiles, shared templates, internal skill packs). If a malicious or compromised skill gets into that internal distribution path, it arrives with a “trusted” label by default.
And unlike one-off injections, skills can persist:
- used across multiple projects
- copied forward by templates
- installed once and reused
- surviving long after the original context is gone, with no obvious “reinstall” moment
So if skills are pulled from a repo, shared internally, copied from the internet, or composed dynamically… you’ve created a high-trust injection point that can quietly outlive the project that introduced it.
This isn’t theoretical:
- Cisco’s team analyzed “community skills” for personal agents and showed how a skill can embed behavior that looks a lot like malware-by-instructions (data exfil patterns, unsafe actions, etc.)
- Research explicitly calls out “Agent Skills” (markdown/text skill files) as enabling a new class of prompt injections—because attackers can hide malicious instructions inside long skill content or referenced scripts
- Another large-scale study scanned tens of thousands of skills and found a meaningful fraction with vulnerability patterns, with skills that bundle executable scripts more likely to be risky
Takeaway: if your agent treats skill text as authoritative, then skill text is part of your security boundary.
Guardrails I’m adopting:
- Treat skills like code: PR review, ownership, diff alerts
- Pin to known commits / provenance (ideally signed)
- Make skills immutable at runtime (no self-modifying “update your skill” loops)
- Keep skill instructions separate from retrieved/untrusted content
We learned “config is code.”
Now it’s “prompts are code”… and skills are attack surface.
How are you governing skills in your agent stack today?