A top-downloaded OpenClaw skill is actually a staged malware delivery chain
[D] We scanned 18,000 exposed OpenClaw instances and found 15% of community skills contain malicious instructions
Every OpenClaw security vulnerability documented in one place — relevant if you're running it with local models
If you're self-hosting OpenClaw, here's every documented security incident in 2026 — 6 CVEs, 824+ malicious skills, 42,000+ exposed instances, and what to do about it
Here we go! As expected by most of us here.
Jason Meller from 1password argues that OpenClaw’s agent “skills” ecosystem has already become a real malware attack surface. Skills in OpenClaw are typically markdown files that include setup instructions, commands, and bundled scripts. Because users and agents treat these instructions like installers, malicious actors can disguise malware as legitimate prerequisites.
Meller discovered that a top-downloaded OpenClaw skill (apparently Twitter integration) was actually a staged malware delivery chain. It guided users to run obfuscated commands that ultimately installed macOS infostealing malware capable of stealing credentials, tokens, and sensitive developer data. Subsequent reporting suggested this was part of a larger campaign involving hundreds of malicious skills, not an isolated incident.
The core problem is structural: agent skill registries function like app stores, but the “packages” are documentation that users instinctively trust and execute. Security layers like MCP don’t fully protect against this because malicious skills can bypass them through social engineering or bundled scripts. As agents blur the line between reading instructions and executing commands, they can normalize risky behavior and accelerate compromise.
Meller urges immediate caution: don’t run OpenClaw on company devices, treat prior use as a potential security incident, rotate credentials, and isolate experimentation. He calls on registry operators and framework builders to treat skills as a supply chain risk by adding scanning, provenance checks, sandboxing, and strict permission controls.
His conclusion is that agent ecosystems urgently need a new “trust layer” — with verifiable provenance, mediated execution, and tightly scoped, revocable permissions — so agents can act powerfully without exposing users to systemic compromise.
https://1password.com/blog/from-magic-to-malware-how-openclaws-agent-skills-become-an-attack-surface
I do security research and recently started looking at autonomous agents after OpenClaw blew up. What I found honestly caught me off guard. I knew the ecosystem was growing fast (165k GitHub stars, 60k Discord members) but the actual numbers are worse than I expected.
We identified over 18,000 OpenClaw instances directly exposed to the internet. When I started analyzing the community skill repository, nearly 15% contained what I'd classify as malicious instructions. Prompts designed to exfiltrate data, download external payloads, harvest credentials. There's also a whack-a-mole problem where flagged skills get removed but reappear under different identities within days.
On the methodology side: I'm parsing skill definitions for patterns like base64 encoded payloads, obfuscated URLs, and instructions that reference external endpoints without clear user benefit. For behavioral testing, I'm running skills in isolated environments and monitoring for unexpected network calls, file system access outside declared scope, and attempts to read browser storage or credential files. It's not foolproof since so much depends on runtime context and the LLM's interpretation. If anyone has better approaches for detecting hidden logic in natural language instructions, I'd really like to know what's working for you.
To OpenClaw's credit, their own FAQ acknowledges this is a "Faustian bargain" and states there's no "perfectly safe" setup. They're being honest about the tradeoffs. But I don't think the broader community has internalized what this means from an attack surface perspective.
The threat model that concerns me most is what I've been calling "Delegated Compromise" in my notes. You're not attacking the user directly anymore. You're attacking the agent, which has inherited permissions across the user's entire digital life. Calendar, messages, file system, browser. A single prompt injection in a webpage can potentially leverage all of these. I keep going back and forth on whether this is fundamentally different from traditional malware or just a new vector for the same old attacks.
The supply chain risk feels novel though. With 700+ community skills and no systematic security review, you're trusting anonymous contributors with what amounts to root access. The exfiltration patterns I found ranged from obvious (skills requesting clipboard contents be sent to external APIs) to subtle (instructions that would cause the agent to include sensitive file contents in "debug logs" posted to Discord webhooks). But I also wonder if I'm being too paranoid. Maybe the practical risk is lower than my analysis suggests because most attackers haven't caught on yet?
The Moltbook situation is what really gets me. An agent autonomously created a social network that now has 1.5 million agents. Agent to agent communication where prompt injection could propagate laterally. I don't have a good mental model for the failure modes here.
I've been compiling findings into what I'm tentatively calling an Agent Trust Hub doc, mostly to organize my own thinking. But the fundamental tension between capability and security seems unsolved. For those of you actually running OpenClaw: are you doing any skill vetting before installation? Running in containers or VMs? Or have you just accepted the risk because sandboxing breaks too much functionality?