We turned a QA platform’s AI agent into a reverse shell — without touching a line of its code.
AI agents are being wired into the software delivery pipeline: reading test failures, recommending fixes, and increasingly applying them through connected tooling. Every one of those agents is a new attack surface — and classic AppSec tooling was never built to inspect it.
In a black-box engagement against a leading QA automation platform’s free demo agent, DWAAR.AI achieved a complete compromise. We coerced its Test Failure Analysis Agent into recommending a reverse shell as a “fix” — delivered straight through the platform’s own MCP-based autofix channel. Apply that “fix” in your IDE or let CI/CD run it unattended, and you’ve opened full shell access to attacker-controlled infrastructure.
No stolen credentials. No modified source. No touching the agent’s weights or system prompt. We won purely by shaping the agent’s inputs — the same way a malicious dependency or poisoned test artifact would.
How we did it:
- Layered reconnaissance to map the agent’s decision boundary — letting every refusal leak guardrail behavior
- Multi-turn prompt injection where each step looks benign, but the accumulated state walks the agent to the objective
- A safe-words pivot: malicious intent described in sanctioned vocabulary — semantically identical to an attacker, lexically invisible to a keyword filter
The platform had guardrails. They blocked the obvious attempts. They simply didn’t assure safety — because they reasoned about keywords while the agent reasoned about intent.