1 in 4 MCP servers. August 2, 2026. What auditors will ask about your agents.

Mauro Medda, Co-Founder & CTO · HikmaAI

Help Net Security ran a single number at the top of a piece in April 2026: one in four MCP servers opens AI agents to code execution risk. That number sat next to an unrelated calendar item — August 2, 2026, the enforceable deadline for Annex III high-risk AI systems under the EU AI Act. If you operate in Europe, those two facts now share a row in your risk register.

MCP — the Model Context Protocol — has been a quiet success story since Anthropic introduced it. It is now the most common way an enterprise agent reaches an internal tool or data source. It is also, by design, a permission grant. The protocol lets a client tell a server: invoke this tool, with these arguments. The server runs it. That is the point, and that is the attack surface.
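To make that concrete, here is a minimal sketch of what a tool invocation looks like on the wire. MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a `name` and an `arguments` object. The tool name `read_file` and its `path` argument are hypothetical, chosen only for illustration.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument, for illustration only.
request = build_tool_call(1, "read_file", {"path": "/srv/reports/q1.txt"})
print(request)
```

Everything inside `arguments` is chosen by the client, which in practice means by the model and, indirectly, by whoever wrote the prompt.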

What the disclosures are actually saying

Daniel Smith, in a widely circulated thread on X covering the OX Security MCP STDIO findings, put it operationally: 'These flaws allow unsanitized user-controlled input.' Read that sentence with an agent in mind. An agent passes a user's instruction (sometimes mediated, often not) into an MCP tool. If the tool's STDIO interface does not sanitize what arrives, a carefully shaped prompt becomes a carefully shaped shell command. That is command injection. It is one of the oldest vulnerability classes in the book, surfacing in one of the newest interfaces in the industry.
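The pattern is easy to illustrate. The sketch below is not taken from the OX Security findings; it is a generic example of the vulnerability class, with a hypothetical `grep` tool and a hypothetical payload. The first function builds a shell command line by string interpolation, which is the vulnerable shape. The second passes the argument as one element of an argv list, so no shell ever parses it. A small Python child process stands in for the real tool and simply echoes what it received.

```python
import subprocess
import sys

PAYLOAD = "notes; echo INJECTED"  # hypothetical attacker-shaped tool argument


def build_shell_command(query: str) -> str:
    """Vulnerable pattern: the argument is spliced into a shell command line.

    If this string is handed to a shell, everything after ';' runs as a
    second command. The function only builds the string; it runs nothing.
    """
    return f"grep -r {query} /srv/docs"


def run_without_shell(query: str) -> str:
    """Safer pattern: pass an argv list so the argument never meets a shell.

    The child process here is a stand-in for the real tool and prints the
    single argument it was given.
    """
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", query]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(build_shell_command(PAYLOAD))  # a shell would split this at the ';'
print(run_without_shell(PAYLOAD))    # the payload arrives as one literal string
```

Avoiding the shell closes this one door. It does not replace validating the argument against what the tool is supposed to accept.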

The Hacker News thread that captured the practitioner side of the conversation — 'How are you handling security for AI agents that use MCP tools?' — is the question I now hear at the end of every assessment conversation. The honest answer is that most teams are not handling it. They installed the server because a developer asked. They wired the agent because the workflow needed it. The threat model came later, if it came at all.

The shape of an MCP assessment

A real MCP assessment answers four questions about a server before the agent is allowed to use it.

  1. What tools does this server expose, and what is the schema of each? An MCP server with eight tools has eight ways an agent can act through it. Enumeration before connection.
  2. What permissions does each tool require, and what does it return? A read-only directory listing tool is a different risk class than a tool that writes to a database or executes a shell command.
  3. What input validation does the server perform on tool arguments? STDIO command injection, SQL injection, path traversal: the classic web vulnerabilities, surfacing in a tool interface designed for an agent.
  4. What identity does the server run as, and what blast radius does that identity have? An MCP server running as root on the same host as your production database is not a tool; it is a privilege boundary that an agent now controls.

The reason these four questions matter together is that an agent in production answers them implicitly every time it makes a tool call. The assessment makes those answers explicit before the call happens.
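One way to make the answers explicit is to record them as data and gate the connection on the result. The sketch below is illustrative only: the class names, the `read`/`write`/`execute` labels, and the two blocking rules are assumptions, not a standard and not a description of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ToolAssessment:
    name: str               # question 1: which tools are exposed
    side_effects: str       # question 2: "read", "write", or "execute"
    validates_input: bool   # question 3: are arguments validated


@dataclass
class ServerAssessment:
    server: str
    runs_as: str            # question 4: identity the server runs as
    tools: list[ToolAssessment] = field(default_factory=list)

    def blocking_findings(self) -> list[str]:
        """Return the reasons this server should not be connected yet."""
        findings = []
        if self.runs_as == "root":
            findings.append(f"{self.server} runs as root")
        for tool in self.tools:
            if tool.side_effects != "read" and not tool.validates_input:
                findings.append(
                    f"{tool.name}: {tool.side_effects} tool without input validation"
                )
        return findings


# Hypothetical server and tools, for illustration only.
assessment = ServerAssessment(
    server="files-mcp",
    runs_as="root",
    tools=[
        ToolAssessment("list_directory", "read", validates_input=False),
        ToolAssessment("run_script", "execute", validates_input=False),
    ],
)
for finding in assessment.blocking_findings():
    print(finding)
```

An empty list of findings is the condition for letting the agent connect; anything else is a conversation with the team that installed the server.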

Why August 2 changes the urgency

The EU AI Act enforces Articles 8 through 15 on Annex III high-risk systems beginning August 2, 2026. The list of high-risk categories includes biometric identification, critical infrastructure, education, employment, essential services, law enforcement, migration, justice, and credit scoring. If your organization deploys an AI agent that participates in any of those workflows, the article you are about to spend most of your time on is Article 12 — automatic recording of events over the lifecycle of the system.

Article 12 is the article that translates most directly into an engineering requirement. It is, in effect, asking for a log. Not just any log — a log that supports traceability, post-market monitoring, and the ability to reconstruct a decision after the fact. For an AI agent, that means the prompt, the tool calls, the arguments, the responses, the model output, and the action that resulted. For an MCP-connected agent, it means each tool invocation has to be attributable, signed, and exportable.
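As a rough sketch of what that record could look like, the example below models one agent decision with the elements listed above. The field names and sample values are assumptions made for illustration; Article 12 does not prescribe a schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ToolCall:
    server: str
    tool: str
    arguments: dict
    response: str


@dataclass
class DecisionTrace:
    decision_id: str
    actor: str
    prompt: str
    tool_calls: list[ToolCall]
    model_output: str
    resulting_action: str
    # UTC timestamp in ISO 8601, set when the record is created.
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the whole trace, nested tool calls included."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical values, for illustration only.
trace = DecisionTrace(
    decision_id="dec-0001",
    actor="credit-agent-7",
    prompt="Assess application 4412",
    tool_calls=[
        ToolCall("scoring-server", "score_applicant", {"application_id": 4412}, "score=712")
    ],
    model_output="Applicant meets the threshold.",
    resulting_action="approved",
)
print(trace.to_json())
```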

Penalties under the EU AI Act top out at thirty-five million euros or seven percent of worldwide annual turnover, whichever is greater. That ceiling applies to prohibited practices; breaches of the high-risk obligations, Article 12 among them, are capped at fifteen million euros or three percent. For a regulated organization that has shipped an agent without thinking about Article 12 explicitly, the conversation that surrounds those numbers is not a security conversation. It is a board conversation.

What an audit-defensible log actually looks like

If I were sitting across from an auditor in September and they opened with 'show me the log for the agent that approved this credit decision,' I would want the answer to be one query away. That means a few things on the engineering side.

  • The log is structured, not free-text. Each event is a record with a timestamp, an actor, a source, an action, and a result.
  • The log is signed cryptographically — Ed25519 in our implementation — so the record can be presented as evidence rather than as an assertion the auditor takes on trust.
  • The log captures the full tool-call chain for an agent decision: the prompt, the MCP server invoked, the arguments, the response, and the resulting action.
  • The log is exportable in a format an auditor can ingest — JSON, PDF, or CSV depending on the article — and is mapped to the article of the regulation that the export answers.
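The sign-and-verify step can be sketched with the standard library alone. Note the substitution: this example uses HMAC-SHA256 as a stand-in, not Ed25519. HMAC is symmetric, so whoever verifies must hold the same secret; an asymmetric scheme such as Ed25519 lets an auditor verify with a public key only, and in Python that would need a third-party library. The key, field names, and values below are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real deployment would load it from a key store.
SIGNING_KEY = b"demo-key-not-for-production"


def canonical_bytes(event: dict) -> bytes:
    """Serialize deterministically so the same event always yields the same bytes."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_event(event: dict, key: bytes = SIGNING_KEY) -> dict:
    """Return a copy of the event with a hex signature attached."""
    signature = hmac.new(key, canonical_bytes(event), hashlib.sha256).hexdigest()
    return {**event, "signature": signature}


def verify_event(record: dict, key: bytes = SIGNING_KEY) -> bool:
    """Recompute the signature over every field except the signature itself."""
    body = {k: v for k, v in record.items() if k != "signature"}
    expected = hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))


event = {
    "timestamp": "2026-05-04T09:15:00+00:00",
    "actor": "credit-agent-7",
    "source": "scoring-server",
    "action": "tools/call score_applicant",
    "result": "approved",
    "article": "Article 12",
}
record = sign_event(event)
tampered = {**record, "result": "denied"}

print(verify_event(record))    # True
print(verify_event(tampered))  # False
```

Signing a canonical serialization, with sorted keys and fixed separators, is what keeps the check stable across exports and re-imports.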

None of those four properties is conceptually novel. They have been requirements for high-assurance systems for decades. What is novel is that an AI agent now has to meet them, and the default model-provider logging does not. That gap is the work between now and August 2.

The four-hour answer

A third-party voice on AISPM put the auditor expectation in one sentence: 'Auditors need answers within four hours.' I am quoting them, not promising it. But the framing is the right one. The question is not whether the answer exists somewhere in your stack. The question is whether it is retrievable, signed, and mapped to the article in time to keep the conversation calm.
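Retrievability is mostly a storage decision made early. As a minimal sketch, assuming events land in a queryable store keyed by a decision identifier, the lookup an auditor triggers is a single parameterized query. The table layout, column names, and rows below are hypothetical, and an in-memory SQLite database stands in for whatever store is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "decision_id TEXT, ts TEXT, actor TEXT, action TEXT, result TEXT, article TEXT)"
)
# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("dec-0001", "2026-05-04T09:15:02Z", "credit-agent-7", "final_decision", "approved", "Article 12"),
        ("dec-0001", "2026-05-04T09:15:00Z", "credit-agent-7", "tools/call score_applicant", "score=712", "Article 12"),
        ("dec-0002", "2026-05-04T10:01:00Z", "credit-agent-7", "tools/call score_applicant", "score=540", "Article 12"),
    ],
)


def events_for_decision(db: sqlite3.Connection, decision_id: str) -> list[tuple]:
    """Return every logged event for one decision, oldest first."""
    cursor = db.execute(
        "SELECT ts, actor, action, result FROM events "
        "WHERE decision_id = ? ORDER BY ts",
        (decision_id,),
    )
    return cursor.fetchall()


for row in events_for_decision(conn, "dec-0001"):
    print(row)
```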

Hikma's job is to make sure the evidence is there before the question is asked. The MCP assessment lives at the boundary the agent crosses on the way to the tool. The Article 12-compatible logging captures what the agent did once it got there. The export converts both into a document an auditor can read. The platform sits where the agent meets the rest of your infrastructure, because that is the only place all of those questions can be answered at the same time.

One in four MCP servers, August 2, four-hour answers. Three numbers, one calendar. The work between them is the work of the next ninety days.
