MCP Builder Security Risks May 2026 – Improving MCP Security

We Audited Our Own MCP Server Security Risks and Found Four Things We Missed

A technical account of what we found, what we fixed, and how to earn the AI SAFE2 MCP badge.

MCP Security – Executive Summary For Developers and Builders

This article documents the complete security audit of the AI SAFE2 Model Context Protocol (MCP) server following the OX Security April 2026 disclosure, the four specific vulnerabilities found in our own codebase, the fixes applied, and the 137 tests now confirming correct behavior. It then provides a step-by-step guide for any MCP server builder to apply the same process, earn the AI SAFE2 MCP badge, and implement the full CP.5.MCP control profile.

Four vulnerabilities were identified. Finding 1 (Moderate): no output sanitization on tool returns, creating a supply chain injection path to LLM clients. Finding 2 (Low-Moderate): STDIO transport granted Pro access without any identity binding. Finding 3 (Low): rate limiting was declared in pyproject.toml but never wired into the application. Finding 4 (High): the HTTP tier authentication was completely broken at the data flow level, causing every HTTP-transport tool call to silently return free-tier responses regardless of token. Finding 4 was not identified by the external security assessment; it required tracing the full request flow from middleware to ContextVar to tool.

The deeper lesson: the OX disclosure is a supply chain event, not a bug report. Builders who audit only against the OX CVE list will miss internal architectural failures that have equal or greater operational impact. The MCP Security Toolkit (mcp-score, mcp-scan, mcp-safe-wrap) provides systematic coverage of both external threat classes and internal implementation patterns.

Source code: github.com/CyberStrategyInstitute/ai-safe2-framework/tree/main/skills/mcp

We built the AI SAFE2 MCP server to connect Claude Code, Codex, and any MCP-compatible client directly to the AI SAFE2 v3.0 control taxonomy. 161 controls. 32 compliance frameworks. Live governance tooling at the agent boundary. We were confident in its security posture when we shipped v3.0.

Then OX Security published their April 2026 disclosure. We ran a full audit of our own codebase against the threat taxonomy. We found four things we had missed. One of them (Finding 4, which the external analysis did not catch) was the most impactful. It had been silently degrading service for every HTTP-transport Pro-tier user.

The Four MCP Security Risk Findings in Detail

Finding 1: No Output Sanitization on Tool Returns (Moderate)

Every tool handler returned raw data to LLM clients without scanning for injection patterns. The data source (ai-safe2-controls-v3.0.json) is a static file we control. But a supply chain attack on that file (malicious PR merge, poisoned CI/CD pipeline, compromised maintainer account) would make the server a delivery mechanism for prompt injection payloads embedded in what appears to be trusted governance content.

The code_review tool was the highest-risk path. It directly injects control descriptions and builder_problem fields as LLM reasoning context. A poisoned control description in that context reaches Claude Code, Codex, or Cursor as trusted governance guidance. This is precisely the ATPA (Advanced Tool Poisoning Attack) vector documented by CyberArk: the payload arrives in the tool response body, not in the schema, completely bypassing pre-deployment static analysis.

The fix: a sanitize.py module implementing sanitize_output() applied as the final return expression in every tool handler. The function scans recursively through nested dicts and lists, detecting and redacting 28 injection pattern families. Every match generates a structured audit log event. This addresses AI SAFE2 v3.0 CP.5.MCP-2 (Output Sanitization Before LLM Return).
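The recursive scan-and-redact behavior described above can be sketched as follows. This is a minimal illustration, not the actual sanitize.py: the pattern list is a tiny illustrative subset of the 28 families, and the constant names are assumptions.

```python
import re

# Illustrative subset of the injection pattern families described above.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"\u200b"),  # zero-width space smuggling
]

REDACTED = "[REDACTED:injection-pattern]"

def sanitize_output(value):
    """Recursively walk nested dicts and lists, redacting matching strings."""
    if isinstance(value, dict):
        return {k: sanitize_output(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_output(v) for v in value]
    if isinstance(value, str):
        for pattern in INJECTION_PATTERNS:
            if pattern.search(value):
                return REDACTED  # real code would also emit an audit event
    return value
```

Applied as the final return expression of a tool handler, a poisoned control description is redacted before it ever reaches the client.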

Finding 2: STDIO Transport Grants Pro Access Without Identity Binding (Low-Moderate)

In the original auth.py, the STDIO code path was a single conditional with no verification: if TRANSPORT == 'stdio': request.state.tier = 'pro'. Any process that launched with the right command received Pro-tier access unconditionally. Zero verification. Zero identity binding.

The OX Security research demonstrates that zero-click IDE injection (CVE-2026-30615 against Windsurf) can modify .claude/settings.json to insert a malicious STDIO server entry that executes on next agent startup. Our unconditional trust assumption was the attack surface. A malicious server launched in place of ours would inherit Pro-tier access without any credential.

The fix: verify_stdio_security() runs at STDIO startup before accepting any request. It checks that the invoked executable is in the ALLOWED_STDIO_COMMANDS allowlist, that sys.argv matches an expected module pattern, and optionally verifies that the resolved executable path is within a declared install path boundary. An optional source integrity hash (MCP_SOURCE_HASH environment variable, SHA-256 of all .py files and the controls JSON) enables tamper detection. Mismatch on startup triggers sys.exit(1). Fail closed. This addresses AI SAFE2 v3.0 CP.5.MCP-4 (STDIO Transport Integrity Binding).
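A minimal sketch of the allowlist and module-pattern checks described above. The names ALLOWED_STDIO_COMMANDS and verify_stdio_security come from the article; the check logic shown here is an illustrative assumption, and the reference implementation lives in skills/mcp/src/mcp_server/auth.py.

```python
import os
import sys

ALLOWED_STDIO_COMMANDS = {"python", "python3", "python3.12"}
EXPECTED_MODULE = "mcp_server"

def verify_stdio_security(argv=None, executable=None) -> bool:
    """Return True only if the launch context matches expectations."""
    argv = sys.argv if argv is None else argv
    executable = sys.executable if executable is None else executable
    # Check 1: the invoked executable must be on the allowlist.
    if os.path.basename(executable) not in ALLOWED_STDIO_COMMANDS:
        return False
    # Check 2: invocation as `python -m <module>` must name the expected module.
    if "-m" in argv and EXPECTED_MODULE not in argv:
        return False
    return True

# At startup the server fails closed:
# if not verify_stdio_security():
#     sys.exit(1)
```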

Finding 3: Rate Limiting Declared but Not Wired (Low)

slowapi>=0.1.9 was declared in pyproject.toml as a dependency. It was never imported anywhere in app.py. The Starlette application had no Limiter instance, no decorators, no middleware. Rate limiting was entirely Caddy-dependent. Direct uvicorn connections (during local development, Railway deployments without Caddy, or any deployment where MCP_HOST is set to 0.0.0.0) had zero application-layer protection against brute-force and DoS.

This is exactly the gap documented in AI SAFE2 v3.0 CP.5.MCP-6 (MCP Server Network Isolation): Caddy and nginx rate limits are bypassed by direct port access. The fix: ratelimit.py implements a thread-safe token bucket rate limiter applied in BearerAuthMiddleware after tier resolution. Key format is tier:ip. Free and Pro buckets are completely isolated: exhausting a free-tier key has zero effect on a Pro key from the same IP. Rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Window, Retry-After) are attached to every response.
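The tier:ip token bucket described above might look like the following minimal, thread-safe sketch. The production-validated version is in the toolkit's ratelimit.py; class and parameter names here are illustrative.

```python
import threading
import time

class TokenBucket:
    """Minimal thread-safe token bucket rate limiter keyed by `tier:ip`."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._buckets = {}  # key -> (tokens, last_refill_timestamp)
        self._lock = threading.Lock()

    def allow(self, key: str) -> bool:
        """Consume one token for `key` if available; refill lazily on access."""
        now = time.monotonic()
        with self._lock:
            tokens, last = self._buckets.get(key, (float(self.capacity), now))
            tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
            if tokens >= 1.0:
                self._buckets[key] = (tokens - 1.0, now)
                return True
            self._buckets[key] = (tokens, now)
            return False
```

Because the key embeds the tier, exhausting free:203.0.113.7 has no effect on pro:203.0.113.7, matching the isolation property described above.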

Finding 4: HTTP Tier Authentication Was Completely Broken (High)

This is the finding that the external security assessment did not identify. It required tracing the full data flow from BearerAuthMiddleware through to tool execution.

The original code used a module-level global: _current_request: Request | None = None. This variable was initialized to None and never updated anywhere in the codebase. get_tier_from_request(None) fell back to 'free'. Every HTTP-transport tool call, regardless of whether the request carried a valid Pro token, silently returned free-tier responses.

BearerAuthMiddleware correctly validated the token and set request.state.tier. That value went nowhere useful. It never reached the tools. The ContextVar that was supposed to propagate tier across the async request lifecycle was present in the codebase but was never populated by the middleware.

This bug meant every Pro-tier customer using the HTTP transport was getting free-tier responses for the entire time the HTTP transport existed. The fix: context.py introduces a ContextVar for per-request tier propagation. BearerAuthMiddleware calls set_tier(tier) immediately after token validation. Tool functions call get_tier() which reads the ContextVar. Each asyncio coroutine gets its own copy. No cross-request contamination. No race conditions. This addresses AI SAFE2 v3.0 CP.5.MCP-4 (Transport Integrity) at the application data flow level.
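The ContextVar fix can be illustrated with a self-contained sketch. The function names set_tier and get_tier come from the article; the surrounding harness is an assumed minimal setup showing that concurrent coroutines do not contaminate each other's tier.

```python
import asyncio
from contextvars import ContextVar

# Defaults to 'free'; the middleware sets it after token validation.
_tier: ContextVar[str] = ContextVar("tier", default="free")

def set_tier(tier: str) -> None:
    _tier.set(tier)

def get_tier() -> str:
    return _tier.get()

async def handle_request(tier: str) -> str:
    set_tier(tier)          # middleware step
    await asyncio.sleep(0)  # yield, simulating other concurrent requests
    return get_tier()       # tool step: reads this request's own copy

async def main() -> list:
    # Each coroutine scheduled by gather gets its own ContextVar copy.
    return await asyncio.gather(handle_request("pro"), handle_request("free"))
```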

The Test Results

86 new security tests were written covering all four findings plus regression coverage. For Finding 4: ContextVar tier propagation and thread isolation. For Finding 1: 30 or more injection pattern families across all detection classes; sanitization of nested dicts, lists, and edge cases; and false-positive validation against real control descriptions. For Finding 2: command allowlist and module pattern verification, plus source hash computation, determinism, and tamper detection. For Finding 3: token bucket limits, tier isolation, refill, GC, and headers. Finally, end-to-end pipeline verification from middleware to ContextVar to tool to sanitize covers all four findings together.

Combined with the original 51 functional tests: 137 passing. Zero failures. Complete test suite: github.com/CyberStrategyInstitute/ai-safe2-framework/tree/main/skills/mcp/tests

How to Secure Your Own MCP Server

Step 1: Scan Your Source Code

 

pip install aisafe2-mcp-tools
mcp-scan /path/to/your/mcp/server

The scanner covers 20 finding classes across the full threat taxonomy. RCE-001 through RCE-006 cover dynamic command construction in StdioServerParameters, shell=True in subprocess calls, eval() and exec() on external data, unsafe yaml.load() without SafeLoader, path traversal via unvalidated path construction, and kubectl argument injection. INJ-001 through INJ-005 cover missing output sanitization, SSRF-enabling URL parameters, OAuth audience validation failures, rug pull exposure (dynamic tool registration without schema change monitoring), and Full Schema Poisoning surface. SEC-001 through SEC-005 cover 0.0.0.0 host binding, session ID exposure, OAuth confused deputy patterns, cross-tenant isolation gaps, and file path construction without containment checks. Critical findings require manual review and are never auto-applied.

Step 2: Fix Critical Findings First

For RCE-001 (the root OX finding), make the command parameter a string constant:

# WRONG: params = StdioServerParameters(command=user_supplied_cmd, args=[])

# CORRECT: params = StdioServerParameters(command='python', args=['-m', 'my_mcp_server'])

For RCE-002 (shell=True), replace with a list of arguments. Never join user input into a shell string. For path traversal (RCE-005), always normalize and verify the path stays within the intended directory using os.path.realpath() and a startswith() containment check.
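The realpath-plus-startswith containment check for RCE-005 can be sketched like this (safe_join is a hypothetical helper name, not a toolkit API):

```python
import os

def safe_join(base_dir: str, user_path: str) -> str:
    """Resolve user_path under base_dir, rejecting traversal outside it."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    # Containment check: the resolved path must stay inside base_dir.
    if not candidate.startswith(base + os.sep) and candidate != base:
        raise ValueError(f"path escapes {base_dir}: {user_path}")
    return candidate
```

Note the `base + os.sep` comparison: a bare startswith(base) would wrongly accept a sibling directory such as /data-evil when base is /data.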

Step 3: Add Output Sanitization to Every Tool

from aisafe2_mcp_tools.shared.patterns import sanitize_value
@mcp.tool(description='...')
def my_tool(query: str) -> dict:
    result = do_the_actual_work(query)
    sanitized, findings = sanitize_value(result, 'my_tool')
    if findings: log.warning('injection_detected', count=len(findings))
    return sanitized

The sanitize_value function handles nested dicts and lists recursively. It detects and redacts 28 injection pattern families including instruction override, role confusion, permission escalation, system prompt exfiltration, LLM special tokens, zero-width characters, role separator injection, FSP schema poisoning markers, ATPA steering language, and MCP-UPD collection and disclosure patterns.

Step 4: Secure Your STDIO Startup

Implement verify_stdio_security() at startup: command allowlist plus module pattern verification plus optional source integrity hash. Mismatch triggers sys.exit(1). The reference implementation is in skills/mcp/src/mcp_server/auth.py. Generate your source hash at release time and store it in the MCP_SOURCE_HASH environment variable.
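One way to generate the release-time source hash, covering all .py files and the controls JSON in deterministic sorted order as described above (the function name and file-selection logic are illustrative assumptions):

```python
import hashlib
from pathlib import Path

def compute_source_hash(root: str) -> str:
    """SHA-256 over all .py files and the controls JSON under root,
    in sorted path order so the digest is deterministic across machines."""
    digest = hashlib.sha256()
    paths = sorted(
        p for p in Path(root).rglob("*")
        if p.suffix == ".py" or p.name == "ai-safe2-controls-v3.0.json"
    )
    for path in paths:
        digest.update(str(path.relative_to(root)).encode())  # bind filenames
        digest.update(path.read_bytes())                     # bind contents
    return digest.hexdigest()
```

Run this at release time and export the result as MCP_SOURCE_HASH; any modified or injected source file changes the digest and the server refuses to start.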

Step 5: Wire Application-Layer Rate Limiting

Rate limiting must be applied at the application layer, independent of any reverse proxy. Caddy and nginx rate limits are bypassed by direct port access. The token bucket implementation from the AI SAFE2 server is available in the toolkit: see examples/mcp-security-toolkit/src/aisafe2_mcp_tools/wrap/ratelimit.py for a production-validated implementation.

Step 6: Bind to Localhost Only

host = os.getenv('MCP_HOST', '127.0.0.1')  # Never default to 0.0.0.0

Use Caddy or nginx as reverse proxy for external access. The proxy handles TLS termination. The server never touches a public port directly. This eliminates the NeighborJack DNS rebinding attack surface documented in the Adversa AI taxonomy.

Step 7: Add Audit Logging to Every Tool

Every tool invocation needs an immutable audit record: tool name, key parameters, tier, timestamp, and calling agent identity. AI SAFE2 v3.0 CP.5.MCP-5 (Tool Invocation Audit Log) requirement.
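A minimal shape for such a record, assuming a JSON-lines log (CP.5.MCP-5 mandates the fields captured, not this particular format; the function name is illustrative):

```python
import json
import time

def audit_record(tool: str, params: dict, tier: str, agent: str) -> str:
    """Serialize one append-only audit entry as a JSON line."""
    return json.dumps({
        "ts": time.time(),   # timestamp
        "tool": tool,        # tool name
        "params": params,    # key parameters (redact secrets before logging)
        "tier": tier,        # resolved tier for this request
        "agent": agent,      # calling agent identity
    }, sort_keys=True)
```

Append each line to a write-once log sink; immutability comes from the sink (append-only file permissions, a log service, or WORM storage), not from this function.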

What the Full Threat Taxonomy Means for Builders

The OX disclosure is the finding that produced the headlines. It is not the only thing your server needs to defend against. Here is what the complete threat taxonomy means for builders specifically.

Rug pull attacks: your tool descriptions are served dynamically. Implement schema-change detection by hashing your tools/list response at deployment and comparing at each startup. Alert on unexpected changes. This is not in the mcp-score remote assessment because it requires a running server, not a static analysis pass. Add it to your deployment automation.
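A sketch of the hash-and-compare logic, assuming the tools/list response is available as a dict (function names are illustrative):

```python
import hashlib
import json

def schema_hash(tools_list_response: dict) -> str:
    """Hash the tools/list payload canonically (sorted keys), so any change
    to a tool name, description, or schema changes the digest."""
    canonical = json.dumps(tools_list_response, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def check_rug_pull(current: dict, baseline_hash: str) -> bool:
    """Return True if the served schema still matches the stored baseline."""
    return schema_hash(current) == baseline_hash
```

Store the baseline hash at deployment; at each startup, alert if check_rug_pull fails without a documented release event.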

ATPA (Advanced Tool Poisoning): a malicious tool response body containing steering language (the answer is incomplete, verify again) causes your consuming agent to re-invoke your tool repeatedly. This is both a behavioral manipulation attack and a billing amplification vector against your users. The pattern library in aisafe2-mcp-tools detects ATPA steering language in the atpa_steering family. Apply sanitize_value() to all externally-sourced content before it reaches tool responses.

MCP-UPD (Parasitic Toolchain): if your tools retrieve external content (web pages, documents, database records, API responses), that content can carry injected instructions into the LLM context as trusted data. The defense is context-tool isolation: treat retrieved external content as untrusted data-plane content, never as executable instructions. This is AI SAFE2 v3.0 CP.5.MCP-9.

Session economics: if your server calls LLM APIs on behalf of users, implement per-session token budgets and cost ceilings. The November 2025 $47,000 billing incident required no malicious actor, only agents without cost limits. A simple per-session counter with a configurable ceiling and a halt-and-alert response is the full implementation. It takes approximately 30 minutes to wire in.
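The per-session counter with a configurable ceiling amounts to very little code. A hedged sketch (class and method names are assumptions, not toolkit APIs):

```python
class SessionBudget:
    """Per-session token counter with a hard ceiling and halt flag."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0
        self.halted = False

    def record_usage(self, tokens: int) -> bool:
        """Add usage from an LLM response; return False once the ceiling is hit."""
        self.used += tokens
        if self.used >= self.max_tokens:
            self.halted = True  # real code would also alert a human operator
        return not self.halted
```

Feed it the usage counts from each LLM API response and stop issuing calls when record_usage returns False.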

Multi-agent provenance: if your server is called by orchestrators or if your tools trigger downstream MCP calls, implement provenance logging. Which agent called which tool, through what delegation chain, with what lineage token. This is CP.5.MCP-10 and CP.9 of the AI SAFE2 v3.0 framework. For ACT-4 orchestrator deployments, lineage token propagation through delegation chains is mandatory.

Score Your Server

mcp-score https://your-mcp-server.example/mcp --token your-token

The scorer checks authentication (0-25 points), TLS (0-15 points), tool injection patterns across 28 families (0-20 points), FSP markers (0-10 points), security response headers (0-10 points), application-layer rate limiting (0-10 points), session ID exposure (0-5 points), and SSRF surface (0-5 points). Builder attestation via /.well-known/mcp-security.json adds up to 25 bonus points for controls that cannot be verified remotely (MCP-1 no dynamic commands: +8, MCP-2 output sanitization library reference: +5, MCP-4 source integrity hash: +4, MCP-5 audit logging: +4, MCP-6 network isolation: +4).

Earn the AI SAFE2 MCP Badge

Step 1: Add .well-known/mcp-security.json

This file is publicly accessible (no auth required). It attests to controls not verifiable remotely and unlocks the attestation bonus points in mcp-score. Include your computed source hash in the MCP-4_source_hash field. Reference the aisafe2-mcp-tools library version in MCP-2_output_sanitization. Set MCP-6_network_isolation to '127.0.0.1 only'.
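A plausible shape for the attestation file. The field names MCP-2_output_sanitization, MCP-4_source_hash, and MCP-6_network_isolation are taken from this section; the remaining fields and all values are illustrative placeholders, not a published schema.

```json
{
  "framework": "AI SAFE2 v3.0",
  "profile": "CP.5.MCP",
  "MCP-1_no_dynamic_commands": true,
  "MCP-2_output_sanitization": "aisafe2-mcp-tools (pinned version)",
  "MCP-4_source_hash": "sha256:<your-release-hash>",
  "MCP-5_audit_logging": true,
  "MCP-6_network_isolation": "127.0.0.1 only",
  "contact": "https://your-server.example/.well-known/security.txt"
}
```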

Step 2: Score with the --badge flag

mcp-score https://your-server.example/mcp --token your-token --badge

Servers scoring 70 or above are eligible. The output includes the badge markdown when eligible.

Step 3: Add the badge to your README

The badge links to the CSI verification page. Anyone can click it and re-run mcp-score to verify the score independently. This is intentional: the badge is not a self-reported claim. It is a claim that anyone can verify in seconds. Badge validity: 90 days from last assessment date. Re-scan required after any server update affecting scored controls.

The CI/CD Gate Pattern

After your server scores consistently, add mcp-score to your CI/CD pipeline as a deployment gate. This gives you two things: a deployment gate that prevents regressions in security posture, and an artifact trail of score reports that serves as governance evidence for auditors and enterprise customers.

# GitHub Actions example
mcp-score $SERVER_URL --token $MCP_TOKEN --ci-fail-below 70

Why This Standard Is the One That Wins

There are several MCP security guides in circulation. Most are checklists. Some have scanning tools. None have a remote scoring capability tied to a published governance framework with a verifiable badge system and a 134-test open-source toolkit that validates the tools work as a system, not just as individual files.

The AI SAFE2 approach differs from ad hoc checklists in three ways that matter for builders. It is grounded in a published governance framework: the CP.5.MCP controls are citable, versioned, and mapped to 32 external compliance frameworks. When an enterprise customer asks what standard your MCP server conforms to, there is a specific, documented answer. It is independently verifiable: the badge links to a live verification endpoint, the scoring logic is open source, and anyone can reproduce the assessment. It is honest about limitations: the assessment documentation explicitly states what mcp-score can and cannot verify remotely, with the well-known attestation mechanism handling the gap.

The developers choosing which MCP servers to connect their production agents to are making consequential decisions. The AI SAFE2 MCP badge is a credible, specific, verifiable signal backed by a published standard, an open-source toolkit, and a running assessment service. Build to it. Earn it. Put it in your README.

Toolkit: github.com/CyberStrategyInstitute/ai-safe2-framework/tree/main/examples/mcp-security-toolkit

Framework: github.com/CyberStrategyInstitute/ai-safe2-framework

Frequently Asked Questions

Q1. The OX disclosure says this is Anthropic’s design decision. Why should I fix it?

Because your users are exposed regardless of where the design decision was made. Anthropic’s policy transfers the remediation burden to developers. Every downstream breach (credential theft, data exfiltration, billing amplification) will be attributed to your platform, not to the SDK. The four findings in our own audit demonstrate that even security-focused builders miss things. The question is not whether Anthropic should have fixed it. It is whether your users are protected.

Q2. What is the actual risk of the broken HTTP tier auth (Finding 4)?

Every Pro-tier customer using HTTP transport received free-tier responses for the entire time the HTTP transport existed. They paid for Pro access, invoked Pro-tier tools, and received truncated results without any error or indication. This is not a security vulnerability in the traditional sense. It is a functional bug with revenue and trust implications that an external security assessment did not catch because it required understanding the full request flow, not just scanning for known vulnerability patterns.

Q3. How does verify_stdio_security() actually prevent the Windsurf attack?

CVE-2026-30615 works by injecting a malicious entry into the MCP configuration file. When the IDE restarts, it launches the malicious server via STDIO. verify_stdio_security() checks the executable allowlist and module pattern before accepting any requests. A malicious executable not in the allowlist triggers sys.exit(1). The optional source hash adds tamper detection: if any source file has been modified since the hash was computed at release, the server refuses to start.

Q4. Why does the source integrity hash use SHA-256 of all .py files and not a signed manifest?

Signed manifests require key management infrastructure. SHA-256 of source files is deployable in minutes with zero infrastructure. The tradeoff: a signed manifest is stronger against more sophisticated attackers; the hash approach is sufficient against the primary threat (malicious file injection via supply chain or IDE attack). For production deployments at scale, signed container images via Docker Content Trust provide equivalent protection with broader tooling support.

Q5. What is the difference between output sanitization in the MCP server and mcp-safe-wrap?

Server-side sanitization (sanitize_output() in the server) prevents injection payloads in the server’s data source from reaching LLM clients. mcp-safe-wrap consumer-side scanning prevents injection payloads from external servers from reaching your LLM client. They address different threat surfaces: server-side defends against your own server being compromised; consumer-side defends against external servers being compromised.

Q6. What does the mcp-score attestation bonus actually verify?

Nothing directly: it is a self-attested claim. The attestation file is a builder’s declaration that specific controls are implemented. The bonus points reward builders who publish verifiable evidence alongside their attestation: open-source code (anyone can audit MCP-1), sanitization library references (verifiable version), source hash (reproducible). The score report labels the points as builder attestation, not verified controls.

Q7. How do I implement schema change detection for rug pull defense?

At server startup, compute SHA-256 of the tools/list response and store it. On each subsequent startup, compare the current response hash to the stored baseline. Alert on mismatch that was not accompanied by a documented release event. For consumers, mcp-safe-wrap logs tool schema hashes to the audit log on each connection so you can compare across sessions.

Q8. Do I need to implement all 13 CP.5.MCP controls to earn the badge?

No. The badge requires a total score of 70 or above. The seven core controls (MCP-1 through MCP-7) plus builder attestation can achieve 70+ without implementing MCP-8 through MCP-13. MCP-8, MCP-9, and MCP-11 are required at ACT-2+; MCP-10 and MCP-12 are required at ACT-3+. If your server does not support multi-agent orchestration or swarm deployments, those controls are not mandatory for your tier.

Q9. How should I handle the SSRF surface if my server legitimately needs to fetch URLs?

Implement a blocklist for known-dangerous destinations: 169.254.169.254 and variants (AWS IMDS), 100.100.100.200 (Alibaba Cloud), all RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), loopback (127.0.0.0/8), and file:// URIs. The shared patterns library (SSRF_BLOCKED_PATTERNS) in aisafe2-mcp-tools implements this blocklist and can be imported directly.
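A minimal version of the blocklist check using only the standard library. is_blocked_url is a hypothetical helper written for illustration; the toolkit's SSRF_BLOCKED_PATTERNS is the production source. Hostname targets would additionally need DNS resolution and re-checking, omitted here.

```python
import ipaddress
from urllib.parse import urlparse

def is_blocked_url(url: str) -> bool:
    """Reject file:// URIs, cloud metadata endpoints, and private/loopback IPs."""
    parsed = urlparse(url)
    if parsed.scheme == "file":
        return True
    host = parsed.hostname or ""
    if host in ("169.254.169.254", "100.100.100.200"):  # AWS IMDS, Alibaba
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; resolve and re-check in real code
    # Covers RFC 1918 ranges, 127.0.0.0/8, and 169.254.0.0/16 variants.
    return addr.is_private or addr.is_loopback or addr.is_link_local
```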

Q10. What happens to my badge if my score drops after an update?

The badge becomes invalid 90 days after the last assessment date. If you update your server and the score drops below 70, the badge is no longer valid when anyone clicks through to re-verify. The CI/CD gate pattern (mcp-score with --ci-fail-below 70) prevents releases that would drop the score below the badge threshold.

Q11. Is the mcp-scan static analysis comprehensive enough to replace a security audit?

No. mcp-scan catches known vulnerability patterns across 20 finding classes. It does not perform manual code review, does not assess business logic flaws, does not evaluate dependency security beyond the known CVE list, and does not test runtime behavior. It is a first-pass automated gate. For production deployments at ACT-3 or ACT-4 tier, pair it with a manual security review.

Q12. How do I implement per-session token budgets for MCP-8 compliance?

In your agent orchestration layer (not in the MCP server itself), maintain a running token count per session using the usage field from LLM API responses. When the count exceeds your declared budget, halt tool invocations and require human authorization to continue. The MCP server can expose a session_status tool that the orchestrator calls to check budget state.

Q13. What is ATPA and how do I detect it in tool responses?

Advanced Tool Poisoning Attack places steering language in tool response bodies that causes the agent to re-invoke the tool (for example, the answer is incomplete, please call search again with a broader query). The sanitize_value() function in the shared patterns library detects ATPA steering language in the atpa_steering pattern family and redacts it before the response reaches the LLM client.

Q14. Do I need separate security.json attestation files for each environment (dev, staging, prod)?

Yes. Each deployed server URL should have its own /.well-known/mcp-security.json reflecting the actual controls in that environment. The source hash field (MCP-4) will differ between environments if the source is different. Keep them separate and accurate.

Q15. How does CP.5.MCP-9 context-tool isolation actually work at the code level?

MCP-9 requires external data retrieved via tools (documents, web pages, database records) to be classified as untrusted data-plane content. In practice: apply sanitize_value() from the shared patterns library to all externally-sourced content before including it in tool responses. Do not pass raw external content directly into tool return values. Document this separation in your .well-known/mcp-security.json attestation.

Q16. What Python version is required and are there known compatibility issues?

Python 3.11 or above is required. asyncio.get_running_loop() (used throughout) is Python 3.10 or above. The ContextVar implementation is tested on Python 3.12. asyncio.StreamWriter.write() is confirmed synchronous (not a coroutine) on Python 3.12.3: verify this on your target version if you deploy on an unusual Python build.

Q17. Where should I direct users who find security issues in my MCP server?

Establish a security.txt at /.well-known/security.txt (RFC 9116) with your disclosure contact and PGP key. Reference it in your .well-known/mcp-security.json contact field. For MCP-specific vulnerabilities that may be protocol-level, coordinate with the Agentic AI Foundation via the Linux Foundation disclosure process. For issues in the aisafe2-mcp-tools toolkit itself, file at github.com/CyberStrategyInstitute/ai-safe2-framework/security/advisories
