Why Raw LLM Outputs Can’t Secure Your Enterprise – and How Cisco’s Foundry Spec Turns AI Agents into Auditable Security Tools

Raw LLM outputs are unsuitable for enterprise security; Cisco’s Foundry Spec provides a guardrail architecture to make AI agents auditable and low‑false‑positive.

12 min read
13 May 2026
Cisco Foundry Spec for AI Vulnerability Scanning

What does the Cisco Foundry Security Spec actually do for AI‑driven vulnerability scanning? → It wraps any LLM in a fixed orchestration, role‑based guardrails and an auditable provenance chain, turning chaotic model output into a bounded, verifiable security report.

Can I use the spec with today’s models such as GPT‑4 or Claude, or only with future frontier models? → The spec is model‑agnostic; it works with any LLM that can be called through a standard API.

Will adopting a spec‑driven pipeline add latency to my security workflow? → Yes, but the added latency (typically 200‑500 ms per request) is offset by the reduction in false positives and the ability to prove compliance to auditors.

Is the Foundry approach just another compliance checklist? → No. It is an architectural pattern that forces you to define roles, coverage floors and economic‑yield thresholds before the model ever sees production data.

How does Plavno help enterprises implement this guardrail architecture? → We combine our AI‑security consulting practice with custom agent development to embed the spec into existing CI/CD pipelines and SIEM integrations.

Quick Answer

Enterprises that rely on raw LLM responses for vulnerability detection are exposing themselves to hallucinations, unbounded output and audit failures. The reliable way to secure AI‑driven security agents is to adopt a spec‑driven orchestration layer—like Cisco’s Foundry Security Spec—that defines explicit roles (orchestrator, detector, validator, etc.), enforces guardrails at the substrate level, and produces a verifiable provenance chain. This transforms a generative model from a demo tool into a production‑grade security evaluator that can be defended before a CISO and an external auditor.

Raw LLM Output Is a Security Liability

When a security analyst drops a prompt into a chat window and asks an LLM to “find the bugs” in a codebase, the model often returns a flood of findings that mix genuine vulnerabilities with fabricated ones. The problem is not the model’s knowledge; it is the lack of a deterministic workflow. In production environments, a security team needs three things that raw output cannot guarantee: bounded scope, reproducibility, and an audit trail. Without these, compliance teams cannot certify that the findings were generated under controlled conditions, and incident responders waste precious hours triaging false alarms. Moreover, the sheer volume of unstructured text makes downstream automation—such as ticket creation in ServiceNow or integration with a CVE database—practically impossible.
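
A first, minimal guardrail is to refuse any model output that does not parse against a strict schema, so downstream automation only ever sees structured findings. The sketch below is illustrative rather than part of the Foundry spec: the Finding fields are hypothetical, and it assumes pydantic v2 for validation.

```python
# Minimal sketch: refuse unstructured model output before it reaches
# ticketing or CVE tooling. The Finding schema is a hypothetical
# example (pydantic v2), not the Foundry spec's actual format.
import json
from pydantic import BaseModel, Field, ValidationError

class Finding(BaseModel):
    cve_id: str = Field(pattern=r"^CVE-\d{4}-\d{4,}$")   # no invented IDs
    file_path: str
    line: int = Field(ge=1)                              # reject nonsense line numbers
    severity: str = Field(pattern=r"^(low|medium|high|critical)$")
    evidence: str                                        # snippet the claim rests on

def parse_findings(raw: str) -> list[Finding]:
    """Anything that does not parse is dropped, never guessed at."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []                     # free-form prose -> zero findings
    if not isinstance(items, list):
        return []
    findings = []
    for item in items:
        try:
            findings.append(Finding.model_validate(item))
        except ValidationError:
            continue                  # in production: route to human review
    return findings
```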

The Foundry Security Spec Provides a Guardrail Architecture

Cisco’s Foundry Security Spec is a concrete answer to the chaos described above. It does not prescribe a particular model; instead, it defines a set of functional requirements and role‑based responsibilities that any LLM‑powered agent must satisfy. The spec includes eight core roles—Orchestrator, Indexer, Cartographer, Detector, Validator, Reporter, Auditor, and Remediator—plus five extension roles for niche tasks such as policy translation. Each role is bound by a constitution of eleven principles that encode real production failures Cisco has observed. The result is a scaffolding that forces the model to operate within a deterministic pipeline, producing a bounded, prioritized list of findings, a clear “done” signal, and a full provenance chain from detection through triage to publication.
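
The role names below come from the spec as described above, but the record layout is a hypothetical sketch of what one hash‑linked provenance entry could look like, not Cisco’s actual format.

```python
# Hypothetical sketch of one provenance-chain entry: each role appends
# an immutable, hash-linked record so a finding can be traced from
# detection through triage to publication. Field names are illustrative,
# not Cisco's wire format.
import hashlib
import json
import time
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class ProvenanceRecord:
    role: str          # e.g. "Detector", "Validator", "Reporter"
    finding_id: str
    payload: dict      # what this role asserted or changed
    prev_hash: str     # digest of the previous record in the chain
    timestamp: float

    def digest(self) -> str:
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

def append(chain: list, role: str, finding_id: str, payload: dict) -> ProvenanceRecord:
    """Link a new record to the chain; any later tampering breaks the hashes."""
    prev = chain[-1].digest() if chain else "genesis"
    record = ProvenanceRecord(role, finding_id, payload, prev, time.time())
    chain.append(record)
    return record
```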

Core Components of an Orchestrated AI Security Pipeline

  • Orchestrator: The entry point that validates the request, enforces rate limits and injects a coverage‑floor policy. It ensures that the downstream detector only sees the subset of assets it is authorized to scan.
  • Detector: The LLM that performs the actual vulnerability analysis. It runs inside a sandbox that restricts file system access and injects a “no‑write” guardrail, preventing the model from inadvertently modifying production code.
  • Validator: A deterministic rule engine that cross‑checks the detector’s claims against an internal knowledge base of known CVEs and policy constraints. If a claim fails validation, the system flags it for human review instead of auto‑publishing.
  • Auditor: A logging service that records every API call together with its input prompt, model temperature, and output tokens. The audit log is immutable and can be exported to a SIEM for compliance reporting.
  • Remediator: An optional role that can automatically generate patches for verified findings, but only after a human‑in‑the‑loop approval step.

These components are wired together through a coordination substrate, typically an event‑driven message bus such as Apache Kafka or a serverless workflow engine like AWS Step Functions. Configured for exactly‑once (in practice, idempotent, effectively‑once) processing, the substrate eliminates duplicate tickets and ensures that the “done” signal is emitted only after every validation step has completed.
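
To make the flow concrete, here is a minimal in‑process sketch of that wiring. Every role is stubbed, and a set of idempotency keys stands in for the substrate’s deduplication; in production each stage would run as its own service behind Kafka or Step Functions.

```python
# In-process sketch of the role pipeline. All role logic is stubbed and
# the idempotency-key set stands in for the substrate's deduplication;
# names and data shapes are illustrative assumptions.
import hashlib

_seen: set[str] = set()   # substrate-level dedup: one scan per asset revision

def _key(asset: str, revision: str) -> str:
    return hashlib.sha256(f"{asset}@{revision}".encode()).hexdigest()

# --- stubbed roles --------------------------------------------------------
def orchestrator_authorize(asset: str) -> dict:
    return {"asset": asset, "roots": ["/src"], "token_budget": 4000}

def detector_scan(scope: dict) -> list[dict]:
    return [{"cve_id": "CVE-2021-44228", "file": "src/app.py", "line": 42}]

def validator_check(claims: list[dict]) -> list[dict]:
    return [c for c in claims if c["cve_id"].startswith("CVE-")]

def auditor_log(key: str, *stages: object) -> None:
    print(f"audit[{key[:8]}]:", stages)   # real Auditor: immutable SIEM export
# ---------------------------------------------------------------------------

def run_scan(asset: str, revision: str) -> list[dict] | None:
    key = _key(asset, revision)
    if key in _seen:                # duplicate request, no duplicate tickets
        return None
    _seen.add(key)
    scope = orchestrator_authorize(asset)   # coverage floor, rate limits
    claims = detector_scan(scope)           # LLM call inside the sandbox
    verified = validator_check(claims)      # rule engine vs. CVE knowledge base
    auditor_log(key, scope, claims, verified)
    return verified                         # "done" only after validation
```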

Trade‑offs: Flexibility vs. Control

Adopting a spec‑driven architecture inevitably introduces friction. Teams accustomed to “fire‑and‑forget” prompts now have to define coverage floors, economic‑yield thresholds and role contracts before the model can be used. This adds roughly 200‑500 ms of latency per request and requires additional engineering effort to integrate the orchestration layer with existing CI/CD pipelines. The trade‑off is a dramatic reduction in false positives, typically from around 30 % down to under 5 %, and a measurable improvement in audit readiness. For regulated industries such as finance or healthcare, the ability to prove that a security scan was performed under a documented, repeatable process outweighs the modest performance hit.
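
Coverage floors and economic‑yield thresholds are terms from the spec; the concrete keys and numbers in this policy‑as‑code sketch are assumptions chosen for illustration.

```python
# Hypothetical policy-as-code snippet: key names and numbers are
# illustrative assumptions, not the spec's schema. The point is that the
# limits live in reviewable configuration, not in tribal knowledge.
POLICY = {
    "coverage_floor": 0.90,       # at least 90 % of in-scope files scanned
    "economic_yield_min": 0.25,   # verified findings per analyst-hour
    "max_latency_ms": 500,        # budget for orchestration overhead
    "allowed_roots": ["/src"],    # Orchestrator refuses anything else
}

def gate_release(files_scanned: int, files_in_scope: int,
                 verified: int, analyst_hours: float) -> bool:
    """Emit the 'done' signal only when the run clears both thresholds."""
    coverage = files_scanned / files_in_scope if files_in_scope else 0.0
    yield_rate = verified / analyst_hours if analyst_hours else 0.0
    return (coverage >= POLICY["coverage_floor"]
            and yield_rate >= POLICY["economic_yield_min"])
```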

Real‑World Scenario: Auditing a New LLM‑Based Vulnerability Scanner

Imagine a mid‑size fintech firm that wants to replace its legacy static analysis tool with a GPT‑4‑backed scanner. The engineering team builds a thin wrapper that sends source files to the model and receives a JSON list of potential CVEs. Within two weeks, the security team is overwhelmed by hundreds of “findings” that include invented CVE IDs and nonsensical line numbers. The firm decides to adopt the Foundry spec.

First, the Orchestrator is configured to only scan files that reside in the “/src” directory and to enforce a maximum token budget of 4 000 per file. The Detector runs the LLM inside a Docker container with a read‑only file system. The Validator cross‑references each claim with the NVD API; any claim that does not map to a known CVE is automatically relegated to a “review” queue. The Auditor logs every request, including the model temperature (set to 0.2 for determinism) and the exact prompt used. After a week of operation, the firm sees a 70 % drop in false positives, a 40 % reduction in time‑to‑triage, and a clean audit trail that satisfies the regulator’s requirement for reproducible security testing.
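
The Validator step in this scenario can be sketched as a format check plus a lookup against NVD’s public CVE API (REST API 2.0). The code below is a simplified assumption of how that check might look; real deployments need an API key, caching and rate limiting.

```python
# Sketch of the scenario's Validator: a claimed CVE ID must both match
# the CVE format and resolve against NVD before it can auto-publish.
# Uses NVD's public REST API 2.0; production use needs an API key and
# rate limiting.
import re
import requests

CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")
NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def validate_claim(cve_id: str) -> str:
    """Return 'verified', 'review', or 'rejected' for one detector claim."""
    if not CVE_RE.match(cve_id):
        return "rejected"                   # invented ID, e.g. CVE-20xx-ABCD
    resp = requests.get(NVD_URL, params={"cveId": cve_id}, timeout=10)
    if resp.status_code != 200:
        return "review"                     # NVD unavailable: human queue
    if resp.json().get("totalResults", 0) == 0:
        return "review"                     # well-formed but unknown CVE
    return "verified"
```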

Plavno’s Perspective on Building Secure AI Agents

At Plavno we have helped dozens of enterprises embed AI agents into their security operations. Our AI‑security consulting practice emphasizes the same guardrail principles that Cisco codifies in the Foundry spec. We start by mapping the organization’s existing detection workflow to the eight core roles, then encode the spec’s constitution of principles as policy‑as‑code. Leveraging our AI security solutions, we auto‑generate the orchestration layer, integrate it with the client’s SIEM, and provide a turnkey audit log that satisfies both internal governance and external auditors. The result is a production‑ready AI security agent that behaves predictably, scales horizontally, and can be updated without breaking the provenance chain.

Explore our AI consulting, cloud software development, custom software development, and digital enterprise software development services.

Business Impact of a Spec‑Driven Guardrail

Enterprises that move from ad‑hoc LLM prompts to a spec‑driven pipeline report three measurable benefits. First, the cost of triage drops because analysts spend less time chasing hallucinated findings; this translates to a 15‑30 % reduction in security staffing expenses. Second, compliance risk falls dramatically—audit failures drop from an average of 2‑3 per year to zero, because the audit log provides immutable evidence of every scan. Third, the organization gains a competitive edge: by publishing a “secure AI‑driven vulnerability report” that is auditable, the firm can market its security posture to customers and regulators, unlocking new business opportunities in highly regulated markets.

How to Evaluate This in Practice

When a CTO or CISO asks whether to invest in a spec‑driven AI security pipeline, the decision should be framed as a cost‑benefit analysis of risk reduction versus engineering effort. The first step is to quantify the current false‑positive rate and the average time‑to‑triage. Next, map the existing workflow to the eight Foundry roles and estimate the effort required to implement each role as a microservice. Finally, run a pilot on a single high‑value asset (for example, the payment gateway codebase) and measure the reduction in false positives, the latency added, and the completeness of the audit log. If the pilot shows a net reduction in total cost of ownership—taking into account staffing, compliance fines and potential breach costs—then scaling the architecture across the organization is justified.
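
As a worked example of that calculation, the sketch below compares monthly triage waste before and after a pilot, using the false‑positive rates cited earlier in this article. Every other input is a placeholder to replace with your own measurements.

```python
# Back-of-envelope pilot evaluation. Inputs are placeholder assumptions;
# substitute the numbers measured in your own pilot.
findings_per_month  = 400
fp_rate_before      = 0.30    # typical raw-LLM false-positive rate cited above
fp_rate_after       = 0.05    # typical spec-driven rate cited above
hours_per_false_pos = 1.5     # analyst time wasted per hallucinated finding
analyst_hourly_cost = 90.0    # fully loaded, USD

def monthly_triage_waste(fp_rate: float) -> float:
    return findings_per_month * fp_rate * hours_per_false_pos * analyst_hourly_cost

before = monthly_triage_waste(fp_rate_before)      # $16,200/month
after = monthly_triage_waste(fp_rate_after)        # $2,700/month
print(f"monthly savings: ${before - after:,.0f}")  # $13,500/month
```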

Real‑World Applications Beyond Security

The guardrail pattern championed by the Foundry spec is not limited to vulnerability scanning. Any enterprise AI agent that performs autonomous actions—such as code generation, data extraction, or automated ticket routing—benefits from the same role‑based orchestration. For instance, a generative code assistant can be wrapped in an Orchestrator that enforces a “no‑write‑outside‑sandbox” rule, while a Validator checks generated snippets against internal style guides before they are merged. By reusing the same coordination substrate, organizations can build a family of compliant AI agents that share a common audit framework, reducing operational overhead and simplifying governance.
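
As one concrete instance of the “no‑write‑outside‑sandbox” rule, the sketch below wraps file writes in a path check. The sandbox root and error handling are illustrative assumptions.

```python
# Illustrative "no-write-outside-sandbox" guardrail for a code
# assistant: any write that resolves outside the sandbox root is
# refused before it happens. Path and policy are assumptions.
from pathlib import Path

SANDBOX = Path("/tmp/agent-sandbox").resolve()

def guarded_write(path: str, content: str) -> None:
    target = (SANDBOX / path).resolve()
    # resolve() collapses "../" and symlink tricks; reject any escape
    if not target.is_relative_to(SANDBOX):
        raise PermissionError(f"write outside sandbox refused: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
```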

Risks and Limitations

While the spec‑driven approach mitigates many of the pitfalls of raw LLM usage, it is not a silver bullet. The architecture assumes that the underlying model is sufficiently capable of producing accurate findings when constrained; a poorly tuned model will still generate low‑quality output, and the Validator will only filter, not improve, those results. Additionally, the need to maintain a large set of functional requirements (approximately 130 in the current spec) can become a burden for small teams. Finally, the approach relies on the organization’s ability to enforce strict role boundaries; if a developer inadvertently grants the Detector write permissions, the guardrails collapse.

Closing Insight

The real breakthrough in AI‑driven security is not the raw intelligence of the LLM, but the discipline of wrapping that intelligence in a deterministic, auditable pipeline. Cisco’s Foundry Security Spec shows that the future of secure AI agents lies in architecture, not in model size. Engineers who treat the spec as a contract rather than a checklist will be able to deliver verifiable security findings, satisfy auditors and keep the organization’s risk profile under control. Those who continue to rely on ad‑hoc prompts will soon find themselves drowning in hallucinations and compliance gaps.

Eugene Katovich

Sales Manager

Ready to secure your AI agents?

Contact Plavno’s AI security experts to launch a pilot that integrates the Foundry spec into your existing security stack and delivers auditable, low‑false‑positive vulnerability scans.

Schedule a Free Consultation

Frequently Asked Questions

How much does implementing the Cisco Foundry Spec add to the cost of an AI vulnerability scanner?

The spec itself is free, but you need engineering resources to build the orchestration layer—typically $80‑120 k for a pilot and $150‑200 k for full rollout.

What is the typical implementation timeline for a spec‑driven guardrail architecture?

A proof‑of‑concept can be delivered in 4‑6 weeks; a production‑grade deployment across one asset class usually takes 8‑12 weeks.

What risks remain after adopting the Foundry spec?

Residual risks include model hallucinations that slip past validation, mis‑configured role permissions, and the overhead of maintaining a large functional requirement set.

Can the spec be integrated with existing CI/CD and SIEM tools?

Yes—Plavno provides connectors for Jenkins, GitHub Actions, Azure DevOps, and SIEMs such as Splunk, Elastic, and QRadar via standard webhooks or Kafka streams.

How does the solution scale for large codebases or multi‑cloud environments?

The orchestration layer is stateless and scales horizontally; a message bus such as Kafka, configured for idempotent, effectively‑once processing, prevents duplicate work even across thousands of files and cloud regions.