8 min read · Abhijit

Provenance Graphs: Why Syscall Streams Are Not Enough for Agent Security

Event streams show what happened. Provenance graphs show why. Here's how Ring Zero reconstructs causal chains from kernel events to detect attacks that no individual alert would catch.

Provenance Graph · Graph RAG · Detection Engineering · AI Security

The Lede

Most security tools on the market today process kernel events as a stream: a time-ordered sequence of syscalls, file accesses, and network connections. Each event is scored independently, and an alert fires when a single event crosses a threshold. This model works for detecting malware (a single malicious binary) or known-bad infrastructure (a connection to a C2 domain). It fails catastrophically for AI agent attacks.

Agent attacks are not single events. They are causal chains — sequences of individually benign actions that become malicious only when connected. No individual event in the chain is anomalous. The anomaly is the chain itself.

The Problem with Event Streams

Consider this sequence of kernel events from an AI agent session:

14:32:01.001  open("/home/dev/project/README.md", O_RDONLY)
14:32:01.847  open("/home/dev/.ssh/id_rsa", O_RDONLY)
14:32:02.103  execve("/usr/bin/base64", ...)
14:32:02.891  connect(45.33.32.156:443)

In a stream-based system, each event is scored independently:

  • README read: score 0 (benign)
  • SSH key read: score 15 (sensitive file, but developers access SSH keys)
  • base64 execution: score 10 (common utility)
  • Network connection: score 20 (unknown IP, moderate risk)

No individual event crosses the alert threshold. The attack succeeds silently.
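
The failure mode can be sketched in a few lines of Python. The per-event scores are the ones listed above; the threshold of 50 and all names in the code are assumptions made for illustration, not Ring Zero's actual values.

```python
from dataclasses import dataclass

# Hypothetical alert threshold; the real value is not stated in this post.
ALERT_THRESHOLD = 50


@dataclass(frozen=True)
class Event:
    timestamp: str
    description: str
    score: int  # per-event risk score, taken from the list above


EVENTS = [
    Event("14:32:01.001", "open README.md", 0),
    Event("14:32:01.847", "open ~/.ssh/id_rsa", 15),
    Event("14:32:02.103", "execve base64", 10),
    Event("14:32:02.891", "connect 45.33.32.156:443", 20),
]


def stream_alerts(events, threshold=ALERT_THRESHOLD):
    """Score each event in isolation, as a stream-based detector would."""
    return [e for e in events if e.score >= threshold]


alerts = stream_alerts(EVENTS)
print(f"alerts fired: {len(alerts)}")
```

Under these assumed numbers the alert list is empty, even though the four events together describe a key exfiltration.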

Provenance Graphs: Connecting the Dots

A provenance graph represents the same events as a directed graph where edges encode causal relationships:

[FILE_READ] README.md
  └─ contains injection payload
       └─ causes [AGENT_DECISION] to execute unauthorized actions
            ├─ [FILE_READ] ~/.ssh/id_rsa
            │    └─ content piped to base64
            │         └─ [EXEC] base64 (pid 48291)
            │              └─ output piped to curl
            │                   └─ [EXEC] curl POST (pid 48292)
            │                        └─ [NET_CONNECT] 45.33.32.156:443
            └─ [SSL_UPROBE] prompt was "refactor this code"
                 └─ INTENT MISMATCH: refactoring ≠ SSH key exfiltration

Now the attack is visible. Not because any individual event is anomalous, but because the causal chain from a code refactoring prompt to SSH key exfiltration is implausible.
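
As a rough sketch of why the graph makes this visible, the snippet below walks backward from the network connection and collects everything that causally led to it. The node labels, the edge layout, and the keyword-based mismatch rule are all assumptions for illustration; in Ring Zero the verdict comes from an LLM, not from a string match.

```python
from collections import deque

# Causal edges from the example above, stored as cause -> effects.
# Node names are illustrative labels, not Ring Zero identifiers.
EDGES = {
    "PROMPT: refactor this code": ["AGENT_DECISION"],
    "FILE_READ: README.md": ["AGENT_DECISION"],
    "AGENT_DECISION": ["FILE_READ: ~/.ssh/id_rsa"],
    "FILE_READ: ~/.ssh/id_rsa": ["EXEC: base64"],
    "EXEC: base64": ["EXEC: curl POST"],
    "EXEC: curl POST": ["NET_CONNECT: 45.33.32.156:443"],
}


def causal_ancestors(edges, target):
    """Return every node with a causal path to `target` (breadth-first)."""
    parents = {}
    for cause, effects in edges.items():
        for effect in effects:
            parents.setdefault(effect, []).append(cause)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def chain_is_suspicious(edges, net_node):
    """Toy stand-in for the LLM verdict: a connection that descends from
    both a private-key read and a prompt that never mentions SSH."""
    ancestors = causal_ancestors(edges, net_node)
    read_key = any(a.startswith("FILE_READ") and ".ssh" in a for a in ancestors)
    prompts = [a for a in ancestors if a.startswith("PROMPT")]
    prompt_covers_ssh = any("ssh" in p.lower() for p in prompts)
    return read_key and bool(prompts) and not prompt_covers_ssh


print(chain_is_suspicious(EDGES, "NET_CONNECT: 45.33.32.156:443"))
```

The point of the sketch is that the decision uses the set of ancestors of the connection, which a per-event score never sees.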

How Ring Zero Builds the Graph

Node types

Every captured event or entity becomes a node with a type:

  • PROCESS: agent process, subprocess spawns
  • FILE: file open/read/write operations
  • NETWORK: outbound connections, DNS lookups
  • PROMPT: captured via SSL uprobes (intent layer)
  • VULN: OSV vulnerability check results (supply chain context)
  • VIOLATION: baseline policy violations

Edge construction

Edges are created based on causal relationships:

  • Process → File: process opened/read/wrote the file
  • Process → Process: parent spawned child (fork/exec)
  • Process → Network: process initiated the connection
  • Prompt → Process: prompt content led to agent action
  • File → Process: file content was input to subprocess (pipe tracing)
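
The node and edge types above map naturally onto a small typed graph. The following is a minimal sketch, assuming a hash map of nodes and an adjacency list of outgoing edges; the class names, relation strings, and node IDs are hypothetical and do not describe Ring Zero's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    PROCESS = "PROCESS"
    FILE = "FILE"
    NETWORK = "NETWORK"
    PROMPT = "PROMPT"
    VULN = "VULN"
    VIOLATION = "VIOLATION"


# Allowed (source, target) type pairs, mirroring the edge list above.
ALLOWED_EDGES = {
    (NodeType.PROCESS, NodeType.FILE),
    (NodeType.PROCESS, NodeType.PROCESS),
    (NodeType.PROCESS, NodeType.NETWORK),
    (NodeType.PROMPT, NodeType.PROCESS),
    (NodeType.FILE, NodeType.PROCESS),
}


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: NodeType
    label: str


class ProvenanceGraph:
    def __init__(self):
        self.nodes = {}                     # node_id -> Node
        self.out_edges = defaultdict(list)  # node_id -> [(relation, node_id)]

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src_id, relation, dst_id):
        """Append a causal edge after checking the type pair is allowed."""
        pair = (self.nodes[src_id].kind, self.nodes[dst_id].kind)
        if pair not in ALLOWED_EDGES:
            raise ValueError(f"unsupported edge {pair[0].name} -> {pair[1].name}")
        self.out_edges[src_id].append((relation, dst_id))


g = ProvenanceGraph()
g.add_node(Node("prompt-1", NodeType.PROMPT, "refactor this code"))
g.add_node(Node("agent", NodeType.PROCESS, "agent process"))
g.add_node(Node("key", NodeType.FILE, "~/.ssh/id_rsa"))
g.add_node(Node("pid-48291", NodeType.PROCESS, "base64"))
g.add_node(Node("net-1", NodeType.NETWORK, "45.33.32.156:443"))

g.add_edge("prompt-1", "led_to", "agent")
g.add_edge("agent", "read", "key")
g.add_edge("agent", "spawned", "pid-48291")
g.add_edge("key", "input_to", "pid-48291")
```

Rejecting type pairs outside the list is a design choice of this sketch: it keeps accidental edges (for example Network → File) from creating causal paths that the collector never observed.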

Per-query subgraph assembly

The full provenance graph for a long agent session can have thousands of nodes. For LLM analysis, Ring Zero extracts a per-query subgraph — the relevant neighborhood around a flagged event. This keeps the context window manageable while preserving causal context.

The subgraph is serialized as structured text and sent to the security LLM (Gemma 4 via Gemini API) for a verdict.
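
One way to picture the extraction step is a bounded breadth-first search around the flagged node, followed by a line-per-edge dump. This is a sketch under assumptions: the hop limit, the node cap, the edge labels, and the `SRC -[relation]-> DST` text format are all illustrative and are not Ring Zero's real serialization.

```python
from collections import deque

# (source, relation, target) triples; labels are illustrative.
EDGES = [
    ("PROMPT refactor this code", "led_to", "PROCESS agent"),
    ("PROCESS agent", "read", "FILE README.md"),
    ("PROCESS agent", "read", "FILE ~/.ssh/id_rsa"),
    ("FILE ~/.ssh/id_rsa", "input_to", "PROCESS base64 (pid 48291)"),
    ("PROCESS agent", "spawned", "PROCESS curl (pid 48292)"),
    ("PROCESS curl (pid 48292)", "connected", "NETWORK 45.33.32.156:443"),
]


def extract_subgraph(edges, flagged, max_hops=4, max_nodes=50):
    """Collect nodes within `max_hops` of `flagged`, following edges in
    both directions, and stop adding once `max_nodes` are collected."""
    neighbors = {}
    for src, _, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    kept = {flagged}
    queue = deque([(flagged, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in kept and len(kept) < max_nodes:
                kept.add(nxt)
                queue.append((nxt, depth + 1))
    return kept


def serialize(edges, kept):
    """Render the edges whose endpoints both survived extraction."""
    return "\n".join(
        f"{src} -[{rel}]-> {dst}"
        for src, rel, dst in edges
        if src in kept and dst in kept
    )


flagged = "NETWORK 45.33.32.156:443"
print(serialize(EDGES, extract_subgraph(EDGES, flagged)))
```

Because the search follows edges rather than timestamps or text similarity, the prompt and the key read end up in the same context as the connection they led to.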

Graph RAG vs. Traditional RAG

Traditional RAG retrieves text chunks from a vector store based on semantic similarity. This works for knowledge retrieval but fails for security analysis because:

  1. Security requires causal reasoning, not semantic similarity — "SSH key read" is semantically similar to "SSH key generation," but one is an attack and the other is not
  2. Context must be structurally connected, not topically related — the LLM needs to see the full chain from prompt to exfiltration, not a bag of related events
  3. Temporal ordering matters: the same events in a different order have different security implications

Graph RAG solves these problems by retrieving a structurally connected subgraph rather than semantically similar text chunks. The LLM sees the causal chain, not a topic cluster.

Performance

Provenance graph construction adds minimal overhead:

  • Node insertion: O(1) — hash map backed
  • Edge insertion: O(1) — adjacency list
  • Subgraph extraction: O(k) where k is the neighborhood size (typically 20-50 nodes)
  • Full graph memory: ~100 bytes per node, capped at the configured context window

The graph is held in memory for the duration of the agent session and garbage-collected when the session ends. Persisting it to sled for cross-session correlation is planned for future versions.

Takeaways

  • Event streams detect anomalous individual events; provenance graphs detect anomalous causal chains
  • AI agent attacks are chains of individually benign actions — stream-based detection is structurally blind to them
  • Causal edges (not just temporal ordering) are required to distinguish "developer read SSH key intentionally" from "agent was tricked into exfiltrating SSH key"
  • Graph RAG provides structurally connected context to the LLM, enabling causal reasoning rather than pattern matching
  • The provenance graph is the core detection primitive for agent security — without it, you are scoring events in isolation
---

Want to see the provenance graph in action? Book a demo and we'll walk through a live attack chain reconstruction.