Provenance Graphs: Why Syscall Streams Are Not Enough for Agent Security
Event streams show what happened. Provenance graphs show why. Here's how Ring Zero reconstructs causal chains from kernel events to detect attacks that no individual alert would catch.
The Lede
Most runtime security tools today process kernel events as a stream: a time-ordered sequence of syscalls, file accesses, and network connections. Each event is scored independently, and an alert fires only when a single event crosses a threshold. This model works for detecting malware (a single malicious binary) or known-bad infrastructure (a connection to a C2 domain). It breaks down for AI agent attacks.
Agent attacks are not single events. They are causal chains — sequences of individually benign actions that become malicious only when connected. No individual event in the chain is anomalous. The anomaly is the chain itself.
The Problem with Event Streams
Consider this sequence of kernel events from an AI agent session:
14:32:01.001 open("/home/dev/project/README.md", O_RDONLY)
14:32:01.847 open("/home/dev/.ssh/id_rsa", O_RDONLY)
14:32:02.103 execve("/usr/bin/base64", ...)
14:32:02.891 connect(45.33.32.156:443)
In a stream-based system, each event is scored independently:
- README read: score 0 (benign)
- SSH key read: score 15 (sensitive file, but developers access SSH keys)
- base64 execution: score 10 (common utility)
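- Network connection: score 20 (unknown IP, moderate risk)
The highest score is 20. If the alert threshold sits anywhere above that, nothing fires, and the exfiltration passes as four unrelated, low-risk events.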
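The per-event scoring above can be sketched in a few lines of Python. This is a minimal illustration, not Ring Zero's scoring engine: the scores are the ones listed above, while `ALERT_THRESHOLD`, `Event`, and `stream_alerts` are hypothetical names, and the threshold value of 50 is an assumption because the article does not state one.

```python
from dataclasses import dataclass

# Hypothetical alert threshold; the article does not specify a value.
ALERT_THRESHOLD = 50


@dataclass(frozen=True)
class Event:
    timestamp: str
    description: str
    score: int  # per-event risk score, taken from the list above


EVENTS = [
    Event("14:32:01.001", 'open("/home/dev/project/README.md", O_RDONLY)', 0),
    Event("14:32:01.847", 'open("/home/dev/.ssh/id_rsa", O_RDONLY)', 15),
    Event("14:32:02.103", 'execve("/usr/bin/base64", ...)', 10),
    Event("14:32:02.891", "connect(45.33.32.156:443)", 20),
]


def stream_alerts(events, threshold=ALERT_THRESHOLD):
    """Return the events whose own score crosses the threshold.

    Each event is judged in isolation; nothing here looks at how
    one event relates to another.
    """
    return [event for event in events if event.score >= threshold]


alerts = stream_alerts(EVENTS)
print(f"alerts fired: {len(alerts)}")  # alerts fired: 0
```

Lowering the threshold until the SSH key read fires would also flag every legitimate developer access to that key, which is the trade-off a per-event model cannot escape.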
Provenance Graphs: Connecting the Dots
A provenance graph represents the same events as a directed graph where edges encode causal relationships:
[FILE_READ] README.md
  └─ contains injection payload
      └─ causes [AGENT_DECISION] to execute unauthorized actions
          ├─ [FILE_READ] ~/.ssh/id_rsa
          │   └─ content piped to base64
          │       └─ [EXEC] base64 (pid 48291)
          │           └─ output piped to curl
          │               └─ [EXEC] curl POST (pid 48292)
          │                   └─ [NET_CONNECT] 45.33.32.156:443
          └─ [SSL_UPROBE] prompt was "refactor this code"
              └─ INTENT MISMATCH: refactoring ≠ SSH key exfiltration
Now the attack is visible. Not because any individual event is anomalous, but because the causal chain from a code refactoring prompt to SSH key exfiltration is implausible.
How Ring Zero Builds the Graph
Node types
Every kernel event, and every piece of context attached to it, becomes a node with a type:
- PROCESS: agent process, subprocess spawns
- FILE: file open/read/write operations
- NETWORK: outbound connections, DNS lookups
- PROMPT: captured via SSL uprobes (intent layer)
- VULN: OSV vulnerability check results (supply chain context)
- VIOLATION: baseline policy violations
Edge construction
Edges are created based on causal relationships:
- Process → File: process opened/read/wrote the file
- Process → Process: parent spawned child (fork/exec)
- Process → Network: process initiated the connection
- Prompt → Process: prompt content led to agent action
- File → Process: file content was input to subprocess (pipe tracing)
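The node types and edge rules above map naturally onto a small typed graph. The sketch below is one possible in-memory layout, not Ring Zero's actual data model: `NodeType` mirrors the node list above, while `ProvenanceGraph`, `causal_path`, the node IDs, and the relation labels are hypothetical. The `piped_to` edge between two processes comes from the tree example ("output piped to curl") rather than from the edge list.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    PROCESS = "PROCESS"
    FILE = "FILE"
    NETWORK = "NETWORK"
    PROMPT = "PROMPT"
    VULN = "VULN"
    VIOLATION = "VIOLATION"


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: NodeType
    label: str


class ProvenanceGraph:
    def __init__(self):
        self.nodes = {}                 # node_id -> Node
        self.edges = defaultdict(list)  # node_id -> [(relation, target_id)]

    def add_node(self, node_id, node_type, label):
        self.nodes[node_id] = Node(node_id, node_type, label)

    def add_edge(self, src, relation, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in edge {src} -> {dst}")
        self.edges[src].append((relation, dst))

    def causal_path(self, src, dst):
        """Breadth-first search along causal edges; returns node IDs or None."""
        parents = {src: None}
        queue = deque([src])
        while queue:
            current = queue.popleft()
            if current == dst:
                path = []
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            for _relation, target in self.edges.get(current, []):
                if target not in parents:
                    parents[target] = current
                    queue.append(target)
        return None


def build_example_graph():
    """Rebuild the SSH key example from the article as a graph."""
    g = ProvenanceGraph()
    g.add_node("prompt:refactor", NodeType.PROMPT, "refactor this code")
    g.add_node("proc:agent", NodeType.PROCESS, "agent")
    g.add_node("file:README.md", NodeType.FILE, "/home/dev/project/README.md")
    g.add_node("file:id_rsa", NodeType.FILE, "/home/dev/.ssh/id_rsa")
    g.add_node("proc:base64", NodeType.PROCESS, "base64 (pid 48291)")
    g.add_node("proc:curl", NodeType.PROCESS, "curl POST (pid 48292)")
    g.add_node("net:45.33.32.156:443", NodeType.NETWORK, "45.33.32.156:443")

    g.add_edge("prompt:refactor", "led_to", "proc:agent")       # Prompt -> Process
    g.add_edge("proc:agent", "read", "file:README.md")          # Process -> File
    g.add_edge("proc:agent", "read", "file:id_rsa")             # Process -> File
    g.add_edge("proc:agent", "spawned", "proc:base64")          # Process -> Process
    g.add_edge("proc:agent", "spawned", "proc:curl")            # Process -> Process
    g.add_edge("file:id_rsa", "input_to", "proc:base64")        # File -> Process
    g.add_edge("proc:base64", "piped_to", "proc:curl")          # pipe, from the tree
    g.add_edge("proc:curl", "connected_to", "net:45.33.32.156:443")  # Process -> Network
    return g


graph = build_example_graph()
print(graph.causal_path("file:id_rsa", "net:45.33.32.156:443"))
# ['file:id_rsa', 'proc:base64', 'proc:curl', 'net:45.33.32.156:443']
```

The existence of a directed path from a sensitive file to an outbound connection is exactly the signal that per-event scoring cannot express. The search returns `None` in the reverse direction, because edges only run from cause to effect.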
Per-query subgraph assembly
The full provenance graph for a long agent session can have thousands of nodes. For LLM analysis, Ring Zero extracts a per-query subgraph — the relevant neighborhood around a flagged event. This keeps the context window manageable while preserving causal context.
The subgraph is serialized as structured text and sent to the security LLM (Gemma 4 via Gemini API) for a verdict.
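One plausible way to assemble a per-query subgraph is a bounded breadth-first walk outward from the flagged node, followed by a line-per-edge text dump. The sketch below assumes that approach; `extract_subgraph`, `serialize`, the hop and node limits, and the text format are all hypothetical, and the `pytest` and `report.xml` nodes are invented purely to show unrelated activity being trimmed.

```python
from collections import defaultdict, deque

# (source, relation, target) triples for one agent session.
EDGES = [
    ("prompt:refactor", "led_to", "proc:agent"),
    ("proc:agent", "read", "file:README.md"),
    ("proc:agent", "read", "file:id_rsa"),
    ("proc:agent", "spawned", "proc:base64"),
    ("proc:agent", "spawned", "proc:curl"),
    ("file:id_rsa", "input_to", "proc:base64"),
    ("proc:base64", "piped_to", "proc:curl"),
    ("proc:curl", "connected_to", "net:45.33.32.156:443"),
    # Hypothetical unrelated activity elsewhere in the same session.
    ("proc:agent", "spawned", "proc:pytest"),
    ("proc:pytest", "wrote", "file:report.xml"),
]


def extract_subgraph(edges, flagged, max_hops=3, max_nodes=50):
    """Keep edges whose endpoints both lie within max_hops of the flagged node.

    Edge direction is ignored while measuring distance, so both the causes
    and the effects of the flagged event are pulled in.
    """
    neighbors = defaultdict(set)
    for src, _relation, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    hops = {flagged: 0}
    queue = deque([flagged])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(neighbors[current]):
            if nxt not in hops and len(hops) < max_nodes:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)

    return [(s, r, d) for s, r, d in edges if s in hops and d in hops]


def serialize(sub_edges):
    """Render the subgraph as one 'src -[relation]-> dst' line per edge."""
    return "\n".join(f"{s} -[{r}]-> {d}" for s, r, d in sub_edges)


subgraph = extract_subgraph(EDGES, flagged="net:45.33.32.156:443")
print(serialize(subgraph))
```

With a three-hop limit, the prompt, the README read, and the key read all stay in the extract, while `file:report.xml` (four hops away) is dropped. A hop limit is a blunt instrument: `proc:pytest` still makes the cut because it is a sibling of the processes in the chain, so a production extractor would likely rank or filter nodes as well.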
Graph RAG vs. Traditional RAG
Traditional RAG retrieves text chunks from a vector store based on semantic similarity. This works for knowledge retrieval but fails for security analysis because:
- Security requires causal reasoning, not semantic similarity: "SSH key read" is semantically close to "SSH key generation," yet in the chain above the read is a step in an exfiltration while key generation is routine
- Context must be structurally connected, not topically related — the LLM needs to see the full chain from prompt to exfiltration, not a bag of related events
- Temporal ordering matters — the same events in different order have different security implications
Performance
Provenance graph construction adds minimal overhead:
- Node insertion: O(1) — hash map backed
- Edge insertion: O(1) — adjacency list
- Subgraph extraction: O(k) where k is the neighborhood size (typically 20-50 nodes)
- Full graph memory: ~100 bytes per node, capped at the configured context window
Takeaways
- Event streams detect anomalous individual events; provenance graphs detect anomalous causal chains
- AI agent attacks are chains of individually benign actions — stream-based detection is structurally blind to them
- Causal edges (not just temporal ordering) are required to distinguish "developer read SSH key intentionally" from "agent was tricked into exfiltrating SSH key"
- Graph RAG provides structurally connected context to the LLM, enabling causal reasoning rather than pattern matching
- The provenance graph is the core detection primitive for agent security — without it, you are scoring events in isolation
Want to see the provenance graph in action? Book a demo and we'll walk through a live attack chain reconstruction.