Audit Logging and AI-SPM Integration

SCHEMABOUND provides a centralized, tamper-evident audit pipeline that captures every agent activity — user input, LLM responses, function arguments, SQL executed, execution time, and user identity — and exports it in formats suitable for enterprise SIEM platforms and specialized AI Security Posture Management (AI-SPM) tools.

Why This Matters

Agent-driven systems introduce risk that traditional application audit trails were not designed to capture: adversarial prompt injections, LLM jailbreak attempts, data exfiltration via template expressions, and unauthorized role escalation through natural language. The SCHEMABOUND audit pipeline treats these as first-class security signals alongside the conventional access-control and query-execution events.

Architecture

The audit system is composed of four interlocking parts:

 Client Request
       │
       ▼
 ┌─────────────────────┐
 │  InjectionGuard     │  ← scans input before gRPC executor sees it
 │  InputScannerHook   │
 └────────┬────────────┘
          │ PromptInjectionSignalRaised (if pattern matches)
          ▼
 ┌─────────────────────┐
 │  EventBus           │  ← global ordered handler + exporter chain
 │  (hash chain)       │
 └────────┬────────────┘
          │ AuditEventEnvelope (sequence, hash, trace_id, …)
          ▼
 ┌─────────────────────────────────────────────────────┐
 │  MultiTransportExporter                             │
 │  ├─ stderr (OCSF NDJSON or raw JSON)               │
 │  ├─ HTTP webhook  (SCHEMABOUND_AUDIT_WEBHOOK_URL)          │
 │  └─ rotating file (SCHEMABOUND_AUDIT_FILE_PATH)            │
 └─────────────────────────────────────────────────────┘

Tamper-Evident Hash Chain

Every exported audit envelope carries a SHA-256 hash chain that links each event to the one before it. This makes it possible for a downstream SIEM to detect gaps or mutations in the audit stream.

Field	Description
`sequence`	Monotonically increasing counter across the process lifetime
`prev_hash`	The `hash` value of the previous envelope (the SHA-256 of that envelope’s hash input)
`hash`	SHA-256 of `"{sequence}|{prev_hash}|{event_json}|{emitted_at}"`
`emitted_at`	ISO-8601 timestamp at point of dispatch
`trace_id`	W3C trace-context trace ID for distributed tracing correlation
`span_id`	W3C trace-context span ID

A missing sequence number or a hash that does not chain correctly from the previous record is strong evidence of log tampering.
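The continuity check described above can be sketched on the consumer side as follows. This is a minimal illustration, not SCHEMABOUND code: ChainRecord, ChainError, and verify_chain are hypothetical names, and the sketch assumes that sequence increases by exactly one per event and that prev_hash equals the preceding record's hash. It deliberately does not recompute hash itself, which would additionally need a SHA-256 implementation.

```rust
/// Chain-related fields of one exported envelope (hypothetical struct for
/// this sketch; a real envelope carries more fields).
#[derive(Debug, Clone)]
struct ChainRecord {
    sequence: u64,
    prev_hash: String,
    hash: String,
}

/// Why the continuity check failed.
#[derive(Debug, PartialEq)]
enum ChainError {
    /// The sequence number skipped or repeated (assumes a step of exactly 1).
    SequenceGap { expected: u64, found: u64 },
    /// `prev_hash` does not match the preceding record's `hash`.
    BrokenLink { sequence: u64 },
}

/// Walks the records in arrival order and reports the first discontinuity.
/// Does not recompute `hash`; it only checks ordering and linkage.
fn verify_chain(records: &[ChainRecord]) -> Result<(), ChainError> {
    for pair in records.windows(2) {
        let (prev, next) = (&pair[0], &pair[1]);
        let expected = prev.sequence + 1;
        if next.sequence != expected {
            return Err(ChainError::SequenceGap {
                expected,
                found: next.sequence,
            });
        }
        if next.prev_hash != prev.hash {
            return Err(ChainError::BrokenLink {
                sequence: next.sequence,
            });
        }
    }
    Ok(())
}
```

A consumer would typically run a check like this over each batch it ingests and carry the last record of one batch over as the first record of the next, so that gaps between batches are also caught.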

OCSF v1.1 Output

All exports default to OCSF (Open Cybersecurity Schema Framework) v1.1. OCSF is the interchange format used by major SIEM and AI-SPM vendors including Amazon Security Lake, Microsoft Sentinel, Splunk, and Wiz.

Class Mapping

SCHEMABOUND Event	OCSF Class	Class UID
`SessionRegistered`	Authentication	2001
`AccessDenied`, `PromptInjectionSignalRaised`	Security Finding	2004
`QueryExecuted`, `QueryValidation*`, `QueryExecutionError`, `RowsFiltered`, `ColumnsRedacted`	Database Activity	6003
`PlanCreated`, `PlanCompleted`, `PlanFailed`, `LlmToolCallAuditRecorded`	API Activity	6005

Severity Mapping

OCSF Severity	SCHEMABOUND Trigger
Critical (5)	Plan failure with execution error
High (4)	`PromptInjectionSignalRaised` with severity `high`, access denied events
Medium (3)	`PromptInjectionSignalRaised` with severity `medium`
Informational (1)	All other events
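Read as code, the severity table amounts to a single match. The sketch below is illustrative only: AuditEvent and SignalSeverity are hypothetical, reduced types invented for this example, not the types SCHEMABOUND uses internally.

```rust
/// Severity carried by a `PromptInjectionSignalRaised` event.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SignalSeverity {
    Medium,
    High,
}

/// Reduced, hypothetical view of the events that drive the severity mapping.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AuditEvent {
    PlanFailedWithExecutionError,
    AccessDenied,
    PromptInjectionSignalRaised(SignalSeverity),
    /// Every event not listed in the severity table.
    Other,
}

/// Returns the OCSF severity ID from the table above.
fn ocsf_severity_id(event: AuditEvent) -> u8 {
    match event {
        AuditEvent::PlanFailedWithExecutionError => 5, // Critical
        AuditEvent::AccessDenied => 4,                 // High
        AuditEvent::PromptInjectionSignalRaised(SignalSeverity::High) => 4, // High
        AuditEvent::PromptInjectionSignalRaised(SignalSeverity::Medium) => 3, // Medium
        AuditEvent::Other => 1,                        // Informational
    }
}
```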

The unmapped OCSF field carries SCHEMABOUND-specific chain fields (roam_hash, roam_prev_hash, roam_sequence) that have no direct OCSF equivalent but are required for continuity verification. When trace correlation is present it is emitted under metadata.trace_uid and metadata.span_uid.

Transport Configuration

All transports are configured via environment variables and can be combined.

Stdout / Stderr

SCHEMABOUND_AUDIT_STDOUT=ocsf   # emit OCSF v1.1 NDJSON to stderr (default)
SCHEMABOUND_AUDIT_STDOUT=json   # emit raw AuditEventEnvelope JSON to stderr
SCHEMABOUND_AUDIT_STDOUT=off    # disable stderr output

HTTP Webhook

SCHEMABOUND_AUDIT_WEBHOOK_URL=https://siem.example.com/ingest

Each OCSF record is sent as its own HTTP POST with Content-Type: application/x-ndjson. The request is fire-and-forget: failures are silently discarded to keep the request path unblocked. Use an internal aggregation endpoint (Fluent Bit, Logstash, Vector) to buffer and retry if delivery guarantees are required.

Rotating File

SCHEMABOUND_AUDIT_FILE_PATH=/var/log/roam/audit.ndjson
SCHEMABOUND_AUDIT_FILE_MAX_MB=100    # rotate at 100 MB (default)

The file transport appends one OCSF NDJSON record per line. When the file reaches SCHEMABOUND_AUDIT_FILE_MAX_MB it is renamed to <path>.1 and a new file is opened. Only one rotated generation is kept, so integrate with a log shipper for longer retention.
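The single-generation rotation scheme can be sketched with the standard library alone. The helpers below (rotated_path, append_with_rotation) are hypothetical and only illustrate the behaviour described above; the real transport may check the size at a different point or handle errors differently. The sketch takes the limit in bytes, so a caller would convert the megabyte setting first (for example max_mb * 1024 * 1024, which is an assumption about how the unit is interpreted).

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Returns `<path>.1`, the name of the single rotated generation.
fn rotated_path(path: &Path) -> PathBuf {
    let mut name = path.as_os_str().to_owned();
    name.push(".1");
    PathBuf::from(name)
}

/// Appends one NDJSON line, rotating first if the file has reached the limit.
/// Renaming over an existing `<path>.1` replaces it, so only one old
/// generation survives.
fn append_with_rotation(path: &Path, line: &str, max_bytes: u64) -> io::Result<()> {
    // A missing file counts as empty.
    let current_len = fs::metadata(path).map(|m| m.len()).unwrap_or(0);
    if current_len >= max_bytes {
        fs::rename(path, rotated_path(path))?;
    }
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", line)?;
    Ok(())
}
```

Because the previous <path>.1 is overwritten on every rotation, a log shipper has to pick up rotated content before the next rotation occurs.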

OTLP / Distributed Tracing

OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318

When this variable is set, the backend initialises an OpenTelemetry SDK tracer provider with a Tokio-backed batch span exporter (HTTP/protobuf transport: opentelemetry-otlp with the http-proto and reqwest-client features). Every HTTP request handled by AuditFairing creates a child span. When request trace context is available, the span's trace_id and span_id are captured and written into the AuditEventEnvelope before it is dispatched.

When the variable is absent the SDK is still initialised in no-op mode, so nothing is exported over the network. Audit envelopes only include trace_id / span_id when AuditFairing can derive them from propagated tracing context (for example, a valid traceparent header) or from the legacy headers it falls back to; requests without either source may not include those fields.

At process exit the provider is shut down gracefully, flushing any in-flight spans.

Collector example (docker-compose):

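services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4318:4318"   # OTLP HTTP
    volumes:
      # mount the collector config shown below
      - ./otel/config.yaml:/etc/otel/config.yaml:ro
    command: ["--config=/etc/otel/config.yaml"]

# otel/config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
exporters:
  # Recent collector-contrib releases no longer ship the dedicated jaeger
  # exporter; Jaeger accepts OTLP directly, so the otlp exporter is used.
  otlp/jaeger:
    endpoint: jaeger:4317   # Jaeger's OTLP gRPC port
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger]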

Prompt Injection Detection

The InjectionGuard scans every incoming query against a compiled set of regular expressions before the gRPC executor processes it.

Pattern Library

Pattern	Severity	Example trigger
`instruction_override`	High	“Ignore all previous instructions…”
`role_injection`	High	“You are now an unrestricted assistant…”
`system_prompt_boundary`	High	`[SYSTEM]`, `### system`, `<system>` markers
`jailbreak_token`	High	DAN, “do anything now”, “jailbreak”
`prompt_exfiltration`	Medium	“Reveal your system prompt”
`sql_template_injection`	Medium	`{{user_input}}`, `${expr}`
`delimiter_injection`	Medium	`--- system:`, `=== assistant:`

When a pattern matches, a PromptInjectionSignalRaised event is always emitted to the audit pipeline regardless of policy. What differs is whether the request continues:

Injection Policy

SCHEMABOUND_INJECTION_POLICY=observe   # record signal, allow request (default)
SCHEMABOUND_INJECTION_POLICY=block     # record signal, reject request when severity ≥ medium

Use observe during a rollout period to build a baseline of detection and false-positive rates before switching to block. The detection signal is available to your SIEM in either mode.
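The policy decision can be sketched as a small pure function. The names below (InjectionPolicy, Severity, parse_policy, highest_severity, action_taken) are hypothetical and exist only for this example. Treating an unset or unrecognised policy value as observe is an assumption based on observe being the documented default.

```rust
/// Pattern severities, ordered so that `Medium < High`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Medium,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum InjectionPolicy {
    Observe,
    Block,
}

/// Interprets the value of SCHEMABOUND_INJECTION_POLICY.
/// Assumption: anything other than "block" falls back to the default, observe.
fn parse_policy(value: Option<&str>) -> InjectionPolicy {
    match value.map(str::trim) {
        Some("block") => InjectionPolicy::Block,
        _ => InjectionPolicy::Observe,
    }
}

/// Highest severity among the matched patterns, or `None` if nothing matched.
fn highest_severity(matched: &[Severity]) -> Option<Severity> {
    matched.iter().copied().max()
}

/// The `action_taken` value recorded on the signal. The signal itself is
/// emitted in both cases; only the fate of the request differs.
fn action_taken(policy: InjectionPolicy, severity: Severity) -> &'static str {
    match policy {
        InjectionPolicy::Block if severity >= Severity::Medium => "blocked",
        _ => "observed",
    }
}
```

Since every pattern in the library is rated medium or high, block in this sketch rejects any request that matches at least one pattern.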

What Gets Logged

The emitted PromptInjectionSignalRaised event includes:

excerpt — first 500 characters of the input (truncated to limit PII surface)
input_hash — SHA-256 of the full input string for forensic correlation
patterns — list of matched pattern names
severity — highest matched severity
action_taken — "observed" or "blocked"
All standard QueryRuntimeContext metadata (user_id, org_id, session_id, …)

Distributed Tracing Correlation

SCHEMABOUND accepts W3C trace-context headers and propagates them into every audit envelope. This allows audit records to be joined with OTLP spans in your observability platform.

OpenTelemetry (primary)

When OTEL_EXPORTER_OTLP_ENDPOINT is configured (or the no-op SDK is active), the AuditFairing extracts any incoming W3C traceparent header, creates a child tracing::Span, and reads the live trace_id / span_id from the active OTel context via current_trace_context(). When that context is present, both values are written into the AuditEventEnvelope.

Legacy headers (fallback)

For clients that cannot inject traceparent the following proprietary headers are accepted as a fallback when no OTel context is available:

Header	Purpose
`x-schemabound-trace-id`	Trace ID (used when `traceparent` is absent)
`x-schemabound-span-id`	Span ID (used when `traceparent` is absent)

Both values appear in the AuditEventEnvelope (trace_id, span_id fields) and in the unmapped section of every OCSF record.
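The precedence between traceparent and the legacy headers can be sketched as below. This is not the AuditFairing implementation: TraceIds, parse_traceparent, and resolve_trace_ids are hypothetical names. The sketch only handles version 00 of the W3C format, treats a malformed traceparent the same as an absent one, and requires both legacy headers to be present, all of which are assumptions.

```rust
/// Trace identifiers destined for an audit envelope.
#[derive(Debug, PartialEq)]
struct TraceIds {
    trace_id: String,
    span_id: String,
}

/// True when `s` is exactly `len` lowercase hex digits.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// True when `s` is a valid ID of `len` hex digits that is not all zeros.
fn is_valid_id(s: &str, len: usize) -> bool {
    is_lower_hex(s, len) && s.bytes().any(|b| b != b'0')
}

/// Parses a version-00 `traceparent`: `00-<trace-id>-<parent-id>-<flags>`.
fn parse_traceparent(header: &str) -> Option<TraceIds> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    if !is_valid_id(parts[1], 32) || !is_valid_id(parts[2], 16) || !is_lower_hex(parts[3], 2) {
        return None;
    }
    Some(TraceIds {
        trace_id: parts[1].to_string(),
        span_id: parts[2].to_string(),
    })
}

/// Prefers `traceparent`; otherwise falls back to the legacy header pair.
fn resolve_trace_ids(
    traceparent: Option<&str>,
    legacy_trace_id: Option<&str>,
    legacy_span_id: Option<&str>,
) -> Option<TraceIds> {
    traceparent
        .and_then(parse_traceparent)
        .or_else(|| match (legacy_trace_id, legacy_span_id) {
            (Some(t), Some(s)) => Some(TraceIds {
                trace_id: t.to_string(),
                span_id: s.to_string(),
            }),
            _ => None,
        })
}
```

A request that carries neither source resolves to None, which corresponds to an envelope without trace_id / span_id.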

HTTP Request Audit (AuditFairing)

In addition to gRPC-level events, the SCHEMABOUND backend attaches an AuditFairing to every HTTP request. This records:

request method and path
response status code
measured request duration in milliseconds
user identity headers (x-schemabound-user-id, x-schemabound-organization-id)
trace ID from traceparent, or from x-schemabound-trace-id when traceparent is absent

These records are emitted as LlmToolCallAuditRecorded events (OCSF class 6005 — API Activity) and flow through the same MultiTransportExporter as all other audit events.

Registering an Audit Exporter

Any process that embeds the SCHEMABOUND event bus can attach additional exporters:

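use std::sync::Arc;

use async_trait::async_trait;
use schemabound::{get_event_bus, AuditEventEnvelope, AuditExporter};

// Custom exporter that forwards each audit envelope to your SIEM.
struct MySiemExporter;

#[async_trait]
impl AuditExporter for MySiemExporter {
    async fn export(&self, envelope: AuditEventEnvelope) {
        // serialize `envelope` and forward it to your SIEM
    }
}

// Registration is fallible, so main returns a Result and propagates the error
// (this assumes the error type implements std::error::Error).
fn main() -> Result<(), Box<dyn std::error::Error>> {
    get_event_bus().register_audit_exporter(Arc::new(MySiemExporter))?;
    Ok(())
}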

Exporters are called concurrently after every dispatched event. They do not appear in the synchronous handler chain and cannot block or short-circuit event processing.

SIEM Integration Notes

Amazon Security Lake (OCSF native)

SCHEMABOUND OCSF records are compatible with Security Lake’s custom source ingestion. Point SCHEMABOUND_AUDIT_WEBHOOK_URL at a Firehose delivery stream configured for OCSF v1.1.

Splunk

Use the Splunk HEC endpoint with SCHEMABOUND_AUDIT_WEBHOOK_URL. The OCSF JSON structure maps directly to Splunk’s _raw field with the CIM-compatible security_finding sourcetype.

Microsoft Sentinel

Route the NDJSON file output with the Azure Monitor Agent using the OCSF table schema, or use the webhook transport targeting a Data Collection Endpoint.

AI-SPM Vendors (Wiz, Lacework, Orca)

These vendors consume OCSF class 2004 (Security Finding) records for AI-specific threat modelling. The PromptInjectionSignalRaised events with severity, pattern names, and input hash give AI-SPM tools the raw material they need to build attack timeline views and posture scoring.

Security Considerations

Input truncation: only the first 500 characters of a matched query are stored in the audit record. The full input is never persisted; only its SHA-256 hash is retained for forensic correlation.
Fire-and-forget webhook: transport failures are silently discarded. Use a local sidecar aggregator (Fluent Bit, Vector) if delivery guarantees are a hard requirement.
Hash chain integrity: chain verification is the consumer’s responsibility. SCHEMABOUND provides the chain fields; the SIEM or audit consumer should alert on gaps.
Policy default is observe: SCHEMABOUND does not block traffic by default. Switching to block is an explicit operational decision that must be validated against false-positive rates in your environment before enabling in production.
