Security

PAI Security System v4.0 — Inspector Pipeline

Last synced: Apr 22, 2026

Defense-in-depth via a composable inspector pipeline. Inspired by Block Goose’s ToolInspector architecture.

Constitutional security rules (external content = READ-ONLY, STOP and REPORT) are in the system prompt (PAI/PAI_SYSTEM_PROMPT.md). This file documents the security ARCHITECTURE.


Architecture

Four hooks, one pipeline, five inspectors (four active, one disabled).

                    ┌─────────────────────────────────┐
 PreToolUse ───────►│  SecurityPipeline.hook.ts        │
                    │  ┌───────────────────────────┐  │
                    │  │ InspectorPipeline          │  │
                    │  │  1. PatternInspector (100)  │  │──► deny (exit 2)
                    │  │  2. EgressInspector  (90)   │  │──► require_approval (ask)
                    │  │  3. RulesInspector   (50)   │  │──► alert (log + allow)
                    │  │     (DISABLED — empty rules) │  │
                    │  └───────────────────────────┘  │──► allow (silent)
                    └─────────────────────────────────┘

 PostToolUse ──────► ContentScanner.hook.ts
                     InjectionInspector scans tool output
                     Advisory only — injects warning, cannot block

 PermissionRequest ► SmartApprover.hook.ts
                     Trusted workspace → auto-approve
                     Read operations → auto-approve
                     Write operations → user decides

 UserPromptSubmit ─► PromptGuard.hook.ts
                     PromptInspector scans user prompts
                     Heuristic-only (no LLM) — can block

Inspector Pipeline Pattern

Every inspector implements:

interface Inspector {
  name: string;
  priority: number;  // Higher = runs first
  inspect(ctx: InspectionContext): InspectionResult;
}

Results: allow | deny | require_approval | alert

The pipeline runs inspectors in descending priority order and short-circuits on the first deny. It accumulates require_approval results and returns the highest-priority one. It returns allow only if every inspector allows. Inspector errors are logged and the failing inspector is skipped.
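The orchestration rules above can be sketched roughly as follows. This is a minimal illustration, not the contents of pipeline.ts: the fields on InspectionContext and InspectionResult, the runPipeline name, and the choice to surface an alert when no inspector asks for approval are all assumptions made for the example.

```typescript
type Action = "allow" | "deny" | "require_approval" | "alert";

// Hypothetical shapes; the real types live in types.ts and may differ.
interface InspectionContext {
  toolName: string;
  input: string;
}

interface InspectionResult {
  action: Action;
  inspector?: string;
  reason?: string;
}

interface Inspector {
  name: string;
  priority: number; // Higher = runs first
  inspect(ctx: InspectionContext): InspectionResult;
}

function runPipeline(inspectors: Inspector[], ctx: InspectionContext): InspectionResult {
  // Copy before sorting so the caller's array is not reordered.
  const ordered = [...inspectors].sort((a, b) => b.priority - a.priority);
  let approval: InspectionResult | undefined;
  let alert: InspectionResult | undefined;

  for (const inspector of ordered) {
    let result: InspectionResult;
    try {
      result = inspector.inspect(ctx);
    } catch (err) {
      // Inspector errors are logged and the inspector is skipped.
      console.error(`[security] ${inspector.name} failed:`, err);
      continue;
    }
    const tagged: InspectionResult = { ...result, inspector: inspector.name };
    if (tagged.action === "deny") return tagged; // short-circuit on first deny
    // Inspectors run in priority order, so the first result kept here
    // comes from the highest-priority inspector that produced it.
    if (tagged.action === "require_approval" && !approval) approval = tagged;
    if (tagged.action === "alert" && !alert) alert = tagged;
  }

  // Assumption: approval outranks alert, and allow is the fallback.
  return approval ?? alert ?? { action: "allow" };
}
```

Sorting a copy keeps registration order irrelevant, so an inspector can be added anywhere in the list and still run at its declared priority.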


Inspectors

PatternInspector (priority 100)

Pattern-based command and path validation using patterns.yaml.

  • Bash commands: trusted → blocked → confirm → alert → allow
  • File paths: zeroAccess → alertAccess → confirmAccess → readOnly → confirmWrite → noDelete → allow

Fails closed if patterns.yaml is missing or corrupt.
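The Bash tier order and the fail-closed rule might look like the sketch below. The BashPatterns shape, the classifyCommand name, the sample regexes, and the mapping of each tier to a pipeline result (for example confirm to require_approval) are assumptions for illustration; the actual schema of patterns.yaml is not shown in this document.

```typescript
type Action = "allow" | "deny" | "require_approval" | "alert";

// Hypothetical in-memory form of the bash section of patterns.yaml.
interface BashPatterns {
  trusted: RegExp[];
  blocked: RegExp[];
  confirm: RegExp[];
  alert: RegExp[];
}

// Pass null when the policy file is missing or could not be parsed.
function classifyCommand(command: string, patterns: BashPatterns | null): Action {
  if (patterns === null) return "deny"; // fail closed
  const matches = (list: RegExp[]): boolean => list.some((re) => re.test(command));

  // First matching tier wins, in the documented order.
  if (matches(patterns.trusted)) return "allow";
  if (matches(patterns.blocked)) return "deny";
  if (matches(patterns.confirm)) return "require_approval";
  if (matches(patterns.alert)) return "alert";
  return "allow";
}

// Illustrative patterns only; none of these come from the real policy file.
const examplePatterns: BashPatterns = {
  trusted: [/^git status$/],
  blocked: [/\brm\s+-rf\s+\/(\s|$)/],
  confirm: [/\bgit\s+push\s+--force\b/],
  alert: [/\bsudo\b/],
};
```

Because trusted is checked first, a trusted pattern that is too broad would mask every later tier, which is one reason to keep that list short and anchored.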

EgressInspector (priority 90)

Monitors outbound data in Bash commands.

  • Deny: Credential patterns (sk_live_, sk-ant-, etc.) combined with outbound tools
  • Deny: Pipe to shell (| sh, | bash, | zsh)
  • Alert: HTTP POST, netcat, env dumps, inline interpreters
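As a rough sketch of the three rules above: a credential pattern is denied only when an outbound tool appears in the same command, pipe-to-shell is denied outright, and the remaining categories only alert. The regexes, the list of outbound tools, and the inspectEgress name are assumptions for the example and are far less thorough than a production rule set would need to be.

```typescript
type EgressAction = "allow" | "deny" | "alert";

interface EgressResult {
  action: EgressAction;
  reason?: string;
}

const CREDENTIAL = /sk_live_|sk-ant-/;
// Assumed list of outbound tools; the real inspector may check others.
const OUTBOUND_TOOL = /\b(curl|wget|nc|netcat|scp)\b/;
const PIPE_TO_SHELL = /\|\s*(sh|bash|zsh)\b/;

const ALERT_PATTERNS: Array<{ reason: string; re: RegExp }> = [
  { reason: "HTTP POST", re: /\bcurl\b.*(-X\s*POST\b|--data\b|\s-d\s)/i },
  { reason: "netcat", re: /\b(nc|netcat)\b/ },
  { reason: "env dump", re: /\bprintenv\b|^\s*env\s*($|[|>])/ },
  { reason: "inline interpreter", re: /\b(python3?|node|perl|ruby)\s+-(c|e)\b/ },
];

function inspectEgress(command: string): EgressResult {
  // Deny only when a credential and an outbound tool appear together.
  if (CREDENTIAL.test(command) && OUTBOUND_TOOL.test(command)) {
    return { action: "deny", reason: "credential combined with outbound tool" };
  }
  if (PIPE_TO_SHELL.test(command)) {
    return { action: "deny", reason: "pipe to shell" };
  }
  const hit = ALERT_PATTERNS.find((p) => p.re.test(command));
  return hit ? { action: "alert", reason: hit.reason } : { action: "allow" };
}
```

Requiring both halves of the credential rule keeps a command such as `echo sk_live_...` from being denied when nothing leaves the machine.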

RulesInspector (priority 50) — DISABLED

Previously evaluated tool calls against natural language rules in SECURITY_RULES.md via Haiku LLM call (~3s per unique tool call). All rules have been migrated to deterministic inspectors (PatternInspector, EgressInspector, PromptInspector). SECURITY_RULES.md is now empty, which auto-disables RulesInspector at zero cost. The inspector code remains in the pipeline for future use if custom LLM-evaluated rules are needed.

PromptInspector (priority 95)

Scans user prompts for injection, exfiltration, evasion, and security disable attempts.

  • Injection: instruction overrides, role reassignment, system impersonation
  • Exfiltration: two-phase detection — sensitive data reference + outbound intent (both must match)
  • Evasion: base64 decode, hex-encoded payloads
  • Security disable: attempts to disable hooks, logging, monitoring
  • Returns deny for block-severity patterns, alert for warn-severity
  • Heuristic-only — no LLM inference (fast, deterministic, no cost)
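The severity mapping and the two-phase exfiltration check could be sketched like this. The PromptRule shape, the inspectPrompt name, and every regex here are hypothetical; they only illustrate that exfiltration fires when a sensitive-data reference and an outbound intent both match, and that block-severity rules deny while warn-severity rules alert.

```typescript
type PromptAction = "allow" | "deny" | "alert";
type Severity = "block" | "warn";

interface PromptRule {
  category: string;
  severity: Severity;
  matches(prompt: string): boolean;
}

// Phase 1: the prompt refers to sensitive data.
const SENSITIVE = [/\bapi[ _-]?keys?\b/i, /\bpasswords?\b/i, /\bcredentials?\b/i];
// Phase 2: the prompt expresses intent to move data out.
const OUTBOUND = [/\b(send|upload|post|email)\b/i, /\bhttps?:\/\//i];

const RULES: PromptRule[] = [
  {
    category: "injection",
    severity: "block",
    matches: (p) => /ignore (all )?(previous|prior) instructions/i.test(p),
  },
  {
    category: "exfiltration",
    severity: "block",
    // Both phases must match; either one alone is not enough.
    matches: (p) => SENSITIVE.some((re) => re.test(p)) && OUTBOUND.some((re) => re.test(p)),
  },
  {
    category: "evasion",
    severity: "warn",
    matches: (p) => /base64\s+(-d|--decode)\b/i.test(p),
  },
];

function inspectPrompt(prompt: string): { action: PromptAction; category?: string } {
  const hits = RULES.filter((rule) => rule.matches(prompt));
  const block = hits.find((rule) => rule.severity === "block");
  if (block) return { action: "deny", category: block.category };
  const warn = hits.find((rule) => rule.severity === "warn");
  if (warn) return { action: "alert", category: warn.category };
  return { action: "allow" };
}
```

The two-phase requirement is what lets an ordinary question about where API keys are stored pass while a request to send them somewhere is blocked.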

InjectionInspector (priority 80)

Scans tool output for prompt injection patterns.

  • Instruction overrides, system impersonation, hidden instructions, urgency manipulation
  • Returns require_approval (PostToolUse cannot block — advisory only)

SmartApprover (PermissionRequest)

Three-tier permission model:

  1. Trusted workspace (fast path, no LLM): ~/.claude/, ~/Projects/, ~/LocalProjects/ → auto-approve
  2. Read operations: ls, cat, git status, rg, etc. → auto-approve
  3. Write operations: everything else → user decides

Caches classification decisions per session.
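One way to picture the three tiers and the per-session cache is the sketch below. The SmartApprover class shape, the PermissionRequest fields, the cache key, and the read-command regexes are assumptions; the real hook's classification is likely more careful (for instance, this version would treat `cat a > b` as a read).

```typescript
type Decision = "approve" | "ask";

// Hypothetical request shape for the example.
interface PermissionRequest {
  command: string;
  cwd: string;
}

const TRUSTED_DIRS = [".claude", "Projects", "LocalProjects"];
const READ_COMMANDS = [/^ls(\s|$)/, /^cat\s/, /^git status(\s|$)/, /^rg\s/];

class SmartApprover {
  private readonly home: string;
  // One instance per session, so this map acts as the session cache.
  private readonly cache = new Map<string, Decision>();

  constructor(home: string) {
    this.home = home;
  }

  decide(req: PermissionRequest): Decision {
    const key = `${req.cwd}\n${req.command}`;
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;
    const decision = this.classify(req);
    this.cache.set(key, decision);
    return decision;
  }

  private classify(req: PermissionRequest): Decision {
    // Tier 1: trusted workspace (fast path).
    const trusted = TRUSTED_DIRS.some((dir) => {
      const root = `${this.home}/${dir}`;
      return req.cwd === root || req.cwd.startsWith(`${root}/`);
    });
    if (trusted) return "approve";
    // Tier 2: read operations.
    if (READ_COMMANDS.some((re) => re.test(req.command.trim()))) return "approve";
    // Tier 3: everything else goes to the user.
    return "ask";
  }
}
```

Comparing against `root + "/"` rather than a bare prefix prevents a sibling directory such as ~/ProjectsEvil from inheriting the trust of ~/Projects.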


Hook Wiring (settings.json)

Event                                          Hook                           Behavior
PreToolUse (Bash, Write, Edit, MultiEdit)      SecurityPipeline.hook.ts       Pipeline: deny=exit(2), approval=ask, alert=log
PreToolUse (Skill)                             Pulse HTTP /hooks/skill-guard  Fail-open
PreToolUse (Agent)                             Pulse HTTP /hooks/agent-guard  Fail-open
PostToolUse (WebFetch, WebSearch)              ContentScanner.hook.ts         Advisory injection warning
PermissionRequest (Write|Edit|MultiEdit|Bash)  SmartApprover.hook.ts          Auto-approve trusted/read, ask for write
UserPromptSubmit                               PromptGuard.hook.ts            PromptInspector: injection, exfiltration, evasion
ConfigChange                                   ConfigAudit.hook.ts            Logs settings.json changes
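The SecurityPipeline.hook.ts behavior summarized as deny=exit(2), approval=ask, alert=log can be pictured as a small mapping from pipeline result to hook outcome. The HookOutcome shape and the toHookOutcome name are hypothetical; in particular, how the "ask" decision is actually communicated to the host is not described in this document, so it is modeled here as a plain field.

```typescript
type Action = "allow" | "deny" | "require_approval" | "alert";

// Hypothetical summary of what the hook does with a pipeline result.
interface HookOutcome {
  exitCode: 0 | 2;
  decision?: "ask";
  log: boolean;
  message?: string;
}

function toHookOutcome(action: Action, reason: string): HookOutcome {
  switch (action) {
    case "deny":
      // Blocked: exit code 2, and the event is logged.
      return { exitCode: 2, log: true, message: reason };
    case "require_approval":
      // Not blocked by the hook itself; the user is asked to confirm.
      return { exitCode: 0, decision: "ask", log: true, message: reason };
    case "alert":
      // Logged but allowed.
      return { exitCode: 0, log: true, message: reason };
    case "allow":
      // Silent allow.
      return { exitCode: 0, log: false };
  }
}
```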

File Map

Hook Entry Points (~/.claude/hooks/)

File                      Event              Purpose
SecurityPipeline.hook.ts  PreToolUse         Runs inspector pipeline
ContentScanner.hook.ts    PostToolUse        Injection scanning
SmartApprover.hook.ts     PermissionRequest  Smart permission decisions
PromptGuard.hook.ts       UserPromptSubmit   PromptInspector: prompt security

Pipeline Core (~/.claude/hooks/security/)

File         Purpose
types.ts     InspectionResult, Inspector interface, SecurityEvent
pipeline.ts  InspectorPipeline orchestrator
logger.ts    Unified security event logging

Inspectors (~/.claude/hooks/security/inspectors/)

File                   Priority  Purpose
PatternInspector.ts    100       Pattern-based command/path validation
PromptInspector.ts     95        User prompt injection/exfiltration/evasion
EgressInspector.ts     90        Outbound data monitoring
InjectionInspector.ts  80        Tool output injection detection
RulesInspector.ts      50        User-written security rules via LLM (DISABLED: empty rules)

Policy Files (~/.claude/PAI/USER/SECURITY/)

File                   Purpose
PATTERNS.yaml          Block/alert patterns and path protection tiers
SECURITY_RULES.md      User-written natural language BLOCK/ALLOW rules
permission-cache.yaml  SmartApprover cached classification decisions

Observatory Dashboard

The security page at http://localhost:31337/security provides a visual interface for managing the security system.

Tabs:

  • Policy — Edit PATTERNS.yaml: blocked/alert/trusted commands, path protection tiers, security rules preview, injection defense patterns, PromptGuard categories
  • Rules — Edit SECURITY_RULES.md: full textarea editor with save button
  • Events — Recent security events from MEMORY/SECURITY/YYYY/MM/
  • Hooks — Hook health status with expandable descriptions

Architecture visual at the top shows the inspector pipeline flow, other hooks (ContentScanner, SmartApprover, PromptGuard), and file paths with edit buttons.

Deployment:

cd ~/.claude/PAI/PULSE/Observability && bun run build
launchctl stop com.pai.pulse && launchctl start com.pai.pulse
# Then Cmd+Shift+R in browser

Pulse serves the dashboard from Pulse/Observability/out (configured in PULSE.toml). Do NOT use kill to restart Pulse — launchd auto-restarts it with stale code. Always use launchctl stop/start.


Policy Control

The user is the ONLY entity that can modify security policy. All security files are readOnly.

What                          File                                Effect
Command block/alert patterns  patterns.yaml (bash.blocked/alert)  Immediate
File path protections         patterns.yaml (paths.*)             Immediate
Natural language rules        SECURITY_RULES.md                   Immediate (cached per session)
AI behavioral rules           PAI_SYSTEM_PROMPT.md                Next session
Hook wiring                   settings.json (hooks)               Immediate

Audit Trail

All security events log to MEMORY/SECURITY/YYYY/MM/ with descriptive filenames:

  • security-block-* — denied operations
  • security-confirm-* — prompted operations
  • security-alert-* — logged but allowed
  • prompt-guard-* — UserPromptSubmit scan results

The MEMORY/SECURITY/** path is readOnly — the AI can create new logs but cannot modify existing ones.
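To make the directory layout concrete, the sketch below builds a log path from an event kind and a timestamp. Only the MEMORY/SECURITY/YYYY/MM/ directory and the security-block-, security-confirm-, and security-alert- prefixes come from this document; the slug, the timestamp suffix, the .json extension, and the auditLogPath name are assumptions standing in for whatever the "*" expands to in practice.

```typescript
type EventKind = "block" | "confirm" | "alert";

function auditLogPath(kind: EventKind, summary: string, at: Date): string {
  const year = String(at.getUTCFullYear());
  const month = String(at.getUTCMonth() + 1).padStart(2, "0");
  // Hypothetical descriptive part: lowercase, runs of other characters become "-".
  const slug = summary
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  // Hypothetical compact UTC timestamp, e.g. 20260422T100500Z.
  const stamp = at
    .toISOString()
    .replace(/[-:]/g, "")
    .replace(/\.\d{3}Z$/, "Z");
  return `MEMORY/SECURITY/${year}/${month}/security-${kind}-${slug}-${stamp}.json`;
}
```

Putting a timestamp in the filename means each event gets a new file, which fits the rule that existing logs are never modified.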


Limitations

Limitation                               Why
Bash can bypass file-tool path controls  cat, echo > via Bash are not subject to path-tier enforcement for Read/Write tools
PostToolUse cannot block                 Content is already in conversation; scanner can only warn
Multi-step attacks invisible             Pipeline checks one tool call at a time
Pattern matching is incomplete           Regex cannot cover all shell obfuscation
RulesInspector adds latency              ~1-2s per unique tool call when SECURITY_RULES.md contains rules
SmartApprover heuristic-only             Read/write classification is pattern-based, not LLM-based
MCP tool gaps                            mcp__* wildcard doesn’t match plugin-sourced tools