Observability

The Observability System

Last synced: Apr 22, 2026

Single-source, multi-destination event pipeline for PAI tool activity, voice events, subagent lifecycle, and tool failures.

Infrastructure: The observability HTTP server (localhost:31337) runs as a module inside the unified Pulse daemon (~/.claude/PAI/PULSE/Observability/observability.ts). There is no separate observability server process — Pulse serves all local HTTP endpoints on port 31337.

Architecture

JSONL Sources (local disk)          settings.json
  ├─ tool-activity.jsonl (100)   ──→  observability.targets[]
  ├─ tool-failures.jsonl (50)         ├─ { type: "cloudflare-kv", name: "production" }
  ├─ voice-events.jsonl (50)          ├─ { type: "http", name: "local", url: "..." }
  └─ subagent-events.jsonl (50)       └─ ... (0-N targets)
          │                                    │
          ▼                                    ▼
   collectEvents()  ───────────────→  pushEventsToTargets()
   (observability-transport.ts)       (fan-out to all targets)
          │                                    │
          ▼                                    ▼
   Pulse (localhost:31337)            CF KV (sync:events)
   Observability/observability.ts     └─→ Worker /api/events/recent
   └─→ /api/events/recent                 └─→ admin.example.com

Data Flow

  1. Emitters: hooks (PostToolUse, PostToolUseFailure, SubagentStart/Stop) write structured JSONL to MEMORY/OBSERVABILITY/; the voice notification server writes to MEMORY/VOICE/
  2. Collection: collectEvents() reads the last N lines of each source (N is that source's per-source count), merges them, sorts newest-first, and caps the result at 200
  3. Transport: pushEventsToTargets() fans out to all configured targets in parallel
  4. Display: the frontend polls /api/events/recent every 3s on both the local and remote dashboards
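A minimal TypeScript sketch of the collection step, assuming each source is newline-delimited JSON and that a malformed line (for example, a half-written tail) should be skipped rather than abort the whole collection. The names tailJsonl, mergeEvents, collectEventsSketch, and PER_SOURCE_LIMITS are hypothetical; this is not the code in observability-transport.ts.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Minimal event shape, mirroring the PAIEvent interface under Event Format.
interface PAIEvent {
  timestamp: string;
  session_id: string;
  source: string;
  type: string;
  [key: string]: unknown;
}

// Per-source limits from the Event Sources table, keyed by file name.
// HYPOTHETICAL constant: the real module may organize these differently.
const PER_SOURCE_LIMITS: Record<string, number> = {
  "tool-activity.jsonl": 100,
  "tool-failures.jsonl": 50,
  "voice-events.jsonl": 50,
  "subagent-events.jsonl": 50,
};
const TOTAL_CAP = 200;

// Parse the last `n` non-empty lines of a JSONL string; malformed lines are skipped.
function tailJsonl(text: string, n: number): PAIEvent[] {
  const lines = text.split("\n").filter((line) => line.trim() !== "");
  const events: PAIEvent[] = [];
  for (const line of lines.slice(-n)) {
    try {
      events.push(JSON.parse(line) as PAIEvent);
    } catch {
      // e.g. a half-written trailing line from a concurrent hook
    }
  }
  return events;
}

// Merge per-source batches, sort newest-first by timestamp, and cap the total.
function mergeEvents(batches: PAIEvent[][], cap: number = TOTAL_CAP): PAIEvent[] {
  return batches
    .flat()
    .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp))
    .slice(0, cap);
}

// File-reading wrapper. `paths` maps each source file name to its absolute path;
// a missing file contributes no events instead of failing the whole collection.
function collectEventsSketch(paths: Record<string, string>): PAIEvent[] {
  const batches = Object.entries(paths).map(([name, path]) =>
    existsSync(path) ? tailJsonl(readFileSync(path, "utf8"), PER_SOURCE_LIMITS[name] ?? 50) : [],
  );
  return mergeEvents(batches);
}
```

Sorting with Date.parse works because every event carries an ISO-8601 timestamp with a timezone, so events from different sources land on one absolute timeline.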

settings.json Configuration

The observability section in ~/.claude/settings.json controls where events are pushed:

{
  "observability": {
    "targets": [
      {
        "type": "cloudflare-kv",
        "name": "production",
        "url": "https://admin.example.com"
      },
      {
        "type": "http",
        "name": "local",
        "url": "http://localhost:31337"
      }
    ],
    "server": {
      "port": 31337,
      "enabled": true
    }
  }
}

Target Types

| Type | Transport | Auth | Use Case |
|---|---|---|---|
| cloudflare-kv | CF KV API PUT to namespace | CLOUDFLARE_API_TOKEN_WORKERS_EDIT or CLOUDFLARE_API_TOKEN from ~/.claude/PAI/.env (tried in that order; env var wins over .env) | Production dashboards on Cloudflare Workers |
| http | POST to {url}/api/observability/events | Optional headers field | Local dev server, other HTTP receivers |
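The cloudflare-kv token lookup in the Auth column can be sketched as below. The table does not say whether the order is by variable name first or by source first; this sketch assumes name first (for each name, a value from the process environment beats the .env value), and its .env parser handles only KEY=VALUE lines, comments, and surrounding quotes. resolveCloudflareToken and parseDotEnv are hypothetical helpers.

```typescript
// Names tried in order, per the Target Types table.
const TOKEN_NAMES = ["CLOUDFLARE_API_TOKEN_WORKERS_EDIT", "CLOUDFLARE_API_TOKEN"] as const;

// Minimal KEY=VALUE parser for a .env file. Handles blank lines, # comments,
// and matching surrounding quotes only; it is NOT a full dotenv implementation.
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quote = value.charAt(0);
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// ASSUMPTION: name-first precedence. For each name, a non-empty environment
// value wins over the same name in .env; the first name that resolves is used.
function resolveCloudflareToken(
  env: Record<string, string | undefined>,
  dotEnvText: string,
): string | undefined {
  const fileVars = parseDotEnv(dotEnvText);
  for (const name of TOKEN_NAMES) {
    const value = env[name] || fileVars[name];
    if (value) return value;
  }
  return undefined;
}
```

Under this assumption, CLOUDFLARE_API_TOKEN_WORKERS_EDIT in .env still wins over CLOUDFLARE_API_TOKEN exported in the shell; if the real transport checks the environment for both names before reading .env, the result differs in exactly that case.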

Target Schema

interface ObservabilityTarget {
  name: string;                        // Human label (e.g. "production", "local")
  type: 'http' | 'cloudflare-kv';      // Transport mechanism
  url?: string;                        // Base URL (required for http, optional for cloudflare-kv)
  headers?: Record<string, string>;    // Optional headers for http targets
}
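Because settings.json is edited by hand, a runtime check that mirrors the schema comments can drop malformed targets before the transport sees them. A sketch, assuming url is required only for http targets and headers must map strings to strings; isObservabilityTarget and validTargets are hypothetical names, not functions from identity.ts.

```typescript
interface ObservabilityTarget {
  name: string;
  type: "http" | "cloudflare-kv";
  url?: string;
  headers?: Record<string, string>;
}

// Runtime guard mirroring the schema comments: url is required for http targets.
function isObservabilityTarget(value: unknown): value is ObservabilityTarget {
  if (typeof value !== "object" || value === null) return false;
  const t = value as Record<string, unknown>;
  const { name, type, url, headers } = t;
  if (typeof name !== "string") return false;
  if (type !== "http" && type !== "cloudflare-kv") return false;
  if (url !== undefined && typeof url !== "string") return false;
  if (type === "http" && typeof url !== "string") return false;
  if (headers !== undefined) {
    if (typeof headers !== "object" || headers === null || Array.isArray(headers)) return false;
    for (const v of Object.values(headers)) {
      if (typeof v !== "string") return false;
    }
  }
  return true;
}

// Keep only well-formed targets from a parsed settings.json object.
function validTargets(settings: unknown): ObservabilityTarget[] {
  const targets = (settings as { observability?: { targets?: unknown } } | null)?.observability?.targets;
  return Array.isArray(targets) ? targets.filter(isObservabilityTarget) : [];
}
```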

Adding a New Target

Add an entry to observability.targets[] in settings.json. The transport module picks it up on the next hook execution; no code changes are needed.
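The transport iterates over whatever targets the configuration lists, which is why adding one needs no code change. A sketch of that fan-out, assuming Promise.allSettled semantics so one unreachable target does not block the others. The http sender follows the Target Types table (POST to {url}/api/observability/events); the { events } JSON body is an assumption, and the cloudflare-kv sender is left as an injected function because its KV API call is not shown in this document. pushEventsToTargetsSketch and httpSender are hypothetical names.

```typescript
interface ObservabilityTarget {
  name: string;
  type: "http" | "cloudflare-kv";
  url?: string;
  headers?: Record<string, string>;
}

type Sender = (target: ObservabilityTarget, events: unknown[]) => Promise<void>;

// http transport: POST to {url}/api/observability/events.
// ASSUMPTION: the body is a JSON object { events }; the real payload shape may differ.
const httpSender: Sender = async (target, events) => {
  const base = (target.url ?? "").replace(/\/+$/, "");
  const res = await fetch(`${base}/api/observability/events`, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...target.headers },
    body: JSON.stringify({ events }),
  });
  if (!res.ok) throw new Error(`${target.name}: HTTP ${res.status}`);
};

// Fan out to every configured target in parallel; a failure is reported per target
// instead of rejecting the whole push.
async function pushEventsToTargetsSketch(
  targets: ObservabilityTarget[],
  events: unknown[],
  senders: Partial<Record<ObservabilityTarget["type"], Sender>>,
): Promise<{ name: string; ok: boolean; error?: string }[]> {
  const results = await Promise.allSettled(
    targets.map((t) => {
      const send = senders[t.type];
      return send ? send(t, events) : Promise.reject(new Error(`no sender for ${t.type}`));
    }),
  );
  return results.map((r, i) => ({
    name: targets[i].name,
    ok: r.status === "fulfilled",
    error: r.status === "rejected" ? String(r.reason) : undefined,
  }));
}
```

In practice the sender map would be { http: httpSender, "cloudflare-kv": kvSender }, where kvSender is a hypothetical function performing the KV PUT.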

Example — adding a staging environment:

{
  "type": "http",
  "name": "staging",
  "url": "https://staging.example.com",
  "headers": { "Authorization": "Bearer ${STAGING_TOKEN}" }
}

Event Sources

| Source | JSONL Path | Per-Source Count | Hook |
|---|---|---|---|
| Tool activity | MEMORY/OBSERVABILITY/tool-activity.jsonl | 100 | ToolActivityTracker.hook.ts (PostToolUse, catch-all) |
| Tool failures | MEMORY/OBSERVABILITY/tool-failures.jsonl | 50 | ToolFailureTracker.hook.ts (PostToolUseFailure) |
| Voice events | MEMORY/VOICE/voice-events.jsonl | 50 | Voice notification server |
| Subagent events | MEMORY/OBSERVABILITY/subagent-events.jsonl | 50 | SubagentTracker.hook.ts (SubagentStart/Stop) |
| Agent watchdog | stdout (Monitor notifications) | n/a | Tools/AgentWatchdog.ts via Monitor tool. Reads tool-activity.jsonl + subagent-starts.json; alerts on 90s silence with active agents. Auto-triggered by Pulse agent-guard hook on background agent spawn. |

The per-source counts are identical in PULSE/Observability/observability.ts (local server) and observability-transport.ts (KV push), so every destination shows the same data.
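The watchdog's alert rule from the Agent watchdog row reduces to a small predicate. A sketch, assuming activity is measured from the newest timestamp in tool-activity.jsonl and that the caller derives the active-agent count from subagent-starts.json; lastActivityFromJsonl, shouldAlert, and the inclusive comparison at exactly 90s are assumptions, not AgentWatchdog.ts itself.

```typescript
// Threshold from the Agent watchdog row: alert after 90s of silence while agents are active.
const SILENCE_MS = 90_000;

interface WatchdogInput {
  lastToolActivityMs: number; // newest tool-activity timestamp (epoch ms)
  activeAgentCount: number;   // background agents started and not yet stopped
  nowMs: number;
}

// Newest parseable timestamp in a tool-activity.jsonl excerpt, or 0 if there is none.
function lastActivityFromJsonl(text: string): number {
  let newest = 0;
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      const parsed = JSON.parse(line) as { timestamp?: string };
      const t = Date.parse(parsed.timestamp ?? "");
      if (!Number.isNaN(t) && t > newest) newest = t;
    } catch {
      // skip malformed line
    }
  }
  return newest;
}

// Alert only when at least one agent is active AND the tool stream has gone quiet.
function shouldAlert({ lastToolActivityMs, activeAgentCount, nowMs }: WatchdogInput): boolean {
  return activeAgentCount > 0 && nowMs - lastToolActivityMs >= SILENCE_MS;
}
```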

Event Format

All events conform to the PAIEvent interface:

interface PAIEvent {
  timestamp: string;     // ISO-8601 with timezone
  session_id: string;    // Claude Code session ID
  source: string;        // "tool-activity" | "tool-failure" | "voice" | "subagent"
  type: string;          // Event type (e.g. "tool_use", "voice_start", "subagent_start")
  [key: string]: unknown; // Additional fields per source
}
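A sketch of reading one JSONL line into a PAIEvent, assuming invalid lines should be dropped rather than thrown. It enforces the four required string fields and the "ISO-8601 with timezone" rule using a simplified pattern (seconds required, then a trailing Z or ±HH:MM offset); parsePAIEvent and isKnownSource are hypothetical helpers.

```typescript
interface PAIEvent {
  timestamp: string;
  session_id: string;
  source: string;
  type: string;
  [key: string]: unknown;
}

const REQUIRED_STRING_FIELDS = ["timestamp", "session_id", "source", "type"] as const;
// Simplified ISO-8601 profile: date, time with seconds, optional fraction, then Z or ±HH:MM.
const ISO_WITH_TZ = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
const KNOWN_SOURCES = new Set(["tool-activity", "tool-failure", "voice", "subagent"]);

// Returns the event, or null if the line is not a well-formed PAIEvent.
function parsePAIEvent(line: string): PAIEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;
  const record = value as Record<string, unknown>;
  for (const field of REQUIRED_STRING_FIELDS) {
    if (typeof record[field] !== "string") return null;
  }
  const ts = record.timestamp as string;
  if (!ISO_WITH_TZ.test(ts) || Number.isNaN(Date.parse(ts))) return null;
  return record as PAIEvent;
}

// Extra per-source fields pass through the index signature; unknown sources are flagged, not rejected.
function isKnownSource(event: PAIEvent): boolean {
  return KNOWN_SOURCES.has(event.source);
}
```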

Push Timing

| Trigger | What Gets Pushed | Latency |
|---|---|---|
| Every tool call | Events (via ToolActivityTracker async hook) | ~200ms (JSONL read + KV PUT) |
| Session start/end | Events + work state (via KVSync hook) | ~500ms |
| PRD write/edit | Work state only (via PRDSync hook) | ~300ms |

Key Files

| File | Role |
|---|---|
| ~/.claude/hooks/lib/observability-transport.ts | collectEvents() + pushEventsToTargets(): the core pipeline |
| ~/.claude/hooks/lib/identity.ts | ObservabilityTarget type + getObservabilityConfig(); reads settings.json |
| ~/.claude/hooks/ToolActivityTracker.hook.ts | PostToolUse catch-all; writes JSONL and triggers KV push |
| ~/.claude/hooks/KVSync.hook.ts | SessionEnd; batch-pushes events + state to all targets |
| ~/.claude/PAI/PULSE/Observability/observability.ts | Observability module inside the unified Pulse daemon; serves events from JSONL at :31337 |
| ~/.claude/PAI/PULSE/Observability/ | Next.js static dashboard; polls /api/events/recent |
| ~/.claude/settings.json → observability | Target configuration; add/remove destinations here |

Dashboard Locations

| Destination | URL | Data Source |
|---|---|---|
| PAI Observatory | localhost:31337/agents → Actions tab | Local JSONL via Pulse Observability/observability.ts |
| ULAdmin | admin.example.com/agents → Actions tab | CF KV sync:events via Worker /api/events/recent |

Observatory Dashboard

The PAI Observatory is the local observability UI — a Next.js 15.5 static export served by Pulse on localhost:31337.

Project Layout

| Item | Value |
|---|---|
| Source | ~/.claude/PAI/PULSE/Observability/ |
| Build command | cd ~/.claude/PAI/PULSE/Observability && bun run build (outputs to out/) |
| Serving mechanism | Served directly from ~/.claude/PAI/PULSE/Observability/out (configured via dashboard_dir in PULSE.toml) |
| URL | http://localhost:31337/ (served by the Pulse observability module) |
| Process management | Pulse runs under launchd (com.pai.pulse) with auto-restart. Always use launchctl stop/start com.pai.pulse; never kill the process directly. |

Dashboard Pages

| Page | URL | Purpose |
|---|---|---|
| Agents | /agents (default) | Work dashboard: iterations, optimize, ideate, loops |
| Knowledge | /knowledge | Knowledge archive browser |
| Security | /security | Security system management: patterns, rules, events, hooks |
| Ladder | /ladder | Improvement ladder tracking |
| Novelty | /novelty | Novelty detection dashboard |

Security Page (/security)

The security page provides full management of the PAI security system through four tabs:

| Tab | Function |
|---|---|
| Policy | Edit PATTERNS.yaml: blocked/alert/trusted commands, path protection tiers |
| Rules | Edit SECURITY_RULES.md: natural-language BLOCK/ALLOW rules, currently disabled (saved via POST /api/security/rules) |
| Events | Recent security events from MEMORY/SECURITY/YYYY/MM/ |
| Hooks | Hook health status with expandable descriptions |

Additional features:

  • Architecture visual: Inspector pipeline flow diagram displayed at the top of the page
  • Injection defense: shows InjectionInspector patterns and PromptInspector categories (injection, exfiltration, evasion, security_disable)
  • Live editing: all changes write directly to disk and take effect on the next tool call

API Reference (all served by Pulse on localhost:31337)

All endpoints are served by the Pulse daemon’s observability module (Observability/observability.ts) unless the Source column names another file; "observability" in that column refers to this module.

Core Observability

| Endpoint | Method | Purpose | Source |
|---|---|---|---|
| /health | GET | Pulse daemon health check | pulse.ts |
| /api/observability/state | GET | Current session state (PRD, phase, progress) | observability |
| /api/observability/state | POST | Push session state from hooks | observability |
| /api/observability/events | GET | Raw event data | observability |
| /api/observability/events | POST | Push events from hooks | observability |
| /api/events/recent | GET | Merged recent events across all sources | observability |
| /api/observability/voice-events | GET | Voice event log | observability |
| /api/observability/tool-failures | GET | Tool failure log | observability |

Algorithm & Sessions

| Endpoint | Method | Purpose | Source |
|---|---|---|---|
| /api/algorithm | GET | Work sessions: PRD metadata, ISC progress, phase history | observability |
| /api/agents | GET | Subagent events: start/stop/duration from JSONL | observability |
| /api/novelty | GET | Learning signals, ratings, failure patterns | observability |
| /api/ladder | GET | Improvement pipeline data | observability |

Security

| Endpoint | Method | Purpose | Source |
|---|---|---|---|
| /api/security | GET | Combined: PATTERNS.yaml + SECURITY_RULES.md + events + hooks + PromptInspector patterns | observability |
| /api/security/patterns | POST | Mutate PATTERNS.yaml (add/remove/edit patterns and paths) | observability |
| /api/security/rules | POST | Save SECURITY_RULES.md content | observability |
| /api/security/hooks-detail | GET | Hook descriptions, events, blocking capability | observability |

Knowledge

| Endpoint | Method | Purpose | Source |
|---|---|---|---|
| /api/knowledge | GET | Knowledge archive: domains, notes, MOC data | observability |
| /api/knowledge/:domain/:slug | GET | Individual knowledge note content | observability |
| /api/knowledge/:domain/:slug | PUT | Update knowledge note | observability |

Wiki (PAI system docs + knowledge browser)

| Endpoint | Method | Purpose | Source |
|---|---|---|---|
| /api/wiki | GET | System doc index | modules/wiki.ts |
| /api/wiki/search | GET | Full-text search across system docs | modules/wiki.ts |
| /api/wiki/graph | GET | Knowledge graph data for visualization | modules/wiki.ts |

DA (Digital Assistant)

| Endpoint | Method | Purpose | Source |
|---|---|---|---|
| /assistant/health | GET | DA subsystem health | Assistant/module.ts |
| /assistant/identity | GET | Current DA identity summary | Assistant/module.ts |
| /assistant/personality | GET | DA personality traits | Assistant/module.ts |
| /assistant/personality/traits | PATCH | Update personality traits | Assistant/module.ts |
| /assistant/avatar | GET | DA avatar image | Assistant/module.ts |
| /assistant/tasks | GET | Unified task view (DA + Pulse cron + CC triggers) | Assistant/module.ts |
| /assistant/tasks | POST | Create DA scheduled task | Assistant/module.ts |
| /assistant/tasks/:id | DELETE | Cancel DA task | Assistant/module.ts |
| /assistant/diary | GET | Recent diary entries | Assistant/module.ts |
| /assistant/opinions | GET | Current DA opinions | Assistant/module.ts |

Voice & Notifications

| Endpoint | Method | Purpose | Source |
|---|---|---|---|
| /notify | POST | Send TTS notification via ElevenLabs | pulse.ts |
| /notify/personality | POST | Personality-aware notification | pulse.ts |
| /voice | GET | Voice status | pulse.ts |

Hook Validation

| Endpoint | Method | Purpose | Source |
|---|---|---|---|
| /hooks/skill-guard | POST | Validate Skill tool calls (PreToolUse HTTP hook) | modules/hooks.ts |
| /hooks/agent-guard | POST | Validate Agent tool calls (PreToolUse HTTP hook) | modules/hooks.ts |

Stubs (reserved, not yet implemented)

| Endpoint | Method | Purpose |
|---|---|---|
| /api/loops | GET | Loop system index |
| /api/loops/control | GET/POST | Loop control |
| /api/loops/start | POST | Start a loop |

Deployment Checklist

  1. Edit source in ~/.claude/PAI/PULSE/Observability/src/
  2. Build: cd ~/.claude/PAI/PULSE/Observability && bun run build
  3. Restart Pulse: launchctl stop com.pai.pulse && launchctl start com.pai.pulse
  4. Hard refresh browser: Cmd+Shift+R

Session State Tracking

Distinct from the event pipeline above, session state (active sessions, phase, progress, criteria, ratings) flows through a single canonical file. Both the Pulse dashboard and the ULAdmin /agents page read the same file so they never drift.

Canonical source: $PAI_DIR/MEMORY/STATE/work.json

Writers (atomic read-modify-write via prd-utils.ts:writeRegistry)
├─ SessionAnalysis.hook.ts      UserPromptSubmit → upsertSession (native or starting)
├─ ToolActivityTracker.hook.ts  PostToolUse → bumpLastToolActivity (30s debounced)
├─ PRDSync.hook.ts              syncToWorkJson() → promote native entry to full PRD session
└─ PRDAutoName.hook.ts          updateSessionNameInWorkJson()
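All four writers share one pattern: read work.json, modify it, write it back. A common way to make the write step atomic is to write a temporary file in the same directory and rename it over the target. The sketch below assumes that technique and a simplified registry shape; it is not the actual prd-utils.ts:writeRegistry, and updateRegistry, atomicWriteJson, and bumpLastToolActivity here are illustrative.

```typescript
import { existsSync, readFileSync, renameSync, writeFileSync } from "node:fs";

// HYPOTHETICAL registry shape; the real work.json schema is richer.
interface SessionEntry {
  mode?: "native" | "starting";
  updatedAt?: number;        // epoch ms
  lastToolActivity?: number; // epoch ms
}
interface Registry {
  sessions: Record<string, SessionEntry>;
}

const TOOL_ACTIVITY_DEBOUNCE_MS = 30_000; // the "30s debounced" bump from the tree above

// Write to a temp file in the same directory, then rename it over the target.
// On POSIX, rename within one filesystem replaces the file atomically, so readers
// see either the old or the new contents, never a half-written file.
function atomicWriteJson(path: string, data: unknown): void {
  const tmp = `${path}.${process.pid}.${Date.now()}.tmp`;
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  renameSync(tmp, path);
}

// Read-modify-write: load the registry, apply `mutate`, and write only if it changed.
// NOTE: the rename prevents torn reads but does not serialize concurrent writers;
// two hooks racing here can still lose an update without an additional lock.
function updateRegistry(path: string, mutate: (r: Registry) => boolean): void {
  const current: Registry = existsSync(path)
    ? (JSON.parse(readFileSync(path, "utf8")) as Registry)
    : { sessions: {} };
  if (mutate(current)) atomicWriteJson(path, current);
}

// Debounced bump: records activity only if the stored value is at least 30s old.
// Returns true when the registry changed, so the caller knows a write is needed.
function bumpLastToolActivity(r: Registry, sessionId: string, nowMs: number): boolean {
  const entry = r.sessions[sessionId];
  if (!entry) return false;
  const prev = entry.lastToolActivity;
  if (prev !== undefined && nowMs - prev < TOOL_ACTIVITY_DEBOUNCE_MS) return false;
  entry.lastToolActivity = nowMs;
  return true;
}
```

A hook would call updateRegistry(workJsonPath, (r) => bumpLastToolActivity(r, sessionId, Date.now())). Returning false from the mutator skips the write entirely, which keeps the debounced path cheap.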

Readers (both use identical mapping)
├─ Pulse Observability          localhost:31337 → observability.ts handleAlgorithmApi
└─ ULAdmin daemon               localhost:4000  → server/src/algorithm-watcher.ts

Display lanes:

  • Mode starting → Algorithm tab, phase strip (OBSERVE/THINK/PLAN/BUILD/EXECUTE/VERIFY/LEARN).
  • Mode native → Native tab, no phase strip.

Classifier: SessionAnalysis.hook.ts:ALGO_ACTION_RE — narrow 8-verb regex (implement|build|create|architect|design|migrate|deploy|refactor). Everything else that passes the trivia filter (POSITIVE_PRAISE_WORDS, SYSTEM_TEXT_PATTERNS, MIN_PROMPT_LENGTH=3) is native. Do not broaden — see feedback_state_monitoring_requires_starting_gate.md.
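A sketch of the classifier's decision order: trivia filter first, then the action regex. The regex uses the eight verbs listed above, but the word boundaries and the case-insensitive flag are assumptions, and PRAISE_WORDS_EXAMPLE and SYSTEM_TEXT_EXAMPLE are small illustrative stand-ins for POSITIVE_PRAISE_WORDS and SYSTEM_TEXT_PATTERNS, whose real contents are not shown here.

```typescript
// Eight verbs from the Classifier paragraph.
// ASSUMPTIONS: \b word boundaries and the /i flag; the real ALGO_ACTION_RE may differ.
const ALGO_ACTION_RE = /\b(implement|build|create|architect|design|migrate|deploy|refactor)\b/i;
const MIN_PROMPT_LENGTH = 3;

// ILLUSTRATIVE stand-ins for the real trivia lists.
const PRAISE_WORDS_EXAMPLE = new Set(["thanks", "great", "nice"]);
const SYSTEM_TEXT_EXAMPLE: RegExp[] = [/^\[SYSTEM\]/];

type Mode = "starting" | "native" | null; // null = filtered out as trivia

function classifyPrompt(prompt: string): Mode {
  const text = prompt.trim();
  if (text.length < MIN_PROMPT_LENGTH) return null;
  if (PRAISE_WORDS_EXAMPLE.has(text.toLowerCase())) return null;
  if (SYSTEM_TEXT_EXAMPLE.some((re) => re.test(text))) return null;
  // Only an explicit action verb opens an Algorithm session; everything else is native.
  return ALGO_ACTION_RE.test(text) ? "starting" : "native";
}
```

With word boundaries, "building is slow" stays native because "build" is not a whole word there, which is in keeping with the instruction not to broaden the regex.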

Staleness thresholds: 5 min for native sessions, 10 min for algorithm sessions. Both readers use the same values.

Loud-fail: algorithm-watcher.ts emits console.error on missing work.json at startup; /api/algorithm returns HTTP 503 with the resolved path. ToolActivityTracker.hook.ts logs exceptions via console.error so a silently-broken tracker shows up in session logs.

Self-healing: Both readers use Math.max(updatedAt, lastToolActivity) for the activity signal, so a fresh user prompt revives a stale session even if the tool-activity tracker is down.
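The staleness thresholds and the self-healing activity signal combine into one check. A sketch, assuming epoch-millisecond timestamps and that "algorithm" sessions are those in mode starting; the field names follow the text above, but isStale is a hypothetical helper and the strict > comparison is an assumption.

```typescript
// Thresholds from the Staleness paragraph ("algorithm" = mode starting).
const STALE_AFTER_MS = { native: 5 * 60_000, starting: 10 * 60_000 } as const;

interface SessionActivity {
  mode: "native" | "starting";
  updatedAt: number;         // epoch ms, refreshed when a user prompt upserts the session
  lastToolActivity?: number; // epoch ms, bumped by ToolActivityTracker (may be missing)
}

// Self-healing signal: the newer of the two timestamps wins, so a fresh prompt
// revives a session even when the tool-activity tracker has stopped writing.
function isStale(s: SessionActivity, nowMs: number): boolean {
  const lastActive = Math.max(s.updatedAt, s.lastToolActivity ?? 0);
  return nowMs - lastActive > STALE_AFTER_MS[s.mode];
}
```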

See Also

  • ~/.claude/PAI/DOCUMENTATION/PAISystemArchitecture.md — Master PAI architecture reference