Overview

PAI PRD Format Specification v2.1

Last synced: Apr 22, 2026

The PRD (Product Requirements Document) is the single source of truth for every Algorithm run. The AI writes all PRD content directly using Write/Edit tools. Hooks only read PRDs to sync state.

Frontmatter (YAML)

Nine required fields, plus the optional fields described below:

---
task: "8 word task description"           # What this work is
slug: YYYYMMDD-HHMMSS_kebab-task          # Unique ID, directory name
effort: standard                          # standard|extended|advanced|deep|comprehensive
effort_source: auto                       # auto|explicit — whether effort was auto-detected or set via /eN
phase: observe                            # observe|think|plan|build|execute|verify|learn|complete
progress: 0/8                             # checked criteria / total criteria
mode: interactive                         # interactive|loop|optimize|ideate
started: 2026-02-24T02:00:00Z            # Creation timestamp (ISO 8601)
updated: 2026-02-24T02:00:00Z            # Last modification timestamp (ISO 8601)
---

Optional field (added on rework/continuation):

iteration: 2                              # Incremented when revisiting a completed task

Optional fields (optimize mode — shared):

eval_mode: metric                          # metric|eval — which evaluation strategy
target_type: code                          # skill|prompt|agent|code|function — auto-detected
target_path: "train.py"                    # What we're optimizing (file or directory)
baseline: 0.9979                          # Current best score (updated on improvement)
experiment_count: 0                       # Total experiments run
max_experiments: null                     # Optional: stop after N experiments
time_budget: 300                          # Seconds per experiment (0 = unlimited)
sandbox_path: ""                          # Auto-populated: where sandbox copy lives

Optional fields (optimize mode — metric mode):

metric_name: "val_bpb"                    # Human-readable metric name
metric_command: "uv run train.py"         # Shell command that produces the metric
metric_direction: lower                   # lower|higher — which direction is "better"
metric_extract: "grep '^val_bpb:' run.log | cut -d' ' -f2"  # Extract metric from output
mutable_files: ["train.py"]              # Files the agent may modify
metric_target: null                       # Optional: stop when metric reaches this value
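
Putting the metric-mode fields together, one measurement cycle can be sketched as below. This is a hypothetical Python sketch, not the actual hook/agent implementation: run `metric_command`, apply `metric_extract`, then compare against `baseline` per `metric_direction`.

```python
import subprocess

def run_metric(metric_command, metric_extract, metric_direction, baseline):
    """Hypothetical sketch of one metric-mode measurement cycle."""
    # Run the command that produces the metric (e.g. "uv run train.py").
    subprocess.run(metric_command, shell=True, check=True)
    # Extract the metric value from its output/log.
    out = subprocess.run(metric_extract, shell=True,
                         capture_output=True, text=True, check=True)
    value = float(out.stdout.strip())
    # "lower" means smaller is better; "higher" means larger is better.
    improved = value < baseline if metric_direction == "lower" else value > baseline
    return value, improved
```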

Optional fields (optimize mode — eval mode):

eval_criteria:                             # Binary yes/no eval questions (3-6)
  - "Does the output contain specific facts with sources?"
  - "Is the output structured with clear sections?"
  - "Does the output avoid generic filler content?"
test_inputs:                               # Representative inputs to test against (3-5)
  - "research AI trends"
  - "quick research on quantum computing"
  - "deep investigation of supply chain attacks"
runs_per_experiment: 3                    # How many times to run per experiment
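
Eval-mode scoring aggregates binary verdicts across criteria, inputs, and runs. A minimal sketch, assuming a `judge(criterion, test_input)` callable standing in for the LLM-as-judge call (the judge itself is not specified here):

```python
def score_experiment(eval_criteria, test_inputs, runs_per_experiment, judge):
    """Hypothetical sketch: fraction of yes verdicts across all
    (run, input, criterion) combinations, in [0.0, 1.0]."""
    passes = total = 0
    for _ in range(runs_per_experiment):
        for test_input in test_inputs:
            for criterion in eval_criteria:
                total += 1
                passes += bool(judge(criterion, test_input))
    return passes / total
```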

Optional fields (tunable parameters — ideate, optimize, loop modes):

algorithm_config:
  preset: explore                          # Named preset (optional)
  focus: 0.25                              # Composite focus 0.0-1.0 (optional, ideate only)
  params:                                  # Resolved individual parameter values
    problemConnection: 0.28                # Ideation: problem connection strictness
    selectionPressure: 0.30                # Ideation: cull aggressiveness
    domainDiversity: 0.74                  # Ideation: source domain diversity
    phaseBalance: 0.33                     # Ideation: generative vs analytical phase balance
    ideaVolume: 31                         # Ideation: ideas per cycle
    mutationRate: 0.63                     # Ideation: evolution mutation intensity
    generativeTemperature: 0.74            # Ideation: DREAM/DAYDREAM wildness
    maxCycles: 4                           # Ideation: evolutionary cycles
    stepSize: 0.3                          # Optimize: mutation boldness
    regressionTolerance: 0.1               # Optimize: accept temporary regression
    earlyStopPatience: 3                   # Optimize: no-improvement patience
    maxIterations: 10                      # Optimize/Loop: hard iteration cap
    contextCarryover: 0.43                 # Cross-mode: history carried between cycles
    parallelAgents: 1                      # Cross-mode: agents per workstream
  locked_params: [parallelAgents]          # Params meta-learner cannot adjust
  user_overrides: []                       # Params user explicitly set (auto-locked)
  meta_learner_adjustments:                # History of meta-learner changes (ideate)
    - cycle: 2
      parameter: selectionPressure
      from: 0.30
      to: 0.45
      rationale: "Ideas converging too slowly"

Field Rules

  • task: Imperative mood, max 60 chars. Describes the deliverable, not the process.
  • slug: Format YYYYMMDD-HHMMSS_kebab-description. Used as directory name under MEMORY/WORK/.
  • effort: Determines ISC count range and time budget. See Algorithm for tier definitions.
  • phase: Updated at the START of each Algorithm phase. Set to complete when done.
  • progress: Format M/N where M = checked ISC criteria, N = total ISC criteria. Updated immediately when a criterion passes (don’t wait for VERIFY).
  • mode: interactive (single Algorithm run), loop (multiple iterations toward an ideal state), or optimize (autonomous optimization loop). Iteration tracking applies to interactive and loop; optimize instead enables experiment tracking. In optimize mode, eval_mode selects the measurement strategy: a shell command (metric) or LLM-as-judge (eval).
  • started: Set once at creation. Never modified.
  • updated: Set on every Edit/Write. Use current ISO 8601 timestamp.
  • iteration: Omitted on first run. Set to 2 on first continuation, incremented thereafter.
  • algorithm_config: Omitted for interactive mode. Written during OBSERVE when mode is ideate, optimize, or loop. Contains resolved parameter values (after preset → focus → overrides resolution). The params section always contains the RESOLVED values, not raw user input. Full parameter schema: ~/.claude/PAI/ALGORITHM/parameter-schema.md.
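
The preset → focus → overrides resolution order can be sketched as a simple layered merge. This is a hypothetical illustration: in the real schema, `focus` expands to individual values via parameter-schema.md; here each layer is represented as an already-expanded dict, with later layers winning.

```python
def resolve_params(defaults, preset=None, focus_params=None, user_overrides=None):
    """Hypothetical sketch of parameter resolution: defaults, then preset,
    then focus-derived values, then explicit user overrides (always last)."""
    resolved = dict(defaults)
    for layer in (preset or {}, focus_params or {}, user_overrides or {}):
        resolved.update(layer)
    return resolved
```

The resulting dict is what gets written under `algorithm_config.params` — resolved values, never raw user input.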

Body Sections

Five sections. Each appears only when populated — never create empty placeholder sections.

## Context

Written during OBSERVE. Captures:

  • What was explicitly requested and not requested
  • Why this task matters
  • Key constraints and dependencies
  • Risks and riskiest assumptions (merged here, no separate Risks section)

For Advanced+ effort, a ### Plan subsection may be added with technical approach details.

## Criteria

ISC (Ideal State Criteria) checkboxes. Written during OBSERVE, checked during EXECUTE/VERIFY.

- [ ] ISC-1: Criterion text (8-12 words, binary testable, state not action)
- [ ] ISC-2: Another criterion
- [ ] ISC-A-1: Anti: What must NOT happen

Rules:

  • Each criterion: 8-12 words, describes an end state (not an action)
  • Binary testable: either true or false, no judgment required
  • Atomic: one verifiable thing per criterion — no compound statements
  • Anti-criteria prefixed ISC-A-: things that must NOT be true
  • ID format: ISC-N for criteria, ISC-A-N for anti-criteria
  • Check (- [x]) immediately when satisfied — don’t batch at VERIFY
  • Update frontmatter progress on every check change
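
The progress value is mechanically derivable from the checkboxes. A minimal sketch (hypothetical helper, assuming the checkbox format shown above; tagged criteria like `ISC-1 [F]:` match the same prefix):

```python
import re

def compute_progress(prd_body):
    """Sketch: derive the frontmatter progress value (M/N) by counting
    checked vs total ISC checkboxes."""
    boxes = re.findall(r"^- \[([ x])\] ISC-", prd_body, flags=re.MULTILINE)
    checked = sum(1 for b in boxes if b == "x")
    return f"{checked}/{len(boxes)}"
```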

Atomicity — the Splitting Test (apply to every criterion):

  • Contains “and”/“with”/“including” joining two verifiable things? → split
  • Can part A pass while part B fails independently? → split
  • Contains “all”/“every”/“complete”? → enumerate what that means
  • Crosses domain boundaries (UI/API/data/logic)? → one per boundary

ISC Tags (v3.13.0+):

  • Category: [F] Functional, [S] Structural, [B] Behavioral, [N] Negative/Anti, [E] Edge/Prereq
  • Format: - [ ] ISC-N [category]: criterion text
  • Anti-criteria: - [ ] ISC-A-N [N]: Anti: what must NOT happen

Count enforcement: Total ISC must meet effort tier floor (Standard: 8, Extended: 16, Advanced: 24, Deep: 40, Comprehensive: 64). If below floor after first pass, decompose compound criteria until met.
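
The floor check itself is a straight table lookup. A sketch of the enforcement rule as stated above (hypothetical helper, not part of any shipped hook):

```python
# Minimum total ISC count per effort tier, from the spec text above.
EFFORT_FLOORS = {"standard": 8, "extended": 16, "advanced": 24,
                 "deep": 40, "comprehensive": 64}

def meets_floor(effort, isc_count):
    """True once the ISC count reaches the tier's floor."""
    return isc_count >= EFFORT_FLOORS[effort]
```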

## Experiments (optimize mode only)

Experiment results table. Written during EXECUTE in optimize mode. Shows the most recent 10 experiments plus a summary line.

| # | Hypothesis | Metric | Delta | Status | Duration |
|---|-----------|--------|-------|--------|----------|
| 1 | Reduce embedding dim 512→256 | 1.392 | -0.031 | kept | 45s |
| 2 | Add layer normalization | 1.401 | +0.009 | reverted | 62s |
| 3 | Switch to GELU activation | 1.378 | -0.014 | kept | 38s |

**Summary:** 23 experiments, 8 kept (35% hit rate). Baseline: 1.423 → Current: 1.312 (-7.8%)

A complete results.tsv is also maintained in the PRD directory for machine-parseable history.
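
A row of that history might be appended as below — a hypothetical sketch assuming results.tsv mirrors the table columns above (the actual file schema is not specified here):

```python
import csv

def append_result(fh, number, hypothesis, metric, delta, status, duration_s):
    """Sketch: append one experiment row to results.tsv (tab-separated,
    same columns as the Experiments table)."""
    csv.writer(fh, delimiter="\t").writerow(
        [number, hypothesis, f"{metric:.3f}", f"{delta:+.3f}", status, f"{duration_s}s"])
```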

Guard Rail Semantics (optimize mode)

In optimize mode, ISC criteria serve as guard rails — assertions that must remain true across ALL experiments, not convergence goals to check off:

- [x] ISC-1: Test suite passes after every kept change
- [x] ISC-2: No type errors in mutable files
- [x] ISC-A-1: Anti: No hardcoded values replacing computed values

Guard rails are checked every experiment cycle. A violation triggers automatic revert regardless of metric improvement. They start checked and must REMAIN checked. The progress field in optimize mode represents kept_experiments/total_experiments, not ISC completion.
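
The keep/revert rule combines the metric comparison with the guard-rail check — a violation wins regardless of the metric. A minimal sketch of that decision (hypothetical helper):

```python
def decide(metric_direction, new_value, baseline, guard_rails_pass):
    """Sketch: guard-rail violation forces a revert even when the
    metric improved; otherwise keep only on improvement."""
    if not guard_rails_pass:
        return "reverted"
    better = (new_value < baseline if metric_direction == "lower"
              else new_value > baseline)
    return "kept" if better else "reverted"
```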

## Decisions

Timestamped decision log. Written during any phase when non-obvious choices are made. Include dead ends — failed approaches prevent future sessions from re-exploring them.

- 2026-02-24 02:00: Chose X over Y because Z
- 2026-02-24 02:15: Rejected approach A due to performance concern
- 2026-02-24 02:30: ❌ DEAD END: Tried B — failed because C (don't retry)

## Verification

Evidence for each criterion. Written during VERIFY phase.

- ISC-1: Screenshot confirms layout renders correctly
- ISC-2: `bun test` passes, 14/14 tests green
- ISC-A-1: Confirmed no PII in output via grep

File Location

~/.claude/PAI/MEMORY/WORK/{slug}/PRD.md

Directory created with mkdir -p MEMORY/WORK/{slug}/ during OBSERVE.

Continuation / Rework

When a follow-up prompt continues the same task:

  1. AI detects recent PRD matching the task context
  2. Edit existing PRD: reset phase: observe, add/increment iteration, update updated
  3. Re-enter Algorithm phases as needed
  4. Phase history in work.json tracks re-entry (COMPLETE → OBSERVE)

When it’s a genuinely new task: create a new PRD with a new slug.

Sync Pipeline

PRD is read-only from hooks’ perspective:

  1. AI writes PRD via Write/Edit tools
  2. PRDSync hook fires on PostToolUse, reads frontmatter + criteria
  3. work.json updated with session state (keyed by slug)
  4. PRDSync pushes work.json to CF KV (sync:work_state) via REST API (5s timeout)
  5. KVSync hook also pushes work.json to KV at SessionStart and SessionEnd (redundant sync)
  6. Worker reads KV at /api/algorithm, transforms via transformWorkState()
  7. Dashboard polls /api/algorithm every 2 seconds

The AI is the sole writer. Hooks only read. work.json is derived state. KV is derived from work.json.
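
Step 2 of the pipeline — reading frontmatter into a work.json entry — can be sketched as below. This is a hypothetical minimal parser for illustration only (flat `key: value` lines; the real PRDSync hook would use a proper YAML parser and carry more fields):

```python
import re

def prd_to_work_entry(prd_text):
    """Sketch: parse PRD frontmatter and derive a work.json entry
    keyed by slug."""
    m = re.match(r"^---\n(.*?)\n---", prd_text, flags=re.DOTALL)
    fields = {}
    for line in m.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # strip inline comments and surrounding quotes
            fields[key.strip()] = value.split("#")[0].strip().strip('"')
    return {fields["slug"]: {"phase": fields["phase"],
                             "progress": fields["progress"],
                             "updated": fields["updated"]}}
```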

Design Rationale

This format is informed by research across Kiro (AWS), spec-kit (GitHub), OpenSpec, BMAD, Google Design Docs, Amazon 6-pagers, Shape Up pitches, and 48 production PAI PRDs.

Key design choices:

  • Nine required fields, not 15: Only fields consumed by the sync pipeline. Dead fields waste tokens.
  • Four core sections, not 7: Risks merged into Context. Plan merged into Context. Changelog dropped (git serves this purpose). Experiments appears only in optimize mode.
  • Checkboxes over EARS/BDD: Simpler to parse, write, and verify. ISC pattern proven over 48 PRDs.
  • YAML frontmatter over JSON: Universal standard (Jekyll, Hugo, Astro, Kiro, spec-kit all use it).
  • Convention-based sections: Sections appear when needed, not as empty boilerplate.
  • Reference file pattern: This spec lives at ~/.claude/PAI/DOCUMENTATION/PrdFormat.md, not inline in CLAUDE.md. Saves ~2,500 tokens/response.