# PAI PRD Format Specification v2.1
The PRD (Product Requirements Document) is the single source of truth for every Algorithm run. The AI writes all PRD content directly using Write/Edit tools. Hooks only read PRDs to sync state.
## Frontmatter (YAML)
Nine required fields; optional fields follow below:
```yaml
---
task: "8 word task description"   # What this work is
slug: YYYYMMDD-HHMMSS_kebab-task  # Unique ID, directory name
effort: standard                  # standard|extended|advanced|deep|comprehensive
effort_source: auto               # auto|explicit — whether effort was auto-detected or set via /eN
phase: observe                    # observe|think|plan|build|execute|verify|learn|complete
progress: 0/8                     # checked criteria / total criteria
mode: interactive                 # interactive|loop|optimize
started: 2026-02-24T02:00:00Z     # Creation timestamp (ISO 8601)
updated: 2026-02-24T02:00:00Z     # Last modification timestamp (ISO 8601)
---
```
Optional field (added on rework/continuation):
```yaml
iteration: 2                      # Incremented when revisiting a completed task
```
Optional fields (optimize mode — shared):
```yaml
eval_mode: metric                 # metric|eval — which evaluation strategy
target_type: code                 # skill|prompt|agent|code|function — auto-detected
target_path: "train.py"           # What we're optimizing (file or directory)
baseline: 0.9979                  # Current best score (updated on improvement)
experiment_count: 0               # Total experiments run
max_experiments: null             # Optional: stop after N experiments
time_budget: 300                  # Seconds per experiment (0 = unlimited)
sandbox_path: ""                  # Auto-populated: where sandbox copy lives
```
Optional fields (optimize mode — metric mode):
```yaml
metric_name: "val_bpb"            # Human-readable metric name
metric_command: "uv run train.py" # Shell command that produces the metric
metric_direction: lower           # lower|higher — which direction is "better"
metric_extract: "grep '^val_bpb:' run.log | cut -d' ' -f2" # Extract metric from output
mutable_files: ["train.py"]       # Files the agent may modify
metric_target: null               # Optional: stop when metric reaches this value
```
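How the metric fields fit together can be sketched as follows — assuming `metric_command` leaves its output where `metric_extract` expects it (e.g. `run.log`), and that both are trusted shell strings from the PRD:

```python
import subprocess

def run_experiment(metric_command: str, metric_extract: str) -> float:
    """Run the command under test, then extract the scalar metric from its output."""
    subprocess.run(metric_command, shell=True, check=True)
    result = subprocess.run(metric_extract, shell=True,
                            capture_output=True, text=True, check=True)
    return float(result.stdout.strip())

def is_improvement(score: float, baseline: float, direction: str) -> bool:
    """metric_direction decides which way is 'better'."""
    return score < baseline if direction == "lower" else score > baseline
```

On improvement, `baseline` in the frontmatter would be updated to the new score.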
Optional fields (optimize mode — eval mode):
```yaml
eval_criteria:                    # Binary yes/no eval questions (3-6)
  - "Does the output contain specific facts with sources?"
  - "Is the output structured with clear sections?"
  - "Does the output avoid generic filler content?"
test_inputs:                      # Representative inputs to test against (3-5)
  - "research AI trends"
  - "quick research on quantum computing"
  - "deep investigation of supply chain attacks"
runs_per_experiment: 3            # How many times to run per experiment
```
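A sketch of how eval-mode scoring might aggregate — `judge()` is a hypothetical stand-in for the LLM-as-judge call, returning a boolean per criterion:

```python
def score_run(output: str, eval_criteria: list[str], judge) -> float:
    """Ask the judge a binary yes/no for each criterion; return the pass fraction."""
    votes = [judge(output, question) for question in eval_criteria]
    return sum(votes) / len(votes)

def score_experiment(outputs: list[str], eval_criteria: list[str], judge) -> float:
    """Average over runs_per_experiment outputs to damp judge noise."""
    return sum(score_run(o, eval_criteria, judge) for o in outputs) / len(outputs)
```

Averaging multiple runs per experiment is the reason `runs_per_experiment` exists: a single judged run is too noisy to compare against `baseline`.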
Optional fields (tunable parameters — ideate, optimize, loop modes):
```yaml
algorithm_config:
  preset: explore                 # Named preset (optional)
  focus: 0.25                     # Composite focus 0.0-1.0 (optional, ideate only)
  params:                         # Resolved individual parameter values
    problemConnection: 0.28       # Ideation: problem connection strictness
    selectionPressure: 0.30       # Ideation: cull aggressiveness
    domainDiversity: 0.74         # Ideation: source domain diversity
    phaseBalance: 0.33            # Ideation: generative vs analytical phase balance
    ideaVolume: 31                # Ideation: ideas per cycle
    mutationRate: 0.63            # Ideation: evolution mutation intensity
    generativeTemperature: 0.74   # Ideation: DREAM/DAYDREAM wildness
    maxCycles: 4                  # Ideation: evolutionary cycles
    stepSize: 0.3                 # Optimize: mutation boldness
    regressionTolerance: 0.1      # Optimize: accept temporary regression
    earlyStopPatience: 3          # Optimize: no-improvement patience
    maxIterations: 10             # Optimize/Loop: hard iteration cap
    contextCarryover: 0.43        # Cross-mode: history carried between cycles
    parallelAgents: 1             # Cross-mode: agents per workstream
  locked_params: [parallelAgents] # Params meta-learner cannot adjust
  user_overrides: []              # Params user explicitly set (auto-locked)
  meta_learner_adjustments:       # History of meta-learner changes (ideate)
    - cycle: 2
      parameter: selectionPressure
      from: 0.30
      to: 0.45
      rationale: "Ideas converging too slowly"
```
## Field Rules
- **task**: Imperative mood, max 60 chars. Describes the deliverable, not the process.
- **slug**: Format `YYYYMMDD-HHMMSS_kebab-description`. Used as directory name under `MEMORY/WORK/`.
- **effort**: Determines ISC count range and time budget. See Algorithm for tier definitions.
- **phase**: Updated at the START of each Algorithm phase. Set to `complete` when done.
- **progress**: Format `M/N` where M = checked ISC criteria, N = total ISC criteria. Updated immediately when a criterion passes (don't wait for VERIFY).
- **mode**: `interactive` (single Algorithm run), `loop` (multiple iterations toward ideal state), or `optimize` (autonomous optimization loop). `interactive` and `loop` determine whether `iteration` tracking is active. `optimize` enables experiment tracking. When optimize mode is active, `eval_mode` determines whether measurement uses a shell command (`metric`) or LLM-as-judge (`eval`).
- **started**: Set once at creation. Never modified.
- **updated**: Set on every Edit/Write. Use current ISO 8601 timestamp.
- **iteration**: Omitted on first run. Set to `2` on first continuation, incremented thereafter.
- **algorithm_config**: Omitted for interactive mode. Written during OBSERVE when mode is ideate, optimize, or loop. Contains resolved parameter values (after preset → focus → overrides resolution). The `params` section always contains the RESOLVED values, not raw user input. Full parameter schema: `~/.claude/PAI/ALGORITHM/parameter-schema.md`.
## Body Sections
Five sections. Each appears only when populated — never create empty placeholder sections.
## Context
Written during OBSERVE. Captures:
- What was explicitly requested and not requested
- Why this task matters
- Key constraints and dependencies
- Risks and riskiest assumptions (merged here, no separate Risks section)
For Advanced+ effort, a ### Plan subsection may be added with technical approach details.
## Criteria
ISC (Ideal State Criteria) checkboxes. Written during OBSERVE, checked during EXECUTE/VERIFY.
```
- [ ] ISC-1: Criterion text (8-12 words, binary testable, state not action)
- [ ] ISC-2: Another criterion
- [ ] ISC-A-1: Anti: What must NOT happen
```
Rules:
- Each criterion: 8-12 words, describes an end state (not an action)
- Binary testable: either true or false, no judgment required
- Atomic: one verifiable thing per criterion — no compound statements
- Anti-criteria prefixed `ISC-A-`: things that must NOT be true
- ID format: `ISC-N` for criteria, `ISC-A-N` for anti-criteria
- Check (`- [x]`) immediately when satisfied — don't batch at VERIFY
- Update frontmatter `progress` on every check change
Atomicity — the Splitting Test (apply to every criterion):
- Contains “and”/“with”/“including” joining two verifiable things? → split
- Can part A pass while part B fails independently? → split
- Contains “all”/“every”/“complete”? → enumerate what that means
- Crosses domain boundaries (UI/API/data/logic)? → one per boundary
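The first three rules of the Splitting Test can be approximated mechanically. A heuristic sketch — the word lists are illustrative, and a flag only means "look closer", not "must split":

```python
CONJUNCTIONS = (" and ", " with ", " including ")
ABSOLUTES = ("all ", "every ", "complete ")

def needs_split(criterion: str) -> bool:
    """Flag criteria that likely fail the Splitting Test (heuristic only)."""
    text = criterion.lower()
    if any(c in text for c in CONJUNCTIONS):
        return True   # two verifiable things joined into one criterion
    if any(a in text for a in ABSOLUTES):
        return True   # "all"/"every"/"complete" needs enumeration
    return False
```

The fourth rule (domain boundaries) resists string matching and stays a judgment call.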
ISC Tags (v3.13.0+):
- Category: `[F]` Functional, `[S]` Structural, `[B]` Behavioral, `[N]` Negative/Anti, `[E]` Edge/Prereq
- Format: `- [ ] ISC-N [category]: criterion text`
- Anti-criteria: `- [ ] ISC-A-N [N]: Anti: what must NOT happen`
Count enforcement: Total ISC must meet effort tier floor (Standard: 8, Extended: 16, Advanced: 24, Deep: 40, Comprehensive: 64). If below floor after first pass, decompose compound criteria until met.
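The tier floors reduce to a lookup table. A sketch of the enforcement check:

```python
# Minimum ISC count per effort tier, from the spec above.
ISC_FLOORS = {"standard": 8, "extended": 16, "advanced": 24,
              "deep": 40, "comprehensive": 64}

def meets_floor(effort: str, isc_count: int) -> bool:
    """True when the PRD has at least the minimum ISC for its effort tier."""
    return isc_count >= ISC_FLOORS[effort]
```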
## Experiments (optimize mode only)
Experiment results table. Written during EXECUTE in optimize mode. Shows the most recent 10 experiments plus a summary line.
| # | Hypothesis | Metric | Delta | Status | Duration |
|---|-----------|--------|-------|--------|----------|
| 1 | Reduce embedding dim 512→256 | 1.392 | -0.031 | kept | 45s |
| 2 | Add layer normalization | 1.401 | +0.009 | reverted | 62s |
| 3 | Switch to GELU activation | 1.378 | -0.014 | kept | 38s |
**Summary:** 23 experiments, 8 kept (35% hit rate). Baseline: 1.423 → Current: 1.312 (-7.8%)
A complete `results.tsv` is also maintained in the PRD directory for machine-parseable history.
### Guard Rail Semantics (optimize mode)
In optimize mode, ISC criteria serve as guard rails — assertions that must remain true across ALL experiments, not convergence goals to check off:
```
- [x] ISC-1: Test suite passes after every kept change
- [x] ISC-2: No type errors in mutable files
- [x] ISC-A-1: Anti: No hardcoded values replacing computed values
```
Guard rails are checked every experiment cycle. A violation triggers automatic revert regardless of metric improvement. They start checked and must REMAIN checked. The progress field in optimize mode represents kept_experiments/total_experiments, not ISC completion.
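The keep/revert decision reduces to a pure function — the names here are hypothetical, but the precedence (guard rails before metric) follows the rule above:

```python
def decide(metric: float, baseline: float, direction: str,
           guard_rails_pass: bool) -> str:
    """A guard-rail violation reverts unconditionally, even on a metric win."""
    if not guard_rails_pass:
        return "reverted"
    improved = metric < baseline if direction == "lower" else metric > baseline
    return "kept" if improved else "reverted"
```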
## Decisions
Timestamped decision log. Written during any phase when non-obvious choices are made. Include dead ends — failed approaches prevent future sessions from re-exploring them.
```
- 2026-02-24 02:00: Chose X over Y because Z
- 2026-02-24 02:15: Rejected approach A due to performance concern
- 2026-02-24 02:30: ❌ DEAD END: Tried B — failed because C (don't retry)
```
## Verification
Evidence for each criterion. Written during VERIFY phase.
```
- ISC-1: Screenshot confirms layout renders correctly
- ISC-2: `bun test` passes, 14/14 tests green
- ISC-A-1: Confirmed no PII in output via grep
```
## File Location
```
~/.claude/PAI/MEMORY/WORK/{slug}/PRD.md
```
Directory created with `mkdir -p MEMORY/WORK/{slug}/` during OBSERVE.
## Continuation / Rework
When a follow-up prompt continues the same task:
- AI detects recent PRD matching the task context
- Edit existing PRD: reset `phase: observe`, add/increment `iteration`, update `updated`
- Re-enter Algorithm phases as needed
- Phase history in work.json tracks re-entry (COMPLETE → OBSERVE)
When it’s a genuinely new task: create a new PRD with a new slug.
## Sync Pipeline
PRD is read-only from hooks’ perspective:
- AI writes PRD via Write/Edit tools
- PRDSync hook fires on PostToolUse, reads frontmatter + criteria
- work.json updated with session state (keyed by slug)
- PRDSync pushes work.json to CF KV (`sync:work_state`) via REST API (5s timeout)
- KVSync hook also pushes work.json to KV at SessionStart and SessionEnd (redundant sync)
- Worker reads KV at `/api/algorithm`, transforms via `transformWorkState()`
- Dashboard polls `/api/algorithm` every 2 seconds
The AI is the sole writer. Hooks only read. work.json is derived state. KV is derived from work.json.
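A minimal sketch of the read-only side: how a hook like PRDSync might derive the `M/N` progress string from Criteria checkboxes. The regex assumes the exact checkbox layout shown in this spec:

```python
import re

def count_progress(prd_body: str) -> str:
    """Derive 'checked/total' from ISC checkboxes, counting anti-criteria too."""
    boxes = re.findall(r"^- \[( |x)\] ISC-", prd_body, re.MULTILINE)
    checked = sum(1 for b in boxes if b == "x")
    return f"{checked}/{len(boxes)}"
```

Because work.json is derived state, a value like this can always be recomputed from the PRD; the hook never writes anything back.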
## Design Rationale
This format is informed by research across Kiro (AWS), spec-kit (GitHub), OpenSpec, BMAD, Google Design Docs, Amazon 6-pagers, Shape Up pitches, and 48 production PAI PRDs.
Key design choices:
- 8 fields, not 15: Only fields consumed by the sync pipeline. Dead fields waste tokens.
- 4 sections, not 7: Risks merged into Context. Plan merged into Context. Changelog dropped (git serves this purpose).
- Checkboxes over EARS/BDD: Simpler to parse, write, and verify. ISC pattern proven over 48 PRDs.
- YAML frontmatter over JSON: Universal standard (Jekyll, Hugo, Astro, Kiro, spec-kit all use it).
- Convention-based sections: Sections appear when needed, not as empty boilerplate.
- Reference file pattern: This spec lives at `~/.claude/PAI/DOCUMENTATION/PrdFormat.md`, not inline in CLAUDE.md. Saves ~2,500 tokens/response.