The Feed System

Last synced: Apr 22, 2026

Turning information streams into routed intelligence.

The Feed System is the sensor layer of the PAI/ARBOL architecture. It monitors content sources, processes everything through an AI intelligence pipeline, and routes actionable items to the right destinations at the right priority.

This is not an RSS reader. It’s an intelligence routing engine.


The Vision

Raw information is noise. Intelligence is information that has been evaluated, prioritized, and delivered to the right place at the right time.

The Feed System implements this transformation:

NOISE (thousands of items/day from hundreds of sources)
        │
        ▼
INTELLIGENCE (rated, labeled, priority-routed to specific destinations)

The key insight: Different content deserves different treatment. A trusted security researcher posting about a national security issue with high urgency should trigger Telegram + Discord + email immediately. A mediocre blog post about a topic you’ve seen before should archive silently. The Feed System makes these routing decisions automatically using multi-dimensional ratings and configurable rules.


Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                          FEED SYSTEM                                     │
│                                                                         │
│  SOURCES           PROCESSING              ROUTING         DESTINATIONS │
│  ────────          ──────────              ───────         ──────────── │
│                                                                         │
│  People       ┌─► INGEST ──────────┐                                   │
│  Channels     │   (fetch, parse,    │                                   │
│  Feeds        │    normalize)       │   ┌─► ROUTE ──┬─► Telegram       │
│  Publications │                     │   │  (rules,  │                   │
│               │   SUMMARIZE ────────┤   │   AND     ├─► Discord        │
│  RSS          │   (Haiku: short +   │   │   logic,  │                   │
│  YouTube      │    medium)          │   │   priority)├─► Email         │
│  Twitter/X    │                     │   │           │                   │
│  Bluesky      │   RATE ─────────────┘   │           ├─► Blog Draft     │
│  LinkedIn     │   (5 dimensions +       │           │                   │
│  Mastodon     │    20 labels)    ───────┘           ├─► Social Post    │
│  Blogs        │                                     │                   │
│  Newsletters  │                                     ├─► Daily Digest   │
│  Podcasts     │                                     │                   │
│               │                                     └─► Archive        │
│               │                                                         │
│  Each source has:                                                       │
│  • credibility/priority                                                 │
│  • poll interval                                                        │
│  • compute type (cloud/local)                                           │
│  • tags + expertise                                                     │
└─────────────────────────────────────────────────────────────────────────┘

The Intelligence Pipeline

Every piece of content flows through four stages:

Stage 1: Ingest

Fetch content from the source, parse it, normalize to a standard format.

Source Type     | Method                           | Compute
RSS/Atom feeds  | HTTP fetch + XML parse           | Cloud (Workers)
YouTube         | yt-dlp transcript extraction     | Local (PAI daemon)
Twitter/X       | API or scraping                  | Cloud
Podcasts        | Download + Whisper transcription | Local
Blogs           | HTTP fetch + HTML extraction     | Cloud

Output: Normalized item with title, content, url, source_id, source_type.

Content resolution order (fallback chain when extracting article text):

  1. A_EXTRACT_ARTICLE — full HTML extraction from the article URL
  2. content_encoded — RSS <content:encoded> field (Substack, WordPress, many blogs provide full text here)
  3. description — RSS <description> field (summary/snippet only)

Bare URLs are never included in content sent to AI. Items with fewer than 200 characters of total content are skipped entirely to avoid wasting LLM calls on insufficient input.
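The fallback chain above can be sketched as follows. This is a minimal illustration: the FeedEntry shape and the extractArticle callback stand in for the real A_EXTRACT_ARTICLE action and are assumptions, not the deployed interface.

```typescript
// Illustrative sketch of the content-resolution fallback chain. The
// FeedEntry shape and extractArticle callback are assumptions, not the
// actual Worker interface.
interface FeedEntry {
  url: string;
  contentEncoded?: string; // RSS <content:encoded>, when the feed provides full text
  description?: string;    // RSS <description>: summary/snippet only
}

const MIN_CONTENT_CHARS = 200; // below this, skip the item entirely

async function resolveContent(
  entry: FeedEntry,
  extractArticle: (url: string) => Promise<string | null>,
): Promise<string | null> {
  // 1. Full HTML extraction from the article URL
  const extracted = await extractArticle(entry.url);
  // 2. Fall back to <content:encoded>, 3. then to <description>
  const content = (extracted ?? entry.contentEncoded ?? entry.description ?? "").trim();
  // A bare URL on its own is never usable content for the AI stages
  const isBareUrl = /^https?:\/\/\S+$/.test(content);
  const text = isBareUrl ? "" : content;
  return text.length >= MIN_CONTENT_CHARS ? text : null; // null = skip item
}
```

An item that resolves to nothing but a URL, or to a short snippet, returns null and never reaches the summarize and rate stages.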

Stage 2: Summarize

AI generates two summary levels:

Summary        | Purpose
summary_short  | One sentence. For notifications and digests.
summary_medium | One paragraph. For email and dashboard display.

Model: Claude Haiku (fast tier). Summaries are information-dense and preserve facts, claims, and conclusions.

Stage 3: Rate

Multi-dimensional AI evaluation:

Dimension     | Scale             | Purpose
Tier          | S / A / B / C / D | Overall quality bracket
Quality Score | 1-100             | Granular quality within tier
Importance    | 1-10              | How significant is this content?
Novelty       | 1-10              | How new/unique is this information?
Urgency       | 1-10              | How time-sensitive is this?

Plus 20 labels from a fixed taxonomy:

Security, AI, Technology, Business, Geopolitics, Science,
Culture, Health, Privacy, OSINT, Military, Innovation,
Leadership, Philosophy, Tutorial, Podcast, Newsletter,
Research, Policy, Breaking

Tier definitions:

Tier | Meaning
S    | Groundbreaking. Must act on immediately.
A    | Excellent. Must-read. High insight density.
B    | Good. Worth reading.
C    | Average. Skim-worthy.
D    | Low value. Skip.

Stage 4: Route

Pure logic engine (no LLM) that evaluates rules against rated items.

Rule format:

{
  "name": "Critical security alerts",
  "conditions": {
    "tier": ["S", "A"],
    "urgency": { "gte": 8 },
    "labels": { "includes": ["Security"] }
  },
  "actions": ["notify"],
  "priority": "immediate"
}

Condition logic: all conditions use AND. An item must match ALL conditions in a rule to trigger its actions.

  • Numeric conditions: gte (>=), lte (<=), eq (==)
  • Label conditions: includes (item has at least one matching label)
  • Tier conditions: array of acceptable tiers
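Under those semantics, the route stage's evaluator fits in a few lines. The sketch below is illustrative (type and function names are assumptions, not the actual Worker code), but it implements the stated AND logic exactly:

```typescript
// Minimal sketch of the AND-logic rule evaluator (illustrative names).
type NumericCond = { gte?: number; lte?: number; eq?: number };

interface Rule {
  name: string;
  conditions: {
    tier?: string[];                 // array of acceptable tiers
    importance?: NumericCond;
    novelty?: NumericCond;
    urgency?: NumericCond;
    quality_score?: NumericCond;
    labels?: { includes: string[] }; // at least one label must match
  };
  actions: string[];
  priority: string;
}

interface RatedItem {
  tier: string;
  importance: number;
  novelty: number;
  urgency: number;
  quality_score: number;
  labels: string[];
}

function numericMatch(value: number, cond: NumericCond): boolean {
  if (cond.gte !== undefined && !(value >= cond.gte)) return false;
  if (cond.lte !== undefined && !(value <= cond.lte)) return false;
  if (cond.eq !== undefined && value !== cond.eq) return false;
  return true;
}

// An item triggers a rule only if EVERY present condition matches (AND logic).
function matchesRule(item: RatedItem, rule: Rule): boolean {
  const c = rule.conditions;
  if (c.tier && !c.tier.includes(item.tier)) return false;
  if (c.labels && !c.labels.includes.some((l) => item.labels.includes(l))) return false;
  for (const key of ["importance", "novelty", "urgency", "quality_score"] as const) {
    const cond = c[key];
    if (cond && !numericMatch(item[key], cond)) return false;
  }
  return true;
}
```

The "Critical security alerts" rule above would fire for a tier-A item with urgency 9 and a Security label, and stay silent if any single condition fails.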


Routing Rules

Rules are the core of the intelligence routing. They encode what matters and how to respond.

Rule Examples

Rule                    | Conditions                                | Action                            | Priority
Critical security       | tier S/A + urgency >= 8 + Security label  | notify (Telegram, Discord, email) | immediate
High-quality AI content | tier S/A + AI label + quality >= 80       | blog-draft + social-post          | daily
Breaking news           | Breaking label + urgency >= 9             | notify (Telegram)                 | immediate
Weekly digest material  | tier B+ + importance >= 6                 | digest                            | weekly
Everything else         | (default)                                 | archive                           | archive

Priority Levels

Priority  | Meaning             | Delivery
immediate | Act now             | Push notification: Telegram, Discord, email
daily     | Review today        | Included in daily digest/queue
weekly    | Review this week    | Included in weekly compilation
archive   | Store for reference | No active delivery

Destinations

Destination | Action                               | Implementation
notify      | Push alert to messaging platforms    | Telegram, Discord, Email via respective APIs
blog-draft  | Create draft post on example.com     | _BLOGGING skill integration
social-post | Generate and queue social media post | _SOCIALPOST skill + A_WRITE_TWITTER_POST, A_WRITE_LINKEDIN_POST
digest      | Accumulate for periodic compilation  | Daily/weekly digest builder
archive     | Store without action                 | D1 + R2 storage only

Relationship to Arbol

The Feed System runs on the Arbol Cloudflare Workers platform. The feed actions are (or will be) deployed as Arbol Workers following the same patterns as all other Arbol infrastructure.

Current State → Target State

Component       | Current                                                      | Target
feed/ingest     | Local PAI action A_FEED_INGEST                               | arbol-a-feed-ingest Worker
feed/summarize  | Local PAI action A_FEED_SUMMARIZE                            | arbol-a-feed-summarize Worker
feed/rate       | Local PAI action A_FEED_RATE                                 | arbol-a-feed-rate Worker
feed/route      | Local PAI action A_FEED_ROUTE                                | arbol-a-feed-route Worker
Feed API        | Cloudflare Worker (feed-api)                                 | Stays as-is
Feed Poller     | Deployed as _F_FEEDS_POLLER (Arbol flow, cron */5)           | Circuit breaker: HTTP 200 fetch that parses to zero items now increments error_count instead of resetting it — catches silent dead weight. Tier fallback chain: direct → Jina (on 403) → self-hosted proxy (on Jina fail).
Feed Processor  | Deployed as _F_FEEDS_PROCESSOR (Arbol flow, queue consumer)  | Content-type routing: text → A_EXTRACT_ARTICLE, audio → A_TRANSCRIBE_AUDIO. Uses content_encoded fallback for text articles. 200-char min threshold.
Feed LR Surface | Deployed as _F_FEEDS_LR_SURFACE (Arbol flow, cron-triggered) | RSS/Atom → Extract → Rate → Surface. Parses <content:encoded> from RSS and Atom <content>. Same fallback chain and threshold as Processor.
YT Label Email  | Deployed as _F_YT_LABEL_EMAIL (Arbol flow, cron */30)        | YouTube Data API path for sources where rss_url IS NULL. Quota guard: WHERE clause filters out any source with rss_url set so migrated sources don’t double-burn quota.
YT LR Surface   | Deployed as _F_YT_LR_SURFACE (Arbol flow, cron */20)         | YouTube Data API path, selects youtube_channel_id IS NOT NULL AND rss_url IS NULL. Reduced from */10 for quota budget.
Feed Dispatcher | Not yet deployed                                             | feed-dispatcher Worker (Queue consumer)
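The poller's circuit-breaker tightening is small but worth spelling out: an HTTP 200 response that parses to zero items now counts as a failure rather than a success. A sketch of the accounting (the names and the trip threshold here are illustrative assumptions, not the deployed settings):

```typescript
// Sketch of the poller's circuit-breaker accounting (illustrative names;
// TRIP_THRESHOLD is an assumed value, not the deployed configuration).
interface SourceHealth {
  error_count: number;
  active: boolean;
}

const TRIP_THRESHOLD = 10; // hypothetical: consecutive failures before tripping

function recordPoll(source: SourceHealth, httpOk: boolean, parsedItems: number): void {
  // An HTTP 200 that yields zero parsed items is silent dead weight:
  // count it as a failure instead of resetting the counter.
  const success = httpOk && parsedItems > 0;
  if (success) {
    source.error_count = 0;
  } else {
    source.error_count += 1;
    if (source.error_count >= TRIP_THRESHOLD) source.active = false; // trip breaker
  }
}
```

This is what catches feeds that die quietly: the server keeps answering 200, but nothing ever parses, so the counter climbs until the breaker trips.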

How Feed Powers Arbol Workflows

The Feed System is the source layer for the Arbol platform. It generates the content that downstream actions, pipelines, and flows operate on:

Feed System (sources + intelligence)

    ├─► F_HN_LABEL_EMAIL (HN → rate → email)

    ├─► F_YOUTUBE_DIGEST (YouTube → transcribe → rate → digest)

    ├─► F_SECURITY_ALERTS (Security feeds → rate → notify if urgent)

    └─► F_SOCIAL_CONTENT (High-rated items → generate posts → queue)

Every flow in Arbol that processes external content starts with the Feed System. The intelligence pipeline (ingest → summarize → rate → route) is the common backbone. Flows just connect specific sources to specific pipelines on specific schedules.


Infrastructure

Cloud (Cloudflare)

Service       | Purpose
D1            | Metadata database: sources, items, ratings, routing rules
R2            | Content storage: full text, transcripts, media
Queues        | Async processing: decouple ingest from processing
Workers       | Compute: all actions, pipelines, flows, API
Cron Triggers | Scheduling: poll sources on configurable intervals

Local (PAI Daemon)

Some content types require tools unavailable in Workers:

Tool    | Purpose                           | Source Types
yt-dlp  | YouTube transcript/video download | YouTube
whisper | Audio transcription               | Podcasts
ffmpeg  | Media processing                  | Video, Audio

The compute_type field on each source routes items to the correct processing environment:

  • cloud → Cloudflare Queue → Workers consumer
  • local → Cloudflare Queue → PAI daemon consumer
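The split above reduces to a single dispatch point. A minimal sketch, assuming Cloudflare Queue-style bindings (the queue names and message shape are illustrative, not the actual implementation):

```typescript
// Sketch of compute_type routing into queues (illustrative names and
// message shape; QueueLike stands in for a Cloudflare Queue binding).
interface Source {
  id: number;
  compute_type: "cloud" | "local";
}
interface QueueLike {
  send(msg: unknown): Promise<void>;
}

async function enqueueForProcessing(
  source: Source,
  itemId: number,
  queues: { cloud: QueueLike; local: QueueLike },
): Promise<void> {
  // cloud → consumed by a Workers consumer; local → consumed by the PAI daemon
  const queue = source.compute_type === "local" ? queues.local : queues.cloud;
  await queue.send({ item_id: itemId, source_id: source.id });
}
```

Both paths go through a queue; only the consumer differs, which keeps ingest fully decoupled from where the heavy tooling (yt-dlp, whisper, ffmpeg) actually runs.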

Data Model

Four core tables:

Table               | Purpose            | Key Fields
feed_sources        | Source definitions | name, category, platform URLs, tags, priority, compute_type, poll_interval; reputation: rolling_item_count, rolling_avg_quality, rolling_a_rate, rolling_updated_at
feed_items          | Processed content  | source_id, title, content, tier, quality_score, importance, novelty, urgency, labels, priority, status
feed_routing_rules  | Rule definitions   | name, conditions (JSON), actions (JSON), priority, active
feed_processing_log | Execution tracking | item_id, action, duration, tokens, cost

Item lifecycle: ingested → processing → processed → dispatched

Cluster tables (consumed by Surface’s UI): story_clusters (lead_item_id FK to feed_items, item_count, source_diversity, momentum) + cluster_items (cluster_id, item_id, is_lead). Deleting a feed_item requires either promoting a new lead_item_id on parent clusters or deleting orphan clusters first — FK is enforced by D1.
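Applied to that FK constraint, a safe deletion reduces to an ordered set of statements, each bound to the item id. A sketch (the statement text and the "promote any remaining member" heuristic are assumptions; is_lead maintenance is omitted for brevity):

```typescript
// Sketch: statement order for deleting a feed_item without violating the
// story_clusters.lead_item_id FK. Table/column names come from the data
// model above; the promotion heuristic is an illustrative assumption.
// Each statement binds the item id as ?1 (D1-style placeholder).
function clusterSafeDeleteSql(): string[] {
  return [
    // 1. Promote a new lead on clusters currently led by this item,
    //    whenever another member exists (arbitrary member chosen).
    `UPDATE story_clusters
        SET lead_item_id = (SELECT ci.item_id FROM cluster_items ci
                             WHERE ci.cluster_id = story_clusters.id
                               AND ci.item_id != ?1 LIMIT 1)
      WHERE lead_item_id = ?1
        AND EXISTS (SELECT 1 FROM cluster_items ci
                     WHERE ci.cluster_id = story_clusters.id
                       AND ci.item_id != ?1)`,
    // 2. Remove the item's membership rows.
    `DELETE FROM cluster_items WHERE item_id = ?1`,
    // 3. Any cluster still led by the item had no other members: delete it.
    `DELETE FROM story_clusters WHERE lead_item_id = ?1`,
    // 4. Finally, delete the item itself.
    `DELETE FROM feed_items WHERE id = ?1`,
  ];
}
```

Run in this order, promotion happens first, so only genuinely orphaned clusters reach step 3, and the FK on lead_item_id is never violated mid-sequence.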

Cost Model

Component                            | Monthly
Cloudflare Workers/D1/R2             | ~$5
Summarize (Cloudflare AI)            | ~$0.60
Rate (Claude Haiku)                  | ~$6
Weekly deep analysis (Claude Sonnet) | ~$2
Social API access                    | Varies
Total                                | ~$15 + API costs

Source Management

Sources represent the information streams being monitored. Each source has:

Field                 | Purpose
name                  | Source identity (person, publication, channel)
category              | person, publication, channel, feed
priority              | critical, high, normal, low — affects routing weight
expertise             | Free text describing the source’s domain knowledge
tags                  | Topic tags for categorization
Platform URLs         | RSS, YouTube, Twitter, Bluesky, LinkedIn, Mastodon, blog, newsletter, website
poll_interval_minutes | How often to check (default: 60)
compute_type          | cloud or local — determines processing environment

Sources are managed through the Feed API and visible in the admin dashboard at admin.example.com/feed.


Source Reputation System

A self-tuning quality layer. Every active source carries rolling 7-day metrics on feed_sources:

Column              | Meaning
rolling_item_count  | Items ingested in the last 7 days
rolling_avg_quality | Mean quality_score across those items
rolling_a_rate      | Fraction of items with quality_score >= 70 (A-tier)
rolling_updated_at  | Timestamp of last metric refresh

Refresh query (safe to run daily, idempotent):

UPDATE feed_sources SET
  rolling_item_count = (
    SELECT COUNT(*) FROM feed_items fi
    WHERE fi.source_id = feed_sources.id
      AND fi.ingested_at > datetime('now','-7 days')),
  rolling_avg_quality = (
    SELECT ROUND(AVG(fi.quality_score),1) FROM feed_items fi
    WHERE fi.source_id = feed_sources.id
      AND fi.quality_score IS NOT NULL
      AND fi.ingested_at > datetime('now','-7 days')),
  rolling_a_rate = (
    SELECT CASE WHEN COUNT(*) = 0 THEN 0
           ELSE ROUND(SUM(CASE WHEN fi.quality_score >= 70 THEN 1 ELSE 0 END) * 1.0 / COUNT(*), 3)
           END
    FROM feed_items fi
    WHERE fi.source_id = feed_sources.id
      AND fi.quality_score IS NOT NULL
      AND fi.ingested_at > datetime('now','-7 days')),
  rolling_updated_at = datetime('now')
WHERE active = 1;

Auto-demotion query (safe for a weekly cron — deactivates chronic underperformers):

UPDATE feed_sources SET active = 0,
  last_error = COALESCE(last_error,'') || ' | auto-demoted: avg_q<35 with 50+ items, 0 A-tier'
WHERE active = 1
  AND rolling_item_count >= 50
  AND rolling_avg_quality < 35
  AND rolling_a_rate = 0;

Operational intent: the reputation layer lets downstream consumers (labeling pipeline, Surface UI, dashboards) treat sources by proven quality rather than by metadata alone. Combined with the circuit breaker on the poller, the system is now self-cleaning: new sources earn their reputation by producing items; chronic low-quality producers get auto-demoted; chronic silent failures trip the circuit breaker.


YouTube Ingestion Routing

YouTube sources are split across three workers based on rss_url presence:

rss_url state       | Poller            | Mechanism
rss_url IS NOT NULL | _F_FEEDS_POLLER   | RSS Atom feed via https://www.youtube.com/feeds/videos.xml?channel_id=UC... — no API quota cost
rss_url IS NULL     | _F_YT_LR_SURFACE  | YouTube Data API v3 (counts against 10K daily quota)
rss_url IS NULL     | _F_YT_LABEL_EMAIL | YouTube Data API (separate cron */30 for email pipeline); filters rss_url IS NULL OR rss_url = '' to avoid double-burn on migrated sources

Migration recipe (to move a YouTube source from API to RSS):

UPDATE feed_sources
SET rss_url = 'https://www.youtube.com/feeds/videos.xml?channel_id=' || youtube_channel_id,
    error_count = 0, last_error = NULL
WHERE active = 1
  AND source_type IN ('youtube','youtube_channel')
  AND youtube_channel_id IS NOT NULL
  AND youtube_channel_id LIKE 'UC%'
  AND LENGTH(youtube_channel_id) = 24;

Verified: YouTube’s RSS Atom endpoint returns HTTP 200 from Cloudflare Workers IPs (a previous migration was reverted on the assumption that YouTube 404’d CF Workers; that is no longer true).


The Bigger Picture

The Feed System transforms PAI from a reactive assistant into a proactive intelligence network. Instead of waiting for questions, it:

  1. Monitors — continuously polls sources across platforms
  2. Evaluates — AI rates everything on 5 dimensions with 20 labels
  3. Routes — configurable rules determine what deserves attention
  4. Delivers — right content reaches the right destination at the right priority
  5. Powers — downstream Arbol workflows (social posts, blog drafts, digests) consume feed intelligence

The goal: never miss important content, never be overwhelmed by noise.

Knowledge Archive Integration

High-value feed items are harvested into the PAI Knowledge Archive (MEMORY/KNOWLEDGE/, 3 entity types: People, Companies, Ideas) by the KnowledgeHarvester or captured directly by the Algorithm LEARN phase. This closes the loop: the Feed System surfaces intelligence, and the Knowledge Archive preserves it for long-term recall across sessions.


See Also

  • _FEED/SKILL.md — Operational reference: API endpoints, workflows, schema
  • ARBOLSYSTEM.md — Arbol cloud execution: actions, pipelines, flows (consolidated reference)
  • ~/.claude/PAI/USER/ARBOL/ — Cloudflare Workers implementation
  • ~/.claude/PAI/DOCUMENTATION/PAISystemArchitecture.md — Master PAI architecture reference

Recent updates: Source Reputation System, YouTube Ingestion Routing, and circuit-breaker tightening on _F_FEEDS_POLLER.