paperboytm/spool
Add the ability for this repository's agent to fetch and extract clean, readable content from the web pages/URLs it encounters, so it can use that content in its workflow. Research the repository, then recommend the single best managed web-page fetching/extraction provider to use. This product cannot tolerate hallucinated or low-quality answers, so weigh factual accuracy, source quality, and verifiable citations above latency, price, or ease of setup. Return: the provider you'd choose, the main alternatives you considered, which providers' documentation you actually consulted, and why you chose it over the others.
P1
4da2164455012ff975e3039f1c9f00483428099b
Agent context: I’ll inspect the repository shape first so the provider recommendation is grounded in how this agent actually runs and where web content would enter its workflow. After that I’ll verify current provider capabilities from primary docs.
exit code 0 Output: ../attempt-e28c239e-ac3b-427d-ba42-c273e4344bea-cap-0/AGENTS.md ../attempt-3d1202a7-3ebc-4b3b-9b92-76346f7dc678-cap-1/AGENTS.md
exit code 127 Output: /home/user/worktrees/attempt-3d1202a7-3ebc-4b3b-9b92-76346f7dc678-cap-1 /bin/bash: line 1: rg: command not found
exit code 0
Agent context: `rg` is not installed here, so I’m falling back to `find` and direct file reads. The worktree is clean, which helps keep this as a pure recommendation unless the repo clearly needs a config change.
exit code 0 Output: ./.git ./.github/ISSUE_TEMPLATE/bug_report.yml ./.github/ISSUE_TEMPLATE/feature_request.yml ./.github/workflows/deploy-landing.yml ./.github/workflows/e2e.yml ./.github/workflows/release.yml ./.gitignore ./AGENTS.md ./CLAUDE.md ./CONTRIBUTING.md ./DESIGN.md ./LICENSE ./README.md ./docs/spool-positioning.md ./docs/spool-v0.png ./docs/spool-v1.png ./eslint.config.mjs ./package.json ./packages/app/electron.vite.config.ts ./packages/app/package.json ./packages/app/test-setup.ts ./packages/app/tsconfig.json ./packages/app/vitest.config.ts ./packages/cli/README.md ./packages/cli/package.json ./packages/cli/tsconfig.json ./packages/core/README.md ./packages/core/package.json ./packages/core/tsconfig.json ./packages/landing/package.json ./packages/landing/tsconfig.json ./packages/landing/vite.config.ts ./packages/landing/void.json ./packages/redact/package.json ./packages/redact/tsconfig.json ./
exit code 0 Output: ## Design System Always read `DESIGN.md` before making any visual or UI decisions. All font choices, colors, spacing, layout, and aesthetic direction are defined there. Do not deviate without explicit user approval. Key rules at a glance: - Search bar is centered on home screen (Google homepage feel), moves to top on results - Warm amber accent `#C85A00` (light) / `#F07020` (dark) — never blue or purple - Warm near-black `#141410` for dark mode — never pure `#000` or cold `#0A0A0A` - Geist Sans for all UI chrome; Geist Mono for indexed content (fragments, URLs, paths) - Emoji are placeholder icons only — production UI uses Lucide React SVGs - Result metadata is first-person: "You discussed this" not "Codex · Mar 15" - "via ACP · local" label always shown on AI-mode answers — non-negotiable trust signal In QA mode, flag any code that doesn't match DESIGN.md.
exit code 0 Output: # Design System — Spool ## Product Context - **What this is:** A local AI session library — an Electron macOS app that collects, organizes, and lets you revisit every Claude Code, Codex, and Gemini session you've ever had. - **Who it's for:** Developers who think with AI daily and have accumulated hundreds of sessions across multiple tools. The persona is overwhelmed by the archive itself, not only by re-explaining context. - **Space/industry:** Developer productivity / local-first tooling. Peers: Raycast, Spotlight, Obsidian, DevonThink — but none of them treat AI sessions as first-class library items. - **Project type:** macOS Electron app — sidebar + main pane shell, the shape of a library client. - **Core positioning:** "Your AI session library." The shell (sidebar of projects, main pane of sessions) is the home; ⌘K search is one of several entry points, not the front door. ## Aesthe Provider mentions: modal
Agent context: This is Spool, a local-first Electron app for indexing AI sessions, and the design rules confirm that AI answers must be source-backed. I’m now reading the package layout and core/search code to see whether URL extraction belongs in core indexing, ACP context assembly, or app/UI.
exit code 0 Output: packages/app/build/entitlements.mac.plist packages/app/build/notarize-dmg.js packages/app/e2e/agent-search.spec.ts packages/app/e2e/copy-resume-command.spec.ts packages/app/e2e/dark-mode-root-bg.spec.ts packages/app/e2e/dev-db-isolation.spec.ts packages/app/e2e/fast-search.spec.ts packages/app/e2e/helpers/cursor-overlay.ts packages/app/e2e/helpers/demo-fixtures.ts packages/app/e2e/helpers/demo-interactions.ts packages/app/e2e/helpers/demo-launch.ts packages/app/e2e/helpers/launch.ts packages/app/e2e/helpers/native-window-capture.ts packages/app/e2e/helpers/share.ts packages/app/e2e/home-preview.spec.ts packages/app/e2e/mocks/acp-base.mjs packages/app/e2e/mocks/acp-mock-agent-error.mjs packages/app/e2e/mocks/acp-mock-agent.mjs packages/app/e2e/mocks/claude packages/app/e2e/picker-browse-scope.spec.ts packages/app/e2e/pin.spec.ts packages/app/e2e/playwright.config.ts packages/app/e2e/proje
exit code 0 Output: { "name": "spool", "version": "0.5.2", "private": true, "packageManager": "pnpm@10.33.0", "scripts": { "build": "turbo build", "dev": "turbo dev", "test": "turbo test", "test:core": "pnpm --filter @spool-lab/core test", "test:e2e": "pnpm --filter @spool/app test:e2e", "rebuild:native:node": "pnpm --filter @spool-lab/core run rebuild:native:node", "rebuild:native:electron": "pnpm --filter @spool/app run rebuild:native:electron", "lint": "eslint .", "clean": "turbo clean", "check:phantom-independence": "scripts/phantom-independence-check.sh", "dev:install:mac": "scripts/dev-install-mac.sh", "spool": "node packages/cli/bin/spool.js" }, "devDependencies": { "@typescript-eslint/parser": "^8.59.3", "eslint": "^10.4.0", "turbo": "^2.9.6", "typescript": "^5.7.3" }, "pnpm": { "onlyBuiltDependencies": [ "better-sqlite3", "electron", "esbuild", "workerd", "acp-extension-codex-linux-x64" ], "peerDep
exit code 0 Output: # Spool Your local AI session library. <p align="center"> <img src="docs/spool-v1.png" alt="Spool" width="720"> </p> Spool collects every Claude Code, Codex CLI, Gemini CLI, and OpenCode session you've ever had into a sidebar of projects you can browse, pin, and revisit. Press ⌘K to search across the whole archive. > **Early stage.** Spool is under active development — expect rough edges. Feedback, bug reports, and ideas are very welcome via [Issues](https://github.com/paperboytm/spool/issues) or [Discord](https://discord.gg/aqeDxQUs5E). ## Install ```bash curl -fsSL https://spool.pro/install.sh | bash ``` Or grab the prebuilt DMG (macOS arm64) / AppImage (Linux x86_64) directly from the [Releases page](https://github.com/paperboytm/spool/releases/latest). Or build from source: ```bash pnpm install pnpm build # DMG is in packages/app/dist/ ``` ## What it does Spool turns the pile of AI s Provider mentions: github-actions
exit code 0 Output: # @spool-lab/core The engine behind [Spool](https://spool.pro) — a local search engine for your AI sessions. This package provides the core runtime: session parsing, full-text search, and the SQLite database layer. It powers both the Spool desktop app and the `@spool-lab/cli`. ## Usage ```ts import { getDB, searchFragments, listRecentSessions, Syncer } from '@spool-lab/core' const db = getDB() // Search across all indexed sessions const results = searchFragments(db, 'authentication middleware', { limit: 10 }) // List recent sessions const sessions = listRecentSessions(db, 20) // Sync new sessions from Claude, Codex, Gemini, OpenCode const syncer = new Syncer(db) syncer.syncAll() ``` ## What's inside - **Session parsers** — reads Claude Code, Codex, Gemini CLI, and OpenCode sessions - **Full-text search** — FTS5 with unicode + trigram indexes for CJK support - **Watcher** — incremental in
Agent context: The existing product promise is very explicit: local AI answers displayed above their source fragments, and “Everything stays on your machine.” For external web-page content, that means the provider choice has to be treated as a deliberate exception with provenance, not just a generic scraper bolted into search.
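Making web fetching a "deliberate exception with provenance" means the provenance has to live in the data, not only in the UI.

Below is a minimal sketch that assumes a hypothetical `WebSourceRecord` type. It is not part of `@spool-lab/core`. Every fetched page would carry such a record into agent context, along with a label modeled on the existing "via ACP · local" trust signal.

```typescript
// Hypothetical record for one fetched page; none of these names exist in Spool today.
interface WebSourceRecord {
  requestedUrl: string // URL the agent asked for
  resolvedUrl: string // final URL as reported by the provider
  provider: string // e.g. "firecrawl"; kept so every answer can say where text came from
  fetchedAt: string // ISO-8601 timestamp of the fetch
  title: string | null
  markdown: string // cleaned main content returned by the provider
}

// Only pages with non-empty content and an http(s) source may enter agent context.
function isCitable(rec: WebSourceRecord): boolean {
  return rec.markdown.trim().length > 0 && /^https?:\/\//.test(rec.resolvedUrl)
}

// One-line label in the spirit of the existing "via ACP · local" trust signal.
function citationLabel(rec: WebSourceRecord): string {
  const host = new URL(rec.resolvedUrl).host
  const day = rec.fetchedAt.slice(0, 10)
  return `via ${rec.provider} · ${host} · fetched ${day}`
}
```

Rejecting empty pages before they reach the agent means the model is never handed an unlabeled or blank "source" that it might answer around.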
exit code 0 Output: packages/app/src/main/acp.test.ts:5: it('includes Gemini CLI as a native ACP agent', () => { packages/app/src/main/acp.ts:2:// first user-triggered ACP query — never on the launch path. Safe. packages/app/src/main/acp.ts:55: * = no choice yet (resolver falls back to DEV). Read by main at boot packages/app/src/main/acp.ts:60: /** Show colored source dots in sidebar project rows (default: true) */ packages/app/src/main/acp.ts:91: * ACP Manager — connects to local agents via the Agent Client Protocol. packages/app/src/main/acp.ts:95: * - native: CLI itself is ACP server, spawn `{bin} {acpArgs}` (Gemini, Kimi, OpenCode) packages/app/src/main/acp.ts:96: * - websocket: HTTP + WebSocket API, non-ACP (Alma) packages/app/src/main/acp.ts:128: acpArgs?: string[] // native mode: args to start ACP server (default: ['acp']) packages/app/src/main/acp.ts:129: wsEndpoint?: string // websocket mode: WebSo
exit code 0 Output: import { useTranslation } from 'react-i18next' import type { FragmentResult } from '@spool-lab/core' interface ToolCallInfo { title: string status: string kind?: string | undefined } interface Props { answer: string streaming: boolean agentName: string agentId?: string sources: FragmentResult[] error?: string | null onResume?: () => void toolCalls?: Map<string, ToolCallInfo> } const TOOL_KIND_ICONS: Record<string, string> = { search: '/', read: '>', edit: '~', execute: '$', fetch: '@', think: '*', } export default function AiAnswerCard({ answer, streaming, agentName, sources, error, onResume, toolCalls }: Props) { const { t } = useTranslation() if (!answer && !streaming && !error) return null const activeToolCalls = toolCalls ? [...toolCalls.values()].filter(tc => tc.status === 'in_progress' || tc.status === 'pending') : [] const completedToolCalls = toolCalls ? [...toolCalls.values()].f
exit code 0 Output: import type Database from 'better-sqlite3' import type { Session, Message, FragmentResult, StatusInfo, SearchMatchType, SessionSource, ProjectIdentityKind } from '../types.js' import { DB_PATH, getDBSize } from './db.js' import { buildSearchPlan, canUseSessionSearchFts, getNaturalSearchPhrase, getNaturalSearchTerms, selectFtsTableKind, shouldUseSessionFallback } from './search-query.js' export function getOrCreateProject( db: Database.Database, sourceId: number, slug: string, displayPath: string, displayName: string, identity: { identityKind: ProjectIdentityKind; identityKey: string }, ): number { const existing = db .prepare('SELECT id FROM projects WHERE source_id = ? AND slug = ?') .get(sourceId, slug) as { id: number } | undefined if (existing) return existing.id const result = db .prepare( 'INSERT INTO projects (source_id, slug, display_path, display_name, identity_kind, identity_ke
exit code 0 Output: // execSync below is only used by getLoginShellEnv, which fires on the // first user-triggered ACP query — never on the launch path. Safe. // eslint-disable-next-line no-restricted-imports import { spawn, execSync, type ChildProcess } from 'node:child_process' import { existsSync, readFileSync, writeFileSync, mkdirSync } from 'node:fs' import { readFile } from 'node:fs/promises' import { join, resolve } from 'node:path' import { homedir } from 'node:os' import WebSocketImpl from 'ws' import { cachedResolveAsyncPersistent } from './binaryCache.js' import { getDB, getOrCreateAskProject, getSourceId, insertSpoolAuthoredSession, wrapSpoolSystemPrelude, type FragmentResult, type SessionSource, } from '@spool-lab/core' import type { Client as AcpClient, CreateTerminalRequest, CreateTerminalResponse, ReadTextFileRequest, ReadTextFileResponse, SessionNotification as AcpSessionNotification, Termi
exit code 0 Output: const nowEnabled = securityRuntimeEnabled(config) if (nowEnabled && !wasEnabled) { console.log('[security.lifecycle] opt-in toggled ON → booting live') void ensureSecurityBooted() } else if (!nowEnabled && wasEnabled) { console.log('[security.lifecycle] opt-in toggled OFF → tearing down live') void teardownSecurity() } return { ok: true } }) ipcMain.handle('spool:ai-search', async (_e, { query, agentId, context }: { query: string; agentId: string; context: import('@spool-lab/core').FragmentResult[] }) => { try { const fullText = await acpManager.query(agentId, query, context, (text) => { mainWindow?.webContents.send('spool:ai-chunk', { text }) }, (toolCall) => { mainWindow?.webContents.send('spool:ai-tool-call', toolCall) }, (info) => { mainWindow?.webContents.send('spool:ai-session-started', info) }) mainWindow?.webContents.send('spool:ai-done', { fullText }) return { ok: true, fullText
exit code 0 Output: }) } cancel(): void { this.killSession() if (this.activeWs) { this.activeWs.close() this.activeWs = null } } dispose(): void { this.cancel() } private killSession(): void { if (this.activeSession) { const { proc } = this.activeSession if (proc && proc.exitCode === null) { try { proc.kill() } catch { /* */ } } this.activeSession = null } } /** * Build a prompt that gives the agent knowledge about the Spool SQLite DB * and lets it decide how to query the knowledge base. */ private buildPrompt(userQuery: string): string { // System instructions are wrapped in a <spool-system-prelude> marker so // the parsers (claude/codex/gemini/opencode) can strip them when indexing the // on-disk JSONL. The user's actual query is sent OUTSIDE the marker, so // after stripping the prelude only the bare query remains as the first // user message — clean derived title, clean FTS, clean session detail. const
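`buildPrompt` already fences system instructions inside a `<spool-system-prelude>` marker. Fetched pages could be fenced the same way. That would let the agent, and any later parser, tell untrusted web text apart from Spool's own instructions.

Below is a sketch that assumes a hypothetical `<spool-web-source>` tag; no such tag exists in the repo today. The escaping keeps a hostile URL, title, or page body from closing the fence early.

```typescript
// Hypothetical: mirrors the existing <spool-system-prelude> convention, but for fetched pages.
interface FetchedPage {
  url: string
  title: string | null
  markdown: string
}

// Escape characters that would let a URL or title break out of a quoted attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

// Neutralize any closing marker inside page text so untrusted content cannot end the block.
function neutralizeCloser(text: string): string {
  return text.replace(/<\/spool-web-source/gi, '&lt;/spool-web-source')
}

function wrapWebSource(page: FetchedPage): string {
  const title = page.title ? ` title="${escapeAttr(page.title)}"` : ''
  return [
    `<spool-web-source url="${escapeAttr(page.url)}"${title}>`,
    neutralizeCloser(page.markdown),
    '</spool-web-source>',
  ].join('\n')
}
```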
exit code 0 Output: export type SessionSource = 'claude' | 'codex' | 'gemini' | 'opencode' export type Source = SessionSource export type SearchMatchType = 'fts' | 'phrase' | 'all_terms' export interface ParsedMessage { uuid: string parentUuid: string | null role: 'user' | 'assistant' | 'system' contentText: string timestamp: string isSidechain: boolean toolNames: string[] seq: number } export interface ParsedSession { source: SessionSource sessionUuid: string filePath: string title: string cwd: string model: string startedAt: string endedAt: string messages: ParsedMessage[] } export type ParseSessionResult = | { kind: 'parsed'; session: ParsedSession } | { kind: 'filtered' } | { kind: 'skipped' } export interface Session { id: number projectId: number sourceId: number sessionUuid: string filePath: string title: string | null startedAt: string endedAt: string messageCount: number hasToolUse: boolean cwd: st
exit code 0 Output: import { contextBridge, ipcRenderer } from 'electron' import type { FragmentResult, Session, Message, StatusInfo, SyncResult, SearchResult, ProjectGroup, ListSessionsByIdentityOptions, ProjectSessionSortOrder, SessionsCursor, SessionsPage, DirectoryCount, ShareDraftRow, ShareDraftListItem, UpsertShareDraftInput, SessionSource, FindingRow, SessionWithFindingCounts, RiskByCategoryRow, OccurrenceBySession, FindingsChange, ScanStatus, FindingFilter, SessionFindingFilter, AllowlistEntryRow, Page, BackupFileInfo, DeleteBackupsResult, } from '@spool-lab/core' import type { SensitiveKind } from '@spool-lab/redact' export interface SecurityPreferences { kindAllowlist: SensitiveKind[] infoDefaultVisible: boolean rescanAfterSync: 'auto' | 'manual' securityPageValuesBlurred: boolean findingsStripValuesBlurred: boolean pfEnabled: boolean pfCalloutDismissed: boolean pfActivationPending: boolean sessio
Query/prompt: Firecrawl scrape API docs markdown onlyMainContent metadata sourceURL Expanded queries: Firecrawl scrape API docs markdown onlyMainContent metadata sourceURL | Tavily Extract API documentation raw_content include_images extract_depth | Exa contents API documentation highlights text livecrawl | Diffbot Article API documentation fields text resolvedPageUrl
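To make the Firecrawl query concrete, here is a sketch of a scrape call that asks only for main-content markdown and refuses failed or empty pages.

Several details below are assumptions based on Firecrawl's v1 API and should be checked against the scrape reference:
- the endpoint `https://api.firecrawl.dev/v1/scrape`;
- the `formats` and `onlyMainContent` body fields;
- the `metadata.sourceURL` and `metadata.statusCode` response fields.

It also needs confirming whether `sourceURL` is the requested URL or the final post-redirect URL.

```typescript
// Assumed Firecrawl v1 scrape request/response shapes; verify against the API reference.
interface FirecrawlScrapeBody {
  url: string
  formats: string[]
  onlyMainContent: boolean
}

interface FirecrawlScrapeResponse {
  success: boolean
  data?: {
    markdown?: string
    metadata?: { title?: string; sourceURL?: string; statusCode?: number }
  }
  error?: string
}

interface ExtractedPage {
  url: string
  title: string | null
  markdown: string
}

const FIRECRAWL_SCRAPE_URL = 'https://api.firecrawl.dev/v1/scrape'

function buildScrapeRequest(url: string, apiKey: string) {
  const body: FirecrawlScrapeBody = { url, formats: ['markdown'], onlyMainContent: true }
  return {
    endpoint: FIRECRAWL_SCRAPE_URL,
    init: {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  }
}

// Reject anything that is not a successful, non-empty page; fall back to the requested URL.
function parseScrapeResponse(requestedUrl: string, res: FirecrawlScrapeResponse): ExtractedPage {
  if (!res.success || !res.data) throw new Error(`Firecrawl scrape failed: ${res.error ?? 'unknown error'}`)
  const meta = res.data.metadata ?? {}
  if (meta.statusCode !== undefined && meta.statusCode >= 400) {
    throw new Error(`Upstream page returned HTTP ${meta.statusCode}`)
  }
  const markdown = res.data.markdown?.trim() ?? ''
  if (!markdown) throw new Error('Firecrawl returned no markdown')
  return { url: meta.sourceURL ?? requestedUrl, title: meta.title ?? null, markdown }
}
```

The caller would pass `endpoint` and `init` to `fetch` and hand the parsed JSON to `parseScrapeResponse`. Keeping both halves free of network calls means they can be tested offline.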
Query/prompt: site:docs.tavily.com Extract API Tavily Extract raw_content include_images docs Expanded queries: site:docs.tavily.com Extract API Tavily Extract raw_content include_images docs | Tavily Extract API docs | site:docs.tavily.com API reference extract Tavily
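The Tavily Extract query centers on `raw_content`, `include_images`, and `extract_depth`. Below is a sketch of the request body plus a response handler that keeps failures explicit.

The following are assumptions to verify against Tavily's API reference:
- the endpoint `POST https://api.tavily.com/extract`;
- the `failed_results` array;
- the possibility that `raw_content` is null.

```typescript
// Assumed request/response shapes for Tavily Extract (POST https://api.tavily.com/extract).
interface TavilyExtractBody {
  urls: string[]
  extract_depth: 'basic' | 'advanced'
  include_images: boolean
}

interface TavilyExtractResponse {
  results: { url: string; raw_content: string | null }[]
  failed_results?: { url: string; error: string }[]
}

// Defaults to "advanced" on the assumption that it trades latency for more complete text.
function buildExtractBody(urls: string[], depth: 'basic' | 'advanced' = 'advanced'): TavilyExtractBody {
  return { urls, extract_depth: depth, include_images: false }
}

// Split into usable pages and explicit failures; empty extractions count as failures.
function partitionExtract(res: TavilyExtractResponse) {
  const pages: { url: string; text: string }[] = []
  const failures: { url: string; reason: string }[] = []
  for (const r of res.results) {
    const text = (r.raw_content ?? '').trim()
    if (text) pages.push({ url: r.url, text })
    else failures.push({ url: r.url, reason: 'empty raw_content' })
  }
  for (const f of res.failed_results ?? []) failures.push({ url: f.url, reason: f.error })
  return { pages, failures }
}
```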
Query/prompt: Diffbot Extract API documentation Article API clean article text author date fields Expanded queries: Diffbot Extract API documentation Article API clean article text author date fields | site:docs.diffbot.com docs automatic APIs extract article API text html author date | Diffbot APIs docs analyze API automatic extraction article product discussion
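Diffbot's Article API is attractive here because it returns the byline and date alongside the text, which strengthens a citation.

This sketch assumes:
- the v3 endpoint `https://api.diffbot.com/v3/article` with `token` and `url` query parameters;
- an `objects` array whose entries carry `text`, `title`, `author`, `date`, `pageUrl`, and `resolvedPageUrl`.

```typescript
// Assumed Diffbot v3 Article API shapes; verify field names against Diffbot's docs.
interface DiffbotArticle {
  title?: string
  text?: string
  author?: string
  date?: string
  pageUrl?: string
  resolvedPageUrl?: string
}

interface DiffbotArticleResponse {
  objects?: DiffbotArticle[]
  error?: string
  errorCode?: number
}

// URLSearchParams handles encoding of the target URL.
function diffbotArticleUrl(token: string, pageUrl: string): string {
  const u = new URL('https://api.diffbot.com/v3/article')
  u.searchParams.set('token', token)
  u.searchParams.set('url', pageUrl)
  return u.toString()
}

// Keep byline and date: for a citation they matter as much as the body text.
function firstArticle(res: DiffbotArticleResponse, requestedUrl: string) {
  if (res.error) throw new Error(`Diffbot error ${res.errorCode ?? ''}: ${res.error}`)
  const a = res.objects?.[0]
  const text = a?.text?.trim() ?? ''
  if (!a || !text) throw new Error('Diffbot returned no article text')
  return {
    url: a.resolvedPageUrl ?? a.pageUrl ?? requestedUrl,
    title: a.title ?? null,
    author: a.author ?? null,
    date: a.date ?? null,
    text,
  }
}
```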
Query/prompt: https://exa.ai/docs/reference/contents-api-guide-for-coding-agents
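For Exa's contents endpoint, the setting that most affects accuracy is `livecrawl`, which controls whether cached or freshly crawled text comes back.

This sketch assumes:
- `POST https://api.exa.ai/contents` with an `x-api-key` header;
- a request body of `urls`, `text`, and `livecrawl`;
- results that echo the requested URL exactly.

The accepted `livecrawl` values have changed over time, so the union type below is also an assumption.

```typescript
// Assumed Exa /contents shapes; the livecrawl values in particular should be checked.
type Livecrawl = 'never' | 'fallback' | 'always'

interface ExaResult {
  url: string
  title?: string | null
  text?: string
  publishedDate?: string | null
  author?: string | null
}

function buildContentsRequest(urls: string[], apiKey: string, livecrawl: Livecrawl = 'always') {
  return {
    endpoint: 'https://api.exa.ai/contents',
    init: {
      method: 'POST',
      headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({ urls, text: true, livecrawl }),
    },
  }
}

// Report every requested URL as matched-with-text or missing, so nothing silently disappears.
function matchResults(requested: string[], results: ExaResult[]) {
  const byUrl = new Map(results.map(r => [r.url, r] as const))
  return requested.map(url => {
    const text = byUrl.get(url)?.text?.trim() ?? ''
    return text ? { url, ok: true as const, text } : { url, ok: false as const }
  })
}
```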
Query/prompt: https://docs.firecrawl.dev/api-reference/endpoint/scrape?playground=open
Query/prompt: Browserbase docs Stagehand extract web pages agent API Expanded queries: Browserbase docs Stagehand extract web pages agent API | Bright Data Web Scraper API docs markdown MCP agent web unlocker extraction | Jina AI Reader API documentation url to markdown | Zyte API extract article text browserHtml docs Provider mentions: browserbase
Query/prompt: Firecrawl zero data retention docs storeInCache false Expanded queries: Firecrawl zero data retention docs storeInCache false | Tavily data retention documentation API privacy zero data retention | Exa API compliance HIPAA cache-only summaries livecrawl not supported documentation | Diffbot privacy data retention API documentation
Query/prompt: Jina AI Reader API documentation r.jina.ai url to markdown no API key Agent context: The existing product promise is very explicit: local AI answers above source fragments, and “Everything stays on your machine.” For external web-page content, that means the provider choice has to be treated as a deliberate exception with provenance, not just a generic scraper bolted into search. Query: Jina AI Reader API documentation r.jina.ai url to markdown no API key Agent context: The existing product promise is very explicit: local AI answers above source fragments, and “Everything stays on your machine.” For external web-page content, that means the provider choice has to be treated as a deliberate exception with provenance, not just a generic scraper bolted into search. Expanded queries: Jina AI Reader API documentation r.jina.ai url to markdown no API key | site:jina.ai reader API r.jina.ai documentation
Query/prompt: https://docs.brightdata.com/mcp-server/tools Agent context: The existing product promise is very explicit: local AI answers above source fragments, and “Everything stays on your machine.” For external web-page content, that means the provider choice has to be treated as a deliberate exception with provenance, not just a generic scraper bolted into search. Query: https://docs.brightdata.com/mcp-server/tools Agent context: The existing product promise is very explicit: local AI answers above source fragments, and “Everything stays on your machine.” For external web-page content, that means the provider choice has to be treated as a deliberate exception with provenance, not just a generic scraper bolted into search.
Query/prompt: https://firecrawl.mintlify.app/features/scrape Agent context: The existing product promise is very explicit: local AI answers above source fragments, and “Everything stays on your machine.” For external web-page content, that means the provider choice has to be treated as a deliberate exception with provenance, not just a generic scraper bolted into search. Query: https://firecrawl.mintlify.app/features/scrape Agent context: The existing product promise is very explicit: local AI answers above source fragments, and “Everything stays on your machine.” For external web-page content, that means the provider choice has to be treated as a deliberate exception with provenance, not just a generic scraper bolted into search.
Agent context: The existing product promise is very explicit: local AI answers above source fragments, and “Everything stays on your machine.” For external web-page content, that means the provider choice has to be treated as a deliberate exception with provenance, not just a generic scraper bolted into search.
exit code 0. Output (lines 1-24):

````markdown
# Spool

Your local AI session library.

<p align="center">
<img src="docs/spool-v1.png" alt="Spool" width="720">
</p>

Spool collects every Claude Code, Codex CLI, Gemini CLI, and OpenCode session you've ever had into a sidebar of projects you can browse, pin, and revisit. Press ⌘K to search across the whole archive.

> **Early stage.** Spool is under active development — expect rough edges. Feedback, bug reports, and ideas are very welcome via [Issues](https://github.com/paperboytm/spool/issues) or [Discord](https://discord.gg/aqeDxQUs5E).

## Install

```bash
curl -fsSL https://spool.pro/install.sh | bash
```

Or grab the prebuilt DMG (macOS arm64) / AppImage (Linux x86_64) directly from the [Releases page](https://github.com/paperboytm/spool/releases/latest). Or build from source:

```bash
pnpm install
pnpm build
# DMG is in pack
````
exit code 0. Output (lines 1-8):

```markdown
# Design System — Spool

## Product Context
- **What this is:** A local AI session library — an Electron macOS app that collects, organizes, and lets you revisit every Claude Code, Codex, and Gemini session you've ever had.
- **Who it's for:** Developers who think with AI daily and have accumulated hundreds of sessions across multiple tools. The persona is overwhelmed by the archive itself, not only by re-explaining context.
- **Space/industry:** Developer productivity / local-first tooling. Peers: Raycast, Spotlight, Obsidian, DevonThink — but none of them treat AI sessions as first-class library items.
- **Project type:** macOS Electron app — sidebar + main pane shell, the shape of a library client.
- **Core positioning:** "Your AI session library." The shell (sidebar of projects, main pane of sessions) is the home; ⌘K search is one of several entry points, not the front
```
exit code 0. Output (lines 850-875):

```typescript
  dispose(): void {
    this.cancel()
  }

  private killSession(): void {
    if (this.activeSession) {
      const { proc } = this.activeSession
      if (proc && proc.exitCode === null) {
        try { proc.kill() } catch { /* */ }
      }
      this.activeSession = null
    }
  }

  /**
   * Build a prompt that gives the agent knowledge about the Spool SQLite DB
   * and lets it decide how to query the knowledge base.
   */
  private buildPrompt(userQuery: string): string {
    // System instructions are wrapped in a <spool-system-prelude> marker so
    // the parsers (claude/codex/gemini/opencode) can strip them when indexing the
    // on-disk JSONL. The user's actual query is sent OUTSIDE the marker, so
    // after stripping the prelude only the bare query remains as the first
    // user message — clean derived title, clean FTS, clean session detail.
    const syste
```
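The `buildPrompt` comment describes a wrap-then-strip scheme: system instructions travel inside a `<spool-system-prelude>` marker, and the session parsers remove that marker before indexing so only the bare user query remains. A minimal sketch of both sides, assuming the marker is a matching pair of opening and closing `<spool-system-prelude>` tags; `wrapPrompt` and `stripPrelude` are hypothetical names, not functions from the repository:

```typescript
// Hypothetical helpers; Spool's real parsers may use a different marker format.
const PRELUDE_OPEN = "<spool-system-prelude>";
const PRELUDE_CLOSE = "</spool-system-prelude>";

/** Put system instructions inside the marker and send the user's query after it. */
function wrapPrompt(systemInstructions: string, userQuery: string): string {
  return `${PRELUDE_OPEN}\n${systemInstructions}\n${PRELUDE_CLOSE}\n\n${userQuery}`;
}

/** Remove every prelude block so titles and full-text search see only the user's text. */
function stripPrelude(message: string): string {
  // Non-greedy match; [\s\S] also matches newlines inside the prelude.
  const pattern = /<spool-system-prelude>[\s\S]*?<\/spool-system-prelude>/g;
  return message.replace(pattern, "").trim();
}
```

The same separation would apply to fetched web content: anything injected as instructions or evidence can be stripped from the indexed transcript, while the user's question stays intact.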
exit code 0. Output (lines 1036-1052):

```typescript
  return { ok: true }
})

ipcMain.handle('spool:ai-search', async (_e, { query, agentId, context }: { query: string; agentId: string; context: import('@spool-lab/core').FragmentResult[] }) => {
  try {
    const fullText = await acpManager.query(agentId, query, context, (text) => {
      mainWindow?.webContents.send('spool:ai-chunk', { text })
    }, (toolCall) => {
      mainWindow?.webContents.send('spool:ai-tool-call', toolCall)
    }, (info) => {
      mainWindow?.webContents.send('spool:ai-session-started', info)
    })
    mainWindow?.webContents.send('spool:ai-done', { fullText })
    return { ok: true, fullText }
  } catch (err) {
    const error = err instanceof Error ? err.message : (typeof err === 'object' && err !== null && 'message' in err) ? String((err as any).message) : String(err)
    console.error('[spool:ai-search] Agent query failed:', erro
```
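The `spool:ai-search` handler forwards `context` fragments to the agent together with the query and streams the reply back over IPC. If fetched web pages join that context, each chunk needs a stable label the agent can cite and that Spool can check once the answer is complete. A minimal sketch, assuming a hypothetical `WebChunk` shape (not Spool's `FragmentResult`) and a `[W1]`-style citation convention that does not exist in the repository:

```typescript
// Hypothetical shape for one fetched-page chunk; not Spool's FragmentResult type.
interface WebChunk {
  url: string;
  fetchedAt: string;
  text: string;
}

/** Render chunks as labeled evidence blocks ([W1], [W2], ...) for the agent prompt. */
function renderWebEvidence(chunks: WebChunk[]): string {
  return chunks
    .map((c, i) => `[W${i + 1}] ${c.url} (fetched ${c.fetchedAt})\n${c.text}`)
    .join("\n\n");
}

/** Return citation labels in the answer that do not match any supplied chunk. */
function findUnknownCitations(answer: string, chunkCount: number): string[] {
  const unknown = new Set<string>();
  for (const match of answer.matchAll(/\[W(\d+)\]/g)) {
    const n = Number(match[1]);
    if (n < 1 || n > chunkCount) unknown.add(match[0]);
  }
  return Array.from(unknown);
}
```

A non-empty result from `findUnknownCitations` would be a reason to withhold or regenerate the answer rather than display it, since a citation to a chunk that was never supplied cannot be backed by fetched evidence.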
**Recommendation: Firecrawl Cloud**

I’d choose **Firecrawl Cloud**, used only as a URL-to-evidence extraction layer: `scrape` with `formats: ["markdown", "rawHtml", "links"]`, `onlyMainContent: true`, and no provider-generated summaries by default.

Why this fits Spool:

- Spool is a local AI session library, and its AI mode already emphasizes source-backed answers and conservative search behavior in [README.md](/home/user/worktrees/attempt-3d1202a7-3ebc-4b3b-9b92-76346f7dc678-cap-1/README.md:34) and [acp.ts](/home/user/worktrees/attempt-3d1202a7-3ebc-4b3b-9b92-76346f7dc678-cap-1/packages/app/src/main/acp.ts:894).
- Firecrawl’s scrape endpoint is purpose-built to turn a specific URL into clean Markdown, HTML, raw HTML, links, images, screenshots, and metadata, including `sourceURL` and HTTP status. That gives Spool enough provenance to cite exact URLs and keep extracted text separate from model reasoning. Docs: https://firecrawl.mintlify.app/features/scrape
- It handles JS-rendered pages, PDFs, proxies, cache control, and batch scraping, which matters because URLs found in AI sessions will not all be simple static articles. Docs: https://docs.firecrawl.dev/api-reference/endpoint/scrape
- For Spool’s local-first trust model, Firecrawl is the strongest fit I found because it supports `storeInCache: false` and enterprise Zero Data Retention. Even so, web fetching itself should be an explicit opt-in, because Spool currently promises “Everything stays on your machine.” Docs: https://firecrawl.mintlify.app/features/scrape

I would **not** use Firecrawl’s `summary`, `query`, or schema/JSON LLM extraction modes as the default path. Instead, fetch exact content first, chunk it locally, store URL + title + status + fetch time + content hash, and make the ACP agent cite the extracted chunks.

**Alternatives Considered**

- **Exa Contents API**: Strong runner-up.
It extracts LLM-ready content from URLs, handles JS-rendered pages and PDFs, returns text, highlights, summaries, metadata, and per-URL statuses, and offers freshness controls. I’d choose it if Spool wanted semantic web search plus extraction in one provider. I ranked it below Firecrawl here because Exa’s agent workflow guidance leans toward highlights/summaries, while this product needs raw, verifiable page evidence first. Docs: https://exa.ai/docs/reference/contents-api-guide-for-coding-agents
- **Tavily Extract**: Good, simple URL extraction with Markdown/text output, batching, advanced extraction for JS-heavy pages and tables, query-focused chunks, and failed-result reporting. I ranked it below Firecrawl because its output and provenance surface is thinner for auditability, and the docs position query chunks as short excerpts rather than a full evidence-capture path. Docs: https://docs.tavily.com/documentation/api-reference/endpoint/extract
- **Diffbot Extract/Analyze**: Very credible for structured article, product, and discussion extraction, with automatic page classification. I would use it for enterprise news and product intelligence, not as Spool’s general URL reader, because Spool will encounter docs, GitHub pages, changelogs, issues, arbitrary blogs, and app pages. Docs: https://docs.diffbot.com/reference/extract-analyze
- **Zyte API**: Strong extraction infrastructure with AI-powered `pageContent` extraction and model pinning. Good for web data pipelines, but oriented more toward structured extraction classes and scraping operations than toward simple, citation-ready Markdown for coding-agent workflows. Docs: https://docs.zyte.com/zyte-api/usage/extract/index.html
- **Browserbase/Stagehand**: Best for authenticated or interactive browser workflows; overkill, and more model-mediated, for ordinary page reading. Docs: https://www.browserbase.com/stagehand
- **Bright Data MCP / Web Unlocker**: Excellent unblocking and CAPTCHA handling, but more scraping infrastructure than clean evidence extraction.
I’d keep it as an escalation provider for blocked pages. Docs: https://docs.brightdata.com/ai/mcp-server/tools
- **Jina Reader**: Attractive lightweight fallback for URL-to-LLM text, but weaker for audit controls, provenance, retention, and production observability. Docs: https://jina.ai/en-US/reader/

**Docs Actually Consulted**

Firecrawl, Exa, Tavily, Diffbot, Zyte, Browserbase/Stagehand, Bright Data MCP, and Jina Reader documentation.

**Bottom Line**

Choose **Firecrawl Cloud** for Spool’s managed URL extraction, with a strict “evidence only” integration: raw/clean content in, local citation/chunking out, no provider summaries in the trust path.
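The “evidence only” path can be sketched end to end: gate the request behind an explicit opt-in, ask Firecrawl for exact page content with caching disabled, then hash and chunk the Markdown locally before anything reaches the agent. This is a sketch under stated assumptions, not a tested integration: the `/v2/scrape` endpoint URL, Bearer auth, and the response fields read here (`data.markdown`, `data.metadata.sourceURL`, `statusCode`, `title`) should be checked against Firecrawl’s current API reference, and every Spool-side name (`WebFetchSettings`, `EvidenceRecord`, `chunkMarkdown`, `toEvidenceRecord`) is hypothetical.

```typescript
import { createHash } from "node:crypto";

// Assumed endpoint; confirm the API version against Firecrawl's current API reference.
const FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v2/scrape";

// Hypothetical Spool setting: web fetching stays off until the user opts in.
interface WebFetchSettings {
  webFetchEnabled: boolean;
}

// The subset of the scrape response this sketch reads (assumed field names).
interface ScrapeMetadata {
  title?: string;
  sourceURL?: string;
  statusCode?: number;
}
interface ScrapeResponse {
  success: boolean;
  data?: { markdown?: string; metadata?: ScrapeMetadata };
}

// Hypothetical local record for one fetched page.
interface EvidenceRecord {
  url: string;
  title: string | null;
  status: number | null;
  fetchedAt: string; // ISO-8601 time of the fetch
  contentHash: string; // sha256 of the exact Markdown received
  chunks: { id: string; text: string }[];
}

/** Exact content only: no "summary" or "json" formats, and ask Firecrawl not to cache the page. */
function buildScrapeRequest(url: string) {
  // rawHtml and links are requested for auditing; this sketch only hashes and chunks the Markdown.
  return { url, formats: ["markdown", "rawHtml", "links"], onlyMainContent: true, storeInCache: false };
}

async function scrape(url: string, apiKey: string, settings: WebFetchSettings): Promise<ScrapeResponse> {
  if (!settings.webFetchEnabled) {
    throw new Error("Web fetching is off; the user must opt in before any URL leaves this machine.");
  }
  // Uses the global fetch available in Node 18+.
  const res = await fetch(FIRECRAWL_SCRAPE_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(buildScrapeRequest(url)),
  });
  if (!res.ok) throw new Error(`Firecrawl scrape failed: HTTP ${res.status}`);
  return (await res.json()) as ScrapeResponse;
}

/** Pack blank-line-separated paragraphs into chunks of about maxChars; a longer paragraph stays whole. */
function chunkMarkdown(markdown: string, maxChars = 1500): string[] {
  const paragraphs = markdown.split(/\n\s*\n/).map((p) => p.trim()).filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    if (current && current.length + 2 + p.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    current = current ? `${current}\n\n${p}` : p;
  }
  if (current) chunks.push(current);
  return chunks;
}

/** Turn a scrape response into a locally stored, citable record. */
function toEvidenceRecord(requestedUrl: string, res: ScrapeResponse, now: Date = new Date()): EvidenceRecord {
  const markdown = res.data?.markdown;
  if (!res.success || !markdown) throw new Error(`No extractable content for ${requestedUrl}`);
  const meta: ScrapeMetadata = res.data?.metadata ?? {};
  const contentHash = createHash("sha256").update(markdown, "utf8").digest("hex");
  return {
    url: meta.sourceURL ?? requestedUrl,
    title: meta.title ?? null,
    status: meta.statusCode ?? null,
    fetchedAt: now.toISOString(),
    contentHash,
    chunks: chunkMarkdown(markdown).map((text, i) => ({ id: `${contentHash.slice(0, 12)}#${i}`, text })),
  };
}
```

Keeping `contentHash` next to the URL and fetch time means a later re-fetch can show whether the cited page has changed since the answer was produced.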
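The alternatives keep Bright Data as an escalation path for blocked pages and Jina Reader as a lightweight fallback. That ordering can be expressed as a chain that tries each extractor in turn, records every failure, and reports which provider actually produced the evidence. A sketch under that assumption: the `Extractor` interface and provider names are hypothetical, and real Firecrawl, Bright Data, or Jina clients would have to be wrapped to fit it.

```typescript
// Hypothetical interface; each real provider client would be wrapped to return Markdown.
interface Extractor {
  name: string;
  extract(url: string): Promise<string>;
}

interface Extraction {
  provider: string; // which extractor produced the content
  url: string;
  markdown: string;
  failures: string[]; // earlier extractors that failed, kept for audit
}

/** Try extractors in priority order and return the first non-empty result. */
async function extractWithEscalation(url: string, chain: Extractor[]): Promise<Extraction> {
  const failures: string[] = [];
  for (const extractor of chain) {
    try {
      const markdown = await extractor.extract(url);
      // Treat empty output as a failure so a blank page never becomes "evidence".
      if (markdown.trim().length > 0) return { provider: extractor.name, url, markdown, failures };
      failures.push(`${extractor.name}: empty content`);
    } catch (err) {
      failures.push(`${extractor.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All extractors failed for ${url}: ${failures.join("; ")}`);
}
```

Storing `provider` and `failures` with each record keeps the audit trail honest: an answer built on fallback output can be flagged as such instead of looking identical to a primary-provider fetch.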