MatrixAges/polywise
Add the ability for this repository's agent to fetch and extract clean, readable content from the web pages/URLs it encounters, so it can use that content in its workflow. Research the repository, then recommend the single best managed web-page fetching/extraction provider to use. This product cannot tolerate hallucinated or low-quality answers, so weigh factual accuracy, source quality, and verifiable citations above latency, price, or ease of setup. Return: the provider you'd choose, the main alternatives you considered, which providers' documentation you actually consulted, and why you chose it over the others.
P1
1eab79217258b1314fe9302fb16113154535049b
Agent context: I’ll first read the repository rules and the local agent structure so the recommendation fits this codebase, then I’ll verify current provider docs from primary sources before comparing options.
Agent context: I’ll first read the repository rules and the local agent structure so the recommendation fits this codebase, then I’ll verify current provider docs from primary sources before comparing options. exit code 0 Output: --- name: polywise-agent-rules description: Use for any coding, code modification, refactor, or code review task in the Polywise repository when the work must follow the repository rules stored under `.agent/rules` and nearby `unify.md` or `agentmap.md` files. --- # Polywise Agent Rules 1. Read `.agent/rules/global.md` before doing work in this repository. 2. For coding, refactor, code-generation, and review tasks, also read `.agent/rules/coding.md`. 3. Before creating files, refactoring modules, or writing core logic, read the target package's outline-level `unify.md` and follow the matched route's sample paths. 4. If a package-level `agentmap.md` exists and the task changes that package's structure or responsibility boundaries, update that outline-level `agentmap.md` before finishing. 5. Treat `.agent/rules/` as the source of truth. Use this skill as the Codex-native bridge rather than
Agent context: I’ll first read the repository rules and the local agent structure so the recommendation fits this codebase, then I’ll verify current provider docs from primary sources before comparing options. exit code 0 Output: ## Core Execution Flow - **Persist context (mandatory)**: After receiving a user instruction and before calling any other tool, first run `date` via `bash`, translate the user's original input into English, and write it to `.prompts/YYYY-MM-DD/HH-mm-ss.md`. - **Language consistency**: Always reply in exactly the same language as the user's input. - **Path safety**: Before using file tools, if the path is not an explicit absolute path, first confirm the real location with `glob` or `ls`; never guess paths from memory. - **Complete replacement**: When using editing tools, replacement snippets must include enough context and contain the full business logic; never use `...` to omit code. - **Scope restriction**: Limit changes strictly to the files or modules explicitly mentioned by the user; if unmentioned modules are involved, confirm with the user first. - **Proposal first**: Before any wr
Agent context: I’ve loaded the Polywise bridge skill and global rules. The repo has a mandatory prompt-persistence rule, so I’m recording the original request now before continuing repository inspection.
Agent context: I’ve loaded the Polywise bridge skill and global rules. The repo has a mandatory prompt-persistence rule, so I’m recording the original request now before continuing repository inspection. exit code 0 Output: 2026-06-06 05:05:17 UTC
Agent context: Draft for the required prompt record: original request plus an English restatement under `.prompts/2026-06-06/05-05-17.md`. This is repository bookkeeping required by `.agent/rules/global.md`; I’ll keep it isolated from product code.
Agent context: Draft for the required prompt record: original request plus an English restatement under `.prompts/2026-06-06/05-05-17.md`. This is repository bookkeeping required by `.agent/rules/global.md`; I’ll keep it isolated from product code. exit code 2 Output: ls: cannot access '.prompts/2026-06-06': No such file or directory
Agent context: Draft for the required prompt record: original request plus an English restatement under `.prompts/2026-06-06/05-05-17.md`. This is repository bookkeeping required by `.agent/rules/global.md`; I’ll keep it isolated from product code. exit code 0
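The record path used above follows the `.prompts/YYYY-MM-DD/HH-mm-ss.md` pattern from `.agent/rules/global.md`. A minimal sketch of deriving that path from a timestamp is shown below. The helper name `getPromptRecordPath` is hypothetical and not part of the repository, and the sketch assumes the timestamp is interpreted in UTC, matching the `date` output in this transcript.

```typescript
// Hypothetical helper (not from the repository): builds the prompt-record
// path `.prompts/YYYY-MM-DD/HH-mm-ss.md` from a Date, read as UTC.
const padTwo = (value: number): string => String(value).padStart(2, '0')

const getPromptRecordPath = (now: Date): string => {
	// Date part: zero-padded year-month-day.
	const day = [String(now.getUTCFullYear()), padTwo(now.getUTCMonth() + 1), padTwo(now.getUTCDate())].join('-')
	// Time part: zero-padded hours-minutes-seconds, joined with dashes.
	const time = [padTwo(now.getUTCHours()), padTwo(now.getUTCMinutes()), padTwo(now.getUTCSeconds())].join('-')

	return `.prompts/${day}/${time}.md`
}
```

The dated directory may not exist yet, as the failed `ls` above shows, so a caller would presumably create it (for example with a recursive mkdir) before writing the file.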
Agent context: The required prompt record is in place. I’m now mapping the agent/runtime surface so the provider choice is grounded in how this repository actually routes tool use and citations.
Agent context: The required prompt record is in place. I’m now mapping the agent/runtime surface so the provider choice is grounded in how this repository actually routes tool use and citations. exit code 127 Output: /bin/bash: line 1: rg: command not found
Agent context: The required prompt record is in place. I’m now mapping the agent/runtime surface so the provider choice is grounded in how this repository actually routes tool use and citations. exit code 0
Agent context: `rg` is not installed in this workspace, so I’m falling back to `find`/`grep` and keeping the search read-only.
Agent context: `rg` is not installed in this workspace, so I’m falling back to `find`/`grep` and keeping the search read-only. exit code 0 Output: ./.agent/rules/unify.md ./.github/unify.md ./README.md ./package.json ./packages/app/agentmap.md ./packages/app/package.json ./packages/app/unify.md ./packages/desktop/README.md ./packages/desktop/agentmap.md ./packages/desktop/package.json ./packages/desktop/unify.md ./packages/erpc/agentmap.md ./packages/erpc/package.json ./packages/erpc/unify.md ./packages/polywise/README.md ./packages/polywise/agentmap.md ./packages/polywise/package.json ./packages/polywise/unify.md ./packages/stk/README.md ./packages/stk/agentmap.md ./packages/stk/package.json ./packages/stk/src/storage/README.md ./packages/stk/unify.md ./packages/website/package.json ./packages/website/unify.md ./readme/README.ar.md ./readme/README.bn.md ./readme/README.br.md ./readme/README.bs.md ./readme/README.da.md ./readme/README.de.md ./readme/README.es.md ./readme/README.fr.md ./readme/README.gr.md ./readme/README.it.md ./re
Agent context: `rg` is not installed in this workspace, so I’m falling back to `find`/`grep` and keeping the search read-only. exit code 0 Output: ./.agent/rules/coding.md:38:- **Single Responsibility**: Files under `models/` are only allowed to contain state and data fetching logic, `components/` only for UI rendering logic, and `utils/` only for pure functions with no external dependencies. ./.agent/rules/coding.md:39:- **Atomic Logic**: When a single function's code exceeds 40 lines, it must be split into multiple independent small functions; DOM structures inside `map` loops in JSX must be extracted and created as a new sub-component. ./.agent/rules/global.md:3:- **Persist context (mandatory)**: After receiving a user instruction and before calling any other tool, first run `date` via `bash`, translate the user's original input into English, and write it to `.prompts/YYYY-MM-DD/HH-mm-ss.md`. ./.agent/rules/global.md:5:- **Path safety**: Before using file tools, if the path is not an explicit absolute path, first confirm the rea
Agent context: `rg` is not installed in this workspace, so I’m falling back to `find`/`grep` and keeping the search read-only. exit code 0 Output: .agent/rules/coding.md .agent/rules/global.md .agent/rules/unify.md .draft/2026-05-31-cli-input-schema-plan.md .draft/2026-05-31-electron-close-behavior.md .draft/2026-05-31-global-panel-mention-tools-plan.md .draft/2026-05-31-polywise-tool-plan.md .draft/2026-05-31-website-steps-plan.md .draft/2026-06-01-21-06-05-legacy-db-plan.md .draft/2026-06-01-header-update-plan.md .draft/2026-06-02-bash-tool-high-availability.md .draft/2026-06-02-desktop-release-asset-path-plan.md .draft/2026-06-02-r2-workflow-readability-plan.md .draft/2026-06-02-readme-localization-plan.md .draft/2026-06-02-readme-relocation-plan.md .draft/2026-06-02-release-notes-debug-plan.md .draft/agent-private-knowledge-import-export-plan.md .draft/cli-design-plan.md .draft/content-callback-plan.md .draft/home-module-design.md .draft/im-runtime-plan.md .draft/rewire-plan.md .draft/session-mention-plan.md .draft/workflow-gha
Agent context: The repository points to `packages/polywise` as the server/tool runtime. I’m reading its map and the relevant tool/session files next, then I’ll line that up with managed extraction providers.
Agent context: The repository points to `packages/polywise` as the server/tool runtime. I’m reading its map and the relevant tool/session files next, then I’ll line that up with managed extraction providers. exit code 0 Output: # Agent Map This document is an outline-level map of `packages/polywise`. It captures the package's long-lived runtime domains instead of enumerating every leaf file. ## 1. Module Overview - **Description**: Polywise backend and AI runtime package. - **Architecture**: TypeScript service runtime around search/memory pipelines, session orchestration, and SQLite-backed state. ## 2. Outline Tree ```json { "entry_and_bootstrap": { "package_files": [ "package.json", "rslib.config.ts", "drizzle.config.ts", "rstest.config.ts", "tsconfig.json", "tsconfig.build.json", "inspect.sh" ], "src/index.ts": "Primary package runtime entry.", "src/server.ts": "Standalone server entry.", "src/cli": "CLI surface for local runtime and tooling flows.", "src/env.ts": "Environment normalization.", "src/auth.ts": "Authentication bootstrap." }, "service_interfaces": { "src/api": "HTTP-facing endpoints and session/p
Agent context: The repository points to `packages/polywise` as the server/tool runtime. I’m reading its map and the relevant tool/session files next, then I’ll line that up with managed extraction providers. exit code 0 Output: { "name": "polywise", "version": "0.0.5", "license": "MIT", "homepage": "https://polywise.io", "repository": { "type": "git", "url": "https://github.com/MatrixAges/polywise.git" }, "keywords": [ "agent", "decision", "workflow", "database", "ai", "memory", "self-hosted", "assistant", "twin", "rag", "decision-system", "graph-rag", "polywise", "llm-wiki" ], "type": "module", "main": "./dist/index.js", "sideEffects": false, "exports": { ".": { "types": "./dist/src/index.d.ts", "import": "./dist/index.js", "default": "./dist/index.js" } }, "bin": { "polywise": "./dist/cli.js" }, "files": [ "dist", "./package.json", "./README.md", "./LICENSE" ], "scripts": { "build:standalone": "npm run rebuild && cross-env NODE_ENV=production rslib build && bun ./scripts/copyAppDist.ts", "build:electron": "cross-env NODE_ENV=production rslib build", "build:watch": "cross-env NODE_ENV=development rslib build - Provider mentions: openai
Agent context: The repository points to `packages/polywise` as the server/tool runtime. I’m reading its map and the relevant tool/session files next, then I’ll line that up with managed extraction providers. exit code 0 Output: packages/polywise/src/auth.ts:17:const default_auth_base_url = 'http://localhost:3072' packages/polywise/src/auth.ts:32: default_auth_base_url packages/polywise/src/auth.ts:45: return default_auth_base_url packages/polywise/src/auth.ts:82:const appendHeaders = (target: Headers, source: Headers) => { packages/polywise/src/auth.ts:83: source.forEach((value, key) => { packages/polywise/src/callback/applyContentGraphFeedback.ts:122: WHERE source_id = ? AND target_id = ? AND ((? is null AND agent_id is null) OR agent_id = ?) packages/polywise/src/callback/applyContentGraphFeedback.ts:127: id, relation, agent_id, source_id, target_id, packages/polywise/src/cli/index.ts:4:import { fileURLToPath, pathToFileURL } from 'node:url' packages/polywise/src/cli/index.ts:17:const server_base_url = (process.env.POLYWISE_SERVER_URL || 'http://localhost:3072').replace(/\/$/, '') packages/polywise/src/cli/in
Agent context: The repository points to `packages/polywise` as the server/tool runtime. I’m reading its map and the relevant tool/session files next, then I’ll line that up with managed extraction providers. exit code 0 Output: packages/polywise/src/api/im.ts packages/polywise/src/api/index.ts packages/polywise/src/api/page.ts packages/polywise/src/api/session.ts packages/polywise/src/api/test.ts packages/polywise/src/auth.ts packages/polywise/src/callback/applyContentCallback.ts packages/polywise/src/callback/applyContentGraphFeedback.ts packages/polywise/src/callback/buildContentCallbackKey.ts packages/polywise/src/callback/cleanupContentCallbackStore.ts packages/polywise/src/callback/constants.ts packages/polywise/src/callback/createContentSearchTrace.ts packages/polywise/src/callback/ensureContentCenterNode.ts packages/polywise/src/callback/getContentCallbackStorePath.ts packages/polywise/src/callback/index.ts packages/polywise/src/callback/normalizeContentCallbackQuery.ts packages/polywise/src/callback/readContentCallbackStore.ts packages/polywise/src/callback/resolveContentArticleNodes.ts packages/polywis Provider mentions: together-ai, openai
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly.
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: packages/polywise/src/fst/tools/agent.ts packages/polywise/src/fst/tools/api.ts packages/polywise/src/fst/tools/bash.ts packages/polywise/src/fst/tools/content.ts packages/polywise/src/fst/tools/context.ts packages/polywise/src/fst/tools/cron.ts packages/polywise/src/fst/tools/edit/apply.ts packages/polywise/src/fst/tools/edit/count.ts packages/polywise/src/fst/tools/edit/error.ts packages/polywise/src/fst/tools/edit/getLang.ts packages/polywise/src/fst/tools/edit/index.ts packages/polywise/src/fst/tools/edit/types.ts packages/polywise/src/fst/tools/error_collect.ts packages/polywise/src/fst/tools/glob.ts packages/polywise/src/fst/tools/index.ts packages/polywise/src/fst/tools/linkcase.ts packages/polywise/src/fst/tools/mcp.ts packages/polywise/src/fst/tools/message.ts packages/polywise/src/fst/tools/meta/createCustomToolSet.ts packages/polywise/src/fst/tools/meta/getToolDir.ts packages/
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: import { config } from '@core/config' import { default_fetch_fallback_chain } from '@core/types' import fetchWithAgentBrowser from './agentBrowser' import fetchWithCrawl4ai from './crawl4ai' import fetchDirect from './direct' import fetchWithDokobot from './dokobot' import fetchWithOpencli from './opencli' import fetchWithRJina from './rjina' import { extractTitleFromContent, getErrorMessage } from './runtime' import type { WebfetchFallbackProvider } from '@core/types' import type { FetchAttempt, FetchProviderHandler, FetchProviderResult, FetchResult, FetchSource } from './types' const provider_handlers: Record<WebfetchFallbackProvider, FetchProviderHandler> = { 'agent-browser': fetchWithAgentBrowser, opencli: fetchWithOpencli, crawl4ai: fetchWithCrawl4ai, dokobot: fetchWithDokobot, 'r.jina.ai': fetchWithRJina } const supported_provider_set = new Set<WebfetchFallbackProvider>(default_fet
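The file above maps provider names to handlers, but the output is cut off before the chain logic itself. As a rough illustration of the general pattern only, a fallback chain can try handlers in order and record each failure. Everything in this sketch (`runFallbackChain`, the simplified types, the `'none'` placeholder source) is an assumption for illustration and is not the repository's `fetchWithFallbackChain`.

```typescript
// Simplified stand-ins for the repository's types; shapes are assumptions.
interface ChainAttempt { source: string; error: string }
interface ChainProviderSuccess { ok: true; source: string; content: string; truncated: boolean }
type ChainHandler = (args: { url: string; max_chars: number }) => Promise<ChainProviderSuccess>
type ChainResult =
	| (ChainProviderSuccess & { attempts: ChainAttempt[] })
	| { ok: false; source: string; error: string; attempts: ChainAttempt[] }

// Hypothetical sketch: try each provider in order, return the first success,
// and keep a trail of every failure so the caller can see what was tried.
const runFallbackChain = async (
	chain: string[],
	handlers: Record<string, ChainHandler>,
	url: string,
	max_chars: number
): Promise<ChainResult> => {
	const attempts: ChainAttempt[] = []

	for (const source of chain) {
		const handler = handlers[source]

		if (!handler) {
			attempts.push({ source, error: 'unsupported provider' })
			continue
		}

		try {
			const result = await handler({ url, max_chars })
			return { ...result, attempts }
		} catch (error) {
			attempts.push({ source, error: error instanceof Error ? error.message : String(error) })
		}
	}

	// Every provider failed (or the chain was empty): report the last failure.
	const last = attempts[attempts.length - 1]
	return { ok: false, source: last?.source ?? 'none', error: last?.error ?? 'empty chain', attempts }
}
```

Keeping the `attempts` trail on both success and failure is what lets a caller report which provider actually produced the content, which matters for the traceability concern raised in this task.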
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: import type { WebfetchFallbackProvider } from '@core/types' export type FetchSource = WebfetchFallbackProvider | 'direct' export interface FetchAttempt { source: FetchSource error: string } export interface FetchProviderSuccess { ok: true source: FetchSource content: string truncated: boolean title?: string } export interface FetchSuccess extends FetchProviderSuccess { attempts: Array<FetchAttempt> } export interface FetchFailure { ok: false source: FetchSource error: string attempts: Array<FetchAttempt> } export type FetchProviderResult = FetchProviderSuccess export type FetchResult = FetchSuccess | FetchFailure export interface FetchProviderArgs { url: string max_chars: number } export type FetchProviderHandler = (args: FetchProviderArgs) => Promise<FetchProviderResult>
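Because `FetchResult` is a union discriminated by `ok`, a consumer can narrow it and surface which source produced the content. The sketch below restates the types from the output above so that it compiles on its own; `summarizeFetchResult` is a hypothetical helper, not something found in the repository.

```typescript
// Types restated from the transcript so the sketch is self-contained.
type FetchSource = 'agent-browser' | 'opencli' | 'dokobot' | 'crawl4ai' | 'r.jina.ai' | 'direct'

interface FetchAttempt { source: FetchSource; error: string }
interface FetchSuccess {
	ok: true
	source: FetchSource
	content: string
	truncated: boolean
	title?: string
	attempts: Array<FetchAttempt>
}
interface FetchFailure { ok: false; source: FetchSource; error: string; attempts: Array<FetchAttempt> }
type FetchResult = FetchSuccess | FetchFailure

// Hypothetical helper: one-line provenance summary for logs or citations.
const summarizeFetchResult = (result: FetchResult): string => {
	const failed = result.attempts.map(attempt => `${attempt.source}: ${attempt.error}`).join('; ')
	const trail = failed ? ` after failures [${failed}]` : ''

	// Narrowing on `ok` gives access to `truncated` or `error` respectively.
	if (result.ok) {
		const suffix = result.truncated ? ' (truncated)' : ''
		return `fetched via ${result.source}${suffix}${trail}`
	}

	return `failed at ${result.source}: ${result.error}${trail}`
}
```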
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: import { resolvePolywiseCrawl4aiProfileConfig } from '../utils/crawl4aiProfile' import { runCommand, trimContent } from './runtime' import type { FetchProviderHandler } from './types' const default_run_config_parts = [ 'wait_until=networkidle', 'delay_before_return_html=3', 'scan_full_page=true', 'scroll_delay=0.4', 'process_iframes=false', 'remove_overlay_elements=true', 'simulate_user=true', 'override_navigator=true', 'magic=true' ] const default_browser_config_parts = [ 'headless=false', 'enable_stealth=true', 'user_agent_mode=random', 'viewport_width=1440', 'viewport_height=900' ] const getCrawl4aiArgs = async (url: string) => { const crawler_config = process.env.CRAWL4AI_RUN_CONFIG?.trim() const args = [url, '-o', 'markdown', '--bypass-cache'] const browser_config = process.env.CRAWL4AI_BROWSER_CONFIG?.trim() const browser_type = process.env.CRAWL4AI_BROWSER_TYPE?.trim() || 'chromiu
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: import { config } from '@core/config' import { trimContent } from './runtime' import type { FetchProviderHandler } from './types' const fetchWithRJina: FetchProviderHandler = async ({ url, max_chars }) => { const jina_api_key = config.jina_api_key?.trim() const resp = await fetch(`https://r.jina.ai/${url}`, { signal: AbortSignal.timeout(30000), headers: { ...(jina_api_key ? { Authorization: `Bearer ${jina_api_key}` } : {}) } }) if (!resp.ok) { throw new Error(`Jina returned HTTP ${resp.status}`) } const markdown = await resp.text() if (!markdown.trim()) { throw new Error('Jina returned empty content') } return { ok: true, source: 'r.jina.ai', ...trimContent(markdown, max_chars) } } export default fetchWithRJina
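The handler above spreads `trimContent(markdown, max_chars)` into its result, and the provider success type requires `content` and `truncated`, so the helper presumably returns those two fields. Its real implementation in `./runtime` is not shown in this transcript; the following is only a guess at the simplest version that would fit, under the hypothetical name `trimContentSketch`.

```typescript
// Hypothetical stand-in for `trimContent` from './runtime' (not shown in the
// transcript). Assumes a plain character cut with a truncation flag.
const trimContentSketch = (content: string, max_chars: number): { content: string; truncated: boolean } => {
	if (content.length <= max_chars) {
		return { content, truncated: false }
	}

	return { content: content.slice(0, max_chars), truncated: true }
}
```

A cut like this can split a Markdown structure mid-table or mid-code-fence, so the `truncated` flag is worth propagating to whatever consumes the content.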
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: import { htmlToMarkdown, trimContent } from './runtime' import type { FetchProviderHandler } from './types' const fetchDirect: FetchProviderHandler = async ({ url, max_chars }) => { const resp = await fetch(url, { signal: AbortSignal.timeout(15000), headers: { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36', Accept: 'text/html,application/xhtml+xml,*/*' } }) if (!resp.ok) { throw new Error(`HTTP ${resp.status}`) } const html = await resp.text() const markdown = htmlToMarkdown(html) if (!markdown.trim()) { throw new Error('Direct fetch returned empty content') } return { ok: true, source: 'direct', ...trimContent(markdown, max_chars) } } export default fetchDirect
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: import { config } from '@core/config' import { fetchWithFallbackChain, fetchWithLegacyFallback } from '@core/fetch' import { tool } from 'ai' import { number, object, url } from 'zod' const MAX_CHARS = 50000 const inputSchema = object({ url: url().describe('The URL to fetch content from'), max_chars: number().optional().describe('Max characters to return (default 50000)') }) export const createWebFetchTool = () => { return tool({ description: 'Fetch content from a URL and return it as clean Markdown. Use for reading web pages, documentation, or any online content.', inputSchema, execute: async input => { const max_chars = input.max_chars ?? MAX_CHARS return config.enbale_webfetch_chain ? await fetchWithFallbackChain(input.url, max_chars) : await fetchWithLegacyFallback(input.url, max_chars) } }) }
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: import { link, link_article } from '@core/db/schema' import { getLink, getLinks, removeLink } from '@core/db/services' import { getLinkArticles } from '@core/db/services/externals' import removeArticle from '@core/io/remove' import { createLinkcaseSchedule, linkcase_schedule_actions, linkcase_schedule_interval_units, listLinkcaseSchedules, removeLinkcaseSchedule, updateLinkcaseSchedule } from '@core/rpc/linkcase/scheduler' import { commitLinkcasePreview, createLinkcaseItem, fetchLinkcaseLink, getLinkcaseKeywordWhere, hydrateLinkcaseItems, linkcase_statuses, markLinkcaseFetchFailure, previewLinkcaseLinkWithProvider, readLinkcasePreview } from '@core/rpc/linkcase/utils' import { tool } from 'ai' import { and, eq, inArray, notInArray } from 'drizzle-orm' import { array, boolean, enum as Enum, number, object, string, z } from 'zod' import type { WebfetchFallbackProvider } from '@core/types' Provider mentions: together-ai
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: import { config } from '@core/config' import { tool } from 'ai' import TurndownService from 'turndown' import { number, object, string } from 'zod' const turndown = new TurndownService({ headingStyle: 'atx', hr: '---', bulletListMarker: '-', codeBlockStyle: 'fenced', emDelimiter: '*' }) turndown.remove(['script', 'style', 'meta', 'link', 'noscript']) const MAX_CHARS = 30000 const inputSchema = object({ query: string().describe('Keywords used to search the web for a list of relevant links'), max_chars: number().optional().describe('Max characters to return (default 30000)') }) export const createWebSearchTool = () => { return tool({ description: 'Search the web and return link list. This tool is for finding candidate URLs only, not for final factual answers. After reviewing the returned links and snippets, call web_fetch_tool on the most relevant target URL to read the webpage body.', inp
Agent context: There is already a web-fetch abstraction under `src/fetch`, plus prompts that explicitly require fetching target pages instead of trusting search snippets. I’m reading that path in detail because the best provider should replace or lead the existing chain, not sit beside it awkwardly. exit code 0 Output: packages/polywise/src/config/loadConfig.ts:4:import { default_fetch_fallback_chain } from '@core/types' packages/polywise/src/config/loadConfig.ts:15:const fetch_fallback_provider_set = new Set<string>(default_fetch_fallback_chain) packages/polywise/src/config/loadConfig.ts:107: if (config.jina_api_key === undefined) { packages/polywise/src/config/loadConfig.ts:108: config.jina_api_key = '' packages/polywise/src/config/loadConfig.ts:167: if (config.enbale_webfetch_chain === undefined) { packages/polywise/src/config/loadConfig.ts:168: config.enbale_webfetch_chain = false packages/polywise/src/config/loadConfig.ts:208: if (!Array.isArray(config.fetch_fallback_chain) || !config.fetch_fallback_chain.length) { packages/polywise/src/config/loadConfig.ts:209: config.fetch_fallback_chain = [...default_fetch_fallback_chain] packages/polywise/src/config/loadConfig.ts:212: const current_chain = con
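The grep hits above show `loadConfig.ts` building a set from `default_fetch_fallback_chain` and falling back to the default when `fetch_fallback_chain` is missing or empty; the rest of the logic is truncated. One plausible way to normalize a user-supplied chain is sketched below. The function `normalizeFetchChain` and its filter-and-dedupe behavior are assumptions, not the repository's actual code.

```typescript
// Default chain as it appears in the repository's types.
const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const
type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number]

const fetch_fallback_provider_set = new Set<string>(default_fetch_fallback_chain)

// Type guard: true only for strings naming a supported provider.
const isFallbackProvider = (value: unknown): value is WebfetchFallbackProvider =>
	typeof value === 'string' && fetch_fallback_provider_set.has(value)

// Hypothetical normalizer: drop unknown names and duplicates, keep the user's
// order, and fall back to the default chain when nothing valid remains.
const normalizeFetchChain = (input: unknown): WebfetchFallbackProvider[] => {
	if (!Array.isArray(input) || !input.length) {
		return [...default_fetch_fallback_chain]
	}

	const chain = Array.from(new Set<WebfetchFallbackProvider>(input.filter(isFallbackProvider)))

	return chain.length ? chain : [...default_fetch_fallback_chain]
}
```

Adding a new managed provider under this pattern would mean extending the `as const` tuple, since the union type and the supported set are both derived from it.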
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper.
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. exit code 0 Output: export interface Workspace { name: string endpoint?: string } export const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const export type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number] export interface DefaultModel { provider: string model: string effort?: string } export interface McpOAuthConfig { clientId?: string clientSecret?: string scope?: string redirectUri?: string } export interface McpLocalConfig { type: 'local' command: Array<string> environment?: Record<string, string> enabled?: boolean timeout?: number } export interface McpRemoteConfig { type: 'remote' url: string enabled?: boolean headers?: Record<string, string> oauth?: McpOAuthConfig | false timeout?: number } export interface McpEnabledConfig { enabled: boolean } export interface McpConfig { enabled?: boolean [name: string]: McpLocalConfig |
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. exit code 0 Output: } const article_id = await saveArticle({ article_id: article_item?.id, title: article_item?.title || item.title, content, for: 'linkcase', exec_pipeline: true }) runLinkcaseExtractTask({ id: item.id, article_id, content }) return { id: item.id, title: item.title, url: item.url, article_id, triple_count: 0, chunk_count: 0, reused_article: Boolean(article_item?.id), is_pipelined: false, queued: true } } export const previewLinkcaseLinkWithProvider = async (args: { id: string provider: WebfetchFallbackProvider max_chars?: number }) => { const current_link = await getLink(eq(link.id, args.id)) if (!current_link) { throw new Error(`Link not found: ${args.id}`) } cleanupLinkcaseFetchPreviewCache() const max_chars = args.max_chars ?? DEFAULT_LINKCASE_FETCH_MAX_CHARS const result = await fetchWithProvider(args.provider, current_link.url, max_chars) const preview_key = crypto.randomUUID() const p
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. exit code 0 Output: return { action: 'schedule_remove' as const, ...(await removeLinkcaseSchedule(input.schedule_id)) } } if (input.action === 'fetch_preview') { if (!input.id) { return { action: 'fetch_preview' as const, error: 'id is required for fetch_preview action' } } if (!input.provider) { return { action: 'fetch_preview' as const, error: 'provider is required for fetch_preview action' } } return { action: 'fetch_preview' as const, ...(await previewLinkcaseLinkWithProvider({ id: input.id, provider: input.provider as WebfetchFallbackProvider, max_chars: input.max_chars })) } } if (input.action === 'read_preview') { if (!input.preview_key) { return { action: 'read_preview' as const, error: 'preview_key is required for read_preview action' } } if (!input.page) { return { action: 'read_preview' as const, error: 'page is required for read_preview action' } } return { action: 'read_preview' as const, ...(a
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. exit code 0 Output: # Your Role You are a smart assistant working in an "infinite conversation" scenario. Your goal is to build a long-term, coherent dialogue with the user and provide accurate, continuous help. ## Your Memory Limits The "short-term memory" in front of you can only see the most recent 12 messages. If you realize these messages lack necessary information, **do not guess or make things up**. Use the tools below to find missing information. ## Available Tools ### Composer Mentions - `[TOOL: tool name]` means the user explicitly mentioned a runtime tool. - `[MCP: server name]` means the user explicitly selected a configured MCP server. When `mcp_tool` is available, prefer scoping MCP discovery and execution to that server. - `[SKILL: skill name]` means the user explicitly mentioned a local skill. - `[FILE: path]` means the user explicitly mentioned a file or directory path from the session work
Query/prompt: Jina AI Reader API documentation r.jina.ai URL markdown API Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. Expanded queries: Jina AI Reader API documentation r.jina.ai URL markdown API | Firecrawl scrape endpoint documentation markdown sourceURL | Tavily Extract API documentation raw_content URL extract | Exa contents API documentation get contents URL text highlights
Query/prompt: site:docs.tavily.com Tavily Extract API documentation URLs raw_content Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. Expanded queries: site:docs.tavily.com Tavily Extract API documentation URLs raw_content | site:docs.diffbot.com Extract API article API documentation natural language url | site:docs.browserbase.com browserbase docs scrape extraction agent web pages | site:docs.brightdata.com web unlocker scraper api markdown extraction documentation
Query/prompt: https://jina.ai/reader?gh_jid=4321610004&post_type=post Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper.
Query/prompt: https://docs.firecrawl.dev/api-reference/v2-endpoint/scrape Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper.
Query/prompt: Exa search API documentation autoprompt source quality neural search trusted sources contents citations Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. Expanded queries: Exa search API documentation autoprompt source quality neural search trusted sources contents citations | Exa docs highlights contents API citations highlight scores source URL metadata
Query/prompt: Diffbot Extract API documentation article endpoint URL natural language clean article Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. Expanded queries: Diffbot Extract API documentation article endpoint URL natural language clean article | Diffbot Knowledge Graph Article API documentation URL extract text
Query/prompt: https://docs.diffbot.com/docs/getting-started-with-diffbot-extract Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper.
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. exit code 0 Output: /home/user/worktrees/attempt-cf25b487-1ee6-44a4-9dee-c7c0b19e15b4-cap-0
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. exit code 0 Output: 1 import { config } from '@core/config' 2 import { fetchWithFallbackChain, fetchWithLegacyFallback } from '@core/fetch' 3 import { tool } from 'ai' 4 import { number, object, url } from 'zod' 5 6 const MAX_CHARS = 50000 7 8 const inputSchema = object({ 9 url: url().describe('The URL to fetch content from'), 10 max_chars: number().optional().describe('Max characters to return (default 50000)') 11 }) 12 13 export const createWebFetchTool = () => { 14 return tool({ 15 description: 16 'Fetch content from a URL and return it as clean Markdown. Use for reading web pages, documentation, or any online content.', 17 inputSchema, 18 execute: async input => { 19 const max_chars = input.max_chars ?? MAX_CHARS 20 21 return config.enbale_webfetch_chain 22 ? await fetchWithFallbackChain(input.url, max_chars) 23 : await fetchWithLegacyFallback(input.url, max_chars) 24 } 25 }) 26 }
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. exit code 0 Output: 1 import { config } from '@core/config' 2 import { default_fetch_fallback_chain } from '@core/types' 3 4 import fetchWithAgentBrowser from './agentBrowser' 5 import fetchWithCrawl4ai from './crawl4ai' 6 import fetchDirect from './direct' 7 import fetchWithDokobot from './dokobot' 8 import fetchWithOpencli from './opencli' 9 import fetchWithRJina from './rjina' 10 import { extractTitleFromContent, getErrorMessage } from './runtime' 11 12 import type { WebfetchFallbackProvider } from '@core/types' 13 import type { FetchAttempt, FetchProviderHandler, FetchProviderResult, FetchResult, FetchSource } from './types' 14 15 const provider_handlers: Record<WebfetchFallbackProvider, FetchProviderHandler> = { 16 'agent-browser': fetchWithAgentBrowser, 17 opencli: fetchWithOpencli, 18 crawl4ai: fetchWithCrawl4ai, 19 dokobot: fetchWithDokobot, 20 'r.jina.ai': fetchWithRJina 21 } 22 23 const supported_
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. exit code 0 Output: 1 export interface Workspace { 2 name: string 3 endpoint?: string 4 } 5 6 export const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const 7 8 export type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number] 9 10 export interface DefaultModel { 11 provider: string 12 model: string 13 effort?: string 14 } 15 16 export interface McpOAuthConfig { 17 clientId?: string 18 clientSecret?: string 19 scope?: string 20 redirectUri?: string 21 } 22 23 export interface McpLocalConfig { 24 type: 'local' 25 command: Array<string> 26 environment?: Record<string, string> 27 enabled?: boolean 28 timeout?: number 29 } 30 31 export interface McpRemoteConfig { 32 type: 'remote' 33 url: string 34 enabled?: boolean 35 headers?: Record<string, string>
Agent context: The current stack has local/browser-oriented fallbacks (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`) and an `r.jina.ai` hosted reader, with a legacy default of Jina reader plus direct fetch unless the full chain is enabled. The recommendation needs to improve source traceability and reliability, not just add another scraper. exit code 0 Output: 15 - `[SKILL: skill name]` means the user explicitly mentioned a local skill. 16 - `[FILE: path]` means the user explicitly mentioned a file or directory path from the session workspace. 17 - `[AGENT: agent name]` means the user explicitly wants that agent to answer with the current session context. When `agent_tool` is available, prefer using it to consult that exact agent. 18 19 ### Web Research Flow 20 21 - `web_search_tool` is only for discovering candidate URLs, titles, and snippets. 22 - Search result snippets are not primary evidence when the target page can be fetched. 23 - For factual questions that depend on web content, use `web_fetch_tool` on the most relevant 1 to 3 search results before answering, unless the user only asked for links. 24 - If the first fetched page is insufficient, ambiguous, or low quality, fetch additional relevant results before concluding. 25 - Do not t
Agent context: **Recommendation**

Choose **Exa Contents API** as the primary managed URL fetching/extraction provider.

Repo fit: `web_fetch_tool` already expects “clean Markdown” from a URL and routes through provider adapters in [webfetch.ts](/home/user/worktrees/attempt-cf25b487-1ee6-44a4-9dee-c7c0b19e15b4-cap-0/packages/polywise/src/fst/tools/webfetch.ts:13) and [fetch/index.ts](/home/user/worktrees/attempt-cf25b487-1ee6-44a4-9dee-c7c0b19e15b4-cap-0/packages/polywise/src/fetch/index.ts:15). The agent prompt already treats search snippets as insufficient and requires fetching target pages before factual answers in [fst_system_prompt.md](/home/user/worktrees/attempt-cf25b487-1ee6-44a4-9dee-c7c0b19e15b4-cap-0/packages/polywise/src/consts/prompts/fst_system_prompt.md:19).

Why Exa: its `/contents` endpoint is explicitly for known URLs, returns clean structured content, and handles JavaScript-rendered pages, PDFs, and complex layouts automatically. It also supports **extractive highlights** that are pulled directly from the source, not generated, with highlight scores. That is the strongest match for this product’s “no hallucinated or low-quality answers” requirement because the agent can cite URL + title + exact extracted evidence instead of trusting a generated summary. Exa docs also show request IDs, status/source metadata, title, URL, author, published date, text, highlights, and highlight scores, which are useful for auditability and citation trails.

Sources: Exa Contents guide and API reference.

**How I’d use it**

Add an `exa` fetch provider and make it the first managed provider. Fetch `text` for full-page reading, and when the agent has a task/query, also request `highlights` so answers can cite source-grounded excerpts. Do not use Exa `summary` as primary evidence unless clearly labeled as generated; use it only as a navigation aid.

**Main Alternatives Considered**

- **Firecrawl**: Strong managed scraper. It returns markdown, supports `onlyMainContent`, PDFs, browser actions, screenshots, ad/cookie blocking, proxies, source URL metadata, cache controls, and zero-data-retention options. I would rank it as the best fallback for difficult rendered or blocked pages, but its core value is extraction robustness rather than source-ranking or extractive citation quality.
- **Jina Reader / r.jina.ai**: Already integrated in this repo. Good simple URL-to-LLM-readable conversion, supports ReaderLM-v2, selectors, cookies/proxies, JSON output, and OpenAI citation formatting. I would keep it as fallback, but Exa’s direct-source highlights and richer status/metadata are better for verifiable evidence.
- **Tavily Extract**: Managed extraction for one or more URLs, markdown/text output, advanced mode for tables/embedded content, and query-reranked chunks. It is useful, but the docs’ own sample includes navigation boilerplate in `raw_content`, which makes it less attractive as the highest-accuracy readable-content layer.
- **Browserbase**: Excellent managed browser infrastructure with Playwright/Stagehand, proxies, CAPTCHA support, recordings, and live views. Better for interactive browser automation than a clean default “read this URL and cite it” provider.
- **Bright Data Web Unlocker**: Very strong for anti-bot/CAPTCHA/proxy reliability and clean HTML/JSON retrieval, but it is more of an unlocking layer than a readable-content/citation extraction layer.
- **Diffbot Extract**: Mature structured extraction using computer vision/NLP and page-type schemas. Strong for article/product/discussion records, less ideal as a general LLM-readable markdown evidence layer for arbitrary URLs.

**Documentation Actually Consulted**

Official docs for **Exa**, **Firecrawl**, **Jina Reader**, **Tavily**, **Browserbase**, **Bright Data**, and **Diffbot**. I relied on provider documentation and repository code, not third-party benchmark claims, for the recommendation.
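
To make the “How I’d use it” step concrete, here is a minimal sketch of what an `exa` provider adapter might look like. It is not repository code: the names `ExaContentsOptions`, `buildExaContentsRequest`, `parseExaContentsResponse`, `fetchWithExa`, and the `HttpPost` type are hypothetical, and the endpoint URL, the `x-api-key` header, and the `urls` / `text` / `highlights` / `highlightScores` / `requestId` field names are assumptions that should be checked against Exa’s API reference before use. The result shape is also illustrative and would need to be mapped onto the repository’s own `FetchProviderResult` type, whose fields are not shown in this log.

```typescript
// Hypothetical sketch of an Exa-backed fetch provider. All names here are
// illustrative; endpoint and field names are ASSUMPTIONS to verify in Exa's docs.

interface ExaContentsOptions {
	api_key: string
	// Optional task/query used to select extractive highlights.
	query?: string
}

interface ExaRequest {
	endpoint: string
	init: { method: 'POST'; headers: Record<string, string>; body: string }
}

interface ExaFetchResult {
	url: string
	title: string
	content: string
	highlights: Array<{ text: string; score: number | null }>
	request_id: string | null
}

// Minimal HTTP client shape so the sketch does not depend on a global `fetch` type.
type HttpPost = (
	endpoint: string,
	init: ExaRequest['init']
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

// Assumed endpoint; confirm against the provider documentation.
const EXA_CONTENTS_ENDPOINT = 'https://api.exa.ai/contents'

// Build the request: always ask for full text, and for highlights only when a query exists.
export const buildExaContentsRequest = (url: string, options: ExaContentsOptions): ExaRequest => {
	const body: Record<string, unknown> = { urls: [url], text: true }

	if (options.query) body.highlights = { query: options.query }

	return {
		endpoint: EXA_CONTENTS_ENDPOINT,
		init: {
			method: 'POST',
			headers: { 'content-type': 'application/json', 'x-api-key': options.api_key },
			body: JSON.stringify(body)
		}
	}
}

// Turn the (assumed) response payload into a citation-friendly result.
// Throws when no readable text came back, so a fallback chain can try the next provider.
export const parseExaContentsResponse = (payload: unknown, url: string, max_chars: number): ExaFetchResult => {
	const data = (payload ?? {}) as { requestId?: unknown; results?: Array<Record<string, unknown>> }
	const first = data.results?.[0]
	const text = first?.text

	if (!first || typeof text !== 'string' || !text.trim()) {
		throw new Error(`Exa returned no readable text for ${url}`)
	}

	const texts = Array.isArray(first.highlights) ? (first.highlights as Array<unknown>) : []
	const scores = Array.isArray(first.highlightScores) ? (first.highlightScores as Array<unknown>) : []

	return {
		url: typeof first.url === 'string' ? first.url : url,
		title: typeof first.title === 'string' ? first.title : '',
		content: text.slice(0, max_chars),
		// Pair each highlight with its score by index; keep only string highlights.
		highlights: texts.flatMap((item, index) => {
			if (typeof item !== 'string') return []
			const score = scores[index]
			return [{ text: item, score: typeof score === 'number' ? score : null }]
		}),
		request_id: typeof data.requestId === 'string' ? data.requestId : null
	}
}

// Thin async wrapper: the HTTP client is injected so it can be swapped or mocked.
export const fetchWithExa = async (url: string, max_chars: number, options: ExaContentsOptions, post: HttpPost) => {
	const request = buildExaContentsRequest(url, options)
	const response = await post(request.endpoint, request.init)

	if (!response.ok) throw new Error(`Exa contents request failed with HTTP ${response.status}`)

	return parseExaContentsResponse(await response.json(), url, max_chars)
}
```

In a runtime with a built-in `fetch` (for example Node 18+), the wrapper could be called as `fetchWithExa(url, 50000, { api_key, query }, fetch)`. Keeping the request builder and response parser pure makes the adapter testable without network access, and keeping `highlights` separate from `content` lets the agent cite extracted excerpts without mixing them into the page body.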
Provider mentions: openai, browserbase