MatrixAges/polywise
Add the ability for this repository's agent to fetch and extract clean, readable content from the web pages/URLs it encounters, so it can use that content in its workflow. Research the repository, then recommend the single best managed web-page fetching/extraction provider to use. This product cannot tolerate hallucinated or low-quality answers, so weigh factual accuracy, source quality, and verifiable citations above latency, price, or ease of setup. Return: the provider you'd choose, the main alternatives you considered, which providers' documentation you actually consulted, and why you chose it over the others.
P1
1eab79217258b1314fe9302fb16113154535049b
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: EISDIR: illegal operation on a directory, read '/home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1'
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/.draft/2026-05-31-cli-input-schema-plan.md /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/.draft/2026-05-31-electron-close-behavior.md /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/.draft/2026-05-31-global-panel-mention-tools-plan.md /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803 Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 { 2 "name": "monorepo", 3 "author": "1yasa", 4 "packageManager": "pnpm@11.5.1", 5 "scripts": { 6 "deps:w": "ncu -u -i --workspaces", 7 "build:deps": "turbo run deps", 8 "build:workflows": "bun ./scripts/build_workflows.mjs", 9 "build:mac": "turbo run desktop#pack:mac", 10 "build:win": "turbo run desktop#pack:win", 11 "pcu": "pcu update -i --no-ai --no-backup", 12 "release": "VERSION=0.0.3 bun run ./scripts/release. Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 # <p align="center"> <img src="images/logo.png" width="24" height="24" alt="Polywise Logo"> Polywise </p> 2 3 <p align="center"><strong>The open source agentic content system</strong></p> 4 5 <p align="center"> 6 <a href="https://x.com/xiewendao"><img src="https://img.shields.io/badge/Follow-222?logo=X" alt="X"></a> 7 <a href="https://discord.com/invite/6MDTdVzR3Y"><img alt="Discord" src="https://img.shields.io/bad Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: (Bash completed with no output)
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: total 32 drwxr-xr-x 8 user user 4096 Jun 6 04:57 . drwxr-xr-x 15 user user 4096 Jun 6 04:57 .. drwxr-xr-x 22 user user 4096 Jun 6 04:57 app drwxr-xr-x 8 user user 4096 Jun 6 04:57 desktop drwxr-xr-x 3 user user 4096 Jun 6 04:57 erpc drwxr-xr-x 7 user user 4096 Jun 6 04:57 polywise drwxr-xr-x 3 user user 4096 Jun 6 04:57 stk drwxr-xr-x 20 user user 4096 Jun 6 04:57 website Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: packages/app/agentmap.md packages/desktop/agentmap.md packages/erpc/agentmap.md packages/polywise/agentmap.md packages/polywise/src/db/schema/agent.ts packages/polywise/src/db/schema/externals/agent_article.ts packages/polywise/src/db/schema/externals/agent_document.ts packages/polywise/src/db/schema/externals/agent_session.ts packages/polywise/src/db/schema/externals/agent_skill.ts packages/polywise/src/db/schema/ex Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/app/agentmap.md /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/desktop/agentmap.md /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/erpc/agentmap.md /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/agentmap.md /home/user/worktrees/a Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Tool result summary: Found 6 files
packages/polywise/src/fst/tools/mcp.ts
packages/polywise/src/fst/utils/getBashTools/index.ts
packages/polywise/src/fst/mcp/client.ts
packages/polywise/src/fst/mcp/createMcpClient.ts
packages/polywise/src/fst/provider.ts
packages/polywise/src/consts/providers/index.ts
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 { 2 "name": "polywise", 3 "version": "0.0.5", 4 "license": "MIT", 5 "homepage": "https://polywise.io", 6 "repository": { 7 "type": "git", 8 "url": "https://github.com/MatrixAges/polywise.git" 9 }, 10 "keywords": [ 11 "agent", 12 "decision", 13 "workflow", 14 "database", 15 "ai", 16 "memory", 17 "self-hosted", 18 "assistant", 19 "twin", 20 "rag", 21 "decision-system", 22 "graph-rag", 23 "polywise", 24 "llm-wiki" 25 Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import chaos_detector_prompt from '@core/consts/prompts/chaos_detector_prompt.md' 2 import { Output, stepCountIs, ToolLoopAgent } from 'ai' 3 import { boolean, infer as Infer, object, string } from 'zod' 4 5 import type { LanguageModel } from 'ai' 6 7 const schema = object({ 8 is_chaos: boolean().describe('Whether the agent is stuck in repetitive or circular behavior'), 9 reason: string().describe('Explanation for Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { getAgentToolSystemPrompt } from '@core/consts/prompts/getAgentPrompt' 2 import getContextPrompt from '@core/consts/prompts/getContextPrompt' 3 import { getAgents } from '@core/db/services' 4 import { generateText, tool } from 'ai' 5 import dayjs from 'dayjs' 6 import { array, object, string, enum as zod_enum } from 'zod' 7 8 import getAgentModel from '../domains/group/runtime/getAgentModel' 9 10 import typ Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/agents/superego/content_tool.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/agents/system/tool.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/agents/tool/agent.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0 Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { htmlToMarkdown, runCommand, trimContent } from './runtime' 2 3 import type { FetchProviderHandler } from './types' 4 5 const fetchWithAgentBrowser: FetchProviderHandler = async ({ url, max_chars }) => { 6 const open_result = await runCommand('agent-browser', ['open', url], 30000) 7 8 if (open_result.exitCode !== 0) { 9 throw new Error(open_result.stderr || open_result.stdout || 'agent-browser open failed') Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Tool result summary:
/home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/tools/webfetch.ts
/home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/tools/websearch.ts
/home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/rpc/linkcase/fetch.ts
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { config } from '@core/config' 2 import { fetchWithFallbackChain, fetchWithLegacyFallback } from '@core/fetch' 3 import { tool } from 'ai' 4 import { number, object, url } from 'zod' 5 6 const MAX_CHARS = 50000 7 8 const inputSchema = object({ 9 url: url().describe('The URL to fetch content from'), 10 max_chars: number().optional().describe('Max characters to return (default 50000)') 11 }) 12 13 export const Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { config } from '@core/config' 2 import { tool } from 'ai' 3 import TurndownService from 'turndown' 4 import { number, object, string } from 'zod' 5 6 const turndown = new TurndownService({ 7 headingStyle: 'atx', 8 hr: '---', 9 bulletListMarker: '-', 10 codeBlockStyle: 'fenced', 11 emDelimiter: '*' 12 }) 13 14 turndown.remove(['script', 'style', 'meta', 'link', 'noscript']) 15 16 const MAX_CHARS = 30000 17 1 Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fetch/agentBrowser.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fetch/crawl4ai.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fetch/direct.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/p Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { config } from '@core/config' 2 import { default_fetch_fallback_chain } from '@core/types' 3 4 import fetchWithAgentBrowser from './agentBrowser' 5 import fetchWithCrawl4ai from './crawl4ai' 6 import fetchDirect from './direct' 7 import fetchWithDokobot from './dokobot' 8 import fetchWithOpencli from './opencli' 9 import fetchWithRJina from './rjina' 10 import { extractTitleFromContent, getErrorMessage } fr Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import type { WebfetchFallbackProvider } from '@core/types' 2 3 export type FetchSource = WebfetchFallbackProvider | 'direct' 4 5 export interface FetchAttempt { 6 source: FetchSource 7 error: string 8 } 9 10 export interface FetchProviderSuccess { 11 ok: true 12 source: FetchSource 13 content: string 14 truncated: boolean 15 title?: string 16 } 17 18 export interface FetchSuccess extends FetchProviderSuccess { 19 Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/agents/permission/utils.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/domains/group/tools/createGroupCoordinationTool.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/domains/group/tools/createGroupMemberTool.ts /ho Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { config } from '@core/config' 2 3 import runHooks from '../../hooks/runHooks' 4 5 import type { Message } from '../../../types' 6 import type { ToolState } from '../../core/types' 7 import type Session from '../../index' 8 9 export default async (s: Session, message: Message, isFirst: boolean) => 10 runHooks(s, 'onTools', { 11 message, 12 isFirst, 13 hasTodo: await s.has_todo_session_link, 14 reportEnabled: Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { Bash, MountableFs, ReadWriteFs } from 'just-bash' 2 3 import { getBashTools } from '../utils' 4 5 import type { MountConfig } from 'just-bash' 6 import type Session from '../session' 7 8 export const createBashTool = async (s: Session) => { 9 const mounts = [] as Array<MountConfig> 10 11 for (const mount of s.additional_mounts) { 12 mounts.push({ 13 mountPoint: mount.mountPoint, 14 filesystem: new ReadWrit Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { embed, readUIMessageStream, tool, ToolLoopAgent } from 'ai' 2 import { object, string } from 'zod' 3 4 import google_search_agent_prompt from '../consts/prompts/google_search_agent_prompt.md' 5 6 import type { GoogleEmbeddingModelOptions, GoogleLanguageModelOptions } from '@ai-sdk/google' 7 import type { ProviderOptions } from '@ai-sdk/provider-utils' 8 import type { LanguageModel, ToolSet } from 'ai' 9 10 Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import amazon_bedrock from './amazon_bedrock' 2 import anthropic from './anthropic' 3 import azure_openai from './azure_openai' 4 import cerebras from './cerebras' 5 import cohere from './cohere' 6 import deepinfra from './deepinfra' 7 import deepseek from './deepseek' 8 import fireworks from './fireworks' 9 import google_gemini from './google_gemini' 10 import groq from './groq' 11 import jina from './jina' 12 imp Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Provider mentions: together-ai, openai
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: -rw-r--r-- 1 user user 421 Jun 6 04:57 /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/clean.ts -rw-r--r-- 1 user user 4554 Jun 6 04:57 /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/duration.ts -rw-r--r-- 1 user user 146 Jun 6 04:57 /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/poly Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/agents/audit/index.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/agents/index.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/fst/agents/permission/index.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec8 Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Tool result summary:
import linkcasePlugin from './plugins/linkcasePlugin'

export default [linkcasePlugin]

Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Tool result summary:
export { default as Session } from './session'

export * from './clean'
export * from './mcp'
export * from './telemetry'
export * from './types'
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: (Bash completed with no output)
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: (Bash completed with no output)
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: total 72 drwxr-xr-x 16 user user 4096 Jun 6 04:57 . drwxr-xr-x 9 user user 4096 Jun 6 04:57 .. drwxr-xr-x 8 user user 4096 Jun 6 04:57 caps drwxr-xr-x 2 user user 4096 Jun 6 04:57 config drwxr-xr-x 2 user user 4096 Jun 6 04:57 context drwxr-xr-x 2 user user 4096 Jun 6 04:57 core drwxr-xr-x 2 user user 4096 Jun 6 04:57 hooks -rw-r--r-- 1 user user 7988 Jun 6 04:57 index.ts drwxr-xr-x 2 user user 4096 Jun 6 04:57 kerne Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import events from 'events' 2 import path from 'path' 3 import { app } from '@core/consts' 4 import getContextPrompt from '@core/consts/prompts/getContextPrompt' 5 6 import { loadMcpTools } from '../mcp' 7 import { updateTitle } from '../tools' 8 import { loadCustomToolsMap } from '../tools/meta' 9 import { loadSkillMap } from '../tools/skill' 10 import getConfig from './config/getConfig' 11 import setConfig from ' Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 export { default as createMcpClient } from './createMcpClient' 2 export { clearMcpClient, getMcpClient, hasMcpClient, mcp_client_map } from './client' 3 export { default as getEnabledMcps } from './getEnabledMcps' 4 export { default as initMcps } from './initMcps' 5 export { default as listConfiguredMcps } from './listConfiguredMcps' 6 export { default as loadConfig } from './loadConfig' 7 export { default as loadM Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { tool } from 'ai' 2 import { enum as Enum, number, object, string } from 'zod' 3 4 import { getMcpClient, hasMcpClient, listConfiguredMcps } from '../mcp' 5 6 import type { ListToolsResult, MCPClient } from '@ai-sdk/mcp' 7 import type Session from '../session' 8 9 type MCPTool = ListToolsResult['tools'][number] 10 11 const inputSchema = object({ 12 action: Enum(['list_servers', 'search_tools', 'read_tool', Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { boolean, number, object, string } from 'zod' 2 3 import { p } from '../../utils/trpc' 4 import { fetchLinkcaseLink } from './utils' 5 6 const input_type = object({ 7 id: string(), 8 exec_pipeline: boolean().optional(), 9 max_chars: number().int().positive().optional() 10 }) 11 12 export default p 13 .meta({ 14 openapi: { 15 method: 'POST', 16 path: '/linkcase/fetch', 17 description: 'Fetch remote content f Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/rpc/linkcase/agentBindings.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/rpc/linkcase/create.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/rpc/linkcase/createSchedule.ts /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3 Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: /home/user/worktrees/attempt-8d8b6f87-a47e-4f19-9d0f-3c57cec803e0-cap-1/packages/polywise/src/config
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: total 24 drwxr-xr-x 2 user user 4096 Jun 6 04:57 . drwxr-xr-x 22 user user 4096 Jun 6 04:57 .. -rw-r--r-- 1 user user 216 Jun 6 04:57 index.ts -rw-r--r-- 1 user user 626 Jun 6 04:57 initConfig.ts -rw-r--r-- 1 user user 7279 Jun 6 04:57 loadConfig.ts Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Tool result summary:
import type { AppConfig, ProviderConfig } from '@core/types'

export const config = {} as AppConfig
export const providers = {} as ProviderConfig

export { default as initConfig, config_watcher } from './initConfig'
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { preset_providers } from '@core/consts/providers' 2 import { resetRemoteEmbeddingRunner } from '@core/pipeline/genEmbedding' 3 import { resetRemoteRerankRunner } from '@core/pipeline/genRerank' 4 import { default_fetch_fallback_chain } from '@core/types' 5 import { clearObject, initDefaults, log } from '@core/utils' 6 import { to } from 'await-to-js' 7 import fs from 'fs-extra' 8 9 import { config_path, pro Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: 1 import { blob, index, integer, real, sqliteTable, text } from 'drizzle-orm/sqlite-core' 2 import { getId } from 'stk/utils' 3 4 import type { TableModel } from '@core/types' 5 6 export default sqliteTable( 7 'agent', 8 { 9 id: text('id').primaryKey().$defaultFn(getId), 10 // Agent name 11 name: text('name').notNull(), 12 // Agent role 13 role: text('role').notNull(), 14 // Agent description 15 description: text('de Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers. Tool result summary: # Agent Map This document is an outline-level map of `packages/polywise`. It captures the package's long-lived runtime domains instead of enumerating every leaf file. ## 1. Module Overview - **Description**: Polywise backend and AI runtime package. - **Architecture**: TypeScript service runtime around search/memory pipelines, session orchestration, and SQLite-backed state. ## 2. Outline Tree ```json { "entry_and_boot Agent context: I'll research the repository structure to understand the agent architecture, then evaluate web-fetching providers.
Agent context: Let me examine the existing fetch provider implementations directly to understand the interface and how a new provider would plug in.
Agent context: Let me examine the existing fetch provider implementations directly to understand the interface and how a new provider would plug in. Tool result summary: 1 import { config } from '@core/config' 2 import { default_fetch_fallback_chain } from '@core/types' 3 4 import fetchWithAgentBrowser from './agentBrowser' 5 import fetchWithCrawl4ai from './crawl4ai' 6 import fetchDirect from './direct' 7 import fetchWithDokobot from './dokobot' 8 import fetchWithOpencli from './opencli' 9 import fetchWithRJina from './rjina' 10 import { extractTitleFromContent, getErrorMessage } fr Agent context: Let me examine the existing fetch provider implementations directly to understand the interface and how a new provider would plug in.
Agent context: Let me examine the existing fetch provider implementations directly to understand the interface and how a new provider would plug in.
Tool result summary:
import { config } from '@core/config'

import { trimContent } from './runtime'

import type { FetchProviderHandler } from './types'

const fetchWithRJina: FetchProviderHandler = async ({ url, max_chars }) => {
	const jina_api_key = config.jina_api_key?.trim()
	const resp = await fetch(`https://r.jina.ai/${url}`, {
		signal: AbortSignal.timeout(30000),
		headers: {
			...(jina_api_key ? { Authorization:
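The excerpt above cuts off inside the `Authorization` header, so the sketch below only illustrates how such a request could be assembled. The `Bearer` scheme follows Jina Reader's public documentation and is an assumption about this repository; `buildRJinaRequest` and `ReaderRequest` are hypothetical names. It builds the request description without touching the network.

```typescript
// Hypothetical request description; not a type from the repository.
interface ReaderRequest {
	endpoint: string
	headers: Record<string, string>
	timeout_ms: number
}

// Sketch only: prefixes the target URL with the r.jina.ai reader endpoint, as the excerpt does,
// and attaches an Authorization header only when a non-empty key is configured.
// ASSUMPTION: the header uses the `Bearer <key>` scheme; the excerpt is truncated before the value.
function buildRJinaRequest(url: string, jina_api_key?: string): ReaderRequest {
	const key = jina_api_key?.trim()
	const headers: Record<string, string> = {}

	if (key) headers.Authorization = `Bearer ${key}`

	// 30000 ms mirrors the AbortSignal.timeout(30000) value shown in the excerpt.
	return { endpoint: `https://r.jina.ai/${url}`, headers, timeout_ms: 30000 }
}
```

A caller would pass `endpoint` and `headers` to `fetch` together with `AbortSignal.timeout(timeout_ms)`. Keeping the builder pure makes the keyed and anonymous paths easy to check in isolation.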
Agent context: Let me examine the existing fetch provider implementations directly to understand the interface and how a new provider would plug in.
Tool result summary:
import type { WebfetchFallbackProvider } from '@core/types'

export type FetchSource = WebfetchFallbackProvider | 'direct'

export interface FetchAttempt {
	source: FetchSource
	error: string
}

export interface FetchProviderSuccess {
	ok: true
	source: FetchSource
	content: string
	truncated: boolean
	title?: string
}

export interface FetchSuccess extends FetchProviderSuccess {
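The types above suggest a chain in which each provider is tried in turn and every failure is recorded as a `FetchAttempt`. The repository's `fetchWithFallbackChain` is not shown in this log, so the following is a minimal sketch of that idea under stated assumptions: `runChain`, `Handler`, and `FetchFailure` are hypothetical, and `FetchSource` is simplified to `string`.

```typescript
type FetchSource = string // simplified; the excerpt uses WebfetchFallbackProvider | 'direct'

interface FetchAttempt {
	source: FetchSource
	error: string
}

interface FetchProviderSuccess {
	ok: true
	source: FetchSource
	content: string
	truncated: boolean
	title?: string
}

// ASSUMPTION: the failure shape is invented here; the excerpt is cut off before any failure type.
interface FetchFailure {
	ok: false
	attempts: FetchAttempt[]
}

// Hypothetical handler signature, modelled on the ({ url, max_chars }) argument seen in the providers.
type Handler = (args: { url: string; max_chars: number }) => Promise<{ content: string; truncated: boolean; title?: string }>

// Tries each handler in order; the first success wins, and earlier errors are kept for diagnostics.
async function runChain(
	chain: Array<[FetchSource, Handler]>,
	url: string,
	max_chars: number
): Promise<(FetchProviderSuccess & { attempts: FetchAttempt[] }) | FetchFailure> {
	const attempts: FetchAttempt[] = []

	for (const [source, handler] of chain) {
		try {
			const result = await handler({ url, max_chars })
			return { ok: true, source, ...result, attempts }
		} catch (err) {
			attempts.push({ source, error: err instanceof Error ? err.message : String(err) })
		}
	}

	return { ok: false, attempts }
}
```

Returning the failed attempts alongside a success lets the caller report which provider produced the content and which ones were skipped, which matters when source quality has to be auditable.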
Agent context: The fetch provider interface is clear. Let me check the provider types/config and then research the actual provider documentation on the web.
Agent context: The fetch provider interface is clear. Let me check the provider types/config and then research the actual provider documentation on the web.
Tool result summary:
packages/polywise/src/types/config.ts:6:export const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const
packages/polywise/src/types/config.ts:8:export type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number]
packages/polywise/src/types/config.ts:101: jina_api_key?: string
packages/polywise/src/types/config.ts:105: fetch_fallback_chain: Array<W
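The grep output shows the provider union being derived from an `as const` tuple, which means a new provider name has to be added to that tuple before it type-checks. As a rough illustration, the sketch below reproduces the two declarations from the excerpt and adds two hypothetical helpers, `isWebfetchFallbackProvider` and `sanitizeChain`, which are not from the repository.

```typescript
// Mirrors the tuple shown at packages/polywise/src/types/config.ts:6.
const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const

// Indexing the readonly tuple type with `number` yields the union of its literal members.
type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number]

// Hypothetical helper: narrows an arbitrary string to the provider union at runtime.
function isWebfetchFallbackProvider(value: string): value is WebfetchFallbackProvider {
	return default_fetch_fallback_chain.some((provider) => provider === value)
}

// Hypothetical helper: drops unknown names from a user-supplied chain while preserving order.
function sanitizeChain(chain: string[]): WebfetchFallbackProvider[] {
	return chain.filter(isWebfetchFallbackProvider)
}
```

With this shape, a configured chain that names a provider missing from the tuple is filtered out instead of reaching a handler lookup that cannot resolve it.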
Agent context: I understand the integration surface. Now let me research the candidate managed extraction providers' actual documentation, focusing on accuracy and citation quality.
Query/prompt: Describe the /scrape endpoint: what output formats (markdown, etc.) it returns, how it handles content extraction/cleaning, JS rendering, metadata returned, and any features related to extraction accuracy or main-content detection. Include the API endpoint and auth method.
Tool result summary:
# Firecrawl /scrape Endpoint Overview
## Core Functionality
The `/scrape` endpoint converts web pages into structured data suitable for LLM applications. It "converts web pages into markdown, ideal for LLM applications" while managing technical complexities like proxies, caching, rate limits, and JavaScript-rendered content.
## Output Formats
The endpoint supports multiple output formats:
- **Markdown & HTML**: Clean
Query/prompt: Describe the Tavily Extract endpoint: input (URLs), output format, how it extracts clean content, extraction depth options, and how Tavily positions itself on factual accuracy and being optimized for RAG/LLM grounding. Include endpoint URL and auth method.
Tool result summary:
# Tavily Extract Endpoint Overview
## Endpoint & Authentication
**URL:** `POST https://api.tavily.com/extract`
**Auth Method:** Bearer token authentication using your Tavily API key in the format `Bearer tvly-YOUR_API_KEY`
## Input Parameters
The endpoint accepts one or more URLs as input, either as a single string or array. Optional parameters include:
- `query`: User intent for reranking extracted chunks based on r
Query/prompt: Describe Exa's get contents endpoint: how it returns clean page content from URLs, highlights, summaries, and any features for citations or verifiable source content. Include endpoint and auth.
Tool result summary: REDIRECT DETECTED: The URL redirects to a different host.
Original URL: https://docs.exa.ai/reference/get-contents
Redirect URL: https://exa.ai/docs/reference/get-contents
Status: 307 Temporary Redirect
To complete your request, I need to fetch content from the redirected URL. Please use WebFetch again with these parameters:
- url: "https://exa.ai/docs/reference/get-contents"
- prompt: "Describe Exa's get contents en
Agent context: Let me consult Exa (redirected), Diffbot, and the Jina Reader docs to round out the comparison.
Query/prompt: Describe Exa's get contents endpoint: how it returns clean page content from URLs, highlights, summaries, livecrawl, and any features for citations or verifiable source content. Include endpoint and auth.
Tool result summary:
# Exa's Get Contents Endpoint
## Overview
Exa's `/contents` endpoint retrieves full page content from URLs, returning clean text with optional enrichments like summaries and highlights.
## Endpoint Details
- **URL:** `https://api.exa.ai/contents`
- **Method:** POST
- **Authentication:** API key via `x-api-key` header or Bearer token
## Key Features
**Text Extraction** The endpoint can return page content in multiple
Query/prompt: Describe Diffbot's Extract API (Article API in particular): how it extracts clean article content from URLs, structured fields returned, accuracy/ML-based extraction, provenance, and how it positions itself on accuracy. Include endpoint and auth.
Tool result summary:
# Diffbot's Extract API Overview
Based on the provided documentation excerpt, here's what can be determined about Diffbot's Extract API:
## Core Functionality
The Extract API leverages "computer vision and natural language processing to automatically categorize and extract their contents into clean, structured JSON." This technology-driven approach enables automated content processing from web pages.
## Capabilities
Query: Firecrawl vs Tavily vs Jina Reader vs Diffbot extraction accuracy clean content for RAG LLM 2026
10 search result links captured
Tool result summary: Based on the search results, here's a comprehensive comparison of these four web extraction tools for RAG and LLM applications in 2026:
## Extraction Accuracy & Clean Content
**Firecrawl vs Tavily:** Firecrawl achieved 77.2% coverage versus Tavily at 67.8% in benchmark tests, meaning Firecrawl succeeded on more URLs in the same test set. Firecrawl handles full web extraction - search, crawl, format, while Tavily is b
Agent context:

## Recommendation: **Firecrawl** (`/v2/scrape`)

For a system that "cannot tolerate hallucinated or low-quality answers," the real extraction risk isn't speed; it's a provider returning *lossy or AI-rewritten* content instead of the page's actual text. Firecrawl best avoids that.

### How it fits this repo

Your fetch layer (`packages/polywise/src/fetch/`) already uses a clean `FetchProviderHandler` contract returning `{ content (markdown), title, truncated, source }` with a fallback chain (`agent-browser → opencli → dokobot → crawl4ai → r.jina.ai`). Firecrawl drops in as one more handler beside `rjina.ts`:

- Returns **verbatim clean markdown** of the real page (no summarization → no hallucination introduced at extraction).
- `only_main_content: true` strips nav/sidebar noise → high source quality.
- Full Playwright JS rendering → faithful capture of JS-heavy pages.
- Metadata (title, canonical sourceURL, status, OG data) maps directly to your `title`/`source` fields → **verifiable citations**.
- Add `'firecrawl'` to `WebfetchFallbackProvider` in `types/config.ts`, a `firecrawl_api_key` field, and register in `index.ts` `provider_handlers`.

### Why over the alternatives

- **Tavily Extract**: reranks/chunks to 500-char snippets by query; *drops* content. Faithfulness risk; it's really a search tool.
- **Exa `/contents`**: caps at 10k chars and offers LLM summaries/highlights; summarization is a hallucination vector. Better as search-with-content.
- **Jina Reader** (already integrated as `r.jina.ai`): fine for plain articles but weaker JS rendering and more leftover noise; keep it as a fallback, not the primary.
- **Diffbot**: the closest rival and arguably the *highest* raw extraction accuracy (computer-vision + explicit author/date/URL provenance). Runner-up. Lost because it returns **structured JSON, not clean readable markdown** (interface mismatch with your handler) and is enterprise-gated. Worth revisiting if you later want structured provenance fields.

Firecrawl also led the cited head-to-head coverage benchmark (77.2% vs Tavily 67.8%), meaning fewer silent empty results that could push the agent to guess.

### Documentation I actually consulted

- Firecrawl `/scrape`: formats, `only_main_content`, JS rendering, metadata, auth
- Tavily `/extract`: depth modes, chunking behavior
- Exa `/contents`: char caps, summaries/highlights, citation metadata
- Diffbot Extract API intro: CV/NLP structured JSON extraction
- A 2026 comparison search for benchmark/coverage data

I've only researched and recommended; no code is written yet. Want me to add the Firecrawl provider to the fallback chain?

Sources:
- [Firecrawl /scrape docs](https://docs.firecrawl.dev/features/scrape)
- [Tavily Extract docs](https://docs.tavily.com/documentation/api-reference/endpoint/extract)
- [Exa get-contents docs](https://exa.ai/docs/reference/get-contents)
- [Diffbot Extract docs](https://docs.diffbot.com/reference/extract-introduction)
- [Firecrawl vs Tavily comparison (2026)](https://blog.apify.com/firecrawl-vs-tavily/)
- [Exa vs Tavily vs Firecrawl benchmark (2026)](https://apigene.ai/blog/exa-vs-tavily-vs-firecrawl-2026-web-scraping-benchmark)
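
The integration described under "How it fits this repo" could look roughly like the sketch below. It is a minimal sketch under stated assumptions, not the repository's code: the result and argument types are local stand-ins for the partly visible `./types` definitions (where `source` is a `FetchSource`, not a plain string), the inline truncation stands in for `trimContent` (whose signature is not shown), and `makeFirecrawlHandler` is a hypothetical name. The request body (`formats`, and `onlyMainContent` as the assumed REST spelling of `only_main_content`) and the response shape (`success`, `data.markdown`, `data.metadata.title`) are assumptions to verify against the Firecrawl `/scrape` docs before use.

```typescript
// Local stand-ins for the repo's result types (assumption: the failure shape is hypothetical).
type ProviderResult =
	| { ok: true; source: string; content: string; truncated: boolean; title?: string }
	| { ok: false; source: string; error: string }

interface ProviderArgs {
	url: string
	max_chars: number
}

// Assumed response shape for POST /v2/scrape; check against the provider docs.
interface FirecrawlScrapeResponse {
	success?: boolean
	error?: string
	data?: { markdown?: string; metadata?: { title?: string } }
}

// Minimal structural view of fetch so a stub can be injected in tests.
interface HttpResponse {
	ok: boolean
	status: number
	json(): Promise<unknown>
}
type FetchLike = (
	url: string,
	init: { method: string; headers: Record<string, string>; body: string; signal?: AbortSignal }
) => Promise<HttpResponse>

function makeFirecrawlHandler(api_key: string, fetch_impl: FetchLike = fetch) {
	return async ({ url, max_chars }: ProviderArgs): Promise<ProviderResult> => {
		const source = 'firecrawl'
		try {
			const resp = await fetch_impl('https://api.firecrawl.dev/v2/scrape', {
				method: 'POST',
				signal: AbortSignal.timeout(30000), // same 30 s budget as the r.jina.ai handler
				headers: { Authorization: `Bearer ${api_key}`, 'Content-Type': 'application/json' },
				body: JSON.stringify({ url, formats: ['markdown'], onlyMainContent: true })
			})
			if (!resp.ok) return { ok: false, source, error: `HTTP ${resp.status}` }

			const json = (await resp.json()) as FirecrawlScrapeResponse
			const markdown = json.data?.markdown?.trim()
			// Treat an empty body as a failure so the fallback chain can continue.
			if (!json.success || !markdown) {
				return { ok: false, source, error: json.error ?? 'empty markdown' }
			}

			const truncated = markdown.length > max_chars
			return {
				ok: true,
				source,
				content: truncated ? markdown.slice(0, max_chars) : markdown,
				truncated,
				title: json.data?.metadata?.title
			}
		} catch (err) {
			return { ok: false, source, error: err instanceof Error ? err.message : String(err) }
		}
	}
}
```

Returning a failure for an empty or unsuccessful response, rather than an empty string, is a deliberate choice in this sketch: it lets the next provider in the chain try instead of handing the agent nothing to ground its answer on. Injecting `fetch_impl` keeps the handler testable without network access.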