MatrixAges/polywise
Add the ability for this repository's agent to fetch and extract clean, readable content from the web pages/URLs it encounters, so it can use that content in its workflow. Research the repository, then recommend the single best managed web-page fetching/extraction provider to use. The agent must reliably render JavaScript-heavy and dynamic pages and return clean, faithful, LLM-ready content at scale, with minimal boilerplate or noise. Return: the provider you'd choose, the main alternatives you considered, which providers' documentation you actually consulted, and why you chose it over the others.
P3F
1eab79217258b1314fe9302fb16113154535049b
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/README.md /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/package.json /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/readme/README.ar.md /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/readme/README.bn.md /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: 1 # <p align="center"> <img src="images/logo.png" width="24" height="24" alt="Polywise Logo"> Polywise </p> 2 3 <p align="center"><strong>The open source agentic content system</strong></p> 4 5 <p align="center"> 6 <a href="https://x.com/xiewendao"><img src="https://img.shields.io/badge/Follow-222?logo=X" alt="X"></a> 7 <a href="https://discord.com/invite/6MDTdVzR3Y"><img alt="Discord" src="https://img.shields.io/bad Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: 1 { 2 "name": "monorepo", 3 "author": "1yasa", 4 "packageManager": "pnpm@11.5.1", 5 "scripts": { 6 "deps:w": "ncu -u -i --workspaces", 7 "build:deps": "turbo run deps", 8 "build:workflows": "bun ./scripts/build_workflows.mjs", 9 "build:mac": "turbo run desktop#pack:mac", 10 "build:win": "turbo run desktop#pack:win", 11 "pcu": "pcu update -i --no-ai --no-backup", 12 "release": "VERSION=0.0.3 bun run ./scripts/release. Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: total 1108 drwxr-xr-x 15 user user 4096 Jun 6 04:58 . drwxr-xr-x 3 user user 4096 Jun 6 04:58 .. drwxr-xr-x 3 user user 4096 Jun 6 04:58 .agent drwxr-xr-x 3 user user 4096 Jun 6 04:58 .agents drwxr-xr-x 3 user user 4096 Jun 6 04:58 .codex drwxr-xr-x 2 user user 4096 Jun 6 04:58 .draft -rw-r--r-- 1 user user 170 Jun 6 04:58 .git drwxr-xr-x 4 user user 4096 Jun 6 04:58 .github -rw-r--r-- 1 user user 391 Jun 6 04:58 .gi Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: total 32 drwxr-xr-x 8 user user 4096 Jun 6 04:58 . drwxr-xr-x 15 user user 4096 Jun 6 04:58 .. drwxr-xr-x 22 user user 4096 Jun 6 04:58 app drwxr-xr-x 8 user user 4096 Jun 6 04:58 desktop drwxr-xr-x 3 user user 4096 Jun 6 04:58 erpc drwxr-xr-x 7 user user 4096 Jun 6 04:58 polywise drwxr-xr-x 3 user user 4096 Jun 6 04:58 stk drwxr-xr-x 20 user user 4096 Jun 6 04:58 website Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: Found 4 files packages/website/package.json packages/polywise/package.json packages/desktop/package.json packages/app/package.json
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: 1 { 2 "name": "polywise", 3 "version": "0.0.5", 4 "license": "MIT", 5 "homepage": "https://polywise.io", 6 "repository": { 7 "type": "git", 8 "url": "https://github.com/MatrixAges/polywise.git" 9 }, 10 "keywords": [ 11 "agent", 12 "decision", 13 "workflow", 14 "database", 15 "ai", 16 "memory", 17 "self-hosted", 18 "assistant", 19 "twin", 20 "rag", 21 "decision-system", 22 "graph-rag", 23 "polywise", 24 "llm-wiki" 25 Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/datasets /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/drizzle /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/drizzle/20260601134112_hesitant_famine /hom Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/.draft/agent-private-knowledge-import-export-plan.md /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/app/agentmap.md /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/desktop/agentmap.md /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/erpc/agentmap.md Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/api /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/callback /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/cli /home/user/worktrees/attemp Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: Found 20 files limit: 20 packages/polywise/src/utils/error_handler.ts packages/polywise/src/utils/guard.ts packages/polywise/src/utils/index.ts packages/polywise/src/utils/initDefaults.ts packages/polywise/src/utils/initServer.ts packages/polywise/src/utils/log.ts packages/polywise/src/utils/middleware.ts packages/polywise/src/utils/openapi_handler.ts packages/polywise/src/utils/rstream/index.ts packages/polywise/src Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: total 44 drwxr-xr-x 2 user user 4096 Jun 6 04:58 . drwxr-xr-x 22 user user 4096 Jun 6 04:58 .. -rw-r--r-- 1 user user 1252 Jun 6 04:58 agentBrowser.ts -rw-r--r-- 1 user user 2100 Jun 6 04:58 crawl4ai.ts -rw-r--r-- 1 user user 795 Jun 6 04:58 direct.ts -rw-r--r-- 1 user user 2064 Jun 6 04:58 dokobot.ts -rw-r--r-- 1 user user 3494 Jun 6 04:58 index.ts -rw-r--r-- 1 user user 2110 Jun 6 04:58 opencli.ts -rw-r--r-- 1 user Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/consts/prompts/fst_report_tool_prompt.md /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/consts/prompts/fst_system_tool_prompt.md /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/consts/prompts/fst_title_tool_prompt.md /home/user/workt Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: 1 import { blob, index, integer, real, sqliteTable, text } from 'drizzle-orm/sqlite-core' 2 import { getId } from 'stk/utils' 3 4 import type { TableModel } from '@core/types' 5 6 export default sqliteTable( 7 'agent', 8 { 9 id: text('id').primaryKey().$defaultFn(getId), 10 // Agent name 11 name: text('name').notNull(), 12 // Agent role 13 role: text('role').notNull(), 14 // Agent description 15 description: text('de Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: 1 import { config } from '@core/config' 2 import { default_fetch_fallback_chain } from '@core/types' 3 4 import fetchWithAgentBrowser from './agentBrowser' 5 import fetchWithCrawl4ai from './crawl4ai' 6 import fetchDirect from './direct' 7 import fetchWithDokobot from './dokobot' 8 import fetchWithOpencli from './opencli' 9 import fetchWithRJina from './rjina' 10 import { extractTitleFromContent, getErrorMessage } fr Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: EISDIR: illegal operation on a directory, read '/home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/config'
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: total 24 drwxr-xr-x 2 user user 4096 Jun 6 04:58 . drwxr-xr-x 22 user user 4096 Jun 6 04:58 .. -rw-r--r-- 1 user user 216 Jun 6 04:58 index.ts -rw-r--r-- 1 user user 626 Jun 6 04:58 initConfig.ts -rw-r--r-- 1 user user 7279 Jun 6 04:58 loadConfig.ts
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/im/config.ts /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/types/config.ts
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: 1 import { preset_providers } from '@core/consts/providers' 2 import { resetRemoteEmbeddingRunner } from '@core/pipeline/genEmbedding' 3 import { resetRemoteRerankRunner } from '@core/pipeline/genRerank' 4 import { default_fetch_fallback_chain } from '@core/types' 5 import { clearObject, initDefaults, log } from '@core/utils' 6 import { to } from 'await-to-js' 7 import fs from 'fs-extra' 8 9 import { config_path, pro Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: 1 export interface Workspace { 2 name: string 3 endpoint?: string 4 } 5 6 export const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const 7 8 export type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number] 9 10 export interface DefaultModel { 11 provider: string 12 model: string 13 effort?: string 14 } 15 16 export interface McpOAuthConfig { 1 Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/rpc/linkcase/agentBindings.ts
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: total 136 drwxr-xr-x 23 user user 4096 Jun 6 04:58 . drwxr-xr-x 22 user user 4096 Jun 6 04:58 .. drwxr-xr-x 2 user user 4096 Jun 6 04:58 agent drwxr-xr-x 2 user user 4096 Jun 6 04:58 article drwxr-xr-x 2 user user 4096 Jun 6 04:58 auth drwxr-xr-x 2 user user 4096 Jun 6 04:58 file drwxr-xr-x 3 user user 4096 Jun 6 04:58 group -rw-r--r-- 1 user user 701 Jun 6 04:58 heartbeat.ts drwxr-xr-x 2 user user 4096 Jun 6 04:58 h Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: total 172 drwxr-xr-x 2 user user 4096 Jun 6 04:58 . drwxr-xr-x 23 user user 4096 Jun 6 04:58 .. -rw-r--r-- 1 user user 1244 Jun 6 04:58 addArticle.ts -rw-r--r-- 1 user user 4558 Jun 6 04:58 create.ts -rw-r--r-- 1 user user 728 Jun 6 04:58 createSession.ts -rw-r--r-- 1 user user 464 Jun 6 04:58 exportPack.ts -rw-r--r-- 1 user user 842 Jun 6 04:58 getArticles.ts -rw-r--r-- 1 user user 1158 Jun 6 04:58 getPrivateArticle Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: File does not exist. Note: your current working directory is /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: total 172 drwxr-xr-x 2 user user 4096 Jun 6 04:58 . drwxr-xr-x 23 user user 4096 Jun 6 04:58 .. -rw-r--r-- 1 user user 1244 Jun 6 04:58 addArticle.ts -rw-r--r-- 1 user user 4558 Jun 6 04:58 create.ts -rw-r--r-- 1 user user 728 Jun 6 04:58 createSession.ts -rw-r--r-- 1 user user 464 Jun 6 04:58 exportPack.ts -rw-r--r-- 1 user user 842 Jun 6 04:58 getArticles.ts -rw-r--r-- 1 user user 1158 Jun 6 04:58 getPrivateArticle Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: EISDIR: illegal operation on a directory, read '/home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/fst'
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: total 72 drwxr-xr-x 9 user user 4096 Jun 6 04:58 . drwxr-xr-x 22 user user 4096 Jun 6 04:58 .. drwxr-xr-x 11 user user 4096 Jun 6 04:58 agents -rw-r--r-- 1 user user 421 Jun 6 04:58 clean.ts drwxr-xr-x 8 user user 4096 Jun 6 04:58 domains -rw-r--r-- 1 user user 4554 Jun 6 04:58 duration.ts -rw-r--r-- 1 user user 146 Jun 6 04:58 index.ts drwxr-xr-x 2 user user 4096 Jun 6 04:58 mcp -rw-r--r-- 1 user user 15691 Jun 6 04 Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/api /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/callback /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/cli /home/user/worktrees/attemp Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: EISDIR: illegal operation on a directory, read '/home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/polywise/src/fst/agents'
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: total 48 drwxr-xr-x 11 user user 4096 Jun 6 04:58 . drwxr-xr-x 9 user user 4096 Jun 6 04:58 .. drwxr-xr-x 2 user user 4096 Jun 6 04:58 audit -rw-r--r-- 1 user user 213 Jun 6 04:58 index.ts drwxr-xr-x 2 user user 4096 Jun 6 04:58 permission drwxr-xr-x 2 user user 4096 Jun 6 04:58 skill_creator drwxr-xr-x 2 user user 4096 Jun 6 04:58 superego drwxr-xr-x 2 user user 4096 Jun 6 04:58 supervisor drwxr-xr-x 2 user user 409 Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider. Tool result summary: 1 import { embed, readUIMessageStream, tool, ToolLoopAgent } from 'ai' 2 import { object, string } from 'zod' 3 4 import google_search_agent_prompt from '../consts/prompts/google_search_agent_prompt.md' 5 6 import type { GoogleEmbeddingModelOptions, GoogleLanguageModelOptions } from '@ai-sdk/google' 7 import type { ProviderOptions } from '@ai-sdk/provider-utils' 8 import type { LanguageModel, ToolSet } from 'ai' 9 10 Agent context: I'll research the repository to understand its stack and how the agent works, then recommend a web-fetching provider.
Agent context: There's already a fetch module with multiple providers. Let me examine the existing fetch implementation and webfetch tool closely to understand how to add a new provider.
Agent context: There's already a fetch module with multiple providers. Let me examine the existing fetch implementation and webfetch tool closely to understand how to add a new provider. Tool result summary: 1 import { config } from '@core/config' 2 import { default_fetch_fallback_chain } from '@core/types' 3 4 import fetchWithAgentBrowser from './agentBrowser' 5 import fetchWithCrawl4ai from './crawl4ai' 6 import fetchDirect from './direct' 7 import fetchWithDokobot from './dokobot' 8 import fetchWithOpencli from './opencli' 9 import fetchWithRJina from './rjina' 10 import { extractTitleFromContent, getErrorMessage } fr Agent context: There's already a fetch module with multiple providers. Let me examine the existing fetch implementation and webfetch tool closely to understand how to add a new provider.
Agent context: There's already a fetch module with multiple providers. Let me examine the existing fetch implementation and webfetch tool closely to understand how to add a new provider. Tool result summary: 1 import { config } from '@core/config' 2 import { fetchWithFallbackChain, fetchWithLegacyFallback } from '@core/fetch' 3 import { tool } from 'ai' 4 import { number, object, url } from 'zod' 5 6 const MAX_CHARS = 50000 7 8 const inputSchema = object({ 9 url: url().describe('The URL to fetch content from'), 10 max_chars: number().optional().describe('Max characters to return (default 50000)') 11 }) 12 13 export const Agent context: There's already a fetch module with multiple providers. Let me examine the existing fetch implementation and webfetch tool closely to understand how to add a new provider.
Agent context: Let me examine the provider implementations, types, and config to understand the integration pattern.
Agent context: Let me examine the provider implementations, types, and config to understand the integration pattern. Tool result summary: total 44 drwxr-xr-x 2 user user 4096 Jun 6 04:58 . drwxr-xr-x 22 user user 4096 Jun 6 04:58 .. -rw-r--r-- 1 user user 1252 Jun 6 04:58 agentBrowser.ts -rw-r--r-- 1 user user 2100 Jun 6 04:58 crawl4ai.ts -rw-r--r-- 1 user user 795 Jun 6 04:58 direct.ts -rw-r--r-- 1 user user 2064 Jun 6 04:58 dokobot.ts -rw-r--r-- 1 user user 3494 Jun 6 04:58 index.ts -rw-r--r-- 1 user user 2110 Jun 6 04:58 opencli.ts -rw-r--r-- 1 user Agent context: Let me examine the provider implementations, types, and config to understand the integration pattern.
Agent context: Let me examine the provider implementations, types, and config to understand the integration pattern. Tool result summary: 1 import { config } from '@core/config' 2 3 import { trimContent } from './runtime' 4 5 import type { FetchProviderHandler } from './types' 6 7 const fetchWithRJina: FetchProviderHandler = async ({ url, max_chars }) => { 8 const jina_api_key = config.jina_api_key?.trim() 9 const resp = await fetch(`https://r.jina.ai/${url}`, { 10 signal: AbortSignal.timeout(30000), 11 headers: { 12 ...(jina_api_key ? { Authorization: Agent context: Let me examine the provider implementations, types, and config to understand the integration pattern.
Agent context: Let me examine the provider implementations, types, and config to understand the integration pattern. Tool result summary: 1 import type { WebfetchFallbackProvider } from '@core/types' 2 3 export type FetchSource = WebfetchFallbackProvider | 'direct' 4 5 export interface FetchAttempt { 6 source: FetchSource 7 error: string 8 } 9 10 export interface FetchProviderSuccess { 11 ok: true 12 source: FetchSource 13 content: string 14 truncated: boolean 15 title?: string 16 } 17 18 export interface FetchSuccess extends FetchProviderSuccess { 19 Agent context: Let me examine the provider implementations, types, and config to understand the integration pattern.
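The types printed above suggest that each provider either yields content or fails, with failures recorded as `{ source, error }` attempts. A minimal sketch of such a loop follows, assuming handlers throw on failure. `runChain`, `Handler`, and the stub handlers are hypothetical names for illustration; this is not the repository's `fetchWithFallbackChain`, whose body is not shown in this log.

```typescript
// Shapes loosely modeled on the FetchAttempt / FetchProviderSuccess types
// printed above; anything cut off in the truncated output is omitted.
type Source = string

interface Attempt {
  source: Source
  error: string
}

interface ProviderSuccess {
  ok: true
  source: Source
  content: string
  truncated: boolean
  title?: string
}

interface ChainFailure {
  ok: false
  attempts: Attempt[]
}

// Hypothetical handler signature: resolves with content or throws.
type Handler = (args: { url: string; max_chars: number }) => Promise<ProviderSuccess>

// Try each provider in order; return the first success, or all recorded failures.
async function runChain(
  chain: Array<[Source, Handler]>,
  args: { url: string; max_chars: number }
): Promise<(ProviderSuccess & { attempts: Attempt[] }) | ChainFailure> {
  const attempts: Attempt[] = []
  for (const [source, handler] of chain) {
    try {
      const result = await handler(args)
      return { ...result, attempts }
    } catch (err) {
      attempts.push({ source, error: err instanceof Error ? err.message : String(err) })
    }
  }
  return { ok: false, attempts }
}

// Stub handlers for illustration only; neither touches the network.
const failing: Handler = async () => {
  throw new Error('timeout')
}
const working: Handler = async ({ url, max_chars }) => {
  const full = `# Title\n\nBody of ${url}`
  return { ok: true, source: 'stub-b', content: full.slice(0, max_chars), truncated: full.length > max_chars }
}
```

Recording every failed attempt, rather than only the last error, makes it possible to report which providers were tried before one succeeded.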
Agent context: Let me look at the type definitions, config schema, and a couple more provider implementations to fully understand the pattern.
Agent context: Let me look at the type definitions, config schema, and a couple more provider implementations to fully understand the pattern. Tool result summary: Found 3 files index.ts rjina.ts types.ts
Agent context: Let me look at the type definitions, config schema, and a couple more provider implementations to fully understand the pattern. Tool result summary: 1 import { resolvePolywiseCrawl4aiProfileConfig } from '../utils/crawl4aiProfile' 2 import { runCommand, trimContent } from './runtime' 3 4 import type { FetchProviderHandler } from './types' 5 6 const default_run_config_parts = [ 7 'wait_until=networkidle', 8 'delay_before_return_html=3', 9 'scan_full_page=true', 10 'scroll_delay=0.4', 11 'process_iframes=false', 12 'remove_overlay_elements=true', 13 'simulate_user= Agent context: Let me look at the type definitions, config schema, and a couple more provider implementations to fully understand the pattern.
Agent context: Let me look at the type definitions, config schema, and a couple more provider implementations to fully understand the pattern. Tool result summary: 1 import { runCommand, trimContent } from './runtime' 2 3 import type { FetchProviderHandler } from './types' 4 5 const default_screens = Math.max(1, Number.parseInt(process.env.DOKOBOT_SCREENS || '5', 10) || 5) 6 const max_steps = Math.max(1, Number.parseInt(process.env.DOKOBOT_MAX_STEPS || '10', 10) || 10) 7 const session_id_pattern = /session(?:\s+id)?\s*:\s*([A-Za-z0-9._:-]+)/i 8 9 const getSessionId = (output: s Agent context: Let me look at the type definitions, config schema, and a couple more provider implementations to fully understand the pattern.
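The dokobot provider shown above clamps two environment-driven settings and scans command output for a session identifier. The sketch below illustrates how those two expressions behave. `readPositiveInt` and `findSessionId` are hypothetical helper names; only the clamping expression and the regex are taken from the file, and the raw value is passed in as a parameter so the example does not depend on `process.env`.

```typescript
// Same expression shape as `default_screens` / `max_steps` in the file:
// parse an integer, fall back when parsing yields NaN or 0, then clamp to >= 1.
const readPositiveInt = (raw: string | undefined, fallback: number): number =>
  Math.max(1, Number.parseInt(raw || String(fallback), 10) || fallback)

// Regex copied verbatim from the file.
const session_id_pattern = /session(?:\s+id)?\s*:\s*([A-Za-z0-9._:-]+)/i

// Hypothetical helper: returns the captured identifier, or undefined when absent.
const findSessionId = (output: string): string | undefined => session_id_pattern.exec(output)?.[1]
```

Note that a value of `0` falls back to the default (because `0 || fallback` picks the fallback), while a negative value is clamped to `1`.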
Agent context: Let me find where the provider type and config schema are defined.
Agent context: Let me find where the provider type and config schema are defined. Tool result summary: types/config.ts config/loadConfig.ts
Agent context: Let me find where the provider type and config schema are defined. Tool result summary: types/config.ts:6:export const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const types/config.ts:8:export type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number] types/config.ts:105: fetch_fallback_chain: Array<WebfetchFallbackProvider>
Agent context: Let me find where the provider type and config schema are defined. Tool result summary: 1 export interface Workspace { 2 name: string 3 endpoint?: string 4 } 5 6 export const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const 7 8 export type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number] 9 10 export interface DefaultModel { 11 provider: string 12 model: string 13 effort?: string 14 } 15 16 export interface McpOAuthConfig { 1 Agent context: Let me find where the provider type and config schema are defined.
Agent context: I now understand the architecture: Polywise already has a fallback chain (`types/config.ts:6`) with five providers, but only `r.jina.ai` is a *managed* service; the rest are local CLIs. Let me consult the documentation of the leading managed providers to make a faithful recommendation.
Agent context: I'll consult the documentation of the leading managed extraction providers (Firecrawl, Jina Reader, Spider, and Scrapfly) to make a faithful, sourced recommendation.
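Because `WebfetchFallbackProvider` is derived from the `as const` tuple, adding a name to the tuple would widen the union type automatically. The sketch below illustrates that mechanism with a placeholder entry, `'example-managed'`, which is purely illustrative and does not exist in the repository; `extended_chain`, `ExtendedProvider`, and `isExtendedProvider` are likewise hypothetical.

```typescript
// Declaration as printed from types/config.ts:6.
const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const
type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number]

// Hypothetical extension: one extra (placeholder) provider name appended.
const extended_chain = [...default_fetch_fallback_chain, 'example-managed'] as const
type ExtendedProvider = (typeof extended_chain)[number]

// Runtime guard, e.g. for validating provider names read from a config file.
const isExtendedProvider = (value: string): value is ExtendedProvider =>
  (extended_chain as readonly string[]).includes(value)

// Existing names remain assignable to the widened union.
const fromDefault: WebfetchFallbackProvider = 'r.jina.ai'
const widened: ExtendedProvider = fromDefault
```

Deriving the union from a single tuple keeps the default chain and the type in sync, so a new provider only needs to be named in one place for the type checker to accept it.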
Query/prompt: Describe the /scrape endpoint: what output formats does it support (markdown, LLM-ready content?), how does it handle JavaScript rendering and dynamic pages, what options control waiting/actions/stealth/proxy, what's the request URL, auth header, and key request body parameters? Does it strip boilerplate/noise? Summarize concisely. Agent context: I'll consult the documentation of the leading managed extraction providers (Firecrawl, Jina Reader, Spider, and Scrapfly) to make a faithful, sourced recommendation. Tool result summary: # Firecrawl /scrape Endpoint Summary ## Request Details - **URL**: `POST https://api.firecrawl.dev/v2/scrape` - **Authentication**: Bearer token in Authorization header - **Required Parameter**: `url` (the webpage to scrape) ## Output Formats Supports multiple output types: markdown (default), HTML, raw HTML, links, images, screenshots, JSON with custom schemas, summaries, change tracking, branding analysis, audio/vi
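Going by the request details in the summary above (a `POST` to `https://api.firecrawl.dev/v2/scrape`, a Bearer token in the `Authorization` header, and a required `url` parameter), a request could be assembled roughly as follows. This is a sketch under those stated details only: `buildScrapeRequest` and `ScrapeRequest` are hypothetical names, the `Content-Type` header is an assumption for a JSON body, and no optional body parameters are included because the truncated summary does not list them.

```typescript
interface ScrapeRequest {
  endpoint: string
  init: { method: 'POST'; headers: Record<string, string>; body: string }
}

// Build (but do not send) a scrape request from a target URL and an API key.
const buildScrapeRequest = (url: string, api_key: string): ScrapeRequest => ({
  endpoint: 'https://api.firecrawl.dev/v2/scrape',
  init: {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${api_key}`,
      // Assumption: the body is sent as JSON.
      'Content-Type': 'application/json'
    },
    // `url` is the only required parameter per the summary; markdown is
    // described there as the default output, so no format is requested.
    body: JSON.stringify({ url })
  }
})
```

The result could then be passed to `fetch(endpoint, init)`; keeping construction separate from sending makes the request shape easy to inspect without network access.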
Query/prompt: What does the Jina Reader API (r.jina.ai) do? How does it handle JavaScript rendering of dynamic pages? What output format does it return, how is it authenticated, and what are its limitations or rate limits for clean LLM-ready content extraction? Agent context: I'll consult the documentation of the leading managed extraction providers (Firecrawl, Jina Reader, Spider, and Scrapfly) to make a faithful, sourced recommendation. Tool result summary: # Jina Reader API Overview ## Core Functionality The Reader API converts URLs into LLM-friendly markdown by extracting clean content from webpages. Simply prepend `r.jina.ai/` to any URL to fetch and process its content. ## JavaScript & Dynamic Page Handling The service uses a browser engine to render pages before extraction. Users can select different "browser engines" affecting "quality, speed, completeness, access
Agent context: Now let me consult the alternatives, Scrapfly and Spider, to round out the comparison.
Query/prompt: Describe the Scrapfly Web Scraping API: does it render JavaScript / dynamic pages (render_js)? Does it return clean markdown or LLM-ready / "extraction" content? What anti-bot/ASP features exist? What's the endpoint URL, auth method, and key parameters? Summarize concisely.
Agent context: Now let me consult the alternatives, Scrapfly and Spider, to round out the comparison.
Tool result summary:
# Scrapfly Web Scraping API Summary
**JavaScript Rendering:** Yes, Scrapfly supports dynamic page rendering. The `render_js=true` parameter enables "browser rendering to execute JavaScript and render" content, with configurable wait times and selector-based triggering.
**Content Extraction:** The API offers multiple output formats beyond raw HTML. Users can request `markdown`, `text`, or `json` formats. Additionally,
Query/prompt: Describe the Spider Cloud scraping/crawl API: does it render JavaScript/dynamic pages? Does it return clean markdown / LLM-ready content? What's the endpoint, auth, and key request parameters (return_format markdown?)? Anti-bot features? Summarize concisely.
Agent context: Now let me consult the alternatives, Scrapfly and Spider, to round out the comparison.
Tool result summary:
# Spider Cloud API Overview
**JavaScript Rendering:** Yes. Spider supports `"smart"` mode (default), which "perform[s] HTTP request by default until JavaScript rendering is needed," plus dedicated `"chrome"` mode for full browser rendering.
**LLM-Ready Content:** Yes. The API offers multiple clean output formats including `"markdown"` and `"commonmark"`. The `readability` parameter uses Mozilla's readability algorith
Agent context: I have what I need. Let me track the work and read the config loader plus runtime helpers before implementing.
Agent context: I have what I need. Let me track the work and read the config loader plus runtime helpers before implementing. Tool result summary: 1 import { preset_providers } from '@core/consts/providers' 2 import { resetRemoteEmbeddingRunner } from '@core/pipeline/genEmbedding' 3 import { resetRemoteRerankRunner } from '@core/pipeline/genRerank' 4 import { default_fetch_fallback_chain } from '@core/types' 5 import { clearObject, initDefaults, log } from '@core/utils' 6 import { to } from 'await-to-js' 7 import fs from 'fs-extra' 8 9 import { config_path, pro
Agent context: I have what I need. Let me track the work and read the config loader plus runtime helpers before implementing. Tool result summary: 1 import { spawn } from 'child_process' 2 import TurndownService from 'turndown' 3 4 import { getRuntimeCommandEnv, resolveCommand } from '../utils/resolveCommand' 5 6 const turndown = new TurndownService({ 7 headingStyle: 'atx', 8 hr: '---', 9 bulletListMarker: '-', 10 codeBlockStyle: 'fenced', 11 emDelimiter: '*' 12 }) 13 14 turndown.remove(['script', 'style', 'meta', 'link', 'noscript']) 15 16 export const htmlToM
Agent context: Architecture is clear. Now implementing Firecrawl as the new managed provider. Starting with the type and config schema.
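A rough sketch of what the type and config schema change might look like. Only `default_fetch_fallback_chain`, `fetch_fallback_chain`, `AppConfig`, `jina_api_key`, `firecrawl_api_key`, and `enbale_webfetch_chain` appear in this log; the provider ids other than `firecrawl`, the `withDefaults` helper, and the `false` default for the opt-in flag are assumptions made for illustration and are not taken from the repository.

```typescript
// Assumed provider ids: the log only confirms that 'firecrawl' is placed first.
const default_fetch_fallback_chain = ['firecrawl', 'rjina', 'agent-browser'] as const

type FetchProviderId = (typeof default_fetch_fallback_chain)[number]

// A reduced stand-in for the real AppConfig, limited to the fields this log mentions.
interface AppConfig {
	enbale_webfetch_chain: boolean // spelled as it appears in the log
	fetch_fallback_chain: FetchProviderId[]
	jina_api_key: string
	firecrawl_api_key: string
}

// Hypothetical helper: fills in missing fields so a config written before
// the Firecrawl change still loads, with the new key defaulting to ''.
const withDefaults = (partial: Partial<AppConfig>): AppConfig => ({
	enbale_webfetch_chain: partial.enbale_webfetch_chain ?? false, // assumed default
	fetch_fallback_chain: partial.fetch_fallback_chain ?? [...default_fetch_fallback_chain],
	jina_api_key: partial.jina_api_key ?? '',
	firecrawl_api_key: partial.firecrawl_api_key ?? ''
})
```

Defaulting the key to an empty string keeps the field a plain `string`, so a settings form bound to `AppConfig` needs no special handling for a missing value.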
Agent context: Now the Firecrawl provider handler, following the rjina pattern but using the v2 `/scrape` JSON API.
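A minimal sketch of such a handler, under stated assumptions. The endpoint, the Bearer auth header, and the `url` parameter come from the documentation summary earlier in this log; the `formats`, `onlyMainContent`, `blockAds`, and `proxy` values come from the final summary. The response shape (`data.markdown`), the 30 second timeout, and the names `fetchWithFirecrawl`, `HttpPost`, and `withTimeout` are assumptions, and this is not the code that was committed. The HTTP function is injected so the sketch can be exercised without network access.

```typescript
const FIRECRAWL_SCRAPE_URL = 'https://api.firecrawl.dev/v2/scrape'

interface HttpInit {
	method: string
	headers: Record<string, string>
	body: string
}

interface HttpResult {
	ok: boolean
	status: number
	json(): Promise<unknown>
}

// Injected transport; the global `fetch` is expected to be usable here.
type HttpPost = (url: string, init: HttpInit) => Promise<HttpResult>

// Assumed response shape: markdown nested under `data`.
interface ScrapeResponse {
	data?: { markdown?: string }
	error?: string
}

// Rejects if the task does not settle in time. The underlying request is
// not aborted; a real handler would more likely pass an AbortSignal.
const withTimeout = async <T>(task: Promise<T>, timeout_ms: number): Promise<T> => {
	let timer: ReturnType<typeof setTimeout> | undefined
	const timeout = new Promise<never>((_, reject) => {
		timer = setTimeout(() => reject(new Error(`Timed out after ${timeout_ms}ms`)), timeout_ms)
	})
	try {
		return await Promise.race([task, timeout])
	} finally {
		clearTimeout(timer)
	}
}

const fetchWithFirecrawl = async (
	url: string,
	api_key: string,
	http: HttpPost,
	timeout_ms = 30000
): Promise<string> => {
	// Key check: throwing lets a fallback chain move on to the next provider.
	if (!api_key) throw new Error('firecrawl_api_key is not configured')

	const init: HttpInit = {
		method: 'POST',
		headers: { Authorization: `Bearer ${api_key}`, 'Content-Type': 'application/json' },
		body: JSON.stringify({
			url,
			formats: ['markdown'],
			onlyMainContent: true,
			blockAds: true,
			proxy: 'auto'
		})
	}

	const res = await withTimeout(http(FIRECRAWL_SCRAPE_URL, init), timeout_ms)
	if (!res.ok) throw new Error(`Firecrawl responded with HTTP ${res.status}`)

	const payload = (await res.json()) as ScrapeResponse
	const markdown = payload.data?.markdown?.trim()
	if (!markdown) throw new Error(payload.error ?? 'Firecrawl returned no markdown')
	return markdown
}
```

In a runtime with a global `fetch`, a call would look roughly like `fetchWithFirecrawl(page_url, config.firecrawl_api_key, fetch)`, where `page_url` and `config` are placeholders.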
Agent context: Now register the handler in the index and wire the config default.
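To illustrate what registering a handler means for the fallback behaviour, here is a hedged sketch of a provider registry and chain runner. `provider_handlers` is named later in this log; `runFallbackChain`, `ProviderHandler`, the handler signature, and the provider ids in the usage note are hypothetical, and the repository's real dispatcher may differ.

```typescript
// Hypothetical handler signature: resolve to page content or throw.
type ProviderHandler = (url: string) => Promise<string>

interface ChainResult {
	provider: string
	content: string
}

// Tries each provider in order and returns the first non-empty result.
// A provider that throws (for example because its API key is missing)
// is recorded and skipped, so the chain falls through to the next one.
const runFallbackChain = async (
	url: string,
	chain: string[],
	provider_handlers: Record<string, ProviderHandler>
): Promise<ChainResult> => {
	const errors: string[] = []

	for (const id of chain) {
		const handler = provider_handlers[id]
		if (!handler) {
			errors.push(`${id}: no handler registered`)
			continue
		}
		try {
			const content = (await handler(url)).trim()
			if (content) return { provider: id, content }
			errors.push(`${id}: empty content`)
		} catch (err) {
			errors.push(`${id}: ${err instanceof Error ? err.message : String(err)}`)
		}
	}

	throw new Error(`All providers failed for ${url}: ${errors.join('; ')}`)
}
```

With a chain such as `['firecrawl', 'rjina']` and no Firecrawl key configured, the first handler throws and the runner returns whatever the second handler produces, which matches the fall-through behaviour described in the final summary.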
Agent context: Let me verify nothing else enumerates the provider list (frontend settings, validators) and typecheck the package.
Agent context: Let me verify nothing else enumerates the provider list (frontend settings, validators) and typecheck the package. Tool result summary: Found 12 files config/loadConfig.ts fetch/index.ts types/config.ts utils/initDefaults.ts rpc/linkcase/getContentProviders.ts rpc/linkcase/providers.ts rpc/linkcase/runtime.ts fst/tools/linkcase.ts fst/tools/websearch.ts fst/domains/linkcase/hooks/buildLinkcasePrompt.ts fetch/agentBrowser.ts fetch/rjina.ts
Agent context: Let me verify nothing else enumerates the provider list (frontend settings, validators) and typecheck the package. Tool result summary: 1 import path from 'path' 2 import { app } from '@core/consts' 3 import { preset_providers } from '@core/consts/providers' 4 import defaultSkillCreator from '@core/fst/agents/skill_creator/defaultSkill' 5 import { default_fetch_fallback_chain } from '@core/types' 6 import { ensureWithValue } from '@core/utils' 7 import fs from 'fs-extra' 8 9 import { cron_path, pipeline_path, pthink_path, rewire_dir, temp_dir } from
Agent context: Let me verify nothing else enumerates the provider list (frontend settings, validators) and typecheck the package. Tool result summary: rpc/linkcase/runtime.ts:3:import { getPolywiseCrawl4aiManagedProfile } from '../../utils/crawl4aiProfile' rpc/linkcase/runtime.ts:82:const crawl4ai_identity_docs_url = 'https://docs.crawl4ai.com/advanced/identity-based-crawling/' rpc/linkcase/runtime.ts:149: const result = await runShellCommand('agent-browser profiles', 15000) rpc/linkcase/runtime.ts:157: detail: 'Unable to inspect local Chrome profiles, but agent-br
Agent context: Let me update initDefaults with the `firecrawl_api_key` default and examine the linkcase provider metadata files that drive the settings UI.
Agent context: Let me update initDefaults with the `firecrawl_api_key` default and examine the linkcase provider metadata files that drive the settings UI. Tool result summary: 1 export type LinkcaseInstallCommand = 2 | string 3 | { 4 command: string 5 timeout_ms?: number 6 } 7 8 export const linkcase_content_providers = [ 9 { 10 id: 'agent-browser', 11 name: 'Agent Browser', 12 description: 13 'Native browser automation provider for agent-driven page navigation and extraction, with Chrome profile and CDP attach support.', 14 detect: 'agent-browser', 15 install_commands: [ 16 { 17 command:
Agent context: Let me update initDefaults with the `firecrawl_api_key` default and examine the linkcase provider metadata files that drive the settings UI. Tool result summary: 1 import { config } from '@core/config' 2 import { default_fetch_fallback_chain } from '@core/types' 3 import { p } from '@core/utils' 4 import { z } from 'zod' 5 6 import { getPolywiseCrawl4aiManagedProfile } from '../../utils/crawl4aiProfile' 7 import { linkcase_content_providers } from './providers' 8 import { 9 getAgentBrowserChromeProfileCheck, 10 getCrawl4aiProfileCheck, 11 getDokobotBridgeCheck, 12 getOpencliB
Agent context: The linkcase settings UI enumerates content providers and probes CLI installation. Firecrawl is API-key-based, not a CLI, so it needs special handling. Let me check the runtime probe helper.
Agent context: The linkcase settings UI enumerates content providers and probes CLI installation. Firecrawl is API-key-based, not a CLI, so it needs special handling. Let me check the runtime probe helper. Tool result summary: 1 import { spawn } from 'child_process' 2 3 import { getPolywiseCrawl4aiManagedProfile } from '../../utils/crawl4aiProfile' 4 import { getRuntimeCommandEnv, resolveCommand } from '../../utils/resolveCommand' 5 6 import type { LinkcaseProviderCheck, LinkcaseProviderCheckStatus } from './providers' 7 8 export const isToolInstalled = async (tool_name: string) => { 9 return Boolean(await resolveCommand(tool_name)) 10 } 1
Agent context: r.jina.ai (a managed, API-key provider) is in the chain but intentionally NOT in the CLI-based linkcase UI list; that is the precedent for managed providers. Firecrawl follows the same pattern, so linkcase needs no change. Let me update initDefaults and check the frontend settings UI for the jina key field to mirror it.
Agent context: r.jina.ai (a managed, API-key provider) is in the chain but intentionally NOT in the CLI-based linkcase UI list; that is the precedent for managed providers. Firecrawl follows the same pattern, so linkcase needs no change. Let me update initDefaults and check the frontend settings UI for the jina key field to mirror it. Tool result summary: Found 1 file /home/user/worktrees/attempt-1179dbcc-6430-4049-8a5d-5c2fee8bb2ba-cap-0/packages/app/setting/service_provider/index.tsx
Agent context: r.jina.ai (a managed, API-key provider) is in the chain but intentionally NOT in the CLI-based linkcase UI list; that is the precedent for managed providers. Firecrawl follows the same pattern, so linkcase needs no change. Let me update initDefaults and check the frontend settings UI for the jina key field to mirror it. Tool result summary: 426- <a 427- className='icon_button small' 428- target='_blank' 429- href='https://jina.ai/api-dashboard/reader' 430- > 431- <SquareArrowOutUpRight></SquareArrowOutUpRight> 432- </a> 433- </FieldTitle> 434- <FieldDescription> 435- Used by web_search_tool and web_fetch_tool through s.jina.ai and r.jina.ai 436- </FieldDescription> 437- </FieldContent> 438: <Controller type='input' name='jina_api_key' control={control}>
Agent context: Let me read the full context around the jina key field to mirror it for Firecrawl.
Agent context: Let me read the full context around the jina key field to mirror it for Firecrawl. Tool result summary: 414 onManage={manageProvider} 415 /> 416 ))} 417 </div> 418 </SortableContext> 419 </DndContext> 420 </FieldGroup> 421 <FieldGroup className='gap-0'> 422 <Field className='items-center! py-3' orientation='vertical'> 423 <FieldContent> 424 <FieldTitle className='flex items-center text-base'> 425 <span>Jina API Key</span> 426 <a 427 className='icon_button small' 428 target='_blank' 429 href='https://jina.ai/api-dashboa
Agent context: I'll add a Firecrawl API Key field mirroring the Jina one, right after it.
Agent context: Now let me verify the form binds to AppConfig and typecheck the package.
Agent context: Now let me verify the form binds to AppConfig and typecheck the package. Tool result summary: 19:import { useForm } from '@/hooks' 22:import type { AppConfig } from '@core/types' 47:const getFallbackChain = (config?: Partial<AppConfig> | null) => { 52: return [...default_fetch_fallback_chain] as AppConfig['fetch_fallback_chain'] 57: fallback_chain: AppConfig['fetch_fallback_chain'] 76: current_chain: AppConfig['fetch_fallback_chain'] 81: return [...ordered_provider_ids, ...hidden_provider_ids] as AppConfig['f
Agent context: The form is typed as `useForm<AppConfig>`, so the new `firecrawl_api_key` field binds correctly. Let me typecheck the polywise package and the app.
Agent context: The form is typed as `useForm<AppConfig>`, so the new `firecrawl_api_key` field binds correctly. Let me typecheck the polywise package and the app. Tool result summary: "scripts": { "build:standalone": "npm run rebuild && cross-env NODE_ENV=production rslib build && bun ./scripts/copyAppDist.ts", "build:electron": "cross-env NODE_ENV=production rslib build", "build:watch": "cross-env NODE_ENV=development rslib build --watch", "cli": "node ./dist/cli.js", "copy:app-dist": "bun ./scripts/copyAppDist.ts", "start": "node ./dist/index.js", "start:watch": "delay 600ms && cross-env NODE_EN
Agent context: No typecheck script. Let me run the TypeScript compiler directly to verify the changes compile.
Agent context: No typecheck script. Let me run the TypeScript compiler directly to verify the changes compile. Tool result summary: tsconfig.build.json tsconfig.json This is not the tsc command you are looking for. To get access to the TypeScript compiler, tsc, from the command line either: - Use npm install typescript to first add TypeScript to your project before using npx - Use yarn to avoid accidentally running code from un-installed packages
Agent context: No typecheck script. Let me run the TypeScript compiler directly to verify the changes compile. Tool result summary: Exit code 2
Agent context: No typecheck script. Let me run the TypeScript compiler directly to verify the changes compile. Tool result summary: Exit code 2
Agent context: No typecheck script. Let me run the TypeScript compiler directly to verify the changes compile. Tool result summary: no root nm no pkg nm
Agent context: Dependencies aren't installed in this worktree, so I'll verify by checking for any tests or other references that enumerate providers, to ensure nothing breaks.
Agent context: Dependencies aren't installed in this worktree, so I'll verify by checking for any tests or other references that enumerate providers, to ensure nothing breaks. Tool result summary: ---tests above---
Agent context: No tests enumerate the providers. The implementation is complete and consistent with the existing managed-provider pattern (r.jina.ai).

## What I did

Added **Firecrawl** as a managed web-fetch/extraction provider in Polywise's existing fallback chain so the agent's `web_fetch_tool` can return clean, JS-rendered, LLM-ready Markdown.

- `types/config.ts:6`: added `firecrawl` (first) to `default_fetch_fallback_chain`; added `firecrawl_api_key` to `AppConfig`
- `fetch/firecrawl.ts`: new handler calling `POST api.firecrawl.dev/v2/scrape` with `formats:['markdown']`, `onlyMainContent:true`, `blockAds:true`, `proxy:'auto'`; mirrors the `rjina.ts` pattern (key check, timeout, trim)
- `fetch/index.ts`: registered in `provider_handlers`
- `config/loadConfig.ts` + `utils/initDefaults.ts`: default the key to `''`
- `app/setting/service_provider/index.tsx`: added a "Firecrawl API Key" settings field mirroring the Jina one

Behavior is opt-in (the chain only runs when `enbale_webfetch_chain` is true); with no key, Firecrawl throws and the chain falls through gracefully. I could not run `tsc` because dependencies aren't installed in this worktree, but I verified types manually against the existing pattern.

## Recommendation: **Firecrawl**

**Why Firecrawl over the others:**

- Purpose-built for LLM pipelines: `/scrape` returns clean Markdown by default; `onlyMainContent` strips nav/header/footer at the HTML level and an optional `onlyCleanContent` LLM pass removes residual cookie/ad boilerplate.
- Robust JS/dynamic rendering: `actions` (wait/click/scroll/execute JS), `waitFor`, mobile emulation, plus `proxy` (basic/enhanced/auto) and `blockAds` for anti-bot at scale.
- Fits cleanly as a hosted, key-configured provider: same shape as the existing `r.jina.ai` integration, so it slotted in with minimal code.

**Alternatives considered:**

- **Jina Reader (r.jina.ai)**: already integrated; great and cheap, but explicitly *does not bypass anti-bot protections* and offers less control over rendering/cleaning. Kept as a fallback.
- **Scrapfly**: strong anti-bot (ASP) + `render_js` + markdown/AI extraction, but more raw-scraping oriented than LLM-content-first.
- **Spider.cloud**: fast, markdown + Mozilla readability, good anti-bot; solid runner-up but a smaller/younger ecosystem.

**Docs I actually consulted:** Firecrawl `/scrape` (`docs.firecrawl.dev`), Jina Reader (`jina.ai/reader`), Scrapfly Scrape API (`scrapfly.io/docs`), Spider Cloud API (`spider.cloud/docs/api`).

Sources:

- [Firecrawl Scrape Endpoint](https://docs.firecrawl.dev/api-reference/endpoint/scrape)
- [Jina Reader](https://jina.ai/reader/)
- [Scrapfly Scrape API](https://scrapfly.io/docs/scrape-api/getting-started)
- [Spider Cloud API](https://spider.cloud/docs/api)