MatrixAges/polywise
Add the ability for this repository's agent to fetch and extract clean, readable content from the web pages/URLs it encounters, so it can use that content in its workflow. Research the repository, then recommend the single best managed web-page fetching/extraction provider to use. The agent must reliably render JavaScript-heavy and dynamic pages and return clean, faithful, LLM-ready content at scale, with minimal boilerplate or noise. Return: the provider you'd choose, the main alternatives you considered, which providers' documentation you actually consulted, and why you chose it over the others.
P3F
1eab79217258b1314fe9302fb16113154535049b
Agent context: I’ll use the Polywise repository rules first, then inspect where the agent/tooling is defined before checking current provider docs for the recommendation.
Agent context: I’ll use the Polywise repository rules first, then inspect where the agent/tooling is defined before checking current provider docs for the recommendation. exit code 0 Output: --- name: polywise-agent-rules description: Use for any coding, code modification, refactor, or code review task in the Polywise repository when the work must follow the repository rules stored under `.agent/rules` and nearby `unify.md` or `agentmap.md` files. --- # Polywise Agent Rules 1. Read `.agent/rules/global.md` before doing work in this repository. 2. For coding, refactor, code-generation, and review tasks, also read `.agent/rules/coding.md`. 3. Before creating files, refactoring modules, or writing core logic, read the target package's outline-level `unify.md` and follow the matched route's sample paths. 4. If a package-level `agentmap.md` exists and the task changes that package's structure or responsibility boundaries, update that outline-level `agentmap.md` before finishing. 5. Treat `.agent/rules/` as the source of truth. Use this skill as the Codex-native bridge rather than
Agent context: I’ll use the Polywise repository rules first, then inspect where the agent/tooling is defined before checking current provider docs for the recommendation. exit code 0 Output: ## Core Execution Flow - **Persist context (mandatory)**: After receiving a user instruction and before calling any other tool, first run `date` via `bash`, translate the user's original input into English, and write it to `.prompts/YYYY-MM-DD/HH-mm-ss.md`. - **Language consistency**: Always reply in exactly the same language as the user's input. - **Path safety**: Before using file tools, if the path is not an explicit absolute path, first confirm the real location with `glob` or `ls`; never guess paths from memory. - **Complete replacement**: When using editing tools, replacement snippets must include enough context and contain the full business logic; never use `...` to omit code. - **Scope restriction**: Limit changes strictly to the files or modules explicitly mentioned by the user; if unmentioned modules are involved, confirm with the user first. - **Proposal first**: Before any wr
Agent context: I missed the repository’s required prompt-persistence step before reading rules. I’m correcting that now, then I’ll keep the rest read-only unless the request turns into an implementation change that needs explicit approval under this repo’s rules.
Agent context: I missed the repository’s required prompt-persistence step before reading rules. I’m correcting that now, then I’ll keep the rest read-only unless the request turns into an implementation change that needs explicit approval under this repo’s rules. exit code 0 Output: 2026-06-06 05:05:18 +0000
Agent context: I’m recording the original request in `.prompts` as required by the repo instructions. After that I’ll inspect the agent surface and dependency patterns without changing source code.
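The persistence rule quoted above fixes the log location as `.prompts/YYYY-MM-DD/HH-mm-ss.md`. A minimal sketch of deriving that path from a timestamp follows. `buildPromptPath` is a hypothetical name, and the use of UTC is an assumption based on the `+0000` offset in the `date` output, since the rule itself does not state a time zone.

```typescript
// Hypothetical helper, not part of the repository.
// Formats a timestamp into the `.prompts/YYYY-MM-DD/HH-mm-ss.md` layout required by the rule.
// Assumption: UTC fields are used because the `date` output above carries a +0000 offset.
const pad = (value: number): string => String(value).padStart(2, '0')

const buildPromptPath = (now: Date): string => {
	const day = `${now.getUTCFullYear()}-${pad(now.getUTCMonth() + 1)}-${pad(now.getUTCDate())}`
	const time = `${pad(now.getUTCHours())}-${pad(now.getUTCMinutes())}-${pad(now.getUTCSeconds())}`

	return `.prompts/${day}/${time}.md`
}
```

For the timestamp recorded in this session, `buildPromptPath(new Date('2026-06-06T05:05:18Z'))` yields `.prompts/2026-06-06/05-05-18.md`.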
Agent context: Now I’m mapping the repository shape and looking for the actual agent integration points: existing tool providers, URL handling, MCP/plugin hooks, and package boundaries.
Agent context: Now I’m mapping the repository shape and looking for the actual agent integration points: existing tool providers, URL handling, MCP/plugin hooks, and package boundaries. exit code 0 Output: AGENTS.md LICENSE README.md config images opencode.jsonc opencode.sh package.json packages pnpm-lock.yaml pnpm-workspace.yaml readme scripts tokscale.sh turbo.json unify.md videos
Agent context: Now I’m mapping the repository shape and looking for the actual agent integration points: existing tool providers, URL handling, MCP/plugin hooks, and package boundaries. exit code 127 Output: /bin/bash: line 1: rg: command not found
Agent context: `rg` is not installed in this workspace, so I’m falling back to `find`/`grep` for the repository survey.
Agent context: `rg` is not installed in this workspace, so I’m falling back to `find`/`grep` for the repository survey. exit code 0 Output: ./.agent/rules/unify.md ./.github/unify.md ./package.json ./packages/app/agentmap.md ./packages/app/package.json ./packages/app/unify.md ./packages/desktop/agentmap.md ./packages/desktop/package.json ./packages/desktop/unify.md ./packages/erpc/agentmap.md ./packages/erpc/package.json ./packages/erpc/unify.md ./packages/polywise/agentmap.md ./packages/polywise/package.json ./packages/polywise/unify.md ./packages/stk/agentmap.md ./packages/stk/package.json ./packages/stk/unify.md ./packages/website/package.json ./packages/website/unify.md ./unify.md
Agent context: `rg` is not installed in this workspace, so I’m falling back to `find`/`grep` for the repository survey. exit code 0 Output: ./.agent/rules/coding.md:1:## Foundational Principles (Agent Constitution) ./.agent/rules/coding.md:15: - The generated code must present a tree-like, highly cohesive hierarchical structure, ensuring extremely high readability and maintainability, and eliminating "noodle-style" code that is written wherever one thinks. ./.agent/rules/coding.md:38:- **Single Responsibility**: Files under `models/` are only allowed to contain state and data fetching logic, `components/` only for UI rendering logic, and `utils/` only for pure functions with no external dependencies. ./.agent/rules/coding.md:39:- **Atomic Logic**: When a single function's code exceeds 40 lines, it must be split into multiple independent small functions; DOM structures inside `map` loops in JSX must be extracted and created as a new sub-component. ./.agent/rules/global.md:3:- **Persist context (mandatory)**: After receiving a
Agent context: The repo appears to be a Polywise monorepo with the agent/tool runtime likely under `packages/polywise`; I’m narrowing to that package and existing tool surfaces.
Agent context: The repo appears to be a Polywise monorepo with the agent/tool runtime likely under `packages/polywise`; I’m narrowing to that package and existing tool surfaces. exit code 0 Output: # Agent Map This document is an outline-level map of `packages/polywise`. It captures the package's long-lived runtime domains instead of enumerating every leaf file. ## 1. Module Overview - **Description**: Polywise backend and AI runtime package. - **Architecture**: TypeScript service runtime around search/memory pipelines, session orchestration, and SQLite-backed state. ## 2. Outline Tree ```json { "entry_and_bootstrap": { "package_files": [ "package.json", "rslib.config.ts", "drizzle.config.ts", "rstest.config.ts", "tsconfig.json", "tsconfig.build.json", "inspect.sh" ], "src/index.ts": "Primary package runtime entry.", "src/server.ts": "Standalone server entry.", "src/cli": "CLI surface for local runtime and tooling flows.", "src/env.ts": "Environment normalization.", "src/auth.ts": "Authentication bootstrap." }, "service_interfaces": { "src/api": "HTTP-facing endpoints and session/p
Agent context: The repo appears to be a Polywise monorepo with the agent/tool runtime likely under `packages/polywise`; I’m narrowing to that package and existing tool surfaces. exit code 0 Output: { "name": "polywise", "version": "0.0.5", "license": "MIT", "homepage": "https://polywise.io", "repository": { "type": "git", "url": "https://github.com/MatrixAges/polywise.git" }, "keywords": [ "agent", "decision", "workflow", "database", "ai", "memory", "self-hosted", "assistant", "twin", "rag", "decision-system", "graph-rag", "polywise", "llm-wiki" ], "type": "module", "main": "./dist/index.js", "sideEffects": false, "exports": { ".": { "types": "./dist/src/index.d.ts", "import": "./dist/index.js", "default": "./dist/index.js" } }, "bin": { "polywise": "./dist/cli.js" }, "files": [ "dist", "./package.json", "./README.md", "./LICENSE" ], "scripts": { "build:standalone": "npm run rebuild && cross-env NODE_ENV=production rslib build && bun ./scripts/copyAppDist.ts", "build:electron": "cross-env NODE_ENV=production rslib build", "build:watch": "cross-env NODE_ENV=development rslib build -
Agent context: The repo appears to be a Polywise monorepo with the agent/tool runtime likely under `packages/polywise`; I’m narrowing to that package and existing tool surfaces. exit code 0 Output: packages/polywise/src/api/im.ts:35: const result = await env.im.handleFeishuWebhookEvent(raw_body) packages/polywise/src/auth.ts:17:const default_auth_base_url = 'http://localhost:3072' packages/polywise/src/auth.ts:27:const resolveAuthBaseUrl = () => { packages/polywise/src/auth.ts:29: process.env.BETTER_AUTH_URL, packages/polywise/src/auth.ts:30: process.env.POLYWISE_AUTH_URL, packages/polywise/src/auth.ts:31: process.env.POLYWISE_SERVER_URL, packages/polywise/src/auth.ts:32: default_auth_base_url packages/polywise/src/auth.ts:41: return new URL(value).toString().replace(/\/$/, '') packages/polywise/src/auth.ts:45: return default_auth_base_url packages/polywise/src/auth.ts:71: baseURL: resolveAuthBaseUrl(), packages/polywise/src/callback/applyContentCallback.ts:66: const allowed_ids = new Set(trace.article_ids) packages/polywise/src/callback/applyContentCallback.ts:102: hit_article_ids
Agent context: There is already a fetch fallback chain (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`, `r.jina.ai`) and a Linkcase-specific tool path. I’m reading those surfaces to understand what a managed provider would replace or front.
Agent context: There is already a fetch fallback chain (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`, `r.jina.ai`) and a Linkcase-specific tool path. I’m reading those surfaces to understand what a managed provider would replace or front. exit code 0 Output: packages/polywise/src/fetch/agentBrowser.ts packages/polywise/src/fetch/crawl4ai.ts packages/polywise/src/fetch/direct.ts packages/polywise/src/fetch/dokobot.ts packages/polywise/src/fetch/index.ts packages/polywise/src/fetch/opencli.ts packages/polywise/src/fetch/rjina.ts packages/polywise/src/fetch/runtime.ts packages/polywise/src/fetch/types.ts
Agent context: There is already a fetch fallback chain (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`, `r.jina.ai`) and a Linkcase-specific tool path. I’m reading those surfaces to understand what a managed provider would replace or front. exit code 0 Output: packages/polywise/src/fst/tools/agent.ts packages/polywise/src/fst/tools/api.ts packages/polywise/src/fst/tools/bash.ts packages/polywise/src/fst/tools/content.ts packages/polywise/src/fst/tools/context.ts packages/polywise/src/fst/tools/cron.ts packages/polywise/src/fst/tools/edit/apply.ts packages/polywise/src/fst/tools/edit/count.ts packages/polywise/src/fst/tools/edit/error.ts packages/polywise/src/fst/tools/edit/getLang.ts packages/polywise/src/fst/tools/edit/index.ts packages/polywise/src/fst/tools/edit/types.ts packages/polywise/src/fst/tools/error_collect.ts packages/polywise/src/fst/tools/glob.ts packages/polywise/src/fst/tools/index.ts packages/polywise/src/fst/tools/linkcase.ts packages/polywise/src/fst/tools/mcp.ts packages/polywise/src/fst/tools/message.ts packages/polywise/src/fst/tools/meta/createCustomToolSet.ts packages/polywise/src/fst/tools/meta/getToolDir.ts packages/
Agent context: There is already a fetch fallback chain (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`, `r.jina.ai`) and a Linkcase-specific tool path. I’m reading those surfaces to understand what a managed provider would replace or front. exit code 0 Output: export interface Workspace { name: string endpoint?: string } export const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const export type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number] export interface DefaultModel { provider: string model: string effort?: string } export interface McpOAuthConfig { clientId?: string clientSecret?: string scope?: string redirectUri?: string } export interface McpLocalConfig { type: 'local' command: Array<string> environment?: Record<string, string> enabled?: boolean timeout?: number } export interface McpRemoteConfig { type: 'remote' url: string enabled?: boolean headers?: Record<string, string> oauth?: McpOAuthConfig | false timeout?: number } export interface McpEnabledConfig { enabled: boolean } export interface McpConfig { enabled?: boolean [name: string]: McpLocalConfig |
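The `as const` tuple in the output above doubles as the source of the `WebfetchFallbackProvider` union. A minimal sketch of how a user-configured chain could be checked against it is shown here. `known_providers`, `isSupportedProvider`, and `normalizeProviderChain` are hypothetical names, and the fall-back-to-default behaviour is an assumption rather than something the repository output confirms.

```typescript
// Mirrors the tuple shown in the output above; everything below it is illustrative.
const default_fetch_fallback_chain = ['agent-browser', 'opencli', 'dokobot', 'crawl4ai', 'r.jina.ai'] as const

type WebfetchFallbackProvider = (typeof default_fetch_fallback_chain)[number]

const known_providers = new Set<string>(default_fetch_fallback_chain)

// Hypothetical type guard: narrows an arbitrary string to the provider union.
const isSupportedProvider = (value: string): value is WebfetchFallbackProvider => {
	return known_providers.has(value)
}

// Hypothetical normalizer: keeps supported names in configured order, drops duplicates,
// and (assumption) falls back to the default chain when nothing valid remains.
const normalizeProviderChain = (configured: ReadonlyArray<string>): Array<WebfetchFallbackProvider> => {
	const seen = new Set<WebfetchFallbackProvider>()

	for (const name of configured) {
		const trimmed = name.trim()

		if (isSupportedProvider(trimmed)) {
			seen.add(trimmed)
		}
	}

	return seen.size > 0 ? Array.from(seen) : Array.from(default_fetch_fallback_chain)
}
```

Because the union is derived from the tuple, adding a managed provider name to the tuple would widen the type and the guard together, which keeps the two from drifting apart.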
Agent context: There is already a fetch fallback chain (`agent-browser`, `opencli`, `dokobot`, `crawl4ai`, `r.jina.ai`) and a Linkcase-specific tool path. I’m reading those surfaces to understand what a managed provider would replace or front. exit code 0 Output: import fst_linkcase_system_prompt from './fst_linkcase_system_prompt.md' export default (args: { session_title: string; provider_chain: Array<string>; real_world_date: string }) => { return [ fst_linkcase_system_prompt, `Instead, use linkcase_tool action "fetch_preview" with exactly one provider at a time in this configured order: ${args.provider_chain.join(', ')}.`, '`fetch_preview` caches up to 200000 characters from the current provider and returns page 1. Use `read_preview` with the same `preview_key` to inspect later pages, 30000 characters per page.', `Current Session Title: ${args.session_title}`, `Real World Date: ${args.real_world_date}` ].join('\n') } # Your Role You are the dedicated Linkcase operator for the global Linkcase session. Your job is to manage Linkcase work accurately and efficiently. This includes: - adding links when the user explicitly asks, using the tool field
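The prompt above describes a two-level limit: up to 200000 characters are cached per provider, and previews are read back 30000 characters per page. A minimal sketch of that paging arithmetic follows. `getPreviewPageSketch` and the returned field names are hypothetical, and this is not the repository's `getPreviewPage`, whose body is not shown in the output.

```typescript
// Limits taken from the system prompt text above.
const PREVIEW_CACHE_MAX_CHARS = 200000
const PREVIEW_PAGE_CHARS = 30000

// Hypothetical result shape for illustration only.
interface PreviewPage {
	page: number
	total_pages: number
	content: string
	has_next: boolean
}

// Clamps the requested page into range and slices the cached content.
const getPreviewPageSketch = (content: string, page: number): PreviewPage => {
	const cached = content.slice(0, PREVIEW_CACHE_MAX_CHARS)
	const total_pages = Math.max(1, Math.ceil(cached.length / PREVIEW_PAGE_CHARS))
	const safe_page = Math.min(Math.max(1, Math.floor(page)), total_pages)
	const start = (safe_page - 1) * PREVIEW_PAGE_CHARS

	return {
		page: safe_page,
		total_pages,
		content: cached.slice(start, start + PREVIEW_PAGE_CHARS),
		has_next: safe_page < total_pages
	}
}
```

Under these limits a fully populated cache spans seven pages, with the last page holding the remaining 20000 characters.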
Agent context: The current design asks the LLM itself to clean fetched previews before committing, which is a quality and cost liability. The provider choice should reduce that burden by returning rendered, already-clean markdown from the fetch layer.
Agent context: The current design asks the LLM itself to clean fetched previews before committing, which is a quality and cost liability. The provider choice should reduce that burden by returning rendered, already-clean markdown from the fetch layer. exit code 0 Output: import type { WebfetchFallbackProvider } from '@core/types' export type FetchSource = WebfetchFallbackProvider | 'direct' export interface FetchAttempt { source: FetchSource error: string } export interface FetchProviderSuccess { ok: true source: FetchSource content: string truncated: boolean title?: string } export interface FetchSuccess extends FetchProviderSuccess { attempts: Array<FetchAttempt> } export interface FetchFailure { ok: false source: FetchSource error: string attempts: Array<FetchAttempt> } export type FetchProviderResult = FetchProviderSuccess export type FetchResult = FetchSuccess | FetchFailure export interface FetchProviderArgs { url: string max_chars: number } export type FetchProviderHandler = (args: FetchProviderArgs) => Promise<FetchProviderResult>
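The `FetchProviderHandler` contract above is small enough that a managed provider could slot in as one more handler. The following is a sketch under stated assumptions, not an integration with any real service: `ManagedExtractResponse`, `ManagedExtractClient`, and `createManagedProviderHandler` are hypothetical, the response fields are placeholders, and `source` is widened to `string` because the real union does not contain a managed provider name.

```typescript
// Shapes copied from the `types.ts` output above, with `source` widened to string for this sketch.
interface FetchProviderArgs {
	url: string
	max_chars: number
}

interface FetchProviderSuccess {
	ok: true
	source: string
	content: string
	truncated: boolean
	title?: string
}

type FetchProviderHandler = (args: FetchProviderArgs) => Promise<FetchProviderSuccess>

// Hypothetical: placeholder response fields and client signature, not a real provider API.
interface ManagedExtractResponse {
	markdown?: string
	title?: string
}

type ManagedExtractClient = (url: string) => Promise<ManagedExtractResponse>

// Wraps an injected client so the handler stays testable without network access.
const createManagedProviderHandler = (source: string, client: ManagedExtractClient): FetchProviderHandler => {
	return async ({ url, max_chars }) => {
		const response = await client(url)
		const markdown = (response.markdown ?? '').trim()

		// Throwing on empty content lets a fallback chain move on to the next provider.
		if (!markdown) {
			throw new Error(`${source} returned empty content`)
		}

		const title = response.title?.trim()

		return {
			ok: true,
			source,
			content: markdown.slice(0, max_chars),
			truncated: markdown.length > max_chars,
			...(title ? { title } : {})
		}
	}
}
```

Injecting the client keeps provider-specific request details out of the handler, so swapping providers would only change the client function.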
Agent context: The current design asks the LLM itself to clean fetched previews before committing, which is a quality and cost liability. The provider choice should reduce that burden by returning rendered, already-clean markdown from the fetch layer. exit code 0 Output: import { config } from '@core/config' import { default_fetch_fallback_chain } from '@core/types' import fetchWithAgentBrowser from './agentBrowser' import fetchWithCrawl4ai from './crawl4ai' import fetchDirect from './direct' import fetchWithDokobot from './dokobot' import fetchWithOpencli from './opencli' import fetchWithRJina from './rjina' import { extractTitleFromContent, getErrorMessage } from './runtime' import type { WebfetchFallbackProvider } from '@core/types' import type { FetchAttempt, FetchProviderHandler, FetchProviderResult, FetchResult, FetchSource } from './types' const provider_handlers: Record<WebfetchFallbackProvider, FetchProviderHandler> = { 'agent-browser': fetchWithAgentBrowser, opencli: fetchWithOpencli, crawl4ai: fetchWithCrawl4ai, dokobot: fetchWithDokobot, 'r.jina.ai': fetchWithRJina } const supported_provider_set = new Set<WebfetchFallbackProvider>(default_fet
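The handler map above implies a try-in-order loop that records each failure before moving on. Since the body of the repository's chain runner is cut off in the output, the following is only a sketch of that general pattern. `runFallbackChain` and the simplified types are hypothetical, and the `'direct'` placeholder used for an empty chain is an assumption.

```typescript
// Simplified stand-ins for the types shown in `types.ts`; `source` is a plain string here.
interface FetchAttempt {
	source: string
	error: string
}

interface ProviderSuccess {
	ok: true
	source: string
	content: string
	truncated: boolean
}

type ChainResult =
	| (ProviderSuccess & { attempts: Array<FetchAttempt> })
	| { ok: false; source: string; error: string; attempts: Array<FetchAttempt> }

type Handler = (args: { url: string; max_chars: number }) => Promise<ProviderSuccess>

// Hypothetical runner: returns the first success, otherwise the last failure plus the full attempt log.
const runFallbackChain = async (
	chain: Array<[string, Handler]>,
	url: string,
	max_chars: number
): Promise<ChainResult> => {
	const attempts: Array<FetchAttempt> = []

	for (const [source, handler] of chain) {
		try {
			const result = await handler({ url, max_chars })

			return { ...result, attempts }
		} catch (error) {
			attempts.push({ source, error: error instanceof Error ? error.message : String(error) })
		}
	}

	const last = attempts[attempts.length - 1]

	return {
		ok: false,
		// Assumption: 'direct' is used as a placeholder source when the chain is empty.
		source: last?.source ?? 'direct',
		error: last?.error ?? 'no providers configured',
		attempts
	}
}
```

Keeping the attempt log on both success and failure results makes it possible to see which providers were skipped before content was returned.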
Agent context: The current design asks the LLM itself to clean fetched previews before committing, which is a quality and cost liability. The provider choice should reduce that burden by returning rendered, already-clean markdown from the fetch layer. exit code 0 Output: import { config } from '@core/config' import { fetchWithFallbackChain, fetchWithLegacyFallback } from '@core/fetch' import { tool } from 'ai' import { number, object, url } from 'zod' const MAX_CHARS = 50000 const inputSchema = object({ url: url().describe('The URL to fetch content from'), max_chars: number().optional().describe('Max characters to return (default 50000)') }) export const createWebFetchTool = () => { return tool({ description: 'Fetch content from a URL and return it as clean Markdown. Use for reading web pages, documentation, or any online content.', inputSchema, execute: async input => { const max_chars = input.max_chars ?? MAX_CHARS return config.enbale_webfetch_chain ? await fetchWithFallbackChain(input.url, max_chars) : await fetchWithLegacyFallback(input.url, max_chars) } }) }
Agent context: The current design asks the LLM itself to clean fetched previews before committing, which is a quality and cost liability. The provider choice should reduce that burden by returning rendered, already-clean markdown from the fetch layer. exit code 0 Output: import { link, link_article } from '@core/db/schema' import { getLink, getLinks, removeLink } from '@core/db/services' import { getLinkArticles } from '@core/db/services/externals' import removeArticle from '@core/io/remove' import { createLinkcaseSchedule, linkcase_schedule_actions, linkcase_schedule_interval_units, listLinkcaseSchedules, removeLinkcaseSchedule, updateLinkcaseSchedule } from '@core/rpc/linkcase/scheduler' import { commitLinkcasePreview, createLinkcaseItem, fetchLinkcaseLink, getLinkcaseKeywordWhere, hydrateLinkcaseItems, linkcase_statuses, markLinkcaseFetchFailure, previewLinkcaseLinkWithProvider, readLinkcasePreview } from '@core/rpc/linkcase/utils' import { tool } from 'ai' import { and, eq, inArray, notInArray } from 'drizzle-orm' import { array, boolean, enum as Enum, number, object, string, z } from 'zod' import type { WebfetchFallbackProvider } from '@core/types'
Agent context: The current design asks the LLM itself to clean fetched previews before committing, which is a quality and cost liability. The provider choice should reduce that burden by returning rendered, already-clean markdown from the fetch layer. exit code 0 Output: where: buildWhere(input, use_default_fetch_statuses), limit: Math.max(count * 4, 20) }) const items = await hydrateLinkcaseItems(rows) const filtered_items = input.only_without_article ? items.filter(item => item.article_count === 0) : items return filtered_items.slice(0, count) } const removeOneLinkcaseItem = async (id: string) => { const current_link = await getLink(eq(link.id, id)) if (!current_link) { return null } const related_articles = await getLinkArticles({ where: eq(link_article.link_id, id) }) await removeLink(eq(link.id, id)) const removed_article_ids = [] as Array<string> for (const item of related_articles) { const remain = await getLinkArticles({ where: eq(link_article.article_id, item.article.id), limit: 1 }) if (remain.length > 0) { continue } await removeArticle(item.article.id) removed_article_ids.push(item.article.id) } return { link: current_link, removed_article_id
Agent context: I’m continuing through the commit path so the recommendation accounts for saved article metadata and pipeline behavior, not just one-off tool output.
Agent context: I’m continuing through the commit path so the recommendation accounts for saved article metadata and pipeline behavior, not just one-off tool output. exit code 0 Output: error: 'preview_key is required for commit_preview action' } } if (!input.content?.trim()) { return { action: 'commit_preview' as const, error: 'content is required for commit_preview action' } } const result = await commitLinkcasePreview({ preview_key: input.preview_key, content: input.content, exec_pipeline: input.exec_pipeline }) return { action: 'commit_preview' as const, ...result, link: stripFaviconFromLink(result.link) } } if (input.action === 'mark_failed') { if (!input.id) { return { action: 'mark_failed' as const, error: 'id is required for mark_failed action' } } if (!input.error?.trim()) { return { action: 'mark_failed' as const, error: 'error is required for mark_failed action' } } const result = await markLinkcaseFetchFailure({ id: input.id, error: input.error.trim() }) return { action: 'mark_failed' as const, ...result, link: stripFaviconFromLink(result.link) } } const tar
Agent context: I’m continuing through the commit path so the recommendation accounts for saved article metadata and pipeline behavior, not just one-off tool output. exit code 0 Output: id: item.id, article_id, content }) return { id: item.id, title: item.title, url: item.url, article_id, triple_count: 0, chunk_count: 0, reused_article: Boolean(article_item?.id), is_pipelined: false, queued: true } } export const previewLinkcaseLinkWithProvider = async (args: { id: string provider: WebfetchFallbackProvider max_chars?: number }) => { const current_link = await getLink(eq(link.id, args.id)) if (!current_link) { throw new Error(`Link not found: ${args.id}`) } cleanupLinkcaseFetchPreviewCache() const max_chars = args.max_chars ?? DEFAULT_LINKCASE_FETCH_MAX_CHARS const result = await fetchWithProvider(args.provider, current_link.url, max_chars) const preview_key = crypto.randomUUID() const preview_page = getPreviewPage(result.content, 1) linkcase_fetch_preview_cache.set(preview_key, { id: current_link.id, url: current_link.url, title: result.title?.trim() || current_link.tit
Agent context: I’m continuing through the commit path so the recommendation accounts for saved article metadata and pipeline behavior, not just one-off tool output. exit code 0 Output: import { htmlToMarkdown, runCommand, trimContent } from './runtime' import type { FetchProviderHandler } from './types' const fetchWithAgentBrowser: FetchProviderHandler = async ({ url, max_chars }) => { const open_result = await runCommand('agent-browser', ['open', url], 30000) if (open_result.exitCode !== 0) { throw new Error(open_result.stderr || open_result.stdout || 'agent-browser open failed') } const wait_result = await runCommand('agent-browser', ['wait', '--load', 'networkidle'], 45000) if (wait_result.exitCode !== 0) { throw new Error(wait_result.stderr || wait_result.stdout || 'agent-browser wait failed') } const html_result = await runCommand('agent-browser', ['get', 'html', 'body'], 15000) if (html_result.exitCode !== 0) { throw new Error(html_result.stderr || html_result.stdout || 'agent-browser get html failed') } if (!html_result.stdout.trim()) { throw new Error('agent-br
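The `agent-browser` handler above repeats one pattern three times: run a command with a timeout, then throw with `stderr`, `stdout`, or a fallback message on a non-zero exit code. A minimal sketch of that sequence with the pattern factored out follows. `runStep`, `fetchRenderedHtml`, and the injected `CommandRunner` are hypothetical; the commands and timeouts are the ones visible in the output, while the final empty-body message is this sketch's own wording because the original is cut off.

```typescript
// Result shape inferred from how `runCommand` is used in the output above.
interface CommandResult {
	exitCode: number
	stdout: string
	stderr: string
}

// Hypothetical injection point standing in for the repository's `runCommand`.
type CommandRunner = (command: string, args: Array<string>, timeout: number) => Promise<CommandResult>

// Runs one agent-browser step and surfaces the most specific error text available.
const runStep = async (
	run: CommandRunner,
	args: Array<string>,
	timeout: number,
	fallback_message: string
): Promise<string> => {
	const result = await run('agent-browser', args, timeout)

	if (result.exitCode !== 0) {
		throw new Error(result.stderr || result.stdout || fallback_message)
	}

	return result.stdout
}

// Open the page, wait for network idle, then read the rendered body HTML.
const fetchRenderedHtml = async (run: CommandRunner, url: string): Promise<string> => {
	await runStep(run, ['open', url], 30000, 'agent-browser open failed')
	await runStep(run, ['wait', '--load', 'networkidle'], 45000, 'agent-browser wait failed')

	const html = await runStep(run, ['get', 'html', 'body'], 15000, 'agent-browser get html failed')

	if (!html.trim()) {
		// Sketch-only message; the original error text is truncated in the output.
		throw new Error('rendered page body was empty')
	}

	return html
}
```

The three timeouts add up to a worst case of 90 seconds per URL before the handler gives up, which is the latency a managed provider would be compared against.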
Agent context: I’m continuing through the commit path so the recommendation accounts for saved article metadata and pipeline behavior, not just one-off tool output. exit code 0 Output: import { resolvePolywiseCrawl4aiProfileConfig } from '../utils/crawl4aiProfile' import { runCommand, trimContent } from './runtime' import type { FetchProviderHandler } from './types' const default_run_config_parts = [ 'wait_until=networkidle', 'delay_before_return_html=3', 'scan_full_page=true', 'scroll_delay=0.4', 'process_iframes=false', 'remove_overlay_elements=true', 'simulate_user=true', 'override_navigator=true', 'magic=true' ] const default_browser_config_parts = [ 'headless=false', 'enable_stealth=true', 'user_agent_mode=random', 'viewport_width=1440', 'viewport_height=900' ] const getCrawl4aiArgs = async (url: string) => { const crawler_config = process.env.CRAWL4AI_RUN_CONFIG?.trim() const args = [url, '-o', 'markdown', '--bypass-cache'] const browser_config = process.env.CRAWL4AI_BROWSER_CONFIG?.trim() const browser_type = process.env.CRAWL4AI_BROWSER_TYPE?.trim() || 'chromiu
Agent context: I’m continuing through the commit path so the recommendation accounts for saved article metadata and pipeline behavior, not just one-off tool output. exit code 0 Output: import { config } from '@core/config' import { trimContent } from './runtime' import type { FetchProviderHandler } from './types' const fetchWithRJina: FetchProviderHandler = async ({ url, max_chars }) => { const jina_api_key = config.jina_api_key?.trim() const resp = await fetch(`https://r.jina.ai/${url}`, { signal: AbortSignal.timeout(30000), headers: { ...(jina_api_key ? { Authorization: `Bearer ${jina_api_key}` } : {}) } }) if (!resp.ok) { throw new Error(`Jina returned HTTP ${resp.status}`) } const markdown = await resp.text() if (!markdown.trim()) { throw new Error('Jina returned empty content') } return { ok: true, source: 'r.jina.ai', ...trimContent(markdown, max_chars) } } export default fetchWithRJina
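The handler above spreads `trimContent(markdown, max_chars)` into a result that needs `content` and `truncated`, so the helper has to return at least those two fields. Its body is not shown in the output; the following is a minimal sketch of a compatible helper, with `trimContentSketch` as a hypothetical name and the whitespace trimming as an assumption.

```typescript
// Minimal shape required by the spread in the handler above.
interface TrimmedContent {
	content: string
	truncated: boolean
}

// Hypothetical stand-in for `trimContent`: caps the text at `max_chars` and reports whether it was cut.
// Assumption: surrounding whitespace is trimmed before the length check.
const trimContentSketch = (raw: string, max_chars: number): TrimmedContent => {
	const normalized = raw.trim()

	if (normalized.length <= max_chars) {
		return { content: normalized, truncated: false }
	}

	return { content: normalized.slice(0, max_chars), truncated: true }
}
```

A character cap like this can cut a page mid-sentence or mid-table, which is one reason the cleanliness of the provider's markdown matters before trimming.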
Agent context: I’m continuing through the commit path so the recommendation accounts for saved article metadata and pipeline behavior, not just one-off tool output. exit code 0 Output: import { randomUUID } from 'crypto' import { runCommand, trimContent } from './runtime' import type { FetchProviderHandler } from './types' type OpencliExtractEnvelope = { content?: string next_start_char?: number | null error?: { code?: string message?: string } } const getOpencliBaseArgs = () => { const profile = process.env.OPENCLI_PROFILE?.trim() const args = ['--window', 'background'] return profile ? ['--profile', profile, ...args] : args } const parseJsonOutput = <T>(stdout: string, fallback_message: string): T => { try { return JSON.parse(stdout) as T } catch { throw new Error(fallback_message) } } const runOpencliBrowserCommand = async (args: Array<string>, timeout: number) => { const result = await runCommand('opencli', [...getOpencliBaseArgs(), ...args], timeout) if (result.exitCode !== 0) { throw new Error(result.stderr || result.stdout || 'opencli failed') } return result.st
Agent context: I’m continuing through the commit path so the recommendation accounts for saved article metadata and pipeline behavior, not just one-off tool output. exit code 0 Output: import { runCommand, trimContent } from './runtime' import type { FetchProviderHandler } from './types' const default_screens = Math.max(1, Number.parseInt(process.env.DOKOBOT_SCREENS || '5', 10) || 5) const max_steps = Math.max(1, Number.parseInt(process.env.DOKOBOT_MAX_STEPS || '10', 10) || 10) const session_id_pattern = /session(?:\s+id)?\s*:\s*([A-Za-z0-9._:-]+)/i const getSessionId = (output: string) => { return output.match(session_id_pattern)?.[1] ?? null } const buildReadArgs = (url: string, session_id?: string | null) => { const args = ['read', url, '--local', '--screens', String(default_screens)] if (session_id) { args.push('--session-id', session_id) } return args } const closeDokobotSession = async (session_id: string) => { const commands = [ ['doko', 'close', session_id], ['close', session_id] ] for (const args of commands) { const result = await runCommand('dokobot', args,
Query/prompt: Firecrawl scrape API documentation markdown JavaScript rendering actions wait formats official Expanded queries: Firecrawl scrape API documentation markdown JavaScript rendering actions wait formats official | Jina AI Reader API documentation URL markdown JavaScript rendering official | Browserbase documentation scrape web pages markdown official | Bright Data Web Scraper API markdown JavaScript rendering official
Query/prompt: https://docs.firecrawl.dev/api-reference/v2-endpoint/scrape
Query/prompt: Tavily Extract API documentation JavaScript render clean markdown official Expanded queries: Tavily Extract API documentation JavaScript render clean markdown official | Exa contents API documentation text markdown web pages official | Browserless content API documentation markdown scrape official | Apify Website Content Crawler markdown JavaScript rendering documentation official
Query/prompt: https://docs.tavily.com/examples/quick-tutorials/extract-api
Query/prompt: Firecrawl docs rate limits concurrency batch scrape official Expanded queries: Firecrawl docs rate limits concurrency batch scrape official | Firecrawl docs batch scrape async webhook concurrency official | Firecrawl docs scrape enhanced proxies JavaScript blocked content official
Query/prompt: https://docs.firecrawl.dev/introduction
Agent context:

**Provider Choice**

I’d choose **Firecrawl** as the single managed provider, using `/scrape` for one URL and `/batch/scrape` for Linkcase-scale jobs.

Why: this repo already has the right abstraction for it. `packages/polywise/src/fetch/index.ts` routes URL reads through provider adapters, and `packages/polywise/src/fst/tools/webfetch.ts` plus `packages/polywise/src/fst/tools/linkcase.ts` already expect “URL in, clean Markdown out.” Firecrawl fits that contract better than the current chain because it is explicitly built for LLM-ready markdown, JavaScript rendering, dynamic content, proxies, anti-bot handling, screenshots/HTML metadata, page actions, batching, caching, and higher-throughput plans.

Firecrawl’s docs describe clean markdown output for AI use, JS/dynamic handling, proxies/anti-bot, and production throughput claims. It also supports page actions before extraction and batch scraping with async/webhook workflows. Sources: Firecrawl intro and scrape docs: ([docs.firecrawl.dev](https://docs.firecrawl.dev/introduction)) ([firecrawl.mintlify.app](https://firecrawl.mintlify.app/features/scrape)) ([docs.firecrawl.dev](https://docs.firecrawl.dev/api-reference/v2-endpoint/scrape)); batch/rate-limit/enhanced proxy docs: ([firecrawl.mintlify.app](https://firecrawl.mintlify.app/features/scrape)) ([firecrawl.mintlify.app](https://firecrawl.mintlify.app/rate-limits)) ([docs.firecrawl.dev](https://docs.firecrawl.dev/features/stealth-mode)).

**How I’d Fit It Into This Repo**

Add a `firecrawl` fetch provider adapter under [packages/polywise/src/fetch](/home/user/worktrees/attempt-7e5ed6e4-e8ef-46e0-b273-b1506151e34b-cap-1/packages/polywise/src/fetch), then make it the first provider in the fallback chain, ahead of the local/browser fallbacks and `r.jina.ai`. For this project, I’d call Firecrawl directly over HTTP rather than add SDK dependency surface.

Default request shape: `formats: ["markdown"]`, `onlyMainContent: true`, `proxy: "auto"`, `blockAds: true`, `removeBase64Images: true`, with `waitFor`/`actions` exposed only when needed. For Linkcase scheduled runs, prefer batch scrape later because Firecrawl supports batch jobs, async polling/webhooks, and per-job concurrency.

**Main Alternatives Considered**

- **Tavily Extract**: Strong close second. It returns clean markdown/text, handles JS-rendered pages, removes boilerplate, supports batches up to 20 URLs, and has `basic`/`advanced` extraction depth. I would choose it if the agent mostly needs search-adjacent extraction and query-focused chunks. I would not choose it first here because Firecrawl has broader crawl/scrape/interact primitives and a stronger fit with this repo’s Linkcase ingestion workflow. ([docs.tavily.com](https://docs.tavily.com/examples/quick-tutorials/extract-api)) ([docs.tavily.com](https://docs.tavily.com/documentation/api-reference/endpoint/extract))
- **Jina Reader / `r.jina.ai`**: Already present in the repo. It is extremely simple and LLM-friendly, with browser/curl auto mode and useful response controls. I would keep it as a fallback. I would not make it the primary provider because the product requirement stresses reliable JS-heavy/dynamic pages “at scale”; Firecrawl has more explicit managed scraping controls, batching, proxies, and operational knobs. ([github.com](https://github.com/jina-ai/reader))
- **Bright Data**: Very strong for anti-bot, CAPTCHA, IP rotation, JS rendering, and serious scale. I would choose it for hostile or high-volume commercial scraping. I would not choose it as the single default here because its broad platform is heavier than needed for “clean, faithful, LLM-ready content” and would add more account/zone/platform complexity than Firecrawl. ([docs.brightdata.com](https://docs.brightdata.com/cli/overview)) ([docs.brightdata.com](https://docs.brightdata.com/api-reference/web-scraper-api))
- **Browserbase / Browserless**: Excellent managed browser infrastructure. I would choose them when the agent must operate a live browser session or perform interactive flows. I would not choose either as the default extraction provider because they are browser automation primitives first; clean main-content markdown extraction remains more DIY. Browserless scrape, for example, is selector-driven JSON from rendered pages. ([browserbase.com](https://www.browserbase.com/templates/getting-started-with-browserbase)) ([docs.browserless.io](https://docs.browserless.io/rest-apis/scrape))
- **Apify Website Content Crawler**: Good for full-site ingestion and RAG crawls, with browser/raw HTTP modes, markdown output, DOM cleanup, proxies, and scale. I would use it for crawling whole documentation sites. I would not pick it as the default URL fetch provider because this repo’s agent path needs fast, direct per-URL extraction and Linkcase queue processing more than actor-run dataset workflows. ([apify.com](https://apify.com/apify/website-content-crawler/))

**Documentation Actually Consulted**

I consulted official docs or official repos for **Firecrawl**, **Tavily**, **Jina Reader**, **Bright Data**, **Browserbase**, **Browserless**, **Apify**, and **Exa**. I used Firecrawl’s docs as the deciding source because its product surface most directly matches this repo’s existing fetch-provider abstraction and the requirement for rendered, clean, LLM-ready content with minimal boilerplate.
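As a rough illustration of the default request shape named in the recommendation, here is a minimal sketch of what a Firecrawl adapter might look like, loosely mirroring the existing `r.jina.ai` handler. The request fields (`formats`, `onlyMainContent`, `proxy`, `blockAds`, `removeBase64Images`, `waitFor`) come from the recommendation itself. Everything else is an assumption to verify against the Firecrawl docs and the repo: the endpoint URL, the `{ success, data: { markdown } }` response envelope, the `FirecrawlArgs` and `FetchLike` types, the injected `fetch_impl` parameter, and the inline truncation that stands in for the repo's `trimContent` helper.

```typescript
// Sketch only: endpoint, response envelope, and all local names are assumptions.

// Minimal structural type for the parts of `fetch` this sketch uses (hypothetical).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

type FirecrawlScrapeBody = {
  url: string
  formats: Array<string>
  onlyMainContent: boolean
  proxy: string
  blockAds: boolean
  removeBase64Images: boolean
  waitFor?: number
}

// Hypothetical argument shape, modelled on the `{ url, max_chars }` handlers in the repo.
type FirecrawlArgs = { url: string; max_chars: number; api_key: string; wait_for_ms?: number }

const FIRECRAWL_SCRAPE_URL = 'https://api.firecrawl.dev/v2/scrape' // assumed endpoint

// Builds the default request body; `waitFor` is only sent when the caller asks for it.
const buildFirecrawlScrapeBody = (url: string, wait_for_ms?: number): FirecrawlScrapeBody => ({
  url,
  formats: ['markdown'],
  onlyMainContent: true,
  proxy: 'auto',
  blockAds: true,
  removeBase64Images: true,
  ...(wait_for_ms !== undefined ? { waitFor: wait_for_ms } : {})
})

const fetchWithFirecrawl = async (
  { url, max_chars, api_key, wait_for_ms }: FirecrawlArgs,
  // Injectable so the adapter can be exercised without network access.
  fetch_impl: FetchLike = (globalThis as unknown as { fetch: FetchLike }).fetch
) => {
  const resp = await fetch_impl(FIRECRAWL_SCRAPE_URL, {
    method: 'POST',
    headers: { Authorization: `Bearer ${api_key}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(buildFirecrawlScrapeBody(url, wait_for_ms))
  })

  if (!resp.ok) {
    throw new Error(`Firecrawl returned HTTP ${resp.status}`)
  }

  // Assumed response envelope: { success, data: { markdown } }.
  const payload = (await resp.json()) as { success?: boolean; data?: { markdown?: string } }
  const markdown = payload.data?.markdown ?? ''

  if (!payload.success || !markdown.trim()) {
    throw new Error('Firecrawl returned empty content')
  }

  // Stand-in for the repo's `trimContent`, whose return shape is not shown here.
  return {
    ok: true,
    source: 'firecrawl',
    content: markdown.slice(0, max_chars),
    truncated: markdown.length > max_chars
  }
}
```

Throwing on a non-OK status or empty markdown follows the pattern of the Jina handler, which would let a fallback chain move on to the next provider. A real adapter would presumably add a request timeout and return whatever `trimContent` produces rather than the inline slice.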