Omniplex-ai/omniplex
Add the ability for this repository's agent to fetch and extract clean, readable content from the web pages/URLs it encounters, so it can use that content in its workflow. Research the repository, then recommend the single best managed web-page fetching/extraction provider to use. Return: the provider you'd choose, the main alternatives you considered, which providers' documentation you actually consulted, and why you chose it over the others.
P0
6b010650109bf32003ff0e0910127d6cfd5d23a5
Agent context: I'll research the repository structure first to understand what kind of agent this is and how it works.
Tool result summary: /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/README.md
Tool result summary: 1  2 3 <p align="center"> 4 <h1 align="center"><b>Omniplex</b></h1> 5 <p align="center"> 6 Open-Source Perplexity 7 <br /> 8 <br /> 9 <a href="https://omniplex.ai">Website</a> 10 · 11 <a href="https://discord.gg/87Mh7q5ZSd">Discord</a> 12 · 13 <a href="https://www.reddit.com/r/omniplex_ai">Reddit</a> 14 </p> 15 </p> 16 17 # :construction: Under Active Development 18 19 > Our focus is on establishin
Tool result summary: total 1600 drwxr-xr-x 4 user user 4096 Jun 6 04:55 . drwxr-xr-x 4 user user 4096 Jun 6 04:55 .. -rw-r--r-- 1 user user 94 Jun 6 04:55 .env.example -rw-r--r-- 1 user user 40 Jun 6 04:55 .eslintrc.json -rw-r--r-- 1 user user 170 Jun 6 04:55 .git -rw-r--r-- 1 user user 396 Jun 6 04:55 .gitignore -rw-r--r-- 1 user user 1186230 Jun 6 04:55 Github.png -rw-r--r-- 1 user user 34523 Jun 6 04:55 LICENSE -rw-r--r-- 1 user user
Tool result summary: 1 { 2 "name": "omniplex", 3 "version": "1.0.0", 4 "private": true, 5 "scripts": { 6 "dev": "next dev", 7 "build": "next build", 8 "start": "next start", 9 "lint": "next lint" 10 }, 11 "dependencies": { 12 "@firebase/firestore": "^4.5.0", 13 "@firebase/storage": "^0.12.2", 14 "@headlessui/react": "^2.0.3", 15 "@headlessui/tailwindcss": "^0.2.0", 16 "@heroicons/react": "^2.1.1", 17 "@lottiefiles/react-lottie-player": "
Tool result summary: /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/src/app/AuthWrapper.tsx /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/src/app/api/chat/route.ts /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/src/app/api/dictionary/route.ts /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/src/app/api/favicon/route.ts /home/user/worktrees/
Tool result summary: 1 import OpenAI from "openai"; 2 3 const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); 4 5 export async function POST(req: Request) { 6 if (req.method !== "POST") { 7 return new Response( 8 JSON.stringify({ 9 error: "Method not allowed, only POST requests are accepted.", 10 }), 11 { status: 405 } 12 ); 13 } 14 15 const messages = await req.json(); 16 17 const tools: OpenAI.Chat.Completions.ChatCompleti
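The route excerpted above builds a `tools` array typed with the OpenAI SDK's Chat Completions types. Assuming those are standard function tools, a page-fetching capability could be declared the same way. The tool name `fetch_web_page`, its single `url` parameter, and the argument validator are hypothetical; the local `FunctionTool` type mirrors only the subset of the Chat Completions `tools` shape used here, so the sketch does not import `openai`.

```typescript
// Hypothetical tool definition in the Chat Completions `tools` format
// (type: "function" plus a JSON Schema for the parameters).
// The local type covers only the fields used here; no `openai` import needed.
type FunctionTool = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description: string }>;
      required: string[];
    };
  };
};

// Assumed name and schema; not part of the existing repository.
const fetchWebPageTool: FunctionTool = {
  type: "function",
  function: {
    name: "fetch_web_page",
    description: "Fetch a web page and return its main readable content as markdown.",
    parameters: {
      type: "object",
      properties: {
        url: { type: "string", description: "Absolute http(s) URL of the page to fetch." },
      },
      required: ["url"],
    },
  },
};

// Tool-call arguments arrive as a JSON string; validate them before fetching.
function parseFetchWebPageArgs(raw: string): { url: string } {
  const args: unknown = JSON.parse(raw);
  if (typeof args !== "object" || args === null || !("url" in args)) {
    throw new Error("fetch_web_page: missing `url` argument");
  }
  // new URL() throws on malformed input; only http(s) pages are allowed through.
  const parsed = new URL(String((args as { url: unknown }).url));
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`fetch_web_page: unsupported protocol ${parsed.protocol}`);
  }
  return { url: parsed.toString() };
}
```

Because the model returns tool-call arguments as a JSON string, the validator parses them and rejects non-http(s) URLs before any request is made.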
Tool result summary: 1 import OpenAI from "openai"; 2 import { OpenAIStream, StreamingTextResponse } from "ai"; 3 4 const openai = new OpenAI({ 5 apiKey: process.env.OPENAI_API_KEY, 6 }); 7 8 export const runtime = "edge"; 9 10 export async function POST(req: Request) { 11 const { 12 messages, 13 model, 14 temperature, 15 max_tokens, 16 top_p, 17 frequency_penalty, 18 presence_penalty, 19 } = await req.json(); 20 21 const response = awai
Tool result summary: 1 import type { NextRequest } from "next/server"; 2 3 export const runtime = "edge"; 4 5 async function scrapeText(url: string): Promise<string> { 6 try { 7 const response = await fetch(url); 8 if (!response.ok) { 9 throw new Error(`HTTP error! status: ${response.status}`); 10 } 11 const html = await response.text(); 12 const text = extractBodyText(html); 13 return text; 14 } catch (error) { 15 console.error(`Error f
Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 const BING_API_KEY = process.env.BING_API_KEY; 4 const BING_SEARCH_URL = "https://api.bing.microsoft.com/v7.0/search"; 5 6 export const runtime = "edge"; 7 8 export async function GET(req: NextRequest) { 9 const { searchParams } = new URL(req.url); 10 const q = searchParams.get("q"); 11 12 if (!q || typeof q !== "string") { 13 return new NextResponse( 14
Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 const OPENWEATHERMAP_API_KEY = process.env.OPENWEATHERMAP_API_KEY; 4 const GEOCODING_URL = "http://api.openweathermap.org/geo/1.0/direct"; 5 const CURRENT_WEATHER_URL = "https://api.openweathermap.org/data/2.5/weather"; 6 const FORECAST_URL = "https://api.openweathermap.org/data/2.5/forecast"; 7 8 export const runtime = "edge"; 9 10 function formatTime(ho
Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 const FINNHUB_API_KEY = process.env.FINNHUB_API_KEY; 4 const ALPHA_VANTAGE_API_KEY = process.env.ALPHA_VANTAGE_API_KEY; 5 6 const fetchJSON = async (url: string) => { 7 const response = await fetch(url); 8 if (!response.ok) throw new Error(`Failed to fetch data from ${url}`); 9 return response.json(); 10 }; 11 12 type ChartDataPoint = { 13 timestamp: stri
Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 export async function GET(req: NextRequest) { 4 const { searchParams } = new URL(req.url); 5 const word = searchParams.get("word"); 6 7 if (!word) { 8 return new NextResponse(JSON.stringify({ error: "Word is required" }), { 9 status: 400, 10 headers: { "Content-Type": "application/json" }, 11 }); 12 } 13 14 const apiUrl = `https://api.dictionaryapi.dev/ap
Tool result summary: /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/src/utils/api.ts /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/src/utils/types.ts
Tool result summary: 1 export type ListItem = { 2 date: string; 3 content: string; 4 }; 5 6 export type Mode = 7 | "search" 8 | "chat" 9 | "image" 10 | "stock" 11 | "weather" 12 | "dictionary" 13 | ""; 14 15 export type FileInfo = { 16 name: string; 17 size: number; 18 date: string; 19 url: string; 20 }; 21 22 export type MessageContent = 23 | { type: "text"; text: string } 24 | { 25 type: "image_url"; 26 image_url: { 27 url: string; 28
Tool result summary: 1 export const handleMode = async (text: string) => { 2 try { 3 const response = await fetch("/api/tools", { 4 method: "POST", 5 headers: { "Content-Type": "application/json" }, 6 body: JSON.stringify([ 7 { 8 role: "system", 9 content: 10 "You are an Ai Asistant who is supposed to use functions or chat based on the user query." + 11 "If the user wants to search for information, use search function." + 12 "If the user
Tool result summary: /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/src/store/chatSlice.ts
Tool result summary: 1 import { createSlice, PayloadAction } from "@reduxjs/toolkit"; 2 import { RootState } from "./store"; 3 import { 4 ChatThread, 5 Message, 6 Chat, 7 Mode, 8 WeatherType, 9 StockType, 10 DictionaryType, 11 SearchType, 12 } from "@/utils/types"; 13 14 type ChatState = { 15 threads: { [id: string]: ChatThread }; 16 }; 17 18 const initialState: ChatState = { 19 threads: {}, 20 }; 21 22 const chatSlice = createSlice({ 23
Tool result summary: 1 "use client"; 2 3 import React, { useEffect, useRef, useState } from "react"; 4 import styles from "./Chat.module.css"; 5 import Source from "../Source/Source"; 6 import Answer from "../Answer/Answer"; 7 import Actions from "../Actions/Actions"; 8 import Prompt from "../Prompt/Prompt"; 9 import Chat404 from "../Chat404/Chat404"; 10 import ChatFetch from "../ChatFetch/ChatFetch"; 11 import { useDispatch } from "reac
Tool result summary: 150 151 // if ( 152 // (lastChat.mode === "chat" || lastChat.mode === "image") && 153 // !lastChat.answer 154 // ) { 155 // try { 156 // await handleAnswer(lastChat); 157 // } catch (error) { 158 // console.error("Error generating answer:", error); 159 // } 160 // } else if (lastChat.answer) { 161 // setIsLoading(false); 162 // setIsCompleted(true); 163 // } 164 165 // setLastProcessedIndex(lastChatIndex); 166 // };
Tool result summary: 1 BING_API_KEY= 2 OPENAI_API_KEY= 3 4 OPENWEATHERMAP_API_KEY= 5 ALPHA_VANTAGE_API_KEY= 6 FINNHUB_API_KEY=
Tool result summary: 1 import React from "react"; 2 import Image from "next/image"; 3 import styles from "./Source.module.css"; 4 import File from "../File/File"; 5 import Search from "../Search/Search"; 6 import Stock from "../Stock/Stock"; 7 import Widget from "../Widget/Widget"; 8 import Weather from "../Weather/Weather"; 9 import Dictionary from "../Dictionary/Dictionary"; 10 import { 11 FileInfo, 12 SearchType, 13 StockType, 14 Weat
Tool result summary: 1 import { initializeApp } from "firebase/app"; 2 import { getStorage } from "firebase/storage"; 3 import { getFirestore } from "firebase/firestore"; 4 5 // Firebase Config 6 export const firebaseConfig = { 7 apiKey: "", 8 authDomain: "", 9 projectId: "", 10 storageBucket: "", 11 messagingSenderId: "", 12 appId: "", 13 measurementId: "", 14 }; 15 16 const app = initializeApp(firebaseConfig); 17 18 const db = getFires
Tool result summary: /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/src/hooks/useChatAnswer.ts
Tool result summary: 1 import { useState } from "react"; 2 import { useDispatch, useSelector } from "react-redux"; 3 import { 4 updateAnswer, 5 addMessage, 6 updateMessage, 7 selectChatThread, 8 } from "@/store/chatSlice"; 9 import { Chat as ChatType, ChatThread, Message } from "../utils/types"; 10 import { getInitialMessages } from "../utils/utils"; 11 import { selectUserDetailsState } from "@/store/authSlice"; 12 import { selectAI } fr
Tool result summary: /home/user/worktrees/attempt-c219809f-8548-46f3-9a6e-63c77f584b96-cap-0/src/utils/data.ts
Tool result summary: 1 import Web from "../../public/svgs/options/Web.svg"; 2 import Academic from "../../public/svgs/options/Academic.svg"; 3 import Writing from "../../public/svgs/options/Writing.svg"; 4 import Youtube from "../../public/svgs/options/Youtube.svg"; 5 import Reddit from "../../public/svgs/options/Reddit.svg"; 6 import Stackoverflow from "../../public/svgs/options/Stackoverflow.svg"; 7 8 export const focusOptions = [ 9 {
Tool result summary: total 28 drwxr-xr-x 2 user user 4096 Jun 6 04:55 . drwxr-xr-x 8 user user 4096 Jun 6 04:55 .. -rw-r--r-- 1 user user 2079 Jun 6 04:55 aiSlice.ts -rw-r--r-- 1 user user 1160 Jun 6 04:55 authSlice.ts -rw-r--r-- 1 user user 4836 Jun 6 04:55 chatSlice.ts -rw-r--r-- 1 user user 1053 Jun 6 04:55 store.ts
Tool result summary: 1 import { configureStore, combineReducers } from "@reduxjs/toolkit"; 2 import { 3 persistStore, 4 persistReducer, 5 FLUSH, 6 REHYDRATE, 7 PAUSE, 8 PERSIST, 9 PURGE, 10 REGISTER, 11 } from "redux-persist"; 12 import storage from "redux-persist/lib/storage"; 13 import chatReducer from "./chatSlice"; 14 import authReducer from "./authSlice"; 15 import aiReducer from "./aiSlice"; 16 17 const persistConfig = { 18 key: "r
Tool result summary: total 108 drwxr-xr-x 27 user user 4096 Jun 6 04:55 . drwxr-xr-x 8 user user 4096 Jun 6 04:55 .. drwxr-xr-x 2 user user 4096 Jun 6 04:55 Actions drwxr-xr-x 2 user user 4096 Jun 6 04:55 Answer drwxr-xr-x 2 user user 4096 Jun 6 04:55 Auth drwxr-xr-x 2 user user 4096 Jun 6 04:55 Chat drwxr-xr-x 2 user user 4096 Jun 6 04:55 Chat404 drwxr-xr-x 2 user user 4096 Jun 6 04:55 ChatFetch drwxr-xr-x 2 user user 4096 Jun 6 04:55 D
Tool result summary: 1 { 2 "compilerOptions": { 3 "lib": ["dom", "dom.iterable", "esnext"], 4 "allowJs": true, 5 "skipLibCheck": true, 6 "strict": true, 7 "noEmit": true, 8 "esModuleInterop": true, 9 "module": "esnext", 10 "moduleResolution": "bundler", 11 "resolveJsonModule": true, 12 "isolatedModules": true, 13 "jsx": "preserve", 14 "incremental": true, 15 "plugins": [ 16 { 17 "name": "next" 18 } 19 ], 20 "paths": { 21 "@/*": ["./src/*
Tool result summary: 1 /** @type {import('next').NextConfig} */ 2 3 const nextConfig = { 4 images: { 5 remotePatterns: [ 6 { 7 protocol: "https", 8 hostname: "**", 9 }, 10 ], 11 }, 12 compiler: { 13 removeConsole: process.env.NODE_ENV === "production", 14 }, 15 }; 16 17 export default nextConfig; 18
Agent context: Let me examine the existing scrape implementation and tool registration to ground my recommendation in the actual code.
Tool result summary: 1 import type { NextRequest } from "next/server"; 2 3 export const runtime = "edge"; 4 5 async function scrapeText(url: string): Promise<string> { 6 try { 7 const response = await fetch(url); 8 if (!response.ok) { 9 throw new Error(`HTTP error! status: ${response.status}`); 10 } 11 const html = await response.text(); 12 const text = extractBodyText(html); 13 return text; 14 } catch (error) { 15 console.error(`Error f
Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 const BING_API_KEY = process.env.BING_API_KEY; 4 const BING_SEARCH_URL = "https://api.bing.microsoft.com/v7.0/search"; 5 6 export const runtime = "edge"; 7 8 export async function GET(req: NextRequest) { 9 const { searchParams } = new URL(req.url); 10 const q = searchParams.get("q"); 11 12 if (!q || typeof q !== "string") { 13 return new NextResponse( 14
Agent context: The repo already has a brittle `/api/scrape` (raw `fetch()` + regex tag-stripping, no JS rendering, easily blocked). Now let me research the actual managed providers by consulting their documentation. Let me load the web tools.
Agent context: Now let me consult the documentation of the leading managed providers directly.
Query/prompt: Describe the scrape endpoint: what it returns (markdown, clean content, metadata), JS rendering, output formats, authentication, pricing tiers, and how it's called (HTTP request shape). Summarize concisely.
Tool result summary: # Firecrawl Scrape Endpoint Summary ## What It Returns The endpoint delivers multiple output formats including "markdown," "html," "rawHtml," and metadata (title, description, language, keywords). It also supports "screenshot," "json," "links," and specialized formats like "audio," "video," and "branding" extraction. ## Content Processing The service handles JavaScript-rendered pages through browser automation. The `
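A minimal sketch of calling this endpoint from an edge route. It assumes `POST https://api.firecrawl.dev/v2/scrape` with a bearer token and a JSON body of `url`, `formats`, and `onlyMainContent`, and a response carrying `data.markdown` plus `data.metadata.title` and `description`. These names come from the summary and the recommendation but are assumptions to verify against the API reference. Request building and response parsing are pure functions so they can be tested offline.

```typescript
// Sketch of a Firecrawl scrape call. The endpoint, body fields, and response
// fields are assumptions taken from the docs summary; verify before relying on them.
const FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v2/scrape";

type ExtractedPage = { url: string; title: string; description: string; markdown: string };

// Assumed subset of the response: { success, error?, data: { markdown, metadata } }.
type FirecrawlScrapeResponse = {
  success?: boolean;
  error?: string;
  data?: {
    markdown?: string;
    metadata?: { title?: string; description?: string };
  };
};

// Pure request builder: returns the fetch() arguments without sending anything.
function buildFirecrawlRequest(url: string, apiKey: string): [string, RequestInit] {
  return [
    FIRECRAWL_SCRAPE_URL,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // Request markdown only and keep main-content filtering explicitly on.
      body: JSON.stringify({ url, formats: ["markdown"], onlyMainContent: true }),
    },
  ];
}

// Pure parser: keeps only the fields the answer flow and source cards need.
function parseFirecrawlResponse(url: string, body: FirecrawlScrapeResponse): ExtractedPage {
  const markdown = body.data?.markdown;
  if (body.success === false || !markdown) {
    throw new Error(`Firecrawl scrape failed for ${url}: ${body.error ?? "no markdown"}`);
  }
  return {
    url,
    title: body.data?.metadata?.title ?? "",
    description: body.data?.metadata?.description ?? "",
    markdown,
  };
}

// Network wrapper; the caller supplies the API key.
async function scrapeWithFirecrawl(url: string, apiKey: string): Promise<ExtractedPage> {
  const [endpoint, init] = buildFirecrawlRequest(url, apiKey);
  const response = await fetch(endpoint, init);
  if (!response.ok) throw new Error(`Firecrawl HTTP ${response.status}`);
  return parseFirecrawlResponse(url, (await response.json()) as FirecrawlScrapeResponse);
}
```

In a route handler the key would be read from `process.env`, the way the Bing route reads `BING_API_KEY`; `FIRECRAWL_API_KEY` would be a hypothetical variable name needing a matching `.env.example` entry.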
Query/prompt: Describe the Jina Reader API (r.jina.ai): what it returns, how it's called, whether it needs an API key, output formats (markdown), JS rendering, and pricing. Summarize concisely.
Tool result summary: # Jina Reader API Summary ## What It Returns The Reader API converts URLs into "clean, LLM-friendly text" by extracting core content and removing extraneous elements like markup and scripts. It outputs primarily in Markdown format and returns JSON responses containing the URL, title, content, and timestamp. ## How to Call It Simply prepend `r.jina.ai` to any URL. For example: `https://r.jina.ai/https://www.example.co
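A comparable sketch for Jina Reader, using the prepend-the-URL pattern from the summary. The `Accept: application/json` header for requesting the JSON form, the `data.url` / `data.title` / `data.content` response fields, and bearer-token auth for the optional key are assumptions to check against the Reader documentation.

```typescript
// Sketch of a Jina Reader call: the target URL is appended to https://r.jina.ai/.
// Assumed: `Accept: application/json` selects the JSON form, whose `data` object
// carries `url`, `title`, and `content` (markdown); an optional key goes in a bearer header.
type ReaderPage = { url: string; title: string; markdown: string };

type JinaReaderResponse = {
  data?: { url?: string; title?: string; content?: string };
};

function buildJinaReaderRequest(url: string, apiKey?: string): [string, RequestInit] {
  const headers: Record<string, string> = { Accept: "application/json" };
  // The key is optional per the summary, so the header is added only when one is configured.
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  return [`https://r.jina.ai/${url}`, { method: "GET", headers }];
}

function parseJinaReaderResponse(url: string, body: JinaReaderResponse): ReaderPage {
  const content = body.data?.content;
  if (!content) throw new Error(`Jina Reader returned no content for ${url}`);
  return { url: body.data?.url ?? url, title: body.data?.title ?? "", markdown: content };
}

// Network wrapper: one URL per request.
async function readWithJina(url: string, apiKey?: string): Promise<ReaderPage> {
  const [endpoint, init] = buildJinaReaderRequest(url, apiKey);
  const response = await fetch(endpoint, init);
  if (!response.ok) throw new Error(`Jina Reader HTTP ${response.status}`);
  return parseJinaReaderResponse(url, (await response.json()) as JinaReaderResponse);
}
```

Each call handles a single URL, so several search results mean several requests; that is the trade-off the recommendation weighs against batch-capable providers.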
Query/prompt: Describe Tavily's Extract endpoint: what it returns, clean content extraction, request shape, auth, pricing. Summarize concisely.
Tool result summary: # Tavily Extract Endpoint Summary **Purpose**: Extracts and returns web page content from one or more specified URLs as cleaned text or markdown. **Request Shape**: POST to `https://api.tavily.com/extract` with JSON body containing: - `urls` (required): Single URL string or array of URLs (max 20) - `query` (optional): Intent for reranking content chunks - `chunks_per_source`: 1-5 chunks (default 3, available with que
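For Tavily, one request can carry up to 20 URLs, so the sketch below validates the batch size before sending. The endpoint and the `urls` limit come from the summary; bearer-token auth, the `extract_depth` and `format` body fields, and the `results[].raw_content` / `failed_results[]` response arrays are assumptions to verify against the Extract reference.

```typescript
// Sketch of a Tavily Extract call. From the docs summary: POST to
// https://api.tavily.com/extract with a `urls` array of at most 20 entries.
// Assumed (verify first): bearer-token auth, the `extract_depth` and `format`
// body fields, and the `results` / `failed_results` response arrays.
const TAVILY_EXTRACT_URL = "https://api.tavily.com/extract";
const TAVILY_MAX_URLS = 20;

type TavilyExtractResponse = {
  results?: { url: string; raw_content?: string }[];
  failed_results?: { url: string; error?: string }[];
};

type ExtractOutcome = { pages: { url: string; markdown: string }[]; failed: string[] };

function buildTavilyExtractRequest(
  urls: string[],
  apiKey: string,
  depth: "basic" | "advanced" = "basic",
): [string, RequestInit] {
  if (urls.length === 0 || urls.length > TAVILY_MAX_URLS) {
    throw new Error(`Tavily Extract takes 1-${TAVILY_MAX_URLS} URLs, got ${urls.length}`);
  }
  return [
    TAVILY_EXTRACT_URL,
    {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ urls, extract_depth: depth, format: "markdown" }),
    },
  ];
}

// Collects extracted pages; URLs that failed or came back empty are reported separately.
function parseTavilyExtractResponse(body: TavilyExtractResponse): ExtractOutcome {
  const results = body.results ?? [];
  const pages = results
    .filter((r) => Boolean(r.raw_content))
    .map((r) => ({ url: r.url, markdown: r.raw_content as string }));
  const failed = (body.failed_results ?? [])
    .map((f) => f.url)
    .concat(results.filter((r) => !r.raw_content).map((r) => r.url));
  return { pages, failed };
}

async function extractWithTavily(urls: string[], apiKey: string): Promise<ExtractOutcome> {
  const [endpoint, init] = buildTavilyExtractRequest(urls, apiKey);
  const response = await fetch(endpoint, init);
  if (!response.ok) throw new Error(`Tavily HTTP ${response.status}`);
  return parseTavilyExtractResponse((await response.json()) as TavilyExtractResponse);
}
```

Since one request covers the whole batch, a page of search results could be extracted in a single call; passing `"advanced"` as `depth` corresponds to the harder-page setting mentioned in the recommendation.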
Agent context: I have what I need. Here's my recommendation.

## Recommendation: **Firecrawl** (`/v2/scrape`)

### Context from the repo

Omniplex already has `src/app/api/scrape/route.ts:5-39`, a naive `fetch()` + regex tag-stripper that returns plain text capped at 5000 chars. For an LLM search app it has three fatal weaknesses: **no JS rendering** (breaks on SPAs), **no anti-bot/proxy handling** (gets blocked), and **no boilerplate removal** (nav, ads, and cookie banners pollute the context fed to OpenAI). A managed extractor addresses all three. It would slot in where `/api/scrape` lives today, called from the search→scrape→answer flow, using the same `process.env` + edge-`fetch` pattern as the Bing route (`src/app/api/search/route.ts:32-40`).

### Why Firecrawl

- **Best clean-content output**: `onlyMainContent` (the default) strips nav/header/footer, and an LLM-based cleaning pass removes residual cookie/ad boilerplate. The task asks specifically for "clean, readable content", and this is where Firecrawl was strongest.
- **JS rendering + managed proxies**: directly addresses the two biggest failures of the current `/api/scrape`.
- **Drop-in fit**: a single `POST` with a bearer token returns markdown + metadata (title/description). There is no SDK or headless-browser dependency, so it stays edge-compatible and matches the existing env-var/fetch convention.
- **Structured metadata** suits the UI display component (Omniplex renders source cards).
- **Batch scraping** is available for fanning out over multiple search-result URLs.

### Main alternatives considered

1. **Jina Reader** (`r.jina.ai`): the strongest runner-up. Dead simple (prepend the URL), optional key, generous free tier (10M tokens), markdown output, and JS rendering. It lost because output control is coarser, it handles **one URL per request** (the search flow scrapes several results at once), and its boilerplate stripping is less configurable than Firecrawl's. It remains an excellent zero-config fallback.
2. **Tavily Extract**: markdown/text output, **batches of up to 20 URLs**, query-based chunk reranking, and credit-based pricing. A natural fit *if* search also moves from Bing to Tavily (search + extract bundle). It lost as a standalone extractor because its extraction is tuned for "good-enough RAG" and needs `extract_depth: advanced` for hard pages; Firecrawl's cleaning is better out of the box.
3. **Exa `/contents`**: capable, but it is a neural-search-with-contents product, which is redundant and misaligned here because Bing already supplies the URLs.

### Documentation I actually consulted

- Firecrawl scrape endpoint: `docs.firecrawl.dev/api-reference/endpoint/scrape`
- Jina Reader: `jina.ai/reader`
- Tavily Extract endpoint: `docs.tavily.com/.../endpoint/extract`

(Exa was assessed from prior knowledge, not fetched this session.)

### Bottom line

Firecrawl wins on the dimension that matters most for an LLM search app, **the quality of the cleaned content**, while also closing the JS-rendering and anti-bot gaps in today's scraper, and it integrates through a plain authenticated POST. If you'd rather minimize cost and setup, **Jina Reader** is the best low-friction fallback; if you're open to also replacing Bing search, **Tavily** (search + extract together) is worth weighing.

Want me to implement the Firecrawl-backed `/api/scrape` (or a new `/api/fetch-web` route) following the existing tool-registration pattern?

Sources:
- [Firecrawl Scrape API](https://docs.firecrawl.dev/api-reference/endpoint/scrape)
- [Jina Reader](https://jina.ai/reader/)
- [Tavily Extract API](https://docs.tavily.com/documentation/api-reference/endpoint/extract)
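The recommendation rests on two operational points: the search flow scrapes several result URLs at once, and Jina Reader serves as the fallback when the primary extractor fails. The provider-agnostic sketch below covers both. Each URL is tried against an ordered list of extractor functions (for example, a Firecrawl call first and a Jina Reader call second), a small number of URLs run concurrently, and each page is capped at the same 5000 characters the current `/api/scrape` uses. The `Extractor` type, the `scrapeAll` helper, and its default limits are hypothetical.

```typescript
// Provider-agnostic fan-out (hypothetical helper): for each URL, try the
// extractors in order, run a few URLs at once, and cap the text per page.
type Extractor = (url: string) => Promise<string>;

type ScrapeResult =
  | { url: string; ok: true; text: string }
  | { url: string; ok: false; error: string };

// Returns the first successful extractor output; throws with every error if all fail.
async function extractWithFallback(url: string, extractors: Extractor[]): Promise<string> {
  const errors: string[] = [];
  for (const extract of extractors) {
    try {
      return await extract(url);
    } catch (error) {
      errors.push(error instanceof Error ? error.message : String(error));
    }
  }
  throw new Error(errors.join("; ") || "no extractors configured");
}

// Results keep the input order; one failing URL does not reject the whole batch.
async function scrapeAll(
  urls: string[],
  extractors: Extractor[],
  { concurrency = 3, maxChars = 5000 } = {},
): Promise<ScrapeResult[]> {
  const results: ScrapeResult[] = new Array(urls.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const index = next++;
      const url = urls[index];
      try {
        const text = await extractWithFallback(url, extractors);
        results[index] = { url, ok: true, text: text.slice(0, maxChars) };
      } catch (error) {
        results[index] = { url, ok: false, error: (error as Error).message };
      }
    }
  }
  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Workers pulling from a shared index keep at most `concurrency` requests in flight, which is gentler on provider rate limits than firing every URL at once with a bare `Promise.all`.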