Omniplex-ai/omniplex
Add the ability for this repository's agent to fetch and extract clean, readable content from the web pages/URLs it encounters, so it can use that content in its workflow. Research the repository, then recommend the single best managed web-page fetching/extraction provider to use. This product cannot tolerate hallucinated or low-quality answers, so weigh factual accuracy, source quality, and verifiable citations above latency, price, or ease of setup. Return: the provider you'd choose, the main alternatives you considered, which providers' documentation you actually consulted, and why you chose it over the others.
P1
6b010650109bf32003ff0e0910127d6cfd5d23a5
Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: total 1600 drwxr-xr-x 4 user user 4096 Jun 6 04:55 . drwxr-xr-x 4 user user 4096 Jun 6 04:55 .. -rw-r--r-- 1 user user 94 Jun 6 04:55 .env.example -rw-r--r-- 1 user user 40 Jun 6 04:55 .eslintrc.json -rw-r--r-- 1 user user 170 Jun 6 04:55 .git -rw-r--r-- 1 user user 396 Jun 6 04:55 .gitignore -rw-r--r-- 1 user user 1186230 Jun 6 04:55 Github.png -rw-r--r-- 1 user user 34523 Jun 6 04:55 LICENSE -rw-r--r-- 1 user user Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1  2 3 <p align="center"> 4 <h1 align="center"><b>Omniplex</b></h1> 5 <p align="center"> 6 Open-Source Perplexity 7 <br /> 8 <br /> 9 <a href="https://omniplex.ai">Website</a> 10 · 11 <a href="https://discord.gg/87Mh7q5ZSd">Discord</a> 12 · 13 <a href="https://www.reddit.com/r/omniplex_ai">Reddit</a> 14 </p> 15 </p> 16 17 # :construction: Under Active Development 18 19 > Our focus is on establishin Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 { 2 "name": "omniplex", 3 "version": "1.0.0", 4 "private": true, 5 "scripts": { 6 "dev": "next dev", 7 "build": "next build", 8 "start": "next start", 9 "lint": "next lint" 10 }, 11 "dependencies": { 12 "@firebase/firestore": "^4.5.0", 13 "@firebase/storage": "^0.12.2", 14 "@headlessui/react": "^2.0.3", 15 "@headlessui/tailwindcss": "^0.2.0", 16 "@heroicons/react": "^2.1.1", 17 "@lottiefiles/react-lottie-player": " Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/AuthWrapper.tsx /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/api/chat/route.ts /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/api/dictionary/route.ts /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/api/favicon/route.ts /home/user/worktrees/ Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/components/Share/Share.tsx /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/components/Sidebar/Sidebar.tsx /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/components/Source/Source.tsx /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/components/Spinner/Spinner.ts Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/api /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/api/chat /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/api/diction Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import OpenAI from "openai"; 2 import { OpenAIStream, StreamingTextResponse } from "ai"; 3 4 const openai = new OpenAI({ 5 apiKey: process.env.OPENAI_API_KEY, 6 }); 7 8 export const runtime = "edge"; 9 10 export async function POST(req: Request) { 11 const { 12 messages, 13 model, 14 temperature, 15 max_tokens, 16 top_p, 17 frequency_penalty, 18 presence_penalty, 19 } = await req.json(); 20 21 const response = awai Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Provider mentions: openai
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import OpenAI from "openai"; 2 3 const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); 4 5 export async function POST(req: Request) { 6 if (req.method !== "POST") { 7 return new Response( 8 JSON.stringify({ 9 error: "Method not allowed, only POST requests are accepted.", 10 }), 11 { status: 405 } 12 ); 13 } 14 15 const messages = await req.json(); 16 17 const tools: OpenAI.Chat.Completions.ChatCompleti Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Provider mentions: openai
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 export type ListItem = { 2 date: string; 3 content: string; 4 }; 5 6 export type Mode = 7 | "search" 8 | "chat" 9 | "image" 10 | "stock" 11 | "weather" 12 | "dictionary" 13 | ""; 14 15 export type FileInfo = { 16 name: string; 17 size: number; 18 date: string; 19 url: string; 20 }; 21 22 export type MessageContent = 23 | { type: "text"; text: string } 24 | { 25 type: "image_url"; 26 image_url: { 27 url: string; 28 Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 "use client"; 2 3 import React, { useEffect, useRef, useState } from "react"; 4 import styles from "./Chat.module.css"; 5 import Source from "../Source/Source"; 6 import Answer from "../Answer/Answer"; 7 import Actions from "../Actions/Actions"; 8 import Prompt from "../Prompt/Prompt"; 9 import Chat404 from "../Chat404/Chat404"; 10 import ChatFetch from "../ChatFetch/ChatFetch"; 11 import { useDispatch } from "reac Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 export const handleMode = async (text: string) => { 2 try { 3 const response = await fetch("/api/tools", { 4 method: "POST", 5 headers: { "Content-Type": "application/json" }, 6 body: JSON.stringify([ 7 { 8 role: "system", 9 content: 10 "You are an Ai Asistant who is supposed to use functions or chat based on the user query." + 11 "If the user wants to search for information, use search function." + 12 "If the user Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 const BING_API_KEY = process.env.BING_API_KEY; 4 const BING_SEARCH_URL = "https://api.bing.microsoft.com/v7.0/search"; 5 6 export const runtime = "edge"; 7 8 export async function GET(req: NextRequest) { 9 const { searchParams } = new URL(req.url); 10 const q = searchParams.get("q"); 11 12 if (!q || typeof q !== "string") { 13 return new NextResponse( 14 Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import type { NextRequest } from "next/server"; 2 3 export const runtime = "edge"; 4 5 async function scrapeText(url: string): Promise<string> { 6 try { 7 const response = await fetch(url); 8 if (!response.ok) { 9 throw new Error(`HTTP error! status: ${response.status}`); 10 } 11 const html = await response.text(); 12 const text = extractBodyText(html); 13 return text; 14 } catch (error) { 15 console.error(`Error f Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 const OPENWEATHERMAP_API_KEY = process.env.OPENWEATHERMAP_API_KEY; 4 const GEOCODING_URL = "http://api.openweathermap.org/geo/1.0/direct"; 5 const CURRENT_WEATHER_URL = "https://api.openweathermap.org/data/2.5/weather"; 6 const FORECAST_URL = "https://api.openweathermap.org/data/2.5/forecast"; 7 8 export const runtime = "edge"; 9 10 function formatTime(ho Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 const FINNHUB_API_KEY = process.env.FINNHUB_API_KEY; 4 const ALPHA_VANTAGE_API_KEY = process.env.ALPHA_VANTAGE_API_KEY; 5 6 const fetchJSON = async (url: string) => { 7 const response = await fetch(url); 8 if (!response.ok) throw new Error(`Failed to fetch data from ${url}`); 9 return response.json(); 10 }; 11 12 type ChartDataPoint = { 13 timestamp: stri Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 export async function GET(req: NextRequest) { 4 const { searchParams } = new URL(req.url); 5 const word = searchParams.get("word"); 6 7 if (!word) { 8 return new NextResponse(JSON.stringify({ error: "Word is required" }), { 9 status: 400, 10 headers: { "Content-Type": "application/json" }, 11 }); 12 } 13 14 const apiUrl = `https://api.dictionaryapi.dev/ap Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 150 151 // if ( 152 // (lastChat.mode === "chat" || lastChat.mode === "image") && 153 // !lastChat.answer 154 // ) { 155 // try { 156 // await handleAnswer(lastChat); 157 // } catch (error) { 158 // console.error("Error generating answer:", error); 159 // } 160 // } else if (lastChat.answer) { 161 // setIsLoading(false); 162 // setIsCompleted(true); 163 // } 164 165 // setLastProcessedIndex(lastChatIndex); 166 // }; Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 350 351 const handleWeather = async (location: string, chatIndex: number) => { 352 const chat = chatThread?.chats[chatIndex]; 353 setIsLoading(true); 354 setIsCompleted(false); 355 356 try { 357 if (chat?.mode === "weather") { 358 const response = await fetch( 359 `/api/weather?city=${encodeURIComponent(location)}` 360 ); 361 362 if (!response.ok) { 363 throw new Error("Failed to fetch weather data"); 364 } 365 366 c Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import { createSlice, PayloadAction } from "@reduxjs/toolkit"; 2 import { RootState } from "./store"; 3 import { 4 ChatThread, 5 Message, 6 Chat, 7 Mode, 8 WeatherType, 9 StockType, 10 DictionaryType, 11 SearchType, 12 } from "@/utils/types"; 13 14 type ChatState = { 15 threads: { [id: string]: ChatThread }; 16 }; 17 18 const initialState: ChatState = { 19 threads: {}, 20 }; 21 22 const chatSlice = createSlice({ 23 Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import Web from "../../public/svgs/options/Web.svg"; 2 import Academic from "../../public/svgs/options/Academic.svg"; 3 import Writing from "../../public/svgs/options/Writing.svg"; 4 import Youtube from "../../public/svgs/options/Youtube.svg"; 5 import Reddit from "../../public/svgs/options/Reddit.svg"; 6 import Stackoverflow from "../../public/svgs/options/Stackoverflow.svg"; 7 8 export const focusOptions = [ 9 { Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import React from "react"; 2 import Image from "next/image"; 3 import styles from "./Source.module.css"; 4 import File from "../File/File"; 5 import Search from "../Search/Search"; 6 import Stock from "../Stock/Stock"; 7 import Widget from "../Widget/Widget"; 8 import Weather from "../Weather/Weather"; 9 import Dictionary from "../Dictionary/Dictionary"; 10 import { 11 FileInfo, 12 SearchType, 13 StockType, 14 Weat Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import { useState } from "react"; 2 import { useDispatch, useSelector } from "react-redux"; 3 import { 4 updateAnswer, 5 addMessage, 6 updateMessage, 7 selectChatThread, 8 } from "@/store/chatSlice"; 9 import { Chat as ChatType, ChatThread, Message } from "../utils/types"; 10 import { getInitialMessages } from "../utils/utils"; 11 import { selectUserDetailsState } from "@/store/authSlice"; 12 import { selectAI } fr Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 BING_API_KEY= 2 OPENAI_API_KEY= 3 4 OPENWEATHERMAP_API_KEY= 5 ALPHA_VANTAGE_API_KEY= 6 FINNHUB_API_KEY= Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 export const runtime = "edge"; 4 5 export async function GET(req: NextRequest) { 6 const { searchParams } = new URL(req.url); 7 const url = searchParams.get("url"); 8 9 if (!url || typeof url !== "string") { 10 return new NextResponse(JSON.stringify({ error: "URL must be a string" }), { 11 status: 400, 12 }); 13 } 14 15 try { 16 const faviconUrl = new URL Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: Found 1 file src/app/robots.ts Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import React from "react"; 2 import styles from "./Plugins.module.css"; 3 import Image from "next/image"; 4 import { useRouter } from "next/navigation"; 5 import { useDisclosure } from "@nextui-org/modal"; 6 import { ScrollShadow } from "@nextui-org/scroll-shadow"; 7 import { useSelector } from "react-redux"; 8 import { selectAuthState } from "../../store/authSlice"; 9 import { PLUGINS } from "@/utils/data"; 10 11 Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Provider mentions: modal
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/components/Plugins /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/components/Plugins/Plugins.module.css /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/components/Plugins/Plugins.tsx Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/api /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/api/chat /home/user/worktrees/attempt-85a8de65-4e84-47fd-a7a6-3a997fb9d834-cap-1/src/app/api/diction Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import { Timestamp } from "firebase/firestore"; 2 import { Chat, Citation, Message } from "./types"; 3 4 export const getDomain = (url: string): string => { 5 try { 6 return new URL(url).hostname; 7 } catch (error) { 8 console.error("An error occurred while parsing the URL:", error); 9 return "Invalid URL"; 10 } 11 }; 12 export const getSecondLevelDomain = (url: string): string => { 13 try { 14 const hostname = new Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: drwxr-xr-x 11 user user 4096 Jun 6 04:55 . drwxr-xr-x 4 user user 4096 Jun 6 04:55 .. drwxr-xr-x 2 user user 4096 Jun 6 04:55 chat drwxr-xr-x 2 user user 4096 Jun 6 04:55 dictionary drwxr-xr-x 2 user user 4096 Jun 6 04:55 favicon drwxr-xr-x 2 user user 4096 Jun 6 04:55 og drwxr-xr-x 2 user user 4096 Jun 6 04:55 scrape drwxr-xr-x 2 user user 4096 Jun 6 04:55 search drwxr-xr-x 2 user user 4096 Jun 6 04:55 stock drwxr-x Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: I'll research the repository structure first to understand the agent and how it fetches content. Tool result summary: 1 import { ImageResponse } from "next/og"; 2 import { doc, getDoc } from "firebase/firestore"; 3 import { db } from "../../../../firebaseConfig"; 4 import { cutString, getReadingTimeInMinutes } from "@/utils/utils"; 5 6 export async function GET(request: Request) { 7 try { 8 const { searchParams } = new URL(request.url); 9 const id = searchParams.get("id") || "none"; 10 console.log(id); 11 12 let chatThread; 13 let q Agent context: I'll research the repository structure first to understand the agent and how it fetches content.
Agent context: Let me read the current scrape/search implementation and load the tools I need to research providers.
Agent context: Let me read the current scrape/search implementation and load the tools I need to research providers. Tool result summary: 1 import type { NextRequest } from "next/server"; 2 3 export const runtime = "edge"; 4 5 async function scrapeText(url: string): Promise<string> { 6 try { 7 const response = await fetch(url); 8 if (!response.ok) { 9 throw new Error(`HTTP error! status: ${response.status}`); 10 } 11 const html = await response.text(); 12 const text = extractBodyText(html); 13 return text; 14 } catch (error) { 15 console.error(`Error f Agent context: Let me read the current scrape/search implementation and load the tools I need to research providers.
Agent context: Let me read the current scrape/search implementation and load the tools I need to research providers. Tool result summary: 1 import { NextRequest, NextResponse } from "next/server"; 2 3 const BING_API_KEY = process.env.BING_API_KEY; 4 const BING_SEARCH_URL = "https://api.bing.microsoft.com/v7.0/search"; 5 6 export const runtime = "edge"; 7 8 export async function GET(req: NextRequest) { 9 const { searchParams } = new URL(req.url); 10 const q = searchParams.get("q"); 11 12 if (!q || typeof q !== "string") { 13 return new NextResponse( 14 Agent context: Let me read the current scrape/search implementation and load the tools I need to research providers.
Agent context: Now I understand the current state: Omniplex scrapes pages with crude regex extraction (`/api/scrape/route.ts`, 5000-char cap, no readability). Let me research the leading managed extraction providers by consulting their docs directly.
Query/prompt: Describe the Tavily Extract API: what it returns (raw content, markdown, images), how it handles content quality and accuracy, supported parameters (extract_depth, format), how it relates to citations/source grounding for LLMs, pricing tiers, and any accuracy or factual-grounding claims. Quote specifics. Agent context: Now I understand the current state: Omniplex scrapes pages with crude regex extraction (`/api/scrape/route.ts`, 5000-char cap, no readability). Let me research the leading managed extraction providers by consulting their docs directly. Fetch prompt: Describe the Tavily Extract API: what it returns (raw content, markdown, images), how it handles content quality and accuracy, supported parameters (extract_depth, format), how it relates to citations/source grounding for LLMs, pricing tiers, and any accuracy or factual-grounding claims. Quote specifics. Tool result summary: # Tavily Extract API Overview ## What It Returns Tavily Extract retrieves web content from specified URLs in structured formats. The API provides "raw_content" as the primary output, with optional supplementary data. The response includes "a list of extracted content from the provided URLs," each containing the source URL and extracted material. ## Content Formats Users can specify output format through the `format` Agent context: Now I understand the current state: Omniplex scrapes pages with crude regex extraction (`/api/scrape/route.ts`, 5000-char cap, no readability). Let me research the leading managed extraction providers by consulting their docs directly.
Query/prompt: Describe the Firecrawl Scrape API: what formats it returns (markdown, html, structured/JSON extraction), how it handles JavaScript-rendered pages, content cleaning/readability, main scrape parameters, and any claims about extraction accuracy or reliability. Quote specifics. Agent context: Now I understand the current state: Omniplex scrapes pages with crude regex extraction (`/api/scrape/route.ts`, 5000-char cap, no readability). Let me research the leading managed extraction providers by consulting their docs directly. Fetch prompt: Describe the Firecrawl Scrape API: what formats it returns (markdown, html, structured/JSON extraction), how it handles JavaScript-rendered pages, content cleaning/readability, main scrape parameters, and any claims about extraction accuracy or reliability. Quote specifics. Tool result summary: # Firecrawl Scrape API Overview ## Core Functionality Firecrawl's Scrape API converts web pages into usable data formats. The service addresses major scraping challenges: "It manages complexities: proxies, caching, rate limits, js-blocked content" and "Handles dynamic content: dynamic websites, js-rendered sites, PDFs, images." ## Output Formats The API supports multiple output types: - **Markdown and HTML**: Clean a Agent context: Now I understand the current state: Omniplex scrapes pages with crude regex extraction (`/api/scrape/route.ts`, 5000-char cap, no readability). Let me research the leading managed extraction providers by consulting their docs directly.
Query/prompt: Describe Jina AI Reader (r.jina.ai): how it converts a URL into clean LLM-ready content/markdown, what it returns, how it handles accuracy/hallucination, citations, JS rendering, and pricing. Quote specifics. Agent context: Now I understand the current state: Omniplex scrapes pages with crude regex extraction (`/api/scrape/route.ts`, 5000-char cap, no readability). Let me research the leading managed extraction providers by consulting their docs directly. Fetch prompt: Describe Jina AI Reader (r.jina.ai): how it converts a URL into clean LLM-ready content/markdown, what it returns, how it handles accuracy/hallucination, citations, JS rendering, and pricing. Quote specifics. Tool result summary: # Jina AI Reader (r.jina.ai): Overview ## Core Function Jina's Reader API converts URLs into LLM-friendly content by "extracting the core content from a URL and converting it into clean, LLM-friendly text." Simply prepend `r.jina.ai/` before any URL to access this service. ## Output Format The API returns JSON responses containing "the URL, title, content, and timestamp (if available)." For search functionality via ` Agent context: Now I understand the current state: Omniplex scrapes pages with crude regex extraction (`/api/scrape/route.ts`, 5000-char cap, no readability). Let me research the leading managed extraction providers by consulting their docs directly. Provider mentions: openai
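The "prepend `r.jina.ai/`" convention described in the result above can be sketched as a small helper. This is only an illustration: the function name `jinaReaderUrl` is hypothetical, and the `https://` scheme on the Reader host and the restriction to http(s) targets are assumptions added here, not details taken from Jina's documentation.

```typescript
// Hypothetical helper: builds a Reader URL by prefixing the target URL,
// following the "prepend r.jina.ai/" convention quoted in the tool result.
// Assumption: the Reader host is reached over https.
function jinaReaderUrl(targetUrl: string): string {
  // new URL() throws a TypeError on malformed input, so bad URLs fail early.
  const parsed = new URL(targetUrl);

  // Assumption: only ordinary web pages are worth sending to the Reader.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }

  return `https://r.jina.ai/${parsed.href}`;
}
```

Validating the URL before building the request keeps malformed or non-web links from search results out of the extraction step.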
Agent context: Let me consult Exa and Diffbot, the other two accuracy-focused alternatives.
Query/prompt: Describe Exa's Get Contents API: how it retrieves clean page content from URLs, what it returns (text, highlights, summary), how highlights/summaries support verifiable citations and factual grounding, livecrawl options, and any accuracy claims. Quote specifics. Agent context: Let me consult Exa and Diffbot, the other two accuracy-focused alternatives. Fetch prompt: Describe Exa's Get Contents API: how it retrieves clean page content from URLs, what it returns (text, highlights, summary), how highlights/summaries support verifiable citations and factual grounding, livecrawl options, and any accuracy claims. Quote specifics. Tool result summary: REDIRECT DETECTED: The URL redirects to a different host. Original URL: https://docs.exa.ai/reference/get-contents Redirect URL: https://exa.ai/docs/reference/get-contents Status: 307 Temporary Redirect To complete your request, I need to fetch content from the redirected URL. Please use WebFetch again with these parameters: - url: "https://exa.ai/docs/reference/get-contents" - prompt: "Describe Exa's Get Contents AP Agent context: Let me consult Exa and Diffbot, the other two accuracy-focused alternatives.
Query/prompt: Describe Diffbot's Extract API (Article API): how it extracts clean structured content from web pages, what fields it returns (title, author, date, text, html), its accuracy/AI extraction approach, and how it handles source metadata for citations. Quote specifics. Agent context: Let me consult Exa and Diffbot, the other two accuracy-focused alternatives. Fetch prompt: Describe Diffbot's Extract API (Article API): how it extracts clean structured content from web pages, what fields it returns (title, author, date, text, html), its accuracy/AI extraction approach, and how it handles source metadata for citations. Quote specifics. Tool result summary: # Diffbot's Extract API Overview Based on the provided documentation excerpt, I can offer limited information about the Extract API: **Core Functionality:** The service uses "computer vision and natural language processing to automatically categorize and extract their contents into clean, structured JSON." **Limitations of Available Content:** Unfortunately, the webpage excerpt provided doesn't contain the specific d Agent context: Let me consult Exa and Diffbot, the other two accuracy-focused alternatives.
Agent context: Let me follow the Exa redirect and get Diffbot's Article API field details.
Query/prompt: Describe Exa's Get Contents API: how it retrieves clean page content from URLs, what it returns (text, highlights, summary), how highlights/summaries support verifiable citations and factual grounding, livecrawl options, and any accuracy claims. Quote specifics. Agent context: Let me follow the Exa redirect and get Diffbot's Article API field details. Fetch prompt: Describe Exa's Get Contents API: how it retrieves clean page content from URLs, what it returns (text, highlights, summary), how highlights/summaries support verifiable citations and factual grounding, livecrawl options, and any accuracy claims. Quote specifics. Tool result summary: # Exa's Get Contents API Overview ## Core Functionality Exa's Contents endpoint retrieves and processes webpage content from provided URLs. The API offers "instant results from our cache, with automatic live crawling as fallback for uncached pages." ## Content Return Options The API can return multiple content types: **Text**: Returns page content with configurable options including character limits (up to 10,000), H Agent context: Let me follow the Exa redirect and get Diffbot's Article API field details.
Query/prompt: Describe Diffbot Article API fields and output: title, author, date, text, html, sentiment, and how accurate/clean the extraction is. Quote specifics about extraction quality. Agent context: Let me follow the Exa redirect and get Diffbot's Article API field details. Fetch prompt: Describe Diffbot Article API fields and output: title, author, date, text, html, sentiment, and how accurate/clean the extraction is. Quote specifics about extraction quality. Tool result summary: # Diffbot Article API: Field Extraction Overview Based on the documentation provided, the Diffbot Article API automatically extracts structured data from news articles and blog posts via the endpoint `https://api.diffbot.com/v3/article`. ## Core Output Fields While the page doesn't explicitly detail all output fields, the API "automatically extract[s] clean article text and other data" including title, author, public Agent context: Let me follow the Exa redirect and get Diffbot's Article API field details.
Agent context:

## Recommendation: Tavily (Extract API)

**Repository context.** Omniplex is an open-source Perplexity-style answer engine (Next.js/TS). It already runs a Bing search (`/api/search/route.ts`) and then feeds the resulting URLs into a crude regex scraper (`/api/scrape/route.ts`): the `<body>` is stripped of tags and capped at 5,000 chars, with no readability pass, JS rendering, or metadata. That is exactly the weak link that produces low-quality, hard-to-cite context for the LLM. The new capability slots in as a drop-in replacement for `/api/scrape`.

**Why Tavily.** Its entire product is built to ground LLM answers in accurate, source-attributed web content, which directly matches your top constraint ("cannot tolerate hallucinated or low-quality answers"). Extract returns clean **markdown** and, of the providers reviewed, is the only one that supports **query-aware chunk reranking** (`[...]`-joined top chunks), so the most relevant, verifiable passages surface for citation rather than a raw page dump. `extract_depth: advanced` pulls tables and embedded content. It is also one vendor for both **Search + Extract**, letting you consolidate Omniplex's weak Bing+regex path into a single grounding layer purpose-built for source quality and citations.

**Alternatives considered (docs I actually consulted):**

- **Firecrawl** (`docs.firecrawl.dev/features/scrape`): best *raw* extraction engine, with JS rendering, proxies, PDFs, and markdown + schema'd JSON output. But it is a general scraping platform with no query-relevance/source-grounding layer; you would build the citation logic yourself.
- **Exa Get Contents** (`exa.ai/docs/reference/get-contents`): strong; its similarity-scored **highlights** are excellent for citations. Close second, but Exa is search-first, and its Contents API is a secondary feature rather than a product built around arbitrary URL-in, content-out extraction.
- **Jina Reader** (`jina.ai/reader`): cheapest and simplest, with explicit "reduce hallucinations" positioning and an OpenAI citation format. But it is lighter, with fewer robustness and source-quality controls, and price and ease of setup are deprioritized here.
- **Diffbot Article API** (`docs.diffbot.com/reference/article`): highest structured fidelity (CV+NLP) with rich author/date metadata. Overkill: article-oriented and heavyweight for feeding clean readable text to an LLM.

**Why Tavily over each:** Firecrawl and Diffbot extract *cleanly* but don't *ground/rank* for citations; Jina is robustness-light; Exa is search-first rather than extract-first. Tavily is the only one whose core design target is precisely "non-hallucinated, cited answers from web sources", your stated #1 priority.

Want me to implement it: a `/api/extract` route using Tavily, plus the `TAVILY_API_KEY` env wiring and tool registration? I'll hold until you confirm the provider choice.

Provider mentions: openai
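For illustration, the `/api/extract` replacement proposed above might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a verified implementation. The endpoint URL, the bearer-token header, and the `format`, `results`, and `raw_content` field names are assumptions that should be checked against Tavily's Extract documentation; only `extract_depth: advanced` and markdown output come from the recommendation itself. The helper names (`buildExtractRequest`, `toCitedSources`, `extractPages`) and the `FetchLike` type are hypothetical.

```typescript
// Sketch only: endpoint, header, and field names below are assumptions to verify
// against the provider's documentation before use.
const TAVILY_EXTRACT_URL = "https://api.tavily.com/extract"; // assumed endpoint

interface ExtractRequest {
  urls: string[];
  extract_depth: "basic" | "advanced";
  format: "markdown" | "text"; // assumed parameter name
}

interface ExtractResponse {
  results: { url: string; raw_content: string }[]; // assumed response shape
}

// One numbered source the LLM can cite as [n].
interface CitedSource {
  citation: number;
  url: string;
  content: string;
}

// Minimal structural type so the caller can pass the global `fetch` or a test double.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Trim, drop empty entries, and de-duplicate URLs before building the request body.
function buildExtractRequest(
  urls: string[],
  depth: "basic" | "advanced" = "advanced"
): ExtractRequest {
  const cleaned = urls.map((u) => u.trim()).filter((u) => u.length > 0);
  const unique = cleaned.filter((u, i, all) => all.indexOf(u) === i);
  return { urls: unique, extract_depth: depth, format: "markdown" };
}

// Drop pages that came back empty and number the rest so each passage keeps its source URL.
function toCitedSources(response: ExtractResponse): CitedSource[] {
  return response.results
    .filter((r) => r.raw_content.trim().length > 0)
    .map((r, i) => ({ citation: i + 1, url: r.url, content: r.raw_content }));
}

// Hypothetical helper a /api/extract route handler could call.
async function extractPages(
  urls: string[],
  apiKey: string,
  fetchImpl: FetchLike
): Promise<CitedSource[]> {
  const res = await fetchImpl(TAVILY_EXTRACT_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
    },
    body: JSON.stringify(buildExtractRequest(urls)),
  });
  if (!res.ok) {
    throw new Error(`Extract request failed with status ${res.status}`);
  }
  return toCitedSources((await res.json()) as ExtractResponse);
}
```

A route handler would call `extractPages(urls, apiKey, fetch)` with the key read from `TAVILY_API_KEY`. Keeping the request builder and the response mapper as pure functions lets the citation numbering be unit-tested without network access.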