Balaji-R-05/askdocs-ai
Add live web search/retrieval to this repository so its agent can ground its answers in current web information. Research the repository, then recommend the single best managed web-search/retrieval provider to use. The agent needs access to high-quality, reputable, and licensed sources rather than generic scraped search-engine results, so weigh source coverage and quality heavily. Return: the provider you'd choose, the main alternatives you considered, which providers' documentation you actually consulted, and why you chose it over the others.
P3
26bc632a0bed22f035740edd34ee3e1fc2420e9f
Agent context: I'll research the repository to understand its architecture and agent setup before recommending a provider.
Tool result summary:

    /home/user/worktrees/attempt-5f5cfce6-2c1d-4d7b-b853-3d923d6aad39-cap-2/requirements.client.txt
    /home/user/worktrees/attempt-5f5cfce6-2c1d-4d7b-b853-3d923d6aad39-cap-2/requirements.server.txt
Tool result summary:

    total 56
    drwxr-xr-x 5 user user 4096 Jun 6 04:50 .
    drwxr-xr-x 4 user user 4096 Jun 6 04:50 ..
    -rw-r--r-- 1 user user 61 Jun 6 04:50 .dockerignore
    -rw-r--r-- 1 user user 170 Jun 6 04:50 .git
    -rw-r--r-- 1 user user 118 Jun 6 04:50 .gitignore
    -rw-r--r-- 1 user user 282 Jun 6 04:50 Dockerfile.client
    -rw-r--r-- 1 user user 248 Jun 6 04:50 Dockerfile.server
    -rw-r--r-- 1 user user 2863 Jun 6 04:50 README.md
    drwxr-xr-x 2 use
Tool result summary:

    # AskDocs AI: AI-Powered PDF Q&A Bot

    **AskDocs AI** is an AI-powered chatbot that leverages **Hybrid RAG (Retrieval-Augmented Generation)** to answer your questions based on the content of uploaded PDFs. It combines semantic vector search with traditional keyword-based search for superior accuracy.

    <img src="./images/landing_page.png" alt="Landing Page">
    <img src="./images/chat_interface.png" alt="Chat In
Tool result summary:

    --extra-index-url https://download.pytorch.org/whl/cpu
    torch
    langchain-chroma
    langchain-community
    langchain-text-splitters
    langchain-huggingface
    langchain-groq
    langchain-classic
    fastapi
    uvicorn[standard]
    python-multipart
    python-dotenv
    sentence-transformers
    pypdf
    pydantic-settings
    rank_bm25
    sqlalchemy[asyncio]
    asyncpg
    fastapi-users[sqlalchemy]
    bcrypt
Tool result summary:

    streamlit
    requests
    python-dotenv
Tool result summary:

    total 40
    drwxr-xr-x 6 user user 4096 Jun 6 04:50 .
    drwxr-xr-x 5 user user 4096 Jun 6 04:50 ..
    -rw-r--r-- 1 user user 47 Jun 6 04:50 __init__.py
    -rw-r--r-- 1 user user 1284 Jun 6 04:50 config.py
    drwxr-xr-x 2 user user 4096 Jun 6 04:50 db
    -rw-r--r-- 1 user user 5022 Jun 6 04:50 main.py
    drwxr-xr-x 2 user user 4096 Jun 6 04:50 modules
    drwxr-xr-x 2 user user 4096 Jun 6 04:50 schemas
    drwxr-xr-x 2 user user 4096 Jun 6 04:50
Tool result summary:

    import os
    from pathlib import Path
    from pydantic_settings import BaseSettings, SettingsConfigDict
    from dotenv import load_dotenv, find_dotenv

    load_dotenv(find_dotenv(), override=True)

    BASE_DIR = Path(__file__).resolve().parent.parent

    class Settings(BaseSettings):
        PROJECT_NAME: str = "AskDocs AI API"

        # Storage
        CHROMA_DB_PATH: str = os.getenv("CHROMA_DB_PATH", "chroma_db")
        TEMP_UPLOA

Provider mentions: openai
Tool result summary:

    import os
    import shutil
    import logging
    from typing import List
    from contextlib import asynccontextmanager
    from fastapi import FastAPI, UploadFile, File, HTTPException, Depends
    from fastapi.middleware.cors import CORSMiddleware
    from fastapi.concurrency import run_in_threadpool
    from pydantic import BaseModel
    from config import settings
    from modules.vectorstore import process_pdfs, get_vectorstor
Tool result summary:

    total 28
    drwxr-xr-x 2 user user 4096 Jun 6 04:50 .
    drwxr-xr-x 6 user user 4096 Jun 6 04:50 ..
    -rw-r--r-- 1 user user 63 Jun 6 04:50 __init__.py
    -rw-r--r-- 1 user user 1736 Jun 6 04:50 auth.py
    -rw-r--r-- 1 user user 1609 Jun 6 04:50 llm.py
    -rw-r--r-- 1 user user 4917 Jun 6 04:50 vectorstore.py
Tool result summary:

    import os
    import hashlib
    import logging
    from concurrent.futures import ThreadPoolExecutor
    from langchain_chroma import Chroma
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.retrievers import BM25Retriever
    from langchain_core.documents import Document
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_huggingface import Hugging
Tool result summary:

    from langchain_groq import ChatGroq
    from langchain_classic.chains import RetrievalQA
    from langchain_classic.retrievers import EnsembleRetriever
    from config import settings
    from modules.vectorstore import get_bm25_retriever
    import logging

    logger = logging.getLogger(__name__)

    def get_llm_chain(vectorstore):
        if vectorstore is None:
            return None

        if not settings.GROQ_API_KEY:
            raise Value
Tool result summary:

    import uuid
    from typing import Optional
    from fastapi import Depends, Request
    from fastapi_users import BaseUserManager, FastAPIUsers, UUIDIDMixin
    from fastapi_users.authentication import (
        AuthenticationBackend,
        BearerTransport,
        JWTStrategy,
    )
    from fastapi_users.db import SQLAlchemyUserDatabase
    from sqlalchemy.ext.asyncio import AsyncSession

    from config import settings
    from db.databas
Tool result summary:

    import streamlit as st
    from api_client import upload_pdfs, query_ai, get_health
    from config import config
    from auth_utils import auth_sidebar

    st.set_page_config(
        page_title=config.APP_TITLE,
        page_icon="🤖",
        layout="wide"
    )

    st.markdown("""
    <style>
    .stApp {
        background-color: #0e1117;
        color: #ffffff;
    }
    .main-header {
        padding-top: 2rem;
        padding-bottom: 2rem;
        text-alig
Tool result summary:

    total 16
    drwxr-xr-x 2 user user 4096 Jun 6 04:50 .
    drwxr-xr-x 6 user user 4096 Jun 6 04:50 ..
    -rw-r--r-- 1 user user 418 Jun 6 04:50 database.py
    -rw-r--r-- 1 user user 196 Jun 6 04:50 models.py

    services:
      # --- Backend Server ---
      server:
        container_name: askdocs-server
        image: askdocs-server-image
        build:
          context: .
          dockerfile: Dockerfile.server
        ports:
          - "8000:8000"
        env_file:
          - .env
        volumes:
          - ./chroma_db:/app/chroma_db
Tool result summary:

    from fastapi_users.db import SQLAlchemyBaseUserTableUUID
    from sqlalchemy.orm import DeclarativeBase

    class Base(DeclarativeBase):
        pass

    class User(SQLAlchemyBaseUserTableUUID, Base):
        pass
Tool result summary:

    from typing import AsyncGenerator
    from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
    from config import settings

    engine = create_async_engine(settings.DATABASE_URL)
    async_session_maker = async_sessionmaker(engine, expire_on_commit=False)

    async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
        async with async_session_maker() as session:
            yield ses
Tool result summary:

    import requests
    import logging
    from config import config

    logger = logging.getLogger(__name__)

    def _handle_response(response):
        """Internal helper to handle responses safely."""
        try:
            response.raise_for_status()
            return response.json()
        except requests.exceptions.JSONDecodeError:
            logger.error(f"Failed to decode JSON from response: {response.text[:100]}")
            return {"error": f"Server returned
Tool result summary:

    26bc632 feat: Implemented auth
    ecc3a1b feat: Implemented hybrid search (BM25 + Chroma)
    dad061b Implemented concurrency in doc parsing
    40436e6 Dockerized application and split dependencies
    d1fd604 Restructured the project into client and server
    d0f96fa Updated llm model
    6b79633 Merge branch 'main' of https://github.com/Balaji-R-05/askdocs-ai
    7cde7bc Resolved module conflicts
    9ac6c6b Delete chroma_store directory
    0a555
Tool result summary:

    import sys
    import os
    import logging

    # Add server directory to path
    sys.path.append(os.path.abspath("server"))

    from modules.vectorstore import get_vectorstore, get_bm25_retriever
    from modules.llm import get_llm_chain

    logging.basicConfig(level=logging.INFO)

    def test_hybrid():
        print("[INIT] Initializing test for Hybrid Search...")
        vectorstore = get_vectorstore()
        print("[OK] Vectors
Agent context: Now I understand the repo: a Python/FastAPI + LangChain RAG app using Groq LLM with hybrid BM25 + Chroma retrieval over uploaded PDFs. Let me research managed web-search providers, focusing on source quality and licensing. Loading web tools.
Agent context: Let me research the leading candidates in parallel, focusing on those emphasizing licensed/premium sources.
Query: best web search API for LLM agents 2026 licensed sources Tavily Exa Linkup comparison (10 search result links captured)

Tool result summary:

    Based on the search results, here's a comprehensive comparison of the best web search APIs for LLM agents in 2026, focusing on Tavily, Exa, and Linkup:

    ## Overview of Top Competitors

    The best LLM search APIs in 2026 include Tavily, Brave LLM Context API, Exa, Perplexity API, and Google Programmable Search Engine, each optimized for different aspects of AI agent and RAG pipeline performance.

    ## Key Differences

    **Tavil
Query: Linkup search API licensed premium content sources LLM grounding (10 search result links captured)

Tool result summary:

    Here are the search results for Linkup's search API with licensed premium content sources for LLM grounding:

    ## Overview

    Linkup is a French startup building an API that lets developers access web content from premium, trusted sources and hand the results to a large language model (LLM) to enrich its answers.

    ## How It Works

    This workflow is called Retrieval-Augmented Generation (or RAG). Linkup signs content licensin
Query: Valyu search API licensed proprietary data LLM RAG (6 search result links captured)

Tool result summary:

    Based on the search results, here's what I found about Valyu's search API:

    ## Overview

    Valyu provides an AI search API that gives access to web search and 36+ proprietary data sources, including PubMed, SEC filings, clinical trials, arXiv, patents, and real-time financial data through one unified API.

    ## Key Features for LLM and RAG Applications

    Valyu is a multimodal retrieval API designed to enrich AI's context with
Agent context: Both Linkup and Valyu emphasize licensed sources. Let me consult their actual docs plus Tavily's, focusing on LangChain integration (the repo uses LangChain's EnsembleRetriever).
Fetch prompt: Summarize Linkup's search API: what sources/content it covers (licensed publishers?), endpoints, output formats (sourced answers, raw search results), depth options, and whether there is a LangChain integration or Python SDK. Note any details about source quality and licensing.

Tool result summary:

    # Linkup Search API Summary

    **Coverage & Sources**
    The documentation doesn't specify licensed publishers or content sources. It describes the API as optimized "for AI consumption" with agentic retrieval capabilities, but doesn't detail source licensing or publisher agreements.

    **Endpoints**
    Linkup offers a single `/search` endpoint with configurable parameters for domain filtering, date ranges, result limits, and ima
Fetch prompt: Summarize Valyu's search API: what data sources it covers (licensed/proprietary, academic, paywalled), endpoints, output formats, whether it returns full content, pricing model, and whether there is a LangChain integration or Python SDK. Note details about source quality and licensing.

Tool result summary:

    # Valyu Search API Summary

    **Data Sources:** Valyu covers web search, 36+ proprietary sources including PubMed, SEC filings, clinical trials, arXiv, patents, and real-time financial data. It includes academic papers, textbooks, journals, peer-reviewed medical literature, and FDA drug labels from licensed providers.

    **Endpoints & Output:** The documentation mentions a `/v1/search` endpoint. Code examples show results
Fetch prompt: Summarize Tavily's LangChain integration: what retriever/tool classes are available, how to use them in a LangChain RAG pipeline, and what sources Tavily searches (general web vs licensed). Note source quality details.

Tool result summary:

    # Tavily's LangChain Integration Summary

    ## Available Tool Classes

    The `langchain-tavily` package provides six primary tools:

    1. **TavilySearch**: Executes web searches with customizable parameters like `max_results`, `topic` (general/news/finance), `search_depth`, and date filtering.
    2. **TavilyExtract**: Retrieves content from specified URLs with configurable extraction depth.
    3. **TavilyMap**: "Discovers all inter
Agent context: Let me verify Linkup's LangChain integration before recommending it, since the docs excerpt didn't confirm it.
Query: langchain-linkup LinkupSearchRetriever python package LangChain integration (10 search result links captured)

Tool result summary:

    Based on the search results, here's what I found about the langchain-linkup package and LinkupSearchRetriever:

    ## Overview

    Linkup provides an API to connect LLMs to the web and the Linkup Premium Partner sources. The LinkupSearchRetriever lives in the langchain-linkup package, which is the official LangChain integration for the Linkup API.

    ## Key Features

    The LinkupSearchRetriever makes available the Linkup API searc
Agent context:

## Recommendation: **Linkup**

For this repo, a LangChain RAG app (`server/modules/llm.py`) that already fuses two retrievers in an `EnsembleRetriever` (BM25 + Chroma) and serves general document Q&A via Groq, **Linkup** is the best fit, primarily because of your stated priority: licensed, reputable sources over scraped SERP results.

**Why Linkup:**

- **Licensed by design, not scraped.** Linkup signs content-licensing deals with publishers and pulls directly from their CMS (paying partners per access), with no scraping. This is exactly the "licensed/reputable, not generic search-engine results" requirement, and it tops factual-accuracy benchmarks (OpenAI SimpleQA).
- **Drop-in LangChain fit.** The official `langchain-linkup` package exposes `LinkupSearchRetriever`, a standard LangChain retriever you add as a third leg of your existing `EnsembleRetriever` (set `weights=[bm25, chroma, web]`), plus `LinkupSearchTool`. Minimal change to `get_llm_chain()`; configure a `LINKUP_API_KEY` env var alongside `GROQ_API_KEY` in `server/config.py`.
- **Output modes** (`searchResults` for raw retrieval, `sourcedAnswer` with citations) map cleanly to your existing "answer + sources" response shape in `main.py:query_docs`.

**Main alternatives considered:**

- **Valyu**: strong runner-up; 36+ licensed/proprietary sources (PubMed, SEC, arXiv), returns full content (great for RAG), has a LangChain integration. Better *if* you need academic/finance/medical depth, but its strength is verticals rather than broad reputable web grounding.
- **Tavily**: the agent-search default and easiest LangChain integration, but it's general web (no publisher licensing), and was acquired by Nebius (Feb 2026), adding roadmap uncertainty.
- **Exa**: excellent neural/semantic search, but general web, not licensed.
- **Serper / SerpAPI / Brave**: scraped or independent SERP wrappers; explicitly the "generic search-engine results" category you want to avoid (Brave is privacy-focused but still general web).

**Docs I actually consulted:**

- Linkup API docs: Concepts page (endpoints, depth modes, output formats), plus the `langchain-linkup` LangChain retriever docs, PyPI, and GitHub.
- Valyu docs: Overview (sources, LangChain integration, benchmarks).
- Tavily docs: LangChain integration page (tool classes, RAG usage).

**Why Linkup over the others:** Valyu, Tavily, and Exa all either lean into a vertical or rely on general/scraped web; Linkup is the only one whose core product is *broad, legally-licensed publisher content for LLM grounding*, while still slotting into your LangChain `EnsembleRetriever` with near-zero architectural change.

Sources:

- [Best AI Search APIs for Agents 2026 (NomadLab)](https://nomadlab.cc/blog/2026/05/best-ai-search-apis-2026-tavily-exa-serper-firecrawl)
- [Linkup connects LLMs with premium content sources legally (TechCrunch)](https://techcrunch.com/2024/11/28/linkup-connects-llms-with-premium-content-sources-legally/)
- [Linkup API Concepts](https://docs.linkup.so/pages/documentation/get-started/concepts)
- [LinkupSearchRetriever (LangChain docs)](https://python.langchain.com/docs/integrations/retrievers/linkup_search/)
- [langchain-linkup (PyPI)](https://pypi.org/project/langchain-linkup/)
- [Valyu Documentation Overview](https://docs.valyu.ai/overview)
- [Valyu Search API](https://www.valyu.ai/search-api)

Want me to wire `LinkupSearchRetriever` into the `EnsembleRetriever` and add the `LINKUP_API_KEY` config?
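The "third leg" suggestion (`weights=[bm25, chroma, web]`) depends on how an ensemble merges several ranked lists into one. As a rough, standard-library-only sketch of that idea, weighted Reciprocal Rank Fusion could look like the block below. This is not the repo's code and not LangChain's implementation: the function name `weighted_rrf`, the document IDs, the `0.4/0.4/0.2` weights, and the constant `c=60` are illustrative assumptions. `EnsembleRetriever` is understood to use a similar weighted-RRF scheme, but that should be checked against the installed version.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document earns weight / (rank + c) from every list it appears in,
    where rank starts at 1. Higher total score means a better fused rank.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")

    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (rank + c)

    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hits from the three legs (IDs are made up for illustration).
bm25_hits = ["pdf:12", "pdf:7", "pdf:3"]
chroma_hits = ["pdf:7", "pdf:12", "pdf:9"]
web_hits = ["web:reuters-1", "pdf:7"]

fused = weighted_rrf([bm25_hits, chroma_hits, web_hits], weights=[0.4, 0.4, 0.2])
print(fused)
```

With a low web weight, a web-only hit ranks below documents that the PDF retrievers agree on, which may be the desired default when uploaded PDFs are the primary source; raising the third weight shifts the balance toward live web results.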
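For the `LINKUP_API_KEY` setting mentioned in the recommendation, one possible shape is sketched below using only the standard library. It is an illustration under stated assumptions, not the repo's configuration: `server/config.py` uses `pydantic_settings`, and the names `SearchSettings`, `load_search_settings`, and `web_search_enabled` are hypothetical. The sketch treats the Groq key as required and the Linkup key as optional, so a PDF-only deployment would keep working without web search.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SearchSettings:
    """Hypothetical settings holder; not the repo's Settings class."""

    groq_api_key: str
    linkup_api_key: Optional[str] = None

    @property
    def web_search_enabled(self) -> bool:
        # Web retrieval is only switched on when a Linkup key is present.
        return bool(self.linkup_api_key)


def load_search_settings(env: Optional[Mapping[str, str]] = None) -> SearchSettings:
    """Read API keys from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env

    groq_key = env.get("GROQ_API_KEY", "").strip()
    if not groq_key:
        raise ValueError("GROQ_API_KEY is required")

    # An empty or missing Linkup key is normalized to None (web search off).
    linkup_key = env.get("LINKUP_API_KEY", "").strip() or None
    return SearchSettings(groq_api_key=groq_key, linkup_api_key=linkup_key)


# Placeholder key values, for illustration only.
pdf_only = load_search_settings({"GROQ_API_KEY": "gsk-example"})
with_web = load_search_settings(
    {"GROQ_API_KEY": "gsk-example", "LINKUP_API_KEY": "lk-example"}
)
print(pdf_only.web_search_enabled, with_web.web_search_enabled)
```

Passing the environment in as a mapping keeps the loader easy to test; in the actual app the same two fields would more naturally live on the existing `Settings` class.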