Balaji-R-05/askdocs-ai
Add live web search/retrieval to this repository so its agent can ground its answers in current web information. Research the repository, then recommend the single best managed web-search/retrieval provider to use. The agent needs access to high-quality, reputable, and licensed sources rather than generic scraped search-engine results, so weigh source coverage and quality heavily. Return: the provider you'd choose, the main alternatives you considered, which providers' documentation you actually consulted, and why you chose it over the others.
P3
26bc632a0bed22f035740edd34ee3e1fc2420e9f
Agent context: I’ll inspect the repo first to understand the agent stack and integration constraints, then I’ll verify provider docs and current offerings before making a recommendation.
Agent context: I’ll inspect the repo first to understand the agent stack and integration constraints, then I’ll verify provider docs and current offerings before making a recommendation. Provider mentions: openai
Agent context: I’ll inspect the repo first to understand the agent stack and integration constraints, then I’ll verify provider docs and current offerings before making a recommendation. exit code 127 Output: /bin/bash: line 1: rg: command not found Provider mentions: openai
Agent context: I’ll inspect the repo first to understand the agent stack and integration constraints, then I’ll verify provider docs and current offerings before making a recommendation. exit code 0 Output: /home/user/worktrees/attempt-3e7e0617-5c47-4443-b573-4590bb2056d1-cap-2 /bin/bash: line 1: rg: command not found
Agent context: I’ll inspect the repo first to understand the agent stack and integration constraints, then I’ll verify provider docs and current offerings before making a recommendation. exit code 0 Output: ./README.md ./requirements.client.txt ./requirements.server.txt
Agent context: `rg` is not installed in this environment, so I’m falling back to `find`/`grep`. I’ve only confirmed this is a small Python-looking repo so far; next I’m reading the top-level docs and source layout.
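The fallback described above can be made explicit in a small helper. This is a minimal sketch, not something taken from the session: the function name `build_search_command` is hypothetical, and it assumes a `grep` that supports `-r` (GNU and BSD `grep` do).

```python
import shutil


def build_search_command(pattern: str, root: str = ".") -> list[str]:
    """Return an argv list that searches `root` recursively for `pattern`.

    Prefers ripgrep when it is on PATH and otherwise falls back to grep.
    """
    if shutil.which("rg"):
        # ripgrep searches directories recursively by default.
        return ["rg", "--line-number", "-e", pattern, root]
    # Fallback: recursive grep with line numbers.
    return ["grep", "-rn", "-e", pattern, root]
```

Passing the result to `subprocess.run(cmd, capture_output=True, text=True)` avoids shell quoting issues, and an exit code of 127 (as in the log above) never occurs because the binary is checked first.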
Agent context: `rg` is not installed in this environment, so I’m falling back to `find`/`grep`. I’ve only confirmed this is a small Python-looking repo so far; next I’m reading the top-level docs and source layout. exit code 0 Output: .dockerignore .git .gitignore Dockerfile.client Dockerfile.server README.md client/__init__.py client/api_client.py client/auth_utils.py client/config.py client/main.py compose.yaml images/chat_interface.png images/db_indexed.png images/landing_page.png requirements.client.txt requirements.server.txt server/__init__.py server/config.py server/db/database.py server/db/models.py server/main.py server/modules/__init__.py server/modules/auth.py server/modules/llm.py server/modules/vectorstore.py server/schemas/auth.py server/tests/test_hybrid_search.py
Agent context: `rg` is not installed in this environment, so I’m falling back to `find`/`grep`. I’ve only confirmed this is a small Python-looking repo so far; next I’m reading the top-level docs and source layout. exit code 0 Output: # AskDocs AI: AI-Powered PDF Q&A Bot **AskDocs AI** is an AI-powered chatbot that leverages **Hybrid RAG (Retrieval-Augmented Generation)** to answer your questions based on the content of uploaded PDFs. It combines semantic vector search with traditional keyword-based search for superior accuracy. <img src="./images/landing_page.png" alt="Landing Page"> <img src="./images/chat_interface.png" alt="Chat Interface"> ## Key Features - **Hybrid Search**: Combines **ChromaDB** (semantic) and **BM25** (keyword) retrieval. - **LLM Powered**: High-performance LLM via Groq Cloud. - **Async Processing**: PDF ingestion and indexing are offloaded to background threads. - **Multimodal Support**: Optimized for PDF extraction and processing. ## Tech Stack - **Backend:** FastAPI, LangChain (Classic), ChromaDB, Groq Cloud - **Frontend:** Streamlit - **Search Engines:** BM25 (Keyword), Vector (Cosine Simi
Agent context: `rg` is not installed in this environment, so I’m falling back to `find`/`grep`. I’ve only confirmed this is a small Python-looking repo so far; next I’m reading the top-level docs and source layout. Provider mentions: openai
Agent context: `rg` is not installed in this environment, so I’m falling back to `find`/`grep`. I’ve only confirmed this is a small Python-looking repo so far; next I’m reading the top-level docs and source layout. exit code 0 Output: ./README.md:3:**AskDocs AI** is an AI-powered chatbot that leverages **Hybrid RAG (Retrieval-Augmented Generation)** to answer your questions based on the content of uploaded PDFs. It combines semantic vector search with traditional keyword-based search for superior accuracy. ./README.md:10:- **Hybrid Search**: Combines **ChromaDB** (semantic) and **BM25** (keyword) retrieval. ./README.md:29:| `HYBRID_SEARCH_BM25_WEIGHT` | Weight for keyword search (0.0 to 1.0) | `0.5` | ./README.md:30:| `HYBRID_SEARCH_CHROMA_WEIGHT` | Weight for semantic search (0.0 to 1.0) | `0.5` | ./README.md:37:- **Persistent Memory**: Uses Docker volumes to persist the ChromaDB vector store and uploaded files. ./README.md:84:python server/tests/test_hybrid_search.py ./client/main.py:63: <strong>AskDocs AI</strong> is an AI-powered chatbot that leverages <strong>RAG (Retrieval-Augmented Generation)</strong> to answe Provider mentions: openai
Agent context: `rg` is not installed in this environment, so I’m falling back to `find`/`grep`. I’ve only confirmed this is a small Python-looking repo so far; next I’m reading the top-level docs and source layout. exit code 0 Output: --extra-index-url https://download.pytorch.org/whl/cpu torch langchain-chroma langchain-community langchain-text-splitters langchain-huggingface langchain-groq langchain-classic fastapi uvicorn[standard] python-multipart python-dotenv sentence-transformers pypdf pydantic-settings rank_bm25 sqlalchemy[asyncio] asyncpg fastapi-users[sqlalchemy] bcrypt streamlit requests python-dotenv
Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase.
Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase. exit code 0 Output: from langchain_groq import ChatGroq from langchain_classic.chains import RetrievalQA from langchain_classic.retrievers import EnsembleRetriever from config import settings from modules.vectorstore import get_bm25_retriever import logging logger = logging.getLogger(__name__) def get_llm_chain(vectorstore): if vectorstore is None: return None if not settings.GROQ_API_KEY: raise ValueError("GROQ_API_KEY is not set in environment or config.") llm = ChatGroq( groq_api_key=settings.GROQ_API_KEY, model_name=settings.LLM_MODEL, temperature=0.3, ) # 1. Semantic Retriever (Vector) chroma_retriever = vectorstore.as_retriever(search_kwargs={"k": 3}) # 2. Keyword Retriever (BM25) bm25_retriever = get_bm25_retriever(vectorstore) # 3. Ensemble Retriever (Hybrid) if bm25_retriever: logger.info(f"[HYBRID] Initializing Hybrid Search (BM25: {settings.HYBRID_SEARCH_BM25_WEIGHT}, Chroma: {settings.HYBRID_SEA
Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase. exit code 0 Output: import os import shutil import logging from typing import List from contextlib import asynccontextmanager from fastapi import FastAPI, UploadFile, File, HTTPException, Depends from fastapi.middleware.cors import CORSMiddleware from fastapi.concurrency import run_in_threadpool from pydantic import BaseModel from config import settings from modules.vectorstore import process_pdfs, get_vectorstore from modules.llm import get_llm_chain from db.database import engine from db.models import Base, User from modules.auth import fastapi_users, auth_backend, current_active_user from schemas.auth import UserRead, UserCreate, UserUpdate logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s') logger = logging.getLogger(__name__) @asynccontextmanager async def lifespan(app: FastAPI): """Lifecycle events for the FastAPI application.""" logger.info("🛠️ Star
Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase. exit code 0 Output: import os import hashlib import logging from concurrent.futures import ThreadPoolExecutor from langchain_chroma import Chroma from langchain_community.document_loaders import PyPDFLoader from langchain_community.retrievers import BM25Retriever from langchain_core.documents import Document from langchain_text_splitters import RecursiveCharacterTextSplitter from langchain_huggingface import HuggingFaceEmbeddings from config import settings logger = logging.getLogger(__name__) def file_hash(file_path: str) -> str: """Generate a unique hash for file contents (duplicate detection).""" with open(file_path, "rb") as f: return hashlib.md5(f.read()).hexdigest() def load_single_pdf(path): """Worker function to load a single PDF.""" try: logger.info(f"📄 Starting to parse: {os.path.basename(path)}") loader = PyPDFLoader(path) file_docs = loader.load() if not file_docs: logger.warning(f"⚠️ No text e
Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase. exit code 0 Output: import os from pathlib import Path from pydantic_settings import BaseSettings, SettingsConfigDict from dotenv import load_dotenv, find_dotenv load_dotenv(find_dotenv(), override=True) BASE_DIR = Path(__file__).resolve().parent.parent class Settings(BaseSettings): PROJECT_NAME: str = "AskDocs AI API" # Storage CHROMA_DB_PATH: str = os.getenv("CHROMA_DB_PATH", "chroma_db") TEMP_UPLOADS_DIR: str = os.getenv("TEMP_UPLOADS_DIR", "temp_uploads") # Database & Security DATABASE_URL: str = os.getenv("DATABASE_URL", "postgresql+asyncpg://postgres:postgres@db:5432/askdocs") SECRET_KEY: str = os.getenv("SECRET_KEY", "CHANGEME_SECRET_KEY_12345") # LLM GROQ_API_KEY: str | None = os.getenv("GROQ_API_KEY", "") LLM_MODEL: str = os.getenv("LLM_MODEL", "openai/gpt-oss-120b") # Throughput Settings MAX_WORKERS: int = os.cpu_count() or 4 # Hybrid Search Weights HYBRID_SEARCH_BM25_WEIGHT: float = float(os.gete Provider mentions: openai
Query/prompt: Linkup search API licensed sources documentation retrieval API AI trusted sources Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase. Expanded queries: Linkup search API licensed sources documentation retrieval API AI trusted sources | Tavily API documentation search AI answers source coverage | Exa API documentation search contents web search neural sources | Brave Search API documentation data rights independent web search API
Query/prompt: https://docs.linkup.so/ Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase.
Query/prompt: Linkup premium sources licensed content partners trusted sources AI search API Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase. Expanded queries: Linkup premium sources licensed content partners trusted sources AI search API | Linkup API licensed sources premium content trusted sources documentation | Linkup announces premium content partners licensed sources AI search
Query/prompt: OpenAI API web search tool documentation sources citations search context size Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase. Expanded queries: OpenAI API web search tool documentation sources citations search context size | Perplexity Sonar API documentation search citations web search | Google Vertex AI grounding with Google Search documentation citations Provider mentions: openai
Query/prompt: https://docs.perplexity.ai/docs/sonar/quickstart Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase.
Query/prompt: https://docs.linkup.so/pages/documentation/endpoints/search/overview Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase.
Query/prompt: https://docs.linkup.so/pages/sdk/js/js Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase.
Query/prompt: https://docs.linkup.so/pages/integrations/langchain Agent context: The repo is a FastAPI + Streamlit PDF RAG app using LangChain Classic, Chroma, BM25, and ChatGroq. I’m checking the query chain shape now so the provider choice accounts for how web retrieval would actually plug into this codebase.
Agent context: The important repo constraint is that the current answer path is a LangChain `RetrievalQA` chain over PDF chunks, not a general tool-calling agent. A good web provider here should work as a retriever/context source that can be composed with the existing Chroma/BM25 retrieval and ChatGroq answer generation.
Agent context: The important repo constraint is that the current answer path is a LangChain `RetrievalQA` chain over PDF chunks, not a general tool-calling agent. A good web provider here should work as a retriever/context source that can be composed with the existing Chroma/BM25 retrieval and ChatGroq answer generation. exit code 0 Output: 109 @app.post("/query", response_model=QueryResponse) 110 async def query_docs( 111 request: QueryRequest, 112 user: User = Depends(current_active_user) 113 ): 114 """Query the vectorstore using an LLM.""" 115 if not settings.GROQ_API_KEY: 116 raise HTTPException(status_code=500, detail="GROQ_API_KEY is missing. Check server logs.") 117 118 vectorstore = get_vectorstore() 119 if not vectorstore: 120 raise HTTPException(status_code=400, detail="No documents indexed. Please upload PDFs first.") 121 122 try: 123 chain = get_llm_chain(vectorstore) 124 if not chain: 125 raise HTTPException(status_code=500, detail="Failed to initialize LLM chain.") 126 127 # RetrievalQA invoke can also be heavy, offloading to threadpool 128 result = await run_in_threadpool(chain.invoke, {"query": request.query}) 129 130 answer = result["result"] 131 sources = [doc.metadata.get("source", "Unknown") for doc in r
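The `/query` handler above builds `sources` from each document's `metadata["source"]`. If web results are later mixed into the same chain, that list would contain both PDF paths and URLs. The following is a minimal sketch of how the two could be separated and de-duplicated; `Doc` is a hypothetical stand-in for a retrieved document, and `collect_sources` and the `kind` field are illustrative names that do not exist in the repo.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def collect_sources(docs: list[Doc]) -> list[dict]:
    """Return unique sources in first-seen order, labelled as 'pdf' or 'web'."""
    seen: set[str] = set()
    sources: list[dict] = []
    for doc in docs:
        source = str(doc.metadata.get("source", "Unknown"))
        if source in seen:
            continue  # several chunks often come from the same file or page
        seen.add(source)
        kind = "web" if source.startswith(("http://", "https://")) else "pdf"
        sources.append({"source": source, "kind": kind})
    return sources
```

Keeping first-seen order means the highest-ranked chunk decides where a source appears in the response.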
Agent context: The important repo constraint is that the current answer path is a LangChain `RetrievalQA` chain over PDF chunks, not a general tool-calling agent. A good web provider here should work as a retriever/context source that can be composed with the existing Chroma/BM25 retrieval and ChatGroq answer generation. exit code 0 Output: 1 from langchain_groq import ChatGroq 2 from langchain_classic.chains import RetrievalQA 3 from langchain_classic.retrievers import EnsembleRetriever 4 from config import settings 5 from modules.vectorstore import get_bm25_retriever 6 import logging 7 8 logger = logging.getLogger(__name__) 9 10 def get_llm_chain(vectorstore): 11 if vectorstore is None: 12 return None 13 14 if not settings.GROQ_API_KEY: 15 raise ValueError("GROQ_API_KEY is not set in environment or config.") 16 17 llm = ChatGroq( 18 groq_api_key=settings.GROQ_API_KEY, 19 model_name=settings.LLM_MODEL, 20 temperature=0.3, 21 ) 22 23 # 1. Semantic Retriever (Vector) 24 chroma_retriever = vectorstore.as_retriever(search_kwargs={"k": 3}) 25 26 # 2. Keyword Retriever (BM25) 27 bm25_retriever = get_bm25_retriever(vectorstore) 28 29 # 3. Ensemble Retriever (Hybrid) 30 if bm25_retriever: 31 logger.info(f"[HYBRID] Initializing Hyb
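The hybrid step in `get_llm_chain()` combines ranked lists from several retrievers using per-retriever weights. A common way to do that, and to my understanding the approach LangChain's `EnsembleRetriever` takes, is weighted Reciprocal Rank Fusion. The sketch below illustrates the idea with plain document IDs; the `web` list and the weight values are assumptions added for illustration, not settings from the repo.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    c: int = 60,
) -> list[str]:
    """Fuse ranked ID lists: each hit adds weight / (rank + c) to its score."""
    scores: dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights[name]
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; ties keep first-seen order (sort is stable).
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


fused = weighted_rrf(
    rankings={
        "chroma": ["a", "b", "c"],
        "bm25": ["b", "d"],
        "web": ["w1", "b"],  # hypothetical third retriever
    },
    weights={"chroma": 0.4, "bm25": 0.4, "web": 0.2},
)
print(fused)  # ['b', 'a', 'd', 'c', 'w1']
```

Because scores depend only on rank, a web retriever can be fused with Chroma and BM25 even though their raw relevance scores are not comparable.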
Agent context: The important repo constraint is that the current answer path is a LangChain `RetrievalQA` chain over PDF chunks, not a general tool-calling agent. A good web provider here should work as a retriever/context source that can be composed with the existing Chroma/BM25 retrieval and ChatGroq answer generation. exit code 0 Output: 1 --extra-index-url https://download.pytorch.org/whl/cpu 2 torch 3 langchain-chroma 4 langchain-community 5 langchain-text-splitters 6 langchain-huggingface 7 langchain-groq 8 langchain-classic 9 fastapi 10 uvicorn[standard] 11 python-multipart 12 python-dotenv 13 sentence-transformers 14 pypdf 15 pydantic-settings 16 rank_bm25 17 sqlalchemy[asyncio] 18 asyncpg 19 fastapi-users[sqlalchemy] 20 bcrypt
Query/prompt: https://techcrunch.com/2024/11/28/linkup-connects-llms-with-premium-content-sources-legally/ Agent context: The important repo constraint is that the current answer path is a LangChain `RetrievalQA` chain over PDF chunks, not a general tool-calling agent. A good web provider here should work as a retriever/context source that can be composed with the existing Chroma/BM25 retrieval and ChatGroq answer generation.
**Recommendation**

Choose **Linkup**.

This repo’s backend is already a LangChain RAG app: `server/modules/llm.py` builds a `RetrievalQA` chain over Chroma + BM25 with ChatGroq, and `/query` returns source documents from that chain. Linkup fits that shape best because it can be used as a LangChain retriever, has a Python SDK, supports search, fetch, sourced answers, structured output, and source/date filtering, and can return raw retrieval context instead of forcing you into a replacement answer model. See [server/modules/llm.py](/home/user/worktrees/attempt-3e7e0617-5c47-4443-b573-4590bb2056d1-cap-2/server/modules/llm.py:17) and [server/main.py](/home/user/worktrees/attempt-3e7e0617-5c47-4443-b573-4590bb2056d1-cap-2/server/main.py:109).

The decisive factor is source quality/licensing. Linkup’s docs emphasize production AI search, source customization, security, and endpoints for Search/Fetch/Research. Its Search endpoint returns ranked sources, sourced answers, or structured output, with domain/date controls. Its Fetch endpoint returns clean LLM-ready markdown. Its LangChain docs show `LinkupSearchRetriever`, which is directly compatible with this repo’s stack. ([techcrunch.com](https://techcrunch.com/2024/11/28/linkup-connects-llms-with-premium-content-sources-legally/)) ([docs.linkup.so](https://docs.linkup.so/pages/integrations/langchain))

I also found the strongest explicit licensed-source claim for Linkup outside its docs: TechCrunch reports that Linkup connects developers to premium/trusted content, signs licensing deals with publishers, avoids scraping for those partners, and pays content partners based on access. ([techcrunch.com](https://techcrunch.com/2024/11/28/linkup-connects-llms-with-premium-content-sources-legally/))

**Main Alternatives Considered**

- **Tavily**: Strong agent-oriented search/extract/research API, fast, production-focused, and Groq/LangChain-friendly. I would pick it if the priority were developer ergonomics and broad web retrieval, but its docs read more like high-quality live web retrieval than a licensed/premium-source network.
- **Exa**: Excellent semantic/neural search and useful content categories like company, people, research papers, news, and financial reports. Good for discovery, less compelling than Linkup on licensed/premium-source positioning.
- **Brave Search API**: Good independent web index and not a Google/Bing scraper, but its docs explicitly say the API does not grant rights to third-party webpage content. That is a poor fit for the user’s licensed-source requirement.
- **Perplexity Sonar**: Web-grounded answers with citations and OpenAI-compatible usage, but it is more of a search-answer model than a clean retrieval provider to compose with the existing Groq/LangChain RAG chain.
- **Google Grounding / OpenAI Web Search**: Strong built-in grounding options, but they tie retrieval to those providers’ model APIs rather than acting as a neutral retrieval layer for this Groq-based app. Google’s grounding docs describe public web data and display/terms requirements; OpenAI’s tool is integrated into Responses/Chat Completions with citation annotations.

**Docs Actually Consulted**

Linkup official docs, including Search, Fetch, Source Filtering, Python SDK, Pricing, and LangChain integration; Tavily official site/API docs; Exa official docs; Brave Search API docs; Perplexity Sonar docs; Google Cloud Grounding with Google Search docs; OpenAI Web Search docs.

I would integrate Linkup first as a separate web retriever alongside the existing PDF retriever, then return both PDF and web citations in the `/query` response. For this repo that likely means adding `linkup-sdk` or `langchain-linkup`, a `LINKUP_API_KEY`, and a web-retrieval branch in `get_llm_chain()` rather than replacing the current PDF RAG pipeline.

Provider mentions: openai