Powered by Exa.ai - The Search Engine for AI Applications
Websets News Monitor is an open-source demo application that monitors the web for articles about specific topics using Exa's Websets API. It builds a curated news feed for each webset, receives daily updates through webhooks, and filters out duplicate stories.
For more details on how the key functionality works, see the documentation.
- Frontend: Next.js, TypeScript, Tailwind CSS
- Backend: Next.js API Routes, Prisma ORM
- Database: PostgreSQL with pgvector extension
- APIs: Exa Websets API, OpenAI API
- Hosting: Vercel (frontend), Neon (database)
- Node.js 18+
- PostgreSQL database URL with pgvector (e.g., from Neon)
- Exa API key (required)
- OpenAI API key (required)
- Clone the repository:
git clone https://github.com/exa-labs/websets-news-monitor.git
cd websets-news-monitor
- Install dependencies:
npm install
- Copy .env.example and fill in the environment variables:
cp .env.example .env.local
- Set up the database:
npx prisma generate
npx prisma db push
- Create websets and webhooks:
npm run setup:websets
npm run setup:webhook
- Start the development server:
npm run dev
Visit http://localhost:3000/websets-news-monitor to see your news monitor!
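If the setup scripts fail, an unset environment variable is a likely cause. The following is a minimal sketch of a pre-flight check. The variable names (`DATABASE_URL`, `EXA_API_KEY`, `OPENAI_API_KEY`) and the `findMissingEnvVars` helper are assumptions for illustration and are not taken from this repository; check .env.example for the names the project actually uses.

```typescript
// Hypothetical variable names: confirm them against .env.example.
const REQUIRED_ENV_VARS = ["DATABASE_URL", "EXA_API_KEY", "OPENAI_API_KEY"] as const;

// Returns the names of required variables that are unset or blank,
// in the same order as the `required` list.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Node.js script this could be called as `findMissingEnvVars(process.env)`, exiting early with a readable message if the returned list is not empty.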
Customize the news monitors by editing scripts/setup-websets.js:
const websets = [
  {
    name: "Startup Funding",
    query: "Startups that raised a funding round in the last week",
    criteria: [
      {
        description: "Article is about a startup raising a funding round of at least $1M",
      },
      {
        description: "Article published in a top 20 tech publication (TechCrunch, The Verge, Wired, etc.)",
      },
      {
        description: "Article was published in the last week",
      },
    ],
  },
  // Additional websets like "Futurism", "Memes", "Hot Takes", "Product Launches", "Positive"...
];
After modifying, run the setup scripts again to apply your changes.
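Before re-running the setup scripts, it can help to sanity-check the edited list. The sketch below types the webset shape as it appears in the example above and reports obvious mistakes. The `WebsetConfig` and `Criterion` interfaces and the `validateWebsets` helper are hypothetical, and the rules it checks (unique names, non-empty query, at least one criterion) are reasonable guesses rather than documented requirements of the Exa Websets API.

```typescript
// Shape inferred from the example in scripts/setup-websets.js;
// the real API may accept more fields than these.
interface Criterion {
  description: string;
}

interface WebsetConfig {
  name: string;
  query: string;
  criteria: Criterion[];
}

// Returns human-readable problems; an empty array means nothing obvious is wrong.
function validateWebsets(websets: WebsetConfig[]): string[] {
  const problems: string[] = [];
  const seenNames = new Set<string>();

  websets.forEach((webset, index) => {
    const name = webset.name.trim();
    const label = name === "" ? `webset #${index}` : `webset "${name}"`;

    if (name === "") {
      problems.push(`${label}: name is empty`);
    } else if (seenNames.has(name)) {
      problems.push(`${label}: duplicate name`);
    }
    seenNames.add(name);

    if (webset.query.trim() === "") {
      problems.push(`${label}: query is empty`);
    }
    if (webset.criteria.length === 0) {
      problems.push(`${label}: no criteria`);
    }
    webset.criteria.forEach((criterion, i) => {
      if (criterion.description.trim() === "") {
        problems.push(`${label}: criterion #${i} has an empty description`);
      }
    });
  });

  return problems;
}
```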
To reset your websets in the Exa API and database:
npm run delete:websets
The application uses an intelligent deduplication system to prevent showing duplicate or highly similar news stories. The deduplication logic is implemented in src/lib/dedupe.ts and works as follows:
- Embedding Generation: When a new article arrives via webhook (src/app/api/webhook/route.ts), the article title is converted to an embedding vector using OpenAI's embedding model.
- Vector Similarity Search: Using PostgreSQL's pgvector extension, the system performs a similarity search to find articles from the past 7 days with similar embeddings within the same webset.
- Semantic Analysis: The top 10 similar articles are then analyzed by an LLM (GPT-4.1-mini) to determine if the new article is truly a duplicate, considering:
  - Same event or topic coverage
  - Different wording but same story
  - User experience in a news aggregator
- Duplicate Prevention: If determined to be a duplicate, the article is skipped and not inserted into the database, keeping your news feeds clean and focused.
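To make the vector similarity step concrete, here is a sketch of how such a query might be built. It is not the code in src/lib/dedupe.ts. The table and column names (`"Article"`, `embedding`, `"websetId"`, `"publishedAt"`) and the helper names are hypothetical, and the use of cosine distance (pgvector's `<=>` operator) is an assumption, since pgvector also provides `<->` for L2 distance. The `$1`-style placeholders follow the PostgreSQL parameter convention.

```typescript
interface SimilarityQuery {
  text: string;
  values: [string, string, number];
}

// pgvector accepts a vector as a text literal such as "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Builds a parameterized query for the nearest recent articles in one webset.
// Table and column names are hypothetical placeholders.
function buildSimilarityQuery(
  embedding: number[],
  websetId: string,
  limit = 10,
): SimilarityQuery {
  const text = `
    SELECT id, title, embedding <=> $1::vector AS distance
    FROM "Article"
    WHERE "websetId" = $2
      AND "publishedAt" >= NOW() - INTERVAL '7 days'
    ORDER BY embedding <=> $1::vector
    LIMIT $3`;
  return { text, values: [toVectorLiteral(embedding), websetId, limit] };
}
```

Passing the embedding as a parameter rather than interpolating it into the SQL string keeps the query safe to reuse with any client that supports positional parameters.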
This two-stage approach (vector similarity + LLM verification) limits the LLM check to at most 10 candidates per incoming article, so deduplication stays accurate without comparing each new article against every stored one.
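The overall flow can be sketched in memory, without a database or API calls. This is an illustration under stated assumptions and not the repository's implementation: the `StoredArticle` shape, the function names, and the injected `judge` callback (which stands in for the LLM call) are all hypothetical, and cosine similarity is assumed as the similarity measure.

```typescript
interface StoredArticle {
  id: string;
  websetId: string;
  title: string;
  embedding: number[];
  publishedAt: Date;
}

// Stand-in for the LLM verification step; resolves to true for a duplicate.
type DuplicateJudge = (newTitle: string, candidateTitles: string[]) => Promise<boolean>;

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stage 1: same webset, past 7 days, most similar first, capped at `limit`.
function findCandidates(
  incoming: { websetId: string; embedding: number[] },
  stored: StoredArticle[],
  now: Date,
  limit = 10,
): StoredArticle[] {
  return stored
    .filter(
      (article) =>
        article.websetId === incoming.websetId &&
        now.getTime() - article.publishedAt.getTime() <= SEVEN_DAYS_MS,
    )
    .map((article) => ({
      article,
      score: cosineSimilarity(incoming.embedding, article.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((scored) => scored.article);
}

// Stage 2: only the shortlisted candidates are passed to the judge.
async function isDuplicate(
  incoming: { websetId: string; title: string; embedding: number[] },
  stored: StoredArticle[],
  judge: DuplicateJudge,
  now: Date = new Date(),
): Promise<boolean> {
  const candidates = findCandidates(incoming, stored, now);
  if (candidates.length === 0) return false;
  return judge(incoming.title, candidates.map((c) => c.title));
}
```

Injecting the judge as a callback keeps the shortlist logic testable on its own and makes it easy to swap the model used for verification.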
⭐ About Exa.ai
This project is powered by Exa.ai, a powerful search engine and web search API designed specifically for AI applications. Exa provides:
- Advanced semantic search capabilities
- Clean web content extraction
- Real-time data retrieval
- Comprehensive web search functionality
- Superior search accuracy for AI applications
Built with ❤️ by team Exa