Overview
Markets move on narratives, not isolated headlines. Araverus is a story-first financial intelligence platform that threads related news articles across days, surfaces narrative velocity, and maps market impact — so you understand in minutes what’s actually driving markets.
Built as a solo full-stack project, Araverus combines a Next.js 15 frontend with a Python data pipeline (~10,000 lines) that runs autonomously every day. The platform ingests WSJ headlines, discovers free alternative sources via Google News, crawls and analyzes content with LLMs, groups articles into narrative threads, and generates bilingual audio briefings.
The Problem
Financial news is fragmented. A single developing story — like a Fed rate decision — spawns dozens of articles across publications, each covering a different angle. Traditional news aggregators show these as isolated items, leaving readers to stitch the narrative together themselves. Existing tools either require expensive terminal subscriptions or lack narrative context entirely.
Architecture
The system is split into two main layers: a Next.js web application and a Python data pipeline, connected through Supabase (Postgres + Auth + Storage).
Data Pipeline (Python)
The pipeline runs daily at 6 AM ET via launchd on a Mac Mini. It executes 9 scripts in sequence across 7 phases:
Ingest → Search → Rank → Crawl → Embed & Thread → Brief → Notify
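The phase order above can be sketched as a small sequential runner. This is a minimal illustration, not the project's actual orchestration code: the phase names come from the list above, while `run_pipeline`, the `steps` mapping, and the stop-on-first-failure behavior are assumptions made for the example.

```python
from typing import Callable

# Phase names are taken from the pipeline description; everything else is hypothetical.
PHASES = ["Ingest", "Search", "Rank", "Crawl", "Embed & Thread", "Brief", "Notify"]


def run_pipeline(steps: dict[str, Callable[[], None]]) -> list[str]:
    """Run each phase in order and return the names of the phases that completed.

    Assumption: a failing phase stops the run, since later phases
    depend on the output of earlier ones.
    """
    completed: list[str] = []
    for name in PHASES:
        try:
            steps[name]()
        except Exception as exc:  # report the failure and stop the run
            print(f"{name} failed: {exc}")
            break
        completed.append(name)
    return completed
```

In a real deployment each callable would wrap one or more of the 9 scripts; here they are left abstract so the ordering logic stays visible.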
WSJ RSS feeds are ingested and preprocessed with Gemini Flash-Lite. Articles are exported as JSONL with search queries, then matched against Google News to discover free alternative sources. Candidates are ranked by cosine similarity between bge-base-en-v1.5 embeddings, resolved to final URLs, and crawled. A two-stage LLM gate filters for relevance: Flash-Lite performs quick triage (~60 articles/day pass), then Flash performs detailed analysis, producing headlines, summaries, key takeaways, and keywords.
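The ranking step can be illustrated with a short cosine-similarity sketch. It assumes the embeddings have already been computed (in the pipeline they would come from bge-base-en-v1.5; here they are toy 2-dimensional vectors), and the names `cosine_similarity`, `rank_candidates`, and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(
    query_vec: list[float],
    candidates: list[tuple[str, list[float]]],
    top_k: int = 3,
) -> list[tuple[float, str]]:
    """Score each (title, vector) candidate against the query and keep the best top_k."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For example, ranking the candidates `[("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]` against the query `[1.0, 0.0]` orders them a, c, b, because "a" points in the same direction as the query and "b" is orthogonal to it.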
Story Threading (LLM Judge)
The threading system is the core differentiator. Each article is converted into a 768-dimensional embedding vector. An LLM Judge (Gemini 2.5 Flash) then decides whether the article belongs to an existing thread or starts a new one. This replaced an earlier heuristic system that relied on 20+ constants; the LLM Judge uses just 7 and understands narrative context far better, which drastically reduces thread contamination (unrelated articles landing in the same thread).
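The join-or-create decision can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `Thread` type, `assign_article`, and the `Judge` signature are hypothetical, and `keyword_stub_judge` is a trivial word-overlap stand-in for the Gemini call so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Thread:
    thread_id: int
    title: str
    article_ids: list[str] = field(default_factory=list)


# A judge receives a headline and the candidate threads, and returns the id of
# the thread to join, or None to start a new thread. (Hypothetical interface.)
Judge = Callable[[str, list[Thread]], Optional[int]]


def assign_article(article_id: str, headline: str, threads: list[Thread], judge: Judge) -> Thread:
    """Attach the article to the thread the judge picks, or open a new thread."""
    chosen = judge(headline, threads)
    for thread in threads:
        if thread.thread_id == chosen:
            thread.article_ids.append(article_id)
            return thread
    new_id = max((t.thread_id for t in threads), default=0) + 1
    new_thread = Thread(thread_id=new_id, title=headline, article_ids=[article_id])
    threads.append(new_thread)
    return new_thread


def keyword_stub_judge(headline: str, threads: list[Thread]) -> Optional[int]:
    """Stand-in for the LLM call: join the first thread sharing at least two words."""
    words = set(headline.lower().split())
    for thread in threads:
        if len(words & set(thread.title.lower().split())) >= 2:
            return thread.thread_id
    return None
```

Passing the judge in as a callable keeps the assignment logic testable without an LLM; a production judge would presumably also see thread summaries and candidate threads pre-filtered by embedding similarity, but that is a guess rather than something the description states.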
Threads have a heat-based lifecycle (Active → Cooling → Archived → Resurrected) and are analyzed for narrative velocity (accelerating, decelerating, stable, new), market impacts (sectors, tickers, commodities with direction and confidence), and causal relationships between threads.
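One way to picture the heat-based lifecycle is as a small state machine driven by a decaying score. The sketch below is an assumption-laden illustration: the state names come from the text above, but the decay factor, the thresholds, and the functions `next_heat`, `next_status`, and `simulate` are all hypothetical.

```python
ACTIVE_MIN = 5.0    # hypothetical: heat needed to be (or become) Active
COOLING_MIN = 1.0   # hypothetical: below this, a Cooling thread is Archived
DAILY_DECAY = 0.5   # hypothetical: fraction of heat that survives each day


def next_heat(heat: float, new_articles: int) -> float:
    """Decay yesterday's heat, then add one unit per newly threaded article."""
    return heat * DAILY_DECAY + new_articles


def next_status(prev_status: str, heat: float) -> str:
    """Map the new heat value to a lifecycle state."""
    if heat >= ACTIVE_MIN:
        # An archived thread that heats up again is flagged as Resurrected.
        return "Resurrected" if prev_status == "Archived" else "Active"
    if prev_status == "Archived":
        return "Archived"  # stays archived until it crosses ACTIVE_MIN
    return "Cooling" if heat >= COOLING_MIN else "Archived"


def simulate(daily_articles: list[int], heat: float, status: str) -> list[str]:
    """Return the status after each day, given new-article counts per day."""
    history = []
    for count in daily_articles:
        heat = next_heat(heat, count)
        status = next_status(status, heat)
        history.append(status)
    return history
```

Starting from an Active thread with heat 8.0, four quiet days followed by a day with six new articles (`simulate([0, 0, 0, 0, 6], 8.0, "Active")`) walks through Cooling, Cooling, Cooling, Archived, and finally Resurrected.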
Frontend
The web app follows a WSJ-inspired editorial design and is built with Next.js 15 (App Router), React 19, and Tailwind CSS 4. Key pages include a newspaper-style home page with market data widgets, a 3-column headlines view with category filtering, a threads view showing narrative evolution, and a bilingual audio briefing player with sentence-level alignment. The markets section features a D3.js sector heatmap, FRED macro indicators, and a CNN Fear & Greed Index dashboard.
Audio Briefings
Daily AI-generated briefings are produced in both English and Korean. English uses Google Chirp 3 HD for neural TTS; Korean uses Gemini TTS. Each briefing includes chapter markers and sentence-level transcript alignment, allowing users to read along or jump to specific sections.
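Sentence-level alignment boils down to mapping a playback position to the sentence being spoken. A minimal sketch, assuming the transcript stores a sorted list of sentence start times in seconds (the function name `active_sentence` and this data layout are hypothetical, not taken from the project):

```python
from bisect import bisect_right
from typing import Optional


def active_sentence(starts: list[float], position: float) -> Optional[int]:
    """Return the index of the sentence playing at `position` seconds.

    `starts` must be sorted ascending. Returns None if playback has not
    yet reached the first sentence.
    """
    index = bisect_right(starts, position) - 1
    return index if index >= 0 else None
```

Binary search keeps the lookup cheap enough to run on every playback time update, and jumping to a sentence is the inverse operation: seek the player to `starts[index]`.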
Key Features
Story Threading — AI clusters articles into narrative threads and tracks evolution across days with heat-based lifecycle management.
Market Impact Mapping — Each thread maps exposure to sectors, tickers, and macro factors with directional signals and confidence scores.
Bilingual Audio Briefings — Daily EN/KO neural TTS briefings with chapter navigation and sentence-level transcript alignment.
Market Data Dashboard — S&P 500 sector heatmap (D3.js), FRED macro indicators, Fear & Greed Index, stock performance charts (lightweight-charts v5).
Autonomous Pipeline — Fully automated daily execution with health monitoring, search engine notification (IndexNow), and ISR cache revalidation. Runs at ~$11/month operational cost.
Tech Stack
Frontend: Next.js 15, React 19, TypeScript, Tailwind CSS 4, Framer Motion, D3.js, lightweight-charts v5
Backend / Pipeline: Python (~10k LOC), 9 scripts across 7 phases, bge-base-en-v1.5 embeddings
AI / LLM: Gemini 2.5 Pro & Flash (briefing + analysis), GPT-4o-mini (analysis), Google Chirp 3 HD (EN TTS), Gemini TTS (KO)
Infrastructure: Supabase (Postgres + Auth + Storage), Vercel (web), Mac Mini (pipeline cron via launchd), GitHub Actions
