🔍 WSearch

Real-time Wikipedia Search Engine with BM25 Ranking

WSearch is a search engine that queries Wikipedia in real time and ranks the results with the Okapi BM25 algorithm, the default ranking model in Elasticsearch and Solr.

→ Try it live at wsearch.onrender.com

✨ Features

Real-time Wikipedia crawling: articles are fetched at query time, so there is no prebuilt index to go stale
Okapi BM25 ranking — industry-standard relevance scoring (k1=1.5, b=0.75) with title boosting (3×) and category boosting (2×)
Autocomplete — Wikipedia OpenSearch API suggestions with keyboard navigation
In-memory caching — previously fetched articles are cached to speed up repeat searches
Dark luxury UI — Vyntr-inspired design with Playfair Display serif, gold accents, frosted glass search bar
SPA navigation — URL updates on search, browser back/forward works correctly
Zero dependencies on the frontend — pure vanilla JS, no React or Vue
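
The in-memory caching listed above could be as simple as a `Map` keyed by article title. The sketch below is an assumption about how such a cache might look, not the code in server/server.js; the names `ArticleCache`, `maxEntries`, `ttlMs`, and `getArticle` are hypothetical.

```javascript
// Hypothetical in-memory article cache: a Map with a size cap and a time-to-live.
class ArticleCache {
  constructor({ maxEntries = 500, ttlMs = 10 * 60 * 1000 } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.entries = new Map(); // title -> { value, storedAt }
  }

  get(title, now = Date.now()) {
    const hit = this.entries.get(title);
    if (!hit) return undefined;
    if (now - hit.storedAt > this.ttlMs) {
      this.entries.delete(title); // entry has expired
      return undefined;
    }
    return hit.value;
  }

  set(title, value, now = Date.now()) {
    // Deleting first moves the key to the end of the Map's insertion order.
    this.entries.delete(title);
    this.entries.set(title, { value, storedAt: now });
    // Evict the oldest entry once the cap is exceeded.
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}

// Fetch through the cache; `fetchArticle` is any async function supplied by the caller.
async function getArticle(cache, title, fetchArticle) {
  const cached = cache.get(title);
  if (cached !== undefined) return cached;
  const article = await fetchArticle(title);
  cache.set(title, article);
  return article;
}
```

A size cap and expiry are worth considering on a free hosting tier, where memory is limited and a long-lived process would otherwise keep every article it has ever fetched.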

🏗️ Architecture

┌─────────────────────────────────────────────────────┐
│                     Browser                         │
│         frontend/index.html + main.js               │
└──────────────────┬──────────────────────────────────┘
                   │ HTTP
┌──────────────────▼──────────────────────────────────┐
│              Express Server (Node.js)               │
│                 server/server.js                    │
│                                                     │
│  ┌─────────────┐   ┌──────────────┐   ┌──────────┐  │
│  │  BM25 Index │   │   Crawler    │   │ Tokenizer│  │
│  │  (in-memory)│   │ (real-time)  │   │ (Porter) │  │
│  └─────────────┘   └──────┬───────┘   └──────────┘  │
└─────────────────────────┬─┼─────────────────────────┘
                          │ │
              ┌───────────▼─▼──────────┐
              │    Wikipedia API        │
              │  en.wikipedia.org       │
              └────────────────────────┘

How a search works

User types query
      ↓
Wikipedia OpenSearch API → fetch 8–10 article titles
      ↓
Parallel fetch of each Wikipedia article (cheerio scraping)
      ↓
Strip infoboxes, navboxes, references → extract clean text + categories
      ↓
Feed into BM25Index → score every document against query terms
      ↓
Return up to 10 results with highlighted snippets
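
The first step of this flow relies on Wikipedia's OpenSearch endpoint, which responds with a four-element JSON array: the query, matching titles, descriptions, and URLs. The sketch below shows one way to build the request and unpack the response. It uses the `fetch` built into Node.js 18+ rather than Axios to stay dependency-free, and the function names are hypothetical, not taken from server/server.js.

```javascript
const WIKI_API = "https://en.wikipedia.org/w/api.php";

// Build an OpenSearch request URL (a default limit of 10 is assumed from the flow above).
function buildOpenSearchUrl(query, limit = 10) {
  const params = new URLSearchParams({
    action: "opensearch",
    search: query,
    limit: String(limit),
    namespace: "0", // main article namespace only
    format: "json",
  });
  return `${WIKI_API}?${params}`;
}

// OpenSearch responds with [query, titles[], descriptions[], urls[]].
function parseOpenSearch(payload) {
  const [, titles = [], , urls = []] = payload;
  return titles.map((title, i) => ({ title, url: urls[i] }));
}

// Performs the network request; defined here but not called.
async function suggest(query) {
  const res = await fetch(buildOpenSearchUrl(query));
  if (!res.ok) throw new Error(`OpenSearch failed: HTTP ${res.status}`);
  return parseOpenSearch(await res.json());
}
```

The same `suggest` helper could plausibly serve both the autocomplete dropdown and the title lookup that starts each search, since both read the titles array of the same response.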

📁 Project Structure

search_engine/
├── frontend/
│   ├── index.html        # SPA shell — hero + sticky topbar + results
│   ├── style.css         # Dark luxury theme (Playfair Display, gold palette)
│   └── main.js           # Search, autocomplete, layout switching
│
├── server/
│   └── server.js         # Express app — BM25 engine + Wikipedia fetcher + API routes
│
├── crawler/
│   └── crawler.js        # WikiCrawler EventEmitter (optional batch crawling)
│
├── indexer/
│   ├── tokenizer.js      # Lowercase → strip punctuation → stopwords → Porter stem
│   └── stopwords.js      # ~80 common English stopwords
│
├── package.json
└── README.md
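
The tokenizer pipeline noted for indexer/tokenizer.js (lowercase, strip punctuation, remove stopwords, stem) can be sketched as follows. This is an illustration under assumptions, not the project's code: the stopword set is a tiny sample rather than the ~80-word list, and `stem` defaults to the identity function as a stand-in for the Porter stemmer.

```javascript
// Tiny sample stopword set; the list in indexer/stopwords.js is larger.
const STOPWORDS = new Set(["a", "an", "and", "the", "of", "in", "is", "to"]);

// Tokenize: lowercase -> strip punctuation -> drop stopwords -> stem.
// `stem` is a placeholder hook; pass a Porter stemmer to get real stemming.
function tokenize(text, stem = (word) => word) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ") // punctuation becomes whitespace
    .split(/\s+/)
    .filter((word) => word.length > 0 && !STOPWORDS.has(word))
    .map((word) => stem(word));
}
```

Queries and documents need to pass through the same function, otherwise a stemmed query term would never match an unstemmed document term.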

🚀 Getting Started

Prerequisites

Node.js 18+
npm

Installation

# Clone the repository
git clone https://github.com/YOUR_USERNAME/wsearch.git
cd wsearch/search_engine

# Install dependencies
npm install

# Start the server
npm start

Open http://localhost:3000 in your browser.

Development (auto-reload)

npm run dev

⚙️ BM25 Configuration

The ranking algorithm is tuned with the following parameters in server/server.js:

| Parameter | Value | Effect |
| --- | --- | --- |
| `k1` | `1.5` | Term frequency saturation: higher = more weight on repeated terms |
| `b` | `0.75` | Document length normalization: 0 = no normalization, 1 = full |
| Title boost | `3×` | Title tokens are indexed 3 times for higher relevance |
| Category boost | `2×` | Category tokens are indexed twice |
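
To show how these parameters interact, here is a minimal BM25 scorer. It is a sketch, not the implementation in server/server.js: the class shape, the `addDocument` field names, and the IDF variant (the Lucene-style `ln(1 + (N - n + 0.5) / (n + 0.5))`) are assumptions. Field boosts are modelled the way the table describes, by repeating title tokens three times and category tokens twice before counting term frequencies.

```javascript
class BM25Index {
  constructor({ k1 = 1.5, b = 0.75 } = {}) {
    this.k1 = k1;
    this.b = b;
    this.docs = []; // { id, length, tf: Map<term, count> }
    this.docFreq = new Map(); // term -> number of documents containing it
    this.totalLength = 0;
  }

  // Each field is an array of already-tokenized terms.
  addDocument(id, { title = [], categories = [], body = [] }) {
    const tokens = [
      ...title, ...title, ...title, // 3x title boost
      ...categories, ...categories, // 2x category boost
      ...body,
    ];
    const tf = new Map();
    for (const t of tokens) tf.set(t, (tf.get(t) || 0) + 1);
    for (const t of tf.keys()) this.docFreq.set(t, (this.docFreq.get(t) || 0) + 1);
    this.docs.push({ id, length: tokens.length, tf });
    this.totalLength += tokens.length;
  }

  search(queryTerms, limit = 10) {
    const N = this.docs.length;
    if (N === 0) return [];
    const avgLength = this.totalLength / N;
    const results = this.docs.map((doc) => {
      let score = 0;
      for (const term of queryTerms) {
        const f = doc.tf.get(term) || 0;
        if (f === 0) continue;
        const n = this.docFreq.get(term);
        const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
        // b controls how strongly long documents are penalized.
        const norm = 1 - this.b + this.b * (doc.length / avgLength);
        // k1 controls how quickly repeated occurrences stop adding score.
        score += (idf * f * (this.k1 + 1)) / (f + this.k1 * norm);
      }
      return { id: doc.id, score };
    });
    return results
      .filter((r) => r.score > 0)
      .sort((x, y) => y.score - x.score)
      .slice(0, limit);
  }
}
```

Because boosting is done by token repetition, a boosted field also lengthens the document, so the length normalization governed by `b` partly offsets the boost for documents with long titles or many categories.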

🌐 Deployment

Render (current — free tier)

Live at wsearch.onrender.com

# Build command
npm install

# Start command
node server/server.js

Environment variables:

NODE_ENV=production
RENDER_EXTERNAL_URL=https://wsearch.onrender.com

Railway

# Auto-detected as Node.js
# Start command: node server/server.js
# No sleep on free tier — always on

Local / Self-hosted

npm start
# → http://localhost:3000

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Runtime | Node.js 18+ |
| Server | Express 4 |
| HTTP client | Axios |
| HTML parsing | Cheerio |
| Ranking | Okapi BM25 (custom implementation) |
| Stemming | Porter Stemmer (custom, zero dependencies) |
| Frontend | Vanilla JS, CSS3 |
| Fonts | Playfair Display, Inter (Google Fonts) |
| Wikipedia | OpenSearch API + HTML scraping |

📄 License

MIT — free to use, modify and deploy.

Built with ☕ and Node.js · wsearch.onrender.com
