codeurali/topic-watch

Topic Watch

Generic keyword/topic monitoring engine. Polls news, web search, RSS feeds, and Twitter/X, filters results by your configured keywords, classifies risk and sentiment, stores items in SQLite, and sends alerts to Telegram or WhatsApp.

What it does

  1. Collect — queries Google News RSS, SearXNG (web/news), direct RSS feeds, and Twitter/X for articles matching your topic keywords
  2. Filter — keeps only relevant + recent items, deduplicates across sources
  3. Classify — assigns a tone (positif/neutre/negatif) and risk level (low/medium/high/critical) using configurable term lists
  4. Store — persists items in a local SQLite database and writes JSONL snapshots
  5. Alert — sends undelivered items to Telegram and/or WhatsApp, ordered by risk
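The Filter and Alert steps above can be sketched as follows. This is only an illustration: the README does not say how duplicates are detected, so comparing items by normalized URL is an assumption, and the function names and the `url` / `risk` item keys are hypothetical.

```python
from urllib.parse import urlsplit

# Risk levels named in the README, most urgent first.
RISK_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so near-identical links compare equal (assumed rule)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def dedupe(items: list[dict]) -> list[dict]:
    """Keep the first item seen for each normalized URL, whatever its source."""
    seen: set[str] = set()
    unique = []
    for item in items:
        key = normalize_url(item["url"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def order_for_alerts(items: list[dict]) -> list[dict]:
    """Sort so critical items come first; unknown levels go last."""
    return sorted(items, key=lambda it: RISK_ORDER.get(it.get("risk"), len(RISK_ORDER)))
```

Because `sorted` is stable, items with the same risk level keep their collection order.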

Requirements

  • Python 3.10+
  • A running SearXNG instance (for web/news queries)
  • A Telegram bot token + chat ID, and/or a Green API account (for WhatsApp)
  • Optional: a twitterapi.io API key for Twitter/X search

Quick start (new server)

# 1. Clone / copy the project folder
git clone <repo> topic-watch
cd topic-watch

# 2. Install dependencies
bash setup.sh

# 3. Configure secrets
cp .env.example .env
$EDITOR .env            # fill in TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID, SEARX_BASE …

# 4. Create your topic config
cp config.yaml my-topic.yaml
$EDITOR my-topic.yaml   # customize keywords, queries, RSS feeds …

# 5. First collection run
.venv/bin/python watch.py --config my-topic.yaml --mode collect

# 6. Send alerts for what was found
.venv/bin/python watch.py --config my-topic.yaml --mode alerts --send-telegram

Config file reference (config.yaml)

All fields are optional. Unset list fields default to empty lists; other fields fall back to sensible defaults.

| Field | Type | Description |
|---|---|---|
| project_name | string | Human-readable name shown in alert headers |
| project_slug | string | Lowercase slug used for DB and file naming |
| standalone_triggers | list | Any single term triggers relevance (substring match) |
| anchor_terms | list | Scored terms; an item with ≥ anchor_score_threshold hits is relevant |
| anchor_score_threshold | int | Minimum anchor score (default: 2) |
| positive_terms | list | Boost positive tone score |
| negative_terms | list | Boost negative tone score |
| risk_terms | list | Each hit raises risk; 1→medium, 3+→high |
| mobilization_signals | list | Any single hit → critical risk |
| risk_overrides | dict | "Actor name": critical entries force a risk level per actor |
| excluded_url_patterns | list | Python regex patterns matched against the URL and title; matching items are skipped |
| search_queries.news | list | Queries sent to Google News RSS |
| search_queries.web | list | Queries sent to SearXNG |
| search_queries.twitter | list | Queries sent to twitterapi.io search |
| search_queries.mobilization | list | Additional queries sent to SearXNG |
| hostile_rss_feeds | list | RSS feed URLs fetched directly (filtered post-fetch) |
| news_rss_feeds | list | Same, but source_type tagged as news_rss |
| twitter_accounts | list | Twitter handles monitored by timeline |
| target_actors | list | Cross-queried with event_join_terms → news + web queries |
| social_actors | list | Same, for Facebook/Instagram/TikTok site: queries |
| event_join_terms | list | Event-side terms for actor × event cross-queries |
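To make the relevance and risk rules in the table concrete, here is a minimal sketch. It is not the code in engine/classify.py: the function names are hypothetical, matching is assumed to be case-insensitive substring matching, the anchor score is assumed to be the number of distinct anchor terms found, and two risk hits are assumed to stay at medium because the table only specifies 1 and 3+.

```python
def is_relevant(text: str, cfg: dict) -> bool:
    """Relevant if any standalone trigger matches, or enough anchor terms match."""
    lowered = text.lower()
    if any(t.lower() in lowered for t in cfg.get("standalone_triggers", [])):
        return True
    # Assumed scoring: one point per distinct anchor term present in the text.
    score = sum(1 for t in cfg.get("anchor_terms", []) if t.lower() in lowered)
    return score >= cfg.get("anchor_score_threshold", 2)


def risk_level(text: str, cfg: dict) -> str:
    """Map term hits to low / medium / high / critical."""
    lowered = text.lower()
    # Any single mobilization signal is enough for critical.
    if any(s.lower() in lowered for s in cfg.get("mobilization_signals", [])):
        return "critical"
    hits = sum(1 for t in cfg.get("risk_terms", []) if t.lower() in lowered)
    if hits >= 3:
        return "high"
    if hits >= 1:
        return "medium"  # 2 hits is not specified in the table; assumed medium
    return "low"
```

For instance, with anchor_terms of "climat", "co2" and "cop30" and the default threshold of 2, a headline mentioning both CO2 and COP30 would pass, while one mentioning only "climat" would not.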

Example: minimal config

project_name: "Climate Watch France"
project_slug: "climate-fr"

standalone_triggers:
  - "loi climat"
  - "taxe carbone"

anchor_terms:
  - "climat"
  - "co2"
  - "transition énergétique"
  - "cop30"
anchor_score_threshold: 2

negative_terms:
  - "opposition"
  - "recours"
  - "blocage"

risk_terms:
  - "manifestation"
  - "grève"
  - "blocage"

search_queries:
  news:
    - '"loi climat" OR "taxe carbone" 2026'
  web:
    - 'site:x.com ("loi climat" OR "taxe carbone")'
  twitter:
    - '"taxe carbone" lang:fr'

news_rss_feeds:
  - "https://www.lemonde.fr/planete/rss_full.xml"
  - "https://www.liberation.fr/arc/outboundfeeds/rss/category/terre/?outputType=xml"
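A minimal config like the one above leaves most fields unset. The sketch below shows one way a loader could fill in the gaps after the YAML has been parsed into a dict. The function name and the exact defaults are assumptions; the README documents only that list fields default to empty and that anchor_score_threshold defaults to 2.

```python
# Field names are taken from the config reference; the defaults are assumed.
LIST_FIELDS = (
    "standalone_triggers", "anchor_terms", "positive_terms", "negative_terms",
    "risk_terms", "mobilization_signals", "excluded_url_patterns",
    "hostile_rss_feeds", "news_rss_feeds", "twitter_accounts",
    "target_actors", "social_actors", "event_join_terms",
)
QUERY_KINDS = ("news", "web", "twitter", "mobilization")


def with_defaults(raw: dict) -> dict:
    """Return a copy of a parsed config with every optional field present."""
    cfg = dict(raw)
    for field in LIST_FIELDS:
        # A YAML key with no value parses to None, so treat None as unset.
        if cfg.get(field) is None:
            cfg[field] = []
    if cfg.get("anchor_score_threshold") is None:
        cfg["anchor_score_threshold"] = 2
    if cfg.get("risk_overrides") is None:
        cfg["risk_overrides"] = {}
    queries = dict(cfg.get("search_queries") or {})
    for kind in QUERY_KINDS:
        if queries.get(kind) is None:
            queries[kind] = []
    cfg["search_queries"] = queries
    return cfg
```

Normalizing once at load time means the rest of the pipeline can index fields directly instead of checking for missing keys.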

CLI reference

python watch.py [--config FILE] --mode MODE [OPTIONS]
| Option | Default | Description |
|---|---|---|
| --config FILE | config.yaml | Path to topic YAML config |
| --mode | collect | collect, alerts, or digest |
| --window-hours N | 6 | Digest window (hours) |
| --send-telegram | off | Send alerts to Telegram |
| --send-whatsapp | off | Send alerts to WhatsApp (Green API) |
| --push-loki | off | Push items to Grafana Loki |
| --push-xo | off | Push items to local OTLP collector (port 4318) |
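The options above map naturally onto argparse. The following sketch mirrors the documented flags and defaults; it is a reconstruction from the table, not the actual parser in watch.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags and defaults listed in the CLI reference."""
    parser = argparse.ArgumentParser(prog="watch.py")
    parser.add_argument("--config", default="config.yaml", metavar="FILE",
                        help="path to topic YAML config")
    parser.add_argument("--mode", choices=["collect", "alerts", "digest"],
                        default="collect", help="pipeline stage to run")
    parser.add_argument("--window-hours", type=int, default=6, metavar="N",
                        help="digest window in hours")
    # Delivery and export switches are all off unless passed explicitly.
    for flag in ("--send-telegram", "--send-whatsapp", "--push-loki", "--push-xo"):
        parser.add_argument(flag, action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

argparse converts dashes to underscores, so `--window-hours` is read as `args.window_hours` and `--send-telegram` as `args.send_telegram`.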

Running multiple topics on the same server

Each config has its own project_slug, which gives it a separate DB and output files.

# Climate alerts
python watch.py --config topics/climate.yaml --mode collect
python watch.py --config topics/climate.yaml --mode alerts --send-telegram

# Elections alerts
python watch.py --config topics/elections.yaml --mode collect
python watch.py --config topics/elections.yaml --mode alerts --send-telegram
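The isolation between topics can be pictured with a short sketch. The file naming (`data/<slug>.db`, `output/<slug>.jsonl`) and the table schema are assumptions made for illustration; the README states only that project_slug is used for DB and file naming.

```python
import sqlite3
from pathlib import Path


def paths_for(slug: str, root: Path) -> dict[str, Path]:
    """Derive per-topic file locations from the slug (hypothetical naming scheme)."""
    return {
        "db": root / "data" / f"{slug}.db",
        "jsonl": root / "output" / f"{slug}.jsonl",
    }


def open_db(slug: str, root: Path) -> sqlite3.Connection:
    """Open (and create if needed) the SQLite database for one topic."""
    db_path = paths_for(slug, root)["db"]
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    # Illustrative schema only; the real tables live in engine/db.py.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "url TEXT PRIMARY KEY, title TEXT, risk TEXT, delivered INTEGER DEFAULT 0)"
    )
    return conn
```

Under this scheme, two configs with different slugs never touch the same file, so their items and delivery state stay independent.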

Scheduling with cron

# Every 30 minutes: collect, then alert (one job, so alerts run only after collection has finished)
*/30 * * * * cd /opt/topic-watch && { .venv/bin/python watch.py --mode collect; .venv/bin/python watch.py --mode alerts --send-telegram; } >> /var/log/topic-watch.log 2>&1

# Daily digest at 8:00
0 8 * * * cd /opt/topic-watch && .venv/bin/python watch.py --mode digest --window-hours 24 --send-telegram >> /var/log/topic-watch.log 2>&1

Or use the provided helper:

chmod +x run_watch.sh
./run_watch.sh --config my-topic.yaml

SearXNG

setup.sh installs and starts SearXNG automatically via Docker Compose on port 8889. run_watch.sh checks at each run that SearXNG is up and restarts it if needed.

If you prefer to manage it manually:

# Start
docker compose up -d searxng

# Stop
docker compose down

# Logs
docker compose logs -f searxng

The JSON output format (required for web queries) is pre-enabled in searxng/settings.yml.
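For reference, a JSON query against SearXNG looks roughly like the sketch below. It relies on the standard SearXNG search endpoint (`/search` with `q`, `format` and `categories` parameters); the function names are hypothetical and this is not the code in engine/fetch.py.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base: str, query: str, categories: str = "general") -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    params = urlencode({"q": query, "format": "json", "categories": categories})
    return f"{base.rstrip('/')}/search?{params}"


def search(base: str, query: str, categories: str = "general",
           timeout: float = 10.0) -> list[dict]:
    """Run one query and return the raw result dicts (empty list if none)."""
    url = build_search_url(base, query, categories)
    with urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload.get("results", [])


if __name__ == "__main__":
    # Assumes the instance started by setup.sh is listening on port 8889.
    for hit in search("http://localhost:8889", '"taxe carbone"', categories="news"):
        print(hit.get("title"), hit.get("url"))
```

If the JSON format is not enabled in settings.yml, SearXNG rejects `format=json` requests, which is why the pre-enabled setting matters.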


Environment variables

| Variable | Required | Description |
|---|---|---|
| SEARX_BASE | Yes (web queries) | SearXNG base URL |
| TELEGRAM_BOT_TOKEN | For Telegram | Bot token from @BotFather |
| TELEGRAM_CHAT_ID | For Telegram | Target channel/group ID |
| GREEN_API_INSTANCE | For WhatsApp | Green API instance ID |
| GREEN_API_TOKEN | For WhatsApp | Green API token |
| GREEN_API_CHAT_ID | For WhatsApp | Target WhatsApp chat ID |
| TWITTERAPIO_KEY | For Twitter | twitterapi.io API key |
| GRAFANA_LOKI_URL | Optional | Loki push endpoint |
| GRAFANA_LOKI_USER | Optional | Loki basic-auth user |
| GRAFANA_CLOUD_TOKEN | Optional | Loki basic-auth token |
| CRON_OUTPUT_DIR | Optional | Override digest file output directory |

Project layout

topic-watch/
├── watch.py            ← main CLI entry point
├── config.yaml         ← example topic config (copy and customize)
├── .env.example        ← env variable template (copy to .env)
├── requirements.txt
├── setup.sh            ← creates .venv and installs deps
├── run_watch.sh        ← collect + alert in one call
├── engine/
│   ├── config.py       ← YAML config loader
│   ├── fetch.py        ← HTTP fetchers (SearXNG, RSS, Twitter, DDG)
│   ├── classify.py     ← relevance, tone, risk classification
│   ├── collect.py      ← query builder + collection pipeline
│   ├── db.py           ← SQLite persistence
│   └── notify.py       ← Telegram, WhatsApp, Loki, OTLP, digest
├── data/               ← SQLite databases (auto-created, gitignored)
└── output/             ← JSONL snapshots + digest files (gitignored)

Adding a new delivery channel

Implement a function in engine/notify.py following the same pattern as telegram_send(), then call it from alert_mode().
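As an illustration, a generic webhook channel might look like the sketch below. The signature of telegram_send() is not shown in this README, so everything here is an assumption: the function names, the `{"text": ...}` payload, and the WEBHOOK_URL variable are hypothetical.

```python
import json
import os
from urllib.request import Request, urlopen


def format_alert(items: list[dict]) -> str:
    """Render items as plain text, one block per item (assumed item keys)."""
    return "\n\n".join(
        f"[{it['risk'].upper()}] {it['title']}\n{it['url']}" for it in items
    )


def build_webhook_request(url: str, text: str) -> Request:
    """Prepare a JSON POST carrying the alert text."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def webhook_send(text: str, timeout: float = 10.0) -> bool:
    """Send the text to WEBHOOK_URL; return True only on a 2xx response."""
    url = os.environ.get("WEBHOOK_URL")  # hypothetical variable, not in .env.example
    if not url:
        return False
    try:
        with urlopen(build_webhook_request(url, text), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False
```

Returning a boolean instead of raising lets the caller decide whether to mark items as delivered, which fits the "undelivered items" model described earlier.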
