Semantic retrieval pipeline for the Eastern Slavic protest-video corpus (v2)
Updated May 19, 2026 - Jupyter Notebook
This project extends the v2-0 Hybrid Model (statistical features + TTM embeddings) by adding a third modality: text embeddings derived from equipment master data. The three feature vectors (statistical x ∈ ℝ²⁸, TTM embedding y ∈ ℝ⁶⁴, and text embedding z ∈ ℝ¹⁰²⁴) are concatenated into a 1,116-dimensional triplet feature vector h, which is fed into a LightGBM model.
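The concatenation step described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `build_triplet_features`, the random placeholder data, and the sample count are hypothetical and not taken from the project; only the three dimensions (28, 64, 1024) come from the description. The LightGBM training step is left out because the description gives no details about the target or hyperparameters.

```python
import numpy as np

# Dimensions taken from the project description; everything else is illustrative.
STAT_DIM, TTM_DIM, TEXT_DIM = 28, 64, 1024


def build_triplet_features(x, y, z):
    """Concatenate per-sample feature blocks into h = [x | y | z] (hypothetical helper)."""
    x, y, z = (np.asarray(a, dtype=np.float32) for a in (x, y, z))
    if not (x.shape[0] == y.shape[0] == z.shape[0]):
        raise ValueError("all feature blocks must have the same number of rows")
    if (x.shape[1], y.shape[1], z.shape[1]) != (STAT_DIM, TTM_DIM, TEXT_DIM):
        raise ValueError("unexpected feature dimensions")
    # Column-wise concatenation: 28 + 64 + 1024 = 1116 features per sample.
    return np.concatenate([x, y, z], axis=1)


# Random placeholder data standing in for real features (assumption).
rng = np.random.default_rng(0)
n_samples = 5
x = rng.normal(size=(n_samples, STAT_DIM))   # statistical features
y = rng.normal(size=(n_samples, TTM_DIM))    # TTM embeddings
z = rng.normal(size=(n_samples, TEXT_DIM))   # text embeddings

h = build_triplet_features(x, y, z)
print(h.shape)  # (5, 1116)
```

The resulting matrix `h` has one row per sample and could then be passed to a gradient-boosting model as an ordinary tabular feature matrix.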
An experimental project designed for embedding-based text similarity search in web pages.
A production-grade multilingual RAG system for Moroccan legal question answering, grounded in official Arabic legal texts with traceable citations and a systematic evaluation framework.