
Fix: Complete Multi-Language Support for React-to-Me (#104) #140

Open
bleedblack1 wants to merge 1 commit into reactome:main from bleedblack1:multi_language

This PR resolves an issue where the React-to-Me chatbot always responded in English, even when users submitted questions in other languages.

The root cause was that the detected language from the language detection pipeline was never passed to the final generation step. As a result, the LLM only received English inputs and produced English outputs.

This change ensures the chatbot:

  • Detects the user's language
  • Uses English queries for retrieval
  • Generates the final response in the user’s detected language

The fix maintains retrieval quality while enabling full multilingual response support.


Problem

The chatbot pipeline already contained language detection and query rephrasing, but the information flow stopped before the generation stage.

Current behavior:

  1. User question is processed
  2. Language detection stores the detected language
  3. Query is translated to English for retrieval
  4. The generation step ignores the detected language

Because the model receives English queries and English context, responses are always generated in English.


Solution

The fix introduces an English-for-Search, Native-for-Response strategy.

Pipeline after this change:

User Input (any language)
      ↓
Language Detection
      ↓
Translate query to English
      ↓
Hybrid Retrieval (English index)
      ↓
LLM generates response in detected language

Retrieval continues to use English queries for optimal embedding similarity, while the response language follows the detected language.
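
The flow above can be sketched as a single function that carries the detected language alongside the English query. This is only an illustration of the data flow, not the project's code: `answer` and the four callables (`detect_language`, `translate_to_english`, `retrieve`, `generate`) are hypothetical stand-ins for the real pipeline stages.

```python
from typing import Callable, List


def answer(
    user_input: str,
    detect_language: Callable[[str], str],
    translate_to_english: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str], str], str],
) -> str:
    """Search in English, respond in the user's language (illustrative only)."""
    language = detect_language(user_input)            # e.g. "Spanish"
    english_query = translate_to_english(user_input)  # retrieval always sees English
    documents = retrieve(english_query)               # English index, unchanged
    # The detected language travels to the final step instead of being dropped.
    return generate(english_query, documents, language)
```

Passing the stages in as callables keeps the sketch self-contained; in the actual agent these steps presumably live in the shared state rather than in function arguments.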


Implementation Details

1. React-to-Me Profile Fix

File updated:

src/agent/profiles/react_to_me.py

The generation step now retrieves the detected language and injects a response instruction when the language is not English.

Example logic:

query = state["rephrased_input"]
detected_language = state.get("detected_language", "English")

if detected_language.lower() != "english":
    query = f"{query}\n\n[CRITICAL INSTRUCTION: respond in {detected_language}]"

This instruction is appended to the query before invoking the RAG chain.

This approach avoids modifying the RAG chain architecture while ensuring the LLM follows the correct output language.
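
For readers who want to try the logic in isolation, here is a minimal runnable sketch of the same idea. The function name `build_generation_query` is hypothetical, and the extra guard against a `None` or empty `detected_language` is an assumption added for the sketch; the PR text does not say whether the real code handles that case.

```python
from typing import Any, Mapping


def build_generation_query(state: Mapping[str, Any]) -> str:
    """Return the English query, plus a language instruction when needed."""
    query = state["rephrased_input"]
    # Assumption: treat a missing, None, or blank value as English.
    detected_language = (state.get("detected_language") or "").strip() or "English"
    if detected_language.lower() != "english":
        # Same instruction format as in the example logic above.
        query = f"{query}\n\n[CRITICAL INSTRUCTION: respond in {detected_language}]"
    return query
```

With this shape, English questions reach the RAG chain unchanged, which matches the "no change in behavior" claim for English users.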


2. Prompt Reinforcement

File updated:

src/retrievers/reactome/prompt.py

The system prompt now explicitly acknowledges that language instructions may appear in the query and must be followed.

This acts as a secondary safeguard to ensure consistent multilingual responses.


3. Rephrase Task Clarification

File updated:

src/agent/tasks/rephrase.py

The prompt documentation now explains the rationale behind always returning English queries.

This clarification helps future contributors understand why the rephrase step must always return English queries:

  • the embeddings are English
  • the document index is English
  • retrieval quality therefore depends on English inputs


4. Cross-Database Summarization Improvements

File updated:

src/agent/tasks/cross_database/summarize_reactome_uniprot.py

Updates include:

  • Typo corrections
  • Improved language instruction clarity
  • Corrected instruction numbering

The updated prompt enforces consistent language output while preserving scientific terminology.


Scientific Terminology Preservation

The LLM is instructed not to translate scientific identifiers, including:

  • gene symbols
  • protein names
  • Reactome pathway IDs
  • database URLs

Examples of identifiers that must appear unchanged in any response language:

TP53
R-HSA-109581
https://reactome.org

This ensures scientific accuracy across languages.
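
One way to spot-check this behavior is to compare the identifiers in the English source text against the localized response. The sketch below is an assumption-laden illustration, not part of this PR: `missing_identifiers` is a hypothetical helper, and the patterns only cover Reactome-style stable IDs (such as R-HSA-109581) and http(s) URLs. Gene symbols and protein names are not covered because they cannot be matched reliably with a simple pattern.

```python
import re
from typing import List

# Illustrative patterns only: Reactome-style stable IDs and http(s) URLs.
_IDENTIFIER_PATTERNS = [
    re.compile(r"\bR-[A-Z]{3}-\d+\b"),
    re.compile(r"https?://[^\s)\]]+"),
]


def missing_identifiers(english_text: str, localized_text: str) -> List[str]:
    """List identifiers found in english_text that are absent from localized_text."""
    found: List[str] = []
    for pattern in _IDENTIFIER_PATTERNS:
        for match in pattern.findall(english_text):
            ident = match.rstrip(".,;")  # drop trailing sentence punctuation
            if ident not in found:
                found.append(ident)
    return [ident for ident in found if ident not in localized_text]
```

An empty result means every matched identifier survived verbatim; a non-empty result lists what the response dropped or altered.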


Impact

English users

  • No change in behavior.

Non-English users

  • Responses are now generated in the language detected from their question.

Retrieval system

  • No changes to embeddings or document index.

Streaming responses

  • Unaffected.

Result

This change enables true multilingual responses in the React-to-Me chatbot while preserving the existing RAG retrieval architecture and maintaining search accuracy.

Fix: #104

r2m


Development

Successfully merging this pull request may close these issues.

RAG only responds in English
