
Improve response precision by refining system prompts for Reactome and UniProt retrievers #121

Open
GovindhKishore wants to merge 2 commits into reactome:main from GovindhKishore:improve-response-precision


GovindhKishore commented Mar 4, 2026

Improve Response Precision by Refining System Prompts

Problem

As more data sources are integrated into the retrieval pipeline, chatbot responses have grown increasingly verbose and noisy, a known concern raised in the project. On the generation side, the root cause is that both reactome/prompt.py and uniprot/prompt.py explicitly instruct the LLM to be comprehensive and to capture all mechanistically relevant details, so the LLM includes everything it finds regardless of its direct relevance to the question asked.

Problematic instructions in the old prompts:

reactome/prompt.py:

"comprehensively, mechanistically, and with precision"
"Capture ALL mechanistically relevant details"
"information-rich narrative"
"presenting background, mechanisms, and significance"

uniprot/prompt.py:

"comprehensively, accurately, and in an engaging manner"
"Answer the question comprehensively and accurately"
"Provide any useful background information"

These instructions tell the LLM to maximise coverage, so responses bloat further with every data source that is added.


Solution

Shift both prompts from exhaustive coverage to relevance-first precision:

  • Replace "comprehensively" with "accurately and precisely"
  • Replace "Capture ALL details" with "Include only the most directly relevant details"
  • Add explicit scoping: "Answer ONLY what was asked", "Do NOT repeat information", "Do NOT add background unless essential"
  • Add stop condition: "stops when the question is fully answered"
  • Retain all citation formatting, source discipline, and tone guidelines unchanged
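A minimal sketch of how the relevance-first rules above could be assembled into a system prompt. The names `RELEVANCE_FIRST_RULES` and `build_system_prompt`, and the surrounding prompt wording, are hypothetical and only illustrate the shift; they are not the actual contents of reactome/prompt.py or uniprot/prompt.py. Only the quoted rule phrases come from this PR.

```python
# Hypothetical sketch: names and framing text are illustrative only,
# not the real contents of reactome/prompt.py or uniprot/prompt.py.
RELEVANCE_FIRST_RULES = [
    "Answer ONLY what was asked.",
    "Include only the most directly relevant details.",
    "Do NOT repeat information.",
    "Do NOT add background unless essential.",
]


def build_system_prompt(source_name: str) -> str:
    """Assemble a relevance-first system prompt for one data source."""
    rules = "\n".join(f"- {rule}" for rule in RELEVANCE_FIRST_RULES)
    return (
        f"You answer questions using {source_name} context.\n"
        "Answer accurately and precisely, citing the provided sources.\n"
        f"{rules}\n"
        "Write a response that stops when the question is fully answered."
    )


prompt = build_system_prompt("UniProt")
print(prompt)
```

Keeping the scoping rules in one list makes it easy to apply the same constraints to both retrievers while leaving citation and tone instructions elsewhere in the prompt untouched.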

Files Changed

src/retrievers/reactome/prompt.py
src/retrievers/uniprot/prompt.py

Manual Testing

Both old and new prompts were tested manually using identical context and questions to measure the impact of the changes.
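The comparison was done by hand, but the same procedure could be sketched roughly as below. `compare_prompts` and `fake_generate` are hypothetical helpers; `fake_generate` is a stand-in for a real LLM call so that the sketch runs offline, and its behaviour is an assumption made purely for illustration.

```python
from typing import Callable, Dict

# Hypothetical signature: (system_prompt, context, question) -> response text.
Generate = Callable[[str, str, str], str]


def compare_prompts(
    generate: Generate,
    old_prompt: str,
    new_prompt: str,
    context: str,
    question: str,
) -> Dict[str, int]:
    """Run both prompts on identical inputs and report response word counts."""
    old_response = generate(old_prompt, context, question)
    new_response = generate(new_prompt, context, question)
    return {
        "old_words": len(old_response.split()),
        "new_words": len(new_response.split()),
    }


def fake_generate(system_prompt: str, context: str, question: str) -> str:
    # Stand-in for a real model call: pads the answer when asked to be comprehensive.
    if "comprehensively" in system_prompt:
        return "background " * 4 + "answer"
    return "answer"


result = compare_prompts(
    fake_generate,
    old_prompt="Answer comprehensively.",
    new_prompt="Answer precisely.",
    context="ctx",
    question="q?",
)
print(result)
```

Holding the context and question fixed, as in the tests below, means any difference in length can be attributed to the prompt change alone.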


Test 1: UniProt Prompt

Test Context:

Titin is a key component in the assembly and functioning of vertebrate 
striated muscles. In non-muscle cells, titin plays a role in chromosome 
condensation and chromosome segregation during mitosis.
Source: https://www.uniprot.org/uniprotkb/Q8WZ42/entry

SMC2 is a central component of the condensin complex, required for 
conversion of interphase chromatin into mitotic-like condensed chromosomes. 
SMC2 forms a heterodimer with SMC4 creating a functional ATPase. The 
condensin complex introduces positive supercoils into relaxed DNA. Higher 
eukaryotes contain two condensin complexes — condensin I and II — both 
containing SMC2 and SMC4.
Source: https://www.uniprot.org/uniprotkb/O95347/entry

Collagen alpha-1(VII) forms anchoring fibrils between the dermal-epidermal 
junction and underlying stroma. Mutations in COL7A1 cause dystrophic 
epidermolysis bullosa.
Source: https://www.uniprot.org/uniprotkb/Q02388/entry

Question: "What is the role of SMC2 in chromosome condensation during mitosis?"


OLD UniProt Prompt Response:

During mitosis, chromosomes must become highly compact so they can be accurately separated into daughter cells. A key protein involved in this process is SMC2, which functions as part of a molecular machine called the condensin complex.

Role of SMC2 in Chromosome Condensation

SMC2 is a central structural and functional component of the condensin complex, a protein assembly responsible for converting loosely organized interphase chromatin into the tightly packed mitotic chromosomes required for cell division. SMC2

  1. Formation of the Condensin Core Complex
    SMC2 forms a heterodimer with SMC4, creating the core of the condensin complex. This pairing produces a functional ATPase, meaning the complex can use ATP hydrolysis to power structural changes in DNA. SMC2

  2. Driving Chromatin Condensation
    The condensin complex containing SMC2 promotes chromosome compaction by restructuring DNA topology. Specifically, it:

  • Introduces positive supercoils into relaxed DNA
  • Converts nicked DNA into positively knotted structures
    These changes compact DNA into the dense structures characteristic of mitotic chromosomes. SMC2
  3. Function in Multiple Condensin Complexes
    In higher eukaryotes, SMC2 participates in two distinct condensin complexes:
  • Condensin I
  • Condensin II
    Both complexes contain SMC2 and SMC4 and are required for successful mitosis, indicating that SMC2 is essential for proper chromosome organization and segregation during cell division. SMC2

Why This Matters
By enabling the condensin complex to reshape and compact chromatin, SMC2 ensures chromosomes are properly condensed and structured during mitosis, which is crucial for accurate chromosome segregation and the maintenance of genomic stability. SMC2

Sources

  • SMC2 – UniProt Knowledgebase entry for Structural Maintenance of Chromosomes protein 2 (SMC2).

Word count: ~220 words. Includes introductory paragraph, 3 numbered sections, and a "Why This Matters" closing - all for a question with a direct one-paragraph answer.


NEW UniProt Prompt Response:

SMC2 plays a central role in chromosome condensation during mitosis as a key component of the condensin complex.

Specifically, SMC2 forms a heterodimer with SMC4, creating a functional ATPase complex that is required for the condensation of chromatin into mitotic chromosomes. This activity is essential for converting interphase chromatin into the highly condensed chromosomes characteristic of mitosis.

The condensin complex containing SMC2 actively modifies DNA topology by:

  • Introducing positive supercoils into relaxed DNA
  • Converting nicked DNA into positively knotted forms

These structural changes in DNA help compact chromatin into tightly organized mitotic chromosomes. In higher eukaryotes, two condensin complexes exist — condensin I and condensin II — and both contain SMC2 and SMC4, making SMC2 essential for proper chromosome condensation and successful mitosis.

Sources

Word count: ~110 words. All key facts retained. Titin and COL7A1 sources correctly ignored as irrelevant. No unnecessary sections or closing summary.
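The claim that off-topic sources were ignored was checked by reading the response. A small sketch of how such a check could be automated is shown here; `mentioned_terms` is a hypothetical helper, and the term list is an assumption chosen from the test context above.

```python
import re
from typing import List


def mentioned_terms(response: str, terms: List[str]) -> List[str]:
    """Return the terms that occur in the response as whole words (case-insensitive)."""
    found = []
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, response, flags=re.IGNORECASE):
            found.append(term)
    return found


# First sentence of the new UniProt response, copied from the test above.
new_response = (
    "SMC2 plays a central role in chromosome condensation during mitosis "
    "as a key component of the condensin complex."
)
# Names that only appear in the irrelevant context entries (assumed list).
off_topic = ["Titin", "COL7A1", "Collagen"]

leaked = mentioned_terms(new_response, off_topic)
print(leaked)
```

An empty result only shows that the named entities were not mentioned; it does not replace a manual read for subtler off-topic content.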


Test 2: Reactome Prompt

Test Context:

TP53 protein stability is regulated by post-translational modifications 
including ubiquitination by MDM2 which triggers degradation. Under stress, 
ATM and ATR phosphorylate TP53 at S15 and S20 causing stabilization. 
Phosphorylation at S46 specifically promotes pro-apoptotic gene transcription.
Source: https://reactome.org/content/detail/R-HSA-5633007

NOXA (PMAIP1) is transactivated by TP53 in a p53-dependent manner and 
also by E2F1. Activated NOXA translocates to mitochondria where it inhibits 
anti-apoptotic BCL-2 family members to promote apoptosis.
Source: https://reactome.org/content/detail/R-HSA-111448

TP53 also regulates glucose metabolism by suppressing glycolysis and 
promoting oxidative phosphorylation via TIGAR which reduces 
fructose-2,6-bisphosphate levels.
Source: https://reactome.org/content/detail/R-HSA-70326

Question: "How does TP53 promote apoptosis through NOXA activation?"


OLD Reactome Prompt Response:

TP53 promotes apoptosis through activation of the pro-apoptotic gene NOXA (PMAIP1) via a sequence of regulatory events involving post-translational modification, transcriptional activation, and mitochondrial signaling. Within the Reactome framework, this mechanism links cellular stress signaling to mitochondrial apoptosis pathways through TP53-mediated transcriptional regulation.

  1. Stress-induced stabilization and activation of TP53
    Under normal cellular conditions, TP53 protein levels are tightly controlled by the E3 ubiquitin ligase MDM2. MDM2 ubiquitinates TP53, targeting it for proteasome-mediated degradation, which maintains low TP53 levels in unstressed cells. Regulation of TP53 Activity through Phosphorylation
    When cells encounter stress, TP53 becomes stabilized through post-translational phosphorylation. Specifically, the kinases ATM and ATR phosphorylate TP53 at serine residues S15 and S20, which prevents its degradation and enhances its transcriptional activity. Regulation of TP53 Activity through Phosphorylation
    An additional regulatory modification further determines the outcome of TP53 signaling. Phosphorylation at serine S46 biases TP53 toward activating pro-apoptotic target genes, rather than genes involved in cell-cycle arrest. Regulation of TP53 Activity through Phosphorylation

  2. TP53-dependent transcriptional activation of NOXA
    Once stabilized and transcriptionally active, TP53 induces expression of several genes that promote apoptosis. One such target is NOXA (PMAIP1). NOXA is transactivated in a TP53-dependent manner, meaning that activated TP53 directly promotes transcription of the NOXA gene. The transcription factor E2F1 can also activate NOXA expression through a separate regulatory mechanism. NOXA Activation
    This transcriptional activation links nuclear stress signaling mediated by TP53 to downstream mitochondrial apoptosis pathways. NOXA Activation

  3. Mitochondrial action of NOXA in apoptosis
    After NOXA protein is synthesized, it translocates to mitochondria, where it functions as a pro-apoptotic regulator. NOXA Activation
    At the mitochondria, NOXA promotes apoptosis by inhibiting anti-apoptotic members of the BCL-2 protein family. By neutralizing these survival-promoting proteins, NOXA shifts the balance of BCL-2 family interactions toward pro-apoptotic signaling, facilitating mitochondrial apoptotic processes. NOXA Activation

  4. Integrated mechanism
    Taken together, the Reactome mechanism linking TP53 and NOXA can be summarized as follows:

      1. Cellular stress activates ATM/ATR kinases.
      2. ATM/ATR phosphorylate TP53, stabilizing it and enhancing its transcriptional activity.
      3. Additional phosphorylation at S46 biases TP53 toward apoptosis-related gene expression.
      4. TP53 transactivates the NOXA gene, increasing NOXA protein production.
      5. NOXA translocates to mitochondria, where it inhibits anti-apoptotic BCL-2 family proteins.
      6. This inhibition promotes mitochondria-mediated apoptosis.
    Thus, TP53 acts as a stress-responsive transcription factor that converts upstream damage signals into mitochondrial apoptotic signaling through NOXA activation.

Sources

Word count: ~380 words. Includes 4 numbered sections, repeated information across sections, and a full 6-step integrated summary - all for a question that is fully answered in 2 paragraphs.


NEW Reactome Prompt Response:

TP53 promotes apoptosis through activation of the NOXA (PMAIP1) gene by regulating its transcription under cellular stress.

Under normal, unstressed conditions, TP53 protein levels are kept low because the E3 ubiquitin ligase MDM2 ubiquitinates TP53, targeting it for proteasome-mediated degradation TP53 Regulates Transcription of Cell Death Genes. When cells experience stress, TP53 becomes stabilized through phosphorylation, including phosphorylation at serine 15 (S15) and serine 20 (S20) by the kinases ATM and ATR TP53 Regulates Transcription of Cell Death Genes.

A key regulatory step occurs when TP53 is phosphorylated at serine 46 (S46). This modification shifts TP53's transcriptional program toward pro-apoptotic gene expression rather than cell-cycle arrest genes TP53 Regulates Transcription of Cell Death Genes.

One of the genes activated by TP53 in this pro-apoptotic program is NOXA (PMAIP1). TP53 transactivates the NOXA gene, leading to production of the NOXA protein TP53 Regulates Transcription of Cell Death Genes.

After synthesis, NOXA protein translocates to mitochondria, where it promotes apoptosis by inhibiting anti-apoptotic members of the BCL-2 family TP53 Regulates Transcription of Cell Death Genes. By neutralizing these anti-apoptotic proteins, NOXA contributes to the activation of the mitochondrial apoptotic pathway.

In summary, TP53 promotes apoptosis through NOXA activation by:

  1. Stabilizing and activating TP53 through stress-induced phosphorylation.
  2. Phosphorylation at S46 directing TP53 toward pro-apoptotic gene transcription.
  3. Transactivation of the NOXA gene.
  4. NOXA protein acting at mitochondria to inhibit anti-apoptotic BCL-2 family proteins, thereby promoting apoptosis.

Sources

Word count: ~210 words. Same key facts retained. Glucose metabolism source correctly ignored as irrelevant. Cleaner flow with no redundant section headers.


AI Assistance: Used AI tools to assist with drafting. All changes reviewed and verified by the contributor.

Summary of Results

| Test | Old Word Count | New Word Count | Reduction |
|---|---|---|---|
| UniProt: SMC2 condensation | ~220 words | ~110 words | ~50% |
| Reactome: TP53/NOXA apoptosis | ~380 words | ~210 words | ~45% |

In both cases all key factual content is preserved. The reduction comes entirely from removing redundant structure, unnecessary section headers, and tangentially related information.
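The reduction figures follow from the approximate word counts reported in the two tests. A short sketch of the arithmetic, with `reduction_percent` as a hypothetical helper name:

```python
def reduction_percent(old_words: int, new_words: int) -> float:
    """Percentage by which the new response is shorter than the old one."""
    return 100.0 * (old_words - new_words) / old_words


# Approximate word counts reported in the two manual tests.
results = {"UniProt": (220, 110), "Reactome": (380, 210)}

reductions = {
    name: round(reduction_percent(old, new), 1)
    for name, (old, new) in results.items()
}
print(reductions)
```

Since the inputs are approximate counts, the percentages should be read as rough estimates rather than precise measurements.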


Relation to Other Work

This PR complements #116 (FlashRank reranking), which addresses the retrieval side of the same verbosity problem, while this PR addresses the generation side.

Together they form a complete solution to the response quality concern raised in the project.


Development

Successfully merging this pull request may close these issues.

[Feature] Reduce Response Verbosity by Improving System Prompt Precision
