Improve response precision by refining system prompts for Reactome and UniProt retrievers #121
Open
GovindhKishore wants to merge 2 commits into reactome:main from
Improve Response Precision by Refining System Prompts
Problem
As more data sources are integrated into the retrieval pipeline, chatbot responses have grown increasingly verbose and noisy. This is a known concern raised in the project. The root cause on the generation side is that both reactome/prompt.py and uniprot/prompt.py explicitly instruct the LLM to be comprehensive and capture all mechanistically relevant details, which causes the LLM to include everything it finds regardless of direct relevance to the question asked.
Problematic instructions in the old prompts:
reactome/prompt.py:
uniprot/prompt.py:
These instructions tell the LLM to maximise coverage, which causes response bloat as more data sources are added.
Solution
Shift both prompts from exhaustive coverage to relevance-first precision:
Replaced "comprehensively" with "accurately and precisely"
Replaced "Capture ALL details" with "Include only the most directly relevant details"
Added "Answer ONLY what was asked", "Do NOT repeat information", "Do NOT add background unless essential"
Added "stops when the question is fully answered"
Files Changed
Manual Testing
Both old and new prompts were tested manually using identical context and questions to measure the impact of the changes.
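A minimal sketch of how such a side-by-side comparison could be scripted, assuming a generic `generate(system_prompt, context, question)` callable that stands in for whatever LLM client the project uses. The names `compare_prompts`, `Comparison`, and `fake_generate` are hypothetical, and the two prompt strings only borrow the phrases quoted under Solution; they are not the contents of reactome/prompt.py or uniprot/prompt.py.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins built from phrases quoted in this PR;
# NOT the real contents of reactome/prompt.py or uniprot/prompt.py.
OLD_STYLE_PROMPT = "Answer comprehensively. Capture ALL details from the context."
NEW_STYLE_PROMPT = (
    "Answer accurately and precisely. "
    "Include only the most directly relevant details. "
    "Answer ONLY what was asked. Do NOT repeat information. "
    "Do NOT add background unless essential."
)

# Assumed shape of the LLM call: (system_prompt, context, question) -> answer text.
Generate = Callable[[str, str, str], str]


@dataclass
class Comparison:
    old_response: str
    new_response: str
    old_words: int
    new_words: int

    @property
    def reduction_pct(self) -> float:
        """Percentage drop in word count from old to new (0.0 if old is empty)."""
        if self.old_words == 0:
            return 0.0
        return 100.0 * (self.old_words - self.new_words) / self.old_words


def compare_prompts(
    generate: Generate, old_prompt: str, new_prompt: str, context: str, question: str
) -> Comparison:
    """Run both prompts against the same context and question, then count words."""
    old = generate(old_prompt, context, question)
    new = generate(new_prompt, context, question)
    return Comparison(old, new, len(old.split()), len(new.split()))


def fake_generate(system_prompt: str, context: str, question: str) -> str:
    # Toy stand-in for an LLM: returns a longer answer when told to be comprehensive.
    if "comprehensively" in system_prompt:
        return "one two three four five six seven eight"
    return "one two three four"


result = compare_prompts(fake_generate, OLD_STYLE_PROMPT, NEW_STYLE_PROMPT, "ctx", "q")
print(result.old_words, result.new_words, result.reduction_pct)  # 8 4 50.0
```

Word count is only a proxy for verbosity; swapping `fake_generate` for a real client would keep the context and question identical across both runs, which is the property the manual test relies on.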
Test 1: UniProt Prompt
Test Context:
Question:
"What is the role of SMC2 in chromosome condensation during mitosis?"
OLD UniProt Prompt Response:
During mitosis, chromosomes must become highly compact so they can be accurately separated into daughter cells. A key protein involved in this process is SMC2, which functions as part of a molecular machine called the condensin complex.
Role of SMC2 in Chromosome Condensation
SMC2 is a central structural and functional component of the condensin complex, a protein assembly responsible for converting loosely organized interphase chromatin into the tightly packed mitotic chromosomes required for cell division. [SMC2]
Formation of the Condensin Core Complex
SMC2 forms a heterodimer with SMC4, creating the core of the condensin complex. This pairing produces a functional ATPase, meaning the complex can use ATP hydrolysis to power structural changes in DNA. [SMC2]
Driving Chromatin Condensation
The condensin complex containing SMC2 promotes chromosome compaction by restructuring DNA topology. Specifically, it:
These changes compact DNA into the dense structures characteristic of mitotic chromosomes. [SMC2]
In higher eukaryotes, SMC2 participates in two distinct condensin complexes:
Both complexes contain SMC2 and SMC4 and are required for successful mitosis, indicating that SMC2 is essential for proper chromosome organization and segregation during cell division. [SMC2]
Why This Matters
By enabling the condensin complex to reshape and compact chromatin, SMC2 ensures chromosomes are properly condensed and structured during mitosis, which is crucial for accurate chromosome segregation and the maintenance of genomic stability. [SMC2]
Sources
NEW UniProt Prompt Response:
SMC2 plays a central role in chromosome condensation during mitosis as a key component of the condensin complex.
Specifically, SMC2 forms a heterodimer with SMC4, creating a functional ATPase complex that is required for the condensation of chromatin into mitotic chromosomes. This activity is essential for converting interphase chromatin into the highly condensed chromosomes characteristic of mitosis.
The condensin complex containing SMC2 actively modifies DNA topology by:
These structural changes in DNA help compact chromatin into tightly organized mitotic chromosomes. In higher eukaryotes, two condensin complexes exist — condensin I and condensin II — and both contain SMC2 and SMC4, making SMC2 essential for proper chromosome condensation and successful mitosis.
Sources
Test 2: Reactome Prompt
Test Context:
Question:
"How does TP53 promote apoptosis through NOXA activation?"
OLD Reactome Prompt Response:
TP53 promotes apoptosis through activation of the pro-apoptotic gene NOXA (PMAIP1) via a sequence of regulatory events involving post-translational modification, transcriptional activation, and mitochondrial signaling. Within the Reactome framework, this mechanism links cellular stress signaling to mitochondrial apoptosis pathways through TP53-mediated transcriptional regulation.
Stress-induced stabilization and activation of TP53
Under normal cellular conditions, TP53 protein levels are tightly controlled by the E3 ubiquitin ligase MDM2. MDM2 ubiquitinates TP53, targeting it for proteasome-mediated degradation, which maintains low TP53 levels in unstressed cells. [Regulation of TP53 Activity through Phosphorylation]
When cells encounter stress, TP53 becomes stabilized through post-translational phosphorylation. Specifically, the kinases ATM and ATR phosphorylate TP53 at serine residues S15 and S20, which prevents its degradation and enhances its transcriptional activity. [Regulation of TP53 Activity through Phosphorylation]
An additional regulatory modification further determines the outcome of TP53 signaling. Phosphorylation at serine S46 biases TP53 toward activating pro-apoptotic target genes, rather than genes involved in cell-cycle arrest. [Regulation of TP53 Activity through Phosphorylation]
TP53-dependent transcriptional activation of NOXA
Once stabilized and transcriptionally active, TP53 induces expression of several genes that promote apoptosis. One such target is NOXA (PMAIP1). NOXA is transactivated in a TP53-dependent manner, meaning that activated TP53 directly promotes transcription of the NOXA gene. The transcription factor E2F1 can also activate NOXA expression through a separate regulatory mechanism. [NOXA Activation]
This transcriptional activation links nuclear stress signaling mediated by TP53 to downstream mitochondrial apoptosis pathways. [NOXA Activation]
Mitochondrial action of NOXA in apoptosis
After NOXA protein is synthesized, it translocates to mitochondria, where it functions as a pro-apoptotic regulator. [NOXA Activation]
At the mitochondria, NOXA promotes apoptosis by inhibiting anti-apoptotic members of the BCL-2 protein family. By neutralizing these survival-promoting proteins, NOXA shifts the balance of BCL-2 family interactions toward pro-apoptotic signaling, facilitating mitochondrial apoptotic processes. [NOXA Activation]
Integrated mechanism
Taken together, the Reactome mechanism linking TP53 and NOXA can be summarized as follows:
Cellular stress activates ATM/ATR kinases.
ATM/ATR phosphorylate TP53, stabilizing it and enhancing its transcriptional activity.
Additional phosphorylation at S46 biases TP53 toward apoptosis-related gene expression.
TP53 transactivates the NOXA gene, increasing NOXA protein production.
NOXA translocates to mitochondria, where it inhibits anti-apoptotic BCL-2 family proteins.
This inhibition promotes mitochondria-mediated apoptosis.
Thus, TP53 acts as a stress-responsive transcription factor that converts upstream damage signals into mitochondrial apoptotic signaling through NOXA activation.
Sources
NEW Reactome Prompt Response:
TP53 promotes apoptosis through activation of the NOXA (PMAIP1) gene by regulating its transcription under cellular stress.
Under normal, unstressed conditions, TP53 protein levels are kept low because the E3 ubiquitin ligase MDM2 ubiquitinates TP53, targeting it for proteasome-mediated degradation [TP53 Regulates Transcription of Cell Death Genes]. When cells experience stress, TP53 becomes stabilized through phosphorylation, including phosphorylation at serine 15 (S15) and serine 20 (S20) by the kinases ATM and ATR [TP53 Regulates Transcription of Cell Death Genes].
A key regulatory step occurs when TP53 is phosphorylated at serine 46 (S46). This modification shifts TP53's transcriptional program toward pro-apoptotic gene expression rather than cell-cycle arrest genes [TP53 Regulates Transcription of Cell Death Genes].
One of the genes activated by TP53 in this pro-apoptotic program is NOXA (PMAIP1). TP53 transactivates the NOXA gene, leading to production of the NOXA protein [TP53 Regulates Transcription of Cell Death Genes].
After synthesis, NOXA protein translocates to mitochondria, where it promotes apoptosis by inhibiting anti-apoptotic members of the BCL-2 family [TP53 Regulates Transcription of Cell Death Genes]. By neutralizing these anti-apoptotic proteins, NOXA contributes to the activation of the mitochondrial apoptotic pathway.
In summary, TP53 promotes apoptosis through NOXA activation by:
Sources
AI Assistance: Used AI tools to assist with drafting. All changes reviewed and verified by the contributor.
Summary of Results
In both cases, all key factual content is preserved. The reduction in response length comes entirely from removing redundant structure, unnecessary section headers, and tangentially related information.
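One rough way to sanity-check the "key factual content is preserved" claim is to confirm that a hand-picked list of key terms still appears in the shorter response. The sketch below is an assumption about how that could be done, not how the contributor verified it; `missing_terms` and the term list are hypothetical, and a case-insensitive substring match is only a coarse proxy for factual equivalence.

```python
def missing_terms(response: str, key_terms: list[str]) -> list[str]:
    """Return the key terms that do not occur in the response (case-insensitive)."""
    text = response.lower()
    return [term for term in key_terms if term.lower() not in text]


# Hand-picked terms (hypothetical choice) for the SMC2 question.
key_terms = ["SMC4", "condensin", "ATPase", "mitotic chromosomes"]

# Only the first two sentences of the NEW UniProt response quoted above.
new_excerpt = (
    "SMC2 plays a central role in chromosome condensation during mitosis as a key "
    "component of the condensin complex. Specifically, SMC2 forms a heterodimer with "
    "SMC4, creating a functional ATPase complex that is required for the condensation "
    "of chromatin into mitotic chromosomes."
)

print(missing_terms(new_excerpt, key_terms))                      # []
print(missing_terms(new_excerpt, key_terms + ["topoisomerase"]))  # ['topoisomerase']
```

A non-empty result flags a term to re-read by hand; it does not by itself show that a fact was dropped, since the response may express it in different words or in a part of the text not passed in.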
Relation to Other Work
This PR complements #116 (FlashRank reranking), which addresses the retrieval side of the same verbosity problem:
Together they form a complete solution to the response quality concern raised in the project.