Multiple failures with small LLM (Ollama) #160

@Ehsanstp

Description

SelfQueryRetriever causing syntax crash and completeness_grader throwing AttributeError

  • OS: Windows 11 (WSL 2, Ubuntu)
  • Embedding model: sentence-transformers/all-MiniLM-L6-v2
  • Embeddings dataset version: Reactome Release 96
  • GraphDB version: reactome/graphdb:latest (Release 96)
  • Generation backend: Ollama (currently: qwen2.5:0.5b)

BUGS:

A. AttributeError in completeness_grader.py

This is an unhandled exception in the assess_completeness node of the external_search workflow. When the LLM output does not conform to the CompletenessGrade schema, with_structured_output() returns None instead of raising, and the subsequent attribute access crashes. The chatbot response has already been fully streamed to the UI before postprocess(), so the user still sees an answer regardless of the crash; what breaks is the Tavily web-search fallback.

Error log:

biochat_chainlit  |   File "/app/src/external_search/completeness_grader.py", line 50, in ainvoke
biochat_chainlit  |     return {"external_search": result.binary_score}
biochat_chainlit  |                                ^^^^^^^^^^^^^^^^^^^
biochat_chainlit  | AttributeError: 'NoneType' object has no attribute 'binary_score'
biochat_chainlit  | During task with name 'assess_completeness' and id 'cfcc2097-33e8-405a-c0a7-439fda1f79d0' 

Cause:

  1. Wrong format: small LLMs often return plain text or malformed JSON instead of the expected schema.
  2. If generation exceeds the context window, the truncated output cannot be parsed.
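Both causes end in the same failure mode. A minimal reproduction of the crash and a defensive alternative, with no LangChain required (the stand-in None mimics what with_structured_output() returns on a schema mismatch):

```python
# Stand-in for a failed structured-output parse: with_structured_output()
# returns None when the model's reply does not match the Pydantic schema.
result = None

try:
    result.binary_score  # what completeness_grader.py does on line 50
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'binary_score'

# Defensive access avoids the crash; "No" is a safe default (see FIX below).
score = getattr(result, "binary_score", None) or "No"
print(score)  # No
```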

B. SelfQueryRetriever syntax crash

The LLM generates a structured query with invalid filter syntax, which crashes the query parser.

Error log:

(error log attached as a screenshot in the original issue)

Cause:

Small models generate Python-style boolean expressions (and) instead of the expected filter syntax (and_()), so the parser raises an "Unexpected token" error.

FIX:

Both bugs have been investigated and fixes are ready as a PR.

BUG 1:

Add a three-tier fallback:

  • structured output
  • raw-text parsing to extract the grade
  • hard default: "No"

"No" is used as the hard default as it triggers web search rather than skipping it altogether if the grader state is unknown. Cost of false "No" is one unnecessary Tavily call.

BUG 2:

Pass keyword arguments via chain_kwargs in SelfQueryRetriever.from_llm(); this reduces the generation of invalid filter components, although it is not guaranteed to catch all malformed filters. In addition, modifying the prompt or adding concrete examples in the functional format the model can follow helps: and_(eq("gene", "TP53"), eq("synonyms_geneName", "degradation")).
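Independent of the prompt-level fix, a belt-and-braces guard is to fall back to plain similarity search when filter construction blows up. A minimal sketch (the wrapper name and the assumption that parse errors surface as ordinary exceptions are mine, not the PR's):

```python
def retrieve_with_fallback(self_query_retriever, base_retriever, query: str):
    """Try the self-query retriever first; on a filter-parse failure
    (e.g. lark's "Unexpected token" error), fall back to the plain
    vector retriever so the chatbot still gets documents instead of
    crashing."""
    try:
        return self_query_retriever.invoke(query)
    except Exception:
        # Parse errors from malformed filters surface here; a plain
        # unfiltered similarity search is a usable degraded result.
        return base_retriever.invoke(query)
```

Catching broad Exception is deliberate here: with a 0.5B model almost any part of query construction can fail, and a degraded retrieval beats an unhandled crash in the workflow.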

Note: This won't be a problem with larger models. A similar issue was discussed in this GitHub issue: langchain-ai/langchain#9368
