feat: add integration test case for flow execution#315
Conversation
coderzc
left a comment
There was a problem hiding this comment.
Thanks for the contribution. I reviewed this PR and found two issues that should be addressed before merge:
- [P1] New integration test is not executed in CI
- File: .github/workflows/hugegraph-llm.yml (Run integration tests step)
- test_flows_integration.py is added in this PR, but the workflow still runs only three explicit integration files. So the new tests are not part of PR checks and provide no regression protection.
- Suggestion: include the new file in the pytest command, or switch to directory-level collection with markers.
- [P2] Missing external-service skip guard in new integration tests
- File: hugegraph-llm/src/tests/integration/test_flows_integration.py
- The tests call real flows directly (BUILD_VECTOR_INDEX / GRAPH_EXTRACT / RAG_* / TEXT2GREMLIN) without should_skip_external()-style gating or mocks. Once enabled in CI/local integration runs, they can become flaky and environment-dependent.
- Suggestion: add explicit skip conditions for external dependencies, or stabilize with mocks/stubs for non-deterministic external calls.
Please address these and I can re-review quickly.
Thank u for your suggestions. For P1, I plan to add a end-to-end test for all the test cases in the app demo, which need api-key to launch the basic test. So I don't add the test scripts to the CI. I wanna add a contributed.md for this project and call for all the contributor to check the basic functionality of this project. What about your opinion? For P2, I don't simulate the external api because of the complexity of the workflow (caused by the nondeterminism of llm), so this test scripts is not added to the CI procedure. I'm sorry I didn't reply to you in a timely manner. |
| near_neighbor_first = False | ||
| custom_related_information = "" | ||
|
|
||
| graph_search, gremlin_prompt, vector_search = update_ui_configs( |
There was a problem hiding this comment.
test_rag 实际只测了 raw 模式 + 测试会写回 config_prompt.yaml
问题 1(死代码 → 名义 4 模式实际只测 1 种)
update_ui_configs(...)(line 213-222)只调用一次,传入的 vector_only_answer / graph_only_answer / graph_vector_answer 都是 False,因此返回 vector_search=False, graph_search=False。
后续四段 schedule_flow 始终复用这同一组标记(见 line 227-228 / 251-252 / 275-276 / 299-300),而对局部 bool 的反复赋值(line 244-247 / 268-271 / 292-295)根本没传给 flow,是死代码。
=> 四次调用实际全部以"无检索"模式运行;RAG_VECTOR_ONLY / RAG_GRAPH_ONLY / RAG_GRAPH_VECTOR 这三条核心路径没被实际验证。CONTRIBUTING.md 中的 "All RAG modes" 描述不成立。
问题 2(副作用:测试会改 contributor 的 yaml)
update_ui_configs 在 prompt 内容变化时会触发 prompt.update_yaml_file()(见 rag_block.py:141-147)。Contributor 跑完该测试后,工作区里的 config_prompt.yaml 会被悄悄改写为测试中的 EN prompt 与 default_question="梁漱溟和梁济的关系是什么?",下次 git status 会冒出无关 diff,极易误提交。
修法:参数化覆盖 4 种模式 + 直接构造 vector_search / graph_search,绕开 demo helper 副作用:
@pytest.mark.parametrize(
("flow_name", "raw", "vec_only", "graph_only", "graph_vec"),
[
(FlowName.RAG_RAW, True, False, False, False),
(FlowName.RAG_VECTOR_ONLY, False, True, False, False),
(FlowName.RAG_GRAPH_ONLY, False, False, True, False),
(FlowName.RAG_GRAPH_VECTOR, False, False, False, True),
],
)
def test_rag(self, flow_name, raw, vec_only, graph_only, graph_vec):
query = "梁漱溟和梁济的关系是什么?"
graph_search = graph_only or graph_vec
vector_search = vec_only or graph_vec
res = self.scheduler.schedule_flow(
flow_name,
query=query,
vector_search=vector_search,
graph_search=graph_search,
raw_answer=raw,
vector_only_answer=vec_only,
graph_only_answer=graph_only,
graph_vector_answer=graph_vec,
graph_ratio=0.6,
rerank_method="bleu",
near_neighbor_first=False,
custom_related_information="",
answer_prompt=PromptConfig.answer_prompt_EN,
keywords_extract_prompt=PromptConfig.keywords_extract_prompt_EN,
gremlin_tmpl_num=-1,
gremlin_prompt=PromptConfig.gremlin_generate_prompt_EN,
)
assert isinstance(res, dict) and res.get("answer"), f"{flow_name} returned empty answer"| """Flow集成测试 - 验证各个Flow能正常执行不抛异常""" | ||
|
|
||
| @pytest.fixture(autouse=True) | ||
| def setup(self): |
There was a problem hiding this comment.
should_skip_external() 守卫,本地默认命令必爆
hugegraph-llm/src/tests/conftest.py:45 强制设置 SKIP_EXTERNAL_SERVICES=true,所有现存 integration test(如 test_kg_construction.py:126、test_rag_pipeline.py:113)都会调用 should_skip_external() 在此前提下自动 skip。
新文件没有这个守卫 → contributor 跑 pytest src/tests/ 或 pytest src/tests/integration/ 时,其它 integration test 全部跳过,只有这套新 test 不跳过且必爆(因为它会真实调 LLM/HugeGraph 但默认 SKIP=true 时基础设施不一定齐全),对 contributor 来说是最糟糕的体验。
@coderzc 已在 P2 提出过一次,作者回复"故意不进 CI"——但进不进 CI 与本地默认 pytest 行为是两回事。至少二选一:
方案 A:在 setup 中加守卫,与现有 integration test 一致:
from src.tests.test_utils import should_skip_external
@pytest.fixture(autouse=True)
def setup(self):
if should_skip_external():
pytest.skip("Skipping tests that require external services")
self.index_text = "..."
self.scheduler = SchedulerSingleton.get_instance()方案 B:用自定义 mark + CONTRIBUTING.md 引导:
# pytest.ini 或 pyproject.toml
[tool.pytest.ini_options]
markers = ["local_e2e: tests requiring HugeGraph + LLM API key (skipped by default)"]
# 测试文件
@pytest.mark.local_e2e
class TestFlowsIntegration: ...并在 CONTRIBUTING.md 写明 pytest -m local_e2e 才会跑这套测试。
| self.scheduler.schedule_flow(FlowName.UPDATE_VID_EMBEDDINGS) | ||
| log.info("✓ UPDATE_VID_EMBEDDING flow executed successfully") | ||
| except Exception as e: | ||
| pytest.fail(f"BUILD_VECTOR_INDEX flow failed: {e}") |
There was a problem hiding this comment.
try/except + pytest.fail 反模式
本行(line 131)以及 line 189、line 199 三处 pytest.fail(...) 全部写成 "BUILD_VECTOR_INDEX flow failed: {e}",但实际抛出的可能是 BUILD_SCHEMA / PROMPT_GENERATE / IMPORT_GRAPH_DATA 等任意一个 flow。对于一个面向 contributor 本地自查的 smoke test,错误信息直接误导新人去排查错误的 flow。
更深层的问题是 try/except + pytest.fail(str(e)) 本身就是反模式:
- pytest 默认遇到异常就会 fail 并展示完整 traceback;这层包装反而吞掉异常类型与 cause chain;
- 多余的 try 块降低了可读性。
最简修法:直接删除 try/except 让异常自然冒泡。例如本测试(test_build_knowledge_graph):
def test_build_knowledge_graph(self):
res = self.scheduler.schedule_flow(FlowName.BUILD_VECTOR_INDEX, [self.index_text])
assert "chunks" in res
log.info("BUILD_VECTOR_INDEX flow executed successfully")
# ...后续步骤同样直接调用,不要包 try/except如果非要保留上下文,至少把 flow 名称变量化:pytest.fail(f"{current_flow} failed: {e}")。
| """ | ||
| self.scheduler = SchedulerSingleton.get_instance() | ||
|
|
||
| def test_build_knowledge_graph(self): |
There was a problem hiding this comment.
test_build_knowledge_graph 串行执行 BUILD_VECTOR_INDEX → GRAPH_EXTRACT → IMPORT_GRAPH_DATA → UPDATE_VID_EMBEDDINGS 四个 flow(line 41-129)。若 IMPORT_GRAPH_DATA 挂掉,配合上面那条复制粘贴 bug,contributor 看到的错误是 "BUILD_VECTOR_INDEX flow failed",会被严重误导。
另外 CONTRIBUTING.md 表格把它标为 1 项测试,与实际行为不符。
修法:拆成 4 个独立测试,用 fixture 共享上一步的产物;或至少改用 pytest-subtests:
def test_build_kg__vector_index(self):
res = self.scheduler.schedule_flow(FlowName.BUILD_VECTOR_INDEX, [self.index_text])
assert "chunks" in res and len(res["chunks"]) > 0
def test_build_kg__graph_extract(self):
data = self.scheduler.schedule_flow(FlowName.GRAPH_EXTRACT, ...)
assert data["vertices"] and data["edges"]
# 以此类推好处:失败定位精确到 flow;CONTRIBUTING.md 表格与代码对齐;contributor 跑挂时不需要从头再来。
| } | ||
| """ | ||
|
|
||
| self.scheduler.schedule_flow(FlowName.BUILD_SCHEMA, [self.index_text], query_examples, few_shot) |
There was a problem hiding this comment.
test_schema_generator 无任何断言
本行 self.scheduler.schedule_flow(FlowName.BUILD_SCHEMA, ...) 没有任何断言,等价于"只要不抛异常就算通过"。schema 生成器返回空 list、错误结构、或 LLM 回了一句 "抱歉,我无法生成" 都会被判通过。
同样的弱断言遍布全文件:
- line 197 / 242 / 266 / 290 / 314 / 329:仅
assert res is not None - line 125:
assert res is not None(IMPORT_GRAPH_DATA)
对于面向 contributor 的本地自查,"通过"应该真正校验返回结构。
最低限度修法:
res = self.scheduler.schedule_flow(FlowName.BUILD_SCHEMA, ...)
assert isinstance(res, dict)
assert res.get("schema", {}).get("vertexlabels"), "BUILD_SCHEMA returned empty vertexlabels"
assert res["schema"].get("edgelabels"), "BUILD_SCHEMA returned empty edgelabels"理想做法:把期望产物存到 src/tests/data/flows/expected_schema.json(仓库已有 src/tests/data/ 目录),用结构 diff 校验。
| self.index_text = """ | ||
| 梁漱溟年轻时,一日,他与父亲梁济讨论当时一战欧洲的时局,梁济突然问道:“这个世界会好吗?”梁漱溟答:“我相信世界是一天一天往好里去的。”梁济叹道:“能好就好啊!”然后离家,三日后,梁济投湖自尽。晚年梁漱溟回忆自己的一生和跌宕起伏的近代社会,总结了一本书,书名就叫《这个世界会好吗?》。梁漱溟的回答与年轻时一致。但很多人特别是遗老遗少们总在回忆往日的时光,仿佛那是人类的黄金时代。如同鲁迅笔下的九斤老太,整日里念叨着“一代不如一代”。或者极端如梁济,对世界未来充满悲观,一死了之。在今天的时代,很多人认为“世界正变得越来越糟”,这其中不乏知名的知识分子。平克将这种情况称之为「进步恐惧症」,并总结为「认知偏差」。因为每天的新闻报道里总是充斥着战争、恐怖主义、犯罪、污染等坏消息,不是因为这些事情是主流,而是因为它们是热点,导致给人们的印象是世界越来越糟。所谓“好事不出门,坏事传千里”,而在互联网时代,发达的信息传播让坏事传播的更快更广。要纠正这种「可得性偏差」的方法是用数据说话。数字是最能反应趋势,看战争的比例、犯罪死亡人数在总人数的占比,就能看出犯罪是增加了,还是减少了。实际上,从各种数字显示,人类暴力事件在历史呈明显的下降趋势,这在平克之前发表的另一大部头著作《人性中的善良天使:暴力为什么会减少》中详细阐述过。世界变得更好了,说到底就是进步。 | ||
| """ | ||
| self.scheduler = SchedulerSingleton.get_instance() |
There was a problem hiding this comment.
SchedulerSingleton 是进程级单例,跨测试共享 pipeline_pool
SchedulerSingleton.get_instance()(见 flows/scheduler.py:181-191)是进程级单例,其 pipeline_pool 内每个 GPipelineManager 都会缓存 pipeline。这意味着:
- 顺序耦合:
test_build_knowledge_graph写入的图数据 / 向量索引会被后续test_rag看到。如果用户用pytest -k test_rag单独跑,test_rag是否通过取决于此前是否跑过test_build_knowledge_graph,行为不可复现。 - 脏 pipeline:某测试中途异常未走到
manager.release/add时,下一个测试可能拿到错误状态的 pipeline。
对 contributor 在反复修复-重试的场景下尤其糟糕:上一次失败留下的脏状态会让下一次的失败原因看起来与代码无关。
修法:teardown 中重置;或为每个测试用唯一的 graph_name(避免共享数据)。例如:
@pytest.fixture(autouse=True)
def setup(self):
if should_skip_external():
pytest.skip("...")
self.index_text = "..."
huge_settings.graph_name = f"test_{uuid.uuid4().hex[:8]}"
self.scheduler = SchedulerSingleton.get_instance()
yield
# teardown:清空该 graph,或释放 pipeline至少在文件 docstring 写明"测试间存在隐式数据依赖",让维护者心里有数。
|
|
||
| # Run the end-to-end integration tests | ||
| cd hugegraph-llm | ||
| python -m pytest src/tests/integration/test_flows_integration.py -v |
There was a problem hiding this comment.
uv 管理依赖,CONTRIBUTING.md 主命令应统一为 uv run pytest
hugegraph-llm/AGENTS.md 与 .github/workflows/hugegraph-llm.yml:82 都使用 uv run pytest。本文件 line 23 的 python -m pytest 在 contributor 没正确激活 .venv、或 .venv 缺 --extra llm 时会跑到系统 Python,得到莫名其妙的 import error。
CONTRIBUTING.md 是 contributor 的入口,命令必须与 CI 保持一致。
# Run the end-to-end integration tests
cd hugegraph-llm
uv run pytest src/tests/integration/test_flows_integration.py -v另外,由于 conftest.py 默认 SKIP_EXTERNAL_SERVICES=true,contributor 直接跑命令时这套 test 可能被全部 skip。需要在文档里告知:
# 显式开启外部服务测试
SKIP_EXTERNAL_SERVICES=false uv run pytest src/tests/integration/test_flows_integration.py -v| python -m pytest src/tests/integration/test_flows_integration.py -v | ||
| ``` | ||
|
|
||
| All 6 tests must pass before you submit your code: |
There was a problem hiding this comment.
🧹 Minor:CONTRIBUTING.md 选址 / 范围 / 强制性
-
选址错位:标题 "Contributing to HugeGraph-AI" 但内容只覆盖
hugegraph-llm子模块。HugeGraph-AI 是 monorepo(含hugegraph-ml、hugegraph-python-client、vermeer-python-client)。建议:- 方案 A:把当前内容移动到
hugegraph-llm/TESTING.md(与hugegraph-llm/AGENTS.md同级),把根 CONTRIBUTING.md 作为通用入口; - 方案 B:保留根级 CONTRIBUTING.md,但把当前内容降级为 "### Module: hugegraph-llm" 章节,并补充 ASF 通用部分(DCO/sign-off、commit message 规范、ICLA 链接、Issue/PR 模板)。
- 方案 A:把当前内容移动到
-
强制性过度:line 26 "All 6 tests must pass before you submit your code" 强制要求 contributor 本地启动 HugeGraph + 配置 LLM API key 才能贡献。对纯文档 / UI / 非 flow PR 是过度要求。建议改为:
如果你的修改涉及 flow / 索引 / RAG 相关代码,请在本地通过下列 6 个测试。
-
额外缺失:当前文档没列出 LLM API key 与 embedding provider 等前置条件,但作者表态"测试需要 api-key"——这条信息应当出现在 Prerequisites。
-
数字硬编码:"All 6 tests must pass" / "
6 passed"(line 39)会随测试增删而过时,建议改成"全部通过"。
No description provided.