feat: add integration test case for flow execution by weijinglin · Pull Request #315 · apache/hugegraph-ai

weijinglin · 2025-12-09T14:52:08Z

No description provided.

coderzc

Thanks for the contribution. I reviewed this PR and found two issues that should be addressed before merge:

[P1] New integration test is not executed in CI

File: .github/workflows/hugegraph-llm.yml (Run integration tests step)
test_flows_integration.py is added in this PR, but the workflow still runs only three explicit integration files. So the new tests are not part of PR checks and provide no regression protection.
Suggestion: include the new file in the pytest command, or switch to directory-level collection with markers.

[P2] Missing external-service skip guard in new integration tests

File: hugegraph-llm/src/tests/integration/test_flows_integration.py
The tests call real flows directly (BUILD_VECTOR_INDEX / GRAPH_EXTRACT / RAG_* / TEXT2GREMLIN) without should_skip_external()-style gating or mocks. Once enabled in CI/local integration runs, they can become flaky and environment-dependent.
Suggestion: add explicit skip conditions for external dependencies, or stabilize with mocks/stubs for non-deterministic external calls.

Please address these and I can re-review quickly.

weijinglin · 2026-03-20T11:07:03Z

Thanks for the contribution. I reviewed this PR and found two issues that should be addressed before merge:

[P1] New integration test is not executed in CI

File: .github/workflows/hugegraph-llm.yml (Run integration tests step)

test_flows_integration.py is added in this PR, but the workflow still runs only three explicit integration files. So the new tests are not part of PR checks and provide no regression protection.

Suggestion: include the new file in the pytest command, or switch to directory-level collection with markers.

[P2] Missing external-service skip guard in new integration tests

File: hugegraph-llm/src/tests/integration/test_flows_integration.py

The tests call real flows directly (BUILD_VECTOR_INDEX / GRAPH_EXTRACT / RAG_* / TEXT2GREMLIN) without should_skip_external()-style gating or mocks. Once enabled in CI/local integration runs, they can become flaky and environment-dependent.

Suggestion: add explicit skip conditions for external dependencies, or stabilize with mocks/stubs for non-deterministic external calls.

Please address these and I can re-review quickly.

Thank u for your suggestions.

For P1, I plan to add a end-to-end test for all the test cases in the app demo, which need api-key to launch the basic test. So I don't add the test scripts to the CI. I wanna add a contributed.md for this project and call for all the contributor to check the basic functionality of this project. What about your opinion?

For P2, I don't simulate the external api because of the complexity of the workflow (caused by the nondeterminism of llm), so this test scripts is not added to the CI procedure.

I'm sorry I didn't reply to you in a timely manner.

… into flow_test

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

imbajin

下面 8 条评论按优先级给出。其中 3 条阻塞项必须修：
(1) test_rag 实际只跑了 raw 模式、未真正覆盖 4 种 RAG 模式；
(2) update_ui_configs 在测试中会写回 config_prompt.yaml 污染 contributor 工作区；
(3) 缺 should_skip_external 守卫导致默认 pytest 命令必爆。

另：PR 标题建议从 feat: 改为 test:，372 行新增中仅 1 行是生产代码 refactor。

imbajin · 2026-05-19T13:43:17Z

+        near_neighbor_first = False
+        custom_related_information = ""
+
+        graph_search, gremlin_prompt, vector_search = update_ui_configs(


‼️ Critical：test_rag 实际只测了 raw 模式 + 测试会写回 config_prompt.yaml

问题 1（死代码 → 名义 4 模式实际只测 1 种）
update_ui_configs(...)（line 213-222）只调用一次，传入的 vector_only_answer / graph_only_answer / graph_vector_answer 都是 False，因此返回 vector_search=False, graph_search=False。
后续四段 schedule_flow 始终复用这同一组标记（见 line 227-228 / 251-252 / 275-276 / 299-300），而对局部 bool 的反复赋值（line 244-247 / 268-271 / 292-295）根本没传给 flow，是死代码。
=> 四次调用实际全部以"无检索"模式运行；RAG_VECTOR_ONLY / RAG_GRAPH_ONLY / RAG_GRAPH_VECTOR 这三条核心路径没被实际验证。CONTRIBUTING.md 中的 "All RAG modes" 描述不成立。

问题 2（副作用：测试会改 contributor 的 yaml）
update_ui_configs 在 prompt 内容变化时会触发 prompt.update_yaml_file()（见 rag_block.py:141-147）。Contributor 跑完该测试后，工作区里的 config_prompt.yaml 会被悄悄改写为测试中的 EN prompt 与 default_question="梁漱溟和梁济的关系是什么？"，下次 git status 会冒出无关 diff，极易误提交。

修法：参数化覆盖 4 种模式 + 直接构造 vector_search / graph_search，绕开 demo helper 副作用：

@pytest.mark.parametrize( ("flow_name", "raw", "vec_only", "graph_only", "graph_vec"), [ (FlowName.RAG_RAW, True, False, False, False), (FlowName.RAG_VECTOR_ONLY, False, True, False, False), (FlowName.RAG_GRAPH_ONLY, False, False, True, False), (FlowName.RAG_GRAPH_VECTOR, False, False, False, True), ], ) def test_rag(self, flow_name, raw, vec_only, graph_only, graph_vec): query = "梁漱溟和梁济的关系是什么？" graph_search = graph_only or graph_vec vector_search = vec_only or graph_vec res = self.scheduler.schedule_flow( flow_name, query=query, vector_search=vector_search, graph_search=graph_search, raw_answer=raw, vector_only_answer=vec_only, graph_only_answer=graph_only, graph_vector_answer=graph_vec, graph_ratio=0.6, rerank_method="bleu", near_neighbor_first=False, custom_related_information="", answer_prompt=PromptConfig.answer_prompt_EN, keywords_extract_prompt=PromptConfig.keywords_extract_prompt_EN, gremlin_tmpl_num=-1, gremlin_prompt=PromptConfig.gremlin_generate_prompt_EN, ) assert isinstance(res, dict) and res.get("answer"), f"{flow_name} returned empty answer"

imbajin · 2026-05-19T13:43:17Z

+    """Flow集成测试 - 验证各个Flow能正常执行不抛异常"""
+
+    @pytest.fixture(autouse=True)
+    def setup(self):


‼️ Critical：缺 should_skip_external() 守卫，本地默认命令必爆

hugegraph-llm/src/tests/conftest.py:45 强制设置 SKIP_EXTERNAL_SERVICES=true，所有现存 integration test（如 test_kg_construction.py:126、test_rag_pipeline.py:113）都会调用 should_skip_external() 在此前提下自动 skip。

新文件没有这个守卫 → contributor 跑 pytest src/tests/ 或 pytest src/tests/integration/ 时，其它 integration test 全部跳过，只有这套新 test 不跳过且必爆（因为它会真实调 LLM/HugeGraph 但默认 SKIP=true 时基础设施不一定齐全），对 contributor 来说是最糟糕的体验。

@coderzc 已在 P2 提出过一次，作者回复"故意不进 CI"——但进不进 CI 与本地默认 pytest 行为是两回事。至少二选一：

方案 A：在 setup 中加守卫，与现有 integration test 一致：

from src.tests.test_utils import should_skip_external @pytest.fixture(autouse=True) def setup(self): if should_skip_external(): pytest.skip("Skipping tests that require external services") self.index_text = "..." self.scheduler = SchedulerSingleton.get_instance()

方案 B：用自定义 mark + CONTRIBUTING.md 引导：

# pytest.ini 或 pyproject.toml [tool.pytest.ini_options] markers = ["local_e2e: tests requiring HugeGraph + LLM API key (skipped by default)"] # 测试文件 @pytest.mark.local_e2e class TestFlowsIntegration: ...

并在 CONTRIBUTING.md 写明 pytest -m local_e2e 才会跑这套测试。

imbajin · 2026-05-19T13:43:17Z

+            self.scheduler.schedule_flow(FlowName.UPDATE_VID_EMBEDDINGS)
+            log.info("✓ UPDATE_VID_EMBEDDING flow executed successfully")
+        except Exception as e:
+            pytest.fail(f"BUILD_VECTOR_INDEX flow failed: {e}")


‼️ Critical：复制粘贴错误信息 + try/except + pytest.fail 反模式

本行（line 131）以及 line 189、line 199 三处 pytest.fail(...) 全部写成 "BUILD_VECTOR_INDEX flow failed: {e}"，但实际抛出的可能是 BUILD_SCHEMA / PROMPT_GENERATE / IMPORT_GRAPH_DATA 等任意一个 flow。对于一个面向 contributor 本地自查的 smoke test，错误信息直接误导新人去排查错误的 flow。

更深层的问题是 try/except + pytest.fail(str(e)) 本身就是反模式：

pytest 默认遇到异常就会 fail 并展示完整 traceback；这层包装反而吞掉异常类型与 cause chain；

多余的 try 块降低了可读性。

最简修法：直接删除 try/except 让异常自然冒泡。例如本测试（test_build_knowledge_graph）：

def test_build_knowledge_graph(self): res = self.scheduler.schedule_flow(FlowName.BUILD_VECTOR_INDEX, [self.index_text]) assert "chunks" in res log.info("BUILD_VECTOR_INDEX flow executed successfully") # ...后续步骤同样直接调用，不要包 try/except

如果非要保留上下文，至少把 flow 名称变量化：pytest.fail(f"{current_flow} failed: {e}")。

imbajin · 2026-05-19T13:43:17Z

+        """
+        self.scheduler = SchedulerSingleton.get_instance()
+
+    def test_build_knowledge_graph(self):


⚠️ Important：单个测试串行 4 个 flow，配合错误信息 bug 让定位极困难

test_build_knowledge_graph 串行执行 BUILD_VECTOR_INDEX → GRAPH_EXTRACT → IMPORT_GRAPH_DATA → UPDATE_VID_EMBEDDINGS 四个 flow（line 41-129）。若 IMPORT_GRAPH_DATA 挂掉，配合上面那条复制粘贴 bug，contributor 看到的错误是 "BUILD_VECTOR_INDEX flow failed"，会被严重误导。

另外 CONTRIBUTING.md 表格把它标为 1 项测试，与实际行为不符。

修法：拆成 4 个独立测试，用 fixture 共享上一步的产物；或至少改用 pytest-subtests：

def test_build_kg__vector_index(self): res = self.scheduler.schedule_flow(FlowName.BUILD_VECTOR_INDEX, [self.index_text]) assert "chunks" in res and len(res["chunks"]) > 0 def test_build_kg__graph_extract(self): data = self.scheduler.schedule_flow(FlowName.GRAPH_EXTRACT, ...) assert data["vertices"] and data["edges"] # 以此类推

好处：失败定位精确到 flow；CONTRIBUTING.md 表格与代码对齐；contributor 跑挂时不需要从头再来。

imbajin · 2026-05-19T13:43:17Z

+            }
+            """
+
+            self.scheduler.schedule_flow(FlowName.BUILD_SCHEMA, [self.index_text], query_examples, few_shot)


⚠️ Important：断言过弱 / test_schema_generator 无任何断言

本行 self.scheduler.schedule_flow(FlowName.BUILD_SCHEMA, ...) 没有任何断言，等价于"只要不抛异常就算通过"。schema 生成器返回空 list、错误结构、或 LLM 回了一句 "抱歉，我无法生成" 都会被判通过。

同样的弱断言遍布全文件：

line 197 / 242 / 266 / 290 / 314 / 329：仅 assert res is not None

line 125：assert res is not None（IMPORT_GRAPH_DATA）

对于面向 contributor 的本地自查，"通过"应该真正校验返回结构。

最低限度修法：

res = self.scheduler.schedule_flow(FlowName.BUILD_SCHEMA, ...) assert isinstance(res, dict) assert res.get("schema", {}).get("vertexlabels"), "BUILD_SCHEMA returned empty vertexlabels" assert res["schema"].get("edgelabels"), "BUILD_SCHEMA returned empty edgelabels"

理想做法：把期望产物存到 src/tests/data/flows/expected_schema.json（仓库已有 src/tests/data/ 目录），用结构 diff 校验。

imbajin · 2026-05-19T13:43:17Z

+        self.index_text = """
+        梁漱溟年轻时，一日，他与父亲梁济讨论当时一战欧洲的时局，梁济突然问道：“这个世界会好吗？”梁漱溟答：“我相信世界是一天一天往好里去的。”梁济叹道：“能好就好啊！”然后离家，三日后，梁济投湖自尽。晚年梁漱溟回忆自己的一生和跌宕起伏的近代社会，总结了一本书，书名就叫《这个世界会好吗？》。梁漱溟的回答与年轻时一致。但很多人特别是遗老遗少们总在回忆往日的时光，仿佛那是人类的黄金时代。如同鲁迅笔下的九斤老太，整日里念叨着“一代不如一代”。或者极端如梁济，对世界未来充满悲观，一死了之。在今天的时代，很多人认为“世界正变得越来越糟”，这其中不乏知名的知识分子。平克将这种情况称之为「进步恐惧症」，并总结为「认知偏差」。因为每天的新闻报道里总是充斥着战争、恐怖主义、犯罪、污染等坏消息，不是因为这些事情是主流，而是因为它们是热点，导致给人们的印象是世界越来越糟。所谓“好事不出门，坏事传千里”，而在互联网时代，发达的信息传播让坏事传播的更快更广。要纠正这种「可得性偏差」的方法是用数据说话。数字是最能反应趋势，看战争的比例、犯罪死亡人数在总人数的占比，就能看出犯罪是增加了，还是减少了。实际上，从各种数字显示，人类暴力事件在历史呈明显的下降趋势，这在平克之前发表的另一大部头著作《人性中的善良天使：暴力为什么会减少》中详细阐述过。世界变得更好了，说到底就是进步。
+        """
+        self.scheduler = SchedulerSingleton.get_instance()


⚠️ Important：SchedulerSingleton 是进程级单例，跨测试共享 pipeline_pool

SchedulerSingleton.get_instance()（见 flows/scheduler.py:181-191）是进程级单例，其 pipeline_pool 内每个 GPipelineManager 都会缓存 pipeline。这意味着：

顺序耦合：test_build_knowledge_graph 写入的图数据 / 向量索引会被后续 test_rag 看到。如果用户用 pytest -k test_rag 单独跑，test_rag 是否通过取决于此前是否跑过 test_build_knowledge_graph，行为不可复现。

脏 pipeline：某测试中途异常未走到 manager.release/add 时，下一个测试可能拿到错误状态的 pipeline。

对 contributor 在反复修复-重试的场景下尤其糟糕：上一次失败留下的脏状态会让下一次的失败原因看起来与代码无关。

修法：teardown 中重置；或为每个测试用唯一的 graph_name（避免共享数据）。例如：

@pytest.fixture(autouse=True) def setup(self): if should_skip_external(): pytest.skip("...") self.index_text = "..." huge_settings.graph_name = f"test_{uuid.uuid4().hex[:8]}" self.scheduler = SchedulerSingleton.get_instance() yield # teardown：清空该 graph，或释放 pipeline

至少在文件 docstring 写明"测试间存在隐式数据依赖"，让维护者心里有数。

imbajin · 2026-05-19T13:43:17Z

+
+# Run the end-to-end integration tests
+cd hugegraph-llm
+python -m pytest src/tests/integration/test_flows_integration.py -v


⚠️ Important：项目用 uv 管理依赖，CONTRIBUTING.md 主命令应统一为 uv run pytest

hugegraph-llm/AGENTS.md 与 .github/workflows/hugegraph-llm.yml:82 都使用 uv run pytest。本文件 line 23 的 python -m pytest 在 contributor 没正确激活 .venv、或 .venv 缺 --extra llm 时会跑到系统 Python，得到莫名其妙的 import error。

CONTRIBUTING.md 是 contributor 的入口，命令必须与 CI 保持一致。

# Run the end-to-end integration tests cd hugegraph-llm uv run pytest src/tests/integration/test_flows_integration.py -v

另外，由于 conftest.py 默认 SKIP_EXTERNAL_SERVICES=true，contributor 直接跑命令时这套 test 可能被全部 skip。需要在文档里告知：

# 显式开启外部服务测试 SKIP_EXTERNAL_SERVICES=false uv run pytest src/tests/integration/test_flows_integration.py -v

imbajin · 2026-05-19T13:43:17Z

+python -m pytest src/tests/integration/test_flows_integration.py -v
+```
+
+All 6 tests must pass before you submit your code:


🧹 Minor：CONTRIBUTING.md 选址 / 范围 / 强制性

选址错位：标题 "Contributing to HugeGraph-AI" 但内容只覆盖 hugegraph-llm 子模块。HugeGraph-AI 是 monorepo（含 hugegraph-ml、hugegraph-python-client、vermeer-python-client）。建议：

方案 A：把当前内容移动到 hugegraph-llm/TESTING.md（与 hugegraph-llm/AGENTS.md 同级），把根 CONTRIBUTING.md 作为通用入口；

方案 B：保留根级 CONTRIBUTING.md，但把当前内容降级为 "### Module: hugegraph-llm" 章节，并补充 ASF 通用部分（DCO/sign-off、commit message 规范、ICLA 链接、Issue/PR 模板）。

强制性过度：line 26 "All 6 tests must pass before you submit your code" 强制要求 contributor 本地启动 HugeGraph + 配置 LLM API key 才能贡献。对纯文档 / UI / 非 flow PR 是过度要求。建议改为：

如果你的修改涉及 flow / 索引 / RAG 相关代码，请在本地通过下列 6 个测试。

额外缺失：当前文档没列出 LLM API key 与 embedding provider 等前置条件，但作者表态"测试需要 api-key"——这条信息应当出现在 Prerequisites。

数字硬编码："All 6 tests must pass" / "6 passed"（line 39）会随测试增删而过时，建议改成"全部通过"。

test: add test case for flow execution

caf0123

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 9, 2025

github-actions Bot added the llm label Dec 9, 2025

dosubot Bot added enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Dec 9, 2025

add license

89e83f3

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Dec 18, 2025

fixed the test script

5bdb06a

coderzc reviewed Mar 6, 2026

View reviewed changes

add a contributed.md

680059b

imbajin requested a review from Copilot May 19, 2026 11:44

Copilot started reviewing on behalf of imbajin May 19, 2026 11:44 View session

Merge branch 'main' of https://github.com/apache/incubator-hugegraph-ai…

10e8d59

… into flow_test

dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels May 19, 2026

Copilot AI reviewed May 19, 2026

View reviewed changes

imbajin reviewed May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add integration test case for flow execution#315

feat: add integration test case for flow execution#315
weijinglin wants to merge 5 commits into
apache:mainfrom
hugegraph:flow_test

weijinglin commented Dec 9, 2025

Uh oh!

coderzc left a comment

Uh oh!

weijinglin commented Mar 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

imbajin left a comment •

edited

Loading

Uh oh!

imbajin May 19, 2026

Uh oh!

imbajin May 19, 2026

Uh oh!

imbajin May 19, 2026

Uh oh!

imbajin May 19, 2026

Uh oh!

imbajin May 19, 2026

Uh oh!

imbajin May 19, 2026

Uh oh!

imbajin May 19, 2026

Uh oh!

imbajin May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

weijinglin commented Dec 9, 2025

Uh oh!

coderzc left a comment

Choose a reason for hiding this comment

Uh oh!

weijinglin commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

imbajin left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

weijinglin commented Mar 20, 2026 •

edited

Loading

imbajin left a comment •

edited

Loading