
fix(dpmodel): align NativeLayer bias/idt init with pt MLPLayer #5428

Merged
wanghan-iapcm merged 7 commits into deepmodeling:master from wanghan-iapcm:fix-pt-expt-native-layer-init
May 3, 2026

Conversation

Collaborator

@wanghan-iapcm wanghan-iapcm commented Apr 29, 2026

Summary

Two related changes addressing pt_expt's slower convergence relative to the pt baseline (~15-20% worse RMSE on multi-task DPA3) and the test-infrastructure brittleness that fixing it exposed.

ff6a9314 — fix(dpmodel): align NativeLayer bias/idt init with pt MLPLayer

NativeLayer.__init__ in deepmd/dpmodel/utils/network.py previously used Glorot scaling (1/sqrt(num_in+num_out)) for all of w, b, and idt. deepmd/pt's MLPLayer._default_normal_init uses Glorot only for w, with b ~ N(0, 1) and idt ~ N(0.1, 0.001).

Adam preserves per-parameter scale, so the init mismatch persists through training and produces ~15-20% worse validation RMSE on multi-task DPA3 (e.g. e_alloy 1.51e-2 → 1.79e-2; f_elec 4.58e-2 → 5.53e-2 on a 500k-step run).

Fix aligns dpmodel with pt's well-tuned defaults; affects every dpmodel-based backend (pt_expt, jax, paddle) for newly-initialised models. Existing serialized models are unaffected (weights are overwritten on deserialize).
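The aligned scheme can be sketched in NumPy. This is only an illustration of the description above, not the actual deepmd code; the function name and signature are hypothetical:

```python
import numpy as np

def init_native_layer(num_in: int, num_out: int, bias: bool = True,
                      use_timestep: bool = False, seed: int = 0):
    """Sketch of the aligned init: Glorot-style scaling for w only;
    b ~ N(0, 1) unscaled; idt ~ N(0.1, 0.001)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(num_in + num_out)  # Glorot scale, applied to w only
    w = rng.normal(0.0, scale, size=(num_in, num_out))
    b = rng.normal(0.0, 1.0, size=num_out) if bias else None
    idt = rng.normal(0.1, 0.001, size=num_out) if use_timestep else None
    return w, b, idt

w, b, idt = init_native_layer(64, 128, use_timestep=True)
```

The old dpmodel behavior applied `scale` to all three parameters, which is the mismatch the commit removes.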

35d5cb21 — test: replace hardcoded C++ expected values with sidecar reference files

The init change above broke ~10 C++ test files that hardcoded expected_e/f/v = {…} initializer-list arrays manually pasted from gen_*.py script stdout. This refactor removes the manual-paste workflow:

  • New header-only source/api_cc/tests/expected_ref.h parses plain-text sidecar files.
  • New gen_common.write_expected_ref() writer; legacy print_cpp_values / print_cpp_spin_values retained as ad-hoc debug helpers.
  • 7 gen scripts (dpa1/2/3, spin, fparam_aparam, model_devi) now emit <model>.expected sidecars at run time (each contains [case] sections with named arrays).
  • 15 C++ tests load arrays via ExpectedRef::get<VALUETYPE>(case, key) in SetUp(). No test cases removed — every modified file preserves its original TYPED_TEST / class TestInfer* count vs upstream master.
  • .expected files are gitignored (regenerated each CI run alongside the .pt2/.pth).
  • test_deeppot_ptexpt.cc is intentionally left with its TF-derived hardcoded refs — its .pt2 weights flow through gen_fparam_aparam.py's state_dict() load path (not random init), so they're stable across this PR and any future numerics-perturbing change.
  • test_compressed_forward[float32] for DPA1 / SeAttenV2: bump atol from 1e-5 to 4e-5 with a comment explaining the cause — the new bias init scale widens the embedding-net output range, observed compression-tabulation error ~3.2e-5.
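An illustrative sidecar file (layout and values hypothetical, inferred from the parser checks discussed later in this thread: a [case] section header, then per-array "name count" lines followed by count values, one per line):

```
[default]
expected_e 2
-9.320909762801588e+01
-1.868020345400000e+02
expected_f 3
4.578632000000000e-02
-1.234500000000000e-03
7.700000000000000e-04
```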

Test plan

  • source/tests/common/dpmodel/test_network.py — 20 passed
  • source/tests/pt_expt/descriptor/ — 492 passed, 32 skipped (incl. the previously-failing float32 compression tests now passing at 4e-5)
  • source/tests/common/ (skipping pre-existing TF/protobuf-broken test_gui.py) — 275 passed, 1 skipped
  • source/tests/consistent/descriptor/ (with C++ tabulate ops built) — 2210 passed, 3678 skipped
  • source/tests/consistent/{fitting,loss,model}/ + test_type_embedding.py — 3654 passed, 4736 skipped
  • C++ refactored tests (DPA1/2/3 pt+ptexpt, dpa_pt_spin, dpa_ptexpt_spin, fparam variants, default_fparam variants, model_devi Precomputed, spin_model_devi PtExpt) — 106 passed, 0 failed locally with gen_*.py regenerated .pt2/.pth/.expected
  • Empirical: re-run the 500k-step multi-task DPA3 training under pt_expt with this fix and confirm convergence matches pt baseline (out of scope for this PR — verified diagnostically only)

Known limitation

The init alignment also affects jax / paddle backends through dpmodel. Their cross-backend consistency tests use serialize/deserialize round-trips and are unaffected, but newly-initialised jax / paddle models will pick up the new init too. If a downstream user relies on the old Glorot-bias behaviour, this is a breaking numerical change for new-from-scratch training (existing checkpoints unaffected).

Summary by CodeRabbit

  • Refactor

    • Adjusted neural network parameter initializations that may subtly change initial numeric/residual behavior.
  • Tests

    • Replaced embedded large expected arrays with external sidecar reference files and centralized model-path constants across many tests.
    • Simplified test setup patterns and raised one float32 comparison tolerance to avoid spurious failures.
  • Chores

    • Added support for standardized expected-value sidecar read/write and a .expected ignore rule.

dpmodel's NativeLayer used Glorot scaling (1/sqrt(num_in+num_out)) for
w, b, and idt, while pt's MLPLayer._default_normal_init uses Glorot only
for w and instead uses N(0, 1) for bias and N(0.1, 0.001) for idt. The
mismatch persists through training (Adam preserves per-parameter scale)
and produces ~15-20% worse validation RMSE on multi-task DPA3 (verified
on debug/260426_test_compile_multitask: e_alloy 1.51e-2 -> 1.79e-2,
f_alloy 1.39e-1 -> 1.49e-1, e_elec 1.13e-3 -> 1.32e-3, f_elec 4.58e-2
-> 5.53e-2). Affects every dpmodel-based backend (pt_expt, jax, paddle).
@dosubot dosubot Bot added the bug label Apr 29, 2026
Contributor

coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Generator scripts now write structured .expected sidecar files; C++ and Python tests load those at runtime via new parsers. Added C++ deepmd_test::ExpectedRef and Python read_expected_ref; many tests updated to consume sidecars. Minor initializer changes in NativeLayer and a .gitignore rule for *.expected.

Changes

  • Network init (deepmd/dpmodel/utils/network.py): Adjusted NativeLayer initializers: self.w remains Glorot-scaled; self.b now uses an unscaled Gaussian (std=1) when bias=True; self.idt is initialized from Gaussian(mean=0.1, std=0.001) when use_timestep.
  • Sidecar writer + generators (source/tests/infer/gen_common.py, source/tests/infer/gen_*.py): Added write_expected_ref and updated all generator scripts to write structured .expected files instead of printing C++ literal arrays; docstrings updated to prefer sidecar generation.
  • C++ expected-ref helper (source/api_cc/tests/expected_ref.h): Added deepmd_test::ExpectedRef class with load(path) and template get<T>(case, key) to parse and expose .expected contents to C++ tests.
  • C++ test fixtures, many (source/api_cc/tests/test_deeppot_*.cc, test_deeppot_dpa*.cc, test_deeppot_*_ptexpt.cc, test_deeppot_dpa_pt_spin.cc, test_deeppot_model_devi_ptexpt.cc, etc.): Replaced embedded numeric arrays with runtime loading via ExpectedRef from .expected files; unified model-path constants; simplified zero-init patterns (resize+fill → assign).
  • Python expected-ref helper (source/lmp/tests/expected_ref.py): Added read_expected_ref(path) returning dicts of NumPy 1-D arrays parsed from .expected sidecars.
  • LAMMPS & Python tests (source/lmp/tests/test_lammps_*.py, source/tests/pt_expt/descriptor/test_dpa1.py, source/tests/pt_expt/descriptor/test_se_atten_v2.py): Tests now load expected arrays from .expected files; two descriptor tests increased float32 tolerance from 1e-5 to 4e-5. File-not-found handling added in some tests.
  • Git ignore (.gitignore): Added a rule to ignore *.expected files.

Sequence Diagram(s)

sequenceDiagram
    participant Gen as Generator script
    participant FS as File system (.expected)
    participant Cpp as C++ ExpectedRef
    participant Py as Python reader
    participant Test as Test runner

    Gen->>FS: write .expected (sections & arrays)
    Note right of FS: Structured plain-text sidecar
    Test->>Cpp: load ExpectedRef(path) / request arrays
    Cpp->>FS: open & parse .expected
    Cpp-->>Test: return arrays via get<T>(case,key)
    Test->>Test: compare model outputs to loaded references
    alt Python test
        Test->>Py: read_expected_ref(path)
        Py->>FS: open & parse .expected
        Py-->>Test: return arrays as NumPy arrays
        Test->>Test: compare model outputs to loaded references
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • anyangml
  • njzjz
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: Warning. Docstring coverage is 14.29%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: Passed. The title accurately describes the primary change: aligning NativeLayer bias and identity initialization with PyTorch MLPLayer, which is the main technical change in the PR.
  • Linked Issues check: Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

Copilot AI left a comment


Pull request overview

Aligns dpmodel’s NativeLayer parameter initialization with the PyTorch backend’s tuned defaults to improve convergence and validation quality for newly initialized models across dpmodel-based backends.

Changes:

  • Update NativeLayer initialization so w uses Glorot scaling, while b uses an unscaled normal and idt uses a small-variance normal with a +0.1 mean shift (matching deepmd/pt’s MLPLayer._default_normal_init).


Comment thread deepmd/dpmodel/utils/network.py

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 92.52336% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.47%. Comparing base (8841b5e) to head (d3704b9).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
source/api_cc/tests/expected_ref.h 48.38% 8 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5428      +/-   ##
==========================================
+ Coverage   82.39%   82.47%   +0.07%     
==========================================
  Files         824      825       +1     
  Lines       87395    87720     +325     
  Branches     4199     4206       +7     
==========================================
+ Hits        72009    72345     +336     
+ Misses      14111    14092      -19     
- Partials     1275     1283       +8     

☔ View full report in Codecov by Sentry.

Each `gen_*.py` script writes a plain-text `.expected` file alongside the
.pt2/.pth model files; C++ tests load the arrays at SetUp() via a new
`ExpectedRef` helper instead of carrying ~3000 magic-number floats baked
into the test sources. This removes the manual paste-from-stdout sync
that broke whenever dpmodel numerics shifted (e.g., the bias/idt init
alignment in ff6a931), and keeps reference values in lockstep with the
gen scripts that produce them.

- New `source/api_cc/tests/expected_ref.h` (header-only loader)
- New `gen_common.write_expected_ref()` writer; `print_cpp_values` and
  `print_cpp_spin_values` retained as ad-hoc debug helpers
- 7 gen scripts (dpa1/2/3, spin, fparam_aparam, model_devi) emit sidecars
- 15 C++ tests load arrays from sidecars (no test cases removed)
- `test_deeppot_ptexpt.cc` keeps its TF-derived hardcoded refs (its model
  weights flow through `gen_fparam_aparam.py`'s state_dict load path,
  not random init, so they're stable)
- Bump float32 atol from 1e-5 to 4e-5 in `test_compressed_forward` for
  DPA1 / SeAttenV2: the new bias init scale widens the embedding-net
  output range, observed compression-tabulation error ~3.2e-5
- `.expected` files are gitignored — regenerated at CI time
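A minimal writer for such a sidecar could look like this. The real gen_common.write_expected_ref signature is not shown in this PR page, so this is a sketch under the assumed plain-text layout (a [case] section header, then per-array "name count" lines followed by count values):

```python
import numpy as np

def write_expected_ref(path, cases):
    """Sketch of a sidecar writer (assumed layout, hypothetical signature).
    `cases` maps case name -> {array name -> array-like}."""
    with open(path, "w") as f:
        for case, arrays in cases.items():
            f.write(f"[{case}]\n")            # section header
            for key, arr in arrays.items():
                flat = np.asarray(arr).ravel()
                f.write(f"{key} {flat.size}\n")  # "name count" header
                for v in flat:
                    f.write(f"{v:.18e}\n")       # one value per line, full precision

write_expected_ref("/tmp/demo.expected",
                   {"default": {"expected_e": [-1.0, -2.5]}})
```

Writing with full float precision (here %.18e) matters: the C++ tests compare at tight tolerances, so the sidecar must round-trip double values losslessly.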
@github-actions github-actions Bot added the C++ label Apr 30, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/api_cc/tests/test_deeppot_model_devi_ptexpt.cc`:
- Around line 454-458: After loading the sidecar via ref.load(...) validate that
the vectors expected_md_f and expected_md_v contain at least 3 elements before
any fixed-index access; specifically after
ref.get<VALUETYPE>("default","expected_md_f") and
ref.get<VALUETYPE>("default","expected_md_v") check sizes (e.g., .size() >= 3)
and if they are too small, fail the test or throw a clear error (with context
including natoms and the file name) to avoid undefined behavior when the code
later accesses indices 0..2.

In `@source/tests/infer/gen_model_devi.py`:
- Around line 144-147: The variables e0, e1, ae0, ae1, av0, av1 are declared but
unused after restricting persisted outputs; remove their assignments or rename
them to the conventional unused placeholder (e.g., _e0/_e1 or just _) in the
evaluation block around the DeepPot instantiation (see DeepPot(models[0]) and
the evaluation code that computes e0/e1/ae0/ae1/av0/av1), or explicitly delete
them after use (del e0, e1, ae0, ae1, av0, av1) so ruff no longer flags unused
variables and CI passes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d2308eca-2355-4bcc-ae1f-79493bb65ec8

📥 Commits

Reviewing files that changed from the base of the PR and between ff6a931 and 35d5cb2.

📒 Files selected for processing (26)
  • .gitignore
  • source/api_cc/tests/expected_ref.h
  • source/api_cc/tests/test_deeppot_a_fparam_aparam_nframes_ptexpt.cc
  • source/api_cc/tests/test_deeppot_a_fparam_aparam_pt.cc
  • source/api_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
  • source/api_cc/tests/test_deeppot_default_fparam_pt.cc
  • source/api_cc/tests/test_deeppot_default_fparam_ptexpt.cc
  • source/api_cc/tests/test_deeppot_dpa1_pt.cc
  • source/api_cc/tests/test_deeppot_dpa1_ptexpt.cc
  • source/api_cc/tests/test_deeppot_dpa2_pt.cc
  • source/api_cc/tests/test_deeppot_dpa2_ptexpt.cc
  • source/api_cc/tests/test_deeppot_dpa3_pt.cc
  • source/api_cc/tests/test_deeppot_dpa3_ptexpt.cc
  • source/api_cc/tests/test_deeppot_dpa_pt_spin.cc
  • source/api_cc/tests/test_deeppot_dpa_ptexpt.cc
  • source/api_cc/tests/test_deeppot_dpa_ptexpt_spin.cc
  • source/api_cc/tests/test_deeppot_model_devi_ptexpt.cc
  • source/tests/infer/gen_common.py
  • source/tests/infer/gen_dpa1.py
  • source/tests/infer/gen_dpa2.py
  • source/tests/infer/gen_dpa3.py
  • source/tests/infer/gen_fparam_aparam.py
  • source/tests/infer/gen_model_devi.py
  • source/tests/infer/gen_spin.py
  • source/tests/pt_expt/descriptor/test_dpa1.py
  • source/tests/pt_expt/descriptor/test_se_atten_v2.py
✅ Files skipped from review due to trivial changes (2)
  • .gitignore
  • source/api_cc/tests/expected_ref.h

Comment thread source/api_cc/tests/test_deeppot_model_devi_ptexpt.cc
Comment thread source/tests/infer/gen_model_devi.py
…h sidecar reads

CI revealed test_lammps_dpa3_pt2.py and test_lammps_dpa3_pt2_nopbc.py
still carried the old DPA3 hardcoded expected_ae/f/v arrays (originally
copied from gen_dpa3.py output) that diverged once 35d5cb2 made the
gen scripts emit new init values.

- New `source/lmp/tests/expected_ref.py` mirrors the C++
  `expected_ref.h` parser so LAMMPS Python tests load the same on-disk
  reference data produced by the gen scripts at CI time.
- Refactor the two DPA3 LAMMPS tests to read deeppot_dpa3.expected.
- Also refactor test_lammps_faparam_pt2.py to use
  fparam_aparam_default.expected for consistency (this test wasn't
  failing — gen_fparam_aparam.py loads weights from a committed .pth so
  init changes don't affect it — but the sidecar pattern is cleaner).
- LAMMPS spin and model_devi tests are not refactored: they use a
  different atom config than the gen scripts (4 atoms vs 6) and their
  hardcoded refs are tolerant enough to survive the init shift in CI.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/lmp/tests/expected_ref.py`:
- Around line 30-33: The parser currently assumes each header line splits into
two tokens and that there are n following payload lines; add explicit checks
before using these values: validate that line.split() yields exactly two tokens
(key and count), that count parses to a non-negative int (n), and that lines has
at least n remaining entries before constructing values; if any check fails
raise a ValueError with contextual information (include key, raw count/token,
current index i and expected n) so failures are actionable; perform these
token/bounds checks before the list comprehension that builds values and before
updating i.
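A validating reader along the lines this comment asks for might look as follows. The on-disk layout is assumed from the description ([case] sections; "name count" headers followed by count values); the actual read_expected_ref in the PR may differ:

```python
import numpy as np

def read_expected_ref(path):
    """Sketch of a validating sidecar reader (assumed format)."""
    data, case = {}, None
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("[") and line.endswith("]"):
            case = line[1:-1]
            data[case] = {}
            i += 1
            continue
        tokens = line.split()
        if len(tokens) != 2:                      # header must be "name count"
            raise ValueError(f"{path}: bad header at line {i}: {line!r}")
        key, count = tokens
        n = int(count)
        if n < 0:                                 # guard against cursor rewind
            raise ValueError(f"{path}: negative count for {key!r}: {n}")
        if i + 1 + n > len(lines):                # bounds check before slicing
            raise ValueError(f"{path}: {key!r} declares {n} values, "
                             f"only {len(lines) - i - 1} lines remain")
        data[case][key] = np.array([float(v) for v in lines[i + 1:i + 1 + n]])
        i += 1 + n
    return data
```

Every failure path names the file, the array, and the offending token, which is what makes a broken sidecar actionable in CI logs.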
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dd3a2e82-8fd6-420e-8088-e3dcdf79903d

📥 Commits

Reviewing files that changed from the base of the PR and between 35d5cb2 and ba1249a.

📒 Files selected for processing (4)
  • source/lmp/tests/expected_ref.py
  • source/lmp/tests/test_lammps_dpa3_pt2.py
  • source/lmp/tests/test_lammps_dpa3_pt2_nopbc.py
  • source/lmp/tests/test_lammps_faparam_pt2.py

Comment thread source/lmp/tests/expected_ref.py Outdated
…matrix

The paddle-only CI matrix (ENABLE_PYTORCH=0, ENABLE_PADDLE=1) doesn't
run gen_*.py from test_cc_local.sh (those are gated on PyTorch), so
deeppot_dpa3.expected and fparam_aparam_default.expected don't exist
when LAMMPS tests collect. Module-level read_expected_ref() then raises
FileNotFoundError at pytest collection time — before setup_module() can
pytest.skip() based on ENABLE_PYTORCH.

Wrap the module-level reads in try/except FileNotFoundError so the file
loads cleanly when present and the test gets to its own ENABLE_PYTORCH
guard otherwise. Generating .pt2 / .expected in paddle-only CI is not
useful — .pt2 is the PyTorch AOTInductor format and paddle's LAMMPS
plugin can't consume them; paddle has its own LAMMPS tests that load
paddle-format models.
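The collection-safe pattern described above can be sketched like this (a minimal illustration; the helper name and filename are hypothetical, and the parsing step is elided):

```python
from pathlib import Path

def load_refs_or_none(path):
    """Return the sidecar contents, or None if the file was never
    generated (e.g. a paddle-only CI matrix that skips gen_*.py).
    The test module's own ENABLE_PYTORCH guard then decides to skip."""
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        return None
    # ... parse `text` into named arrays here (parser omitted in this sketch) ...
    return text

# Module level: must not raise at pytest collection time.
refs = load_refs_or_none("deeppot_dpa3.expected")
```

The point is that a missing sidecar is deferred to the test's skip logic rather than surfacing as a FileNotFoundError before setup_module() ever runs.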
Comment thread source/lmp/tests/test_lammps_dpa3_pt2.py Fixed
Comment thread source/lmp/tests/test_lammps_dpa3_pt2_nopbc.py Fixed
Comment thread source/lmp/tests/test_lammps_faparam_pt2.py Fixed
Han Wang added 2 commits May 1, 2026 21:39
- ASSERT_EQ size guards on expected_md_f / expected_md_v before fixed-
  index access in TestInferDeepPotModeDeviPtExptPrecomputed; turns a
  potential silent OOB into a clear gtest failure if the sidecar layout
  changes.
- Sharper parser errors in source/lmp/tests/expected_ref.py: explicit
  header-token-count and payload-length checks with the file path in the
  message, instead of the bare ValueError / IndexError that pop out of
  line.split() / out-of-range indexing.
CodeQL flagged `expected_ae` as an unused global in the three
sidecar-loading LAMMPS .pt2 tests; it was only used as the operand of
np.sum() to compute expected_e. Inline the sum so there's no dangling
global and the FileNotFoundError fallback no longer has to pin a stale
name.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/lmp/tests/expected_ref.py`:
- Around line 36-42: The parser currently accepts negative counts (count -> n)
which can cause index rollback and infinite loops; before consuming values
ensure n = int(count) is non-negative (n >= 0) and raise a ValueError with a
clear message (mentioning key and path) if it's negative, then proceed to the
existing bounds check and value consumption (variables: count, n, i, lines, key,
path).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ad8b490e-0e65-4cf8-8bdc-3efa8e46bef3

📥 Commits

Reviewing files that changed from the base of the PR and between 27d284b and 00c45a6.

📒 Files selected for processing (2)
  • source/api_cc/tests/test_deeppot_model_devi_ptexpt.cc
  • source/lmp/tests/expected_ref.py

Comment thread source/lmp/tests/expected_ref.py
CodeRabbit (review #4211393623) noted that a negative `count` in a
sidecar array header (e.g. `expected_e -3`) would pass the existing
length check (`i + n > len(lines)` is false for n<0), then `range(-3)`
yields no values, and `i += -3` rewinds the read position — at best
re-parsing earlier lines, at worst looping forever on a long file.
Add an explicit `n < 0` guard with a clear message naming both the
array and the file.
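The failure mode is easy to reproduce in a few lines: with n = -3 the old bounds check does not fire, the value slice is empty, and the cursor moves backwards instead of forwards.

```python
# Minimal reproduction of the negative-count bug in a "name count" parser.
lines = ["a 2", "1.0", "2.0", "b -3"]
i, n = 3, -3                         # cursor at the "b -3" header

assert not (i + n > len(lines))      # old length check: False for n < 0
values = [float(v) for v in lines[i + 1:i + 1 + n]]  # lines[4:1] -> empty
assert values == []
i += n                               # cursor rewinds from 3 to 0
assert i == 0                        # earlier lines get re-parsed (or loop forever)
```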
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue May 3, 2026
Merged via the queue into deepmodeling:master with commit 0a481de May 3, 2026
70 checks passed
@wanghan-iapcm wanghan-iapcm deleted the fix-pt-expt-native-layer-init branch May 3, 2026 08:27

5 participants