fix(dpmodel): align NativeLayer bias/idt init with pt MLPLayer #5428
Conversation
dpmodel's NativeLayer used Glorot scaling (1/sqrt(num_in+num_out)) for w, b, and idt, while pt's MLPLayer._default_normal_init uses Glorot only for w and instead uses N(0, 1) for bias and N(0.1, 0.001) for idt. The mismatch persists through training (Adam preserves per-parameter scale) and produces ~15-20% worse validation RMSE on multi-task DPA3 (verified on debug/260426_test_compile_multitask: e_alloy 1.51e-2 -> 1.79e-2, f_alloy 1.39e-1 -> 1.49e-1, e_elec 1.13e-3 -> 1.32e-3, f_elec 4.58e-2 -> 5.53e-2). Affects every dpmodel-based backend (pt_expt, jax, paddle).
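The two init schemes can be contrasted with a short NumPy sketch. This is illustrative only: the layer size and the `glorot_init` / `pt_style_init` names are made up here, not the actual deepmd-kit code.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_init(num_in, num_out):
    # Old dpmodel behavior: Glorot scale applied to w, b, AND idt.
    scale = 1.0 / np.sqrt(num_in + num_out)
    w = rng.normal(0.0, scale, size=(num_in, num_out))
    b = rng.normal(0.0, scale, size=num_out)
    idt = rng.normal(0.0, scale, size=num_out)
    return w, b, idt

def pt_style_init(num_in, num_out):
    # pt's MLPLayer._default_normal_init convention as described above:
    # Glorot only for w; b ~ N(0, 1); idt ~ N(0.1, 0.001).
    scale = 1.0 / np.sqrt(num_in + num_out)
    w = rng.normal(0.0, scale, size=(num_in, num_out))
    b = rng.normal(0.0, 1.0, size=num_out)
    idt = rng.normal(0.1, 0.001, size=num_out)
    return w, b, idt

_, b_old, idt_old = glorot_init(64, 64)
_, b_new, idt_new = pt_style_init(64, 64)
# Old bias std is roughly 1/sqrt(128) ~ 0.09; the pt-style bias std is ~ 1,
# and idt is now a tight distribution centered near 0.1.
print(b_old.std(), b_new.std(), idt_new.mean())
```

Adam normalizes per-parameter update magnitudes by gradient scale, so a roughly 11x difference in initial bias scale is not "trained away" and persists into the converged model, which is why the mismatch shows up as a lasting RMSE gap.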
📝 Walkthrough

Generator scripts now write structured `.expected` reference sidecars, and the C++ and Python tests load expected arrays from them at run time.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Gen as Generator script
    participant FS as File system (.expected)
    participant Cpp as C++ ExpectedRef
    participant Py as Python reader
    participant Test as Test runner
    Gen->>FS: write .expected (sections & arrays)
    Note right of FS: Structured plain-text sidecar
    Test->>Cpp: load ExpectedRef(path) / request arrays
    Cpp->>FS: open & parse .expected
    Cpp-->>Test: return arrays via get<T>(case,key)
    Test->>Test: compare model outputs to loaded references
    alt Python test
        Test->>Py: read_expected_ref(path)
        Py->>FS: open & parse .expected
        Py-->>Test: return arrays as NumPy arrays
        Test->>Test: compare model outputs to loaded references
    end
```
Pull request overview
Aligns dpmodel’s NativeLayer parameter initialization with the PyTorch backend’s tuned defaults to improve convergence and validation quality for newly initialized models across dpmodel-based backends.
Changes:
- Update `NativeLayer` initialization so `w` uses Glorot scaling, while `b` uses an unscaled normal and `idt` uses a small-variance normal with a +0.1 mean shift (matching `deepmd/pt`'s `MLPLayer._default_normal_init`).
Codecov Report

❌ The patch-coverage check failed. Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #5428      +/-   ##
==========================================
+ Coverage   82.39%   82.47%    +0.07%
==========================================
  Files         824      825        +1
  Lines       87395    87720      +325
  Branches     4199     4206        +7
==========================================
+ Hits        72009    72345      +336
+ Misses      14111    14092       -19
- Partials     1275     1283        +8
```

View full report in Codecov by Sentry.
Each `gen_*.py` script writes a plain-text `.expected` file alongside the `.pt2`/`.pth` model files; C++ tests load the arrays at `SetUp()` via a new `ExpectedRef` helper instead of carrying ~3000 magic-number floats baked into the test sources. This removes the manual paste-from-stdout sync that broke whenever dpmodel numerics shifted (e.g., the bias/idt init alignment in ff6a931), and keeps reference values in lockstep with the gen scripts that produce them.

- New `source/api_cc/tests/expected_ref.h` (header-only loader)
- New `gen_common.write_expected_ref()` writer; `print_cpp_values` and `print_cpp_spin_values` retained as ad-hoc debug helpers
- 7 gen scripts (dpa1/2/3, spin, fparam_aparam, model_devi) emit sidecars
- 15 C++ tests load arrays from sidecars (no test cases removed)
- `test_deeppot_ptexpt.cc` keeps its TF-derived hardcoded refs (its model weights flow through `gen_fparam_aparam.py`'s state_dict load path, not random init, so they're stable)
- Bump float32 atol from 1e-5 to 4e-5 in `test_compressed_forward` for DPA1 / SeAttenV2: the new bias init scale widens the embedding-net output range, observed compression-tabulation error ~3.2e-5
- `.expected` files are gitignored; regenerated at CI time
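The sidecar layout described above (`[case]` sections containing named arrays) can be mimicked with a minimal writer/reader pair. This is an illustrative sketch of the format, not the actual `write_expected_ref` / `ExpectedRef` implementation; the exact on-disk syntax here is assumed.

```python
import os
import tempfile

import numpy as np

def write_expected_ref(path, cases):
    # cases: {case_name: {array_name: array-like}}. One "[case]" header per
    # section, then a "name count" line followed by count value lines.
    with open(path, "w") as f:
        for case, arrays in cases.items():
            f.write(f"[{case}]\n")
            for key, values in arrays.items():
                values = np.asarray(values, dtype=float).ravel()
                f.write(f"{key} {values.size}\n")
                for v in values:
                    f.write(f"{v:.17g}\n")  # round-trippable double precision

def read_expected_ref(path):
    data, case = {}, None
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    i = 0
    while i < len(lines):
        if lines[i].startswith("["):          # section header
            case = lines[i].strip("[]")
            data[case] = {}
            i += 1
            continue
        key, count = lines[i].split()          # "name count" header
        n = int(count)
        data[case][key] = np.array([float(x) for x in lines[i + 1 : i + 1 + n]])
        i += 1 + n
    return data

path = os.path.join(tempfile.mkdtemp(), "deeppot_demo.expected")
write_expected_ref(path, {"default": {"expected_e": [-93.0, -186.0]}})
ref = read_expected_ref(path)
print(ref["default"]["expected_e"])
```

A plain-text format like this keeps the reference data diffable and trivially parseable from both C++ and Python, which is presumably why a sidecar was chosen over, say, pickled arrays.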
Actionable comments posted: 2
Inline comments:
In `@source/api_cc/tests/test_deeppot_model_devi_ptexpt.cc`:
- Around line 454-458: After loading the sidecar via ref.load(...) validate that
the vectors expected_md_f and expected_md_v contain at least 3 elements before
any fixed-index access; specifically after
ref.get<VALUETYPE>("default","expected_md_f") and
ref.get<VALUETYPE>("default","expected_md_v") check sizes (e.g., .size() >= 3)
and if they are too small, fail the test or throw a clear error (with context
including natoms and the file name) to avoid undefined behavior when the code
later accesses indices 0..2.
In `@source/tests/infer/gen_model_devi.py`:
- Around line 144-147: The variables e0, e1, ae0, ae1, av0, av1 are declared but
unused after restricting persisted outputs; remove their assignments or rename
them to the conventional unused placeholder (e.g., _e0/_e1 or just _) in the
evaluation block around the DeepPot instantiation (see DeepPot(models[0]) and
the evaluation code that computes e0/e1/ae0/ae1/av0/av1), or explicitly delete
them after use (del e0, e1, ae0, ae1, av0, av1) so ruff no longer flags unused
variables and CI passes.
📒 Files selected for processing (26)
- .gitignore
- source/api_cc/tests/expected_ref.h
- source/api_cc/tests/test_deeppot_a_fparam_aparam_nframes_ptexpt.cc
- source/api_cc/tests/test_deeppot_a_fparam_aparam_pt.cc
- source/api_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
- source/api_cc/tests/test_deeppot_default_fparam_pt.cc
- source/api_cc/tests/test_deeppot_default_fparam_ptexpt.cc
- source/api_cc/tests/test_deeppot_dpa1_pt.cc
- source/api_cc/tests/test_deeppot_dpa1_ptexpt.cc
- source/api_cc/tests/test_deeppot_dpa2_pt.cc
- source/api_cc/tests/test_deeppot_dpa2_ptexpt.cc
- source/api_cc/tests/test_deeppot_dpa3_pt.cc
- source/api_cc/tests/test_deeppot_dpa3_ptexpt.cc
- source/api_cc/tests/test_deeppot_dpa_pt_spin.cc
- source/api_cc/tests/test_deeppot_dpa_ptexpt.cc
- source/api_cc/tests/test_deeppot_dpa_ptexpt_spin.cc
- source/api_cc/tests/test_deeppot_model_devi_ptexpt.cc
- source/tests/infer/gen_common.py
- source/tests/infer/gen_dpa1.py
- source/tests/infer/gen_dpa2.py
- source/tests/infer/gen_dpa3.py
- source/tests/infer/gen_fparam_aparam.py
- source/tests/infer/gen_model_devi.py
- source/tests/infer/gen_spin.py
- source/tests/pt_expt/descriptor/test_dpa1.py
- source/tests/pt_expt/descriptor/test_se_atten_v2.py
✅ Files skipped from review due to trivial changes (2)
- .gitignore
- source/api_cc/tests/expected_ref.h
…h sidecar reads

CI revealed test_lammps_dpa3_pt2.py and test_lammps_dpa3_pt2_nopbc.py still carried the old DPA3 hardcoded expected_ae/f/v arrays (originally copied from gen_dpa3.py output) that diverged once 35d5cb2 made the gen scripts emit new init values.

- New `source/lmp/tests/expected_ref.py` mirrors the C++ `expected_ref.h` parser so LAMMPS Python tests load the same on-disk reference data produced by the gen scripts at CI time.
- Refactor the two DPA3 LAMMPS tests to read deeppot_dpa3.expected.
- Also refactor test_lammps_faparam_pt2.py to use fparam_aparam_default.expected for consistency (this test wasn't failing: gen_fparam_aparam.py loads weights from a committed .pth, so init changes don't affect it, but the sidecar pattern is cleaner).
- LAMMPS spin and model_devi tests are not refactored: they use a different atom config than the gen scripts (4 atoms vs 6) and their hardcoded refs are tolerant enough to survive the init shift in CI.
Actionable comments posted: 1
Inline comments:
In `@source/lmp/tests/expected_ref.py`:
- Around line 30-33: The parser currently assumes each header line splits into
two tokens and that there are n following payload lines; add explicit checks
before using these values: validate that line.split() yields exactly two tokens
(key and count), that count parses to a non-negative int (n), and that lines has
at least n remaining entries before constructing values; if any check fails
raise a ValueError with contextual information (include key, raw count/token,
current index i and expected n) so failures are actionable; perform these
token/bounds checks before the list comprehension that builds values and before
updating i.
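A hedged sketch of the validation pattern the comment asks for; the `parse_array_header` name and the exact messages are inventions for illustration (the variable names `key`, `count`, `lines`, `i` mirror the comment), not the actual `expected_ref.py` code.

```python
def parse_array_header(lines, i, path):
    """Validate a 'key count' header and its payload bounds, then consume it.

    Returns (key, values, next_index); raises ValueError with file context
    on any malformed header instead of letting split()/indexing blow up.
    """
    tokens = lines[i].split()
    if len(tokens) != 2:
        raise ValueError(
            f"{path}: expected 'key count' header at line {i}, got {lines[i]!r}")
    key, count = tokens
    try:
        n = int(count)
    except ValueError:
        raise ValueError(
            f"{path}: non-integer count {count!r} for array {key!r}") from None
    if n < 0:
        raise ValueError(f"{path}: negative count {n} for array {key!r}")
    if i + 1 + n > len(lines):
        raise ValueError(
            f"{path}: array {key!r} declares {n} values but only "
            f"{len(lines) - i - 1} lines remain")
    values = [float(x) for x in lines[i + 1 : i + 1 + n]]
    return key, values, i + 1 + n

key, vals, nxt = parse_array_header(
    ["expected_e 2", "1.0", "2.0"], 0, "demo.expected")
print(key, vals, nxt)
```

Raising with the key, the raw token, and the path makes a truncated or hand-edited sidecar fail with an actionable message at parse time rather than an obscure IndexError deep in a test.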
📒 Files selected for processing (4)
- source/lmp/tests/expected_ref.py
- source/lmp/tests/test_lammps_dpa3_pt2.py
- source/lmp/tests/test_lammps_dpa3_pt2_nopbc.py
- source/lmp/tests/test_lammps_faparam_pt2.py
…matrix

The paddle-only CI matrix (ENABLE_PYTORCH=0, ENABLE_PADDLE=1) doesn't run gen_*.py from test_cc_local.sh (those are gated on PyTorch), so deeppot_dpa3.expected and fparam_aparam_default.expected don't exist when LAMMPS tests collect. Module-level read_expected_ref() then raises FileNotFoundError at pytest collection time, before setup_module() can pytest.skip() based on ENABLE_PYTORCH.

Wrap the module-level reads in try/except FileNotFoundError so the file loads cleanly when present and the test gets to its own ENABLE_PYTORCH guard otherwise. Generating .pt2 / .expected in paddle-only CI is not useful: .pt2 is the PyTorch AOTInductor format and paddle's LAMMPS plugin can't consume them; paddle has its own LAMMPS tests that load paddle-format models.
- ASSERT_EQ size guards on expected_md_f / expected_md_v before fixed-index access in TestInferDeepPotModeDeviPtExptPrecomputed; turns a potential silent OOB into a clear gtest failure if the sidecar layout changes.
- Sharper parser errors in source/lmp/tests/expected_ref.py: explicit header-token-count and payload-length checks with the file path in the message, instead of the bare ValueError / IndexError that pop out of line.split() / out-of-range indexing.
CodeQL flagged `expected_ae` as an unused global in the three sidecar-loading LAMMPS .pt2 tests; it was only used as the operand of np.sum() to compute expected_e. Inline the sum so there's no dangling global and the FileNotFoundError fallback no longer has to pin a stale name.
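The inlining is a one-liner; a hedged sketch where `ref` stands in for the parsed sidecar data (the values are made up for illustration):

```python
import numpy as np

ref = {"default": {"expected_ae": [-94.25, -187.5]}}  # stand-in sidecar data

# Before: a module-level global used exactly once
#   expected_ae = np.array(ref["default"]["expected_ae"])
#   expected_e = np.sum(expected_ae)
# After: inline the sum, so no name dangles for CodeQL to flag and the
# FileNotFoundError fallback has one fewer global to keep consistent.
expected_e = np.sum(ref["default"]["expected_ae"])
print(expected_e)  # -281.75
```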
Actionable comments posted: 1
Inline comments:
In `@source/lmp/tests/expected_ref.py`:
- Around line 36-42: The parser currently accepts negative counts (count -> n)
which can cause index rollback and infinite loops; before consuming values
ensure n = int(count) is non-negative (n >= 0) and raise a ValueError with a
clear message (mentioning key and path) if it's negative, then proceed to the
existing bounds check and value consumption (variables: count, n, i, lines, key,
path).
📒 Files selected for processing (2)
- source/api_cc/tests/test_deeppot_model_devi_ptexpt.cc
- source/lmp/tests/expected_ref.py
CodeRabbit (review #4211393623) noted that a negative `count` in a sidecar array header (e.g. `expected_e -3`) would pass the existing length check (`i + n > len(lines)` is false for n<0), then `range(-3)` yields no values, and `i += -3` rewinds the read position — at best re-parsing earlier lines, at worst looping forever on a long file. Add an explicit `n < 0` guard with a clear message naming both the array and the file.
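The failure mode described above can be reproduced in a few lines; this is a demonstration of the arithmetic, not the parser's actual code:

```python
# A malformed sidecar header declaring a negative array count.
lines = ["expected_e -3", "0.1", "0.2"]
key, count = lines[0].split()
i, n = 1, int(count)                  # n = -3, payload starts at index 1

# The old guard only rejected counts that overran the file,
# so a negative n slips through: 1 + (-3) = -2, not > 3.
assert not (i + n > len(lines))

values = [float(lines[j]) for j in range(i, i + n)]  # range(1, -2) is empty
i += n                                # cursor rewinds to -2
print(values, i)                      # [] -2
```

With the cursor rewound, the loop re-reads lines it already consumed, so a long file can spin indefinitely; the explicit `n < 0` guard turns this into an immediate, named error.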
Summary
Two related changes addressing pt_expt's slow convergence vs pt baseline (~15-20% worse RMSE on multi-task DPA3) and the test-infrastructure brittleness it exposed.
ff6a9314 - fix(dpmodel): align NativeLayer bias/idt init with pt MLPLayer

In `deepmd/dpmodel/utils/network.py`, `NativeLayer.__init__` previously used Glorot scaling (1/sqrt(num_in+num_out)) for all of `w`, `b`, `idt`. `deepmd/pt`'s `MLPLayer._default_normal_init` uses Glorot only for `w`, with `b ~ N(0, 1)` and `idt ~ N(0.1, 0.001)`. Adam preserves per-parameter scale, so the init mismatch persists through training and produces ~15-20% worse validation RMSE on multi-task DPA3 (e.g. `e_alloy` 1.51e-2 → 1.79e-2; `f_elec` 4.58e-2 → 5.53e-2 on a 500k-step run). The fix aligns dpmodel with pt's well-tuned defaults; it affects every dpmodel-based backend (pt_expt, jax, paddle) for newly-initialised models. Existing serialized models are unaffected (weights are overwritten on `deserialize`).

35d5cb21 - test: replace hardcoded C++ expected values with sidecar reference files

The init change above broke ~10 C++ test files that hardcoded `expected_e/f/v = {…}` initializer-list arrays manually pasted from `gen_*.py` script stdout. This refactor removes the manual-paste workflow:

- `source/api_cc/tests/expected_ref.h` parses plain-text sidecar files.
- New `gen_common.write_expected_ref()` writer; legacy `print_cpp_values` / `print_cpp_spin_values` retained as ad-hoc debug helpers.
- Gen scripts emit `<model>.expected` sidecars at run time (each contains `[case]` sections with named arrays).
- Tests load arrays via `ExpectedRef::get<VALUETYPE>(case, key)` in `SetUp()`. No test cases removed: every modified file preserves its original `TYPED_TEST` / `class TestInfer*` count vs upstream master.
- `.expected` files are gitignored (regenerated each CI run alongside the `.pt2`/`.pth`).
- `test_deeppot_ptexpt.cc` is intentionally left with its TF-derived hardcoded refs: its `.pt2` weights flow through `gen_fparam_aparam.py`'s `state_dict()` load path (not random init), so they're stable across this PR and any future numerics-perturbing change.
- `test_compressed_forward` (float32) for DPA1 / SeAttenV2: bump atol from 1e-5 to 4e-5 with a comment explaining the cause: the new bias init scale widens the embedding-net output range, observed compression-tabulation error ~3.2e-5.

Test plan

- `source/tests/common/dpmodel/test_network.py`: 20 passed
- `source/tests/pt_expt/descriptor/`: 492 passed, 32 skipped (incl. the previously-failing float32 compression tests now passing at 4e-5)
- `source/tests/common/` (skipping pre-existing TF/protobuf-broken `test_gui.py`): 275 passed, 1 skipped
- `source/tests/consistent/descriptor/` (with C++ tabulate ops built): 2210 passed, 3678 skipped
- `source/tests/consistent/{fitting,loss,model}/` + `test_type_embedding.py`: 3654 passed, 4736 skipped
- `gen_*.py` regenerated `.pt2`/`.pth`/`.expected`

Known limitation
The init alignment also affects jax / paddle backends through dpmodel. Their cross-backend consistency tests use `serialize`/`deserialize` round-trips and are unaffected, but newly-initialised jax / paddle models will pick up the new init too. If a downstream user relies on the old Glorot-bias behaviour, this is a breaking numerical change for new-from-scratch training (existing checkpoints unaffected).