Fix multithreading reproducibility #98

Open
lrobion wants to merge 3 commits into MIT-LAE:main from lrobion:fix-multithreading-reproducibility
@lrobion commented May 14, 2026

Closes #19. Multithreaded simulations are now bitwise reproducible.

Multithreaded simulations were not reproducible, even with a fixed random seed, for three reasons:

  1. Multithreaded access to the global RNG state when updating the temperature perturbation. Even if the RNG starts from the same state, the threads draw from it in a non-deterministic order, so the temperature perturbation differs on every run.
  2. A plain race condition in a conversion factor.
  3. Parallel reductions in Aerosol.cpp that give different results from run to run because floating-point addition is not associative. This floating-point noise is magnified because the shear applied to the contrail depends on finding the index of the maximum xOD, which in turn depends on multiple non-deterministic reductions. Eventually the numerical noise is large enough to shift the index of the maximum xOD, which changes the shear and leads to a different simulation grid.
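
To illustrate the third point, here is a minimal standalone sketch, not code from Aerosol.cpp. The helper names `chunkedSum` and `argmax` are hypothetical. `chunkedSum` mimics a parallel reduction by summing contiguous slices separately and then combining the partial sums, so changing the number of chunks changes the order of the additions. The values are chosen so that the effect is exaggerated; in a real simulation the differences start in the last bits and grow over time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: sums v the way a parallel reduction might.
// Each of numChunks "threads" accumulates a partial sum over its own
// contiguous slice, and the partial sums are then added together.
double chunkedSum(const std::vector<double>& v, std::size_t numChunks) {
    assert(numChunks > 0);
    const std::size_t n = v.size();
    double total = 0.0;
    for (std::size_t c = 0; c < numChunks; ++c) {
        const std::size_t begin = c * n / numChunks;
        const std::size_t end = (c + 1) * n / numChunks;
        double partial = 0.0;
        for (std::size_t i = begin; i < end; ++i) {
            partial += v[i];
        }
        total += partial;
    }
    return total;
}

// Hypothetical helper: index of the first maximum element (0 if v is empty).
std::size_t argmax(const std::vector<double>& v) {
    return static_cast<std::size_t>(
        std::distance(v.begin(), std::max_element(v.begin(), v.end())));
}
```

With `v = {1e17, 1.0, -1e17, 1.0}`, `chunkedSum(v, 1)` returns 1.0 while `chunkedSum(v, 2)` returns 0.0, because each 1.0 is lost when it is added directly to a value of magnitude 1e17. If that sum is then compared against a second value of 0.5, `argmax` picks index 0 in one case and index 1 in the other, which is the same kind of index shift described for the maximum xOD.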

This PR addresses all three issues, but at a significant performance cost (simulations are 40% slower on 4h of the ISSL 140 test) because the reductions are no longer parallel. This cost can be reduced substantially if we stop repeating these (now slower) reductions multiple times per timestep (see #97).

I think once we address the repeated calculations, we can decide whether the performance drop is still significant enough that we want to look into this further. We could refactor the reductions so that they are safe to run in parallel, or lower the bar on reproducibility so that results are guaranteed reproducible only when the same number of threads is used. Further performance work should probably wait until #92 is merged, as it also deals with performance in Aerosol.cpp.
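
As a rough sketch of what a parallel-safe reduction could look like (an assumption for discussion, not what this PR implements): split the data into fixed-size blocks, let each block be summed left to right by a single thread, and combine the block sums serially in index order. The helper name `blockedSum` and the `blockSize` parameter are hypothetical. The result depends on `blockSize` but not on the number of threads or on scheduling, since the order of additions is fixed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from Aerosol.cpp.
// Block boundaries depend only on blockSize, never on the thread count.
double blockedSum(const std::vector<double>& v, std::size_t blockSize) {
    assert(blockSize > 0);
    const std::size_t n = v.size();
    const std::size_t numBlocks = (n + blockSize - 1) / blockSize;
    std::vector<double> partial(numBlocks, 0.0);

    // Each block is summed in a fixed order by exactly one thread.
    // Without OpenMP enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(static)
    for (long long b = 0; b < static_cast<long long>(numBlocks); ++b) {
        const std::size_t begin = static_cast<std::size_t>(b) * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        double s = 0.0;
        for (std::size_t i = begin; i < end; ++i) {
            s += v[i];
        }
        partial[static_cast<std::size_t>(b)] = s;
    }

    // Combine the block sums serially, in index order.
    double total = 0.0;
    for (double p : partial) {
        total += p;
    }
    return total;
}
```

The serial combine step is short when `blockSize` is large, so most of the work stays parallel. The trade-off is that changing `blockSize` changes the rounding, so it would need to be a fixed constant rather than something derived from the thread count.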

Development

Successfully merging this pull request may close these issues.

Inconsistent Results When "OpenMP Num Threads" is Greater Than 1