Skip to content

Correctly account for isolated samples in normalisation of pair_coalescence_* methods#3460

Open
nspope wants to merge 1 commit into
tskit-dev:mainfrom
nspope:nsp-paircoal-partial-missing
Open

Correctly account for isolated samples in normalisation of pair_coalescence_* methods#3460
nspope wants to merge 1 commit into
tskit-dev:mainfrom
nspope:nsp-paircoal-partial-missing

Conversation

@nspope
Copy link
Copy Markdown
Contributor

@nspope nspope commented May 15, 2026

Fixes #3459

There's the somewhat thorny issue of what the correct normalisation is when only one of span_normalise or pair_normalise are True. In this case (using the notation of the linked issue), what I've done is normalise by the "average number of pairs" $\int p(x) dx / (b - a)$ for "pair normalisation" (so the units are "sequence length per pair") -- this reduces to ${n \choose 2}$ when there's no isolated samples which agrees with the original implementation. Similarly I normalise by the "average pair span" $\int p(x) dx / {n \choose 2}$ for "span normalisation" (so that the units are "# of sample pairs") -- which simplifies to $b - a$ as per the original implementation where there are no isolated samples.

So the output (for the $i$th node or time window) is one of:

  • $C_i$ (unnormalised)
  • $C_i / \int p(x) dx$ (fully normalised)
  • $C_i \times (b - a) / \int p(x) dx$ (pair normalised)
  • $C_i \times {n \choose 2} / \int p(x) dx$ (span normalised)

The above assumes that there's no missing trees over the interval--- if there are, replace $b - a$ by "nonmissing span" (the original implementation discounted missing span as well)

@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.68%. Comparing base (1c689c0) to head (be9f67f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3460      +/-   ##
==========================================
+ Coverage   91.67%   91.68%   +0.01%     
==========================================
  Files          38       38              
  Lines       32186    32248      +62     
  Branches     5150     5167      +17     
==========================================
+ Hits        29505    29567      +62     
  Misses       2348     2348              
  Partials      333      333              
Flag Coverage Δ
C 82.27% <92.20%> (+0.04%) ⬆️
c-python 77.59% <97.36%> (+0.05%) ⬆️
python-tests 96.40% <ø> (ø)
python-tests-no-jit 33.20% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Python API 98.70% <ø> (ø)
Python C interface 91.23% <ø> (ø)
C library 91.27% <100.00%> (+0.03%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@nspope
Copy link
Copy Markdown
Contributor Author

nspope commented May 15, 2026

Added an additional test into the C suite for "ancestor-descendant" sample pairs

@nspope nspope force-pushed the nsp-paircoal-partial-missing branch from 53a8d74 to be9f67f Compare May 15, 2026 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect normalisation in TreeSequence.pair_coalescence_counts (and TreeSequence.pair_coalescence_rates) with isolated samples

1 participant