Skip to content

Incorrect normalisation in TreeSequence.pair_coalescence_counts (and TreeSequence.pair_coalescence_rates) with isolated samples #3459

@nspope

Description

@nspope

The pair_normalise option for TreeSequence.pair_coalescence_counts should result in the output being divided by the possible number of coalescing sample pairs---so that if pair_normalise and span_normalise are both set, the output will sum to one over the time dimension. However, if there are isolated samples over a portion of the tree sequence, then this won't be the case---the denominator is incorrect.

Specifically, let $c_i(x)$ be the number of sample pairs coalescing in node $i$ at position $x$. Without normalisation, pair_coalescence_counts returns $C_i = \int_a^b c_i(x) dx$ where the integral is over the interval $[a, b)$ (ignoring "empty" trees). Let $p(x)$ be the number of potentially coalescing sample pairs at position $x$ (that is, omitting isolated samples and pairs where one sample is a descendant of the other). The "full normalisation" should return $C_i / \int_a^b p(x) dx$. If all sample pairs are present over the entirety of the sequence, then $\int_a^b p(x) dx = {n \choose 2} (b - a)$, and the current implementation is hunky dory. But otherwise $\int_a^b p(x) dx < {n \choose 2} (b - a)$ and the normalised output is biased upwards.

A consequence is that pair coalescence rates (which rely on the normalised counts) are biased when there are isolated samples. For example, the figure below shows empirical rates estimated by TreeSequence.pair_coalescence_rates (colored lines that are individual simulations) around the theoretical expectation for the Zigzag model (black line). The "full" panel is when all samples are present, and the "drop" panel is when ancestry is partially removed so that the number of non-isolated samples varies dramatically across the sequence.

Image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions