Skip to content

msg_headers deserializer misaligns against real peers (skips trailing tx-count varint) #320

@djdarcy

Description

@djdarcy

Per the subject-line ... bitcoin.messages.msg_headers.msg_deser deserializes each header with CBlockHeader.stream_deserialize, which reads exactly 80 bytes. But the P2P 'headers' message wire format appends a 0-byte transaction-count varint after each 80-byte header (Bitcoin Core writes blocks-without-transactions in headers messages, so each entry has the standard CBlock framing including the tx-count varint). The result is that after deserializing header[0], the stream is 1 byte ahead of where the deserializer thinks it is, and header[1] onwards are shifted by 1 byte (or more, if a future protocol upgrade uses a larger varint).

The bug doesn't show up in self-round-trip tests because msg_headers.msg_ser doesn't write the trailing byte either, so serialize-then-deserialize is symmetric. It only surfaces against on-wire data from a real peer.

Steps to reproduce

import socket
from bitcoin.messages import (
    MsgSerializable, msg_version, msg_verack,
    msg_getheaders, msg_headers, msg_ping, msg_pong,
)

sock = socket.create_connection(('seed.bitcoin.sipa.be', 8333), timeout=10)
f = sock.makefile('rb')

# Standard version/verack handshake.
sock.sendall(msg_version().to_bytes())
for _ in range(10):
    m = MsgSerializable.stream_deserialize(f)
    if isinstance(m, msg_version):
        sock.sendall(msg_verack().to_bytes())
    elif isinstance(m, msg_verack):
        break

# Ask for headers starting from genesis.
gh = msg_getheaders()
gh.locator.vHave = [b'\x00' * 32]
sock.sendall(gh.to_bytes())

while True:
    m = MsgSerializable.stream_deserialize(f)
    if isinstance(m, msg_ping):
        pong = msg_pong(); pong.nonce = m.nonce
        sock.sendall(pong.to_bytes())
        continue
    if isinstance(m, msg_headers):
        hs = m.headers
        print("header[0] hash:", hs[0].GetHash()[::-1].hex())
        print("header[1] prev:", hs[1].hashPrevBlock[::-1].hex())
        print("chain OK?     ", hs[1].hashPrevBlock == hs[0].GetHash())
        break

Observed output (against any mainnet peer):

header[0] hash: 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
header[1] prev: 000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb604800
chain OK?      False

Note header[1].hashPrevBlock is header[0].GetHash() shifted left by one byte.

Suggested fix

In bitcoin/messages.py:

class msg_headers(MsgSerializable):
    @classmethod
    def msg_deser(cls, f, protover=PROTO_VERSION):
        c = cls()
        n = VarIntSerializer.stream_deserialize(f)
        for _ in range(n):
            header = CBlockHeader.stream_deserialize(f)
            VarIntSerializer.stream_deserialize(f)  # discard tx-count varint (always 0)
            c.headers.append(header)
        return c

    def msg_ser(self, f):
        VarIntSerializer.stream_serialize(len(self.headers), f)
        for h in self.headers:
            h.stream_serialize(f)
            VarIntSerializer.stream_serialize(0, f)  # tx-count varint

This makes the deserializer match the on-wire format, and the serializer continues to round-trip with the new deserializer.

Some context...

I hit this implementing Bitcoin P2P getheaders SPV-style header fetching for opentimestamps-client (also yours). Current workaround there is to intercept the 'headers' command at the framing layer and parse the body manually. Happy to open a PR with the fix above plus a regression test that round-trips a wire-formatted body.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions