v0.9.4: nil-pointer deref in moduleReload teardown — daemon panics on SIGUSR2 config reload #846

@edave907

Upstream issue draft — maddy: nil-pointer dereference in moduleReload teardown (v0.9.4)

Target repo: https://github.com/foxcpp/maddy
Issue type: Bug report
Status (local): filed in our TODO.md as R-4 — defensive systemd override
landed on the affected host 2026-05-19 (drop-in clears ExecReload=);
upstream report below pending submission to GitHub.


Title

SIGSEGV in moduleReload teardown — daemon panics on config reload (v0.9.4)

Summary

systemctl reload maddy (which sends SIGUSR2 to the maddy process via the
unit file's ExecReload=/bin/kill -USR2 $MAINPID) consistently crashes the
running daemon with a nil-pointer dereference in moduleReload.func3 at
maddy.go:520. The new server starts successfully and binds its listeners
before the panic — the crash is in the teardown of the old server, not in
configuration parsing. systemd then marks the service as failed (result
exit-code, status=2/INVALIDARGUMENT; 2 is the exit status the Go runtime
uses for an unrecovered panic), leaving the host with no mail service
running until systemctl restart maddy is issued.

Reproduces 100% of the time on a Debian-based VPS running maddy 0.9.4
with a typical mail-server configuration (port 25 + Tailscale-bound 587 + LMTP
target + rspamd check).

Environment

  • maddy version: 0.9.4 (per journalctl: new server started {"version":"0.9.4"})
  • OS: Debian 12 (bookworm), systemd-based
  • Configuration shape:
    • smtp tcp://0.0.0.0:25 { ... check { dkim spf rspamd } ... }
    • smtp tcp://<tailscale-ip>:587 { ... } (submission on a private interface)
    • target.lmtp inbound_bridge { targets tcp://127.0.0.1:8025 }
    • target.remote outbound_delivery { ... }
    • target.queue outbound_queue { ... }
  • TLS: Let's Encrypt fullchain + key files

Steps to reproduce

  1. Start a maddy service with any configuration that uses an SMTP
    listener stanza together with target.lmtp and target.remote (we have
    not isolated which module triggers the panic).
  2. Make a trivial edit to /etc/maddy/maddy.conf (we changed only the
    top-of-file comment block — no listener or check-block edits).
  3. Run systemctl reload maddy.
  4. Observe the panic in journalctl -u maddy.

Observed log

maddy[160492]: signal received (user defined signal 2), reloading configuration
maddy[160492]: reloading server...
systemd[1]: Reloaded maddy.service - maddy mail server.
maddy[160492]: loading new configuration...
maddy[160492]: configuration loaded
maddy[160492]: starting new server
maddy[160492]: smtp: listening on tcp://0.0.0.0:25
maddy[160492]: smtp: listening on tcp://<tailscale-ip>:587
maddy[160492]: new server started        {"version":"0.9.4"}
maddy[160492]: stopping old server
maddy[160492]: old server stopped
maddy[160492]: panic: runtime error: invalid memory address or nil pointer dereference
maddy[160492]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x7f6d18708689]
maddy[160492]: goroutine 5230 [running]:
maddy[160492]: github.com/foxcpp/maddy.moduleReload.func3()
maddy[160492]:         github.com/foxcpp/maddy/maddy.go:520 +0x109
maddy[160492]: created by github.com/foxcpp/maddy.moduleReload in goroutine 1
maddy[160492]:         github.com/foxcpp/maddy/maddy.go:507 +0x296
systemd[1]: maddy.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: maddy.service: Failed with result 'exit-code'.

Expected

systemctl reload maddy should atomically swap to the new configuration
without taking the daemon down, which is the behavior the "new server
started" / "old server stopped" log lines imply.

Actual

The atomic-swap teardown panics. The new server has already bound its
listeners (and accepted no traffic in the brief window), but the panicking
goroutine takes the whole process down with it.

Workaround we use

Set ExecReload= (empty) in a systemd drop-in so that systemctl reload maddy
fails cleanly with "Job type reload is not applicable" instead of crashing
the daemon. Operationally we use systemctl restart maddy for all config
changes (including the certbot renewal deploy hook).

Code pointer + likely root cause

maddy.go:520 in moduleReload.func3 (the goroutine spawned at :507)
is the panic site. Reading the v0.9.4 source: line 520 is
oldContainer.DefaultLogger.Out.Close() inside the async goroutine
that runs immediately after the "old server stopped" log message.
The crash matches Out being nil on the old container's logger — most
likely moduleStop (or whatever teardown ran before this goroutine
fires) cleared / closed the logger's underlying writer, leaving Out
nil before .Close() is called on it.

Suggested guard: nil-check oldContainer.DefaultLogger.Out before
the Close() call, or move the close into the synchronous teardown
path so it can't race with whatever zeroed Out.
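
A minimal sketch of the nil-check variant follows. It is not taken from
maddy's source: the logger type, its Out field typed as io.Closer, and the
closeLogOutput helper are hypothetical stand-ins for whatever
oldContainer.DefaultLogger actually is in v0.9.4, and are only meant to show
the shape of the guard.

```go
package main

import (
	"fmt"
	"io"
)

// logger is a hypothetical stand-in for the type behind
// oldContainer.DefaultLogger; only the Out field matters for this sketch.
type logger struct {
	Out io.Closer
}

// closeLogOutput closes l.Out if it is set. It reports whether a close was
// attempted, and treats a nil logger or nil Out as "already torn down"
// instead of dereferencing it.
func closeLogOutput(l *logger) (closed bool, err error) {
	if l == nil || l.Out == nil {
		return false, nil
	}
	return true, l.Out.Close()
}

// countingCloser is a trivial io.Closer used to exercise the non-nil path.
type countingCloser struct{ calls int }

func (c *countingCloser) Close() error {
	c.calls++
	return nil
}

func main() {
	// Out was already cleared by an earlier teardown step: nothing to close.
	closed, err := closeLogOutput(&logger{})
	fmt.Println(closed, err) // false <nil>

	// Out is still set: it is closed exactly once.
	c := &countingCloser{}
	closed, err = closeLogOutput(&logger{Out: c})
	fmt.Println(closed, err, c.calls) // true <nil> 1
}
```

Note that a plain comparison against nil only catches a nil interface value.
If Out can hold a typed nil pointer, the check would have to happen where the
concrete type is known.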

The SIGUSR2 reload mechanism itself was introduced in #750
(2026-03-25); the bug appears to be in the reload teardown half of
that feature, not in the new-server-start half.

Happy to test a patch or provide a minimal reproducer if helpful.

Why this might be hard to spot in CI

The crash is in teardown of the old server — the new server reports a
healthy start. Anyone testing systemctl reload by observing port
binding would see "yes, listeners are up" before the panic. We caught it
only because we read journalctl -u maddy after the reload and noticed
the daemon had exited.
