v0.9.4: nil-pointer deref in moduleReload teardown — daemon panics on SIGUSR2 config reload

# Upstream issue draft — maddy: nil-pointer dereference in moduleReload teardown (v0.9.4)

**Target repo:** https://github.com/foxcpp/maddy
**Issue type:** Bug report
**Status (local):** filed in our TODO.md as R-4. A defensive systemd override
landed on the affected host 2026-05-19 (drop-in clears `ExecReload=`); the
upstream report below is pending submission to GitHub.

---

## Title

`SIGSEGV in moduleReload teardown — daemon panics on config reload` (v0.9.4)

## Summary

`systemctl reload maddy` (which sends `SIGUSR2` to the maddy process via the
unit file's `ExecReload=/bin/kill -USR2 $MAINPID`) consistently crashes the
running daemon with a nil-pointer dereference in `moduleReload.func3` at
`maddy.go:520`. The new server starts successfully and binds its listeners
before the panic — the crash is in the *teardown of the old server*, not in
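configuration parsing. systemd then records `status=2/INVALIDARGUMENT` and
marks the unit failed with result `exit-code` (exit status 2 is what the Go
runtime returns for an unrecovered panic; `INVALIDARGUMENT` is only systemd's
label for that status), leaving the host with no mail service running until
`systemctl restart maddy` is issued.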

Reproduces 100% of the time on a Debian-based VPS running `maddy 0.9.4`
with a typical mail-server configuration (port 25 + Tailscale-bound 587 + LMTP
target + rspamd check).

## Environment

- maddy version: 0.9.4 (per `journalctl`: `new server started {"version":"0.9.4"}`)
- OS: Debian 12 (bookworm), systemd-based
- Configuration shape:
  - `smtp tcp://0.0.0.0:25 { ... check { dkim spf rspamd } ... }`
  - `smtp tcp://<tailscale-ip>:587 { ... }` (submission on a private interface)
  - `target.lmtp inbound_bridge { targets tcp://127.0.0.1:8025 }`
  - `target.remote outbound_delivery { ... }`
  - `target.queue outbound_queue { ... }`
- TLS: Let's Encrypt fullchain + key files

## Steps to reproduce

1. Start the maddy service with a configuration that includes an SMTP
   listener stanza, a `target.lmtp`, and a `target.remote` (this is the shape
   we run; we have not isolated which module triggers the panic).
2. Make a trivial edit to `/etc/maddy/maddy.conf` (we changed only the
   top-of-file comment block — no listener or check-block edits).
3. Run `systemctl reload maddy`.
4. Observe the panic in `journalctl -u maddy`.

## Observed log

```
maddy[160492]: signal received (user defined signal 2), reloading configuration
maddy[160492]: reloading server...
systemd[1]: Reloaded maddy.service - maddy mail server.
maddy[160492]: loading new configuration...
maddy[160492]: configuration loaded
maddy[160492]: starting new server
maddy[160492]: smtp: listening on tcp://0.0.0.0:25
maddy[160492]: smtp: listening on tcp://<tailscale-ip>:587
maddy[160492]: new server started        {"version":"0.9.4"}
maddy[160492]: stopping old server
maddy[160492]: old server stopped
maddy[160492]: panic: runtime error: invalid memory address or nil pointer dereference
maddy[160492]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x7f6d18708689]
maddy[160492]: goroutine 5230 [running]:
maddy[160492]: github.com/foxcpp/maddy.moduleReload.func3()
maddy[160492]:         github.com/foxcpp/maddy/maddy.go:520 +0x109
maddy[160492]: created by github.com/foxcpp/maddy.moduleReload in goroutine 1
maddy[160492]:         github.com/foxcpp/maddy/maddy.go:507 +0x296
systemd[1]: maddy.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: maddy.service: Failed with result 'exit-code'.
```

## Expected

`systemctl reload maddy` should atomically swap to the new configuration
without taking the daemon down, which is the behaviour the "new server
started" / "old server stopped" log lines imply.

## Actual

The atomic-swap teardown panics. The new server has already bound its
listeners (and accepted no traffic in the brief window), but the panicking
goroutine takes the whole process down with it.

## Workaround we use

Set `ExecReload=` (empty) in a systemd drop-in to make
`systemctl reload maddy` fail cleanly with *"Job type reload is not
applicable"* instead of crashing. Operationally we use `systemctl restart
maddy` for all config changes (including the certbot renewal deploy hook).

## Code pointer + likely root cause

`maddy.go:520` in `moduleReload.func3` (the goroutine spawned at `:507`)
is the panic site. Reading the v0.9.4 source: line 520 is
`oldContainer.DefaultLogger.Out.Close()` inside the async goroutine
that runs immediately after the "old server stopped" log message.
The crash is consistent with `Out` being nil on the old container's logger.
Most likely `moduleStop` (or another teardown step that runs before this
goroutine fires) had already closed and cleared the logger's underlying
writer, so `.Close()` ends up being called on a nil `Out`.

Suggested guard: nil-check `oldContainer.DefaultLogger.Out` before
the `Close()` call, or move the close into the synchronous teardown
path so it can't race with whatever zeroed `Out`.
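
A minimal sketch of the nil-check variant, under stated assumptions: the `logger` and `container` types are hypothetical stand-ins written for this report and may not match maddy's real definitions. The only thing taken from the source is the shape of the call chain `oldContainer.DefaultLogger.Out.Close()`.

```go
package main

import (
	"fmt"
	"io"
)

// logger and container are hypothetical stand-ins for the structures behind
// oldContainer.DefaultLogger.Out; maddy's real types may differ.
type logger struct {
	Out io.Closer // assumed to be an interface value that can be nil
}

type container struct {
	DefaultLogger *logger
}

// closeLoggerOut closes the old container's log output only if every link in
// the chain is non-nil. It reports whether Close was actually called.
func closeLoggerOut(c *container) (closed bool, err error) {
	if c == nil || c.DefaultLogger == nil || c.DefaultLogger.Out == nil {
		return false, nil
	}
	return true, c.DefaultLogger.Out.Close()
}

// nopCloser is a trivial io.Closer used only for the demonstration.
type nopCloser struct{ calls int }

func (n *nopCloser) Close() error { n.calls++; return nil }

func main() {
	// Out already cleared by an earlier teardown step: no panic, no Close.
	closed, err := closeLoggerOut(&container{DefaultLogger: &logger{}})
	fmt.Println(closed, err)

	// Out still set: Close runs once.
	closed, err = closeLoggerOut(&container{DefaultLogger: &logger{Out: &nopCloser{}}})
	fmt.Println(closed, err)
}
```

One caveat: an interface holding a typed nil pointer compares as non-nil, so a check like this only covers the case where `Out` itself was set to nil. It also does not remove the race if another goroutine clears `Out` between the check and the call; moving the close into the synchronous path addresses that.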

The SIGUSR2 reload mechanism itself was introduced in #750
(2026-03-25); the bug appears to be in the reload teardown half of
that feature, not in the new-server-start half.

Happy to test a patch or provide a minimal reproducer if helpful.

## Why this might be hard to spot in CI

The crash is in *teardown of the old server* — the new server reports a
healthy start. Anyone testing `systemctl reload` by observing port
binding would see "yes, listeners are up" before the panic. We caught it
only because we read `journalctl -u maddy` after the reload and noticed
the daemon had exited.

