
test: events sent during remote join handshake should not be lost #847

Open
FrenchGithubUser wants to merge 2 commits into matrix-org:main from famedly:remote-join-handshake

Conversation

@FrenchGithubUser commented Mar 4, 2026

Test for element-hq/synapse#19390

Disclaimer

As I'm not super fluent with Go, I used a lot of AI to write this test. However, I reviewed it as carefully as I could together with someone else from Famedly.

Pull Request Checklist

Signed-off-by: Thomas Traineau t.traineau@famedly.com


Copilot AI left a comment


Pull request overview

Adds a regression test to ensure events created during the federation join handshake (between /make_join and /send_join) remain discoverable to the joining server, preventing missed events / unreachable forward extremities during joins.

Changes:

  • Added TestEventBetweenMakeJoinAndSendJoinIsNotLost to assert intervening events become discoverable (directly or via prev_events references).
  • Implemented a custom federation /_matrix/federation/v1/send/{transactionID} handler to observe outbound PDUs before the Complement server is in the room.
  • Added io import to read raw transaction bodies.


Comment thread: tests/federation_room_join_test.go (Outdated)
Comment on lines +593 to +618
// We track the message event ID sent between make_join and send_join.
// After send_join, we wait for hs1 to send us either:
// - the message event itself, or
// - any event whose prev_events reference the message (e.g. a dummy event)
var messageEventID string
messageDiscoverableWaiter := helpers.NewWaiter()

srv := federation.NewServer(t, deployment,
	federation.HandleKeyRequests(),
)
srv.UnexpectedRequestsAreErrors = false

// Custom /send handler: the Complement server won't be in the room until
// send_join completes, so we can't use HandleTransactionRequests (which
// requires the room in srv.rooms). Instead we parse the raw transaction.
srv.Mux().Handle("/_matrix/federation/v1/send/{transactionID}", http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
	body, _ := io.ReadAll(req.Body)
	txn := gjson.ParseBytes(body)
	txn.Get("pdus").ForEach(func(_, pdu gjson.Result) bool {
		eventID := pdu.Get("event_id").String()
		eventType := pdu.Get("type").String()
		t.Logf("Received PDU via /send: type=%s id=%s", eventType, eventID)

		if messageEventID == "" {
			return true
		}
Collaborator


I'm not a Go expert so I don't know for sure whether this is relevant but it sounds plausible.

Based on https://tip.golang.org/ref/mem, without synchronization, writes are not guaranteed to be observed by other goroutines:

the assignment to a is not followed by any synchronization event, so it is not guaranteed to be observed by any other goroutine. In fact, an aggressive compiler might delete the entire go statement.

If the effects of a goroutine must be observed by another goroutine, use a synchronization mechanism such as a lock or channel communication to establish a relative ordering.

Comment thread: tests/federation_room_join_test.go (Outdated)
Comment on lines +626 to +634
// Check if this event's prev_events reference the message
// (e.g. a dummy event tying the forward extremities together).
pdu.Get("prev_events").ForEach(func(_, prevEvent gjson.Result) bool {
	if prevEvent.String() == messageEventID {
		messageDiscoverableWaiter.Finish()
		return false
	}
	return true
})
Collaborator

@MadLittleMods commented Mar 18, 2026


Comment on why this is good enough.

I assume it's because the joining server can backfill to get the message.

But this brings up a question: what if it's a prev_event further back in the chain? (We currently don't detect that as good enough.) We need to at least explain this edge case, although perhaps we should /backfill up to the beginning of the room and ensure messageEventID was in there.

Collaborator


It looks like this is partially addressed below but we should also explain it here:

	// Step 4: hs1 should make the missed message discoverable to the joining
	// server. We accept either receiving the message event directly, or
	// receiving any event whose prev_events reference it (allowing the
	// joining server to backfill).

Author


done

Collaborator


This was updated to the following:

			// Check if this event's prev_events directly reference the message
			// (e.g. a dummy event tying the two forward extremities together).
			// If so, the joining server can backfill from that event and will
			// discover the message. We only check one level of prev_events:
			// if the reference is deeper in the DAG the joining server can
			// still reach the message through backfill.

This sounds pretty good now, but I think it needs a slight adjustment to set the right expectations:

			// Check if this event's prev_events directly reference the message (e.g. a dummy
			// event tying the two forward extremities together). If so, the joining server
			// can backfill from that event and will discover the message.
			//
			// XXX: We only check one level of prev_events: if the reference is deeper in the
			// DAG, it's valid and the joining server can still reach the message through
			// backfill but our checks don't account for that yet (feel free to edit this
			// assertion if you run into this)

Collaborator

@MadLittleMods left a comment


Thanks for the updates so far @FrenchGithubUser! This is looking close

Comment on lines +616 to +619
if err != nil {
	t.Fatalf("failed to read request body in /send handler: %v", err)
	return
}
Collaborator


We can use must.NotError(...) here


Comment on lines +593 to +618
// We track the message event ID sent between make_join and send_join.
// After send_join, we wait for hs1 to send us either:
// - the message event itself, or
// - any event whose prev_events reference the message (e.g. a dummy event)
var messageEventID string
messageDiscoverableWaiter := helpers.NewWaiter()

srv := federation.NewServer(t, deployment,
federation.HandleKeyRequests(),
)
srv.UnexpectedRequestsAreErrors = false

// Custom /send handler: the Complement server won't be in the room until
// send_join completes, so we can't use HandleTransactionRequests (which
// requires the room in srv.rooms). Instead we parse the raw transaction.
srv.Mux().Handle("/_matrix/federation/v1/send/{transactionID}", http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
body, _ := io.ReadAll(req.Body)
txn := gjson.ParseBytes(body)
txn.Get("pdus").ForEach(func(_, pdu gjson.Result) bool {
eventID := pdu.Get("event_id").String()
eventType := pdu.Get("type").String()
t.Logf("Received PDU via /send: type=%s id=%s", eventType, eventID)

if messageEventID == "" {
return true
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a Go expert so I don't know for sure whether this is relevant but it sounds plausible.

Based on https://tip.golang.org/ref/mem, without synchronization, writes are not guaranteed to be observed by other goroutines

the assignment to a is not followed by any synchronization event, so it is not guaranteed to be observed by any other goroutine. In fact, an aggressive compiler might delete the entire go statement.

If the effects of a goroutine must be observed by another goroutine, use a synchronization mechanism such as a lock or channel communication to establish a relative ordering.
