Remote Trigger Registration Checks#21366
Conversation
|
✅ No conflicts with other open PRs targeting |
|
I see you updated files related to
|
|
giving all the problems we had with the other PRs not being able to roll out one node at a time, will this change introduce a breaking deployment when releasing it? |
… into CRE-1520-remote-trigger-registration-checking
CORA - Pending Reviewers
Legend: ✅ Approved | ❌ Changes Requested | 💬 Commented | 🚫 Dismissed | ⏳ Pending | ❓ Unknown For more details, see the full review summary. |
…CRE-1520-remote-trigger-registration-checking
…CRE-1520-remote-trigger-registration-checking
| CapabilityMethod: p.capMethodName, | ||
| Metadata: &types.MessageBody_TriggerEventMetadata{ | ||
| TriggerEventMetadata: &types.TriggerEventMetadata{ | ||
| WorkflowIds: workflowIDs, |
There was a problem hiding this comment.
Great that we batch this! But is there a risk that we'll exceed some max message size?
There was a problem hiding this comment.
I guess that's possible in an extreme case. Just added chunking to avoid this, although the tradeoff is some additional traffic 😄
Seems fine on Staging with this change though, traffic is still low for the checks.
| } | ||
|
|
||
| p.mu.RLock() | ||
| totalRegistrations := len(p.registrations) |
There was a problem hiding this comment.
So the subscriber is still sending registrations in the same way. It looks like we only need to send checks for stuff that is old, correct? We can skip it if we received a fresh registration.
There was a problem hiding this comment.
Yeah that's true, but I don't think adding an optimization is necessary in this PR (don't want to delay this any more). Would need to add some additional complexity to hold onto timestamps of when last register occured.
…CRE-1520-remote-trigger-registration-checking
…CRE-1520-remote-trigger-registration-checking
|




Adds registration checks for Trigger Delivery Guarantees. The Publisher no longer expires registrations and instead send checks to the Subscriber to prompt for explicit unregisters. Also added additional don2don metrics to view traffic.
Also adds ackReplayCache to Subscriber, which is used to re-send ACKs for events that have already been ACKd by the engine. This is needed in case slow nodes miss receiving an ACK and keep resending the event, but would be stuck since Subscriber wont re-deliver to the engine.