feat(pilotctl): add --encoding flag to send-message #8
Open
voidborne-d wants to merge 19 commits into TeoSlayer:main from
* Add install section to agent skills documentation

Agents and users reading SKILLS.md had no install instructions. Adding the section makes the doc self-contained for onboarding.

* Extract named constants and eliminate magic numbers

Hardcoded numeric literals scattered across the daemon package made tuning values difficult to find and reason about. This centralizes them into named constants with clear documentation:

- Beacon message types (protocol/header.go): single source of truth used by beacon server, tunnel manager, and tests
- Dial/retransmission constants: retry counts, RTO bounds, intervals
- RTO parameters (RFC 6298): clock granularity, min/max clamp values
- Zero-window probe bounds, accept queue, send buffer capacities
- Handshake timing: replay reaper interval, recv timeout, close delay
- ConnState.String() method replaces inline switch in ConnectionList
- Heartbeat loop uses config keepalive interval instead of hardcoded 30s

* Improve maintainability with sentinel errors, deduplication, and cleanup

Define shared sentinel errors (ErrNodeNotFound, ErrNetworkNotFound, ErrConnClosed, ErrConnRefused, ErrDialTimeout, ErrChecksumMismatch) in protocol/header.go and replace ~25 ad-hoc fmt.Errorf strings across registry, daemon, driver, and beacon, so callers can now use errors.Is().

Extract startRecvPusher() in daemon/ipc.go to deduplicate the recv-push goroutine that was copy-pasted between the CmdDial and CmdAccept handlers.

Add jsonRPC() helper and named sub-command constants in driver/driver.go, reducing 10 methods from 8-12 lines each to 1-5 lines.

Fix silent error ignoring in dashboard (w.Write), gateway (io.Copy), and nameserver (conn.Write) with explicit discards or error logging.

Convert 7 short mutex patterns to use defer across daemon, driver, and replication packages. Remove redundant replication count() method. Add overflow guard comment in packet.go and document InsecureSkipVerify cert-pinning pattern in registry/client.go.
---------

Co-authored-by: Teodor Calin <teodor@vulturelabs.io>
Trust pairs and handshake inboxes were being destroyed when nodes disconnected (reap) or deregistered, causing permanent loss of identity-to-identity trust relationships. This fixes four issues:
- cleanupNode no longer deletes trust pairs or handshake inboxes
- reapStaleNodes preserves ownerIdx for re-registration reclaim
- snapshotJSON includes trust pairs and handshake data for replication
- reRegister re-syncs local trust pairs to registry after reconnect
When pilotctl is already present in ~/.pilot/bin, the script now skips config, service, and PATH setup — only replaces binaries and exits with a restart hint. Also removes broken platform-suffix renames from the download path.
parseAddrOrHostname now tries interpreting the argument as a numeric node ID (mapping to backbone address 0:0000.XXXX.XXXX) before falling back to hostname resolution. This fixes send-message, ping, connect, and other commands that previously only accepted full addresses or hostnames.
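A minimal sketch of the numeric-ID fallback, assuming the XXXX.XXXX part of the backbone address is the 32-bit node ID split into two 16-bit groups printed as 4-digit hex (the commit does not specify case or layout). backboneAddrForNodeID and resolveTarget are hypothetical names, and parsing of full addresses is omitted:

```go
package main

import (
	"fmt"
	"strconv"
)

// backboneAddrForNodeID renders a node ID as 0:0000.XXXX.XXXX.
// Assumption: high and low 16 bits, 4-digit uppercase hex.
func backboneAddrForNodeID(id uint32) string {
	return fmt.Sprintf("0:0000.%04X.%04X", id>>16, id&0xFFFF)
}

// resolveTarget is a hypothetical stand-in for parseAddrOrHostname: it tries a
// decimal node ID first and otherwise falls back to the supplied resolver.
func resolveTarget(arg string, resolveHostname func(string) (string, error)) (string, error) {
	if id, err := strconv.ParseUint(arg, 10, 32); err == nil {
		return backboneAddrForNodeID(uint32(id)), nil
	}
	return resolveHostname(arg)
}
```

Under these assumptions, node ID 42 maps to 0:0000.0000.002A, while a non-numeric argument such as a hostname goes to the resolver unchanged.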
…ening

Webhooks & event system:
- Daemon emits real-time events via HTTP POST to configurable endpoint
- Events: conn.syn_received, conn.established, conn.fin, conn.rst, message.received, handshake.pending, handshake.approved, handshake.auto_approved, trust.revoked_by_peer, tunnel.peer_added, node.registered, node.deregistered, security.syn_rate_limited
- Async delivery with buffered channel, non-blocking, graceful shutdown
- Runtime hot-swap via IPC (set-webhook / clear-webhook)
- CLI: pilotctl daemon start --webhook <url>, pilotctl set-webhook/clear-webhook

Node tags:
- Capability tags for node discovery (e.g. "webserver", "analytics")
- Validated format (lowercase alphanumeric + hyphens, 1-32 chars, max 8)
- CLI: pilotctl set-tags <tag1> [tag2] ..., pilotctl clear-tags
- Tags visible in dashboard node list with filtering
- Persisted in registry snapshots

Beacon relay performance:
- Worker pool architecture: single reader goroutine + N workers (NumCPU)
- Buffered relay dispatch channel (4096 capacity) for backpressure
- sync.Pool for payload buffers, per-worker pre-allocated send buffers
- Read-only relay path (RLock only, no write lock contention)
- 4MB UDP receive buffer for burst absorption

Registry persistence hardening:
- pubKeyIdx persisted in snapshots (survives node reap + restart cycles)
- Debounced save: async signal + background flush at most once per second
- Compact JSON serialization (no indent) reduces write amplification
- Known-key re-registrations bypass rate limiter for fast reconnection
- Close() guarantees final flush before returning

Dashboard improvements:
- pprof endpoints (/debug/pprof/) for live CPU/memory profiling
- Tag display and filtering in node table
- Pagination for large node lists
- Total requests stat card, responsive grid layout

Logging:
- Demoted noisy registration/key-rotation logs from INFO to DEBUG
- Demoted beacon relay-not-found from WARN to DEBUG

CI:
- CodeQL static analysis (Go) on push/PR to main + weekly schedule
- Add full documentation site (14 pages + CSS) at web/docs/
- Add Polo dashboard page and sidebar navigation
- Replace dashboard problem/solution diagrams with interactive force-directed trust graph (canvas, pan/zoom, fullscreen)
- Add trust edges to /api/stats endpoint
- Update SKILLS.md with webhook and tag commands
- Update registry/beacon IP from 35.193.106.76 to 34.71.57.205 across all source, configs, docs, and install script
Beacon instances can now form a peer mesh for horizontal scaling behind a load balancer. Each beacon gossips its local node list to peers every 10s (BeaconMsgSync 0x07). The relay worker uses 3-tier routing: local nodes first, then peer beacons, then drop. A /healthz HTTP endpoint supports GCP health checks for managed instance groups. Registry gains beacon_register/beacon_list for dynamic peer discovery with 60s TTL auto-reap. cmd/beacon gets --beacon-id, --peers, --health flags for standalone clustered deployments.
Beacons register with the registry and discover peers every 30s. Uses 4-byte length-prefix framing matching the registry wire protocol. Gossip loop runs unconditionally to support dynamically added peers.
- LaTeX preprint analyzing trust graph topology, capability specialization, and network formation patterns from live registry metadata
- Research page on docs site with key findings and PDF download
- Research nav section added to all docs pages and homepage
- SKILLS.md: network growth encouragement section
- Remove trust graph visualization from Polo dashboard (canvas, force
simulation, fullscreen, tooltip — ~170 lines of CSS/HTML/JS)
- Add self-hosted SVG badge endpoints: /api/badge/{nodes,trust,requests,tags}
with formatted counts and color coding
- Add live Network stats section to pilotprotocol.network website with
3 stat cards fetched from the registry API
- Add 4 dynamic badges to README, remove trust graph from Polo description
/docs/media/pilot.png -- 767.18kb -> 753.48kb (1.79%)

Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>
Co-authored-by: ImgBotApp <ImgBotHelp@gmail.com>
…eadiness

Full-stack implementation following the SetVisibility pattern:
- NodeInfo.TaskExec bool field with snapshot persistence
- Registry handler (set_task_exec) with signature verification
- Registry client, daemon IPC (0x1D/0x1E), driver API
- CLI: pilotctl enable-tasks / disable-tasks
- Dashboard: stat card, table column, filter toggle, badge endpoint
- Website: Task Executors stat card with live data
- 7 tests covering basic, toggle, auth, persistence, dashboard, lookup, IPC
Enables go get, Go proxy resolution, and Go Report Card compatibility.
* init: tweak irrelevant stuff
* test: add e2e testing, coverage and pre-commits
* feat: add karma implementation
* feat: implement task management
* feat: implement task submit service

---------

Co-authored-by: Alex Godoroja <alex@vulturelabs.io>
- Fix web4/ → github.com/TeoSlayer/pilotprotocol/ imports in tasksubmit package and tests (left over from module rename)
- Restrict /metrics and /debug/pprof/ endpoints to localhost only
- Fix cumulative histogram bucket counting in Prometheus metrics
- Update Makefile release target to include all 7 binaries with correct naming convention (matching v1.2.1 structure)
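The histogram fix concerns Prometheus semantics: in the text exposition format, the bucket le="b" counts every observation less than or equal to b, so buckets are cumulative and le="+Inf" equals the total count. A minimal sketch of that rule; the histogram type is hypothetical, omits the _sum and _count lines, and is not safe for concurrent use:

```go
package main

import "fmt"

// histogram keeps cumulative bucket counts, as Prometheus expects.
type histogram struct {
	bounds []float64 // ascending upper bounds, excluding +Inf
	counts []uint64  // counts[i] = observations <= bounds[i]
	count  uint64    // total observations, i.e. the le="+Inf" bucket
	sum    float64
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// Observe increments every bucket whose bound is >= v, not only the first match.
func (h *histogram) Observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.count++
	h.sum += v
}

// Lines renders the _bucket lines for a metric in text exposition format.
func (h *histogram) Lines(name string) []string {
	var out []string
	for i, b := range h.bounds {
		out = append(out, fmt.Sprintf("%s_bucket{le=\"%g\"} %d", name, b, h.counts[i]))
	}
	out = append(out, fmt.Sprintf("%s_bucket{le=\"+Inf\"} %d", name, h.count))
	return out
}
```

Observing 0.05, 0.3, and 2 against bounds 0.1, 0.5, 1 gives bucket counts 1, 2, 2 and an le="+Inf" count of 3.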
Add client-side --encoding flag that wraps the payload in a JSON
envelope before sending. No wire protocol changes — the data is
sent as a standard TypeJSON frame.
Usage:
pilotctl send-message target --data '?Uk/co' --encoding lambda
# Sends TypeJSON: {"encoding":"lambda","data":"?Uk/co"}
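The envelope is ordinary JSON, so a sketch needs only encoding/json. envelope, wrapPayload, and unwrapPayload are hypothetical names; the actual cmdSendMessage code is not shown here. Field order in the output follows the struct declaration order:

```go
package main

import (
	"encoding/json"
	"errors"
)

// envelope mirrors {"encoding":"<name>","data":"<payload>"}.
type envelope struct {
	Encoding string `json:"encoding"`
	Data     string `json:"data"`
}

// wrapPayload wraps data only when an encoding is set; otherwise it passes
// data through unchanged.
func wrapPayload(data, encoding string) ([]byte, error) {
	if encoding == "" {
		return []byte(data), nil
	}
	return json.Marshal(envelope{Encoding: encoding, Data: data})
}

// unwrapPayload shows how a receiver might read the encoding field before
// choosing a decoder.
func unwrapPayload(b []byte) (envelope, error) {
	var env envelope
	if err := json.Unmarshal(b, &env); err != nil {
		return envelope{}, err
	}
	if env.Encoding == "" {
		return envelope{}, errors.New("no encoding field")
	}
	return env, nil
}
```

Go's json.Marshal escapes <, > and & as \u003c-style sequences by default; any standard JSON decoder on the receiving side recovers the original payload.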
This enables agents to exchange messages in custom encodings
(e.g. Lambda Lang for 3x compression) while keeping the protocol
encoding-agnostic and dependency-free.
Ref: TeoSlayer#5
Use int64 arithmetic for size computation before allocating the file frame payload buffer. This prevents a potential integer overflow when len(name) + len(Payload) exceeds math.MaxInt on 32-bit platforms. Fixes CodeQL high-severity alert: 'Size computation for allocation may overflow'
TeoSlayer pushed a commit that referenced this pull request on May 3, 2026
Symptom: a peer's daemon dies (port closed, host unreachable, network
removed). Their kernel sends ICMP unreachable in response to our UDP
packets; the Linux kernel queues this on our socket, and the next
WriteToUDP returns ECONNREFUSED (or EHOSTUNREACH / ENETUNREACH for
host- or network-level outages). We currently propagate the error from
writeFrame to the caller and take NO semantic action: the peer stays in tm.peers,
the keepalive loop (iter 7) keeps emitting pings, the blackhole
heuristic (iter 3) takes 24+ s to flip to relay, and any in-flight
Connection's retransmit timer wastes its budget on a dead path.
Real-world impact: peer crashes mid-transfer → our daemon spends ~24 s
pretending the peer is still up; Connection retries burn through their
budget before the relay flip happens.
Lands in this RED commit:
- sendErrCount map[uint32]int field on TunnelManager, init in
NewTunnelManager, cleared in RemovePeer
- sendErrThreshold = 3 constant (matches blackhole-miss / direct-clear
thresholds for symmetry)
- handleSendError(nodeID, err) STUB that does nothing — body lands
in GREEN
Platform note: macOS does NOT surface ICMP errors on unconnected UDP
sockets without IP_RECVERR (Linux-only). The integration path is
Linux-only in production. Tests exercise handleSendError directly with
synthesized ECONNREFUSED — portable across platforms.
Tests:
- TestICMPUnreachableIgnoredByWriteFrame: pins that the stub does nothing.
5 calls with synthesized net.OpError{Err: ECONNREFUSED} leave both
sendErrCount and relayPeers at zero. In GREEN this flips: 3 calls
flip the peer to relay.
- TestSendErrCountClearedOnPeerRemoval: RemovePeer wipes the entry,
preventing stale counts from triggering phantom flips on re-admission.
TeoSlayer pushed a commit that referenced this pull request on May 3, 2026
Bug fixed: handleSendError now classifies UDP-write errors and tracks
per-peer ICMP-unreachable count. On reaching sendErrThreshold (3)
consecutive matching errors, the peer is flipped to relay mode
immediately — recovery happens in seconds instead of waiting ~24 s
for the lastDirectRecv-based blackhole heuristic. recordInboundDecrypt
clears the counter on any inbound success (peer recovered → forget
the past errors, don't re-flip).
Implementation (pkg/daemon/tunnel.go):
    func (tm *TunnelManager) handleSendError(nodeID uint32, err error) {
        if !isICMPUnreachable(err) {
            return
        }
        tm.mu.Lock()
        tm.sendErrCount[nodeID]++
        count := tm.sendErrCount[nodeID]
        flipped := false
        if count >= sendErrThreshold && !tm.relayPeers[nodeID] {
            if len(tm.relayPeers) < maxRelayPeers {
                tm.relayPeers[nodeID] = true
                tm.sendErrCount[nodeID] = 0
                flipped = true
            }
        }
        tm.mu.Unlock()
        if flipped {
            slog.Info("direct path ICMP-unreachable, flipping to relay", ...)
        }
    }

    func isICMPUnreachable(err error) bool {
        return errors.Is(err, syscall.ECONNREFUSED) ||
            errors.Is(err, syscall.EHOSTUNREACH) ||
            errors.Is(err, syscall.ENETUNREACH)
    }
writeFrame's two error paths (relay-wrapped + direct) now call
handleSendError on UDP write failure. recordInboundDecrypt clears
sendErrCount on every successful inbound decrypt.
Why these three errors: ECONNREFUSED = peer's stack sent ICMP port
unreachable (process down or port closed). EHOSTUNREACH = router
sent ICMP host unreachable (host down or unreachable). ENETUNREACH =
router sent ICMP net unreachable (network partition). All three are
"reachably-dead" signals from the network. Other errors (EAGAIN,
generic write failures) are ignored.
Why the maxRelayPeers cap is preserved: prevents an attacker who
can spoof ICMP unreachables from exhausting our relay-peer slot
budget.
Tests:
- TestICMPUnreachableIgnoredByWriteFrame: feed 3 synthesized
ECONNREFUSED → 3rd flips peer to relay, counter resets to 0.
- TestNonICMPErrorIgnored: 5 generic write errors → no count
increment, no flip.
- TestRecordInboundDecryptClearsSendErrCount: pre-existing count
of 2 wiped after a single inbound decrypt.
- TestSendErrCountClearedOnPeerRemoval (RED): RemovePeer wipes count.
Race-clean across full pkg/daemon (63s). 12 commits ahead of v1.9.0,
8 bug categories closed.
Summary
Adds a --encoding flag to pilotctl send-message that wraps the payload in a JSON envelope before sending. No wire protocol changes: the data is sent as a standard TypeJSON frame.

Usage
The receiver can inspect the encoding field to decode the payload with the appropriate decoder (e.g. lambda-go).

Design
As discussed in #5, this is purely client-side convenience:
- --encoding flag wraps --data in {"encoding":"<name>","data":"..."} and forces --type json
- … encoding field when set

This pairs with pilotctl set-tags lambda-lang for tag-based discovery of Lambda-capable agents.

Changes
- cmd/pilotctl/main.go: Add --encoding flag to cmdSendMessage, update usage string and context command

Testing
- go build ./cmd/pilotctl/ ✅ compiles cleanly
- … --encoding is not set

Ref: #5
Standalone encoder library: github.com/voidborne-d/lambda-go