
feat(pilotctl): add --encoding flag to send-message #8

Open
voidborne-d wants to merge 19 commits into TeoSlayer:main from voidborne-d:feat/encoding-flag


@voidborne-d

Summary

Adds an --encoding flag to pilotctl send-message that wraps the payload in a JSON envelope before sending. There are no wire protocol changes: the data is sent as a standard TypeJSON frame.

Usage

pilotctl send-message target --data "?Uk/co" --encoding lambda
# Sends TypeJSON: {"encoding":"lambda","data":"?Uk/co"}

The receiver can inspect the encoding field to decode the payload with the appropriate decoder (e.g. lambda-go).
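A receiver-side sketch of that dispatch follows. It is illustrative only: the `Envelope` type, the `decoders` registry, and the placeholder "lambda" decoder are hypothetical names, and a real receiver would call into an actual decoder such as lambda-go instead of the stand-in function used here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope mirrors the JSON object shown in the usage example.
type Envelope struct {
	Encoding string `json:"encoding"`
	Data     string `json:"data"`
}

// decoders is a hypothetical registry keyed by encoding name.
// The "lambda" entry is a placeholder, not the real lambda-go API.
var decoders = map[string]func(string) (string, error){
	"lambda": func(s string) (string, error) { return "decoded(" + s + ")", nil },
}

// DecodePayload parses a TypeJSON payload and dispatches on its encoding field.
// Payloads without an encoding field are returned unchanged.
func DecodePayload(raw []byte) (string, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", fmt.Errorf("parse envelope: %w", err)
	}
	if env.Encoding == "" {
		return string(raw), nil
	}
	dec, ok := decoders[env.Encoding]
	if !ok {
		return "", fmt.Errorf("unknown encoding %q", env.Encoding)
	}
	return dec(env.Data)
}

func main() {
	out, err := DecodePayload([]byte(`{"encoding":"lambda","data":"?Uk/co"}`))
	fmt.Println(out, err)
}
```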

Design

As discussed in #5, this is purely client-side convenience:

  • No new frame types
  • No new dependencies
  • No wire protocol changes
  • The --encoding flag wraps --data in {"encoding":"<name>","data":"..."} and forces --type json
  • Output includes encoding field when set

This pairs with pilotctl set-tags lambda-lang for tag-based discovery of Lambda-capable agents.
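The wrapping step described in the Design list can be sketched as below. This is a minimal illustration rather than the code in cmd/pilotctl/main.go; `wrapWithEncoding` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wrapWithEncoding builds the {"encoding":..., "data":...} envelope.
// Hypothetical helper; the PR's actual implementation may differ.
func wrapWithEncoding(encoding, data string) ([]byte, error) {
	env := struct {
		Encoding string `json:"encoding"`
		Data     string `json:"data"`
	}{Encoding: encoding, Data: data}
	// json.Marshal emits struct fields in declaration order.
	return json.Marshal(env)
}

func main() {
	b, err := wrapWithEncoding("lambda", "?Uk/co")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(b)) // prints {"encoding":"lambda","data":"?Uk/co"}
}
```

Using json.Marshal instead of string concatenation keeps the envelope valid when the payload contains quotes or backslashes.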

Changes

  • cmd/pilotctl/main.go: Add --encoding flag to cmdSendMessage, update usage string and context command

Testing

  • go build ./cmd/pilotctl/ ✅ compiles cleanly
  • Flag is optional — existing behavior unchanged when --encoding is not set

Ref: #5


Standalone encoder library: github.com/voidborne-d/lambda-go

TeoSlayer and others added 18 commits February 9, 2026 14:18
* Add install section to agent skills documentation

Agents and users reading SKILLS.md had no install instructions.
Adding the section makes the doc self-contained for onboarding.

* Extract named constants and eliminate magic numbers

Hardcoded numeric literals scattered across the daemon package made
tuning values difficult to find and reason about. This centralizes
them into named constants with clear documentation:

- Beacon message types (protocol/header.go): single source of truth
  used by beacon server, tunnel manager, and tests
- Dial/retransmission constants: retry counts, RTO bounds, intervals
- RTO parameters (RFC 6298): clock granularity, min/max clamp values
- Zero-window probe bounds, accept queue, send buffer capacities
- Handshake timing: replay reaper interval, recv timeout, close delay
- ConnState.String() method replaces inline switch in ConnectionList
- Heartbeat loop uses config keepalive interval instead of hardcoded 30s
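For readers unfamiliar with the RTO parameters named above, here is a sketch of the RFC 6298 estimator with named constants. The constant values and the `rtoEstimator` type are assumptions for illustration; the daemon's actual names and values are not shown in this PR.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical tuning values, not the daemon's real constants.
const (
	clockGranularity = 10 * time.Millisecond
	initialRTO       = 1 * time.Second // RFC 6298 initial value
	minRTO           = 200 * time.Millisecond
	maxRTO           = 60 * time.Second
)

// rtoEstimator holds the smoothed RTT state from RFC 6298, section 2.
type rtoEstimator struct {
	srtt, rttvar time.Duration
	seeded       bool
}

// Sample folds one RTT measurement into the estimator.
func (e *rtoEstimator) Sample(r time.Duration) {
	if !e.seeded {
		e.srtt, e.rttvar, e.seeded = r, r/2, true
		return
	}
	diff := e.srtt - r
	if diff < 0 {
		diff = -diff
	}
	e.rttvar = (3*e.rttvar + diff) / 4 // beta = 1/4, uses the old SRTT
	e.srtt = (7*e.srtt + r) / 8        // alpha = 1/8
}

// RTO returns SRTT + max(G, 4*RTTVAR), clamped to [minRTO, maxRTO].
func (e *rtoEstimator) RTO() time.Duration {
	if !e.seeded {
		return initialRTO
	}
	v := 4 * e.rttvar
	if v < clockGranularity {
		v = clockGranularity
	}
	rto := e.srtt + v
	if rto < minRTO {
		rto = minRTO
	}
	if rto > maxRTO {
		rto = maxRTO
	}
	return rto
}

func main() {
	var e rtoEstimator
	for _, ms := range []int{100, 100, 400} {
		e.Sample(time.Duration(ms) * time.Millisecond)
		fmt.Println(e.RTO())
	}
}
```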

* Improve maintainability with sentinel errors, deduplication, and cleanup

Define shared sentinel errors (ErrNodeNotFound, ErrNetworkNotFound,
ErrConnClosed, ErrConnRefused, ErrDialTimeout, ErrChecksumMismatch) in
protocol/header.go and replace ~25 ad-hoc fmt.Errorf strings across
registry, daemon, driver, and beacon — callers can now use errors.Is().
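The benefit for callers can be illustrated with a short sketch. Only the sentinel name ErrNodeNotFound comes from the commit message; its error text, the `lookup` function, and the `describe` helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNodeNotFound is one of the sentinels named above; the message text is assumed.
var ErrNodeNotFound = errors.New("node not found")

// lookup is a hypothetical registry call that wraps the sentinel with context.
func lookup(nodes map[uint32]string, id uint32) (string, error) {
	name, ok := nodes[id]
	if !ok {
		return "", fmt.Errorf("lookup node %d: %w", id, ErrNodeNotFound)
	}
	return name, nil
}

// describe shows how a caller branches on the sentinel instead of matching strings.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrNodeNotFound):
		return "not found"
	default:
		return "other"
	}
}

func main() {
	_, err := lookup(map[uint32]string{1: "alpha"}, 2)
	fmt.Println(describe(err), "-", err)
}
```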

Extract startRecvPusher() in daemon/ipc.go to deduplicate the recv-push
goroutine that was copy-pasted between CmdDial and CmdAccept handlers.
Add jsonRPC() helper and named sub-command constants in driver/driver.go,
reducing 10 methods from 8-12 lines each to 1-5 lines.

Fix silent error ignoring in dashboard (w.Write), gateway (io.Copy),
and nameserver (conn.Write) with explicit discards or error logging.
Convert 7 short mutex patterns to use defer across daemon, driver,
and replication packages. Remove redundant replication count() method.

Add overflow guard comment in packet.go and document InsecureSkipVerify
cert-pinning pattern in registry/client.go.

---------

Co-authored-by: Teodor Calin <teodor@vulturelabs.io>
Trust pairs and handshake inboxes were being destroyed when nodes
disconnected (reap) or deregistered, causing permanent loss of
identity-to-identity trust relationships. This fixes four issues:

- cleanupNode no longer deletes trust pairs or handshake inboxes
- reapStaleNodes preserves ownerIdx for re-registration reclaim
- snapshotJSON includes trust pairs and handshake data for replication
- reRegister re-syncs local trust pairs to registry after reconnect
When pilotctl is already present in ~/.pilot/bin, the script now
skips config, service, and PATH setup — only replaces binaries
and exits with a restart hint. Also removes broken platform-suffix
renames from the download path.
parseAddrOrHostname now tries interpreting the argument as a numeric
node ID (mapping to backbone address 0:0000.XXXX.XXXX) before falling
back to hostname resolution. This fixes send-message, ping, connect,
and other commands that previously only accepted full addresses or
hostnames.
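A rough sketch of the numeric branch is shown below. It assumes the XXXX groups are the upper and lower 16 bits of a 32-bit node ID rendered as uppercase hex; the commit message does not spell out the digit format, so treat that as a guess. `backboneAddr` is a hypothetical name, not the function in the repository.

```go
package main

import (
	"fmt"
	"strconv"
)

// backboneAddr maps a decimal node ID to a backbone address string.
// Assumption: the address is 0:0000.<high 16 bits>.<low 16 bits> in hex.
// It returns false when arg is not a valid 32-bit unsigned integer,
// in which case a caller would fall back to hostname resolution.
func backboneAddr(arg string) (string, bool) {
	id, err := strconv.ParseUint(arg, 10, 32)
	if err != nil {
		return "", false
	}
	return fmt.Sprintf("0:0000.%04X.%04X", id>>16, id&0xFFFF), true
}

func main() {
	for _, arg := range []string{"1", "65537", "my-host"} {
		addr, ok := backboneAddr(arg)
		fmt.Println(arg, addr, ok)
	}
}
```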
…ening

Webhooks & event system:
- Daemon emits real-time events via HTTP POST to configurable endpoint
- Events: conn.syn_received, conn.established, conn.fin, conn.rst,
  message.received, handshake.pending, handshake.approved,
  handshake.auto_approved, trust.revoked_by_peer, tunnel.peer_added,
  node.registered, node.deregistered, security.syn_rate_limited
- Async delivery with buffered channel, non-blocking, graceful shutdown
- Runtime hot-swap via IPC (set-webhook / clear-webhook)
- CLI: pilotctl daemon start --webhook <url>, pilotctl set-webhook/clear-webhook
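The "buffered channel, non-blocking, graceful shutdown" delivery pattern can be sketched as follows. The `Emitter` type and the `deliver` callback are hypothetical; in the daemon the callback would presumably perform the HTTP POST, which is left out here to keep the example self-contained.

```go
package main

import "fmt"

// Event is a minimal stand-in for a daemon event.
type Event struct {
	Type string
}

// Emitter queues events on a buffered channel and delivers them on one goroutine.
type Emitter struct {
	queue chan Event
	done  chan struct{}
}

// NewEmitter starts the delivery goroutine. deliver is called once per event.
func NewEmitter(capacity int, deliver func(Event)) *Emitter {
	e := &Emitter{queue: make(chan Event, capacity), done: make(chan struct{})}
	go func() {
		defer close(e.done)
		for ev := range e.queue {
			deliver(ev)
		}
	}()
	return e
}

// Emit never blocks: when the buffer is full the event is dropped and false is returned.
func (e *Emitter) Emit(ev Event) bool {
	select {
	case e.queue <- ev:
		return true
	default:
		return false
	}
}

// Close drains already-queued events, then returns. Emit must not be called afterwards.
func (e *Emitter) Close() {
	close(e.queue)
	<-e.done
}

func main() {
	em := NewEmitter(16, func(ev Event) { fmt.Println("deliver:", ev.Type) })
	em.Emit(Event{Type: "conn.established"})
	em.Emit(Event{Type: "message.received"})
	em.Close()
}
```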

Node tags:
- Capability tags for node discovery (e.g. "webserver", "analytics")
- Validated format (lowercase alphanumeric + hyphens, 1-32 chars, max 8)
- CLI: pilotctl set-tags <tag1> [tag2] ..., pilotctl clear-tags
- Tags visible in dashboard node list with filtering
- Persisted in registry snapshots
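The format rule above might be checked along these lines. This sketch assumes that "max 8" means at most eight tags per node, which the commit message does not state explicitly; `validateTags` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxTags assumes "max 8" refers to the number of tags per node.
const maxTags = 8

// tagPattern: lowercase alphanumerics and hyphens, 1 to 32 characters.
var tagPattern = regexp.MustCompile(`^[a-z0-9-]{1,32}$`)

// validateTags returns an error describing the first rule that is violated.
func validateTags(tags []string) error {
	if len(tags) > maxTags {
		return fmt.Errorf("too many tags: %d (max %d)", len(tags), maxTags)
	}
	for _, tag := range tags {
		if !tagPattern.MatchString(tag) {
			return fmt.Errorf("invalid tag %q", tag)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateTags([]string{"webserver", "lambda-lang"}))
	fmt.Println(validateTags([]string{"WebServer"}))
}
```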

Beacon relay performance:
- Worker pool architecture: single reader goroutine + N workers (NumCPU)
- Buffered relay dispatch channel (4096 capacity) for backpressure
- sync.Pool for payload buffers, per-worker pre-allocated send buffers
- Read-only relay path (RLock only, no write lock contention)
- 4MB UDP receive buffer for burst absorption

Registry persistence hardening:
- pubKeyIdx persisted in snapshots (survives node reap + restart cycles)
- Debounced save: async signal + background flush at most once per second
- Compact JSON serialization (no indent) reduces write amplification
- Known-key re-registrations bypass rate limiter for fast reconnection
- Close() guarantees final flush before returning
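A sketch of the debounced-save idea (async signal, periodic background flush, final flush on Close) follows. The `Saver` type and its method names are hypothetical and only illustrate the pattern, not the registry's code.

```go
package main

import (
	"fmt"
	"time"
)

// Saver coalesces many dirty signals into at most one flush per interval.
type Saver struct {
	dirty chan struct{}
	stop  chan struct{}
	done  chan struct{}
}

// NewSaver starts the background flusher. flush runs only on that goroutine.
func NewSaver(interval time.Duration, flush func()) *Saver {
	s := &Saver{
		dirty: make(chan struct{}, 1), // capacity 1: extra signals coalesce
		stop:  make(chan struct{}),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(s.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		pending := false
		for {
			select {
			case <-s.dirty:
				pending = true
			case <-ticker.C:
				if pending {
					flush()
					pending = false
				}
			case <-s.stop:
				// Pick up a signal that arrived just before shutdown.
				select {
				case <-s.dirty:
					pending = true
				default:
				}
				if pending {
					flush() // final flush before Close returns
				}
				return
			}
		}
	}()
	return s
}

// MarkDirty never blocks; a signal that is already queued is enough.
func (s *Saver) MarkDirty() {
	select {
	case s.dirty <- struct{}{}:
	default:
	}
}

// Close stops the flusher after a final flush of any pending change.
func (s *Saver) Close() {
	close(s.stop)
	<-s.done
}

func main() {
	flushes := 0
	s := NewSaver(time.Second, func() { flushes++ })
	for i := 0; i < 100; i++ {
		s.MarkDirty()
	}
	s.Close()
	fmt.Println("flushes:", flushes)
}
```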

Dashboard improvements:
- pprof endpoints (/debug/pprof/) for live CPU/memory profiling
- Tag display and filtering in node table
- Pagination for large node lists
- Total requests stat card, responsive grid layout

Logging:
- Demoted noisy registration/key-rotation logs from INFO to DEBUG
- Demoted beacon relay-not-found from WARN to DEBUG

CI:
- CodeQL static analysis (Go) on push/PR to main + weekly schedule
- Add full documentation site (14 pages + CSS) at web/docs/
- Add Polo dashboard page and sidebar navigation
- Replace dashboard problem/solution diagrams with interactive
  force-directed trust graph (canvas, pan/zoom, fullscreen)
- Add trust edges to /api/stats endpoint
- Update SKILLS.md with webhook and tag commands
- Update registry/beacon IP from 35.193.106.76 to 34.71.57.205
  across all source, configs, docs, and install script
Beacon instances can now form a peer mesh for horizontal scaling behind
a load balancer. Each beacon gossips its local node list to peers every
10s (BeaconMsgSync 0x07). The relay worker uses 3-tier routing: local
nodes first, then peer beacons, then drop. A /healthz HTTP endpoint
supports GCP health checks for managed instance groups.

Registry gains beacon_register/beacon_list for dynamic peer discovery
with 60s TTL auto-reap. cmd/beacon gets --beacon-id, --peers, --health
flags for standalone clustered deployments.
Beacons register with the registry and discover peers every 30s.
Uses 4-byte length-prefix framing matching the registry wire protocol.
Gossip loop runs unconditionally to support dynamically added peers.
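For reference, 4-byte length-prefix framing can be sketched as below. The big-endian byte order and the `maxFrame` cap are assumptions, since the commit message only states the prefix width; the function names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"math"
)

// maxFrame is a hypothetical cap that protects the reader from huge allocations.
const maxFrame = 1 << 20

// writeLengthPrefixed writes a 4-byte big-endian length followed by the payload.
func writeLengthPrefixed(w io.Writer, payload []byte) error {
	if int64(len(payload)) > math.MaxUint32 {
		return fmt.Errorf("payload too large: %d bytes", len(payload))
	}
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readLengthPrefixed reads one frame, rejecting lengths above limit.
func readLengthPrefixed(r io.Reader, limit uint32) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > limit {
		return nil, fmt.Errorf("frame of %d bytes exceeds limit %d", n, limit)
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// roundTrip writes one frame into memory and reads it back.
func roundTrip(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	if err := writeLengthPrefixed(&buf, payload); err != nil {
		return nil, err
	}
	return readLengthPrefixed(&buf, maxFrame)
}

func main() {
	got, err := roundTrip([]byte(`{"type":"beacon_list"}`))
	fmt.Println(string(got), err)
}
```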
- LaTeX preprint analyzing trust graph topology, capability specialization,
  and network formation patterns from live registry metadata
- Research page on docs site with key findings and PDF download
- Research nav section added to all docs pages and homepage
- SKILLS.md: network growth encouragement section
- Remove trust graph visualization from Polo dashboard (canvas, force
  simulation, fullscreen, tooltip — ~170 lines of CSS/HTML/JS)
- Add self-hosted SVG badge endpoints: /api/badge/{nodes,trust,requests,tags}
  with formatted counts and color coding
- Add live Network stats section to pilotprotocol.network website with
  3 stat cards fetched from the registry API
- Add 4 dynamic badges to README, remove trust graph from Polo description
/docs/media/pilot.png -- 767.18kb -> 753.48kb (1.79%)

Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>
Co-authored-by: ImgBotApp <ImgBotHelp@gmail.com>
…eadiness

Full-stack implementation following the SetVisibility pattern:
- NodeInfo.TaskExec bool field with snapshot persistence
- Registry handler (set_task_exec) with signature verification
- Registry client, daemon IPC (0x1D/0x1E), driver API
- CLI: pilotctl enable-tasks / disable-tasks
- Dashboard: stat card, table column, filter toggle, badge endpoint
- Website: Task Executors stat card with live data
- 7 tests covering basic, toggle, auth, persistence, dashboard, lookup, IPC
Enables go get, Go proxy resolution, and Go Report Card compatibility.
* init: tweak irrelevant stuff

* test: add e2e testing, coverage and pre-commits

* feat: add karma implementation

* feat: implement task management

* feat: implement task submit service

---------

Co-authored-by: Alex Godoroja <alex@vulturelabs.io>
- Fix web4/ → github.com/TeoSlayer/pilotprotocol/ imports in tasksubmit
  package and tests (left over from module rename)
- Restrict /metrics and /debug/pprof/ endpoints to localhost only
- Fix cumulative histogram bucket counting in Prometheus metrics
- Update Makefile release target to include all 7 binaries with correct
  naming convention (matching v1.2.1 structure)
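On the histogram item above: Prometheus buckets are cumulative, so each `le` bucket must count every observation at or below its bound. The commit does not show the actual fix, so the following is only a sketch of one correct approach, with a hypothetical `Histogram` type that stores per-bucket counts and accumulates them at export time.

```go
package main

import (
	"fmt"
	"sort"
)

// Histogram stores non-cumulative counts; the last slot is the +Inf bucket.
type Histogram struct {
	bounds []float64 // sorted upper bounds (the "le" labels)
	counts []uint64  // len(bounds)+1 entries
}

// NewHistogram copies and sorts the bounds.
func NewHistogram(bounds ...float64) *Histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &Histogram{bounds: b, counts: make([]uint64, len(b)+1)}
}

// Observe increments the first bucket whose bound is >= v.
func (h *Histogram) Observe(v float64) {
	h.counts[sort.SearchFloat64s(h.bounds, v)]++
}

// Cumulative returns the running totals that Prometheus expects per bucket.
func (h *Histogram) Cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var running uint64
	for i, c := range h.counts {
		running += c
		out[i] = running
	}
	return out
}

func main() {
	h := NewHistogram(0.1, 0.5, 1)
	for _, v := range []float64{0.05, 0.3, 0.5, 2} {
		h.Observe(v)
	}
	fmt.Println(h.Cumulative()) // prints [1 3 3 4]
}
```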
Add client-side --encoding flag that wraps the payload in a JSON
envelope before sending. No wire protocol changes — the data is
sent as a standard TypeJSON frame.

Usage:
  pilotctl send-message target --data '?Uk/co' --encoding lambda
  # Sends TypeJSON: {"encoding":"lambda","data":"?Uk/co"}

This enables agents to exchange messages in custom encodings
(e.g. Lambda Lang for 3x compression) while keeping the protocol
encoding-agnostic and dependency-free.

Ref: TeoSlayer#5
@TeoSlayer TeoSlayer self-requested a review March 11, 2026 13:03
@TeoSlayer TeoSlayer self-assigned this Mar 13, 2026
@TeoSlayer TeoSlayer added the enhancement New feature or request label Mar 13, 2026
TeoSlayer previously approved these changes Mar 13, 2026
Owner

@TeoSlayer TeoSlayer left a comment


OK

Use int64 arithmetic for size computation before allocating the
file frame payload buffer. This prevents a potential integer overflow
when len(name) + len(Payload) exceeds math.MaxInt on 32-bit platforms.

Fixes CodeQL high-severity alert: 'Size computation for allocation may overflow'
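The guard described above might look like the following sketch. `framePayloadSize` is a hypothetical helper, and the extra negative check is an addition that also catches wraparound on 64-bit platforms, where int64 is no wider than int.

```go
package main

import (
	"fmt"
	"math"
)

// framePayloadSize adds the two lengths in int64 and rejects sums that
// would not fit in an int on the current platform.
func framePayloadSize(nameLen, payloadLen int) (int, error) {
	if nameLen < 0 || payloadLen < 0 {
		return 0, fmt.Errorf("negative length")
	}
	total := int64(nameLen) + int64(payloadLen)
	// total < 0 catches int64 wraparound; the second check matters on 32-bit.
	if total < 0 || total > int64(math.MaxInt) {
		return 0, fmt.Errorf("frame size overflows int")
	}
	return int(total), nil
}

func main() {
	n, err := framePayloadSize(len("report.txt"), 4096)
	fmt.Println(n, err)
	_, err = framePayloadSize(math.MaxInt, 1)
	fmt.Println(err)
}
```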
@TeoSlayer TeoSlayer requested a review from Alexgodoroja as a code owner April 30, 2026 01:36
TeoSlayer pushed a commit that referenced this pull request May 3, 2026
Symptom: a peer's daemon dies (port closed, host unreachable, network
removed). Their kernel sends ICMP unreachable in response to our UDP
packets; the Linux kernel queues this on our socket, and the next
WriteToUDP returns ECONNREFUSED (or EHOSTUNREACH / ENETUNREACH for
host-level outages). We currently propagate the error from writeFrame
to the caller but take no semantic action: the peer stays in tm.peers,
the keepalive loop (iter 7) keeps emitting pings, the blackhole
heuristic (iter 3) takes 24+ s to flip to relay, and any in-flight
Connection's retransmit timer wastes its budget on a dead path.

Real-world impact: peer crashes mid-transfer → our daemon spends ~24 s
pretending the peer is still up; Connection retries burn through their
budget before relay flip happens.

Lands in this RED commit:
  - sendErrCount map[uint32]int field on TunnelManager, init in
    NewTunnelManager, cleared in RemovePeer
  - sendErrThreshold = 3 constant (matches blackhole-miss / direct-clear
    thresholds for symmetry)
  - handleSendError(nodeID, err) STUB that does nothing — body lands
    in GREEN

Platform note: macOS does NOT surface ICMP errors on unconnected UDP
sockets without IP_RECVERR (Linux-only). The integration path is
Linux-only in production. Tests exercise handleSendError directly with
synthesized ECONNREFUSED — portable across platforms.

Tests:
  - TestICMPUnreachableIgnoredByWriteFrame: pin stub does nothing.
    5 calls with synthesized net.OpError{Err: ECONNREFUSED} → both
    sendErrCount and relayPeers stay zero. GREEN flips: 3 calls
    flip the peer to relay.
  - TestSendErrCountClearedOnPeerRemoval: RemovePeer wipes the entry,
    preventing stale counts from triggering phantom flips on re-admission.
TeoSlayer pushed a commit that referenced this pull request May 3, 2026
Bug fixed: handleSendError now classifies UDP-write errors and tracks
per-peer ICMP-unreachable count. On reaching sendErrThreshold (3)
consecutive matching errors, the peer is flipped to relay mode
immediately — recovery happens in seconds instead of waiting ~24 s
for the lastDirectRecv-based blackhole heuristic. recordInboundDecrypt
clears the counter on any inbound success (peer recovered → forget
the past errors, don't re-flip).

Implementation (pkg/daemon/tunnel.go):

  func (tm *TunnelManager) handleSendError(nodeID uint32, err error) {
      if !isICMPUnreachable(err) { return }
      tm.mu.Lock()
      tm.sendErrCount[nodeID]++
      count := tm.sendErrCount[nodeID]
      flipped := false
      if count >= sendErrThreshold && !tm.relayPeers[nodeID] {
          if len(tm.relayPeers) < maxRelayPeers {
              tm.relayPeers[nodeID] = true
              tm.sendErrCount[nodeID] = 0
              flipped = true
          }
      }
      tm.mu.Unlock()
      if flipped { slog.Info("direct path ICMP-unreachable, flipping to relay", ...) }
  }

  func isICMPUnreachable(err error) bool {
      return errors.Is(err, syscall.ECONNREFUSED) ||
             errors.Is(err, syscall.EHOSTUNREACH) ||
             errors.Is(err, syscall.ENETUNREACH)
  }

writeFrame's two error paths (relay-wrapped + direct) now call
handleSendError on UDP write failure. recordInboundDecrypt clears
sendErrCount on every successful inbound decrypt.

Why these three errors: ECONNREFUSED = peer's stack sent ICMP port
unreachable (process down or port closed). EHOSTUNREACH = router
sent ICMP host unreachable (host down or unreachable). ENETUNREACH =
router sent ICMP net unreachable (network partition). All three are
"reachably-dead" signals from the network. Other errors (EAGAIN,
generic write failures) are ignored.
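The classification relies on errors.Is unwrapping the error chain a UDP write typically returns. The sketch below reuses isICMPUnreachable from the commit message; `synthWriteError` and the two sample variables are hypothetical, and the `*net.OpError` wrapping `*os.SyscallError` shape is an assumption about how the tests synthesize errors.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// isICMPUnreachable is copied from the commit message above.
func isICMPUnreachable(err error) bool {
	return errors.Is(err, syscall.ECONNREFUSED) ||
		errors.Is(err, syscall.EHOSTUNREACH) ||
		errors.Is(err, syscall.ENETUNREACH)
}

// synthWriteError builds an error shaped like a failed UDP write:
// *net.OpError wrapping *os.SyscallError wrapping the errno.
func synthWriteError(errno syscall.Errno) error {
	return &net.OpError{Op: "write", Net: "udp", Err: os.NewSyscallError("sendto", errno)}
}

// Sample errors used in main.
var (
	errRefused = synthWriteError(syscall.ECONNREFUSED)
	errAgain   = synthWriteError(syscall.EAGAIN)
)

func main() {
	fmt.Println(isICMPUnreachable(errRefused)) // true: counts toward the threshold
	fmt.Println(isICMPUnreachable(errAgain))   // false: ignored
}
```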

Why the maxRelayPeers cap is preserved: prevents an attacker who
can spoof ICMP unreachables from exhausting our relay-peer slot
budget.

Tests:
  - TestICMPUnreachableIgnoredByWriteFrame: feed 3 synthesized
    ECONNREFUSED → 3rd flips peer to relay, counter resets to 0.
  - TestNonICMPErrorIgnored: 5 generic write errors → no count
    increment, no flip.
  - TestRecordInboundDecryptClearsSendErrCount: pre-existing count
    of 2 wiped after a single inbound decrypt.
  - TestSendErrCountClearedOnPeerRemoval (RED): RemovePeer wipes count.

Race-clean across full pkg/daemon (63s). 12 commits ahead of v1.9.0,
8 bug categories closed.

4 participants