feat(igc): add on-demand netdump observability#685

Open
ytakano wants to merge 3 commits into tier4:main from ytakano:igc_observability2

Collaborator

@ytakano ytakano commented Apr 27, 2026

Description

This PR adds on-demand IGC observability without changing datapath behavior.

It introduces a default NetDevice::debug_dump() hook, adds awkernel_lib::net::debug_dump_interface(), and wires the shell netdump(interface_id) command to the IGC driver’s existing register and ring dump logic.

Related links

How was this PR tested?

Notes for reviewers

This PR is intentionally limited to read-only observability. It does not change queueing, DMA, interrupt handling, or RX buffering behavior.

The shell addition is only netdump(interface_id). Write-path helpers such as add_ipv4, arping4, and set_gateway4 are not included here.

@ytakano ytakano requested a review from Copilot April 27, 2026 03:01
@ytakano ytakano requested a review from atsushi421 April 27, 2026 03:03
@ytakano ytakano marked this pull request as ready for review April 27, 2026 03:03
@ytakano ytakano requested a review from nokosaaan April 27, 2026 03:03

Copilot AI left a comment

Pull request overview

Adds an on-demand network device “netdump” observability path by introducing a NetDevice::debug_dump() hook, exposing it via awkernel_lib::net::debug_dump_interface(), and wiring it into the Awkernel shell as (netdump interface_id).

Changes:

  • Add a default debug_dump() method to the NetDevice trait.
  • Add awkernel_lib::net::debug_dump_interface(interface_id) to trigger a device dump for a specific interface.
  • Add a netdump shell command and FFI plumbing (with new bigint conversion deps) to invoke the interface dump.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| awkernel_lib/src/net/net_device.rs | Introduces a default NetDevice::debug_dump() hook. |
| awkernel_lib/src/net.rs | Adds debug_dump_interface() entry point in the net manager layer. |
| awkernel_drivers/src/pcie/intel/igc.rs | Wires Igc’s NetDevice::debug_dump() to existing inner dump logic. |
| applications/awkernel_shell/src/lib.rs | Adds (netdump interface_id) BLisp export and embedded FFI handler. |
| applications/awkernel_shell/Cargo.toml | Adds num-bigint / num-traits dependencies for the new FFI argument type. |

Comment thread awkernel_lib/src/net.rs Outdated
Comment thread applications/awkernel_shell/Cargo.toml
Clone the selected network device while holding the net manager read lock, then release the lock before invoking the debug dump path.

This keeps potentially slow diagnostic dumping outside the shared manager lock and matches the existing interface operation pattern.
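The lock scoping described in this commit message can be sketched roughly as follows. This is a minimal illustration using std types, not the code from the PR: `NetManager`, `FakeNic`, the `BTreeMap` layout, and a `debug_dump()` that returns a `String` (instead of logging) are all assumptions made so the example is self-contained.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the NetDevice trait; only the dump hook is modeled,
/// and it returns text instead of logging so the result can be inspected.
trait NetDevice: Send + Sync {
    fn debug_dump(&self) -> String;
}

/// Hypothetical device used only for this sketch.
struct FakeNic;

impl NetDevice for FakeNic {
    fn debug_dump(&self) -> String {
        String::from("fake-nic: link up")
    }
}

#[derive(Debug, PartialEq)]
enum NetManagerError {
    InvalidInterfaceID,
}

/// Hypothetical manager layout: interface ID mapped to a shared device handle.
struct NetManager {
    interfaces: BTreeMap<u64, Arc<dyn NetDevice>>,
}

fn debug_dump_interface(
    manager: &RwLock<NetManager>,
    interface_id: u64,
) -> Result<String, NetManagerError> {
    // Hold the read lock only long enough to look up and clone the Arc.
    let device = {
        let guard = manager.read().unwrap();
        guard.interfaces.get(&interface_id).cloned()
    }; // the read guard is dropped at the end of this block

    let device = device.ok_or(NetManagerError::InvalidInterfaceID)?;

    // The potentially slow dump runs with no manager lock held.
    Ok(device.debug_dump())
}
```

Because the guard lives only inside the inner block, a writer that wants the manager lock while a dump is in progress does not have to wait for the dump to finish.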
Comment thread awkernel_drivers/src/pcie/intel/igc.rs Outdated
    }

    fn debug_dump(&self) {
        self.inner.read().dump();
Contributor

debug_dump() holds self.inner.read() for the entire dump, which performs many MMIO register reads, an O(N²) format!("{msg}...") build, and a console-locked log write. Concurrent inner.write() callers (up, down, add_multicast_addr, the LSC/poll-link path in intr) block for that whole time, and on writer-preferring RwLocks subsequent read() callers in the datapath (tick/recv/send/can_send) also stall — which contradicts the PR description's claim of not affecting interrupt/queue behavior.

Consider capturing the small subset of state needed under the lock and releasing it before the format/log work, or exposing the fields actually used (info, hw.mac.addr) for lock-free access so the dump matches the read-only intent.
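One way to read the first suggestion is sketched below: copy a small snapshot while the read guard is held, drop the guard, then build the text. The field names (`mac_addr`, `link_up`, `rx_head`, `rx_tail`) and the `DumpSnapshot` type are hypothetical, MMIO register reads are not modeled, and std's `RwLock` stands in for whatever lock the driver actually uses.

```rust
use std::fmt::Write as _;
use std::sync::RwLock;

/// Hypothetical driver state guarded by the driver's inner lock.
struct IgcInner {
    mac_addr: [u8; 6],
    link_up: bool,
    rx_head: u32,
    rx_tail: u32,
}

/// Small, copyable subset of the state that the dump needs.
#[derive(Clone, Copy)]
struct DumpSnapshot {
    mac_addr: [u8; 6],
    link_up: bool,
    rx_head: u32,
    rx_tail: u32,
}

struct Igc {
    inner: RwLock<IgcInner>,
}

impl Igc {
    /// Copy the fields under the read lock; the guard is dropped on return.
    fn snapshot(&self) -> DumpSnapshot {
        let inner = self.inner.read().unwrap();
        DumpSnapshot {
            mac_addr: inner.mac_addr,
            link_up: inner.link_up,
            rx_head: inner.rx_head,
            rx_tail: inner.rx_tail,
        }
    }

    /// Build the dump text with no lock held.
    fn debug_dump(&self) -> String {
        let snap = self.snapshot();
        let m = snap.mac_addr;

        // Appending to one String avoids rebuilding the whole message
        // on every line, as repeated format!("{msg}...") would.
        let mut out = String::new();
        let _ = writeln!(
            out,
            "mac={:02x}:{:02x}:{:02x}:{:02x}:{:02x}:{:02x}",
            m[0], m[1], m[2], m[3], m[4], m[5]
        );
        let _ = writeln!(out, "link_up={}", snap.link_up);
        let _ = writeln!(out, "rx_head={} rx_tail={}", snap.rx_head, snap.rx_tail);
        out
    }
}
```

In this shape the lock is held for a handful of field copies rather than for formatting and log output, so writers such as link-state updates would wait only for the copy.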

Collaborator Author

Thank you. Fixed.

Comment thread awkernel_lib/src/net/net_device.rs Outdated
}

/// Dump device-specific debug state on demand.
fn debug_dump(&self) {}
Contributor

Defaulting to an empty body means (netdump <id>) silently returns Ok(()) for every NetDevice that hasn't overridden this — VirtioNet, Genet, Igb, Ixgbe, E1000eExample. The shell user sees no output and no indication the operation was unsupported, which is misleading observability behavior.

Consider returning Result<(), NetDevError> with a default of Err(NetDevError::Unsupported) so debug_dump_interface can surface it, or have the default emit a clear "debug_dump not implemented for " log line via device_short_name().
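The Result-based alternative could look roughly like the sketch below. It is illustrative only: `NetDevError::Unsupported` is the variant name proposed in this comment and may not exist in the codebase, the `&'static str` return type of `device_short_name()` is a simplification, and `netdump_message()` is a hypothetical helper standing in for the shell side.

```rust
/// Hypothetical error type; `Unsupported` is the variant proposed in the review.
#[derive(Debug, PartialEq)]
enum NetDevError {
    Unsupported,
}

/// Simplified stand-in for the NetDevice trait.
trait NetDevice {
    /// Assumed signature, simplified for this sketch.
    fn device_short_name(&self) -> &'static str;

    /// Default: report that this driver has no dump implementation.
    fn debug_dump(&self) -> Result<(), NetDevError> {
        Err(NetDevError::Unsupported)
    }
}

/// Relies on the default `debug_dump`.
struct VirtioNet;

impl NetDevice for VirtioNet {
    fn device_short_name(&self) -> &'static str {
        "virtio-net"
    }
}

/// Overrides `debug_dump`; the real register and ring dump is not modeled.
struct Igc;

impl NetDevice for Igc {
    fn device_short_name(&self) -> &'static str {
        "igc"
    }

    fn debug_dump(&self) -> Result<(), NetDevError> {
        Ok(())
    }
}

/// Hypothetical helper: turn the result into a message a shell command could show.
fn netdump_message(dev: &dyn NetDevice) -> String {
    match dev.debug_dump() {
        Ok(()) => format!("{}: dump written", dev.device_short_name()),
        Err(NetDevError::Unsupported) => {
            format!("debug_dump not implemented for {}", dev.device_short_name())
        }
    }
}
```

With this shape, a driver that has not overridden the hook yields an explicit error that the caller can surface, instead of an empty success.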

Collaborator Author

The default now prints a warning when debug_dump is not supported, as follows.

log::warn!("debug_dump not implemented for this device");

Comment thread awkernel_lib/src/net.rs
Ok(if_status)
}

pub fn debug_dump_interface(interface_id: u64) -> Result<(), NetManagerError> {
Contributor

This new public API has no doc comment, while other pub fns in this module (get_interface, up, down, tick_interface, ...) document behavior and error conditions. A caller cannot tell where the dump is emitted (the IGC implementation uses log::debug!, but that is not part of the contract), what error variants can be returned, or whether it is safe to invoke during active TX/RX.

Please add a /// comment specifying the output channel, the returned errors (today only InvalidInterfaceID), and any timing constraints.

Collaborator Author

Added a doc comment as follows.

/// Emit debug state for the interface identified by `interface_id` via `log::debug!`.
///
/// Returns [`NetManagerError::InvalidInterfaceID`] if no interface with that ID exists.
/// The NET_MANAGER read lock is held only to look up and clone the device reference;
/// the device dump runs outside that lock.

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>