Skip to content

[SVLS-8349] Add CPU Enhanced Metrics in Windows Azure Functions#133

Open
kathiehuang wants to merge 7 commits into
mainfrom
kathie.huang/add-windows-cpu-enhanced-metrics
Open

[SVLS-8349] Add CPU Enhanced Metrics in Windows Azure Functions#133
kathiehuang wants to merge 7 commits into
mainfrom
kathie.huang/add-windows-cpu-enhanced-metrics

Conversation

@kathiehuang
Copy link
Copy Markdown
Contributor

@kathiehuang kathiehuang commented May 18, 2026

What does this PR do?

Adds CPU usage enhanced metric azure.functions.enhanced.cpu.usage (nanocores) for Windows Azure Functions. This feature is already available on Linux Azure Functions (see #77).

Motivation

https://datadoghq.atlassian.net/browse/SVLS-8349

Describe how to test/QA your changes

Build with serverless-compat-self-monitoring.

This was deployed with the serverless-compat-self-monitoring pipeline across all runtimes and hosting plans. All hosting plans in Windows were tested to make sure metrics submit correctly.

Testing to compare against Azure Monitor is documented in an internal doc in Enhanced Metrics in the Serverless Compatibility Layer.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Windows support for the Azure Functions enhanced CPU usage metric (azure.functions.enhanced.cpu.usage) by reading cumulative CPU time from the Windows Job Object API and enabling enhanced metrics in Windows Azure Functions.

Changes:

  • Enable DD_ENHANCED_METRICS_ENABLED-controlled enhanced metrics for Azure Functions on Windows (no longer hard-disabled).
  • Implement Windows CPU total collection via QueryInformationJobObject and convert Job Object accounting time to nanoseconds for downstream usage-rate computation.
  • Add the windows crate dependency behind the existing windows-enhanced-metrics feature.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

File Description
crates/datadog-serverless-compat/src/main.rs Enables enhanced metrics in Azure Functions based on DD_ENHANCED_METRICS_ENABLED for all platforms (including Windows).
crates/datadog-metrics-collector/src/azure_windows.rs Implements Job Object CPU time reads for Windows to support enhanced CPU usage metric computation.
crates/datadog-metrics-collector/Cargo.toml Adds optional windows dependency and wires it to windows-enhanced-metrics.
Cargo.lock Locks new windows crate transitive dependencies for Windows builds.
Comments suppressed due to low confidence (1)

crates/datadog-metrics-collector/src/azure_windows.rs:41

  • This adds new Windows-specific CPU collection logic, but there are no unit tests covering the conversion from Job Object 100-ns units to the CpuStats.total nanoseconds value (including overflow/negative handling). Consider extracting the conversion into a pure function and adding a small #[cfg(windows)] test to prevent regressions.
            // TotalUserTime and TotalKernelTime are in 100-nanosecond units - multiply by 100 to get nanoseconds
            let total_ns = (info.TotalUserTime + info.TotalKernelTime) as u64 * 100;
            Some(CpuStats { total: total_ns })
        }

Comment on lines +6 to 10
//! This module provides functionality to read CPU usage from Windows Job Objects.
//!
//! All CPU metrics will be reported in nanocores (1 core = 1,000,000,000 nanocores).
//! All CPU metrics are reported in nanocores (1 core = 1,000,000,000 nanocores).

use crate::azure_cpu::{CpuStats, CpuStatsReader};
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specified that CPU time is stored in nanoseconds and reported in nanocores in 0e97d78

Comment on lines +25 to +37
fn read_cpu_usage_from_job_object() -> Option<CpuStats> {
let mut info = JOBOBJECT_BASIC_ACCOUNTING_INFORMATION::default();
let result = unsafe {
QueryInformationJobObject(
None, // If the handle is None, the current process's job object is used
JobObjectBasicAccountingInformation, // The type of info to retrieve
&mut info as *mut _ as *mut _, // Pointer to the struct that will store the info
std::mem::size_of::<JOBOBJECT_BASIC_ACCOUNTING_INFORMATION>() as u32,
None,
)
};
match result {
Ok(()) => {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match result {
Ok(()) => {
// TotalUserTime and TotalKernelTime are in 100-nanosecond units - multiply by 100 to get nanoseconds
let total_ns = (info.TotalUserTime + info.TotalKernelTime) as u64 * 100;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guarded against overflow by using try_from and checked add/mul in 0e97d78

Comment on lines +42 to +43
Err(_) => {
debug!("Failed to read CPU usage from Job Object");
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logged the error in 0e97d78

@kathiehuang
Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@kathiehuang kathiehuang marked this pull request as ready for review May 18, 2026 20:22
@kathiehuang kathiehuang requested review from a team as code owners May 18, 2026 20:22
@kathiehuang kathiehuang requested review from apiarian-datadog, duncanpharvey and jchrostek-dd and removed request for a team May 18, 2026 20:22
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5cf36972a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

libdd-common = { git = "https://github.com/DataDog/libdatadog", rev = "4ae8ebe252451374c292efd159ce254c3f5a72e0", default-features = false }

[target.'cfg(windows)'.dependencies]
windows = { version = "0.62", features = ["Win32_System_JobObjects"], optional = true }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add Win32_Foundation to the Windows bindings

When building the Windows target with windows-enhanced-metrics, this dependency only enables Win32_System_JobObjects, but the windows bindings gate QueryInformationJobObject/HANDLE behind the Win32_Foundation feature as well. As a result, azure_windows.rs cannot resolve the imported API in the Windows enhanced-metrics build; include Win32_Foundation in this feature list.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Win32_Foundation is already transitively included, defined in the windows crate's Cargo.toml:

Win32_System_JobObjects = ["Win32_System"]
Win32_System = ["Win32"]
Win32 = ["Win32_Foundation"]

The CI build passing confirms this

libdd-common = { git = "https://github.com/DataDog/libdatadog", rev = "4ae8ebe252451374c292efd159ce254c3f5a72e0", default-features = false }

[target.'cfg(windows)'.dependencies]
windows = { version = "0.62", features = ["Win32_System_JobObjects"], optional = true }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like windows pulls in additional dependencies. Are all of these dependencies needed? If you set default-features = false are some or all of the dependencies omitted while still being able to collect cpu stats?

Let's try to minimize the increase of the build artifact as much as possible.

dependencies = [
"windows-collections",
"windows-core",
"windows-future",
"windows-numerics",
]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried setting default-features = false but it didn't change the Cargo.lock - it looks like all four of these are hard dependencies of the windows crate and not features: https://github.com/microsoft/windows-rs/blob/d3a079bfe1bde71bceebfc6628f4c55a044cd501/crates/libs/windows/Cargo.toml#L15

I think if we want to avoid these subcrates we could switch to windows-sys. The code would be slightly more verbose but the compile footprint would be smaller: https://kennykerr.ca/rust-getting-started/windows-or-windows-sys.html

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested windows-sys - output was the same, metrics look right, and binary size went down 11,264 bytes. Updated in c404f84

// SAFETY: `info` is a stack-allocated `JOBOBJECT_BASIC_ACCOUNTING_INFORMATION` initialized via `default()`, so the compiler guarantees its alignment.
// The buffer size argument is `size_of::<JOBOBJECT_BASIC_ACCOUNTING_INFORMATION>()`, which exactly matches `info`, so the API cannot write out of bounds.
// Passing `None` for the job handle is documented to use the current process's job object.
let result = unsafe {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add unit tests for this? do we already have windows test runners in ci? if not, maybe now with "unsafe" in the mix is a good time to add them?

Copy link
Copy Markdown
Contributor Author

@kathiehuang kathiehuang May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we have Windows test runners in CI! I added unit tests for the CPU time conversion logic to check negative values and overflow a75f3b5 but I think the only way I can test the unsafe block would be to just see if the API call works and that read_cpu_usage_from_job_object doesn't return none?

@datadog-datadog-prod-us1-2

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants