Average multi microbenchmarks results #5215
Conversation
…ks when creating suites
Pull request overview
This PR updates the GC microbenchmark infrastructure to support aggregating (averaging) results across multiple microbenchmark runs/iterations, while also renaming/refactoring parts of the analysis/presentation pipeline and introducing an outlier-removal helper.
Changes:
- Add a configurable microbenchmark iteration count (`iterations`) and wire it into suite creation and execution.
- Replace the previous single-result comparison flow with a new per-benchmark aggregation/comparison pipeline (`MicrobenchmarkResultComparison`, `GCTraceMetrics`, `GCTraceMetricComparisonResult`).
- Refactor output generation to primarily emit JSON (markdown generation currently disabled).
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/RunCommand/CreateSuiteCommand.cs | Reads configured iteration count and applies it to microbenchmark suite environment. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/RunCommand/BaseSuite/MicrobenchmarksToRun.txt | Updates baseline suite benchmark list. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/RunCommand/BaseSuite/Microbenchmarks.yaml | Renames environment iteration setting to iterations. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/Microbenchmark/MicrobenchmarkCommand.cs | Runs microbenchmarks for iterations and switches to new aggregation/comparison logic before presenting results. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/Microbenchmark/MicrobenchmarkAnalyzeCommand.cs | Updates analysis-only command to use the new aggregation/comparison logic. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Presentation.cs | Changes presentation API to accept precomputed grouped results; markdown output path currently disabled. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Markdown.cs | Markdown generation code is commented out. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Json/JsonOutput.cs | Removes unused placeholder type. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Json.cs | Moves JSON generator to Microbenchmarks presentation namespace and updates signature for grouped results. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Configurations/Microbenchmarks.Configuration.cs | Renames iteration to iterations in microbenchmark environment configuration. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Configurations/InputConfiguration.cs | Adds iterations map to input configuration. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/Microbenchmarks/MicrobenchmarkResultsAnalyzer.cs | Removes old analyzer/comparison pipeline. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/Microbenchmarks/MicrobenchmarkResultComparison.cs | Adds new JSON/trace mapping, per-benchmark analysis, and aggregation/grouping logic. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/Microbenchmarks/MicrobenchmarkResult.cs | Introduces new MicrobenchmarkResult model (namespace currently mismatched vs usage). |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/Microbenchmarks/MicrobenchmarkComparisonResult.cs | Updates comparison to support averaged values/outlier removal and new trace-metric comparisons. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/GCTraceMetrics.cs | Adds trace-derived metric extraction (includes reflection/stat bugs). |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/GCTraceMetricComparisonResult.cs | Adds averaged comparison for trace metrics (baseline vs comparand). |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/GCTraceMetricComparison.cs | Adds helper wrapper for metric comparison construction. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/BdnJsonResult.cs | Refactors BDN JSON model types; renames top-level to BdnJsonResult. |
| src/benchmarks/gc/GC.Infrastructure/GC.Analysis.API/Statistics.cs | Adds RemoveOutliers helper (IQR method). |
| src/benchmarks/gc/GC.Infrastructure/Configurations/Run.yaml | Adds iteration configuration block (currently mismatched with new iterations input model). |
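The table describes `RemoveOutliers` as an IQR-based helper. A minimal sketch of what such a helper could look like, assuming the standard 1.5×IQR fences; the quartile interpolation here is linear and the PR's actual implementation may differ:

```csharp
using System;
using System.Linq;

static class StatisticsSketch
{
    // Removes values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    public static double[] RemoveOutliers(double[] values)
    {
        // Too few points to estimate quartiles meaningfully.
        if (values.Length < 4) return values;

        var sorted = values.OrderBy(v => v).ToArray();
        double q1 = Percentile(sorted, 0.25);
        double q3 = Percentile(sorted, 0.75);
        double iqr = q3 - q1;
        return sorted
            .Where(v => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr)
            .ToArray();
    }

    // Linear interpolation between closest ranks (input must be sorted).
    static double Percentile(double[] sorted, double p)
    {
        double rank = p * (sorted.Length - 1);
        int lo = (int)Math.Floor(rank);
        int hi = (int)Math.Ceiling(rank);
        return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
    }

    static void Main()
    {
        double[] data = { 1, 2, 3, 4, 5, 100 };
        // 100 falls above Q3 + 1.5*IQR and is dropped.
        Console.WriteLine(string.Join(",", RemoveOutliers(data)));
    }
}
```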
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…chmarks namespace
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…om/VincentBu/performance into average-microbenchmarks-iterations
Redesign microbenchmark result
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 11 comments.
Comments suppressed due to low confidence (1)
src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Markdown.cs:175
`AveragedBaselineOtherMetrics`/`AveragedComparandOtherMetrics` are `Dictionary<string, double>`, so `GetValueOrDefault(column)` returns 0 when the metric is missing. That makes missing metrics look like real 0 values and can produce incorrect deltas (and divide-by-zero in `deltaPercent`). Use `TryGetValue` and only compute delta/delta% when both sides have a value and the baseline is non-zero.
```csharp
foreach (var column in configuration.Output.Columns)
{
    double? baselineValue = lr.AveragedBaselineOtherMetrics.GetValueOrDefault(column);
    double? comparandValue = lr.AveragedComparandOtherMetrics.GetValueOrDefault(column);
    string baselineResult = baselineValue.HasValue ? Math.Round(baselineValue.Value, 4).ToString() : string.Empty;
    string comparandResult = comparandValue.HasValue ? Math.Round(comparandValue.Value, 4).ToString() : string.Empty;
    double? delta = baselineValue.HasValue && comparandValue.HasValue ? comparandValue.Value - baselineValue.Value : null;
    string deltaResult = delta.HasValue ? Math.Round(delta.Value, 4).ToString() : string.Empty;
    double? deltaPercent = delta.HasValue ? (delta / baselineValue.Value) * 100 : null;
    string deltaPercentResult = deltaPercent.HasValue ? Math.Round(deltaPercent.Value, 4).ToString() : string.Empty;
```
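A standalone sketch of the fix the comment suggests, assuming the same column/metric shapes as above (the dictionary contents and metric name here are hypothetical):

```csharp
using System;
using System.Collections.Generic;

class MetricRowExample
{
    static void Main()
    {
        var baseline = new Dictionary<string, double> { ["PauseTimeMSec"] = 10.0 };
        var comparand = new Dictionary<string, double> { ["PauseTimeMSec"] = 12.0 };

        foreach (var column in new[] { "PauseTimeMSec", "MissingMetric" })
        {
            // TryGetValue distinguishes "metric absent" from a real 0 value,
            // unlike GetValueOrDefault which returns 0 for both.
            bool hasBaseline = baseline.TryGetValue(column, out double baselineValue);
            bool hasComparand = comparand.TryGetValue(column, out double comparandValue);

            string baselineResult = hasBaseline ? Math.Round(baselineValue, 4).ToString() : string.Empty;
            string comparandResult = hasComparand ? Math.Round(comparandValue, 4).ToString() : string.Empty;

            // Only compute delta/delta% when both sides exist and the baseline is non-zero.
            double? delta = hasBaseline && hasComparand ? comparandValue - baselineValue : (double?)null;
            double? deltaPercent = delta.HasValue && baselineValue != 0
                ? (delta.Value / baselineValue) * 100
                : (double?)null;

            Console.WriteLine($"{column}: {baselineResult} | {comparandResult} | {delta} | {deltaPercent}");
        }
    }
}
```

With this shape, a missing metric renders as empty cells rather than a spurious 0 with a meaningless delta.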
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (1)
src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Json.cs:15
`Json.Generate` doesn't use the `configuration` parameter, and the `using GC.Analysis.API;`/`using GC.Infrastructure.Core.Presentation.GCPerfSim;` directives are unused. Consider removing the unused parameter/usings to avoid warnings and keep the API surface minimal.
```csharp
    .ToDictionary();

OtherMetrics = OtherMetrics.Concat(customStatistics).ToDictionary();

if (gcData != null)
{
    var customGCData = columns
        .Where(column => CustomAggregateCalculationMap.Keys.Contains(column))
        .Select(column => (column, CustomAggregateCalculationMap[column](gcData)))
        .ToDictionary();

    OtherMetrics = OtherMetrics.Concat(customGCData).ToDictionary();
```
```csharp
public Dictionary<string, double[]> OriginalBaselineOtherMetrics { get; } = new();
public Dictionary<string, double[]> OriginalComparandOtherMetrics { get; } = new();
public Dictionary<string, double[]> OutliersFreeBaselineOtherMetrics => OriginalBaselineOtherMetrics
    .Select(kvp => (kvp.Key, API.Statistics.RemoveOutliers(kvp.Value).ToArray()))
    .ToDictionary();
public Dictionary<string, double[]> OutliersFreeComparandOtherMetrics => OriginalComparandOtherMetrics
    .Select(kvp => (kvp.Key, API.Statistics.RemoveOutliers(kvp.Value).ToArray()))
    .ToDictionary();
```
```csharp
var comparandMicrobenchmarkResults = GoodLinq.Where(microbenchmarkResultsGroup, r => !r.Parent.is_baseline);

lock (_lock)
{
    comparisonResults.Add(new(baselineMicrobenchmarkResults, comparandMicrobenchmarkResults, includeTraces));
```
```csharp
string outputPathForRun = Path.Combine(outputPath, run.Name);
var sortedTraceFiles = Directory.GetFiles(outputPathForRun, $"{traceFileNameTemplate}*.etl.zip", SearchOption.TopDirectoryOnly)
    .OrderBy(traceFile => traceFile)
    .ToArray();
```
```diff
 var ordered = comparisonResult.Comparisons.OrderByDescending(c => c.OtherMetricsDiffPerc[metric]);

 // Large Regressions
 sw.WriteLine($"### Large Regressions (>20%): {comparisonResult.LargeRegressions.Count()} \n");
-sw.AddTableForSingleCriteria(configuration, GoodLinq.Where(ordered, o => o.GetDiffPercentFromOtherMetrics(metric) > 0.2));
+sw.AddTableForSingleCriteria(configuration, GoodLinq.Where(ordered, o => o.OtherMetricsDiffPerc[metric] >= 0.2));
 sw.WriteLine("\n");

 // Large Improvements
 sw.WriteLine($"### Large Improvements (>20%): {comparisonResult.LargeImprovements.Count()} \n");
-var largeImprovements = GoodLinq.Where(ordered, o => o.GetDiffPercentFromOtherMetrics(metric) < -0.2);
-largeImprovements.Reverse();
-sw.AddTableForSingleCriteria(configuration, largeImprovements);
+sw.AddTableForSingleCriteria(configuration, GoodLinq.Where(ordered, o => o.OtherMetricsDiffPerc[metric] <= -0.2));
 sw.WriteLine("\n");

 // Regressions
 sw.WriteLine($"### Regressions (5% - 20%): {comparisonResult.Regressions.Count()} \n");
-sw.AddTableForSingleCriteria(configuration, GoodLinq.Where(ordered, o => o.GetDiffPercentFromOtherMetrics(metric) > 0.05 && o.GetDiffPercentFromOtherMetrics(metric) < 0.2));
+sw.AddTableForSingleCriteria(configuration, GoodLinq.Where(ordered, o => o.OtherMetricsDiffPerc[metric] >= 0.05 && o.OtherMetricsDiffPerc[metric] < 0.2));
 sw.WriteLine("\n");

 // Improvements
 sw.WriteLine($"### Improvements (5% - 20%): {comparisonResult.Improvements.Count()} \n");
-var improvements = GoodLinq.Where(ordered, o => o.GetDiffPercentFromOtherMetrics(metric) > 0.05 && o.GetDiffPercentFromOtherMetrics(metric) < 0.2);
-improvements.Reverse();
-sw.AddTableForSingleCriteria(configuration, improvements);
+sw.AddTableForSingleCriteria(configuration, GoodLinq.Where(ordered, o => o.OtherMetricsDiffPerc[metric] <= -0.05 && o.OtherMetricsDiffPerc[metric] > -0.2));
 sw.WriteLine("\n");
```
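The thresholds in the snippet above bucket each per-metric diff (expressed as a fraction) into regression/improvement categories. A minimal standalone sketch of that classification; the method and bucket names here are hypothetical, not the PR's API:

```csharp
using System;

static class DiffBuckets
{
    // Mirrors the >=20% and 5%-20% bucket boundaries used in the report,
    // with the diff expressed as a fraction (0.2 == 20%).
    public static string Classify(double diffPerc) => diffPerc switch
    {
        >= 0.2 => "Large Regression",
        >= 0.05 => "Regression",
        <= -0.2 => "Large Improvement",
        <= -0.05 => "Improvement",
        _ => "Unchanged",
    };

    static void Main()
    {
        Console.WriteLine(Classify(0.25));  // Large Regression
        Console.WriteLine(Classify(0.1));   // Regression
        Console.WriteLine(Classify(-0.1));  // Improvement
        Console.WriteLine(Classify(0.01));  // Unchanged
    }
}
```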
This PR calculates the average value across multiple microbenchmark results. The work revolves around: