[7/n] track all endpoints from which a schema is accessible, not just the first #19

Open
sunshowers wants to merge 1 commit into sunshowers/spr/main.7n-track-all-endpoints-from-which-a-schema-is-accessible-not-just-the-first from sunshowers/spr/7n-track-all-endpoints-from-which-a-schema-is-accessible-not-just-the-first

@sunshowers (Contributor) commented Feb 6, 2026

Currently, we memoize schema comparisons via the visited set. Because a hit in visited short-circuits the comparison, only the first endpoint from which a type is accessible is recorded and returned.
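A minimal sketch of the problem, assuming a simplified reference graph in which every node (endpoint or type) maps to the types it refers to directly. The Graph alias and first_endpoint_only are hypothetical illustrations, not drift's API. Because the visited set is shared across all endpoints, a type reached again from a second endpoint is a cache hit, and that endpoint is never recorded for it:

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical reference graph: each node (an endpoint or a type) maps to
/// the types it refers to directly.
type Graph = BTreeMap<&'static str, Vec<&'static str>>;

/// Attributes each reachable type to an endpoint, memoizing on a `visited`
/// set shared across all endpoints. A type already visited via an earlier
/// endpoint is skipped, so only the first endpoint is ever recorded.
fn first_endpoint_only(
    graph: &Graph,
    endpoints: &[&'static str],
) -> BTreeMap<&'static str, &'static str> {
    let mut visited = HashSet::new();
    let mut owner = BTreeMap::new();
    for &endpoint in endpoints {
        let mut stack = vec![endpoint];
        while let Some(node) = stack.pop() {
            for &child in graph.get(node).into_iter().flatten() {
                // `insert` returns false on a cache hit: the comparison is
                // skipped and this endpoint is silently dropped for `child`.
                if visited.insert(child) {
                    owner.insert(child, endpoint);
                    stack.push(child);
                }
            }
        }
    }
    owner
}
```

With two endpoints that both refer to UplinkAddressConfig, the resulting map holds only the first of them.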

Introduce a change to track all paths from which a schema is accessible. This is a major change with several components:

  • Restructure Change into Change, ChangePath, and ChangeInfo. A Change now groups all differences within a single component or endpoint: paths lists every route through which the component was reached, and changes lists the individual differences found within it. Previously each Change carried a single path pair and a single difference.

  • Introduce BasePath, ChangeKey, and ChangeRecord in Compare. The visited map remains keyed on full paths (memoizing individual node comparisons), while the new records map is keyed on base paths (grouping changes by owning component). Multiple visited entries feed into a single records entry. The rationale for this dual-keying strategy is documented on the Compare struct.

  • Add base_len tracking to EndpointPath and RefTargetPath. These types can now report their base path (the component or endpoint root) separately from appended segments, enabling the base-path grouping above. JsonPathStack gains corresponding base_and_subpath(), without_subpath(), and endpoint_base() accessors.

  • The meat of the change: add record_path to capture access paths on memoization cache hits. When visited short-circuits a schema comparison, the first visit's changes are already recorded, but the later access path is not. record_path ensures all routes to a schema appear in the final Change, even when the comparison itself is skipped.
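A minimal sketch of the dual-keying idea described above, under simplifying assumptions: paths are plain string segments with a base_len, a "change" is a string, and a route is a stack of JSON pointers. SketchPath, SketchRecord, and CompareSketch are hypothetical stand-ins modeled on the description of EndpointPath/RefTargetPath, ChangeRecord, and Compare; record_path mirrors the described behavior, not its real signature. The diff closure runs at most once per full path, while every route is still added to the record for the base path:

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical path type: segments[..base_len] names the owning component
/// or endpoint; later segments were appended while walking into it.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SketchPath {
    segments: Vec<String>,
    base_len: usize,
}

impl SketchPath {
    /// The base path, used as the grouping key.
    fn base(&self) -> String {
        self.segments[..self.base_len].join("/")
    }
}

/// Everything recorded for one base path.
#[derive(Debug, Default)]
struct SketchRecord {
    /// Every route through which this component was reached.
    routes: BTreeSet<Vec<String>>,
    /// Differences found inside this component.
    changes: Vec<String>,
}

#[derive(Default)]
struct CompareSketch {
    /// Keyed on full paths: memoizes individual node comparisons.
    visited: HashMap<SketchPath, Option<String>>,
    /// Keyed on base paths: groups changes by owning component.
    records: BTreeMap<String, SketchRecord>,
}

impl CompareSketch {
    /// Compares the node at `path`, reached via `route`. The `diff` closure
    /// stands in for the real schema comparison and runs at most once per path.
    fn compare(
        &mut self,
        path: &SketchPath,
        route: &[String],
        diff: impl FnOnce() -> Option<String>,
    ) -> Option<String> {
        if let Some(cached) = self.visited.get(path).cloned() {
            // Cache hit: skip the comparison, but still record this route.
            self.record_path(path, route);
            return cached;
        }
        let result = diff();
        self.visited.insert(path.clone(), result.clone());
        self.records
            .entry(path.base())
            .or_default()
            .changes
            .extend(result.clone());
        self.record_path(path, route);
        result
    }

    /// Adds `route` to the record for the base path of `path`.
    fn record_path(&mut self, path: &SketchPath, route: &[String]) {
        self.records
            .entry(path.base())
            .or_default()
            .routes
            .insert(route.to_vec());
    }
}
```

Two full paths under the same component produce two visited entries but a single record, and a second route to an already-compared path only adds to that record's routes.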

Created using spr 1.3.6-beta.1
Comment on lines +70 to +82
ChangePath {
    old: [
        "#/components/schemas/SubType",
        "#/components/schemas/GreetingResponse/properties/via_anyof/0/$ref",
        "#/paths/~1hello~1{name}/get/responses/200/content/application~1json/schema/$ref",
    ],
    new: [
        "#/components/schemas/SubType",
        "#/components/schemas/GreetingResponse/properties/via_anyof/0/$ref",
        "#/paths/~1hello~1{name}/get/responses/200/content/application~1json/schema/$ref",
    ],
    comparison: Output,
},
@sunshowers (Contributor, Author)

All paths get reported here now -- this is the key change.
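To show how a consumer such as the API manager might use these paths, here is a minimal sketch with simplified, hypothetical versions of Change and ChangePath (drift's real types carry more detail, and the Input variant of Comparison is assumed; only Output appears in the snapshot). Since each path stack runs from the component out to the endpoint reference, the last pointer identifies the endpoint; endpoint_of is a hypothetical helper that decodes it using RFC 6901 JSON Pointer escaping:

```rust
/// Hypothetical, simplified shapes modeled on the PR description and the
/// snapshot above; drift's real types carry more detail.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Comparison {
    // Assumed variants; only Output appears in the snapshot.
    Input,
    Output,
}

/// One route to a component: a stack of JSON pointers running from the
/// component out to the endpoint, in the old and new documents.
#[derive(Debug, Clone)]
struct ChangePath {
    old: Vec<String>,
    new: Vec<String>,
    comparison: Comparison,
}

/// All differences within one component, plus every route that reaches it.
#[derive(Debug)]
struct Change {
    paths: Vec<ChangePath>,
    changes: Vec<String>,
}

/// Turns "#/paths/~1hello~1{name}/get/..." into "GET /hello/{name}".
/// Per RFC 6901, "~1" decodes to "/" before "~0" decodes to "~".
fn endpoint_of(pointer: &str) -> Option<String> {
    let rest = pointer.strip_prefix("#/paths/")?;
    let mut tokens = rest.split('/');
    let route = tokens.next()?.replace("~1", "/").replace("~0", "~");
    let method = tokens.next()?.to_uppercase();
    Some(format!("{method} {route}"))
}

impl Change {
    /// Distinct endpoints, taken from the outermost pointer of each new-side path.
    fn endpoints(&self) -> Vec<String> {
        let mut out: Vec<String> = self
            .paths
            .iter()
            .filter_map(|p| p.new.last())
            .filter_map(|ptr| endpoint_of(ptr))
            .collect();
        out.sort();
        out.dedup();
        out
    }
}
```

For the snapshot above, endpoints() would yield a single entry, "GET /hello/{name}".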

@ahl (Collaborator) commented May 2, 2026

> Introduce a change to track all paths from which a schema is accessible

Tracking the first was an intentional change for simplicity; what's the benefit of tracking all paths? Do you have an example in mind where one might encounter this or a workflow that's improved?

@sunshowers (Contributor, Author)

> Tracking the first was an intentional change for simplicity; what's the benefit of tracking all paths? Do you have an example in mind where one might encounter this or a workflow that's improved?

The context is that I started adding support for #12 to the Dropshot API manager. But then I realized that drift was only tracking a single endpoint -- the goal of #12 is to determine which endpoints to bump versions for, and if a type is reachable from multiple endpoints, showing just one of them would be less than an ideal UX.

How many types are reachable from multiple endpoints? I had Claude do a quick analysis of the Sled Agent API (script gist) and it produced this histogram (full output):

Types:     249

=== Histogram: number of types reachable from N endpoints ===
endpoints   types  distribution
        1     151  ########################################
        2      58  ###############
        3      10  ###
        4       7  ##
        5       9  ##
        9       4  #
       10       1
       11       3  #
       12       1
       13       3  #
       16       1
       76       1

So 98/249 types in the Sled Agent API (39%) are reachable from multiple endpoints. Some of them, such as the UUID types, aren't going to change, but plenty of frequently changing types are reachable from more than one endpoint, such as the UplinkAddressConfig example in #12:

UplinkAddressConfig (2)
  - POST /switch-ports  [uplink_ensure]
  - PUT /network-bootstore-config  [write_network_bootstore_config]

Based on this, I felt it was worth tracking all the endpoints, not just the first one. (I think I also discussed this with you, though I don't think I presented the data/histogram back then.)
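The gist's script isn't reproduced here. As a rough illustration of what the histogram counts, here is a minimal Rust sketch, assuming the API has already been reduced to a reference graph (the endpoints and refs maps, reachable_from, and histogram are all hypothetical). Each endpoint is walked with its own seen set, so unlike a single shared visited set, every endpoint is credited for every type it reaches:

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// All types transitively reachable from `roots` (cycles are handled by `seen`).
fn reachable_from(
    roots: &[&'static str],
    refs: &BTreeMap<&'static str, Vec<&'static str>>,
) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut queue: VecDeque<&'static str> = roots.iter().copied().collect();
    while let Some(ty) = queue.pop_front() {
        if seen.insert(ty) {
            queue.extend(refs.get(ty).into_iter().flatten().copied());
        }
    }
    seen
}

/// Maps "number of endpoints reaching a type" to "number of such types".
/// `endpoints` holds each endpoint's directly referenced types; `refs`
/// holds each type's directly referenced types.
fn histogram(
    endpoints: &BTreeMap<&'static str, Vec<&'static str>>,
    refs: &BTreeMap<&'static str, Vec<&'static str>>,
) -> BTreeMap<usize, usize> {
    // Per type, count how many endpoints reach it.
    let mut counts: BTreeMap<&'static str, usize> = BTreeMap::new();
    for roots in endpoints.values() {
        for ty in reachable_from(roots, refs) {
            *counts.entry(ty).or_insert(0) += 1;
        }
    }
    // Bucket types by that count.
    let mut hist = BTreeMap::new();
    for n in counts.into_values() {
        *hist.entry(n).or_insert(0) += 1;
    }
    hist
}
```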

@ahl (Collaborator) commented May 3, 2026

> showing just one of them would be less than an ideal UX

Agreed. Can you describe the current state and the improved state? It seems like very very often one would correct the error and then be done. Sometimes, one might correct the error and then see a new one pop up in a different spot. The distribution seems interesting, but I'm not sure what it indicates.

@sunshowers (Contributor, Author) commented May 3, 2026

> Agreed. Can you describe the current state and the improved state?

Let's say UplinkAddressConfig was last changed at v7 and you're working in v10. The current UX (described in #12) is:

  1. The Dropshot API manager complains, and for each version from v7 to v9 it says that UplinkAddressConfig has changed, though it doesn't say which endpoints are affected.
  2. You try to figure out which endpoints transitively refer to UplinkAddressConfig.
  3. You bump their versions. This is a bit of trial and error and may take a few tries.

With #12 support in the API manager but without this stack of PRs, the best it can do is:

  1. The Dropshot API manager complains, and for each version from v7 to v9 it says that UplinkAddressConfig has changed, and mentions one of the endpoints -- let's say POST /switch-ports.
  2. You bump the version for this endpoint.
  3. You run the API manager again, which rebuilds everything from sled-agent-types-versions up to the omicron-dropshot-apis crate.
  4. You see a complaint for the second endpoint.
  5. You bump the version for the second endpoint.

(This can take up to N build iterations, where N is the number of endpoints that refer to the type, as counted in the histogram above.)

With this stack of PRs:

  1. The Dropshot API manager complains, and for each version from v7 to v9 it says that UplinkAddressConfig has changed and mentions all the endpoints from which it is accessible, along with the paths to get there. (We could even print out a summary: here's which endpoints to bump, and here are all the types that need bumping along the way. Something like cargo tree -i, basically.)
  2. You bump the versions for all affected endpoints in one go.
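The cargo tree -i style summary could be computed roughly as follows. This is a minimal sketch, assuming a hypothetical inverted graph in which each type lists its referrers (other types or endpoints); Referrer, BumpSummary, and summarize are illustrative names, not part of drift or the Dropshot API manager:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical inverted reference: something that refers to a type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Referrer {
    Type(&'static str),
    Endpoint(&'static str),
}

/// Which endpoints need a version bump, and which types lie between them
/// and the changed type (including the changed type itself).
#[derive(Debug, Default)]
struct BumpSummary {
    endpoints: BTreeSet<&'static str>,
    types: BTreeSet<&'static str>,
}

/// Walks upward from `changed` through its referrers, like `cargo tree -i`.
fn summarize(
    changed: &'static str,
    referrers: &BTreeMap<&'static str, Vec<Referrer>>,
) -> BumpSummary {
    let mut summary = BumpSummary::default();
    let mut stack = vec![changed];
    while let Some(ty) = stack.pop() {
        // Expand each type once, even if several routes lead to it.
        if !summary.types.insert(ty) {
            continue;
        }
        for referrer in referrers.get(ty).into_iter().flatten() {
            match *referrer {
                Referrer::Type(parent) => stack.push(parent),
                Referrer::Endpoint(endpoint) => {
                    summary.endpoints.insert(endpoint);
                }
            }
        }
    }
    summary
}
```

Each type is expanded only once, but all of its referrers are examined, so the endpoint set is complete even though individual routes are not enumerated; listing the routes themselves would need per-path tracking like the change in this PR.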

> It seems like very very often one would correct the error and then be done. Sometimes, one might correct the error and then see a new one pop up in a different spot.

Right -- what you're describing is what I'd consider a frustrating experience for types referred to by many endpoints. For example, NetworkInterface is referenced by 5 endpoints, so you'd have to rebuild and rerun the API manager up to 5 times.

> The distribution seems interesting, but I'm not sure what it indicates.

The distribution is just meant to show that a substantial number of types are referred to by more than one endpoint. Another example is the Nexus external API, where more than 50% of types are reachable from more than one endpoint. To me, the case for doing this would be less compelling if, say, only 10% of types were accessible from multiple endpoints.

Nexus external API histogram
Types:     459
Unreachable types: 0


=== Histogram: number of types reachable from N endpoints ===
endpoints   types  distribution
        1     205  ########################################
        2     157  ###############################
        3      33  ######
        4      19  ####
        5      14  ###
        6       9  ##
        7       6  #
        8       2
       10       2
       14       1
       17       1
       23       2
       27       1
       30       1
       38       1
       42       2
      197       1
      233       1
      304       1
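For reference, the shares of multi-endpoint types implied by the two histograms (total types minus those reachable from exactly one endpoint):

```latex
\begin{aligned}
\text{Sled Agent:} \quad & \frac{249 - 151}{249} = \frac{98}{249} \approx 39.4\% \\
\text{Nexus external:} \quad & \frac{459 - 205}{459} = \frac{254}{459} \approx 55.3\%
\end{aligned}
```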
