[7/n] track all endpoints from which a schema is accessible, not just the first#19
Conversation
Created using spr 1.3.6-beta.1
| ChangePath { | ||
| old: [ | ||
| "#/components/schemas/SubType", | ||
| "#/components/schemas/GreetingResponse/properties/via_anyof/0/$ref", | ||
| "#/paths/~1hello~1{name}/get/responses/200/content/application~1json/schema/$ref", | ||
| ], | ||
| new: [ | ||
| "#/components/schemas/SubType", | ||
| "#/components/schemas/GreetingResponse/properties/via_anyof/0/$ref", | ||
| "#/paths/~1hello~1{name}/get/responses/200/content/application~1json/schema/$ref", | ||
| ], | ||
| comparison: Output, | ||
| }, |
There was a problem hiding this comment.
All paths get reported here now -- this is the key change.
Tracking the first was an intentional change for simplicity; what's the benefit of tracking all paths? Do you have an example in mind where one might encounter this or a workflow that's improved? |
The context is that I started adding support for #12 to the Dropshot API manager. But then I realized that drift was only tracking a single endpoint -- the goal of #12 is to determine which endpoints to bump versions for, and if a type is reachable from multiple endpoints, showing just one of them would be less than an ideal UX. How many types are reachable from multiple endpoints? I had Claude do a quick analysis of the Sled Agent API (script gist) and it produced this histogram (full output): So 98/249 types in the Sled Agent API (39%) are reachable from multiple endpoints. (Some of them such as the UUID types aren't going to change, but there are plenty of types that do change often that are accessible from more than one endpoint, such as the Based on this I felt it was worth it to track all the endpoints, not just the first one. (I also think I discussed this with you, though I don't think I presented the data/histogram back then.) |
Agreed. Can you describe the current state and the improved state? It seems like very very often one would correct the error and then be done. Sometimes, one might correct the error and then see a new one pop up in a different spot. The distribution seems interesting, but I'm not sure what it indicates. |
Let's say
Without this stack of PRs, the best that the API manager can do is:
(This can potentially take up to N build iterations, where N is the number of endpoints that refer to a type as in the histogram above.) With this stack of PRs:
Right -- I think what you're describing is what I would consider a frustrating experience for types referred to by many endpoints. For example,
The distribution is just to show that a substantial number of types have more than one endpoint referring to them. Another example is the Nexus external API where more than 50% of types have more than one endpoint. To me, the case for doing this this would be less compelling if, say, only 10% of types are accessible from multiple endpoints. Nexus external API histogram |
Currently, we memoize schema comparisons via the
visitedset. That means that only the first endpoint from which a type is accessible is recorded and returned.Introduce a change to track all paths from which a schema is accessible. This is a major change with several components:
Restructure
ChangeintoChange,ChangePath, andChangeInfo. AChangenow groups all differences within a single component or endpoint:pathslists every route through which the component was reached, andchangeslists the individual differences found within it. Previously eachChangecarried a single path pair and a single difference.Introduce
BasePath,ChangeKey, andChangeRecordinCompare. Thevisitedmap remains keyed on full paths (memoizing individual node comparisons), while the newrecordsmap is keyed on base paths (grouping changes by owning component). Multiplevisitedentries feed into a singlerecordsentry. The rationale for this dual-keying strategy is documented on theComparestruct.Add
base_lentracking toEndpointPathandRefTargetPath. These types can now report their base path (the component or endpoint root) separately from appended segments, enabling the base-path grouping above.JsonPathStackgains correspondingbase_and_subpath(),without_subpath(), andendpoint_base()accessors.The meat of the change: add
record_pathto capture access paths on memoization cache hits. Whenvisitedshort-circuits a schema comparison, the first visit's changes are already recorded, but the later access path is not.record_pathensures all routes to a schema appear in the finalChange, even when the comparison itself is skipped.