Extract all logic for running and judging a submission from verifyproblem to judge module #398#411
Merged
pehrsoderman merged 13 commits into Kattis:master on Apr 27, 2026
pehrsoderman (Contributor) approved these changes on Apr 27, 2026 and left a comment:
Very hard to review, but I can't find any obvious problems.
This PR ended up huge, but I don't think there was any way to avoid that.
This is a reimplementation of the logic for running a submission, moving everything from `verifyproblem` to the module `problemtools.judge`. Earlier code was quite confusing (IMHO), where it kept passing round 3 time limits, and returned the results for those 3 different time limits. The basic idea was good though: if we know what happened at some time limit t, we can cheaply compute what happens for t* <= t by toggling test cases to TLE and running the grader.
In the new implementation, the core piece is the `ResultStore` class, which achieves 3 things:
`ResultStore` is a cache, and the cache key is what determines if we can reuse a result (so, sha256 of input, output, validator flags, ...).
The `ResultStore` is currently instantiated per submission, but we'll probably want to extend this later and add some persistence, cf. #379.
Test case reuse logic is now much improved, and symlinks between test cases are no longer a magical way to enable result reuse. If two test cases are identical, we will reuse results (with or without symlinks). It is also perfectly legal to add a symlink to save space when files are identical, even if the entire test case isn't.
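To make the caching idea above concrete, here is a minimal sketch of a content-keyed result cache. It is not the actual `ResultStore` code: the names `cache_key`, `TinyResultStore`, and `get_or_run` are hypothetical, and the example flag values are made up. Only the idea of hashing input, output, and validator flags with sha256 comes from the description.

```python
import hashlib
from typing import Callable, Dict, Sequence


def cache_key(input_data: bytes, answer_data: bytes, validator_flags: Sequence[str]) -> str:
    """Hash everything that can influence the result of one test case."""
    h = hashlib.sha256()
    for part in (input_data, answer_data, *(f.encode() for f in validator_flags)):
        # Length-prefix each part so that ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


class TinyResultStore:
    """Hypothetical per-submission cache; a stand-in, not the real ResultStore."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.runs = 0  # how many times the submission was actually executed

    def get_or_run(self, key: str, run: Callable[[], str]) -> str:
        # Only execute the (expensive) run when no result exists for this key.
        if key not in self._results:
            self.runs += 1
            self._results[key] = run()
        return self._results[key]


store = TinyResultStore()
k1 = cache_key(b"1 2\n", b"3\n", ["float_tolerance", "1e-6"])
k2 = cache_key(b"1 2\n", b"3\n", ["float_tolerance", "1e-6"])  # identical copy, no symlink involved
k3 = cache_key(b"1 2\n", b"3\n", [])  # same files, different validator flags

verdicts = [store.get_or_run(k, lambda: "AC") for k in (k1, k2, k3)]
```

Because the key depends only on content, two identical test cases share a result whether or not one is a symlink to the other, while a change in validator flags forces a fresh run.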
The main interface to the `judge` module is the class `SubmissionJudge`, which knows how to judge a submission. Instead of just returning an aggregated result, it returns all intermediary results for the entire test case tree. This allows `SubmissionResult` to be a much simpler class, as the caller can easily compute various warnings (e.g., failing on sample cases) based on the tree instead of needing that information to be aggregated.
Fixes #397
Fixes #383
Progress on #382
Progress on #379
Fixes #244
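For readers who want a feel for the two ideas in the description (re-evaluating at a lower time limit, and computing warnings from the full result tree), here is a small sketch. It does not use the real `SubmissionJudge` API: `Node`, `regrade`, and `failing_cases` are hypothetical names, and the "first non-AC verdict wins" rule is an assumed stand-in for the real grader.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical result-tree node: a group if it has children, else a test case."""
    name: str
    verdict: str = "AC"              # verdict observed at the original time limit
    runtime: Optional[float] = None  # seconds; only set on test cases
    children: List["Node"] = field(default_factory=list)


def regrade(node: Node, time_limit: float) -> Node:
    """Return a copy of the tree as it would look at a time limit <= the one used."""
    if not node.children:
        too_slow = node.runtime is not None and node.runtime > time_limit
        return Node(node.name, "TLE" if too_slow else node.verdict, node.runtime)
    kids = [regrade(c, time_limit) for c in node.children]
    # Stand-in grader (assumption): a group gets the first non-AC verdict among its children.
    verdict = next((k.verdict for k in kids if k.verdict != "AC"), "AC")
    return Node(node.name, verdict, None, kids)


def failing_cases(node: Node) -> List[str]:
    """Collect the names of all test cases below this node that did not get AC."""
    if not node.children:
        return [] if node.verdict == "AC" else [node.name]
    return [name for c in node.children for name in failing_cases(c)]


tree = Node("data", children=[
    Node("sample", children=[Node("sample/1", "AC", 0.1), Node("sample/2", "AC", 0.8)]),
    Node("secret", children=[Node("secret/1", "AC", 0.3), Node("secret/2", "AC", 1.9)]),
])

at_2s = regrade(tree, 2.0)    # nothing exceeds 2.0 s, so the tree is unchanged
at_half = regrade(tree, 0.5)  # the slow cases flip to TLE without rerunning anything
sample_failures = failing_cases(at_half.children[0])
```

With the whole tree in hand, a caller can produce a "fails on sample cases" warning by inspecting the `sample` subtree directly, rather than relying on the judge to aggregate that information.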