
Extract all logic for running and judging a submission from verifyproblem to judge module #398 #411

Merged
pehrsoderman merged 13 commits into Kattis:master from gkreitz:398_refactor_submission_runs on Apr 27, 2026

@gkreitz gkreitz commented Apr 27, 2026

This PR ended up huge, but I don't think there was any way to avoid that.

This is a reimplementation of the logic for running a submission, moving everything from verifyproblem to the module problemtools.judge. The earlier code was quite confusing (IMHO): it kept passing around three time limits and returned results for each of them. The basic idea was good, though: if we know what happened at some time limit t, we can cheaply compute what happens for any t* <= t by toggling the test cases that ran longer than t* to TLE and rerunning the grader.
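A minimal sketch of that idea, not the code in this PR: `RunResult`, `simulate_at`, and `first_error` are hypothetical names, and the "grader" here is a toy that simply reports the first non-AC verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    verdict: str    # e.g. "AC", "WA", "RTE", "TLE"
    runtime: float  # seconds measured while running with the higher limit


def simulate_at(results: list[RunResult], time_limit: float) -> list[RunResult]:
    """Re-label every run slower than time_limit as TLE; keep the rest."""
    return [
        RunResult("TLE", r.runtime) if r.runtime > time_limit else r
        for r in results
    ]


def first_error(results: list[RunResult]) -> str:
    """Toy stand-in for the grader: the first non-AC verdict wins."""
    return next((r.verdict for r in results if r.verdict != "AC"), "AC")


# Results measured once, with a generous limit.
measured = [RunResult("AC", 0.4), RunResult("AC", 1.7), RunResult("WA", 0.2)]

verdict_at_2s = first_error(simulate_at(measured, 2.0))  # nothing toggled
verdict_at_1s = first_error(simulate_at(measured, 1.0))  # second case toggled
```

No submission is rerun for the lower limit; only the stored results are re-labelled and re-graded.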

In the new implementation, the core piece is the ResultStore class, which achieves 3 things:

  • Reuse results for identical test cases - the ResultStore is a cache, and the cache key is what determines if we can reuse a result (so, sha256 of input, output, validator flags, ...).
  • Deal with multithreading - workers claim entries in the cache, and readers can get a future back when looking up items.
  • Simulate what would happen at lower time limits.
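The claim/future pattern from the second bullet could look roughly like the sketch below. It makes no claim about the real ResultStore API: `TinyResultStore`, `claim`, and `judge` are made up for illustration, and the "run" is a placeholder that always returns AC.

```python
import threading
from concurrent.futures import Future


class TinyResultStore:
    """Hypothetical stand-in for ResultStore: one Future per cache key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: dict[str, Future] = {}

    def claim(self, key: str) -> tuple[Future, bool]:
        """Return (future, is_owner). Only the owner computes the result."""
        with self._lock:
            fut = self._entries.get(key)
            if fut is not None:
                return fut, False
            fut = Future()  # created by hand here; fine for a sketch
            self._entries[key] = fut
            return fut, True


def judge(store: TinyResultStore, key: str, runs: list[str]) -> str:
    fut, is_owner = store.claim(key)
    if is_owner:
        runs.append(key)      # record that a real run happened
        fut.set_result("AC")  # placeholder for running the submission
    return fut.result()       # readers block here until the owner is done


store = TinyResultStore()
runs: list[str] = []
verdicts: list[str] = []
threads = [
    threading.Thread(target=lambda: verdicts.append(judge(store, "same-key", runs)))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Four threads ask for the same key, but only the one that wins the claim does the work. A real implementation would also need to call `set_exception` when the owner fails, so that readers are not left waiting.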

The ResultStore is currently instantiated per submission, but we'll probably want to extend this later and add some persistence, cf. #379.

Test case reuse logic is now much improved, and symlinks between test cases are no longer a magical way to enable result reuse. If two test cases are identical, we will reuse results (with or without symlinks). It is also perfectly legal to add a symlink to save space when files are identical, even if the entire test case isn't.
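To illustrate why content hashing makes symlinks irrelevant for reuse, here is a sketch under assumptions: `case_key` is a hypothetical helper, the exact fields in the real cache key may differ, and the flag string is just an example value.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def case_key(infile: Path, ansfile: Path, validator_flags: str) -> str:
    """Hash file contents (symlinks are followed) plus the validator flags."""
    h = hashlib.sha256()
    for part in (infile.read_bytes(), ansfile.read_bytes(), validator_flags.encode()):
        h.update(len(part).to_bytes(8, "big"))  # length prefix avoids ambiguous joins
        h.update(part)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.in").write_bytes(b"1 2\n")
    (root / "a.ans").write_bytes(b"3\n")
    (root / "b.in").write_bytes(b"1 2\n")  # plain copy, no symlink
    (root / "b.ans").write_bytes(b"3\n")
    try:
        os.symlink(root / "a.in", root / "c.in")  # only the input is shared
    except OSError:
        (root / "c.in").write_bytes(b"1 2\n")  # fallback where symlinks are unavailable
    (root / "c.ans").write_bytes(b"3\n")

    key_a = case_key(root / "a.in", root / "a.ans", "")
    key_b = case_key(root / "b.in", root / "b.ans", "")
    key_c = case_key(root / "c.in", root / "c.ans", "float_tolerance 1e-6")
```

Cases a and b get the same key even though no symlink connects them, while c shares an input file with a but gets a different key because its validator flags differ.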

The main interface to the judge module is the class SubmissionJudge, which knows how to judge a submission. Instead of just returning an aggregated result, it returns all intermediate results for the entire test case tree. This allows SubmissionResult to be a much simpler class, as the caller can easily compute various warnings (e.g., failing on sample cases) based on the tree instead of needing that information to be aggregated.
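As an illustration of computing such a warning from a result tree, consider the sketch below. The `Node` shape and `failing_leaves` are assumptions made for this example, not the types SubmissionJudge actually returns.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str     # e.g. "data/sample/1"
    verdict: str  # verdict for this test case or group
    children: list["Node"] = field(default_factory=list)


def failing_leaves(node: Node, prefix: str) -> list[str]:
    """Names of leaf test cases under `prefix` whose verdict is not AC."""
    if not node.children:
        hit = node.name.startswith(prefix) and node.verdict != "AC"
        return [node.name] if hit else []
    return [name for child in node.children for name in failing_leaves(child, prefix)]


tree = Node("data", "WA", [
    Node("data/sample", "WA", [
        Node("data/sample/1", "AC"),
        Node("data/sample/2", "WA"),
    ]),
    Node("data/secret", "AC", [Node("data/secret/1", "AC")]),
])

sample_failures = failing_leaves(tree, "data/sample/")
if sample_failures:
    print(f"warning: fails on sample cases: {', '.join(sample_failures)}")
```

Because the caller sees every node, a check like this lives outside the judge and needs no extra field on the result class.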

Fixes #397
Fixes #383
Progress on #382
Progress on #379
Fixes #244

Comment thread problemtools/judge/submission_judge.py Outdated
@gkreitz gkreitz force-pushed the 398_refactor_submission_runs branch from bcba5b0 to c6b61e1 Compare April 27, 2026 12:37
@pehrsoderman pehrsoderman left a comment

Very hard to review, but I can't find any obvious problems.

@pehrsoderman pehrsoderman merged commit f8cdc4b into Kattis:master Apr 27, 2026
5 checks passed
Development

Successfully merging this pull request may close these issues:

  • Behavior on empty testgroups
  • We leak non-integer time limits
  • Symlinks for reuse of input files with different output validator flags
