Extract all logic for running and judging a submission from verifyproblem to judge module #398 by gkreitz · Pull Request #411 · Kattis/problemtools

gkreitz · 2026-04-27T12:11:46Z

This PR ended up huge, but I don't think there was any way to avoid that.

This is a reimplementation of the logic for running a submission, moving everything from verifyproblem to the module problemtools.judge. The earlier code was quite confusing (IMHO): it kept passing around 3 time limits and returned the results for those 3 different time limits. The basic idea was good though: if we know what happened at some time limit t, we can cheaply compute what happens for any t* <= t by toggling the test cases that ran longer than t* to TLE and re-running the grader.
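To make the time limit idea concrete, here is a minimal sketch. All names in it (`CaseResult`, `simulate_at`, `grade`) are made up for illustration and are not the API of problemtools.judge. The toy grader stands in for the real grader, and the strict `runtime > new_limit` boundary is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CaseResult:
    name: str
    verdict: str    # e.g. "AC", "WA", "RTE", "TLE"
    runtime: float  # seconds measured in the real run at time limit t


def simulate_at(results: list[CaseResult], new_limit: float) -> list[CaseResult]:
    """Re-derive per-test-case verdicts for a limit t* <= t without re-running.

    Any test case whose measured runtime exceeds new_limit is toggled to TLE.
    (Hypothetical helper; the boundary condition is an assumption.)
    """
    return [
        replace(r, verdict="TLE") if r.runtime > new_limit else r
        for r in results
    ]


def grade(results: list[CaseResult]) -> str:
    """Toy grader: the first non-AC verdict wins, otherwise AC."""
    for r in results:
        if r.verdict != "AC":
            return r.verdict
    return "AC"


measured = [
    CaseResult("1", "AC", 0.4),
    CaseResult("2", "AC", 1.7),
    CaseResult("3", "AC", 0.9),
]

at_two_seconds = grade(simulate_at(measured, 2.0))
at_one_second = grade(simulate_at(measured, 1.0))
```

Only the grading step is repeated for the lower limit; the submission itself runs once.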

In the new implementation, the core piece is the ResultStore class, which achieves 3 things:

Reuses results for identical test cases: the ResultStore is a cache, and the cache key determines whether we can reuse a result (so, sha256 of input, output, validator flags, ...).
Deals with multithreading: workers claim entries in the cache, and readers can get a future back when looking up items.
Simulates what would happen at lower time limits.
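The claim-and-future pattern from the second point can be sketched roughly as follows. This is not the actual ResultStore: `ClaimingCache`, `claim` and `run_once` are hypothetical names, and the sketch creates `concurrent.futures.Future` objects by hand, which the standard library allows but mainly intends for executors and tests.

```python
import threading
from concurrent.futures import Future


class ClaimingCache:
    """Hypothetical sketch of a cache where the first caller claims a key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: dict[str, Future] = {}

    def claim(self, key: str) -> tuple[Future, bool]:
        """Return (future, is_owner).

        The first caller for a key becomes the owner and must fulfil the
        future. Later callers get the same future back and can wait on it.
        """
        with self._lock:
            fut = self._entries.get(key)
            if fut is not None:
                return fut, False
            fut = Future()
            self._entries[key] = fut
            return fut, True


def run_once(cache: ClaimingCache, key: str, compute):
    """Compute the result for key at most once, even across threads."""
    fut, is_owner = cache.claim(key)
    if is_owner:
        try:
            fut.set_result(compute())
        except Exception as exc:
            # Propagate the failure so that waiting readers do not block forever.
            fut.set_exception(exc)
    return fut.result()
```

With several threads calling `run_once` on the same key, one of them runs `compute` and the rest block on the shared future until the result is available.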

The ResultStore is currently instantiated per submission, but we'll probably want to extend this later and add some persistence, cf. #379.

Test case reuse logic is now much improved, and symlinks between test cases are no longer a magical way to enable result reuse. If two test cases are identical, we will reuse results (with or without symlinks). It is also perfectly legal to add a symlink to save space when files are identical, even if the entire test case isn't.
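As an illustration of why symlinks stop mattering once the key is derived from content, here is a small sketch. The function names and the exact key layout are assumptions; the real key contains more than these three parts (the "..." above).

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    # read_bytes() follows symlinks, so a symlink and a plain copy of the
    # same file produce the same digest.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def cache_key(infile: Path, ansfile: Path, validator_flags: list[str]) -> str:
    """Hypothetical cache key: hash of input, answer and validator flags."""
    h = hashlib.sha256()
    for part in (file_digest(infile), file_digest(ansfile), *validator_flags):
        data = part.encode()
        # Length-prefix every field so that different splits cannot collide.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for name in ("a", "b"):  # two identical test cases, no symlinks involved
        (d / f"{name}.in").write_text("1 2\n")
        (d / f"{name}.ans").write_text("3\n")
    key_a = cache_key(d / "a.in", d / "a.ans", [])
    key_b = cache_key(d / "b.in", d / "b.ans", [])
    key_a_flags = cache_key(d / "a.in", d / "a.ans", ["float_tolerance", "1e-6"])
```

Here `key_a` and `key_b` are equal, so the result can be reused, while `key_a_flags` differs because the validator flags differ.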

The main interface to the judge module is the class SubmissionJudge, which knows how to judge a submission. Instead of just returning an aggregated result, it returns all intermediate results for the entire test case tree. This allows SubmissionResult to be a much simpler class, as the caller can easily compute various warnings (e.g., failing on sample cases) based on the tree instead of needing that information to be aggregated.
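A rough sketch of what computing a warning from the tree could look like on the caller side. `ResultNode` and `failing_sample_cases` are hypothetical and do not mirror the real classes; the `data/sample/` naming is assumed from the usual problem package layout.

```python
from dataclasses import dataclass, field


@dataclass
class ResultNode:
    name: str      # e.g. "data", "data/sample", "data/sample/1"
    verdict: str   # verdict for this test case or group
    children: list["ResultNode"] = field(default_factory=list)


def leaves(node: ResultNode):
    """Yield the test cases (nodes without children) below node."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def failing_sample_cases(root: ResultNode) -> list[str]:
    """Names of sample test cases that did not get AC (hypothetical check)."""
    return [
        n.name
        for n in leaves(root)
        if n.name.startswith("data/sample/") and n.verdict != "AC"
    ]


tree = ResultNode("data", "WA", [
    ResultNode("data/sample", "WA", [
        ResultNode("data/sample/1", "AC"),
        ResultNode("data/sample/2", "WA"),
    ]),
    ResultNode("data/secret", "AC", [
        ResultNode("data/secret/1", "AC"),
    ]),
])

sample_failures = failing_sample_cases(tree)
```

The aggregated verdict alone ("WA") would not say where the failure happened; walking the tree does.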

Fixes #397
Fixes #383
Progress on #382
Progress on #379
Fixes #244

pehrsoderman

Very hard to review, but I can't find any obvious problems.

gkreitz added 8 commits April 24, 2026 17:32

Add mildly cleaned up grader to judge/

ba7e571

Add new implementation of judging

2a2f038

Clean up old unused methods all_datasets()

0a6b8ca

Plug in new SubmissionJudge to run submissions (temporarily breaking some of the output)

ee88f21

Rip out old code for running submissions from verifyproblem (replaced by problemtools.judge)

d1ea37d

Remove TimeLimits - time limits are now just a simple float

5595247

Clean up flow a bit when there are no test cases to run on

5e30b00

Rename test_item => test_node which is at least a tad more descriptive

3714f55

sentry Bot reviewed Apr 27, 2026

Comment thread problemtools/judge/submission_judge.py Outdated

gkreitz added 4 commits April 27, 2026 14:26

Restore details when printing a SubmissionResult

8f59661

Repair "Running sub on testcase..." message when stdout is a tty

9589d67

Refactor - extract _aggregate_group_result from _judge_group to simplify function

4191fae

Extract checks to simplify check_submission a bit

c6b61e1

gkreitz force-pushed the 398_refactor_submission_runs branch from bcba5b0 to c6b61e1 April 27, 2026 12:37

Avoid deadlock in multithreaded runs if worker crashes

ca58ac9

gkreitz mentioned this pull request Apr 27, 2026

Allow symlinks to test cases with other output validator flags and… #276

Closed

pehrsoderman approved these changes Apr 27, 2026

pehrsoderman merged commit f8cdc4b into Kattis:master Apr 27, 2026
5 checks passed

This was referenced Apr 27, 2026

Refactor verifyproblem.py #398

Open

Don't propagate JE on empty test groups #176

Closed

pehrsoderman merged 13 commits into Kattis:master from gkreitz:398_refactor_submission_runs
