gh-146073: Add fitness/exit quality mechanism for JIT trace frontend#148089
gh-146073: Add fitness/exit quality mechanism for JIT trace frontend#148089markshannon merged 38 commits intopython:mainfrom
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
|
It appears that the current parameters do not yet guarantee runtime safety; I will continue to work on fixes and optimizations. |
|
I've commented on the issue #146073 (comment) |
|
I ran some tests on macOS, and performance on the fitness branch appears to have dropped significantly. use fastbench: Machine: main branch: fitness branch: |
|
We seem to be going around in circles a bit here. @cocolato can you try out this script https://github.com/python/cpython/pull/148840/changes#diff-7d8d989c9e02ccababda3709e44e2465010f9aa25843f4764e4e742adcfaf39b to see if it offers any insight? I don't know if you can extract some of the key features of the benchmarks that are slower to find out why? |
|
Regarding performance. We also need to consider the interplay between trace fitness/length and warmup. Ideally we want to cover the hot part of the program fairly quickly, not trace any cold parts and not cover the same piece of code with multiple traces unless there is genuine polymorphism. Easier said than done though. I would prefer good traces, even it appears a little slower on one or two benchmarks and the performance is more likely to be consistent. |
|
@markshannon I run the new tests, this is the result:
So I think we should reduce |
|
Can you tell why richards is so different? I don't see how reducing Also, instead of reducing the fitness for every uop, only start decreasing after the trace is getting long but decrease it more rapidly in that case? We could add the fitness to the dumps, for more information. Once again, thanks for doing this. |
|
This comment was marked as outdated.
This comment was marked as outdated.
|
By setting
Here, the trace hasn’t reached the end of the loop yet, but due to a
|
|
I tried reducing their penalty and lowered
|
|
@markshannon gentle ping |
|
What are all the ovals with "executor_0x..." in the image for richards on this PR?
Why does lowering |
|
I think we should merge this. There are a number of flaws in implementation, largely down to lack of visibility of fitness, but there are also many flaws in the current ad-hoc approach on main. Once this is merged, we can more easier investigate issues and improve the fitness, quality and penalty numbers. The two things that we need to fix in order to fine tune the numbers are these:
With that in mind, we should:
All these can be done in different PRs. |
Sorry, They are side-exit executors because I skipped rendering them for debug: https://github.com/cocolato/cpython/blob/8ce4b53fc649dfe0e92d5ca0dc71db40ca1feacb/Python/optimizer.c#L2208-L2210 |
I lowered both branch penalty and underflow frame penalty. Lowering |
|
Never mind, I'm being dumb. By lowering So it is sort of equivalent to raising the initial fitness. |
|
Assuming the perf isnt too bad, I'm happy with the PR and I think we can merge it |
markshannon
left a comment
There was a problem hiding this comment.
This looks good now and a clear improvement in terms of maintainability and understanding over the ad-hoc approach we had before.
This not only cleans up the code, but gives a way to get better traces while keeping the code maintainable.
Now that performance is roughly on par, we can merge this.
Reopen: #147966