Skip to content

HIVE-29479: Improve histogram-based selectivity estimation for two-sided range predicates#6477

Open
rubenada wants to merge 7 commits into
apache:masterfrom
rubenada:HIVE-29479
Open

HIVE-29479: Improve histogram-based selectivity estimation for two-sided range predicates#6477
rubenada wants to merge 7 commits into
apache:masterfrom
rubenada:HIVE-29479

Conversation

@rubenada
Copy link
Copy Markdown

See HIVE-29479.

What changes were proposed in this pull request?

This PR adapts FilterSelectivityEstimator so that histogram statistics are used for all types of range predicates (so far it was only done for single-sided range predicates and bounded ranges).

Why are the changes needed?

This PR allows the CBO planner to use histogram statistics for all types of range predicates.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests added.

@rubenada rubenada marked this pull request as ready for review May 12, 2026 13:44
@rubenada rubenada changed the title [WIP] HIVE-29479: Improve histogram-based selectivity estimation for two-sided range predicates HIVE-29479: Improve histogram-based selectivity estimation for two-sided range predicates May 12, 2026
Copy link
Copy Markdown
Contributor

@soumyakanti3578 soumyakanti3578 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, but maybe we should add tests for IS [NOT] NULL with SEARCH, and also a couple of tests for SEARCH containing both ranges and points?

@soumyakanti3578
Copy link
Copy Markdown
Contributor

Please go through the Sonar issues as well. I think some of them are good suggestions. :)

Copy link
Copy Markdown
Contributor

@thomasrebele thomasrebele left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! The overall approach looks good. I've made a few suggestions and requests.

selectivityList.add(rexBuilder.makeCall(HiveIn.INSTANCE, operands).accept(FilterSelectivityEstimator.this));
}

return selectivityList.size() == 1 ? selectivityList.get(0) : computeDisjunctionSelectivity(selectivityList);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand computeDisjunctionSelectivity, it combines the selectivities by multiplying their complements. To use the information from the histogram, we would need to add the selectivities (but making sure we stay within the possible values [0,1]).
I wonder how accurate estimating selectivity of individual values are compared to the ranges. This also depends on the type of histogram (I had created a ticket about the shortcoming of KLL) . I think we can just add the selectivities for individual values and ranges together, and revisit this in the future in case of need.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I have applied the change to add the ranges selectivities (and it gives better results in certain tests, as expected).
I have kept the "disjunction logic" for the combination with the other expressions (EQ/IN, IS_NULL), since it seems more aligned with how this class works in general when computing OR-combined RexNodes. But I agree this can be revisits in the future.

rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ref).accept(FilterSelectivityEstimator.this));
}

RangeSets.forEach(sarg.rangeSet, new RangeSets.Consumer<C>() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the code could be simplified with for (Range<C> range : sarg.rangeSet.asRanges()) { ... }? It might be possible to treat a range as-is, without differentiating all the different kinds of ranges.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I have applied the suggested refactoring.

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants