fix: pass elem writer to JSON array membership dialect methods#20
Open
richardwooding wants to merge 1 commit into
Open
fix: pass elem writer to JSON array membership dialect methods#20richardwooding wants to merge 1 commit into
richardwooding wants to merge 1 commit into
Conversation
The two Dialect methods writeJSONArrayMembership and writeNestedJSONArrayMembership previously received only the array expression, forcing the caller (Converter.visitInJSONArray) to emit `elem = <dialect-output>` inline. This worked accidentally for PostgreSQL (ANY(ARRAY(...))), but was broken for every other dialect: - SQLite/DuckDB emitted a scalar subquery that silently returns the last row for multi-element arrays. - BigQuery emitted `= UNNEST(...)` which is invalid; BigQuery requires `IN UNNEST(...)`. - MySQL emitted `elem = JSON_CONTAINS(arr, CAST(? AS JSON))`, comparing the element to 0/1 rather than testing membership. - Spark had to throw ConversionException at conversion time because it couldn't build a correct boolean predicate without the element. The fix widens both method signatures to accept a `writeElem` writer alongside the existing `writeArray` writer. The converter no longer emits `elem = ` itself; each dialect now owns the full predicate: PostgreSQL: elem = ANY(ARRAY(SELECT jsonFunc(arr))) [unchanged semantics] MySQL: JSON_OVERLAPS(JSON_ARRAY(elem), arr) [consistent with writeArrayMembership] SQLite: EXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem) DuckDB: EXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem) BigQuery: elem IN UNNEST(JSON_VALUE_ARRAY(arr)) Spark: array_contains(from_json(arr, 'ARRAY<STRING>'), elem) The Spark test that asserted the throw is replaced with an assertion that the correct SQL is emitted. Co-authored-by: Claude <claude@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Dialect.writeJSONArrayMembershipandDialect.writeNestedJSONArrayMembershipreceived only the array expression. The caller (Converter.visitInJSONArray) emittedelem = <dialect-output>inline, which worked accidentally for PostgreSQL but was silently broken for every other dialect:elem = ANY(ARRAY(SELECT jsonb_array_elements_text(arr)))elem = JSON_CONTAINS(arr, CAST(? AS JSON))— compares elem to 0/1elem = (SELECT value FROM json_each(arr))— scalar subquery returns last row for multi-element arrayselem = (SELECT value FROM json_each(arr))— same multi-row scalar subquery bugelem = UNNEST(JSON_VALUE_ARRAY(arr))—= UNNEST()is invalid syntaxConversionException— couldn't build a boolean predicate without the elementThe Spark throw (added in PR #10) was the canary that revealed the contract was broken: it needed the element to emit
array_contains(...)but the interface didn't provide it.Fix
Widen both method signatures to accept
writeElemalongsidewriteArray. The converter no longer emitselem =itself; each dialect now owns the full boolean predicate.New signatures:
Generated SQL per dialect (after)
elem = ANY(ARRAY(SELECT jsonb_array_elements_text(arr)))JSON_OVERLAPS(JSON_ARRAY(elem), arr)— consistent withwriteArrayMembershipEXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem)EXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem)elem IN UNNEST(JSON_VALUE_ARRAY(arr))array_contains(from_json(arr, 'ARRAY<STRING>'), elem)Changes
Dialect.java— interface signature widened (breaking change for external implementors)Converter.java::visitInJSONArray— removed inlineelem =prefix, passes() -> visit(elem)as new argCel2SqlSparkTest.jsonArrayMembershipThrowsClearlyrenamed tojsonArrayMembershipEmitsArrayContainsand rewritten to assert correct SQL instead of asserting a throwTesting
./gradlew buildpasses (compile + all unit tests green).Note: there are no existing unit tests that exercise the JSON-array-membership code path via
visitInJSONArray(those paths require aSchemaor ajsonVariablesoption to route throughvisitInJSONArray). The Spark dialect-level test directly exercises the new method signature. Integration tests for full end-to-end coverage of the MySQL/SQLite/DuckDB paths can be added in a follow-up.Generated by Claude Code