Skip to content

fix: pass elem writer to JSON array membership dialect methods#20

Open
richardwooding wants to merge 1 commit into
mainfrom
feat/dialect-json-membership-elem
Open

fix: pass elem writer to JSON array membership dialect methods#20
richardwooding wants to merge 1 commit into
mainfrom
feat/dialect-json-membership-elem

Conversation

@richardwooding
Copy link
Copy Markdown
Contributor

Problem

Dialect.writeJSONArrayMembership and Dialect.writeNestedJSONArrayMembership received only the array expression. The caller (Converter.visitInJSONArray) emitted elem = <dialect-output> inline, which worked accidentally for PostgreSQL but was silently broken for every other dialect:

Dialect Generated SQL (before) Correct?
PostgreSQL elem = ANY(ARRAY(SELECT jsonb_array_elements_text(arr)))
MySQL elem = JSON_CONTAINS(arr, CAST(? AS JSON)) — compares elem to 0/1
SQLite elem = (SELECT value FROM json_each(arr)) — scalar subquery returns last row for multi-element arrays
DuckDB elem = (SELECT value FROM json_each(arr)) — same multi-row scalar subquery bug
BigQuery elem = UNNEST(JSON_VALUE_ARRAY(arr))= UNNEST() is invalid syntax
Spark throws ConversionException — couldn't build a boolean predicate without the element

The Spark throw (added in PR #10) was the canary that revealed the contract was broken: it needed the element to emit array_contains(...) but the interface didn't provide it.

Fix

Widen both method signatures to accept writeElem alongside writeArray. The converter no longer emits elem = itself; each dialect now owns the full boolean predicate.

New signatures:

void writeJSONArrayMembership(StringBuilder w, String jsonFunc, SqlWriter writeElem, SqlWriter writeArray) throws ConversionException;
void writeNestedJSONArrayMembership(StringBuilder w, SqlWriter writeElem, SqlWriter writeArray) throws ConversionException;

Generated SQL per dialect (after)

Dialect Generated SQL
PostgreSQL elem = ANY(ARRAY(SELECT jsonb_array_elements_text(arr)))
MySQL JSON_OVERLAPS(JSON_ARRAY(elem), arr) — consistent with writeArrayMembership
SQLite EXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem)
DuckDB EXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem)
BigQuery elem IN UNNEST(JSON_VALUE_ARRAY(arr))
Spark array_contains(from_json(arr, 'ARRAY<STRING>'), elem)

Changes

  • Dialect.java — interface signature widened (breaking change for external implementors)
  • Converter.java::visitInJSONArray — removed inline elem = prefix, passes () -> visit(elem) as new arg
  • All 6 dialect implementations updated
  • Cel2SqlSparkTest.jsonArrayMembershipThrowsClearly renamed to jsonArrayMembershipEmitsArrayContains and rewritten to assert correct SQL instead of asserting a throw

Testing

./gradlew build passes (compile + all unit tests green).

Note: there are no existing unit tests that exercise the JSON-array-membership code path via visitInJSONArray (those paths require a Schema or a jsonVariables option to route through visitInJSONArray). The Spark dialect-level test directly exercises the new method signature. Integration tests for full end-to-end coverage of the MySQL/SQLite/DuckDB paths can be added in a follow-up.


Generated by Claude Code

The two Dialect methods writeJSONArrayMembership and
writeNestedJSONArrayMembership previously received only the array
expression, forcing the caller (Converter.visitInJSONArray) to
emit `elem = <dialect-output>` inline. This worked accidentally for
PostgreSQL (ANY(ARRAY(...))), but was broken for every other dialect:

- SQLite/DuckDB emitted a scalar subquery that silently returns the
  last row for multi-element arrays.
- BigQuery emitted `= UNNEST(...)` which is invalid; BigQuery requires
  `IN UNNEST(...)`.
- MySQL emitted `elem = JSON_CONTAINS(arr, CAST(? AS JSON))`,
  comparing the element to 0/1 rather than testing membership.
- Spark had to throw ConversionException at conversion time because
  it couldn't build a correct boolean predicate without the element.

The fix widens both method signatures to accept a `writeElem` writer
alongside the existing `writeArray` writer. The converter no longer
emits `elem = ` itself; each dialect now owns the full predicate:

  PostgreSQL:  elem = ANY(ARRAY(SELECT jsonFunc(arr)))  [unchanged semantics]
  MySQL:       JSON_OVERLAPS(JSON_ARRAY(elem), arr)     [consistent with writeArrayMembership]
  SQLite:      EXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem)
  DuckDB:      EXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem)
  BigQuery:    elem IN UNNEST(JSON_VALUE_ARRAY(arr))
  Spark:       array_contains(from_json(arr, 'ARRAY<STRING>'), elem)

The Spark test that asserted the throw is replaced with an assertion
that the correct SQL is emitted.

Co-authored-by: Claude <claude@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants