SPANDigital · richardwooding · Apr 28, 2026 · Apr 28, 2026
diff --git a/.claude/skills/add-cel-feature/SKILL.md b/.claude/skills/add-cel-feature/SKILL.md
@@ -0,0 +1,115 @@
+---
+name: add-cel-feature
+description: Adds a new CEL function, operator, macro, or comprehension form to pycel2sql by extending Converter visit methods in src/pycel2sql/_converter.py, deciding whether a new Dialect ABC abstract method is needed, implementing it across all six dialects when SQL diverges, and adding parametrized test cases. Use when wiring a new CEL string method (toLowerCase, isAlpha), timestamp method (toISOString), free-function helper, operator overload, or comprehension form across PostgreSQL, MySQL, SQLite, DuckDB, BigQuery, and Apache Spark.
+---
+
+# Add CEL Feature
+
+Adding a new CEL surface area (function, operator, macro, comprehension) is the second-most-common contribution shape after adding a dialect. The work is the same shape every time — this skill captures it.
+
+Recent examples in pycel2sql's history: per-dialect `format()` dispatch (PR #8), `validate_schema` parameter (PR #1), `json_variables` / `column_aliases` / `param_start_index` options (PR #8). Each followed the pattern below.
+
+## Quick start
+
+1. **Classify the feature** — function call (e.g. `string.lowerAscii()`), operator (e.g. `&&`), macro (e.g. `has(x.y)`), or comprehension (e.g. `list.exists(v, pred)`).
+2. **Identify the visitor** in `_converter.py` that owns the surface area — see [references/converter-file-map.md](references/converter-file-map.md).
+3. **Decide whether a new `Dialect` ABC method is needed** — see "New method or inline?" below.
+4. **If a new method is needed**, walk the dialect-method checklist at `.claude/skills/add-sql-dialect/references/dialect-method-checklist.md` (shared between both skills — open it directly).
+5. **Add a test case** to the appropriate `tests/test_*.py` with parametrized coverage across `ALL_DIALECTS` from `tests/conftest.py`.
+6. **Run** `uv run pytest tests/ --ignore=tests/integration -v` and `uv run ruff check src/ tests/`.
+
+## New method or inline?
+
+The `Dialect` ABC at `src/pycel2sql/dialect/_base.py` is already large (~50 abstract methods). Add a new method only when needed; otherwise inline the SQL in the visitor.
+
+| Situation | Verdict |
+|---|---|
+| The SQL is identical across all six dialects | Inline in the visitor (e.g. `LOWER(x)` for `lowerAscii`, `MOD(a, b)` for `%`). |
+| Any dialect needs different syntax | New `Dialect` ABC `@abstractmethod`. |
+| A subset of dialects can't support the feature at all | New method + each unsupported dialect raises `UnsupportedDialectFeatureError` in its implementation. |
+| The feature is JSON-related, regex-related, or array-related | Look for an existing `write_…` method first — JSON / array / regex method coverage is broad. |
+
+If you add an `@abstractmethod`, the **`Dialect` ABC enforces it at instantiation time** — calling `<Dialect>()` with any abstract method missing raises `TypeError: Can't instantiate abstract class`. CI's `tests/conftest.py` instantiates every dialect in `ALL_DIALECTS`, so a missing method is caught on the first test collection. mypy strict mode also flags missing methods.
+
+## Where features live
+
+`_converter.py` is large (~80 KB / ~2000 lines) and uses Lark `Interpreter` visitor methods. The grammar rule names map to method names — `member_dot_arg` for method calls (`obj.method(args)`), `ident_arg` for free-function calls (`func(args)`), `addition_add` for `+`, `relation_eq` for `==`, etc.
+
+For a method-call feature like `string.lowerAscii()`:
+- Entry point is `member_dot_arg` (`_converter.py:607`).
+- The method name dispatches via the long if/elif chain at lines 619–691.
+- `LOWER(...)` is universal SQL → inline (no new `Dialect` method).
+
+For a method-call feature like `string.format([args])`:
+- Entry point is also `member_dot_arg`.
+- SQL diverges — Postgres/BigQuery use `FORMAT`, SQLite/DuckDB use `printf`, Spark uses `format_string`, MySQL has no equivalent → new `Dialect.write_format()` method (added in PR #8). Each dialect implements its own form; MySQL raises.
+
+For a free-function feature like `has(x.y)`:
+- Entry point is `ident_arg` (`_converter.py:757`).
+- Dispatch is via the `func_name` if/elif at lines 763–820.
+- `has()` is dialect-divergent (Postgres JSONB `?` operator vs others' `IS NOT NULL`) → uses `Dialect.write_json_existence`.
+
+For an operator like `+`:
+- Entry point is `addition` (`_converter.py:415`).
+- Operator-name dispatch via `op_name` (`addition_add` / `addition_sub`).
+- Cross-dialect handling routes through `Dialect.write_string_concat` for string contexts, plain `+` for numeric.
+
+For a macro like `exists` / `all` / `filter` / `map`:
+- Entry point is `member_dot_arg`, but the chain immediately routes to `_visit_comprehension` (`_converter.py:615`).
+- Comprehensions use the `Dialect` `write_unnest` / `write_array_subquery_open` / `write_array_subquery_expr_close` triple plus per-method scaffolding.
+
+For the file-by-file map of which CEL surface area each section of `_converter.py` owns, see [references/converter-file-map.md](references/converter-file-map.md).
+
+## Parser-level vs visitor-level
+
+CEL features that pycel2sql can support are limited by what `cel-python`'s Lark grammar parses. If the grammar doesn't recognise a syntax (e.g. some CEL-extension functions), the feature can't be added at the visitor level — `_parser.parse(cel_expr)` will fail before reaching the converter. Test the parse first:
+
+```python
+from celpy.celparser import CELParser
+CELParser().parse('"hello".isAlpha()')  # raises CELParseError if unsupported
+```
+
+If parsing fails, the feature requires upstream work in `cel-python`, not in pycel2sql.
+
+## Test coverage
+
+Add a representative test to the appropriate file in `tests/`:
+
+| CEL surface | Test file |
+|---|---|
+| String methods | `tests/test_string_functions.py` |
+| Timestamps / durations | `tests/test_timestamps.py` |
+| JSON access / `has()` | `tests/test_json.py` |
+| Comprehensions (`exists`, `all`, `filter`, `map`) | `tests/test_comprehensions.py` |
+| Type casts (`int`, `string`, `bool`, etc.) | `tests/test_convert.py` (general) or a dialect-specific file |
+| Universal SQL (operators, comparisons) | `tests/test_dialect_parametrized.py` |
+| Parameterized output | `tests/test_parameterized.py` |
+| New conversion options | `tests/test_options.py` |
+
+When the SQL is identical across dialects, parametrize over `ALL_DIALECTS` from `tests/conftest.py`:
+
+```python
+import pytest
+from tests.conftest import ALL_DIALECTS
+
+@pytest.mark.parametrize("dialect", ALL_DIALECTS)
+def test_lower_ascii(dialect):
+    assert convert("name.lowerAscii()", dialect=dialect) == "LOWER(name)"
+```
+
+When the SQL diverges, write per-dialect cases (mirror `tests/test_string_functions.py::TestFormatPerDialect`).
+
+## Verification
+
+```bash
+uv run ruff check src/ tests/
+uv run mypy src/pycel2sql/  # pre-existing lark errors expected
+uv run pytest tests/ --ignore=tests/integration -v
+
+python .claude/skills/skill-authoring/scripts/lint_skill.py .claude/skills/add-cel-feature/
+```
+
+## References
+
+- [references/converter-file-map.md](references/converter-file-map.md) — which section of `_converter.py` owns which CEL surface area, with line-number anchors.
+- The `Dialect` ABC method checklist lives in the `add-sql-dialect` skill at `.claude/skills/add-sql-dialect/references/dialect-method-checklist.md` — when you add a new `@abstractmethod`, open and update that file too.
diff --git a/.claude/skills/add-cel-feature/references/converter-file-map.md b/.claude/skills/add-cel-feature/references/converter-file-map.md
@@ -0,0 +1,174 @@
+# Converter File Map
+
+A map of which CEL surface area each section of `src/pycel2sql/_converter.py` owns. Line numbers are approximate (the file evolves) — search by method name to find the current location.
+
+## Contents
+
+- Top-level entry
+- Logical operators
+- Comparison operators
+- `in` operator
+- Arithmetic
+- Unary
+- Member access (dot / index / dot-arg)
+- Identifiers and free-function calls
+- Literals
+- Compound literals (lists, maps)
+- String methods (contains, startsWith, endsWith, etc.)
+- Size, has, type cast
+- Timestamp / duration
+- Comprehensions
+
+## Top-level entry
+
+| Method | Owns |
+|---|---|
+| `expr` | `?:` ternary; otherwise pass-through to single child. |
+| `visit` | Override of Lark `Interpreter.visit` to handle bare `Token` children. |
+| `_visit_child` | Depth-tracking visitor; every internal recursion goes through this. |
+
+## Logical operators
+
+| Method | Owns |
+|---|---|
+| `conditionalor` | `lhs \|\| rhs` → `lhs OR rhs`. |
+| `conditionaland` | `lhs && rhs` → `lhs AND rhs`. |
+
+## Comparison operators
+
+The `relation` method dispatches to operator-specific helpers. Each `relation_<op>` method handles null-aware emission (`IS NULL` / `IS NOT NULL` / `IS TRUE` / etc.) and JSON-text-extraction numeric coercion.
+
+| Method | Owns |
+|---|---|
+| `relation` | Dispatches to per-operator helpers. |
+| `relation_eq`, `relation_ne` | `==`, `!=`. Handles null/bool comparisons specially. |
+| `relation_lt`, `relation_le`, `relation_gt`, `relation_ge` | `<`, `<=`, `>`, `>=`. |
+| `relation_in` | `x in [list]` / `x in arr`; routes through `_visit_in` and `Dialect.write_array_membership`. |
+
+## `in` operator
+
+| Method | Owns |
+|---|---|
+| `_visit_in` | Routes `x in arr` to `Dialect.write_array_membership`. Currently does not yet route JSON-array membership through the dedicated `Dialect.write_json_array_membership` method (those abstract methods exist in the ABC but no call site invokes them yet). |
+
+## Arithmetic
+
+| Method | Owns |
+|---|---|
+| `addition` | `+`, `-` dispatch. **Critical**: timestamp-arithmetic check happens BEFORE string-concat check (temporal functions contain string literals; CLAUDE.md notes this). |
+| `addition_add`, `addition_sub` | Trivial pass-throughs (the work is in `addition`). |
+| `multiplication` | `*`, `/`, `%` dispatch. `%` becomes `MOD(a, b)` (universal SQL). |
+| `multiplication_mul`, `multiplication_div`, `multiplication_mod` | Trivial pass-throughs. |
+
+## Unary
+
+| Method | Owns |
+|---|---|
+| `unary`, `unary_not`, `unary_neg` | `!x` → `NOT x`; `-x` → `-x`. |
+
+## Member access
+
+The most-trafficked area. Three entry points:
+
+| Method | Owns |
+|---|---|
+| `member` | Pass-through. |
+| `member_dot` | `obj.field` access. Routes through `_emit_json_path` for JSON-path access (schema-declared JSON or `json_variables`); otherwise emits plain `.field`. Applies `validate_field_name`. |
+| `member_dot_arg` | `obj.method(args)` — the **method-call dispatcher**. The long if/elif chain checks `method_name` and routes to the appropriate `_visit_<feature>` helper. New string/timestamp/comprehension methods land here. |
+| `member_index` | `obj[idx]` and `obj["key"]`. String literal index → JSON path (when root is a `json_variable`) or `.key` plain access. Integer index → `Dialect.write_list_index_const`. Dynamic index → `Dialect.write_list_index`. |
+| `member_object` | Type construction `T{f: v}` (rare). |
+
+When adding a new CEL **method** like `string.toLowerCase()`, the entry point is the `member_dot_arg` if/elif chain.
+
+## Identifiers and free-function calls
+
+| Method | Owns |
+|---|---|
+| `primary` | Pass-through. |
+| `ident` | Bare identifier emission. Applies `column_aliases` (PR #8) before `validate_field_name`. |
+| `ident_arg` | `func(args)` — the **free-function-call dispatcher**. Long if/elif chain on `func_name`. Owns `has`, `size`, `matches`, `int`/`uint`/`double`/`string`/`bool`/`bytes`/`timestamp`/`duration` casts, `now()`, `getInterval()`, `dyn()`. |
+| `dot_ident_arg`, `dot_ident` | `.name` (rare CEL syntax). |
+| `paren_expr` | `(expr)` parenthesisation. |
+
+When adding a new CEL **free function** like `format(...)` or `has(...)`, the entry point is the `ident_arg` if/elif chain.
+
+## Literals
+
+| Method | Owns |
+|---|---|
+| `literal` | All scalar literals — strings, ints, uints, floats, bools, null, bytes. Routes through dialect-specific `write_string_literal` / `write_bytes_literal`. Parameterized mode replaces literals with placeholders via `_add_param`. |
+
+## Compound literals
+
+| Method | Owns |
+|---|---|
+| `list_lit` | `[1, 2, 3]` array literals — calls `Dialect.write_array_literal_open` / `write_array_literal_close`. |
+| `map_lit` | `{a: 1, b: 2}` map/struct literals — calls `Dialect.write_struct_open` / `write_struct_close`. |
+| `exprlist` | Comma-separated expression list (used inside list/map/format args). |
+| `mapinits` | Map initialiser list. |
+| `fieldinits` | Struct initialiser list. |
+
+## String methods
+
+Each method lives in a `_visit_<name>` helper. Most route through a `Dialect.write_<name>` method to absorb dialect divergence.
+
+| Helper | CEL surface | Notes |
+|---|---|---|
+| `_visit_contains` | `s.contains(needle)` | → `Dialect.write_contains`. |
+| `_visit_starts_with` | `s.startsWith(prefix)` | → `LIKE 'prefix%'` + dialect-specific ESCAPE clause. |
+| `_visit_ends_with` | `s.endsWith(suffix)` | → `LIKE '%suffix'`. |
+| `_visit_matches_method` | `s.matches(pattern)` | → `Dialect.convert_regex` + `write_regex_match`. |
+| `_visit_matches_func` | `matches(s, pattern)` | Same as above, function form. |
+| `_visit_char_at` | `s.charAt(idx)` | → `SUBSTRING(s, idx + 1, 1)`. |
+| `_visit_index_of` | `s.indexOf(n)` / `s.indexOf(n, off)` | Inline `POSITION(...)` (universal). |
+| `_visit_last_index_of` | `s.lastIndexOf(n)` | Inline. |
+| `_visit_substring` | `s.substring(start)` / `s.substring(start, end)` | `SUBSTRING(...)`. |
+| `_visit_replace` | `s.replace(old, new)` | `REPLACE(...)`. |
+| `_visit_split` | `s.split(d)` / `s.split(d, n)` | → `Dialect.write_split` / `write_split_with_limit`. |
+| `_visit_join` | `arr.join(d)` | → `Dialect.write_join`. |
+| `_visit_format` | `s.format([args])` | → `Dialect.write_format` (added in PR #8). The `%d`/`%f` → `%s` normalization happens unconditionally before dispatch. |
+
+## Size, has, type cast
+
+| Helper | CEL surface | Notes |
+|---|---|---|
+| `_visit_size_method` | `x.size()` | Routes to `Dialect.write_array_length` for arrays, plain `LENGTH(x)` for strings. |
+| `_visit_size_func` | `size(x)` | Same as above, function form. |
+| `_visit_has` | `has(a.b)` / `has(jsonvar.k)` | Routes to `Dialect.write_json_existence` for JSON paths; otherwise emits `... IS NOT NULL`. The `json_variables` audit (PR #8) added the JSON-variable branch. |
+| `_visit_type_cast` | `int(x)`, `uint(x)`, `double(x)`, `string(x)`, `bool(x)`, `bytes(x)` | Special-case for `int(timestamp)` → `Dialect.write_epoch_extract`. Other casts emit `CAST(x AS <dialect_type>)` via `Dialect.write_type_name`. |
+
+## Timestamp / duration
+
+| Helper | CEL surface | Notes |
+|---|---|---|
+| `_visit_timestamp_func` | `timestamp("...")` | → `Dialect.write_timestamp_cast`. |
+| `_visit_duration_func` | `duration("...")` | Parses Go-style duration string (`24h`, `30m`, `1500ms`). Limits: only h/m/s/ms/us/ns recognised. → `Dialect.write_duration`. |
+| `_visit_interval_func` | `interval(value, unit)` | Dynamic-value variant. → `Dialect.write_interval`. |
+| `_visit_datetime_constructor` | `date(...)`, `time(...)`, etc. | Constructor-style date/time helpers. |
+| `_visit_current_datetime` | `now()`, `today()` | → `CURRENT_TIMESTAMP` / `CURRENT_DATE` (universal). |
+| `_visit_timestamp_extract` | `t.getFullYear()`, `t.getMonth()`, `t.getDayOfWeek()`, etc. | Routes to `Dialect.write_extract`. **Critical**: DOW maps differ (Sunday=0 vs Sunday=1) — handled per-dialect in `write_extract`. |
+
+## Comprehensions
+
+| Helper | CEL surface |
+|---|---|
+| `_visit_comprehension` | Top-level dispatch on macro name. |
+| `_visit_comp_all` | `list.all(v, pred)` |
+| `_visit_comp_exists` | `list.exists(v, pred)` |
+| `_visit_comp_exists_one` | `list.exists_one(v, pred)` |
+| `_visit_comp_map` | `list.map(v, transform)` |
+| `_visit_comp_map_filter` | `list.map(v, pred, transform)` |
+| `_visit_comp_filter` | `list.filter(v, pred)` |
+
+All comprehension helpers use the `Dialect.write_unnest` + `write_array_subquery_open` + `write_array_subquery_expr_close` triple. Spark's variant (`(SELECT collect_list(...))`) needs the close to emit `)` because `collect_list` is an aggregator wrapper; other dialects close with empty.
+
+## Common patterns
+
+When adding a new CEL feature, the workflow is almost always:
+
+1. Find the entry point in `member_dot_arg` (for methods) or `ident_arg` (for free functions).
+2. Add an `elif method_name == "<name>": self._visit_<name>(obj, args); return`.
+3. Add a `_visit_<name>` helper near the existing string-method helpers (or wherever fits semantically).
+4. Decide universal vs dialect-divergent (see SKILL.md "New method or inline?").
+5. Add a parametrized test in the appropriate file under `tests/`.
+6. If a new `Dialect` `@abstractmethod` is added, update every dialect file AND the dialect-method-checklist in the `add-sql-dialect` skill.