Skip to content

Scope the primary-key → _id rename to the outermost document#204

Open
scottmessinger wants to merge 2 commits into
masterfrom
scope-pk-rename-to-top-level
Open

Scope the primary-key → _id rename to the outermost document#204
scottmessinger wants to merge 2 commits into
masterfrom
scope-pk-rename-to-top-level

Conversation

@scottmessinger
Copy link
Copy Markdown
Collaborator

The MongoDB convention of renaming the schema's primary-key field to
_id is currently applied at every level of recursion. Once
Conversions.from_ecto_pk/2 descends into a nested map (for example an
embeds_one sub-document), the parent schema's pk is still threaded
through, so any nested field named :id is also rewritten to :_id.

That means an embedded value like

%{jurisdiction: %{id: "MD", title: "Maryland"}}

gets stored as

{ jurisdiction: { _id: "MD", title: "Maryland" } }

even when the embedded schema declared its own
@primary_key {:id, :string, autogenerate: false}. Queries that filter
on jurisdiction.id (the common shape for apps that put a string
identifier on a sub-document) return nothing, and the wire format
diverges from what other Mongo clients (the Ruby driver, the Node
driver, plain mongosh inserts) would write for the same model.

The pk → _id rename is only meaningful at the top of a document.
Below that, the field names are user-controlled and should be passed
through unchanged. This patch makes the recursive calls inside
document/2 and document/3 pass nil for the pk, so the rename
only fires for the outermost document.

Repro (before this patch)

defmodule MySet do
  use Ecto.Schema
  @primary_key {:id, :string, autogenerate: false}
  schema "my_sets" do
    field :title, :string
    embeds_one :jurisdiction, Jurisdiction
  end

  defmodule Jurisdiction do
    use Ecto.Schema
    @primary_key {:id, :string, autogenerate: false}
    embedded_schema do
      field :title, :string
    end
  end
end

%MySet{}
|> Ecto.Changeset.cast(%{id: "X", title: "t", jurisdiction: %{id: "MD", title: "Maryland"}}, [:id, :title])
|> Ecto.Changeset.cast_embed(:jurisdiction)
|> Repo.insert!()

Mongo.Ecto.command(Repo, find: "my_sets", filter: %{})
# => firstBatch with: %{"_id" => "X", "jurisdiction" => %{"_id" => "MD", ...}}

After the patch the nested key is id, not _id.

Read path

Conversions.to_ecto_pk/2 (the read path) is left as-is for now. Its
rename of nested "_id"Atom.to_string(pk) only matters when a
sub-document actually contains an _id field. Mongo only auto-creates
_id on top-level documents, so this is rare in practice. A symmetric
read-side change can follow separately if there's appetite — happy to
add it to this PR if you'd prefer.

Tests

The existing mongodb_ecto test suite passes against this change.
Downstream verification: the CSP API
(commonstandardsproject/api, vendored copy at elixir/vendor_deps/mongodb_ecto/)
goes from 17 broken read-side contract tests to a clean run of 34/34
once jurisdiction.id queries stop missing.

The MongoDB convention of renaming the schema's primary-key field to
`_id` was being applied recursively. Once `Conversions.from_ecto_pk/2`
descended into a nested map (for example an `embeds_one` sub-document),
the parent schema's `pk` value was still threaded through, so any
nested field named `:id` was also rewritten to `:_id`.

That meant an embedded value like

    %{jurisdiction: %{id: "MD", title: "Maryland"}}

was stored as

    {jurisdiction: {_id: "MD", title: "Maryland"}}

even when the embedded schema declared its own `@primary_key {:id, :string,
autogenerate: false}`. Queries that filter on `jurisdiction.id` (the
common shape for apps that put a string identifier on a sub-document)
return nothing, and the wire format diverges from what other clients
(Ruby driver, Node driver, etc.) write for the same model.

The pk -> _id rename is only meaningful at the top of a document. Below
that, the field names are user-controlled and should be passed through
unchanged. This patch makes the recursive call inside `document/2` and
`document/3` pass `nil` for the pk argument, so the rename only fires
for the outermost document.

`Conversions.to_ecto_pk/2` (the read path) is left as-is for now. Its
rename of nested `"_id"` -> `Atom.to_string(pk)` only matters when a
sub-document actually contains an `_id` field, which is uncommon in
hand-written documents (Mongo only auto-adds `_id` to top-level docs).
A symmetric read-side change can follow separately if there's appetite.
Copilot AI review requested due to automatic review settings May 20, 2026 14:26
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes MongoDB serialization so the schema primary-key → _id rename only applies to the outermost document, preventing nested sub-documents (e.g., embeds_one) from having their own :id fields incorrectly rewritten to :_id.

Changes:

  • Update document/2 to recurse into nested values with pk = nil, avoiding accidental :id:_id rewrites inside embedded/nested documents.
  • Update document/3 (used via inject_params/3) with the same pk = nil recursion behavior for parameter injection paths.
  • Add explanatory inline comments describing the bug and rationale.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/mongo_ecto/conversions.ex
Pure unit tests on `Mongo.Ecto.Conversions.from_ecto_pk/2` and
`inject_params/3` covering the recursion fix:

- top-level `:id` is renamed to `:_id`
- a nested map's `:id` is left alone even when the parent pk is `:id`
- the same holds inside a list of nested maps
- `pk: nil` is a no-op

Verified by reverting `conversions.ex` to the pre-patch version: 3 of
the 5 tests fail (nested `:id` ends up as `:_id`). With the patch they
all pass.
Copy link
Copy Markdown
Collaborator

@mweidner037 mweidner037 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale sounds valid, though I am not familiar with the actual code.

I'm surprised we have not run into this issue before. It looks like our StandardSet.jurisdiction fields are at least saved correctly already (the DB has jurisdiction.id instead of jurisdiction._id).

I believe this counts as a breaking change, in case consumers depend on the old functionality (e.g., they renamed their embedded document IDs to _id).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants