fix(TermCache): log+skip anchorless orphan terms instead of throwing by sriram-atlan · Pull Request #2523 · atlanhq/atlan-java

sriram-atlan · 2026-05-22T06:24:54Z

Summary

When TermCache.refreshCache() encounters a single GlossaryTerm whose anchor relationship can't be resolved (e.g. all anchor edges are soft-deleted), it currently throws and aborts the entire cache initialisation. Every downstream test that depends on the cache then fails with an unrelated stack trace.

Wrap the resolve call in identityForAssetOrLog(), which catches IllegalStateException, logs a structured warning, and returns null. Both call sites (refreshCache and lookupById) treat null as "skip this term, continue with the rest of the cache." getIdentityForAsset itself still throws, which is the right contract for callers that actually need a resolved identity; the safe variant is opt-in at bulk-scan call sites.

Trigger

Daily Test (leangraph-test) workflow run 26269147160 had 10 failures, 5 of which were asset-import chunks (0 through 4), all failing with:

java.lang.IllegalStateException: Term found with no anchor: {
  "guid":"98ef065d-0a0c-449c-8e86-25f66e2b4199",
  "name":"move-catterm-1779369779-d92b04",
  "status":"ACTIVE",
  "attributes":{"qualifiedName":"V281NCOPyFunNTc096In9@OSO7b3NpeKujuYgwZCrW7", "name":"move-catterm-1779369779-d92b04"}
}
    at com.atlan.pkg.cache.TermCache.getIdentityForAsset(TermCache.kt:109)

Root cause (data side — not in this PR)

Direct probes against leangraph-test:

The orphan term IS in ES with __state = ACTIVE
Its anchor relationship in the entity API points to a glossary but with "relationshipStatus": "DELETED"
16 such move-* entities (4 glossaries + 6 terms + 6 categories) were residue from atlas-metastore's nightly dev-support/test-harness cron at 04:30 UTC, whose cleanup had partially failed
The atlan-java daily workflow at 04:53 UTC then picked them up
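
To make the orphan condition concrete, here is a minimal Kotlin sketch of the check the probes above amount to. TermRecord, AnchorEdge, and isAnchorlessOrphan are hypothetical names used only for illustration, not SDK or Atlas types; the only facts taken from the probes are the ACTIVE state and the DELETED relationshipStatus.

```kotlin
// Hypothetical, simplified view of an anchor edge as seen in the entity API.
data class AnchorEdge(val glossaryGuid: String, val relationshipStatus: String)

// Hypothetical, simplified term record: its state plus every anchor edge it has.
data class TermRecord(val guid: String, val state: String, val anchors: List<AnchorEdge>)

// A term counts as an anchorless orphan when it is ACTIVE but has no ACTIVE
// anchor edge. A term with an empty anchor list is treated as an orphan too.
fun isAnchorlessOrphan(term: TermRecord): Boolean =
    term.state == "ACTIVE" && term.anchors.none { it.relationshipStatus == "ACTIVE" }
```

Under these assumptions, the term from the stack trace would be flagged because its only anchor edge is DELETED, while a term with at least one ACTIVE anchor would not.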

The 16 orphans have been purged manually on the tenant to unblock the next workflow run. The cron-collision and harness cleanup gaps are tracked separately for follow-up:

Stagger workflow schedules so atlan-java + atlas-metastore test-harness don't overlap on the same shared tenant
Make test_glossary_qn_moves.py cleanup use ?deleteType=PURGE and run unconditionally on test failure
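
The second follow-up boils down to running cleanup in a finally-style block. The harness itself is Python; the sketch below shows the same pattern in Kotlin only for consistency with the SDK code. runThenPurge and the purge callback are hypothetical names, and the callback is assumed to issue the delete with ?deleteType=PURGE.

```kotlin
// Hypothetical helper: run a test body, then purge everything it created,
// whether the body returned normally or threw.
fun runThenPurge(
    createdGuids: List<String>,
    purge: (String) -> Unit, // assumed to issue a delete with ?deleteType=PURGE
    body: () -> Unit
) {
    try {
        body()
    } finally {
        for (guid in createdGuids) {
            // Keep purging the remaining entities even if one purge call fails.
            runCatching { purge(guid) }
        }
    }
}
```

The original test failure still propagates to the caller; only the cleanup becomes unconditional.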

But the SDK should not blow up everyone else's tests when it encounters a single tenant-side anomaly — that's what this PR addresses.

What this PR changes

TermCache.refreshCache() and TermCache.lookupById() now call identityForAssetOrLog(term) instead of getIdentityForAsset(term) directly:

private fun identityForAssetOrLog(asset: GlossaryTerm): String? =
    try {
        getIdentityForAsset(asset)
    } catch (e: IllegalStateException) {
        logger.warn { "Skipping term ${asset.guid} (name='${asset.name}') with no resolvable anchor — ..." }
        null
    }

getIdentityForAsset itself is unchanged — still throws on inconsistent data, so code that needs a resolved identity will fail loudly. The wrapper is opt-in at bulk-scan call sites where one bad term shouldn't kill the whole refresh.
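
For readers who want to see the call-site behaviour end to end, the following is a self-contained sketch, not the actual TermCache code. Term is a hypothetical stand-in for GlossaryTerm, the cache is reduced to a plain map, and logging is replaced by a write to stderr; only the name@glossaryName identity format and the throw/skip contract come from this PR.

```kotlin
// Hypothetical stand-in for GlossaryTerm; glossaryName is null when the
// anchor cannot be resolved.
data class Term(val guid: String, val name: String, val glossaryName: String?)

// Strict contract: throws on a term with no resolvable anchor.
fun getIdentityForAsset(term: Term): String {
    val glossary = term.glossaryName
        ?: throw IllegalStateException("Term found with no anchor: ${term.guid}")
    return "${term.name}@$glossary"
}

// Lenient wrapper: reports the offending term and returns null instead of throwing.
fun identityForAssetOrLog(term: Term): String? =
    try {
        getIdentityForAsset(term)
    } catch (e: IllegalStateException) {
        System.err.println("WARN skipping term ${term.guid} (name='${term.name}'): ${e.message}")
        null
    }

// Bulk scan: a null identity means "skip this term, keep going".
fun refreshCache(terms: List<Term>): Map<String, Term> {
    val byIdentity = mutableMapOf<String, Term>()
    for (term in terms) {
        val identity = identityForAssetOrLog(term) ?: continue
        byIdentity[identity] = term
    }
    return byIdentity
}
```

Given one resolvable term and one orphan, refreshCache in this sketch returns a map containing only the resolvable term, while a direct call to getIdentityForAsset on the orphan still throws.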

Test plan

PR CI green (existing unit tests should still pass; this is purely additive)
After merge, the next Test (leangraph-test) workflow run completes asset-import without the "Term found with no anchor" crash, even if new orphan terms appear in the tenant. Skipped terms appear as WARN lines in the test logs.
No regression in any existing TermCache behaviour for tenants without orphan data — terms still cache by name@glossaryName

🤖 Generated with Claude Code

TermCache.refreshCache() scans every active GlossaryTerm in the tenant and calls getIdentityForAsset(term) on each. That method throws IllegalStateException("Term found with no anchor: ...") if the term's anchor relationship can't be resolved to a glossary name. The throw is unconditional: a single inconsistent term aborts the entire cache refresh, and every downstream test that depends on the cache being initialised then fails with the same stack trace.

This is exactly what happened on the leangraph-test daily workflow run 26269147160: another nightly job (atlas-metastore dev-support/test-harness suite test_glossary_qn_moves.py, cron 04:30 UTC) created `move-*` terms, moved them between glossaries, and a partially-failed cleanup pass left 6 ACTIVE terms whose anchor edge had relationshipStatus=DELETED. The atlan-java workflow (dispatched 22 min later) then crashed every asset-import chunk with:

java.lang.IllegalStateException: Term found with no anchor: { ... }

The anchor inconsistency is real data and there are deeper fixes warranted elsewhere (test harness should fully PURGE its residue, workflow schedules shouldn't overlap on the shared tenant). But the SDK should not blow up everyone else's tests when it encounters a single tenant-side anomaly.

Wrap the call in identityForAssetOrLog(): catch IllegalStateException, log a structured warning identifying the offending term's guid + name, return null. Both call sites (refreshCache + lookupById) treat null as "skip this term, continue with the rest of the cache." getIdentityForAsset itself still throws (it's the right contract: the data IS inconsistent and code that depends on a resolved identity should fail loudly); the new safe variant is opt-in at the call sites that perform bulk scans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sriram-atlan requested a review from cmgrote as a code owner May 22, 2026 06:24

sriram-atlan wants to merge 1 commit into main from fix-termcache-skip-anchorless
