feat: Target Gene Mapping Table #719
Draft
bencap wants to merge 3 commits into feature/bencap/627/job-traceability from …
API support for VariantEffect/dcd_mapping2#97
The variant recoder phase was processing batches sequentially, making it impossible to complete large score sets within the 2-hour job timeout (~60 variants processed).

- Replace sequential recoder loop with asyncio.gather + Semaphore capped at _RECODER_CONCURRENCY=5 concurrent batches
- Add per-batch debug logging inside _recoder_with_semaphore
- Demote chatty info logs (per-batch VEP progress, "prepared batches") to debug to reduce log noise at scale
- Demote expected per-variant recoder miss from warning to debug; summary counts in the final info log are the right signal
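The concurrency pattern this commit describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `recode_all` and its `recode` parameter are hypothetical names, and only `_RECODER_CONCURRENCY` and `_recoder_with_semaphore` come from the commit message.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

_RECODER_CONCURRENCY = 5  # cap named in the commit message


async def recode_all(batches, recode, concurrency=_RECODER_CONCURRENCY):
    """Run `recode(batch)` for every batch, at most `concurrency` at a time.

    `recode` is a hypothetical coroutine function standing in for one
    variant recoder request; the real call in the PR is not shown here.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def _recoder_with_semaphore(index, batch):
        # Each task waits here until one of the `concurrency` slots is free.
        async with semaphore:
            logger.debug("recoder batch %d started (%d variants)", index, len(batch))
            result = await recode(batch)
            logger.debug("recoder batch %d finished", index)
            return result

    # gather() schedules all batches up front and returns results in input order.
    return await asyncio.gather(
        *(_recoder_with_semaphore(i, batch) for i, batch in enumerate(batches))
    )
```

A caller would run it with something like `asyncio.run(recode_all(batches, my_recoder))`. All tasks are created immediately, but the semaphore keeps the number of in-flight requests bounded, which is what protects the upstream service.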
…arated concerns

Move VRSMap client code, type schemas, metadata utilities, and constants into separate modules within a mapping package. Maintain backward compatibility through re-exports in __init__.py so existing imports continue to work without changes.

Co-authored-by: Copilot <copilot@github.com>
…) QC and provenance

Add a new `target_gene_mappings` table that records alignment QC and provenance for each (target gene, annotation layer) pair produced by dcd-mapping. Replaces flat QC fields on `mapped_variants` with a normalized FK relationship.

- Add `TargetGeneMapping` model, view model, and `AnnotationLayer` enum
- Extend `MappedVariant` with `target_gene_mapping_id`, `alignment_level`, `at_mismatched_locus`, and `near_gap` columns
- Update mapping worker to persist `TargetGeneMapping` rows and link variants
- Add Alembic migration (`8c4a2f1d9e6b`) for schema changes
- Add manual backfill script to populate new columns for existing mapped variants
- Drop `variants_failed_pre_layer_selection` and `variants_with_mapping_warnings` QC counts from the schema (not recoverable for existing data)

Co-authored-by: Copilot <copilot@github.com>
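To make the normalized relationship concrete, here is a minimal sketch using the standard-library `sqlite3` module rather than the project's SQLAlchemy models. The table names and the four `mapped_variants` columns come from the commit message; the column types, the `target_gene_id` and `annotation_layer` columns, the uniqueness constraint, and the helper names are assumptions for illustration only.

```python
import sqlite3


def build_demo():
    """Create an in-memory schema resembling the one described above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE target_gene_mappings (
            id INTEGER PRIMARY KEY,
            target_gene_id INTEGER NOT NULL,   -- assumed column
            annotation_layer TEXT NOT NULL,    -- assumed column
            UNIQUE (target_gene_id, annotation_layer)  -- assumed constraint
        );
        CREATE TABLE mapped_variants (
            id INTEGER PRIMARY KEY,
            target_gene_mapping_id INTEGER REFERENCES target_gene_mappings(id),
            alignment_level TEXT,
            at_mismatched_locus INTEGER,       -- boolean stored as 0/1
            near_gap INTEGER                   -- boolean stored as 0/1
        );
        """
    )
    # One QC/provenance row per (target gene, annotation layer) pair.
    cur = conn.execute(
        "INSERT INTO target_gene_mappings (target_gene_id, annotation_layer) "
        "VALUES (?, ?)",
        (1, "genomic"),  # hypothetical layer value
    )
    mapping_id = cur.lastrowid
    # Variants point at the shared row instead of repeating flat QC fields.
    conn.executemany(
        "INSERT INTO mapped_variants "
        "(target_gene_mapping_id, alignment_level, at_mismatched_locus, near_gap) "
        "VALUES (?, ?, ?, ?)",
        [(mapping_id, "genomic", 0, 0), (mapping_id, "genomic", 0, 1)],
    )
    return conn


def variants_per_mapping(conn):
    """Count linked variants for each target gene mapping row."""
    return conn.execute(
        "SELECT m.annotation_layer, COUNT(v.id) "
        "FROM target_gene_mappings m "
        "JOIN mapped_variants v ON v.target_gene_mapping_id = m.id "
        "GROUP BY m.id"
    ).fetchall()
```

The point of the design is that QC and provenance are written once per pair and reached through the foreign key, so per-variant rows only carry variant-specific flags.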
This pull request introduces a new per-(target gene, alignment level) mapping QC and provenance model, refactors the mapping library for better modularity, and updates the database schema and ORM models to support richer mapping provenance and annotation. The changes enable more detailed tracking of variant mapping quality and provenance, and lay the groundwork for improved downstream analysis and data integrity.
Database schema and model enhancements:

- Added a new `target_gene_mappings` table to store per-(target gene, alignment level) QC and provenance information, and extended the `mapped_variants` table with new columns (`target_gene_mapping_id`, `alignment_level`, `at_mismatched_locus`, `near_gap`) to link variants to their mapping QC and annotation details.
- Added a `TargetGeneMapping` SQLAlchemy model and established relationships from `TargetGene` and `MappedVariant` to `TargetGeneMapping` for ORM-level access to mapping QC records.
- Added an `AnnotationLayer` enum to standardize annotation layer values and provide translation from dcd-mapping wire codes.

Mapping library refactor:

- Split the mapping library into separate modules (`client.py`, `constants.py`, `metadata.py`, `schema.py`), with a new public API in `mapping/__init__.py` for backward compatibility. This modularizes code for maintainability and clarity.

API and script updates:

- … `MappedVariantWithMappingDetails` model, exposing richer mapping QC and provenance information.

Other improvements:

- Added `target_gene_mapping` to the public model exports for easier access in other modules.

These changes collectively provide a foundation for tracking, querying, and analyzing variant mapping provenance and quality throughout the application.
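As an illustration of the enum-with-translation idea mentioned in the summary, the sketch below shows one way such a type could look. Only the name `AnnotationLayer` comes from the PR; the member names, stored values, single-letter wire codes, and the `from_wire_code` method are all assumptions, not the project's actual definitions.

```python
from enum import Enum


class AnnotationLayer(str, Enum):
    """Standardized annotation layer values (member values are assumed)."""

    GENOMIC = "genomic"
    CDS = "cds"
    PROTEIN = "protein"

    @classmethod
    def from_wire_code(cls, code):
        """Translate a wire code into an enum member (hypothetical helper)."""
        try:
            return _WIRE_CODES[code]
        except KeyError:
            raise ValueError(f"unknown annotation layer wire code: {code!r}") from None


# Assumed mapping from short wire codes to members; the real codes used by
# dcd-mapping may differ.
_WIRE_CODES = {
    "g": AnnotationLayer.GENOMIC,
    "c": AnnotationLayer.CDS,
    "p": AnnotationLayer.PROTEIN,
}
```

Keeping the translation in one place means the rest of the application only ever sees enum members, and an unexpected code fails loudly at the boundary instead of being stored as an arbitrary string.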