BigramDetector #149

Open

ernstleierzopf wants to merge 17 commits into development from feature/bigram-detector

Conversation

@ernstleierzopf (Contributor) commented May 18, 2026

Fixes #60

@ernstleierzopf (Contributor, Author)

@viktorbeck98 could you please review the code, with a particular focus on the persistence of the class? How is the data persisted and loaded? Do I need to add `self.freq` and `self.total_freq` to the persistency class somehow?

I am not sure it works as intended, so please treat it as a first implementation; further testing is required.

ernstleierzopf and others added 14 commits May 18, 2026 19:16
Resolved conflicts in config/pipeline_config_default.yaml and
docs/detectors.md by keeping both BigramFrequencyDetector and
CharsetDetector entries.
…cker

Adds an opaque Dict[str, Any] field that detectors can use to stash
per-variable model state across save/load cycles, with full round-trip
support via to_state()/from_state() and backward-compatible defaults.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
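A minimal sketch of the idea in the commit above, assuming the persisted state is a JSON-compatible dict. `TrackerState` and `extra_state` are hypothetical names (the commit message does not name the new field). `to_state()`/`from_state()` are the method names the commit mentions, but their bodies here are illustrative. The usage shows how the `self.freq` and `self.total_freq` from the question above could be stashed per variable:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TrackerState:
    """Hypothetical stand-in for the persistency/tracker object in this PR."""

    # Illustrative existing field: values seen so far, per variable.
    unique_set: Dict[str, List[str]] = field(default_factory=dict)
    # New opaque field: detectors stash arbitrary per-variable model state here.
    extra_state: Dict[str, Any] = field(default_factory=dict)

    def to_state(self) -> Dict[str, Any]:
        # The JSON round trip deep-copies extra_state and fails early if a
        # detector stored something that is not JSON-serializable.
        return {
            "unique_set": {k: list(v) for k, v in self.unique_set.items()},
            "extra_state": json.loads(json.dumps(self.extra_state)),
        }

    @classmethod
    def from_state(cls, state: Dict[str, Any]) -> "TrackerState":
        # .get(...) defaults keep states saved before this field existed loadable.
        return cls(
            unique_set={k: list(v) for k, v in state.get("unique_set", {}).items()},
            extra_state=state.get("extra_state", {}),
        )


# Usage: a bigram detector stores its counters under the variable name.
tracker = TrackerState()
tracker.extra_state["EventID"] = {"freq": {"46": 2, "62": 2}, "total_freq": 4}
restored = TrackerState.from_state(json.loads(json.dumps(tracker.to_state())))
```

Keeping the field opaque means the persistency class does not need to know about bigram counters. The trade-off is that the detector is responsible for serializable keys, for example string bigrams rather than tuples.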
Intermediate commit — train/detect rewrites land in subsequent commits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cy object

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nce drops

Rewrites train/train_helper so bigrams are computed for all configured
variables (not just pre-existing ones), fixing the first-occurrence drop
for new EventIDs. Snapshots unique_set before ingest so skip_repetitions
correctly skips repeats without skipping first occurrences.
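The ordering fix described above can be sketched as follows. This assumes character bigrams and per-variable `set`s for the seen values. `train_bigrams` is a hypothetical helper, not the PR's `train`/`train_helper`. The key point is that membership is checked before the value is ingested, so `skip_repetitions` only skips values that were already seen:

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def char_bigrams(value: str) -> List[str]:
    """Character bigrams of a value, e.g. "abc" -> ["ab", "bc"]."""
    return [value[i:i + 2] for i in range(len(value) - 1)]


def train_bigrams(
    events: Iterable[Dict[str, str]],
    variables: List[str],
    unique_set: Dict[str, Set[str]],
    freq: Dict[str, Counter],
    total_freq: Dict[str, int],
    skip_repetitions: bool = True,
) -> None:
    """Hypothetical training loop illustrating snapshot-before-ingest."""
    for event in events:
        # Iterate over all configured variables, not only those already in freq,
        # so a brand-new variable (or EventID) is trained on its first occurrence.
        for var in variables:
            value = event.get(var)
            if value is None:
                continue
            seen = unique_set.setdefault(var, set())
            # Snapshot membership BEFORE ingesting the value.
            already_seen = value in seen
            seen.add(value)
            if skip_repetitions and already_seen:
                continue  # a true repeat: skip it
            bigrams = char_bigrams(value)
            freq.setdefault(var, Counter()).update(bigrams)
            total_freq[var] = total_freq.get(var, 0) + len(bigrams)
```

If `seen.add(value)` ran before the membership check, every value, including its first occurrence, would look like a repeat and be skipped. That is the first-occurrence drop the commit describes.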

2 participants