fffoivos (Collaborator) commented on Mar 17, 2026
- Fixes HTML interstitial handling so challenge/viewer pages are not recorded as successful downloads.
- Adds browser-gated download support with standard, auto, and browser routes plus policy-driven selection. Closes #90 ("Standard static scraping methods are failing to retrieve documents, due to non-static URL structures").
- Adds a guided installer and browser dependency wiring.
- Simplifies the DeepSeek OCR stack.
- Fixes editable installs and chunk merging.
- Expands docs with pipeline architecture, stage references, and Pages support.
docs: document pipeline artifact contract and runtime outputs
Flip DEFAULT_RUNTIME_BACKEND from "transformers" to "vllm". The transformers backend is currently broken: DeepSeek-OCR-2's bundled modeling_deepseekv2.py imports LlamaFlashAttention2 from transformers.models.llama.modeling_llama, which was removed upstream in transformers >= 4.46. vllm 0.18.0 transitively pulls in transformers >= 4.57, so anyone running the documented setup with the old default hits an ImportError before any OCR happens.

Drop the explicit transformers and tokenizers pins from the deepseek extra: both come in transitively via vllm at the versions vllm requires, so the explicit pins were redundant.

Add docs/operations/deepseek_runtime_contract.md documenting the supported backend, the page-level skip guard, and how to add a new backend.

Verified on 2× A100 SXM4 40GB: 10 OpenArchives PDFs, 683 pages, exact_fill scheduling, vLLM wall time 276 s (0.65-0.76 s/page per GPU).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the RuntimeError that aborted the entire batch when a single PDF produced empty markdown with a per-document empty_markdown=True metric and a warning log. The other documents in the batch now finish successfully and the empty case is observable downstream via the metrics JSON. Forward-port of 6ce0d9c from codex/ocr-env-fix, rebased onto current dev's runner.py layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
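The batch-survival behavior described above can be sketched as follows. This is a minimal illustration, not the runner's actual code; `record_document_result` and the metrics shape are hypothetical names standing in for the real per-document metrics JSON.

```python
import logging

logger = logging.getLogger("ocr.runner")

def record_document_result(doc_id: str, markdown: str, metrics: dict) -> dict:
    """Flag an empty OCR result per-document instead of raising.

    A RuntimeError here would abort the whole batch; setting
    empty_markdown=True and logging a warning lets the other documents
    finish while keeping the empty case observable downstream.
    """
    entry = {"doc_id": doc_id, "empty_markdown": not markdown.strip()}
    if entry["empty_markdown"]:
        logger.warning("document %s produced empty markdown; continuing batch", doc_id)
    metrics[doc_id] = entry
    return entry
```

The key design point is that emptiness becomes data in the metrics JSON rather than control flow.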
Forward-port test_runner_resolves_standard_vllm_defaults_when_omitted from codex/ocr-vllm-defaults-refactor. Asserts that calling run_for_files with runtime_backend="vllm" and explicit None for render_dpi / gpu_memory_utilization resolves to DEFAULT_RENDER_DPI and DEFAULT_GPU_MEMORY_UTILIZATION respectively. Pins the contract that "None means default" rather than "None means unset". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
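The "None means default" contract the test pins can be sketched like this. The constant values below are illustrative placeholders, not the real values from defaults.py.

```python
# Illustrative values; the real constants live in defaults.py.
DEFAULT_RENDER_DPI = 200
DEFAULT_GPU_MEMORY_UTILIZATION = 0.9

def resolve_runtime_args(render_dpi=None, gpu_memory_utilization=None):
    """'None means default': an explicit None resolves to the module
    default rather than being treated as 'unset' and passed through."""
    return (
        DEFAULT_RENDER_DPI if render_dpi is None else render_dpi,
        DEFAULT_GPU_MEMORY_UTILIZATION
        if gpu_memory_utilization is None
        else gpu_memory_utilization,
    )
```

Callers can therefore pass explicit None without having to know the defaults themselves.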
…r-pipeline-20260425

Brings 8 new Rust modules into the cleaner crate as additive code. The modules compile and their internal tests can run, but they are NOT yet wired into Corpus.clean()'s production call path — that's Stage 3.

New modules:
- charset_module.rs (+707) — analyze_charset, non_empty_line_stats
- cmark_gfm_oracle.rs (+408) — cmark-gfm subprocess oracle for verification
- latex_module.rs (+1674) — LaTeX-syntax-aware detection + cropping
- md_format.rs (+595) — Pilot A (format_parsed) + dual_verify
- md_format_surgical.rs (+1053) — Pilot B (parser-backed surgical Phase A)
- md_module.rs (+1641) — MD-syntax-aware Phase A detectors
- md_verify.rs (+1158) — pulldown-cmark equivalence verifier
- normalize.rs (+2022) — separated normalize passes (fold, bucket, etc.)

New deps in Cargo.toml: comrak 0.26 (no default features), pulldown-cmark 0.11 (html-only). Both are pure-Rust, no system deps. Removes the duplicate [tool.maturin] table that was triggering "unused manifest key" warnings (it lives in pyproject.toml).

lib.rs adjustments:
- New module declarations (mod foo;) so the modules compile.
- #![allow(dead_code)] at crate level — many of the new modules' helpers are unused until Stage 3 wires them in; this suppresses the noise.
- New module-doc-comment header explaining the cleaner/noise crate boundary and the production Phase A choice (Pilot B + dual_verify).
- Does NOT add new PyO3 surface registrations — those land in Stage 3 alongside the cleaning_module.rs rewrite that exposes them.
- Pilot A's format_parsed_py and the dev-only dual_verify_py exports are intentionally NOT registered — they will not ship to production.

Build status:
- cargo check passes with 2 pre-existing dev warnings (DetailedTableIssueReportEntry privacy + non_local_definitions in table_analysis_module). These warnings were on dev before this commit; they will be cleaned up in Stage 1.2.
Stage 1.2 (next commit on this branch): excise dead alternatives from the imported modules — Pilot A's format_parsed (md_format.rs) and the LineBased path's normalize_md_syntax (md_module.rs). dual_verify stays. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ython tooling + docs
This is the visible behavior change: Corpus.clean()'s Phase A now defaults
to Pilot B (PhaseAMode::ParserSurgicalVerified → format_surgical_checked),
parser-backed and dual-verifier-protected. Per CLEANER_PIPELINE_CLEANUP_PLAN_2026-04-25.
## Rust crate changes
cleaning_module.rs: 744 → 3063 lines. Major rewrite per the cleanup plan:
- Per-char ops collapsed to 2 groups (Group 1 STRIP / Group 2 FOLD).
fold_codepoint absorbs Adobe Symbol PUA decode + µ→μ. soft-hyphen strip
absorbed into is_unicode_noise_char (per-line). Pre-pass shrinks to
HTML entities + base64 image strip + Phase A.
- Group 1 STRIP narrowed to non-European / extraction-noise ranges. Latin-1
Supp + Latin-Ext-A + Cyrillic + Cyrillic-Supp now KEPT entirely.
Latin-Ext-B kept except Romanian comma-below {Ș, ș, Ț, ț}.
- Unified Rule B regex covers GLYPH<…>, glyph<c=…,font=/…>, /[A-Z]{6}+FontName,
/uniXXXX, /g(id)?N. Rule A's 50 PostScript-name literals contribute to the
same count+coverage gate (≥10 hits AND ≥9% coverage → line-drop). Bare-
word matchers (GLYPH, hyphenminus, font, glyph as plain words) deleted.
- R1∪R2 residue range narrowed to U+0180..U+024F minus Romanian to match
Group 1's policy.
- Per-rule counters in CleanStats: rule_a_match_count, rule_b_match_count,
residue_line_drop_count. Production drivers source these directly,
eliminating second matcher invocation per row.
- Per-doc 4-way char accounting: content_chars_kept,
chars_dropped_by_{line_drop, normalization, per_char_filter}, plus marker
passthrough/added split.
- PhaseAMode enum + core_clean_text_with_stats_with_mode + PyO3 phase_a_mode
arg. Default flipped to ParserSurgicalVerified.
- format_surgical_checked populates phase_a_fallback_reason and
phase_a_dialect_ambiguous_input in CleanStats.
- Corpus.clean / clean_text policy parity: both call build_script_char_sets.
Fixes silent bug where directory pipeline stripped punct/digits when
callers passed restricted scripts_to_keep.
- Post-loop \n{3+} → \n\n collapse (CommonMark renders any blank-line run
as one block separator; bytes go into chars_dropped_by_normalization).
- Bug 1 fix: token-category exporter byte-vs-char offsets (Greek-prefixed
input was silently dropping rows). Now emits CHAR offsets at the export
boundary; internal byte offsets retained for Rust slicing.
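The Rule A/B count+coverage gate described above can be sketched in Python. The regex and thresholds below are illustrative stand-ins, not the real unified Rule B pattern from cleaning_module.rs.

```python
import re

# Illustrative pattern in the spirit of the unified Rule B regex
# (GLYPH<...>, /uniXXXX, /g(id)?N); the real pattern and thresholds
# live in cleaning_module.rs.
RULE_B = re.compile(r"GLYPH<[^>]*>|/uni[0-9A-Fa-f]{4}|/g(?:id)?\d+")

def line_drop(line: str, min_hits: int = 10, min_coverage: float = 0.09) -> bool:
    """Drop a line only when glyph-noise matches are both numerous
    (>= min_hits) and cover a meaningful fraction of the line's chars
    (>= min_coverage). Either condition alone is not enough, which
    protects lines that merely mention a glyph token in passing."""
    matches = RULE_B.findall(line)
    if len(matches) < min_hits:
        return False
    covered = sum(len(m) for m in matches)
    return covered / max(len(line), 1) >= min_coverage
```

Combining a count gate with a coverage gate is what keeps the rule from firing on legitimate prose that happens to contain a few matching tokens.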
lib.rs: full PyO3 surface registration for the new modules:
clean_text, clean_text_with_stats, analyze_charset, non_empty_line_stats,
crop_latex_repetitions_py, verify_md_preview_equivalent_py,
verify_md_structural_py, phase_a_alteration_stats, apply_phase_a,
phase_a_stats_jsonl_line, cmark_gfm_verify_py, format_surgical_py,
format_surgical_checked_py, phase_a_policy_py.
DELIBERATELY EXCLUDED (per user direction "only keep Pilot B"): the dev-only
format_parsed_py (Pilot A) and dual_verify_py (dev-only oracle exposure)
PyO3 registrations. The Rust dual_verify function STAYS — it's used by
format_surgical_checked.
noise crate: +1,360 lines (token-category review/debug exports + 3-counter
infrastructure used by the new cleaning_scripts/). Cleaner crate has zero
Cargo.toml dep on noise — boundary enforced at compile time.
## Python orchestration tooling (cleaning_scripts/ — 8 new files, +1933 LOC)
- analyze_cleaning_concentration.py — per-dataset / per-doc cleaning concentration
- analyze_cleaning_distributions.py
- analyze_quality_vs_deletions.py
- clean_and_stats_rowsharded.py — production HPLT driver, per-row clean+stats
- pull_deletion_band_samples.py — stratified band sampler
- regenerate_samples.py
- smoke_tests/test_rust_extensions_smoke.py — exercises new PyO3 surface
- validate_gzipped_shards.py — verifies post-clean shards byte-identical to
squash(clean_text_with_stats(raw, …))
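The byte-identity check that validate_gzipped_shards performs can be sketched as below. `shard_matches_recompute` is a hypothetical name; the real script's recompute step runs the full squash + clean_text_with_stats pipeline, which a plain string stands in for here.

```python
import gzip
import io

def shard_matches_recompute(shard_gz_bytes: bytes, recomputed_text: str) -> bool:
    """Decompress a post-clean shard and compare it byte-for-byte
    against freshly recomputed cleaned text. Any divergence, even a
    single byte, fails the check."""
    with gzip.open(io.BytesIO(shard_gz_bytes), "rb") as fh:
        stored = fh.read()
    return stored == recomputed_text.encode("utf-8")
```

Comparing decompressed bytes rather than gzip streams avoids false negatives from non-deterministic gzip headers (mtime, compression level).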
## Token-category review tooling (src/glossapi/scripts/ — 6 new files, +3003 LOC)
- aggregate_token_category_reviews.py
- build_token_category_review_bundle.py
- export_token_category_debug.py
- export_token_category_debug_parquet.py
- review_token_category_with_gemini.py — Gemini review driver
- token_category_debug_common.py
google-genai>=1.30.0 added to core deps for the Gemini reviewer.
## Architecture / changelog docs (6 files, +2683 LOC)
- rust/glossapi_rs_cleaner/CHANGES_2026_04_22.md (3-counter wave)
- rust/glossapi_rs_cleaner/CHANGES_2026_04_25.md (Pilot B + cleanup wave)
- rust/glossapi_rs_cleaner/docs/MD_MODULE_ARCHITECTURE.md
- rust/glossapi_rs_cleaner/docs/MD_MODULE_ARCHITECTURE_IMPLEMENTATION_REVIEW_2026-04-24.md
- rust/glossapi_rs_cleaner/docs/PHASE_A_PARSER_BACKED_IMPLEMENTATION_REVIEW_2026-04-24.md
- rust/glossapi_rs_cleaner/docs/PHASE_A_PARSER_BACKED_INDEX.md
## phase_clean.py: +240 lines
Python-side wiring for PhaseAMode arg, clean_text_with_stats call,
build_script_char_sets policy parity. Does NOT touch clean_ocr() — that's
intentional per noise-profile rule (DeepSeek OCR doesn't produce Docling-
class glyph/mojibake noise; routing the new Docling-tuned machinery through
clean_ocr would be wrong by default).
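The noise-profile routing rule above can be made concrete with a small sketch. The function and the string tags are illustrative, not the real API; the actual decision is made by which Corpus method the caller invokes.

```python
def select_phase_a(source: str):
    """Route text to Phase A machinery based on its extraction source.

    Docling-extracted text gets the parser-backed Phase A (Pilot B,
    dual-verifier-protected). DeepSeek OCR output gets none of it,
    because it doesn't produce Docling-class glyph/mojibake noise and
    running the Docling-tuned machinery on it would be wrong by default.
    """
    if source == "docling":
        return "parser_surgical_verified"
    if source == "deepseek_ocr":
        return None  # clean_ocr() path deliberately untouched
    raise ValueError(f"unknown extraction source: {source}")
```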
## tests/test_corpus_clean_enhancements.py
Cleanup branch's version (1912 lines vs dev's 1724) — 188 line additions
covering the new policy parity + per-rule counters + Pilot B fallback
shape + the new build_token_category_review_bundle + the new score schema.
## Phase 1 changes preserved
- DEFAULT_RUNTIME_BACKEND = "vllm" stays (defaults.py untouched by cleanup branch)
- pyproject.toml's deepseek extra stays cleaned-up (transformers/tokenizers
pins remain dropped — vllm pulls them transitively)
## What's NOT in this commit (deferred)
- Stage 4 (clean_ocr surgical carve-outs — surface empty_markdown field):
separate commit.
- Stage B (cleaner-side ocr_render.py extraction from faa1362): separate PR.
- Final dead-code excision (delete format_parsed body in md_format.rs +
normalize_md_syntax in md_module.rs): final cleanup pass after integration.
## Build status
cargo check: both crates build with 2 pre-existing dev warnings (table_analysis
private-interface + non_local_definitions in pymethods macro). Production
behavior verified compiles; runtime validation requires a real corpus run
(per user's Stage 3 acceptance: 100-doc end-to-end on
openarchives.gr.part-00000.parquet showing 0 doc drops, ≥17% chars removed,
no Greek-text quality regressions on a 5-page spot-check).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
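The Stage 3 acceptance thresholds quoted above (0 doc drops, >= 17% chars removed) can be expressed as a simple check. This is an illustrative sketch; the Greek-text quality spot-check is manual and not modeled here.

```python
def acceptance_ok(total_docs: int, dropped_docs: int,
                  chars_in: int, chars_out: int) -> bool:
    """Stage 3 acceptance gate: no documents dropped end-to-end, and
    the cleaner removed at least 17% of input characters."""
    removed_frac = (chars_in - chars_out) / chars_in if chars_in else 0.0
    return dropped_docs == 0 and removed_frac >= 0.17
```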
Spotted by cargo test failure on test_empty_content_with_remove_op:

1. table_remover_module.rs (4-line fix): add an early-return on empty input so an empty file with a remove op yields empty output instead of a `<!-- table-removed -->` marker. Per cleanup wave's Bug 2 set.
2. directory_processor.rs: route Corpus.clean()'s analysis report path through cleaning_module::build_script_char_sets — the same policy builder as clean_text / clean_text_with_stats. Fixes the Point 8 silent bug where the directory pipeline stripped ASCII punct + digits when callers passed restricted scripts_to_keep. Plus pub(crate) on DetailedTableIssueReportEntry to fix the privacy-leak warning.
3. table_analysis_module.rs: add #![allow(non_local_definitions)] for the pyo3 0.19 #[pymethods] macro. Per cleanup wave's lint-posture fix — silences the warning until pyo3 is upgraded.

cargo test --release on the cleaner crate now reports: 385 passed; 0 failed; 3 ignored (matches the cleanup branch's measured test outcome).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
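The empty-input early-return in fix 1 is small enough to sketch in a few lines. This is a Python illustration of the control-flow shape, not the Rust table remover itself; `apply_remove_op` and its trivial body are stand-ins.

```python
def apply_remove_op(text: str, marker: str = "<!-- table-removed -->") -> str:
    """Shape of the table_remover fix: bail out on empty input before
    any marker is emitted, so an empty file with a remove op produces
    empty output rather than a stray removal marker."""
    if not text:
        return ""
    # Stand-in for the real removal logic, which replaces the matched
    # table body with the marker.
    return marker
```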
Per user direction "We dont need Gemini reviewer scripts. This is still bloat for me. Just keep the basics." — remove the 6 src/glossapi/scripts/ files that exist exclusively for the Gemini-driven review-bundle workflow, plus the google-genai dep and the test that imported them.

Removed:
- src/glossapi/scripts/aggregate_token_category_reviews.py (456 LOC)
- src/glossapi/scripts/build_token_category_review_bundle.py (574 LOC)
- src/glossapi/scripts/export_token_category_debug.py (71 LOC)
- src/glossapi/scripts/export_token_category_debug_parquet.py (184 LOC)
- src/glossapi/scripts/review_token_category_with_gemini.py (1429 LOC)
- src/glossapi/scripts/token_category_debug_common.py (289 LOC)

Dependency removed:
- google-genai>=1.30.0 from pyproject.toml core deps (was only pulled in for the Gemini reviewer)

Test removed:
- test_build_token_category_review_bundle_materializes_cases (referenced the dropped script directly)

What stays:
- glossapi_rs_noise crate's match_token_category_debug_text PyO3 surface (kept for any future debug/discovery caller; no current Python script uses it after this commit)
- Corpus.clean_token_category_debug Python method (uses the PyO3 surface for per-page category breakdown)
- test_clean_token_category_debug_exports_synthetic_pages test (covers the PyO3 surface end-to-end)
- cleaning_scripts/clean_and_stats_rowsharded.py's mention of token_category in a code comment (no actual call — it describes prior matcher behavior that was eliminated by Point 7 of the cleanup plan)

Net delta: 3003 LOC removed across 6 script files + 88 LOC of tests removed + 1 dep removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… Pilot B remains)

Per user direction "only keep Pilot B for md reformatting not a etc", remove all dead Phase-A alternatives now that Pilot B (format_surgical_checked) is the production default.

## md_format.rs (594 → 299 lines)

Removed:
- format_parsed (the Pilot A function — abandoned per CHANGES_2026_04_25.md after the 2026-04-24 90-doc audit found 50/66 failures from comrak's whole-doc round-trip over-normalizing list markers, link forms, and escapes; Pilot B's surgical approach supersedes it).
- format_parsed_py (PyO3 export of Pilot A).
- dual_verify_py (PyO3 export — dev-only oracle exposure; the underlying dual_verify Rust function STAYS as crate-internal because format_surgical_checked depends on it).
- All ~30 Pilot-A test fixtures.

Kept (the only reason this module still exists):
- dual_verify + DualVerifyReport + pulldown_render / comrak_render / collapse_ws helpers + common_prefix_len. Used by md_format_surgical::format_surgical_checked.

## md_module.rs (1641 → 1071 lines)

Removed:
- normalize_md_syntax (the LineBased Phase A orchestrator).
- normalize_md_syntax_with_stats (instrumented LineBased variant).
- apply_phase_a (PyO3 wrapper of normalize_md_syntax).
- phase_a_stats_jsonl_line (PyO3 export — JSONL writer using LineBased stats; the bench script that called it was already dropped from the cleanup branch's "production-essential triage").
- phase_a_alteration_stats (PyO3 export — dict writer for LineBased).
- push_json_str / fmt_finite_f64 (helpers used only by the LineBased PyO3 exports).
- PhaseAStats struct (return type of normalize_md_syntax_with_stats).
- 25 LineBased tests.
- pyo3 imports (no PyO3 surface remains in this module).

Kept:
- non_destructive_canonicalize: used by md_verify. Its final Phase A step now calls format_surgical (Pilot B unchecked) directly instead of normalize_md_syntax — keeping the same "maximal canonical form" purpose.
- is_code_fence_marker: called from cleaning_module's per-line loop.
- All MD-syntax-aware helpers (leading_columns, normalize_separator_line, scan_gfm_table_separators, parse_gfm_separator_row, count_gfm_row_cells, collapse_blank_line_runs, reflow_paragraphs, reflow_paragraphs_with_count, can_join_lines, line_is_hard_break) — used internally by Pilot B's machinery.

## cleaning_module.rs

- core_clean_text_with_stats_with_mode's match arm collapsed: only Pilot B (format_surgical_checked) is called now.
- phase_a_mode parameter renamed to _phase_a_mode (kept in the signature for back-compat with PyO3 callers + tests; accepts any value, ignores it).
- 6 LineBased-pinned tests removed (accounting_normalization_tracks_separator_collapse, accounting_escaped_underscore_run_buckets_but_stays_as_underscores, accounting_long_escaped_underscore_run_buckets_to_20, accounting_mixed_doc_invariant_holds, core_clean_text_composite_roundtrip, core_clean_text_normalizes_separator_line) plus the linebased_clean_text + linebased_clean_text_with_stats test helpers.
- The PhaseAMode enum is intentionally KEPT (all 3 variants stay) for back-compat with: (a) PyO3 callers still passing a `phase_a_mode` kwarg, and (b) the phase_a_mode kwarg signature on clean_text + clean_text_with_stats. All variants now route to the same Pilot B path. A follow-up PR can collapse the enum + drop the kwarg cleanly.

## lib.rs

PyO3 registrations removed for the dropped functions:
- format_parsed_py, dual_verify_py (dropped earlier in this PR).
- format_surgical_py (Pilot B WITHOUT the oracle — dev-only, not appropriate for production exposure).
- apply_phase_a, phase_a_alteration_stats, phase_a_stats_jsonl_line (LineBased instrumentation — the function bodies are gone).

The production PyO3 surface is now: clean_text, clean_text_with_stats, analyze_charset, non_empty_line_stats, crop_latex_repetitions_py, verify_md_preview_equivalent_py, verify_md_structural_py, cmark_gfm_verify_py, format_surgical_checked_py, phase_a_policy_py, plus the existing pipeline + table + directory_processor surfaces.

## Tests

cargo test --release: 325 passed; 0 failed; 3 ignored (was 385 passed before the excision; the 60 removed tests covered LineBased and Pilot A code paths that no longer exist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
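The "accepts any value, ignores it" back-compat shape for the mode parameter can be sketched in Python. The enum variant names and the identity-like cleaner body below are illustrative, not the real Rust definitions.

```python
from enum import Enum

class PhaseAMode(Enum):
    """Kept-for-back-compat enum; variant names are illustrative.
    After this commit all variants route to the same Pilot B path."""
    LINE_BASED = "line_based"
    PARSER_FULL = "parser_full"
    PARSER_SURGICAL_VERIFIED = "parser_surgical_verified"

def format_surgical_checked(text: str) -> str:
    # Stand-in for Pilot B; the real function is parser-backed and
    # dual-verifier-protected.
    return text.strip()

def core_clean_text(text: str, _phase_a_mode=None) -> str:
    """The mode argument stays in the signature so existing PyO3
    callers don't break, but its value is ignored: only Pilot B runs."""
    return format_surgical_checked(text)
```

Keeping the parameter while ignoring it lets the enum and kwarg be removed in a later, purely mechanical follow-up PR.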
…om phase_clean.py

Per the cleaner-integration plan's Stage B + the cleaner correctness axes memory rule "latest changes are canonical".

## What's extracted

11 functions moved from `src/glossapi/corpus/phase_clean.py` into the new `src/glossapi/corpus/ocr_render.py`:
- _gap_has_at_most_n_nonwhitespace_chars
- _clean_fill_for_removed_span
- _merge_labeled_raw_spans
- _summarize_merged_labeled_spans
- _render_page_from_merged_labeled_spans
- _render_page_with_labeled_spans_result
- _render_page_with_labeled_spans
- _annotate_page_with_labeled_spans
- _utf8_prefix_byte_offsets
- _span_repeat_count
- _build_match_index_rows

These collectively own the analyzer/renderer separation: phase_clean.py decides WHAT spans exist; ocr_render.py renders HOW those spans become page text and debug sidecars.

## Body resolution

Per the function-by-function comparison done in the cleaner-integration audit, 5 functions were an EXACT match vs faa1362 and 6 were a DIFF. Per the "latest changes canonical" rule, this PR uses **dev's bodies for all 11** (dev's Apr 14 OCR speedup wave is later than faa1362's Apr 12 work). The faa1362-only helper `_build_debug_match_open_tag` is NOT brought over because dev's `_render_page_from_merged_labeled_spans` (73 lines vs faa1362's 44) inlines the equivalent logic and doesn't need the helper.

## Also added

`src/glossapi/corpus/text_surface_metrics.py` (48 lines, from faa1362 verbatim — a new module): `sanitized_char_count` + `_strip_latex_envs_for_char_count`. Shared "published-surface" metric helpers used by the export-facing metadata refresh.

## phase_clean.py changes

- 11 function definitions removed (~330 lines).
- New imports: `from .ocr_render import (...)` (11 names) and `from .text_surface_metrics import sanitized_char_count`.
- Net: phase_clean.py 4929 → 4597 lines.

## docs/architecture/ocr_cleaning_runtime.md

Padded from 118 → 186 lines with the previously-missing sections:
- Code Layout (now accurate — describes the modules that exist)
- Stage Boundary: clean_ocr() vs clean()
- Field Ownership (OCR-owned vs clean/export-owned parquet fields)

## Verification

- python3 ast.parse: ocr_render.py OK, text_surface_metrics.py OK, phase_clean.py OK.
- Direct module import + sanitized_char_count smoke: works correctly.
- pytest tests/test_corpus_clean_enhancements.py: 67 passed, 2 failed. Both failures (test_clean_flags_uppercase_glyph_noise and test_clean_token_category_debug_exports_synthetic_pages) are PRE-EXISTING dev failures — verified by running the same tests against unmodified origin/development. NOT introduced by this extraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
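The char-vs-byte offset mapping behind a helper like `_utf8_prefix_byte_offsets` can be sketched as below. This is an assumed shape inferred from the name, not the module's actual body: a prefix table that lets byte offsets produced on the Rust side be translated into char offsets at export boundaries.

```python
def utf8_prefix_byte_offsets(text: str) -> list:
    """offsets[i] is the UTF-8 byte length of the first i characters
    of text. With this table, a Rust byte offset can be mapped back to
    a Python char offset (and vice versa) without re-encoding slices,
    which matters for non-ASCII (e.g. Greek) text where the two
    offset systems diverge."""
    offsets = [0]
    total = 0
    for ch in text:
        total += len(ch.encode("utf-8"))
        offsets.append(total)
    return offsets
```

For pure-ASCII text the table is just 0..len(text); for Greek text each character contributes 2 bytes, which is exactly the divergence behind the Bug 1 exporter fix described in the earlier cleanup commit.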
Promote the previously-uncommitted survey from the cleanup branch worktree
into docs/architecture/, matching the project's lowercase-with-underscores
naming convention (was MD_LIBRARY_SURVEY_LEARNINGS_2026-04-24.md, now
markdown_library_survey.md).
Content rounded for fit:
- Title is now noun-phrase Title Case ("Markdown Library Survey") with no
date stamp, matching peers (ocr_cleaning_runtime.md, etc.).
- Reframed from "addendum + recommendations" to "design rationale +
outcomes" — every recommendation now annotated as ✅ landed or ⏳ open
so a future reader sees what shipped vs what remains.
- Section "Strategic value of a wholesale parser-backed direction" trimmed
(the question it asked has been answered — Pilot B shipped).
- "Open implementation directions" trimmed to only items still open
(pseudo-table semantic transform, raw-readability metrics, lint-style
diagnostics).
- Added "See also" with cross-references to the production files
(md_format_surgical.rs, md_format.rs, cmark_gfm_oracle.rs,
ocr_cleaning_runtime.md).
Updated docs/architecture/index.md to link to the new doc under "pressure
points are documented separately in".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… word-repeat window default

Forward-port of two Apr 17 uncommitted edits from the deleted ocr-env-fix worktree.

## 1. Skip chunk markdown when canonical doc exists

When the OCR runner emits both a canonical merged `doc.md` and per-page-range chunk outputs `doc__pNNNNN-NNNNN.md`, the cleaner used to clean BOTH — double-counting the same content in metadata. Both `Corpus.clean()` and `Corpus.clean_ocr()` now skip `__p`-suffixed chunks when the canonical doc is present in the same input directory.

Test: test_clean_ocr_ignores_chunk_markdown_when_canonical_doc_exists.

## 2. Widen OCR word-repeat window default 96 → 520

The legacy `word_window=96` default missed accent-shifted Greek repetitions whose period is wider than 32 chars. Two new constants:

    DEFAULT_OCR_WORD_REPEAT_MAX_PERIOD = 130
    DEFAULT_OCR_WORD_REPEAT_WINDOW = DEFAULT_OCR_WORD_REPEAT_MAX_PERIOD * 4  # 520

are now the default for `word_window` in `Corpus.clean_ocr()`, `Corpus.clean_ocr_debug()`, and `Corpus.clean_ocr_numeric_debug()`. Regression test: test_long_accent_shift_repeat_needs_wider_default_window proves the legacy 96 misses a real accent-shift case while 520 catches it.

## Verification

- pytest test_clean_ocr_ignores_chunk_markdown_when_canonical_doc_exists: passed.
- pytest test_long_accent_shift_repeat_needs_wider_default_window: passed.
- python3 -m ast: src/glossapi/corpus/phase_clean.py + tests parse OK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
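The chunk-skip rule in part 1 can be sketched as a filename filter. The function name and the exact chunk regex width are assumptions for illustration; the real logic lives inside Corpus.clean() / Corpus.clean_ocr().

```python
import re
from pathlib import PurePath

# Chunk names look like doc__p00001-00010.md; the 5-digit width is an
# assumption matching the doc__pNNNNN-NNNNN.md pattern described above.
CHUNK_RE = re.compile(r"^(?P<stem>.+)__p\d{5}-\d{5}$")

def files_to_clean(md_names):
    """Drop __p-suffixed chunk outputs when the canonical merged doc is
    present alongside them, so the same content isn't cleaned (and
    counted in metadata) twice. Orphan chunks with no canonical doc
    are kept."""
    stems = {PurePath(n).stem for n in md_names}
    keep = []
    for name in md_names:
        m = CHUNK_RE.match(PurePath(name).stem)
        if m and m.group("stem") in stems:
            continue  # canonical doc exists in the same directory; skip
        keep.append(name)
    return keep
```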