
TinyModel

Tiny, deployable text classification baseline for rapid product iteration

Model · Space · Hub · Live preview

TinyModel is a practical starter model line for text classification. End users consume the deployed Hugging Face model and Space endpoints; the maintainer deployment policy lives in texts/HUGGING_FACE_DEPLOYMENT_INTERNAL.md.

Repository: HyperlinksSpace/TinyModel

TinyModel1 on Hugging Face

Availability in Russia

Some features may not work reliably from Russia, for example the live preview or other flows that depend on third-party hosts or regions that are blocked or throttled. If you hit that, you can try third-party tools such as the free tier of 1VPN (browser extension or app) or Happ (paid subscription). One place people buy Happ subscriptions is this Telegram bot. These are all third-party services; use them at your own discretion and follow applicable laws.

Model card (README) — on the Hub, the model card is the README.md file at the root of the model repo (same URL as the model). In this repository, the template is implemented by write_model_card() in scripts/train_tinymodel1_classifier.py; training writes README.md, artifact.json, and eval_report.json next to the weights. We do not run CI that downloads full model weights into the repo or runner caches for republishing; update the card by retraining and publishing, or edit README.md on the Hub and leave the weights unchanged.

1) Local testing

Train locally after cloning the repo:

python scripts/train_tinymodel1_agnews.py --output-dir .tmp/TinyModel-local

Quick local inference sanity check:

python -c "from transformers import pipeline; p=pipeline('text-classification', model='.tmp/TinyModel-local', tokenizer='.tmp/TinyModel-local'); print(p('Stocks rallied after central bank comments', top_k=None))"

Phase 1 presets and comparison matrix

scripts/phase1_compare.py standardizes run profiles and prevents ad-hoc parameter drift. It executes matching-seed runs and writes a comparison matrix with accuracy, macro_f1, and per-class F1 for each run.

Presets:

  • smoke: quickest reproducibility/health check (120/80, 1 epoch)
  • dev: day-to-day iteration (1000/300, 2 epochs)
  • full: heavier baseline (6000/1200, 3 epochs)

Run the full Phase 1 baseline comparison (scratch vs pretrained) on both AG News and Emotion; swap --preset to dev or full for heavier runs:

python scripts/phase1_compare.py --preset smoke --seed 42

Outputs:

  • artifacts/phase1/runs/<preset>/<dataset>/<model>/... (model artifacts per run)
  • artifacts/phase1/reports/phase1_<preset>_seed<seed>.md (human-readable table)
  • artifacts/phase1/reports/phase1_<preset>_seed<seed>.csv (spreadsheet-friendly)
  • artifacts/phase1/reports/phase1_<preset>_seed<seed>.json (machine-readable)

CI smoke check (no heavy pretrained download by default):

python scripts/phase1_compare.py \
  --preset smoke \
  --models scratch \
  --datasets ag_news,emotion \
  --seed 42

This same default check is wired in .github/workflows/phase1-smoke.yml.

Phase 2: Evaluation quality (datasets, errors, calibration)

Training and pretrained fine-tuning now emit richer evaluation artifacts so reports support decisions beyond headline accuracy.

| Artifact | What it contains |
| --- | --- |
| eval_report.json | Existing reproducibility + metrics, plus dataset_quality.class_distribution (train/eval counts and proportions per label on the capped subsets), error_analysis.top_confusions (largest off-diagonal confusion pairs), calibration.max_prob_histogram (bins over the winner softmax probability per eval example), and routing (documented fallback behavior for low-confidence routing; thresholds are not fixed by training). |
| misclassified_sample.jsonl | Up to --max-misclassified-examples wrong predictions with text, true_label, predicted_label, max_prob (one JSON object per line). Pass 0 to keep the file empty (no examples are written). |

Routing threshold example (Phase 2 exit): a worked min_confidence + fallback policy for triage is documented in texts/phase2-routing-threshold-scenario.md (tune on your own validation data).
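
The fallback policy can be sketched in a few lines (illustrative only; the function name, the label strings, and the 0.7 default are assumptions; tune min_confidence on your own validation data as the scenario doc says):

```python
def route(label: str, max_prob: float, min_confidence: float = 0.7) -> str:
    """Keep the model's label when the winner softmax probability clears
    the threshold; otherwise fall back to human triage."""
    if max_prob >= min_confidence:
        return label
    return "fallback:human_triage"

print(route("World", 0.91))   # confident: model label is used
print(route("Sports", 0.41))  # low confidence: routed to fallback
```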

CLI knobs (scratch and finetune_pretrained_classifier.py):

  • --max-misclassified-examples (default 100)
  • --confidence-histogram-bins (default 10)
  • --top-confusions (default 20)

Third reference dataset (SST-2) — binary sentiment on GLUE, useful as an additional domain check:

python scripts/train_tinymodel1_sst2.py \
  --output-dir .tmp/TinyModel-sst2 \
  --max-train-samples 500 \
  --max-eval-samples 200 \
  --epochs 1 \
  --batch-size 8 \
  --seed 42

Quick Phase 2 smoke (AG News, small caps):

python scripts/train_tinymodel1_classifier.py \
  --output-dir .tmp/phase2-smoke \
  --max-train-samples 64 \
  --max-eval-samples 32 \
  --epochs 1 \
  --batch-size 8 \
  --seed 42 \
  --max-misclassified-examples 20

Then inspect .tmp/phase2-smoke/eval_report.json (new sections) and .tmp/phase2-smoke/misclassified_sample.jsonl.
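
The JSONL file is easy to summarize with stdlib Python; a minimal sketch (the field names match the table above, but the inline sample records here are made up stand-ins for lines read from misclassified_sample.jsonl):

```python
import json
from collections import Counter

# Stand-ins for lines read from misclassified_sample.jsonl.
lines = [
    '{"text": "Team wins final", "true_label": "Sports", "predicted_label": "World", "max_prob": 0.55}',
    '{"text": "Shares slide", "true_label": "Business", "predicted_label": "Sci/Tech", "max_prob": 0.48}',
    '{"text": "Cup upset", "true_label": "Sports", "predicted_label": "World", "max_prob": 0.61}',
]

# Count (true, predicted) confusion pairs across the error sample.
confusions = Counter()
for line in lines:
    rec = json.loads(line)
    confusions[(rec["true_label"], rec["predicted_label"])] += 1

for (true_label, pred), n in confusions.most_common():
    print(f"{true_label} -> {pred}: {n}")
```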

Expected local output folder:

  • .tmp/TinyModel-local/model.safetensors
  • .tmp/TinyModel-local/config.json
  • .tmp/TinyModel-local/tokenizer.json
  • .tmp/TinyModel-local/README.md
  • .tmp/TinyModel-local/artifact.json
  • .tmp/TinyModel-local/eval_report.json — evaluation metrics, confusion matrix, reproducibility, and Phase 2 fields (class distribution, top confusions, calibration histogram, routing notes)
  • .tmp/TinyModel-local/misclassified_sample.jsonl — optional sample of errors for review (see Phase 2 section)

Phase 3: ONNX, CPU benchmarks, reference HTTP API

Optional dependencies: optional-requirements-phase3.txt (ONNX, ONNX Runtime, onnxscript for export, fastapi/uvicorn for the reference server). PyTorch 2.6+ uses torch.onnx.export(..., dynamo=True).

  1. Export — from a training output directory or Hub id:

    python scripts/phase3_export_onnx.py --model artifacts/phase1/runs/smoke/ag_news/scratch
    # or: --model HyperlinksSpace/TinyModel1

    On Windows Git Bash, do not use a Unix-style placeholder like /path/to/checkpoint — the shell rewrites it under C:/Program Files/Git/.... Use a relative path from the repo or a c:/... path.

    Writes onnx/classifier.onnx (logits) and onnx/encoder.onnx (pooled token for embeddings). The default dynamo path traces at batch size 1; use tokenizer padding to max_seq_length (e.g. 128) to match. Optional --dynamic-quantize attempts INT8 sidecars (may be skipped on some graphs).

  2. Parity (PyTorch vs ONNX Runtime):

    python scripts/phase3_onnx_parity.py --model artifacts/phase1/runs/smoke/ag_news/scratch
  3. CPU benchmark report (PyTorch TinyModelRuntime vs ORT, classify / embed / retrieve patterns):

    python scripts/phase3_benchmark.py --model artifacts/phase1/runs/smoke/ag_news/scratch --compare-model .tmp/phase3-smoke

    Artifacts: artifacts/phase3/reports/benchmark_<name>.{json,md}. (Example report may be present under that folder after a run.)

  4. Serving contract + minimal API — texts/phase3-serving-profile.md (GET /healthz, POST /v1/classify, POST /v1/retrieve). Reference process:

    pip install -r optional-requirements-phase3.txt
    python scripts/phase3_reference_server.py --model HyperlinksSpace/TinyModel1
  5. CI — .github/workflows/phase3-smoke.yml trains a tiny model, exports ONNX, runs parity, and writes a benchmark under artifacts/phase3/reports/.

Optional R&D spike ideas (not part of the release path) — see texts/optional-rd-backlog.md.

Horizon 1 (short term): one-shot verify, three tasks, RAG smoke

This is the A–C tranche from texts/further-development-universe-brain.md (baseline closure, multi-dataset eval breadth, minimal FAQ-style retrieval). Full commands, what gets written, and how to test manually: texts/horizon1-short-term-handbook.md.

| Block | What you run | Why it helps |
| --- | --- | --- |
| A — Verify | Two commands: `pip install -r optional-requirements-phase3.txt`, then on a new line `python scripts/horizon1_verify_short_term_a.py` (do not type the word "then" on the pip line). Or chain them with `&&` (Git Bash / PowerShell 7+). Add `--skip-phase3` to skip ONNX. | Proves Phases 1–2 plus export/parity/benchmark in one local pass, aligned with the phase1-smoke / phase3-smoke CI. |
| B — Three tasks | `python scripts/horizon1_three_datasets.py` (use `--offline-datasets` if the Hugging Face download times out but data is already cached) | AG News, Emotion, and SST-2 with shared caps; summary table: texts/horizon1-three-tasks-summary.md. Weights go under artifacts/horizon1/three-tasks/ (gitignored; commit the texts/ summary). |
| C — RAG smoke | `python scripts/rag_faq_smoke.py` (optional `--model`; defaults to a local checkpoint if present, else HyperlinksSpace/TinyModel1 on the Hub) | Hybrid lexical + TinyModelRuntime retrieval over texts/rag_faq_corpus.md; a template for support/FAQ products. |

Horizon 2: generative core (open causal LM, JSON runs, optional API)

What it is: a local transformers path that turns text into new text: summarization, reformulation, and grounded answers (RAG context + answer), aligned with the "Generative core" line in texts/further-development-universe-brain.md. It does not replace your classifier; it complements Horizon 1 (retrieval) and your Phase 1–3 stack.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Install | `pip install -r optional-requirements-horizon2.txt` (plus your existing torch) | Picks up transformers / accelerate for AutoModelForCausalLM. |
| Smoke verify | `python scripts/horizon2_generative.py --verify` | One greedy generation with sshleifer/tiny-gpt2; proves downloads + wiring (not demo quality). |
| Real run | `python scripts/horizon2_generative.py`, or set `HORIZON2_MODEL=HuggingFaceTB/SmolLM2-360M-Instruct` | Writes horizon2 JSON under .tmp/horizon2/last_run.json with per-sample latency and token counts for cost and tier planning. |
| Side-by-side | add `--compare-with <other-hf-id>` | Same inputs, two model outputs in one JSON (Horizon 2 exit shaped like "A/B on domain tasks"). |
| + RAG | `--task grounded --context-file <chunk>` (or `--context "..."`) | Pairs FAQ / retrieval with generation. |
| HTTP | `pip install -r optional-requirements-phase3.txt`, then `python scripts/horizon2_server.py --smoke` | GET / lists routes; Swagger UI: http://127.0.0.1:8766/docs; POST /v1/generate (same product pattern as the Phase 3 reference server). |

Benefits (product / engineering):

  • Drafts and summaries on top of the same org data and policies you already use for classification.
  • One JSON contract per run (horizon2_generative_run/1.0) for dashboards and regression checks (see texts/horizon2-handbook.md).
  • Tier awareness: smoke vs. default Instruct vs. your own API — documented in the handbook; latencies are recorded in the artifact.

CI: .github/workflows/horizon2-smoke.yml runs --verify on pushes to main (requires Hub access in GitHub’s network; local verify is the fallback).

Horizon 3: persistent mind (session + long-term memory, audit, DSR-shaped export)

What it is: a local SQLite memory layer keyed by an org/user scope_key, with session vs long_term rows, optional TTL + prune, an audit log, export (access-shaped JSON), and forget-scope (delete all data for a scope). See texts/further-development-universe-brain.md → Persistent mind.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Self-test | `python scripts/horizon3_memory_cli.py --verify` | No network; validates CRUD, export, session clear, TTL prune, forget. |
| Daily use | `python scripts/horizon3_memory_cli.py put` / `get` / … | Day-to-day reads and writes against the store. |
| Optional HTTP | `pip install -r optional-requirements-phase3.txt`, then `python scripts/horizon3_memory_api.py` | http://127.0.0.1:8767/docs: put / list / export / forget (default port 8767; set HORIZON3_DB). |

Benefits

  • Product: carry continuity across sessions (long-term) while dropping chat noise (session clear) or expiring junk (TTL + prune).
  • Governance: audit trail for creates/updates/deletes; export supports access requests; forget-scope supports erasure for a tenant id (you still own legal review and scope design).
  • Engineering: stdlib-only store and CLI — no new pip packages for the core; optional FastAPI matches Phase 3 patterns.
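
A toy sketch of the store's shape (in-memory SQLite; the real schema, column names, and helper functions in scripts/horizon3_memory_cli.py may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory (scope_key TEXT, kind TEXT, key TEXT, value TEXT)"
)

def put(scope_key: str, kind: str, key: str, value: str) -> None:
    """kind is 'session' or 'long_term' in the spirit of the CLI."""
    conn.execute("INSERT INTO memory VALUES (?, ?, ?, ?)",
                 (scope_key, kind, key, value))

def forget_scope(scope_key: str) -> None:
    """Delete every row for a tenant/user scope (the forget operation)."""
    conn.execute("DELETE FROM memory WHERE scope_key = ?", (scope_key,))

put("org1:alice", "session", "last_topic", "billing")
put("org1:alice", "long_term", "preferred_lang", "en")
forget_scope("org1:alice")
rows = conn.execute("SELECT COUNT(*) FROM memory").fetchone()[0]
print(rows)  # 0: the scope is gone
```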

Manual test recipe: texts/horizon3-handbook.md.

CI: .github/workflows/horizon3-smoke.yml runs horizon3_memory_cli.py --verify (offline).

Horizon 4: multimodal grounding (image + text, CLIP alignment)

What it is: a CLIP-style path (Hugging Face transformers: image + caption → one alignment logit) for “does this picture go with this text?” — a narrow slice of multimodal grounding from texts/further-development-universe-brain.md. Audio and automated moderation are not in this script; add them in product layers.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Install | `pip install -r optional-requirements-horizon4.txt` (and torch + transformers for real Hub models) | Adds Pillow; reuses the same PyTorch stack as the rest of the repo. |
| CI / offline verify | `python scripts/horizon4_multimodal.py --verify` | No Hub download: a random CLIPConfig + CLIPModel forward proves the wiring. On Windows this uses a subprocess and OpenMP env defaults to avoid native crashes; if PyTorch still fails, see the handbook. |
| Pretrained check | `python scripts/horizon4_multimodal.py --verify-pretrained` | Loads HORIZON4_CLIP_MODEL (default openai/clip-vit-base-patch32) if cached/online. |
| Real photo + text | `python scripts/horizon4_multimodal.py --image <file> --text "<caption>"` | JSON under .tmp/horizon4/last_run.json with logit_image_text for triage, QA, or internal benchmarks. |

Benefits: one concrete image–text score next to your text-only classifiers; governance still needs human review for abuse; the smoke stays offline and fast in CI.

Manual steps: texts/horizon4-handbook.md

CI: .github/workflows/horizon4-smoke.yml runs horizon4_multimodal.py --verify (no network).

Horizon 6: converged stack (chain H2 + H3 + H4)

What it is: a thin smoke orchestrator from texts/further-development-universe-brain.md (Converged stack)—one command runs the existing generative, memory, and CLIP smokes in order and writes one JSON file (horizon6_converged_run/1.0).

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Install | `pip install torch` (CPU is fine) + `pip install -r optional-requirements-horizon2.txt` + `pip install -r optional-requirements-horizon4.txt` | H2 and H4 share a transformers stack; H3 stays stdlib-only. |
| Converged verify | `python scripts/horizon6_converged_smoke.py --verify` | Chains `horizon2_generative.py --verify` → `horizon3_memory_cli.py --verify` → `horizon4_multimodal.py --verify`. Output: .tmp/horizon6-converge/run.json. |
| Optional RAG | same command with `--with-rag` | Also runs rag_faq_smoke.py (needs a trained config.json dir or Hub download; can fail in air-gapped envs). |

What is still not H6 (full exit): a single production runtime and router, one auth/tenant story, and a real incident runbook—this repo only proves component smokes in sequence.

How to test (local): install deps as above, then python scripts/horizon6_converged_smoke.py --verify. Expect exit 0 and ok: true in the JSON; H2 may hit the Hub once for sshleifer/tiny-gpt2 if not cached. Faster one-offs: run each horizon’s --verify alone (see Horizon 2–4 sections above).

CI: .github/workflows/horizon6-smoke.yml runs the same command on CPU in GitHub Actions.

Horizon 7: assured platform (tenant isolation smoke)

What it is: a stdlib-only check that two separate SQLite files (two “tenants”) do not share memory rows or exports, using the same Horizon 3 store as the rest of the repo. This is a toy for H7 isolation from texts/further-development-universe-brain.md—not legal/compliance by itself.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Self-test | `python scripts/horizon7_assured_smoke.py --verify` | No torch; output .tmp/horizon7-assured/run.json (horizon7_assured_run/1.0) with per-check ok flags. |

What is still not H7 (full exit): repeatable tenant onboarding, regulatory evidence packs, external audit, SLAs, quotas—treat the script as a developer check only.

How to test (local): python scripts/horizon7_assured_smoke.py --verify — should print horizon7 verify: OK and write JSON with all checks ok: true.

CI: .github/workflows/horizon7-smoke.yml runs the same command (no extra pip deps).

Horizon 8: observability probe bundle (environment + H7 probe)

What it is: a single JSON “build + health” snapshot from texts/further-development-universe-brain.md (Observability & probe bundle)—Python/platform, optional git short SHA, and a real run of Horizon 7’s verify as a dependency probe.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Probe verify | `python scripts/horizon8_observability_probe.py --verify` | Writes .tmp/horizon8-probe/run.json (horizon8_probe_run/1.0). No torch; needs git only for git_rev when available. |

What is still not H8 (full exit): SLOs, alerting, streaming metrics, and dashboards—this is a file-shaped probe for CI and manual triage.

How to test (local): python scripts/horizon8_observability_probe.py --verify — expect horizon8 verify: OK and ok: true with a probes list.

CI: .github/workflows/horizon8-smoke.yml.

Horizon 9: declarative policy (allow / deny matrix)

What it is: a versioned sample policy (texts/horizon9_policy_sample.json) and a smoke that checks deny-over-allow precedence and default deny—from the Declarative policy & capability gates horizon in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Policy verify | `python scripts/horizon9_policy_smoke.py --verify` | Writes .tmp/horizon9-policy/run.json (horizon9_policy_run/1.0). Optional `--policy path.json`. Stdlib only. |

What is still not H9 (full exit): AuthN, OPA, signed policy, dynamic attributes, audit of policy edits in production.

How to test (local): python scripts/horizon9_policy_smoke.py --verify — expect horizon9 verify: OK and all case rows ok: true.

CI: .github/workflows/horizon9-smoke.yml.

Horizon 10: budget & unit caps (FinOps-shaped)

What it is: a sample budget (texts/horizon10_budget_sample.json) and smoke that accumulates abstract units per action until deny—see Resource & cost envelopes in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Budget verify | `python scripts/horizon10_budget_smoke.py --verify` | Writes .tmp/horizon10-budget/run.json (horizon10_budget_run/1.0). Stdlib only. |

What is still not H10 (full exit): live metering, distributed quotas, billing reconciliation.

How to test (local): python scripts/horizon10_budget_smoke.py --verify.

CI: .github/workflows/horizon10-smoke.yml.

Horizon 11: human feedback capture (JSONL)

What it is: validated newline-delimited JSON for label corrections (horizon11_feedback_record/1.0)—see Human outcome capture in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Feedback verify | `python scripts/horizon11_feedback_smoke.py --verify` | Writes .tmp/horizon11-feedback/ (sample_feedback.jsonl + run.json, horizon11_feedback_run/1.0). Stdlib only. |

What is still not H11 (full exit): secure pipelines, PII policy, automated retraining.

How to test (local): python scripts/horizon11_feedback_smoke.py --verify.

CI: .github/workflows/horizon11-smoke.yml.

Horizon 12: provenance manifest (SHA-256 of pinned configs)

What it is: integrity fingerprints for the committed sample policies/budgets — see Provenance & integrity manifest in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Provenance verify | `python scripts/horizon12_provenance_smoke.py --verify` | Writes .tmp/horizon12-provenance/run.json (horizon12_provenance_run/1.0). Stdlib only. |

What is still not H12 (full exit): signing, timestamp authorities, in-toto/Sigstore.

How to test (local): python scripts/horizon12_provenance_smoke.py --verify.

CI: .github/workflows/horizon12-smoke.yml.

Horizon 13: circuit breaker (resilience demo)

What it is: a state-machine exercise for OPEN / HALF_OPEN / CLOSED around a failing upstream—see Resilience: circuit breaker in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Circuit verify | `python scripts/horizon13_circuit_smoke.py --verify` | Writes .tmp/horizon13-circuit/run.json (horizon13_circuit_run/1.0). Stdlib only. |

What is still not H13 (full exit): async middleware, distributed coordination, production metrics.

How to test (local): python scripts/horizon13_circuit_smoke.py --verify.
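
The state machine being exercised can be sketched like this (illustrative only; the class name, thresholds, and tick-based cooldown are assumptions, not the script's actual implementation):

```python
class CircuitBreaker:
    """Toy CLOSED -> OPEN -> HALF_OPEN breaker driven by counts, not wall clock."""

    def __init__(self, failure_threshold=3, cooldown_ticks=2):
        self.state = "CLOSED"
        self.failures = 0
        self.cooldown = 0
        self.failure_threshold = failure_threshold
        self.cooldown_ticks = cooldown_ticks

    def record(self, ok: bool) -> None:
        if self.state == "OPEN":
            # While OPEN, calls are not forwarded; just count down the cooldown.
            self.cooldown -= 1
            if self.cooldown <= 0:
                self.state = "HALF_OPEN"
            return
        if ok:
            self.failures = 0
            self.state = "CLOSED"
        else:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.cooldown = self.cooldown_ticks

cb = CircuitBreaker()
for _ in range(3):          # three upstream failures trip the breaker
    cb.record(False)
print(cb.state)             # OPEN
cb.record(True); cb.record(True)  # two cooldown ticks pass while OPEN
print(cb.state)             # HALF_OPEN
cb.record(True)             # probe succeeds
print(cb.state)             # CLOSED
```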

CI: .github/workflows/horizon13-smoke.yml.

Horizon 14: workflow DAG (topological order)

What it is: a linear inference DAG plus cycle detection and parallel-root sanity checks—see Orchestrated workflows in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| DAG verify | `python scripts/horizon14_workflow_smoke.py --verify` | Writes .tmp/horizon14-workflow/run.json (horizon14_workflow_run/1.0). Stdlib only. |

What is still not H14 (full exit): retries, sagas, production orchestrator integration.

How to test (local): python scripts/horizon14_workflow_smoke.py --verify.
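
Topological ordering with cycle detection is a few lines of Kahn's algorithm; a sketch (the node names are made up, and the smoke's own data shapes may differ):

```python
from collections import deque

def topo_order(edges, nodes):
    """Kahn's algorithm: return a topological order, or None if a cycle exists."""
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)
        indeg[dst] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    # If some nodes never reached in-degree 0, a cycle blocked them.
    return order if len(order) == len(nodes) else None

print(topo_order([("tokenize", "classify"), ("classify", "route")],
                 ["tokenize", "classify", "route"]))  # ['tokenize', 'classify', 'route']
print(topo_order([("a", "b"), ("b", "a")], ["a", "b"]))  # cycle -> None
```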

CI: .github/workflows/horizon14-smoke.yml.

Horizon 15: export envelope (field allow-lists)

What it is: texts/horizon15_export_envelope_sample.json defines allowed keys per export kind; the smoke rejects extra fields—see Data minimization & export envelopes in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Envelope verify | `python scripts/horizon15_export_smoke.py --verify` | Writes .tmp/horizon15-export/run.json (horizon15_export_run/1.0). Stdlib only. |

What is still not H15 (full exit): encryption, legal sign-off, automated redaction pipelines.

How to test (local): python scripts/horizon15_export_smoke.py --verify.
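
The allow-list check amounts to rejecting any extra keys; a minimal sketch (the field names and error shape are assumptions, not the envelope sample's actual contents):

```python
def check_envelope(record: dict, allowed: set) -> dict:
    """Reject any field not in the export kind's allow-list."""
    extra = set(record) - allowed
    if extra:
        raise ValueError(f"fields not allowed in export: {sorted(extra)}")
    return dict(record)

allowed = {"id", "created_at", "label"}
print(check_envelope({"id": 1, "label": "spam"}, allowed))  # passes
try:
    check_envelope({"id": 1, "email": "a@b.example"}, allowed)
except ValueError as err:
    print(err)  # extra field rejected
```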

CI: .github/workflows/horizon15-smoke.yml.

Horizon 16: semver compatibility (artifact readers)

What it is: a manifest of declared artifact versions vs a minimum reader — see Compatibility & versioning in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Semver verify | `python scripts/horizon16_semver_smoke.py --verify` | Writes .tmp/horizon16-semver/run.json (horizon16_semver_run/1.0). Stdlib only (numeric x.y.z). |

What is still not H16 (full exit): full PEP 440, automated matrix across all consumers.

How to test (local): python scripts/horizon16_semver_smoke.py --verify.
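
One plausible reading of the check, sketched for numeric x.y.z only (the function names and the exact comparison direction are assumptions; the smoke's manifest defines the real rule):

```python
def parse(version: str) -> tuple:
    """Split a numeric x.y.z string into a comparable (major, minor, patch) tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

def reader_ok(artifact_version: str, min_reader: str) -> bool:
    """Compatible when the artifact's declared version is at least the
    reader's minimum supported version."""
    return parse(min_reader) <= parse(artifact_version)

print(reader_ok("1.2.0", "1.0.0"))  # True
print(reader_ok("0.9.0", "1.0.0"))  # False
```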

CI: .github/workflows/horizon16-smoke.yml.

Horizon 17: degradation tiers (health → FULL…OFFLINE)

What it is: a deterministic ladder from a 0–100 health score to FULL / DEGRADED / MINIMAL / OFFLINE — see Graceful degradation in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Degrade verify | `python scripts/horizon17_degrade_smoke.py --verify` | Writes .tmp/horizon17-degrade/run.json (horizon17_degrade_run/1.0). Stdlib only. |

What is still not H17 (full exit): wired probes, status pages, per-SKU docs.

How to test (local): python scripts/horizon17_degrade_smoke.py --verify.

CI: .github/workflows/horizon17-smoke.yml.

Horizon 18: operational readiness (phased checklist)

What it is: launch / game-day gates from structured phases and checks — see Operational readiness in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Readiness verify | `python scripts/horizon18_readiness_smoke.py --verify` | Loads texts/horizon18_readiness_sample.json; writes .tmp/horizon18-readiness/run.json (horizon18_readiness_run/1.0). Stdlib only. |

What is still not H18 (full exit): CI wiring for every check, paging, rehearsal calendars.

How to test (local): python scripts/horizon18_readiness_smoke.py --verify.

CI: .github/workflows/horizon18-smoke.yml.

Horizon 19: audit hash chain (tamper detection)

What it is: a linear SHA-256 chain over synthetic audit events — see Tamper-evident audit trail in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Chain verify | `python scripts/horizon19_audit_chain_smoke.py --verify` | Writes .tmp/horizon19-audit-chain/run.json (horizon19_audit_chain_run/1.0). Stdlib only. |

What is still not H19 (full exit): signing keys, Merkle batches, WORM storage.

How to test (local): python scripts/horizon19_audit_chain_smoke.py --verify.
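
The linear-chain idea fits in a few lines (illustrative; the genesis value, event encoding, and record fields are assumptions, not the smoke's actual format):

```python
import hashlib
import json

GENESIS = "0" * 64

def chain(events):
    """Link each event to the previous digest with SHA-256."""
    prev, out = GENESIS, []
    for ev in events:
        digest = hashlib.sha256(
            (prev + json.dumps(ev, sort_keys=True)).encode()
        ).hexdigest()
        out.append({"event": ev, "prev": prev, "hash": digest})
        prev = digest
    return out

def verify(records):
    """Recompute every link; any edited event breaks the chain."""
    prev = GENESIS
    for rec in records:
        expect = hashlib.sha256(
            (prev + json.dumps(rec["event"], sort_keys=True)).encode()
        ).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expect:
            return False
        prev = rec["hash"]
    return True

records = chain([{"op": "create"}, {"op": "delete"}])
print(verify(records))              # True
records[0]["event"]["op"] = "edit"  # tamper with the first event
print(verify(records))              # False
```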

CI: .github/workflows/horizon19-smoke.yml.

Horizon 20: feature flags (deterministic rollout)

What it is: hash bucket vs rollout_percent for staged releases — see Feature flags & staged rollout in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Flags verify | `python scripts/horizon20_flags_smoke.py --verify` | Loads texts/horizon20_flags_sample.json; writes .tmp/horizon20-flags/run.json (horizon20_flags_run/1.0). Stdlib only. |

What is still not H20 (full exit): hosted flag services, experiments UI, audit exports.

How to test (local): python scripts/horizon20_flags_smoke.py --verify.
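
Deterministic bucketing is the core trick; a sketch (the hash input format and 0–99 range are assumptions; the sample flags file defines the real scheme):

```python
import hashlib

def bucket(user_id: str, flag: str) -> int:
    """Stable 0-99 bucket from a hash of flag and user; same inputs,
    same bucket on every run and every machine."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def enabled(user_id: str, flag: str, rollout_percent: int) -> bool:
    return bucket(user_id, flag) < rollout_percent

print(enabled("user-42", "new-ui", 100))  # True: 100% rollout covers everyone
print(enabled("user-42", "new-ui", 0))    # False: 0% rollout covers no one
```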

CI: .github/workflows/horizon20-smoke.yml.

Horizon 21: retention tiers (purge eligibility)

What it is: category TTL vs synthetic records at a fixed as_of date — see Data retention & purge eligibility in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Retention verify | `python scripts/horizon21_retention_smoke.py --verify` | Loads texts/horizon21_retention_sample.json; writes .tmp/horizon21-retention/run.json (horizon21_retention_run/1.0). Stdlib only. |

What is still not H21 (full exit): legal holds, distributed backups, scheduler integration.

How to test (local): python scripts/horizon21_retention_smoke.py --verify.

CI: .github/workflows/horizon21-smoke.yml.

Horizon 22: token bucket (rate limiting smoke)

What it is: discrete tick / consume simulation with golden allows — see Rate limiting & fairness in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Bucket verify | `python scripts/horizon22_token_bucket_smoke.py --verify` | Loads texts/horizon22_token_bucket_sample.json; writes .tmp/horizon22-token-bucket/run.json (horizon22_token_bucket_run/1.0). Stdlib only. |

What is still not H22 (full exit): wall-clock refill, distributed quotas, per-route limits.

How to test (local): python scripts/horizon22_token_bucket_smoke.py --verify.
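
The tick/consume simulation can be sketched like this (illustrative; the class name, capacities, and costs are assumptions, and the sample JSON drives the real golden allows):

```python
class TokenBucket:
    """Discrete-tick token bucket: refill on tick(), spend on consume()."""

    def __init__(self, capacity=5, refill_per_tick=1, tokens=5):
        self.capacity = capacity
        self.refill_per_tick = refill_per_tick
        self.tokens = tokens

    def tick(self) -> None:
        # Refill, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + self.refill_per_tick)

    def consume(self, cost=1) -> bool:
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=2, refill_per_tick=1, tokens=2)
print([bucket.consume() for _ in range(3)])  # [True, True, False]
bucket.tick()
print(bucket.consume())  # True after one refill tick
```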

CI: .github/workflows/horizon22-smoke.yml.

Horizon 23: blast radius (dependency impact)

What it is: failure propagation to a fixed point over depends_on edges — see Blast radius in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Blast verify | `python scripts/horizon23_blast_radius_smoke.py --verify` | Loads texts/horizon23_blast_sample.json; writes .tmp/horizon23-blast-radius/run.json (horizon23_blast_radius_run/1.0). Stdlib only. |

What is still not H23 (full exit): redundancy paths, partial failures, regional graphs.

How to test (local): python scripts/horizon23_blast_radius_smoke.py --verify.
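
Fixed-point propagation over depends_on edges, sketched (the service names are made up; the sample JSON defines the real graph):

```python
def blast_radius(depends_on, failed):
    """Mark every service that transitively depends on a failed one,
    iterating until no new service is impacted (a fixed point)."""
    impacted = set(failed)
    changed = True
    while changed:
        changed = False
        for service, deps in depends_on.items():
            if service not in impacted and impacted & set(deps):
                impacted.add(service)
                changed = True
    return impacted

graph = {"api": ["db"], "worker": ["api"], "docs": []}
print(sorted(blast_radius(graph, {"db"})))  # ['api', 'db', 'worker']
```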

CI: .github/workflows/horizon23-smoke.yml.

Horizon 24: canary regression gates (baseline vs candidate)

What it is: bounded metric regression before promoting a candidate — see Canary promotion & regression gates in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Canary verify | `python scripts/horizon24_canary_gate_smoke.py --verify` | Loads texts/horizon24_canary_gate_sample.json; writes .tmp/horizon24-canary-gate/run.json (horizon24_canary_gate_run/1.0). Stdlib only. |

What is still not H24 (full exit): shadow traffic wiring, CI ingestion of benchmark JSON from runners.

How to test (local): python scripts/horizon24_canary_gate_smoke.py --verify.

CI: .github/workflows/horizon24-smoke.yml.

Horizon 25: regional failover (preference order)

What it is: first healthy region from an ordered list — see Regional failover & traffic steering in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Failover verify | `python scripts/horizon25_failover_smoke.py --verify` | Loads texts/horizon25_failover_sample.json; writes .tmp/horizon25-failover/run.json (horizon25_failover_run/1.0). Stdlib only. |

What is still not H25 (full exit): latency-aware steering, residency constraints, sticky routing.

How to test (local): python scripts/horizon25_failover_smoke.py --verify.

CI: .github/workflows/horizon25-smoke.yml.

Horizon 26: SLO error budget (observed vs allowed failures)

What it is: compare errors_observed to ⌊window × failure_budget_pct⌋ — see SLO error budget in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Budget verify | `python scripts/horizon26_error_budget_smoke.py --verify` | Loads texts/horizon26_error_budget_sample.json; writes .tmp/horizon26-error-budget/run.json (horizon26_error_budget_run/1.0). Stdlib only. |

What is still not H26 (full exit): composite SLIs, burn-rate alerting, paging integrations.

How to test (local): python scripts/horizon26_error_budget_smoke.py --verify.

CI: .github/workflows/horizon26-smoke.yml.

Horizon 27: prompt injection gate (substring deny lists)

What it is: case-insensitive substring rules — see Prompt injection resistance in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Gate verify | `python scripts/horizon27_prompt_gate_smoke.py --verify` | Loads texts/horizon27_prompt_gate_sample.json; writes .tmp/horizon27-prompt-gate/run.json (horizon27_prompt_gate_run/1.0). Stdlib only. |

What is still not H27 (full exit): ML moderation, multilingual tuning, tokenizer-aware scans.

How to test (local): python scripts/horizon27_prompt_gate_smoke.py --verify.

CI: .github/workflows/horizon27-smoke.yml.

Horizon 28: idempotency ledger (dedupe keyed requests)

What it is: ordered stream with first-seen keys — see Idempotent side-effects in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Idempotency verify | `python scripts/horizon28_idempotency_smoke.py --verify` | Loads texts/horizon28_idempotency_sample.json; writes .tmp/horizon28-idempotency/run.json (horizon28_idempotency_run/1.0). Stdlib only. |

What is still not H28 (full exit): durable stores, conflict detection on mismatched payloads.

How to test (local): python scripts/horizon28_idempotency_smoke.py --verify.
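
The first-seen-key dedupe is simple to sketch (field names and actions are assumptions; the sample JSON defines the real stream):

```python
def apply_stream(requests):
    """Execute each idempotency key once; later repeats are skipped."""
    seen = set()
    executed = []
    for req in requests:
        key = req["idempotency_key"]
        if key in seen:
            continue  # duplicate delivery or client retry
        seen.add(key)
        executed.append(req["action"])
    return executed

stream = [
    {"idempotency_key": "k1", "action": "charge"},
    {"idempotency_key": "k1", "action": "charge"},   # retry, deduplicated
    {"idempotency_key": "k2", "action": "refund"},
]
print(apply_stream(stream))  # ['charge', 'refund']
```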

CI: .github/workflows/horizon28-smoke.yml.

Horizon 29: SBOM semver bounds (pinned vs allowed interval)

What it is: [min, max_exclusive) numeric semver tuples — see Supply chain bounds in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| SBOM verify | `python scripts/horizon29_sbom_bounds_smoke.py --verify` | Loads texts/horizon29_sbom_bounds_sample.json; writes .tmp/horizon29-sbom-bounds/run.json (horizon29_sbom_bounds_run/1.0). Stdlib only. |

What is still not H29 (full exit): PEP 440, prereleases, signed attestations.

How to test (local): python scripts/horizon29_sbom_bounds_smoke.py --verify.

CI: .github/workflows/horizon29-smoke.yml.

Horizon 30: leases & TTL (coordination smoke)

What it is: active leases vs wall-clock check_at — see Distributed coordination (leases) in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Lease verify | `python scripts/horizon30_lease_smoke.py --verify` | Loads texts/horizon30_lease_sample.json; writes .tmp/horizon30-lease/run.json (horizon30_lease_run/1.0). Stdlib only. |

What is still not H30 (full exit): fencing tokens, renewal loops, skew budgets.

How to test (local): python scripts/horizon30_lease_smoke.py --verify.

CI: .github/workflows/horizon30-smoke.yml.

Horizon 31: cardinality budgets (distinct values per dimension)

What it is: batch distinct-count caps — see Cardinality & observability budgets in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Cardinality verify | `python scripts/horizon31_cardinality_smoke.py --verify` | Loads texts/horizon31_cardinality_sample.json; writes .tmp/horizon31-cardinality/run.json (horizon31_cardinality_run/1.0). Stdlib only. |

What is still not H31 (full exit): approximate sketches, streaming windows, tenant isolation.

How to test (local): python scripts/horizon31_cardinality_smoke.py --verify.

CI: .github/workflows/horizon31-smoke.yml.

Horizon 32: consumer lag (streaming backlog)

What it is: lag_units vs max_lag_allowed — see Streaming backlog & consumer lag in texts/further-development-universe-brain.md.

| Piece | What you run | Why it helps |
| --- | --- | --- |
| Lag verify | `python scripts/horizon32_consumer_lag_smoke.py --verify` | Loads texts/horizon32_consumer_lag_sample.json; writes .tmp/horizon32-consumer-lag/run.json (horizon32_consumer_lag_run/1.0). Stdlib only. |

What is still not H32 (full exit): Kafka-specific semantics, consumer groups, checkpoint protocols.

How to test (local): python scripts/horizon32_consumer_lag_smoke.py --verify.

CI: .github/workflows/horizon32-smoke.yml.

Horizon 33: purpose limitation matrix (lawful basis × purpose)

What it is: explicit allowed_pairs for (legal_basis, processing_purpose) — see Purpose limitation in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Purpose verify python scripts/horizon33_purpose_matrix_smoke.py --verify Loads texts/horizon33_purpose_matrix_sample.json; writes .tmp/horizon33-purpose-matrix/run.json (horizon33_purpose_matrix_run/1.0). Stdlib only.

What is still not H33 (full exit): legal review, DPIAs, transfers, sector rules.

How to test (local): python scripts/horizon33_purpose_matrix_smoke.py --verify.

CI: .github/workflows/horizon33-smoke.yml.

Horizon 34: quorum (strict majority votes)

What it is: votes_yes × 2 > replicas_total — see Distributed quorum in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Quorum verify python scripts/horizon34_quorum_smoke.py --verify Loads texts/horizon34_quorum_sample.json; writes .tmp/horizon34-quorum/run.json (horizon34_quorum_run/1.0). Stdlib only.

What is still not H34 (full exit): Byzantine quorum math, weighted voters, raft semantics.

How to test (local): python scripts/horizon34_quorum_smoke.py --verify.

CI: .github/workflows/horizon34-smoke.yml.
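
The strict-majority rule is small enough to state directly in code. A minimal sketch (has_quorum is an illustrative name, not the smoke script's actual API; field names mirror the sample JSON):

```python
def has_quorum(votes_yes: int, replicas_total: int) -> bool:
    """True iff yes-votes form a strict majority: votes_yes * 2 > replicas_total.

    Multiplying instead of dividing avoids floating point; exact ties
    (e.g. 2 of 4) fail the gate.
    """
    return votes_yes * 2 > replicas_total
```

For example, 3 of 5 replicas pass, while 2 of 4 (a tie) and 2 of 5 both fail.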

Horizon 35: cryptographic suite policy (algorithm + minimum key bits)

What it is: allow-list algorithm plus key_bits_min — see Cryptographic suite policy in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Crypto verify python scripts/horizon35_crypto_suite_smoke.py --verify Loads texts/horizon35_crypto_suite_sample.json; writes .tmp/horizon35-crypto-suite/run.json (horizon35_crypto_suite_run/1.0). Stdlib only.

What is still not H35 (full exit): TLS negotiation order, PQ hybrids, HSM attestation.

How to test (local): python scripts/horizon35_crypto_suite_smoke.py --verify.

CI: .github/workflows/horizon35-smoke.yml.

Horizon 36: maintenance freeze windows (UTC intervals)

What it is: frozen inside [start, end) intervals — see Maintenance freeze windows in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Freeze verify python scripts/horizon36_maintenance_freeze_smoke.py --verify Loads texts/horizon36_maintenance_freeze_sample.json; writes .tmp/horizon36-maintenance-freeze/run.json (horizon36_maintenance_freeze_run/1.0). Stdlib only.

What is still not H36 (full exit): RRULE calendars, regional tz overlays, break-glass audit hooks.

How to test (local): python scripts/horizon36_maintenance_freeze_smoke.py --verify.

CI: .github/workflows/horizon36-smoke.yml.
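
The half-open [start, end) convention means an instant equal to end is already outside the window. A minimal sketch of the check (illustrative names; the real input schema lives in texts/horizon36_maintenance_freeze_sample.json):

```python
from datetime import datetime, timezone

def is_frozen(now_utc: datetime, windows: list[tuple[datetime, datetime]]) -> bool:
    """True iff now_utc falls inside any half-open [start, end) UTC interval."""
    return any(start <= now_utc < end for start, end in windows)

# Example: a two-hour freeze on 2024-01-01 from 02:00 to 04:00 UTC.
WINDOW = [(datetime(2024, 1, 1, 2, tzinfo=timezone.utc),
           datetime(2024, 1, 1, 4, tzinfo=timezone.utc))]
```

Keeping every timestamp timezone-aware in UTC sidesteps the regional-tz overlays deliberately left out of H36.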

Horizon 37: pair cardinality (dim_a × dim_b explosion guard)

What it is: cap distinct pairs across dim_a × dim_b — see Pair cardinality in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Pair cardinality verify python scripts/horizon37_pair_cardinality_smoke.py --verify Loads texts/horizon37_pair_cardinality_sample.json; writes .tmp/horizon37-pair-cardinality/run.json (horizon37_pair_cardinality_run/1.0). Stdlib only.

What is still not H37 (full exit): three-way tuples, streaming sketches.

How to test (local): python scripts/horizon37_pair_cardinality_smoke.py --verify.

CI: .github/workflows/horizon37-smoke.yml.

Horizon 38: monotonic watermarks (checkpoint series)

What it is: adjacent integer series never decreases — see Monotonic checkpoints in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Watermark verify python scripts/horizon38_watermark_smoke.py --verify Loads texts/horizon38_watermark_sample.json; writes .tmp/horizon38-watermark/run.json (horizon38_watermark_run/1.0). Stdlib only.

What is still not H38 (full exit): per-partition vectors, Kafka ISR semantics.

How to test (local): python scripts/horizon38_watermark_smoke.py --verify.

CI: .github/workflows/horizon38-smoke.yml.
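
The never-decreases rule over an adjacent integer series can be sketched in one comprehension (is_monotonic is an illustrative name, not the smoke script's API):

```python
def is_monotonic(checkpoints: list[int]) -> bool:
    """True iff no adjacent pair decreases.

    Empty and single-element series pass vacuously; equal neighbours are
    allowed (the series must be non-decreasing, not strictly increasing).
    """
    return all(a <= b for a, b in zip(checkpoints, checkpoints[1:]))
```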

Horizon 39: job mutex pairs (schedule conflicts)

What it is: mutex_pairs forbid scheduling both jobs together — see Mutually exclusive jobs in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Mutex verify python scripts/horizon39_job_mutex_smoke.py --verify Loads texts/horizon39_job_mutex_sample.json; writes .tmp/horizon39-job-mutex/run.json (horizon39_job_mutex_run/1.0). Stdlib only.

What is still not H39 (full exit): durations, capacities, orchestrator backends.

How to test (local): python scripts/horizon39_job_mutex_smoke.py --verify.

CI: .github/workflows/horizon39-smoke.yml.

Horizon 40: composite policy AND (all gates)

What it is: composite_ok iff every gate passes — see Composite policy AND in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Policy AND verify python scripts/horizon40_policy_and_smoke.py --verify Loads texts/horizon40_policy_and_sample.json; writes .tmp/horizon40-policy-and/run.json (horizon40_policy_and_run/1.0). Stdlib only.

What is still not H40 (full exit): OR groups, weighted scores, dynamic gate lists.

How to test (local): python scripts/horizon40_policy_and_smoke.py --verify.

CI: .github/workflows/horizon40-smoke.yml.

Horizon 41: geo-fence / data residency (region allow-list)

What it is: allowed iff region ∈ allowed_regions — see Geo-fence / data residency in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Geo-fence verify python scripts/horizon41_geo_fence_smoke.py --verify Loads texts/horizon41_geo_fence_sample.json; writes .tmp/horizon41-geo-fence/run.json (horizon41_geo_fence_run/1.0). Stdlib only.

What is still not H41 (full exit): multi-region failover semantics, lineage proofs, private interconnect routing.

How to test (local): python scripts/horizon41_geo_fence_smoke.py --verify.

CI: .github/workflows/horizon41-smoke.yml.

Horizon 42: egress allow-list (tool / outbound URL hostname gate)

What it is: outbound URL hostnames must match allowed_hosts (exact match, or registrable-suffix match when the rule contains .) — see Egress allow-list in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Egress verify python scripts/horizon42_egress_allow_smoke.py --verify Loads texts/horizon42_egress_allow_sample.json; writes .tmp/horizon42-egress-allow/run.json (horizon42_egress_allow_run/1.0). Stdlib only.

What is still not H42 (full exit): glob patterns, IP allow-lists, DNS rebinding defenses.

How to test (local): python scripts/horizon42_egress_allow_smoke.py --verify.

CI: .github/workflows/horizon42-smoke.yml.
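
One plausible reading of the exact-or-suffix rule is sketched below; the smoke script's actual matching may differ in detail (host_allowed is an illustrative name):

```python
def host_allowed(url_host: str, allowed_hosts: list[str]) -> bool:
    """Exact match always allows. A rule containing '.' also admits
    subdomains via a dot-anchored suffix match, so 'example.com' admits
    'api.example.com' but rejects 'evil-example.com'."""
    host = url_host.lower().rstrip(".")
    for rule in allowed_hosts:
        rule = rule.lower()
        if host == rule:
            return True
        if "." in rule and host.endswith("." + rule):
            return True
    return False
```

Anchoring the suffix match on a leading dot is what blocks the classic `evil-example.com` bypass; rules without a dot (like `localhost`) only ever match exactly.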

Horizon 43: credential / session freshness (max age ceiling)

What it is: valid iff age_seconds ≤ max_age_seconds — see Credential freshness in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Credential age verify python scripts/horizon43_credential_age_smoke.py --verify Loads texts/horizon43_credential_age_sample.json; writes .tmp/horizon43-credential-age/run.json (horizon43_credential_age_run/1.0). Stdlib only.

What is still not H43 (full exit): narrow JWT validity windows, replay caches, rotation webhooks.

How to test (local): python scripts/horizon43_credential_age_smoke.py --verify.

CI: .github/workflows/horizon43-smoke.yml.

Horizon 44: optimistic concurrency (revision match)

What it is: apply_ok iff client_revision == stored_revision — see Optimistic concurrency in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Revision gate verify python scripts/horizon44_optimistic_lock_smoke.py --verify Loads texts/horizon44_optimistic_lock_sample.json; writes .tmp/horizon44-optimistic-lock/run.json (horizon44_optimistic_lock_run/1.0). Stdlib only.

What is still not H44 (full exit): vector clocks, CRDT merges, merge UX.

How to test (local): python scripts/horizon44_optimistic_lock_smoke.py --verify.

CI: .github/workflows/horizon44-smoke.yml.

Horizon 45: payload size ceiling (ingress max-bytes)

What it is: allowed iff bytes ≤ max_bytes — see Payload size ceiling in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Payload size verify python scripts/horizon45_payload_size_smoke.py --verify Loads texts/horizon45_payload_size_sample.json; writes .tmp/horizon45-payload-size/run.json (horizon45_payload_size_run/1.0). Stdlib only.

What is still not H45 (full exit): per-tenant quotas, compression bombs, WebSocket frame limits.

How to test (local): python scripts/horizon45_payload_size_smoke.py --verify.

CI: .github/workflows/horizon45-smoke.yml.

Horizon 46: latency tail budget (approximate p99 vs ceiling)

What it is: sort samples_ms, take the approximate p99 (nearest rank, index ceil(0.99·n)−1), and compare it to max_p99_ms per expect_under_budget — see Latency tail budget in texts/further-development-universe-brain.md.

Piece What you run Why it helps
p99 verify python scripts/horizon46_latency_p99_smoke.py --verify Loads texts/horizon46_latency_p99_sample.json; writes .tmp/horizon46-latency-p99/run.json (horizon46_latency_p99_run/1.0). Stdlib only.

What is still not H46 (full exit): HDR histograms, weighted SLIs, multi-region tails.

How to test (local): python scripts/horizon46_latency_p99_smoke.py --verify.

CI: .github/workflows/horizon46-smoke.yml.
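
The nearest-rank estimate described above can be sketched as follows (function names are illustrative, not the smoke script's API):

```python
import math

def approx_p99(samples_ms: list[float]) -> float:
    """Nearest-rank p99: sort ascending, take index ceil(0.99 * n) - 1."""
    ordered = sorted(samples_ms)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]

def under_budget(samples_ms: list[float], max_p99_ms: float) -> bool:
    return approx_p99(samples_ms) <= max_p99_ms
```

For 100 evenly spaced samples of 1..100 ms this picks the 99th value; with small n the nearest rank simply lands on the largest sample, which is why this stays an approximation rather than an HDR-histogram SLI.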

Horizon 47: kill switch (global deny)

What it is: allowed iff the kill switch is off and policy_allow is true — see Kill switch in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Kill switch verify python scripts/horizon47_kill_switch_smoke.py --verify Loads texts/horizon47_kill_switch_sample.json; writes .tmp/horizon47-kill-switch/run.json (horizon47_kill_switch_run/1.0). Stdlib only.

What is still not H47 (full exit): scoped kills, gradual drains, audited expiry.

How to test (local): python scripts/horizon47_kill_switch_smoke.py --verify.

CI: .github/workflows/horizon47-smoke.yml.

Horizon 48: dual control (distinct approvers)

What it is: pass_gate iff unique approver count ≥ min_distinct_approvers — see Dual control in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Dual control verify python scripts/horizon48_dual_control_smoke.py --verify Loads texts/horizon48_dual_control_sample.json; writes .tmp/horizon48-dual-control/run.json (horizon48_dual_control_run/1.0). Stdlib only.

What is still not H48 (full exit): SSO-bound identities, role matrices, duty calendars.

How to test (local): python scripts/horizon48_dual_control_smoke.py --verify.

CI: .github/workflows/horizon48-smoke.yml.
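
Deduplicating approvers before counting is the whole trick. A minimal sketch (illustrative name; identity normalization such as casing or aliases is out of scope here, matching the SSO-bound-identities exit note):

```python
def dual_control_ok(approver_ids: list[str], min_distinct_approvers: int) -> bool:
    """Duplicate approvals by the same identity count once."""
    return len(set(approver_ids)) >= min_distinct_approvers
```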

Horizon 49: pinned artifact digest (promotion gate)

What it is: allow iff artifact_sha256 equals pinned_sha256 (hex, case-insensitive) — see Pinned artifact digest in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Digest pin verify python scripts/horizon49_digest_pin_smoke.py --verify Loads texts/horizon49_digest_pin_sample.json; writes .tmp/horizon49-digest-pin/run.json (horizon49_digest_pin_run/1.0). Stdlib only.

What is still not H49 (full exit): Sigstore attestations, OCI digest locks, SBOM linkage.

How to test (local): python scripts/horizon49_digest_pin_smoke.py --verify.

CI: .github/workflows/horizon49-smoke.yml.
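
The case-insensitive hex comparison is a one-liner; a sketch under the assumption that inputs are already well-formed hex strings (digest_matches is an illustrative name):

```python
def digest_matches(artifact_sha256: str, pinned_sha256: str) -> bool:
    """Case-insensitive comparison of two hex digests; the real smoke
    script may additionally validate length and hex characters."""
    return artifact_sha256.strip().lower() == pinned_sha256.strip().lower()
```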

Horizon 50: wire-format major-version compatibility

What it is: compatible iff server_schema_major ≥ required_minimum_major — see Wire-format major-version compatibility in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Schema compat verify python scripts/horizon50_schema_compat_smoke.py --verify Loads texts/horizon50_schema_compat_sample.json; writes .tmp/horizon50-schema-compat/run.json (horizon50_schema_compat_run/1.0). Stdlib only.

What is still not H50 (full exit): minor negotiation, feature bits, codegen bridges.

How to test (local): python scripts/horizon50_schema_compat_smoke.py --verify.

CI: .github/workflows/horizon50-smoke.yml.

Horizon 51: quota headroom (storage utilization ceiling)

What it is: under_budget iff utilization_pct ≤ max_utilization_pct — see Quota headroom in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Quota headroom verify python scripts/horizon51_quota_headroom_smoke.py --verify Loads texts/horizon51_quota_headroom_sample.json; writes .tmp/horizon51-quota-headroom/run.json (horizon51_quota_headroom_run/1.0). Stdlib only.

What is still not H51 (full exit): predictive capacity, inode caps, replication slack models.

How to test (local): python scripts/horizon51_quota_headroom_smoke.py --verify.

CI: .github/workflows/horizon51-smoke.yml.

Horizon 52: RBAC subset gate (required roles covered by grant)

What it is: allow iff every required role appears in granted roles — see RBAC subset gate in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Role subset verify python scripts/horizon52_role_subset_smoke.py --verify Loads texts/horizon52_role_subset_sample.json; writes .tmp/horizon52-role-subset/run.json (horizon52_role_subset_run/1.0). Stdlib only.

What is still not H52 (full exit): ABAC, hierarchical roles, JIT elevation.

How to test (local): python scripts/horizon52_role_subset_smoke.py --verify.

CI: .github/workflows/horizon52-smoke.yml.

Horizon 53: dry-run gate (no mutating work in simulation)

What it is: allow unless both dry_run and mutating_operation are true — see Dry-run gate in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Dry-run verify python scripts/horizon53_dry_run_gate_smoke.py --verify Loads texts/horizon53_dry_run_gate_sample.json; writes .tmp/horizon53-dry-run-gate/run.json (horizon53_dry_run_gate_run/1.0). Stdlib only.

What is still not H53 (full exit): shadow environments, mixed batches, replay audits.

How to test (local): python scripts/horizon53_dry_run_gate_smoke.py --verify.

CI: .github/workflows/horizon53-smoke.yml.

Horizon 54: backup recency (RPO-style age ceiling)

What it is: compliant iff backup_age_hours ≤ max_allowed_age_hours — see Backup recency in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Backup recency verify python scripts/horizon54_backup_recency_smoke.py --verify Loads texts/horizon54_backup_recency_sample.json; writes .tmp/horizon54-backup-recency/run.json (horizon54_backup_recency_run/1.0). Stdlib only.

What is still not H54 (full exit): replication lag proofs, immutable vaults, restore drills.

How to test (local): python scripts/horizon54_backup_recency_smoke.py --verify.

CI: .github/workflows/horizon54-smoke.yml.

Horizon 55: encryption required for sensitive tier

What it is: allow if not sensitive or encryption at rest is enabled — see Encryption required in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Encryption tier verify python scripts/horizon55_encryption_required_smoke.py --verify Loads texts/horizon55_encryption_required_sample.json; writes .tmp/horizon55-encryption-required/run.json (horizon55_encryption_required_run/1.0). Stdlib only.

What is still not H55 (full exit): CMKs, field-level crypto, tenant KMS isolation.

How to test (local): python scripts/horizon55_encryption_required_smoke.py --verify.

CI: .github/workflows/horizon55-smoke.yml.

Horizon 56: TLS version allow-list (ingress)

What it is: allow iff offered_tls_version is listed under allowed_tls_versions — see TLS version allow-list in texts/further-development-universe-brain.md.

Piece What you run Why it helps
TLS version verify python scripts/horizon56_tls_version_smoke.py --verify Loads texts/horizon56_tls_version_sample.json; writes .tmp/horizon56-tls-version/run.json (horizon56_tls_version_run/1.0). Stdlib only.

What is still not H56 (full exit): live handshake probes, per-listener matrices, downgrade monitors.

How to test (local): python scripts/horizon56_tls_version_smoke.py --verify.

CI: .github/workflows/horizon56-smoke.yml.

Horizon 57: severity routing (pager for critical severities)

What it is: compliant iff severity does not require paging or pager_sent is true — see Severity routing in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Severity pager verify python scripts/horizon57_sev_pager_smoke.py --verify Loads texts/horizon57_sev_pager_sample.json; writes .tmp/horizon57-sev-pager/run.json (horizon57_sev_pager_run/1.0). Stdlib only.

What is still not H57 (full exit): rotation calendars, multi-channel paging, customer status bridges.

How to test (local): python scripts/horizon57_sev_pager_smoke.py --verify.

CI: .github/workflows/horizon57-smoke.yml.

Horizon 58: vulnerability budget (critical/high ceilings)

What it is: compliant iff critical_open and high_open counts stay within max_critical_open / max_high_open — see Vulnerability budget in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Vuln budget verify python scripts/horizon58_vuln_budget_smoke.py --verify Loads texts/horizon58_vuln_budget_sample.json; writes .tmp/horizon58-vuln-budget/run.json (horizon58_vuln_budget_run/1.0). Stdlib only.

What is still not H58 (full exit): scanner quorum, reachability proofs, waiver workflows.

How to test (local): python scripts/horizon58_vuln_budget_smoke.py --verify.

CI: .github/workflows/horizon58-smoke.yml.

Horizon 59: signature gate (release channels)

What it is: allow iff the channel does not require signatures or signature_valid is true — see Signature gate in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Signature gate verify python scripts/horizon59_signature_gate_smoke.py --verify Loads texts/horizon59_signature_gate_sample.json; writes .tmp/horizon59-signature-gate/run.json (horizon59_signature_gate_run/1.0). Stdlib only.

What is still not H59 (full exit): Sigstore transparency, threshold signing, attestations.

How to test (local): python scripts/horizon59_signature_gate_smoke.py --verify.

CI: .github/workflows/horizon59-smoke.yml.

Horizon 60: SPDX license allow-list (supply-chain policy)

What it is: compliant iff each dependency_license (normalized) appears in allowed_license_ids — see SPDX license allow-list in texts/further-development-universe-brain.md.

Piece What you run Why it helps
License allow verify python scripts/horizon60_license_allow_smoke.py --verify Loads texts/horizon60_license_allow_sample.json; writes .tmp/horizon60-license-allow/run.json (horizon60_license_allow_run/1.0). Stdlib only.

What is still not H60 (full exit): composite SPDX expressions, transitive graphs, counsel workflows.

How to test (local): python scripts/horizon60_license_allow_smoke.py --verify.

CI: .github/workflows/horizon60-smoke.yml.
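
A sketch of the allow-list check, assuming normalization means trim plus lowercase (the script's actual normalization may differ; names are illustrative):

```python
def licenses_allowed(dependency_licenses: list[str],
                     allowed_license_ids: list[str]) -> bool:
    """Every dependency license must normalize into the allow-list.

    Normalization here is just strip + lowercase; composite SPDX
    expressions like 'MIT OR Apache-2.0' are explicitly out of scope.
    """
    allowed = {lic.strip().lower() for lic in allowed_license_ids}
    return all(lic.strip().lower() in allowed for lic in dependency_licenses)
```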

Horizon 61: maintainer quorum (bus-factor floor)

What it is: compliant iff maintainer_count ≥ min_maintainers — see Maintainer quorum in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Maintainer quorum verify python scripts/horizon61_maintainer_quorum_smoke.py --verify Loads texts/horizon61_maintainer_quorum_sample.json; writes .tmp/horizon61-maintainer-quorum/run.json (horizon61_maintainer_quorum_run/1.0). Stdlib only.

What is still not H61 (full exit): CODEOWNER coverage, verified identities, succession drills.

How to test (local): python scripts/horizon61_maintainer_quorum_smoke.py --verify.

CI: .github/workflows/horizon61-smoke.yml.

Horizon 62: protected-branch merge gate (approvals + CI)

What it is: allow_merge iff the branch is not listed under protected_branches, or min_approvals_met and ci_green are both true — see Protected-branch merge gate in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Branch protect verify python scripts/horizon62_branch_protect_smoke.py --verify Loads texts/horizon62_branch_protect_sample.json; writes .tmp/horizon62-branch-protect/run.json (horizon62_branch_protect_run/1.0). Stdlib only.

What is still not H62 (full exit): merge queues, required contexts lists, stacked PR semantics.

How to test (local): python scripts/horizon62_branch_protect_smoke.py --verify.

CI: .github/workflows/horizon62-smoke.yml.
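
The merge rule above can be sketched as a two-branch check (allow_merge is an illustrative name, not the smoke script's API):

```python
def allow_merge(branch: str, protected_branches: list[str],
                min_approvals_met: bool, ci_green: bool) -> bool:
    """Unprotected branches merge freely; protected ones need both gates."""
    if branch not in protected_branches:
        return True
    return min_approvals_met and ci_green
```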

Horizon 63: secret-scan sweep ceiling

What it is: compliant iff open_secret_findings ≤ max_open_secret_findings — see Secret-scan sweep ceiling in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Secret sweep verify python scripts/horizon63_secret_sweep_smoke.py --verify Loads texts/horizon63_secret_sweep_sample.json; writes .tmp/horizon63-secret-sweep/run.json (horizon63_secret_sweep_run/1.0). Stdlib only.

What is still not H63 (full exit): entropy tuning, KMS-linked blast radius, waiver workflows.

How to test (local): python scripts/horizon63_secret_sweep_smoke.py --verify.

CI: .github/workflows/horizon63-smoke.yml.

Horizon 64: container image age ceiling (freshness)

What it is: compliant iff image_age_days ≤ max_image_age_days — see Container image age ceiling in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Image age verify python scripts/horizon64_image_age_smoke.py --verify Loads texts/horizon64_image_age_sample.json; writes .tmp/horizon64-image-age/run.json (horizon64_image_age_run/1.0). Stdlib only.

What is still not H64 (full exit): rolling rebuild automation, digest-only pulls, mirror skew handling.

How to test (local): python scripts/horizon64_image_age_smoke.py --verify.

CI: .github/workflows/horizon64-smoke.yml.

Horizon 65: RCA documentation deadline (post-incident SLA)

What it is: compliant iff severity is not in severities_requiring_rca, or hours_to_rca_doc ≤ max_hours_to_rca_doc — see RCA documentation deadline in texts/further-development-universe-brain.md.

Piece What you run Why it helps
RCA deadline verify python scripts/horizon65_rca_deadline_smoke.py --verify Loads texts/horizon65_rca_deadline_sample.json; writes .tmp/horizon65-rca-deadline/run.json (horizon65_rca_deadline_run/1.0). Stdlib only.

What is still not H65 (full exit): blameless templates, tracked corrective actions, legal holds.

How to test (local): python scripts/horizon65_rca_deadline_smoke.py --verify.

CI: .github/workflows/horizon65-smoke.yml.

Horizon 66: deprecated dependency ceiling (tech-debt gate)

What it is: compliant iff deprecated_dependency_count ≤ max_deprecated_dependencies — see Deprecated dependency ceiling in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Deprecated dep verify python scripts/horizon66_deprecated_dep_smoke.py --verify Loads texts/horizon66_deprecated_dep_sample.json; writes .tmp/horizon66-deprecated-dep/run.json (horizon66_deprecated_dep_run/1.0). Stdlib only.

What is still not H66 (full exit): transitive graphs, codemods, vendor RFC timelines.

How to test (local): python scripts/horizon66_deprecated_dep_smoke.py --verify.

CI: .github/workflows/horizon66-smoke.yml.

Horizon 67: DSAR export SLA (tiered turnaround)

What it is: compliant iff customer_tier is not in customer_tiers_requiring_fast_export, or hours_to_complete_export ≤ max_hours_to_complete_export — see DSAR export SLA in texts/further-development-universe-brain.md.

Piece What you run Why it helps
DSAR export verify python scripts/horizon67_dsar_export_smoke.py --verify Loads texts/horizon67_dsar_export_sample.json; writes .tmp/horizon67-dsar-export/run.json (horizon67_dsar_export_run/1.0). Stdlib only.

What is still not H67 (full exit): identity proofs, jurisdictional carve-outs, portal workflows.

How to test (local): python scripts/horizon67_dsar_export_smoke.py --verify.

CI: .github/workflows/horizon67-smoke.yml.

Horizon 68: blocking static-analysis ceiling (quality gate)

What it is: compliant iff blocking_static_findings ≤ max_blocking_static_findings — see Blocking static-analysis ceiling in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Static block verify python scripts/horizon68_static_block_smoke.py --verify Loads texts/horizon68_static_block_sample.json; writes .tmp/horizon68-static-block/run.json (horizon68_static_block_run/1.0). Stdlib only.

What is still not H68 (full exit): per-language packs, incremental ratchets, autofix bots.

How to test (local): python scripts/horizon68_static_block_smoke.py --verify.

CI: .github/workflows/horizon68-smoke.yml.

Horizon 69: vendor API host allow-list (subprocessor egress)

What it is: allow iff requested_host (normalized) is listed under allowed_api_hosts — see Vendor API host allow-list in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Vendor host verify python scripts/horizon69_vendor_host_smoke.py --verify Loads texts/horizon69_vendor_host_sample.json; writes .tmp/horizon69-vendor-host/run.json (horizon69_vendor_host_run/1.0). Stdlib only.

What is still not H69 (full exit): suffix wildcards, mTLS pinning, regional VPC endpoints.

How to test (local): python scripts/horizon69_vendor_host_smoke.py --verify.

CI: .github/workflows/horizon69-smoke.yml.

Horizon 70: major-incident backlog ceiling (ops hygiene)

What it is: compliant iff open_major_incidents ≤ max_open_major_incidents — see Major-incident backlog ceiling in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Incident backlog verify python scripts/horizon70_incident_backlog_smoke.py --verify Loads texts/horizon70_incident_backlog_sample.json; writes .tmp/horizon70-incident-backlog/run.json (horizon70_incident_backlog_run/1.0). Stdlib only.

What is still not H70 (full exit): severity rubrics, dedup across regions, executive dashboards.

How to test (local): python scripts/horizon70_incident_backlog_smoke.py --verify.

CI: .github/workflows/horizon70-smoke.yml.

Horizon 71: payment overdue grace window (subscription continuity)

What it is: allow_service iff hours_past_due ≤ max_hours_past_due_allowed — see Payment overdue grace window in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Payment grace verify python scripts/horizon71_payment_grace_smoke.py --verify Loads texts/horizon71_payment_grace_sample.json; writes .tmp/horizon71-payment-grace/run.json (horizon71_payment_grace_run/1.0). Stdlib only.

What is still not H71 (full exit): dunning ladders, processor webhooks, enterprise invoicing carve-outs.

How to test (local): python scripts/horizon71_payment_grace_smoke.py --verify.

CI: .github/workflows/horizon71-smoke.yml.

Horizon 72: P1 ticket backlog ceiling (support load)

What it is: compliant iff open_p1_tickets ≤ max_open_p1_tickets — see P1 ticket backlog ceiling in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Ticket backlog verify python scripts/horizon72_ticket_backlog_smoke.py --verify Loads texts/horizon72_ticket_backlog_sample.json; writes .tmp/horizon72-ticket-backlog/run.json (horizon72_ticket_backlog_run/1.0). Stdlib only.

What is still not H72 (full exit): tier-specific SLAs, holiday staffing, multi-product queues.

How to test (local): python scripts/horizon72_ticket_backlog_smoke.py --verify.

CI: .github/workflows/horizon72-smoke.yml.

Horizon 73: contract renewal notice (vendor runway)

What it is: compliant iff auto_renew is true or days_until_expiry ≥ min_notice_days_before_expiry — see Contract renewal notice in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Contract notice verify python scripts/horizon73_contract_notice_smoke.py --verify Loads texts/horizon73_contract_notice_sample.json; writes .tmp/horizon73-contract-notice/run.json (horizon73_contract_notice_run/1.0). Stdlib only.

What is still not H73 (full exit): procurement workflows, BAAs, termination clauses.

How to test (local): python scripts/horizon73_contract_notice_smoke.py --verify.

CI: .github/workflows/horizon73-smoke.yml.

Horizon 74: penetration-test critical-findings ceiling

What it is: compliant iff open_critical_pen_test_findings ≤ max_open_critical_pen_test_findings — see Penetration-test critical-findings ceiling in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Pen-test findings verify python scripts/horizon74_pentest_findings_smoke.py --verify Loads texts/horizon74_pentest_findings_sample.json; writes .tmp/horizon74-pentest-findings/run.json (horizon74_pentest_findings_run/1.0). Stdlib only.

What is still not H74 (full exit): retest SLAs, scope definitions, bounty program links.

How to test (local): python scripts/horizon74_pentest_findings_smoke.py --verify.

CI: .github/workflows/horizon74-smoke.yml.

Horizon 75: disaster-recovery drill recency

What it is: compliant iff months_since_last_drill ≤ max_months_between_drills — see Disaster-recovery drill recency in texts/further-development-universe-brain.md.

Piece What you run Why it helps
DR drill recency verify python scripts/horizon75_drill_recency_smoke.py --verify Loads texts/horizon75_drill_recency_sample.json; writes .tmp/horizon75-drill-recency/run.json (horizon75_drill_recency_run/1.0). Stdlib only.

What is still not H75 (full exit): multi-region game days, fault injection, auditor evidence stores.

How to test (local): python scripts/horizon75_drill_recency_smoke.py --verify.

CI: .github/workflows/horizon75-smoke.yml.

Horizon 76: accessibility blocker ceiling

What it is: compliant iff open_a11y_blockers ≤ max_open_a11y_blockers — see Accessibility blocker ceiling in texts/further-development-universe-brain.md.

Piece What you run Why it helps
A11y blocker verify python scripts/horizon76_a11y_block_smoke.py --verify Loads texts/horizon76_a11y_block_sample.json; writes .tmp/horizon76-a11y-block/run.json (horizon76_a11y_block_run/1.0). Stdlib only.

What is still not H76 (full exit): axe-core CI wiring, manual audits, AT user panels.

How to test (local): python scripts/horizon76_a11y_block_smoke.py --verify.

CI: .github/workflows/horizon76-smoke.yml.

Horizon 77: legal-hold mutation gate

What it is: allow_mutate iff legal_hold_active is false or counsel_override is true — see Legal-hold mutation gate in texts/further-development-universe-brain.md.

Piece What you run Why it helps
Legal hold verify python scripts/horizon77_legal_hold_smoke.py --verify Loads texts/horizon77_legal_hold_sample.json; writes .tmp/horizon77-legal-hold/run.json (horizon77_legal_hold_run/1.0). Stdlib only.

What is still not H77 (full exit): custodian workflows, chain-of-custody logs, e-discovery bridges.

How to test (local): python scripts/horizon77_legal_hold_smoke.py --verify.

CI: .github/workflows/horizon77-smoke.yml.
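
The gate is pure boolean logic; a one-line sketch (allow_mutate is an illustrative name):

```python
def allow_mutate(legal_hold_active: bool, counsel_override: bool) -> bool:
    """Mutations pass unless a legal hold is active without a counsel override."""
    return (not legal_hold_active) or counsel_override
```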

Training script: evaluation and artifacts

The canonical training implementation is scripts/train_tinymodel1_classifier.py. scripts/train_tinymodel1_agnews.py is a thin wrapper that calls the same main() with AG News–friendly defaults.

| Function / area | Role |
| --- | --- |
| parse_args() | CLI for dataset id, splits, text/label columns, caps, hyperparameters, --seed, and Hub card metadata. |
| set_seed() | Sets Python, NumPy, and PyTorch RNGs so runs are repeatable for a given --seed. |
| load_splits() | Loads the Hub dataset, selects train/eval split names, shuffles each split with seed, then takes the first N rows (--max-train-samples, --max-eval-samples). |
| infer_text_column() | Picks the text column if you do not pass --text-column. |
| resolve_label_names() / build_label_maps() / rows_to_model_inputs() | Resolve class names, map raw labels to contiguous ids, and build Dataset columns for training. |
| build_tokenizer() | Trains a WordPiece tokenizer on training texts and writes tokenizer files under the output dir. |
| evaluate() / evaluate_with_details() | Runs eval and builds the confusion matrix and EvalMetrics; evaluate_with_details also records per-example max softmax (winner) probability for calibration histograms. |
| write_eval_report() | Writes eval_report.json: reproducibility, metrics, plus optional dataset_quality, error_analysis, calibration, routing (see Phase 2 section). |
| write_misclassified_jsonl() | Writes misclassified_sample.jsonl (up to N lines) for manual error review. |
| write_manifest() | Writes artifact.json: training config, labels, and summary metrics for downstream tooling. |
| write_model_card() | Writes a Hub-style README.md next to the weights (model card with eval summary). |
| copy_model_card_image() | Optionally copies TinyModel1Image.png into the output dir for the card banner. |

How the eval subset is defined (same script, same seed → same rows) is documented in texts/eval-reproducibility.md.
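
Downstream tooling typically reads the summary metrics back out of these files. A minimal sketch, assuming the field names this README mentions (reproducibility, metrics, accuracy, macro_f1, per_class_f1); the exact schema in your eval_report.json may differ, so check before relying on it:

```python
import json  # in practice: report = json.load(open("artifacts/.../eval_report.json"))

# Pull the headline numbers out of an eval_report.json-style dict.
def summarize(report: dict) -> str:
    m = report["metrics"]
    per_class = ", ".join(f"{k}={v:.2f}" for k, v in m["per_class_f1"].items())
    return (f"dataset={report['reproducibility']['dataset']} "
            f"acc={m['accuracy']:.3f} macro_f1={m['macro_f1']:.3f} ({per_class})")

# Inline example shaped like the report; values are illustrative.
sample = {
    "reproducibility": {"dataset": "ag_news", "seed": 42},
    "metrics": {"accuracy": 0.87, "macro_f1": 0.86,
                "per_class_f1": {"World": 0.88, "Sports": 0.93}},
}
print(summarize(sample))
```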

Second reference dataset (Emotion)

Besides AG News (train_tinymodel1_agnews.py), this repo includes a second single-label task: the Hub dataset emotion (English short text, six classes: sadness, joy, love, anger, fear, surprise). It uses the same training code path; only the dataset id, eval split, and label names are preset.

| Entry point | Dataset | Eval split (default) |
| --- | --- | --- |
| scripts/train_tinymodel1_agnews.py | fancyzhx/ag_news | test |
| scripts/train_tinymodel1_emotion.py | emotion | validation |
| scripts/train_tinymodel1_sst2.py | glue (sst2) | validation |

Equivalent explicit CLI (if you prefer not to use the wrapper):

python scripts/train_tinymodel1_classifier.py \
  --dataset emotion \
  --eval-split validation \
  --labels sadness,joy,love,anger,fear,surprise \
  --output-dir .tmp/TinyModel-emotion

Instant smoke test (small samples, ~1 minute on CPU; needs network to download emotion once):

python scripts/train_tinymodel1_emotion.py \
  --output-dir artifacts/emotion-smoke \
  --max-train-samples 200 \
  --max-eval-samples 100 \
  --epochs 1 \
  --batch-size 8 \
  --seed 42

Then check artifacts/emotion-smoke/eval_report.json: reproducibility.dataset should be emotion, and label_order should list the six emotion names. For other Hub datasets, pass --dataset, splits, and optional --labels / --text-column to train_tinymodel1_classifier.py directly.
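
The manual check above can also be scripted. A hedged sketch: the field locations (reproducibility.dataset, label_order) follow this README's wording, so verify them against your actual eval_report.json before relying on this:

```python
import json  # in practice: report = json.load(open("artifacts/emotion-smoke/eval_report.json"))

EMOTIONS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# Accept label_order either under reproducibility or at the top level,
# since this README does not pin down its exact location.
def check_emotion_report(report: dict) -> bool:
    repro = report.get("reproducibility", {})
    labels = repro.get("label_order") or report.get("label_order")
    return repro.get("dataset") == "emotion" and labels == EMOTIONS

print(check_emotion_report(
    {"reproducibility": {"dataset": "emotion"}, "label_order": EMOTIONS}
))  # True
```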

Embeddings smoke test (routing / search-shaped)

scripts/embeddings_smoke_test.py runs TinyModelRuntime on a few queries: classification probabilities, pairwise similarity, and retrieval over a toy candidate list (support/triage scenario).

What these terms mean

  • Classification probabilities — Output of TinyModelRuntime.classify(...): for each input text, the model returns a probability distribution across all labels (values sum to ~1.0). Use this for routing decisions and confidence-aware thresholds.
  • Pairwise similarity — Output of TinyModelRuntime.similarity(text_a, text_b): cosine similarity between two sentence embeddings (from the encoder). Higher values mean semantically closer text under this model.
  • Retrieval — Output of TinyModelRuntime.retrieve(query, candidates, top_k=...): ranks candidate texts by embedding similarity to a query and returns top matches with scores and indices.
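
The classification distribution above supports confidence-aware routing with a plain threshold. A minimal sketch operating on a classify()-style output (one dict per text, label to probability); the 0.6 cutoff and the "fallback" route are illustrative assumptions, not repo defaults:

```python
# Route on the winning label only when its probability clears a threshold;
# otherwise send the input to a fallback path (human review, default queue).
def route(probs: dict[str, float], threshold: float = 0.6) -> str:
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else "fallback"

print(route({"anger": 0.82, "joy": 0.10, "fear": 0.08}))  # anger
print(route({"anger": 0.40, "joy": 0.35, "fear": 0.25}))  # fallback
```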

Instant test (needs a checkpoint — train the tiny eval run first, or pass a Hub id):

python scripts/train_tinymodel1_classifier.py \
  --output-dir artifacts/eval-smoke --max-train-samples 120 --max-eval-samples 80 \
  --epochs 1 --batch-size 8 --seed 42
python scripts/embeddings_smoke_test.py --model artifacts/eval-smoke
# Or: python scripts/embeddings_smoke_test.py --model HyperlinksSpace/TinyModel1

Pretrained encoder fine-tune (compare to scratch baseline)

scripts/finetune_pretrained_classifier.py fine-tunes AutoModelForSequenceClassification from --base-model (default distilbert-base-uncased) using the same splits and metrics as the scratch trainer. Use matching --seed and sample caps, then compare eval_report.json / artifact.json to a scratch run.
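
Once both runs exist, the comparison is a diff over the shared metric names. A sketch, assuming accuracy and macro_f1 live under a "metrics" key as this README describes; adapt to the actual report layout if it differs:

```python
import json  # in practice: json.load() the two eval_report.json files

# Difference (fine-tuned minus scratch) for each headline metric.
def diff_metrics(scratch: dict, finetuned: dict) -> dict:
    return {key: finetuned["metrics"][key] - scratch["metrics"][key]
            for key in ("accuracy", "macro_f1")}

# Inline example with illustrative numbers.
delta = diff_metrics(
    {"metrics": {"accuracy": 0.81, "macro_f1": 0.80}},
    {"metrics": {"accuracy": 0.90, "macro_f1": 0.89}},
)
print({k: round(v, 3) for k, v in delta.items()})
```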

Instant test (downloads base weights once; CPU-friendly small run):

python scripts/finetune_pretrained_classifier.py \
  --output-dir artifacts/finetune-smoke \
  --base-model distilbert-base-uncased \
  --max-train-samples 400 --max-eval-samples 200 \
  --epochs 1 --batch-size 8 --seed 42

Custom labels and data hygiene

For proprietary or weakly labeled data: use a short label guide, versioned snapshots, and leakage-safe splits. See texts/labeling-and-data-hygiene.md.
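
One common leakage-safe pattern (illustrative only, not taken from the repo's guide) is to assign every row sharing a group key — user id, document id, conversation id — to the same split by hashing that key, so near-duplicate rows never straddle train and eval:

```python
import hashlib

# Deterministic group-based split: same key -> same split, every run, every machine.
def split_of(group_key: str, eval_fraction: float = 0.2) -> str:
    digest = hashlib.sha256(group_key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the hash into [0, 1]
    return "eval" if bucket < eval_fraction else "train"

print(split_of("user-123") == split_of("user-123"))  # True
```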

Current implementation status (what, why, how to run)

This section summarizes the currently implemented components and their practical purpose.

| Part | What it is for | How to launch | What to verify |
| --- | --- | --- | --- |
| Scratch baseline training (scripts/train_tinymodel1_classifier.py) | Build a small from-scratch text classifier baseline and export all model artifacts. | python scripts/train_tinymodel1_classifier.py --output-dir artifacts/eval-smoke --max-train-samples 120 --max-eval-samples 80 --epochs 1 --batch-size 8 --seed 42 | artifacts/eval-smoke/eval_report.json exists and includes accuracy, macro_f1, per_class_f1, and confusion_matrix. |
| Second dataset path (scripts/train_tinymodel1_emotion.py) | Prove the same pipeline works on another Hub dataset without forking core training code. | python scripts/train_tinymodel1_emotion.py --output-dir artifacts/emotion-smoke --max-train-samples 200 --max-eval-samples 100 --epochs 1 --batch-size 8 --seed 42 | reproducibility.dataset == "emotion" and six labels in label_order. |
| Embeddings/runtime smoke (scripts/embeddings_smoke_test.py) | Validate product-shaped runtime behavior: classify, similarity, retrieval. | python scripts/embeddings_smoke_test.py --model artifacts/eval-smoke (or --model HyperlinksSpace/TinyModel1) | Script prints all three blocks and ends with "Embeddings smoke test completed". |
| Pretrained fine-tune path (scripts/finetune_pretrained_classifier.py) | Compare a pretrained encoder baseline (DistilBERT/BERT family) against scratch training using the same eval reporting format. | python scripts/finetune_pretrained_classifier.py --output-dir artifacts/finetune-smoke --base-model distilbert-base-uncased --max-train-samples 400 --max-eval-samples 200 --epochs 1 --batch-size 8 --seed 42 | artifacts/finetune-smoke/eval_report.json and artifact.json exist; compare metrics to a scratch run with the same caps and seed. |
| Data hygiene guide (texts/labeling-and-data-hygiene.md) | Lightweight rules for label quality, versioning, and leakage prevention when moving to custom/proprietary data. | Read the file and apply before collecting custom labels. | Label guide versioning and split hygiene rules are defined before annotation scale-up. |
| Kaggle→HF training workflow hardening (.github/workflows/train-via-kaggle-to-hf.yml) | Make the CI training/publish flow robust: stable auth handling, unique kernel slugs, resilient status polling, and clearer diagnostics. | Trigger the workflow from GitHub Actions with version, namespace, and train hyperparameters. | Workflow reaches the model publish step and uploads {namespace}/TinyModel{version}. |
| Phase 3: ONNX, bench, API (scripts/phase3_*.py, texts/phase3-serving-profile.md) | Export to ONNX, verify parity, report CPU latency, and serve a reference HTTP API. | Install once (pip install -r optional-requirements-phase3.txt); then, as separate shell commands, run python scripts/phase3_export_onnx.py --model <dir>, python scripts/phase3_onnx_parity.py, and python scripts/phase3_benchmark.py (see the Phase 3 section). Never append "then python" to the same line as pip install. | onnx/*.onnx present; benchmark report under artifacts/phase3/reports/; parity exits 0. |

2) Using the Hub model and Space

Load the published model by id (no local files required):

python -c "from transformers import pipeline; p=pipeline('text-classification', model='HyperlinksSpace/TinyModel1', tokenizer='HyperlinksSpace/TinyModel1'); print(p('Stocks rallied after central bank comments', top_k=None))"

Use the general-purpose runtime helpers (classification + embeddings + semantic search):

from scripts.tinymodel_runtime import TinyModelRuntime

rt = TinyModelRuntime("HyperlinksSpace/TinyModel1")

# 1) Classification
print(rt.classify(["Oil prices fell after a demand forecast update."])[0])

# 2) Embeddings (shape: [batch, hidden_size])
emb = rt.embed(
    [
        "The team won the cup final in extra time.",
        "Central bank policy affected bond yields.",
    ]
)
print(emb.shape)

# 3) Pairwise semantic similarity
score = rt.similarity(
    "Stocks rose after inflation cooled.",
    "Markets rallied as price growth slowed.",
)
print("similarity:", round(score, 4))

# 4) Retrieval: nearest texts to a query
hits = rt.retrieve(
    "Chipmaker launches a new AI processor.",
    [
        "Parliament debated tax policy in the capital.",
        "Semiconductor company unveils next-gen accelerator.",
        "Team signs striker before the derby.",
    ],
    top_k=2,
)
for h in hits:
    print(h.index, round(h.score, 4), h.text)

TinyModelRuntime function outputs

| Function | Return type | Output values |
| --- | --- | --- |
| classify(texts) | list[dict[str, float]] | One dict per input text. Keys are label names from model.config.id2label; values are probabilities in [0, 1] that sum to ~1.0 for each text. |
| embed(texts, normalize=True) | torch.Tensor | Shape [batch_size, hidden_size] (the default TinyModel hidden size is 128). If normalize=True, each row is L2-normalized (vector norm ~1.0). |
| similarity(text_a, text_b) | float | Cosine similarity between the two embeddings. Typical range is [-1, 1]; higher means more semantically similar under this model. |
| retrieve(query, candidates, top_k=3) | list[RetrievalHit] | Ranked top matches. Each item has index (position in candidates), text (candidate string), and score (cosine similarity; higher is closer). Length is min(top_k, len(candidates)). |
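
Why normalize=True is the default worth keeping: for L2-normalized (unit-length) vectors, cosine similarity reduces to a plain dot product, which is what makes retrieval over many candidates cheap. A pure-Python sketch of the math only; the runtime itself uses the encoder's sentence embeddings:

```python
import math

# Scale a vector to unit length (L2 norm = 1).
def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# For unit vectors, cosine similarity is just the dot product.
def cosine(a, b):
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

print(round(cosine([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
```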

Or open the demo: direct app · on the Hub.

Quick checks:

  • Space loads; inference returns labels and scores; no errors in Space logs.

3) GitHub Actions workflows

Workflow definitions live under .github/workflows/. Trigger them from Actions → select the workflow → Run workflow. Runners use ubuntu-latest unless you change the workflow.

Repository secrets (Settings → Secrets and variables → Actions)

Configure these once per repository (or organization). They are not committed to git.

| Secret | Used by | Purpose |
| --- | --- | --- |
| HF_TOKEN | Workflows below | Hugging Face access token with write permission to create/update models and Spaces in the target namespace. |
| KAGGLE_USERNAME | train-via-kaggle-to-hf.yml only | Your Kaggle username (same value as in Kaggle Account → API). |
| KAGGLE_KEY | train-via-kaggle-to-hf.yml only | Kaggle API key from Account → Create New API Token. |

No other GitHub secrets are read by these workflows. Internal step outputs (GITHUB_ENV) such as KAGGLE_OWNER / KAGGLE_KERNEL_SLUG are set automatically during the Kaggle run.

Core flows (validated on the GitHub Actions free tier)

| Workflow | File |
| --- | --- |
| PR smoke: Phase 1 matrix (scratch, small caps) | phase1-smoke.yml |
| PR smoke: Phase 3 (train tiny → ONNX → parity → bench) | phase3-smoke.yml |
| Deploy versioned Space to Hugging Face | deploy-hf-space-versioned.yml |
| Train on Hugging Face Jobs and publish versioned model | train-hf-job-versioned.yml |

  • deploy-hf-space-versioned.yml — Builds the Gradio Space with scripts/build_space_artifact.py and uploads {namespace}/TinyModel{version}Space.

    • Secrets: HF_TOKEN.
    • Workflow inputs: version, namespace, model_id (for example HyperlinksSpace/TinyModel1).
  • train-hf-job-versioned.yml — Submits training on Hugging Face Jobs, then publishes {namespace}/TinyModel{version}.

    • Secrets: HF_TOKEN (also passed into the remote job so it can run publish_hf_artifact.py).
    • Workflow inputs: version, namespace, optional commit_sha (empty = current workflow SHA), flavor, timeout, max_train_samples, max_eval_samples, epochs, batch_size, learning_rate.
    • If Hugging Face returns 402 Payment Required for Jobs, add billing/credits on your HF account or train locally and publish with scripts/publish_hf_artifact.py (see texts/HUGGING_FACE_DEPLOYMENT_INTERNAL.md).

Optional: train via Kaggle

| Workflow | File |
| --- | --- |
| Train via Kaggle and publish to Hugging Face | train-via-kaggle-to-hf.yml |

  • train-via-kaggle-to-hf.yml — Creates a Kaggle kernel run, trains, downloads outputs, and pushes {namespace}/TinyModel{version} to the Hub.
    • Secrets: KAGGLE_USERNAME, KAGGLE_KEY, and HF_TOKEN (for upload to Hugging Face).
    • Workflow inputs: version, namespace, max_train_samples, max_eval_samples, epochs, batch_size, learning_rate.
    • External quota: Kaggle GPU/CPU weekly limits and any Kaggle compute credits your account uses; not covered by GitHub Actions alone.

4) Further development

Illustrative directions for evolving the TinyModel line (pick what matches your product goals):

  • Accuracy and capacity — Train on more AG News samples or epochs; adjust the tiny BERT config (depth, width, sequence length); add LR schedules, warmup, or regularization suited to your budget.
  • Domains and label sets — Fine-tune on proprietary or niche corpora; replace the four AG News classes with your own taxonomy and a labeled dataset.
  • Shipping inference — Document ONNX or quantized exports for edge and serverless; add batch-inference examples; optionally wire a Hugging Face Inference Endpoint for a stable HTTP API.
  • Space and API UX — Batch inputs, per-class thresholds, richer examples, or client snippets (Python and JavaScript) for integrators.
  • Evaluation discipline — Fixed test split, confusion matrix, calibration, and versioned eval reports alongside artifact.json.
  • Repository hygiene — Lightweight CI (lint, script smoke tests) that never pulls large weights; optional Hub Collections or docs that link model, Space, and release notes.

Nothing here is committed on a fixed timeline; treat it as a backlog of sensible next steps for a small text understanding stack.

5) Further development plan: what was added and how to exit-check

The living plan is in texts/further-development-plan.md. Recent updates there:

  • Exit steps (verification) for Phases 1–3, optional R&D, and each decision gate (concrete commands, exit status 0, expected artifacts).
  • Phase 2 routing: texts/phase2-routing-threshold-scenario.md.
  • Phase 3 (done in repo): ONNX export, parity, CPU benchmark, reference API, serving doc — see Phase 3 in this README and texts/phase3-serving-profile.md. CI: .github/workflows/phase3-smoke.yml.
  • Optional R&D backlog: texts/optional-rd-backlog.md.
  • Plan status and What is left (if any) at the end of the plan file (mostly optional follow-ups).

Quick Phase 1 exit check (local, matches CI):

python scripts/phase1_compare.py \
  --preset smoke \
  --models scratch \
  --datasets ag_news,emotion \
  --seed 42
echo $?
# Expect: 0; reports under artifacts/phase1/reports/phase1_smoke_seed42.*
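
Once the report exists, picking the best run is a one-liner over the comparison matrix. A hedged sketch: the schema assumed here (a list of run dicts with "name" and "macro_f1") is an assumption, not the documented layout of phase1_smoke_seed42.*, so adapt the keys to the actual file:

```python
# Rank runs by macro_f1 (the matrix also carries accuracy and per-class F1).
def best_run(runs: list) -> str:
    return max(runs, key=lambda r: r["macro_f1"])["name"]

print(best_run([
    {"name": "scratch-ag_news", "macro_f1": 0.86},
    {"name": "scratch-emotion", "macro_f1": 0.78},
]))  # scratch-ag_news
```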

For the up-to-date list of optional or future work, see “What is left (if any)” at the end of the same plan file.
