TinyModel is a practical starter model line for text classification.
End users consume the deployed Hugging Face model and Space endpoints. Maintainer deployment policy lives in texts/HUGGING_FACE_DEPLOYMENT_INTERNAL.md.
Repository: HyperlinksSpace/TinyModel
TinyModel1 on Hugging Face
- Model weights & model card — HyperlinksSpace/TinyModel1: Safetensors, tokenizer, and README.md on the Hugging Face Hub (load with transformers or the Inference API where available).
- Space project (Hub) — HyperlinksSpace/TinyModel1Space: Space repository (app code, build logs, settings, community).
- Live Gradio app — Direct app URL: https://hyperlinksspace-tinymodel1space.hf.space · Same app on the Hub: huggingface.co/spaces/HyperlinksSpace/TinyModel1Space. If the direct link fails or is blocked, use the Hub link or see Availability in Russia below.
Availability in Russia
Some features may not work reliably from Russia—for example live preview or other flows that depend on third-party hosts or regions that are blocked or throttled. If you hit that, you can try third-party tools such as the free tier of 1VPN (browser extension or app), or Happ (paid subscription). One place people buy Happ subscriptions is this Telegram bot. These are all third-party services; use at your own discretion and follow applicable laws.
Model card (README) — On the Hub, the model card is the README.md file at the root of the model repo (same URL as the model). In this repository, the template is implemented by write_model_card() in scripts/train_tinymodel1_classifier.py; training writes README.md, artifact.json, and eval_report.json next to the weights. We do not run CI that downloads full model weights into the repo or runner caches for republish; update the card by retraining and publishing, or edit README.md on the Hub and keep weights unchanged.
Train locally after cloning the repo:
python scripts/train_tinymodel1_agnews.py --output-dir .tmp/TinyModel-local

Quick local inference sanity check:

python -c "from transformers import pipeline; p=pipeline('text-classification', model='.tmp/TinyModel-local', tokenizer='.tmp/TinyModel-local'); print(p('Stocks rallied after central bank comments', top_k=None))"

scripts/phase1_compare.py standardizes run profiles and prevents ad-hoc parameter drift.
It executes matching-seed runs and writes a comparison matrix with accuracy, macro_f1,
and per-class F1 for each run.
Presets:
- smoke: quickest reproducibility/health check (120/80, 1 epoch)
- dev: day-to-day iteration (1000/300, 2 epochs)
- full: heavier baseline (6000/1200, 3 epochs)
Run full Phase 1 baseline comparison (scratch vs pretrained) on both AG News and Emotion:
python scripts/phase1_compare.py --preset smoke --seed 42

Outputs:
- artifacts/phase1/runs/&lt;preset&gt;/&lt;dataset&gt;/&lt;model&gt;/... (model artifacts per run)
- artifacts/phase1/reports/phase1_&lt;preset&gt;_seed&lt;seed&gt;.md (human-readable table)
- artifacts/phase1/reports/phase1_&lt;preset&gt;_seed&lt;seed&gt;.csv (spreadsheet-friendly)
- artifacts/phase1/reports/phase1_&lt;preset&gt;_seed&lt;seed&gt;.json (machine-readable)
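The machine-readable report can feed dashboards directly. A minimal sketch of tabulating it, assuming a hypothetical structure (the real key names come from scripts/phase1_compare.py; adjust after inspecting one JSON file):

```python
import json

# Hypothetical report shape -- a list of runs with headline metrics.
report = {
    "runs": [
        {"dataset": "ag_news", "model": "scratch", "accuracy": 0.72, "macro_f1": 0.70},
        {"dataset": "ag_news", "model": "pretrained", "accuracy": 0.90, "macro_f1": 0.89},
    ]
}

# Render a small markdown table like the .md report variant.
rows = ["| dataset | model | accuracy | macro_f1 |", "|---|---|---|---|"]
for run in report["runs"]:
    rows.append(
        f"| {run['dataset']} | {run['model']} | {run['accuracy']:.3f} | {run['macro_f1']:.3f} |"
    )
print("\n".join(rows))
```

In practice you would `json.load()` the `phase1_<preset>_seed<seed>.json` file instead of the inline sample.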
CI smoke check (no heavy pretrained download by default):
python scripts/phase1_compare.py \
--preset smoke \
--models scratch \
--datasets ag_news,emotion \
--seed 42

This same default check is wired in .github/workflows/phase1-smoke.yml.
Training and pretrained fine-tuning now emit richer evaluation artifacts so reports support decisions beyond headline accuracy.
| Artifact | What it contains |
|---|---|
| eval_report.json | Existing reproducibility + metrics, plus dataset_quality.class_distribution (train/eval counts and proportions per label on the capped subsets), error_analysis.top_confusions (largest off-diagonal confusion pairs), calibration.max_prob_histogram (bins over the winner softmax probability per eval example), and routing (documented fallback behavior for low-confidence routing; thresholds are not fixed by training). |
| misclassified_sample.jsonl | Up to --max-misclassified-examples wrong predictions with text, true_label, predicted_label, max_prob (one JSON object per line). Set it to 0 to skip recording examples (the file may still be created empty). |
Routing threshold example (Phase 2 exit): a worked min_confidence + fallback policy for triage is documented in texts/phase2-routing-threshold-scenario.md (tune on your own validation data).
CLI knobs (scratch and finetune_pretrained_classifier.py):
- --max-misclassified-examples (default 100)
- --confidence-histogram-bins (default 10)
- --top-confusions (default 20)
Third reference dataset (SST-2) — binary sentiment on GLUE, useful as an additional domain check:
python scripts/train_tinymodel1_sst2.py \
--output-dir .tmp/TinyModel-sst2 \
--max-train-samples 500 \
--max-eval-samples 200 \
--epochs 1 \
--batch-size 8 \
--seed 42

Quick Phase 2 smoke (AG News, small caps):
python scripts/train_tinymodel1_classifier.py \
--output-dir .tmp/phase2-smoke \
--max-train-samples 64 \
--max-eval-samples 32 \
--epochs 1 \
--batch-size 8 \
--seed 42 \
--max-misclassified-examples 20

Then inspect .tmp/phase2-smoke/eval_report.json (new sections) and .tmp/phase2-smoke/misclassified_sample.jsonl.
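Each line of misclassified_sample.jsonl is one JSON object with the fields named above, so a stdlib-only pass can summarize it, for example by counting true→predicted confusion pairs. The sample lines here are invented for illustration:

```python
import json
from collections import Counter

# Example lines shaped like misclassified_sample.jsonl
# (text, true_label, predicted_label, max_prob -- field names from this README).
lines = [
    '{"text": "Team wins final", "true_label": "Sports", "predicted_label": "World", "max_prob": 0.41}',
    '{"text": "Shares slide", "true_label": "Business", "predicted_label": "Sci/Tech", "max_prob": 0.55}',
    '{"text": "Cup upset", "true_label": "Sports", "predicted_label": "World", "max_prob": 0.62}',
]

pairs = Counter()
for line in lines:
    rec = json.loads(line)
    pairs[(rec["true_label"], rec["predicted_label"])] += 1

# Most frequent true -> predicted confusion in the sample.
(top_pair, count), = pairs.most_common(1)
print(top_pair, count)  # ('Sports', 'World') 2
```

Swap the inline list for `open(".tmp/phase2-smoke/misclassified_sample.jsonl")` to run it against a real output.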
Expected local output folder:
- .tmp/TinyModel-local/model.safetensors
- .tmp/TinyModel-local/config.json
- .tmp/TinyModel-local/tokenizer.json
- .tmp/TinyModel-local/README.md
- .tmp/TinyModel-local/artifact.json
- .tmp/TinyModel-local/eval_report.json — evaluation metrics, confusion matrix, reproducibility, and Phase 2 fields (class distribution, top confusions, calibration histogram, routing notes)
- .tmp/TinyModel-local/misclassified_sample.jsonl — optional sample of errors for review (see Phase 2 section)
Optional dependencies: optional-requirements-phase3.txt (ONNX, ONNX Runtime, onnxscript for export, fastapi/uvicorn for the reference server). PyTorch 2.6+ uses torch.onnx.export(..., dynamo=True).
- Export — from a training output directory or Hub id:

  python scripts/phase3_export_onnx.py --model artifacts/phase1/runs/smoke/ag_news/scratch
  # or: --model HyperlinksSpace/TinyModel1

  On Windows Git Bash, do not use a Unix-style placeholder like /path/to/checkpoint — the shell rewrites it under C:/Program Files/Git/.... Use a relative path from the repo or a c:/... path. Writes onnx/classifier.onnx (logits) and onnx/encoder.onnx (pooled token for embeddings). The default dynamo path traces at batch size 1; use tokenizer padding to max_seq_length (e.g. 128) to match. Optional --dynamic-quantize attempts INT8 sidecars (may be skipped on some graphs).

- Parity (PyTorch vs ONNX Runtime):

  python scripts/phase3_onnx_parity.py --model artifacts/phase1/runs/smoke/ag_news/scratch

- CPU benchmark report (PyTorch TinyModelRuntime vs ORT; classify / embed / retrieve patterns):

  python scripts/phase3_benchmark.py --model artifacts/phase1/runs/smoke/ag_news/scratch --compare-model .tmp/phase3-smoke

  Artifacts: artifacts/phase3/reports/benchmark_&lt;name&gt;.{json,md}. (An example report may be present under that folder after a run.)

- Serving contract + minimal API — texts/phase3-serving-profile.md (GET /healthz, POST /v1/classify, POST /v1/retrieve). Reference process:

  pip install -r optional-requirements-phase3.txt
  python scripts/phase3_reference_server.py --model HyperlinksSpace/TinyModel1

- CI — .github/workflows/phase3-smoke.yml trains a tiny model, exports ONNX, runs parity, and writes a benchmark under artifacts/phase3/reports/.
Optional R&D spike ideas (not part of the release path) — see texts/optional-rd-backlog.md.
This is the A–C tranche from texts/further-development-universe-brain.md (baseline closure, multi-dataset eval breadth, minimal FAQ-style retrieval). Full commands, what gets written, and how to test manually: texts/horizon1-short-term-handbook.md.
| Block | What you run | Why it helps |
|---|---|---|
| A — Verify | Two commands, each on its own line (do not append the second to the pip line): pip install -r optional-requirements-phase3.txt, then python scripts/horizon1_verify_short_term_a.py. Or chain them: pip install -r optional-requirements-phase3.txt && python scripts/horizon1_verify_short_term_a.py (Git Bash / PowerShell 7+). Add --skip-phase3 to skip ONNX. |
Proves Phases 1–2 plus export/parity/benchmark in one local pass, aligned with phase1-smoke / phase3-smoke CI. |
| B — Three tasks | python scripts/horizon1_three_datasets.py (use --offline-datasets if Hugging Face download times out but data is already cached) |
AG News, Emotion, and SST-2 with shared caps; summary table: texts/horizon1-three-tasks-summary.md. Weights go under artifacts/horizon1/three-tasks/ (gitignored; commit the texts/ summary). |
| C — RAG smoke | python scripts/rag_faq_smoke.py (optional --model; defaults to a local checkpoint if present, else HyperlinksSpace/TinyModel1 on the Hub) |
Hybrid lexical + TinyModelRuntime retrieval over texts/rag_faq_corpus.md; template for support/FAQ products. |
What it is: a local transformers path that turns text into new text — summarize, reformulate, and grounded generation (RAG context + answer) — aligned with the “Generative core” line in texts/further-development-universe-brain.md. It does not replace your classifier; it complements Horizon 1 (retrieval) and your Phase 1–3 stack.
| Piece | What you run | Why it helps |
|---|---|---|
| Install | pip install -r optional-requirements-horizon2.txt (plus your existing torch) |
Picks up transformers / accelerate for AutoModelForCausalLM. |
| Smoke verify | python scripts/horizon2_generative.py --verify |
One greedy generation with sshleifer/tiny-gpt2 → proves downloads + wiring (not demo quality). |
| Real run | python scripts/horizon2_generative.py or set HORIZON2_MODEL=HuggingFaceTB/SmolLM2-360M-Instruct |
Writes horizon2 JSON under .tmp/horizon2/last_run.json with per-sample latency and token counts for cost and tier planning. |
| Side-by-side | add --compare-with <other-hf-id> |
Same inputs, two model outputs in one JSON (Horizon 2 exit shaped like “A/B on domain tasks”). |
| + RAG | --task grounded --context-file <chunk> (or --context "...") |
Pairs FAQ / retrieval with generation. |
| HTTP | pip install -r optional-requirements-phase3.txt then python scripts/horizon2_server.py --smoke |
GET / lists routes; Swagger UI: http://127.0.0.1:8766/docs — POST /v1/generate (same product pattern as the Phase 3 reference server). |
Benefits (product / engineering):
- Drafts and summaries on top of the same org data and policies you already use for classification.
- One JSON contract per run (horizon2_generative_run/1.0) for dashboards and regression checks (see texts/horizon2-handbook.md).
- Tier awareness: smoke vs. default Instruct vs. your own API — documented in the handbook; latencies are recorded in the artifact.
CI: .github/workflows/horizon2-smoke.yml runs --verify on pushes to main (requires Hub access in GitHub’s network; local verify is the fallback).
What it is: a local SQLite memory layer for org/user scope_key, with session vs long_term rows, optional TTL + prune, an audit log, export (access-shaped JSON), and forget-scope (delete all data for a scope). See texts/further-development-universe-brain.md — Persistent mind.
| Piece | What you run | Why it helps |
|---|---|---|
| Self-test | python scripts/horizon3_memory_cli.py --verify |
No network; validates CRUD, export, session clear, TTL prune, forget. |
| Daily use | python scripts/horizon3_memory_cli.py put \| get | |
| Optional HTTP | pip install -r optional-requirements-phase3.txt then python scripts/horizon3_memory_api.py |
http://127.0.0.1:8767/docs — put / list / export / forget (default port 8767; set HORIZON3_DB). |
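The scoped-store idea can be sketched with nothing but the standard library. The table name, columns, and helper names below are illustrative assumptions, not the actual schema in scripts/horizon3_memory_cli.py:

```python
import sqlite3
import time

# Toy scoped memory table: every row belongs to a scope_key (org/user tenant).
con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE memory (
        scope_key TEXT, kind TEXT, key TEXT, value TEXT, created_at REAL,
        PRIMARY KEY (scope_key, kind, key))"""
)

def put(scope: str, kind: str, key: str, value: str) -> None:
    con.execute(
        "INSERT OR REPLACE INTO memory VALUES (?,?,?,?,?)",
        (scope, kind, key, value, time.time()),
    )

def forget_scope(scope: str) -> None:
    # Erasure for one tenant: delete every row under its scope_key.
    con.execute("DELETE FROM memory WHERE scope_key=?", (scope,))

put("org-a", "long_term", "tone", "formal")
put("org-b", "session", "draft", "hello")
forget_scope("org-a")
remaining = con.execute("SELECT scope_key FROM memory").fetchall()
print(remaining)  # [('org-b',)]
```

The real CLI adds TTL pruning, an audit log, and export on top of this shape.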
Benefits
- Product: carry continuity across sessions (long-term) while dropping chat noise (session clear) or expiring junk (TTL + prune).
- Governance: audit trail for creates/updates/deletes; export supports access requests; forget-scope supports erasure for a tenant id (you still own legal review and scope design).
- Engineering: stdlib-only store and CLI — no new pip packages for the core; optional FastAPI matches Phase 3 patterns.
Manual test recipe: texts/horizon3-handbook.md.
CI: .github/workflows/horizon3-smoke.yml runs horizon3_memory_cli.py --verify (offline).
What it is: a CLIP-style path (Hugging Face transformers: image + caption → one alignment logit) for “does this picture go with this text?” — a narrow slice of multimodal grounding from texts/further-development-universe-brain.md. Audio and automated moderation are not in this script; add them in product layers.
| Piece | What you run | Why it helps |
|---|---|---|
| Install | pip install -r optional-requirements-horizon4.txt (and torch + transformers for real Hub models) |
Adds Pillow; reuses the same PyTorch stack as the rest of the repo. |
| CI / offline verify | python scripts/horizon4_multimodal.py --verify |
No Hub download — random CLIPConfig + CLIPModel forward; proves the wiring. On Windows this uses a subprocess and OpenMP env defaults to avoid native crashes; if PyTorch still fails, see the handbook. |
| Pretrained check | python scripts/horizon4_multimodal.py --verify-pretrained |
Loads HORIZON4_CLIP_MODEL (default openai/clip-vit-base-patch32) if cached/online. |
| Real photo + text | python scripts/horizon4_multimodal.py --image <file> --text "<caption>" |
JSON under .tmp/horizon4/last_run.json with logit_image_text for triage, QA, or internal benchmarks. |
Benefits: one concrete image–text score next to your text-only classifiers; governance still needs human/review for abuse; smoke stays offline and fast in CI.
Manual steps: texts/horizon4-handbook.md
CI: .github/workflows/horizon4-smoke.yml runs horizon4_multimodal.py --verify (no network).
What it is: a thin smoke orchestrator from texts/further-development-universe-brain.md (Converged stack)—one command runs the existing generative, memory, and CLIP smokes in order and writes one JSON file (horizon6_converged_run/1.0).
| Piece | What you run | Why it helps |
|---|---|---|
| Install | pip install torch (CPU is fine) + pip install -r optional-requirements-horizon2.txt + pip install -r optional-requirements-horizon4.txt |
H2 and H4 share a transformers stack; H3 stays stdlib-only. |
| Converged verify | python scripts/horizon6_converged_smoke.py --verify |
Chains: horizon2_generative.py --verify → horizon3_memory_cli.py --verify → horizon4_multimodal.py --verify. Output: .tmp/horizon6-converge/run.json. |
| Optional RAG | same command with --with-rag |
Also runs rag_faq_smoke.py (needs a trained config.json dir or Hub download; can fail in air-gapped envs). |
What is still not H6 (full exit): a single production runtime and router, one auth/tenant story, and a real incident runbook—this repo only proves component smokes in sequence.
How to test (local): install deps as above, then python scripts/horizon6_converged_smoke.py --verify. Expect exit 0 and ok: true in the JSON; H2 may hit the Hub once for sshleifer/tiny-gpt2 if not cached. Faster one-offs: run each horizon’s --verify alone (see Horizon 2–4 sections above).
CI: .github/workflows/horizon6-smoke.yml runs the same command on CPU in GitHub Actions.
What it is: a stdlib-only check that two separate SQLite files (two “tenants”) do not share memory rows or exports, using the same Horizon 3 store as the rest of the repo. This is a toy for H7 isolation from texts/further-development-universe-brain.md—not legal/compliance by itself.
| Piece | What you run | Why it helps |
|---|---|---|
| Self-test | python scripts/horizon7_assured_smoke.py --verify |
No torch; output .tmp/horizon7-assured/run.json (horizon7_assured_run/1.0) with per-check ok flags. |
What is still not H7 (full exit): repeatable tenant onboarding, regulatory evidence packs, external audit, SLAs, quotas—treat the script as a developer check only.
How to test (local): python scripts/horizon7_assured_smoke.py --verify — should print horizon7 verify: OK and write JSON with all checks ok: true.
CI: .github/workflows/horizon7-smoke.yml runs the same command (no extra pip deps).
What it is: a single JSON “build + health” snapshot from texts/further-development-universe-brain.md (Observability & probe bundle)—Python/platform, optional git short SHA, and a real run of Horizon 7’s verify as a dependency probe.
| Piece | What you run | Why it helps |
|---|---|---|
| Probe verify | python scripts/horizon8_observability_probe.py --verify |
Writes .tmp/horizon8-probe/run.json (horizon8_probe_run/1.0). No torch; needs git only for git_rev when available. |
What is still not H8 (full exit): SLOs, alerting, streaming metrics, and dashboards—this is a file-shaped probe for CI and manual triage.
How to test (local): python scripts/horizon8_observability_probe.py --verify — expect horizon8 verify: OK and ok: true with a probes list.
CI: .github/workflows/horizon8-smoke.yml.
What it is: a versioned sample policy (texts/horizon9_policy_sample.json) and a smoke that checks deny-over-allow precedence and default deny—from the Declarative policy & capability gates horizon in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Policy verify | python scripts/horizon9_policy_smoke.py --verify |
Writes .tmp/horizon9-policy/run.json (horizon9_policy_run/1.0). Optional --policy path.json. Stdlib only. |
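The deny-over-allow and default-deny semantics the smoke checks can be sketched as follows; the rule shape here is hypothetical, not the actual horizon9 policy schema:

```python
# Toy rule list: an allow and a deny can match the same (actor, action).
rules = [
    {"effect": "allow", "actor": "analyst", "action": "read"},
    {"effect": "deny",  "actor": "analyst", "action": "export"},
    {"effect": "allow", "actor": "analyst", "action": "export"},  # loses to the deny
]

def decide(actor: str, action: str) -> str:
    matches = [r["effect"] for r in rules
               if r["actor"] == actor and r["action"] == action]
    if "deny" in matches:   # deny wins over any allow
        return "deny"
    if "allow" in matches:
        return "allow"
    return "deny"           # default deny when nothing matches

print(decide("analyst", "read"))    # allow
print(decide("analyst", "export"))  # deny (deny-over-allow)
print(decide("intern", "read"))     # deny (default deny)
```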
What is still not H9 (full exit): AuthN, OPA, signed policy, dynamic attributes, audit of policy edits in production.
How to test (local): python scripts/horizon9_policy_smoke.py --verify — expect horizon9 verify: OK and all case rows ok: true.
CI: .github/workflows/horizon9-smoke.yml.
What it is: a sample budget (texts/horizon10_budget_sample.json) and smoke that accumulates abstract units per action until deny—see Resource & cost envelopes in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Budget verify | python scripts/horizon10_budget_smoke.py --verify |
Writes .tmp/horizon10-budget/run.json (horizon10_budget_run/1.0). Stdlib only. |
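The accumulate-until-deny idea can be sketched like this; the unit costs and cap are invented, the real sample lives in texts/horizon10_budget_sample.json:

```python
# Toy envelope: each action spends abstract units against one cap.
budget = {"cap_units": 10, "costs": {"classify": 1, "generate": 4}}

spent = 0
decisions = []
for action in ["classify", "generate", "generate", "generate"]:
    cost = budget["costs"][action]
    if spent + cost > budget["cap_units"]:
        decisions.append((action, "deny"))   # over cap: deny, spend nothing
    else:
        spent += cost
        decisions.append((action, "allow"))

print(decisions)  # last generate denied: 1 + 4 + 4 = 9, +4 would exceed 10
```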
What is still not H10 (full exit): live metering, distributed quotas, billing reconciliation.
How to test (local): python scripts/horizon10_budget_smoke.py --verify.
CI: .github/workflows/horizon10-smoke.yml.
What it is: validated newline-delimited JSON for label corrections (horizon11_feedback_record/1.0)—see Human outcome capture in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Feedback verify | python scripts/horizon11_feedback_smoke.py --verify |
Writes .tmp/horizon11-feedback/ (sample_feedback.jsonl + run.json, horizon11_feedback_run/1.0). Stdlib only. |
What is still not H11 (full exit): secure pipelines, PII policy, automated retraining.
How to test (local): python scripts/horizon11_feedback_smoke.py --verify.
CI: .github/workflows/horizon11-smoke.yml.
What it is: Integrity fingerprints for committed sample policies/budgets—see Provenance & integrity manifest in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Provenance verify | python scripts/horizon12_provenance_smoke.py --verify |
Writes .tmp/horizon12-provenance/run.json (horizon12_provenance_run/1.0). Stdlib only. |
What is still not H12 (full exit): signing, timestamp authorities, in-toto/Sigstore.
How to test (local): python scripts/horizon12_provenance_smoke.py --verify.
CI: .github/workflows/horizon12-smoke.yml.
What it is: a state-machine exercise for OPEN / HALF_OPEN / CLOSED around a failing upstream—see Resilience: circuit breaker in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Circuit verify | python scripts/horizon13_circuit_smoke.py --verify |
Writes .tmp/horizon13-circuit/run.json (horizon13_circuit_run/1.0). Stdlib only. |
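The OPEN / HALF_OPEN / CLOSED transitions can be walked through in a few lines; the threshold and method names are illustrative, the real transitions live in scripts/horizon13_circuit_smoke.py:

```python
class Breaker:
    """Toy circuit breaker: trip after N consecutive failures, probe to recover."""

    def __init__(self, fail_threshold: int = 3):
        self.state, self.fails, self.threshold = "CLOSED", 0, fail_threshold

    def record(self, ok: bool) -> None:
        if self.state == "HALF_OPEN":
            self.state = "CLOSED" if ok else "OPEN"  # one probe decides
        elif ok:
            self.fails = 0
        else:
            self.fails += 1
            if self.fails >= self.threshold:
                self.state = "OPEN"      # stop calling the upstream

    def try_recover(self) -> None:
        if self.state == "OPEN":
            self.state = "HALF_OPEN"     # allow one probe request

b = Breaker()
for ok in [False, False, False]:
    b.record(ok)
print(b.state)        # OPEN
b.try_recover()
b.record(True)        # probe succeeds
print(b.state)        # CLOSED
```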
What is still not H13 (full exit): async middleware, distributed coordination, production metrics.
How to test (local): python scripts/horizon13_circuit_smoke.py --verify.
CI: .github/workflows/horizon13-smoke.yml.
What it is: a linear inference DAG plus cycle detection and parallel-root sanity checks—see Orchestrated workflows in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| DAG verify | python scripts/horizon14_workflow_smoke.py --verify |
Writes .tmp/horizon14-workflow/run.json (horizon14_workflow_run/1.0). Stdlib only. |
What is still not H14 (full exit): retries, sagas, production orchestrator integration.
How to test (local): python scripts/horizon14_workflow_smoke.py --verify.
CI: .github/workflows/horizon14-smoke.yml.
What it is: texts/horizon15_export_envelope_sample.json defines allowed keys per export kind; the smoke rejects extra fields—see Data minimization & export envelopes in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Envelope verify | python scripts/horizon15_export_smoke.py --verify |
Writes .tmp/horizon15-export/run.json (horizon15_export_run/1.0). Stdlib only. |
What is still not H15 (full exit): encryption, legal sign-off, automated redaction pipelines.
How to test (local): python scripts/horizon15_export_smoke.py --verify.
CI: .github/workflows/horizon15-smoke.yml.
What it is: a manifest of declared artifact versions vs a minimum reader — see Compatibility & versioning in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Semver verify | python scripts/horizon16_semver_smoke.py --verify |
Writes .tmp/horizon16-semver/run.json (horizon16_semver_run/1.0). Stdlib only (numeric x.y.z). |
What is still not H16 (full exit): full PEP 440, automated matrix across all consumers.
How to test (local): python scripts/horizon16_semver_smoke.py --verify.
CI: .github/workflows/horizon16-smoke.yml.
What it is: a deterministic ladder from a 0–100 health score to FULL / DEGRADED / MINIMAL / OFFLINE — see Graceful degradation in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Degrade verify | python scripts/horizon17_degrade_smoke.py --verify |
Writes .tmp/horizon17-degrade/run.json (horizon17_degrade_run/1.0). Stdlib only. |
What is still not H17 (full exit): wired probes, status pages, per-SKU docs.
How to test (local): python scripts/horizon17_degrade_smoke.py --verify.
CI: .github/workflows/horizon17-smoke.yml.
What it is: launch / game-day gates from structured phases and checks — see Operational readiness in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Readiness verify | python scripts/horizon18_readiness_smoke.py --verify |
Loads texts/horizon18_readiness_sample.json; writes .tmp/horizon18-readiness/run.json (horizon18_readiness_run/1.0). Stdlib only. |
What is still not H18 (full exit): CI wiring for every check, paging, rehearsal calendars.
How to test (local): python scripts/horizon18_readiness_smoke.py --verify.
CI: .github/workflows/horizon18-smoke.yml.
What it is: a linear SHA-256 chain over synthetic audit events — see Tamper-evident audit trail in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Chain verify | python scripts/horizon19_audit_chain_smoke.py --verify |
Writes .tmp/horizon19-audit-chain/run.json (horizon19_audit_chain_run/1.0). Stdlib only. |
What is still not H19 (full exit): signing keys, Merkle batches, WORM storage.
How to test (local): python scripts/horizon19_audit_chain_smoke.py --verify.
CI: .github/workflows/horizon19-smoke.yml.
What it is: hash bucket vs rollout_percent for staged releases — see Feature flags & staged rollout in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Flags verify | python scripts/horizon20_flags_smoke.py --verify |
Loads texts/horizon20_flags_sample.json; writes .tmp/horizon20-flags/run.json (horizon20_flags_run/1.0). Stdlib only. |
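Hash-bucket gating boils down to a stable per-user bucket in [0, 100) compared against rollout_percent. The bucketing scheme below is illustrative, not necessarily the script's exact one:

```python
import hashlib

def bucket(user_id: str, flag: str) -> int:
    """Stable bucket in [0, 100) derived from (flag, user) -- sticky rollouts."""
    h = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(h, 16) % 100

def enabled(user_id: str, flag: str, rollout_percent: int) -> bool:
    return bucket(user_id, flag) < rollout_percent

# Same user + flag always lands in the same bucket, so decisions are sticky.
assert enabled("user-42", "new-ui", 100)
assert not enabled("user-42", "new-ui", 0)

# Across many users, roughly rollout_percent of them see the flag.
share = sum(enabled(f"user-{i}", "new-ui", 25) for i in range(1000)) / 1000
print(round(share, 2))  # close to 0.25
```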
What is still not H20 (full exit): hosted flag services, experiments UI, audit exports.
How to test (local): python scripts/horizon20_flags_smoke.py --verify.
CI: .github/workflows/horizon20-smoke.yml.
What it is: category TTL vs synthetic records at a fixed as_of date — see Data retention & purge eligibility in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Retention verify | python scripts/horizon21_retention_smoke.py --verify |
Loads texts/horizon21_retention_sample.json; writes .tmp/horizon21-retention/run.json (horizon21_retention_run/1.0). Stdlib only. |
What is still not H21 (full exit): legal holds, distributed backups, scheduler integration.
How to test (local): python scripts/horizon21_retention_smoke.py --verify.
CI: .github/workflows/horizon21-smoke.yml.
What it is: discrete tick / consume simulation with golden allows — see Rate limiting & fairness in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Bucket verify | python scripts/horizon22_token_bucket_smoke.py --verify |
Loads texts/horizon22_token_bucket_sample.json; writes .tmp/horizon22-token-bucket/run.json (horizon22_token_bucket_run/1.0). Stdlib only. |
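The discrete tick/consume model is easy to simulate directly; the capacity, refill rate, and operation sequence below are invented, the golden cases live in texts/horizon22_token_bucket_sample.json:

```python
# Each tick adds refill_per_tick tokens up to capacity; a consume succeeds
# only if enough tokens remain (deny leaves the balance untouched).
capacity, refill_per_tick, tokens = 5, 1, 5
allows = []
for op, n in [("consume", 3), ("consume", 3), ("tick", 0), ("consume", 3)]:
    if op == "tick":
        tokens = min(capacity, tokens + refill_per_tick)
    elif tokens >= n:
        tokens -= n
        allows.append(True)
    else:
        allows.append(False)   # not enough tokens: deny, no partial spend

print(allows, tokens)  # [True, False, True] 0
```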
What is still not H22 (full exit): wall-clock refill, distributed quotas, per-route limits.
How to test (local): python scripts/horizon22_token_bucket_smoke.py --verify.
CI: .github/workflows/horizon22-smoke.yml.
What it is: failure propagation to a fixed point over depends_on edges — see Blast radius in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Blast verify | python scripts/horizon23_blast_radius_smoke.py --verify |
Loads texts/horizon23_blast_sample.json; writes .tmp/horizon23-blast-radius/run.json (horizon23_blast_radius_run/1.0). Stdlib only. |
What is still not H23 (full exit): redundancy paths, partial failures, regional graphs.
How to test (local): python scripts/horizon23_blast_radius_smoke.py --verify.
CI: .github/workflows/horizon23-smoke.yml.
What it is: bounded metric regression before promoting a candidate — see Canary promotion & regression gates in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Canary verify | python scripts/horizon24_canary_gate_smoke.py --verify |
Loads texts/horizon24_canary_gate_sample.json; writes .tmp/horizon24-canary-gate/run.json (horizon24_canary_gate_run/1.0). Stdlib only. |
What is still not H24 (full exit): shadow traffic wiring, CI ingestion of benchmark JSON from runners.
How to test (local): python scripts/horizon24_canary_gate_smoke.py --verify.
CI: .github/workflows/horizon24-smoke.yml.
What it is: first healthy region from an ordered list — see Regional failover & traffic steering in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Failover verify | python scripts/horizon25_failover_smoke.py --verify |
Loads texts/horizon25_failover_sample.json; writes .tmp/horizon25-failover/run.json (horizon25_failover_run/1.0). Stdlib only. |
What is still not H25 (full exit): latency-aware steering, residency constraints, sticky routing.
How to test (local): python scripts/horizon25_failover_smoke.py --verify.
CI: .github/workflows/horizon25-smoke.yml.
What it is: compare errors_observed to ⌊window × failure_budget_pct⌋ — see SLO error budget in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Budget verify | python scripts/horizon26_error_budget_smoke.py --verify |
Loads texts/horizon26_error_budget_sample.json; writes .tmp/horizon26-error-budget/run.json (horizon26_error_budget_run/1.0). Stdlib only. |
What is still not H26 (full exit): composite SLIs, burn-rate alerting, paging integrations.
How to test (local): python scripts/horizon26_error_budget_smoke.py --verify.
CI: .github/workflows/horizon26-smoke.yml.
What it is: case-insensitive substring rules — see Prompt injection resistance in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Gate verify | python scripts/horizon27_prompt_gate_smoke.py --verify |
Loads texts/horizon27_prompt_gate_sample.json; writes .tmp/horizon27-prompt-gate/run.json (horizon27_prompt_gate_run/1.0). Stdlib only. |
What is still not H27 (full exit): ML moderation, multilingual tuning, tokenizer-aware scans.
How to test (local): python scripts/horizon27_prompt_gate_smoke.py --verify.
CI: .github/workflows/horizon27-smoke.yml.
What it is: ordered stream with first-seen keys — see Idempotent side-effects in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Idempotency verify | python scripts/horizon28_idempotency_smoke.py --verify |
Loads texts/horizon28_idempotency_sample.json; writes .tmp/horizon28-idempotency/run.json (horizon28_idempotency_run/1.0). Stdlib only. |
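First-seen-wins can be sketched as a key-to-result map: replaying an idempotency key returns the stored result instead of re-running the side-effect. Names and payloads are illustrative:

```python
# Toy idempotent handler: the side-effect runs once per key.
seen = {}
effects_run = 0

def handle(key: str, payload: str) -> dict:
    global effects_run
    if key in seen:
        return seen[key]          # replay: no second side-effect
    effects_run += 1              # the "real" side-effect happens exactly once
    seen[key] = {"key": key, "result": payload.upper()}
    return seen[key]

handle("req-1", "charge card")
handle("req-1", "charge card")    # duplicate delivery of the same request
handle("req-2", "send email")
print(effects_run)  # 2
```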
What is still not H28 (full exit): durable stores, conflict detection on mismatched payloads.
How to test (local): python scripts/horizon28_idempotency_smoke.py --verify.
CI: .github/workflows/horizon28-smoke.yml.
What it is: [min, max_exclusive) numeric semver tuples — see Supply chain bounds in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| SBOM verify | python scripts/horizon29_sbom_bounds_smoke.py --verify |
Loads texts/horizon29_sbom_bounds_sample.json; writes .tmp/horizon29-sbom-bounds/run.json (horizon29_sbom_bounds_run/1.0). Stdlib only. |
What is still not H29 (full exit): PEP 440, prereleases, signed attestations.
How to test (local): python scripts/horizon29_sbom_bounds_smoke.py --verify.
CI: .github/workflows/horizon29-smoke.yml.
What it is: active leases vs wall-clock check_at — see Distributed coordination (leases) in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Lease verify | python scripts/horizon30_lease_smoke.py --verify |
Loads texts/horizon30_lease_sample.json; writes .tmp/horizon30-lease/run.json (horizon30_lease_run/1.0). Stdlib only. |
What is still not H30 (full exit): fencing tokens, renewal loops, skew budgets.
How to test (local): python scripts/horizon30_lease_smoke.py --verify.
CI: .github/workflows/horizon30-smoke.yml.
What it is: batch distinct-count caps — see Cardinality & observability budgets in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Cardinality verify | python scripts/horizon31_cardinality_smoke.py --verify |
Loads texts/horizon31_cardinality_sample.json; writes .tmp/horizon31-cardinality/run.json (horizon31_cardinality_run/1.0). Stdlib only. |
What is still not H31 (full exit): approximate sketches, streaming windows, tenant isolation.
How to test (local): python scripts/horizon31_cardinality_smoke.py --verify.
CI: .github/workflows/horizon31-smoke.yml.
What it is: lag_units vs max_lag_allowed — see Streaming backlog & consumer lag in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Lag verify | python scripts/horizon32_consumer_lag_smoke.py --verify |
Loads texts/horizon32_consumer_lag_sample.json; writes .tmp/horizon32-consumer-lag/run.json (horizon32_consumer_lag_run/1.0). Stdlib only. |
What is still not H32 (full exit): Kafka-specific semantics, consumer groups, checkpoint protocols.
How to test (local): python scripts/horizon32_consumer_lag_smoke.py --verify.
CI: .github/workflows/horizon32-smoke.yml.
What it is: explicit allowed_pairs for (legal_basis, processing_purpose) — see Purpose limitation in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Purpose verify | python scripts/horizon33_purpose_matrix_smoke.py --verify |
Loads texts/horizon33_purpose_matrix_sample.json; writes .tmp/horizon33-purpose-matrix/run.json (horizon33_purpose_matrix_run/1.0). Stdlib only. |
What is still not H33 (full exit): legal review, DPIAs, transfers, sector rules.
How to test (local): python scripts/horizon33_purpose_matrix_smoke.py --verify.
CI: .github/workflows/horizon33-smoke.yml.
What it is: votes_yes × 2 > replicas_total — see Distributed quorum in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Quorum verify | python scripts/horizon34_quorum_smoke.py --verify | Loads texts/horizon34_quorum_sample.json; writes .tmp/horizon34-quorum/run.json (horizon34_quorum_run/1.0). Stdlib only. |
What is still not H34 (full exit): Byzantine quorum math, weighted voters, raft semantics.
How to test (local): python scripts/horizon34_quorum_smoke.py --verify.
CI: .github/workflows/horizon34-smoke.yml.
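The quorum predicate above is a strict-majority check. A minimal sketch with an illustrative function name (not the smoke script's API):

```python
def has_quorum(votes_yes: int, replicas_total: int) -> bool:
    # Strict majority: yes-votes must exceed half the replica set.
    # Integer arithmetic avoids float comparisons; a tie is NOT a quorum.
    return votes_yes * 2 > replicas_total
```

Note that `has_quorum(2, 4)` is false: two of four is a tie, not a majority.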
What it is: allow-list algorithm plus key_bits_min — see Cryptographic suite policy in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Crypto verify | python scripts/horizon35_crypto_suite_smoke.py --verify | Loads texts/horizon35_crypto_suite_sample.json; writes .tmp/horizon35-crypto-suite/run.json (horizon35_crypto_suite_run/1.0). Stdlib only. |
What is still not H35 (full exit): TLS negotiation order, PQ hybrids, HSM attestation.
How to test (local): python scripts/horizon35_crypto_suite_smoke.py --verify.
CI: .github/workflows/horizon35-smoke.yml.
What it is: frozen inside [start, end) intervals — see Maintenance freeze windows in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Freeze verify | python scripts/horizon36_maintenance_freeze_smoke.py --verify | Loads texts/horizon36_maintenance_freeze_sample.json; writes .tmp/horizon36-maintenance-freeze/run.json (horizon36_maintenance_freeze_run/1.0). Stdlib only. |
What is still not H36 (full exit): RRULE calendars, regional tz overlays, break-glass audit hooks.
How to test (local): python scripts/horizon36_maintenance_freeze_smoke.py --verify.
CI: .github/workflows/horizon36-smoke.yml.
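The half-open interval rule above, in sketch form. Names and the list-of-tuples shape are illustrative; the real script reads the sample JSON:

```python
def is_frozen(ts: int, windows: list) -> bool:
    # Half-open [start, end): the end instant itself is not frozen, so
    # back-to-back windows never double-count a boundary timestamp.
    return any(start <= ts < end for start, end in windows)
```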
What it is: cap distinct pairs across dim_a × dim_b — see Pair cardinality in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Pair cardinality verify | python scripts/horizon37_pair_cardinality_smoke.py --verify | Loads texts/horizon37_pair_cardinality_sample.json; writes .tmp/horizon37-pair-cardinality/run.json (horizon37_pair_cardinality_run/1.0). Stdlib only. |
What is still not H37 (full exit): three-way tuples, streaming sketches.
How to test (local): python scripts/horizon37_pair_cardinality_smoke.py --verify.
CI: .github/workflows/horizon37-smoke.yml.
What it is: adjacent integer series never decreases — see Monotonic checkpoints in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Watermark verify | python scripts/horizon38_watermark_smoke.py --verify | Loads texts/horizon38_watermark_sample.json; writes .tmp/horizon38-watermark/run.json (horizon38_watermark_run/1.0). Stdlib only. |
What is still not H38 (full exit): per-partition vectors, Kafka ISR semantics.
How to test (local): python scripts/horizon38_watermark_smoke.py --verify.
CI: .github/workflows/horizon38-smoke.yml.
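The never-decreases rule above reduces to an adjacent-pairs check. An illustrative helper, not the script's API:

```python
def is_monotonic(series: list) -> bool:
    # Every adjacent pair must be non-decreasing; empty and
    # single-element series are trivially monotonic.
    return all(a <= b for a, b in zip(series, series[1:]))
```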
What it is: mutex_pairs forbid scheduling both jobs together — see Mutually exclusive jobs in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Mutex verify | python scripts/horizon39_job_mutex_smoke.py --verify | Loads texts/horizon39_job_mutex_sample.json; writes .tmp/horizon39-job-mutex/run.json (horizon39_job_mutex_run/1.0). Stdlib only. |
What is still not H39 (full exit): durations, capacities, orchestrator backends.
How to test (local): python scripts/horizon39_job_mutex_smoke.py --verify.
CI: .github/workflows/horizon39-smoke.yml.
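The mutex_pairs rule above, sketched with illustrative names (the real script works from the sample JSON):

```python
def mutex_ok(scheduled: set, mutex_pairs: list) -> bool:
    # A schedule is valid only if no forbidden pair is fully present.
    return not any(a in scheduled and b in scheduled
                   for a, b in mutex_pairs)
```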
What it is: composite_ok iff every gate passes — see Composite policy AND in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Policy AND verify | python scripts/horizon40_policy_and_smoke.py --verify | Loads texts/horizon40_policy_and_sample.json; writes .tmp/horizon40-policy-and/run.json (horizon40_policy_and_run/1.0). Stdlib only. |
What is still not H40 (full exit): OR groups, weighted scores, dynamic gate lists.
How to test (local): python scripts/horizon40_policy_and_smoke.py --verify.
CI: .github/workflows/horizon40-smoke.yml.
What it is: allowed iff region ∈ allowed_regions — see Geo-fence / data residency in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Geo-fence verify | python scripts/horizon41_geo_fence_smoke.py --verify | Loads texts/horizon41_geo_fence_sample.json; writes .tmp/horizon41-geo-fence/run.json (horizon41_geo_fence_run/1.0). Stdlib only. |
What is still not H41 (full exit): multi-region failover semantics, lineage proofs, private interconnect routing.
How to test (local): python scripts/horizon41_geo_fence_smoke.py --verify.
CI: .github/workflows/horizon41-smoke.yml.
What it is: outbound URL hostnames must match allowed_hosts (exact match, or registrable-suffix match when the rule contains a dot) — see Egress allow-list in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Egress verify | python scripts/horizon42_egress_allow_smoke.py --verify | Loads texts/horizon42_egress_allow_sample.json; writes .tmp/horizon42-egress-allow/run.json (horizon42_egress_allow_run/1.0). Stdlib only. |
What is still not H42 (full exit): glob patterns, IP allow-lists, DNS rebinding defenses.
How to test (local): python scripts/horizon42_egress_allow_smoke.py --verify.
CI: .github/workflows/horizon42-smoke.yml.
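The hostname rule above can be sketched as follows. This is a hedged reading of "exact or registrable suffix when the rule contains a dot"; the function name is illustrative and the smoke script's exact normalization may differ:

```python
from urllib.parse import urlsplit

def egress_allowed(url: str, allowed_hosts: list) -> bool:
    # A hostname passes on an exact match, or as a subdomain of a rule
    # that contains a dot (e.g. "example.com" admits "api.example.com"
    # but never "evilexample.com", because of the "." prefix check).
    host = (urlsplit(url).hostname or "").lower()
    for rule in allowed_hosts:
        rule = rule.lower()
        if host == rule:
            return True
        if "." in rule and host.endswith("." + rule):
            return True
    return False
```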
What it is: valid iff age_seconds ≤ max_age_seconds — see Credential freshness in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Credential age verify | python scripts/horizon43_credential_age_smoke.py --verify | Loads texts/horizon43_credential_age_sample.json; writes .tmp/horizon43-credential-age/run.json (horizon43_credential_age_run/1.0). Stdlib only. |
What is still not H43 (full exit): narrow JWT validity windows, replay caches, rotation webhooks.
How to test (local): python scripts/horizon43_credential_age_smoke.py --verify.
CI: .github/workflows/horizon43-smoke.yml.
What it is: apply_ok iff client_revision == stored_revision — see Optimistic concurrency in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Revision gate verify | python scripts/horizon44_optimistic_lock_smoke.py --verify | Loads texts/horizon44_optimistic_lock_sample.json; writes .tmp/horizon44-optimistic-lock/run.json (horizon44_optimistic_lock_run/1.0). Stdlib only. |
What is still not H44 (full exit): vector clocks, CRDT merges, merge UX.
How to test (local): python scripts/horizon44_optimistic_lock_smoke.py --verify.
CI: .github/workflows/horizon44-smoke.yml.
What it is: allowed iff bytes ≤ max_bytes — see Payload size ceiling in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Payload size verify | python scripts/horizon45_payload_size_smoke.py --verify | Loads texts/horizon45_payload_size_sample.json; writes .tmp/horizon45-payload-size/run.json (horizon45_payload_size_run/1.0). Stdlib only. |
What is still not H45 (full exit): per-tenant quotas, compression bombs, WebSocket frame limits.
How to test (local): python scripts/horizon45_payload_size_smoke.py --verify.
CI: .github/workflows/horizon45-smoke.yml.
What it is: sort samples_ms, approximate p99 with the nearest-rank rule (zero-based index ceil(0.99·n)−1), compare the result to max_p99_ms, and check the outcome against expect_under_budget — see Latency tail budget in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| p99 verify | python scripts/horizon46_latency_p99_smoke.py --verify | Loads texts/horizon46_latency_p99_sample.json; writes .tmp/horizon46-latency-p99/run.json (horizon46_latency_p99_run/1.0). Stdlib only. |
What is still not H46 (full exit): HDR histograms, weighted SLIs, multi-region tails.
How to test (local): python scripts/horizon46_latency_p99_smoke.py --verify.
CI: .github/workflows/horizon46-smoke.yml.
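The nearest-rank p99 approximation above, as a sketch. Helper names are illustrative; the smoke script's internals may differ:

```python
import math

def p99(samples_ms: list) -> float:
    # Nearest-rank rule: sort ascending, take the element at rank
    # ceil(0.99 * n), i.e. zero-based index ceil(0.99 * n) - 1.
    ordered = sorted(samples_ms)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]

def under_budget(samples_ms: list, max_p99_ms: float) -> bool:
    return p99(samples_ms) <= max_p99_ms
```

With 100 samples the rule picks the 99th value in sorted order, so one outlier out of 100 is tolerated.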
What it is: allowed iff the kill switch is off and policy_allow is true — see Kill switch in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Kill switch verify | python scripts/horizon47_kill_switch_smoke.py --verify | Loads texts/horizon47_kill_switch_sample.json; writes .tmp/horizon47-kill-switch/run.json (horizon47_kill_switch_run/1.0). Stdlib only. |
What is still not H47 (full exit): scoped kills, gradual drains, audited expiry.
How to test (local): python scripts/horizon47_kill_switch_smoke.py --verify.
CI: .github/workflows/horizon47-smoke.yml.
What it is: pass_gate iff unique approver count ≥ min_distinct_approvers — see Dual control in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Dual control verify | python scripts/horizon48_dual_control_smoke.py --verify | Loads texts/horizon48_dual_control_sample.json; writes .tmp/horizon48-dual-control/run.json (horizon48_dual_control_run/1.0). Stdlib only. |
What is still not H48 (full exit): SSO-bound identities, role matrices, duty calendars.
How to test (local): python scripts/horizon48_dual_control_smoke.py --verify.
CI: .github/workflows/horizon48-smoke.yml.
What it is: allow iff artifact_sha256 equals pinned_sha256 (hex, case-insensitive) — see Pinned artifact digest in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Digest pin verify | python scripts/horizon49_digest_pin_smoke.py --verify | Loads texts/horizon49_digest_pin_sample.json; writes .tmp/horizon49-digest-pin/run.json (horizon49_digest_pin_run/1.0). Stdlib only. |
What is still not H49 (full exit): Sigstore attestations, OCI digest locks, SBOM linkage.
How to test (local): python scripts/horizon49_digest_pin_smoke.py --verify.
CI: .github/workflows/horizon49-smoke.yml.
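The case-insensitive hex comparison above, in sketch form. The whitespace trimming here is an extra assumption of this sketch (handy for copy-pasted digests), not something the smoke script is documented to do:

```python
def digest_pinned(artifact_sha256: str, pinned_sha256: str) -> bool:
    # Hex digests compare case-insensitively; strip() is an assumption
    # of this sketch to tolerate copy-paste whitespace.
    return artifact_sha256.strip().lower() == pinned_sha256.strip().lower()
```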
What it is: compatible iff server_schema_major ≥ required_minimum_major — see Wire-format major-version compatibility in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Schema compat verify | python scripts/horizon50_schema_compat_smoke.py --verify | Loads texts/horizon50_schema_compat_sample.json; writes .tmp/horizon50-schema-compat/run.json (horizon50_schema_compat_run/1.0). Stdlib only. |
What is still not H50 (full exit): minor negotiation, feature bits, codegen bridges.
How to test (local): python scripts/horizon50_schema_compat_smoke.py --verify.
CI: .github/workflows/horizon50-smoke.yml.
What it is: under_budget iff utilization_pct ≤ max_utilization_pct — see Quota headroom in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Quota headroom verify | python scripts/horizon51_quota_headroom_smoke.py --verify | Loads texts/horizon51_quota_headroom_sample.json; writes .tmp/horizon51-quota-headroom/run.json (horizon51_quota_headroom_run/1.0). Stdlib only. |
What is still not H51 (full exit): predictive capacity, inode caps, replication slack models.
How to test (local): python scripts/horizon51_quota_headroom_smoke.py --verify.
CI: .github/workflows/horizon51-smoke.yml.
What it is: allow iff every required role appears in granted roles — see RBAC subset gate in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Role subset verify | python scripts/horizon52_role_subset_smoke.py --verify | Loads texts/horizon52_role_subset_sample.json; writes .tmp/horizon52-role-subset/run.json (horizon52_role_subset_run/1.0). Stdlib only. |
What is still not H52 (full exit): ABAC, hierarchical roles, JIT elevation.
How to test (local): python scripts/horizon52_role_subset_smoke.py --verify.
CI: .github/workflows/horizon52-smoke.yml.
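The subset gate above is a set-containment check. An illustrative helper, not the script's API:

```python
def roles_allow(required: list, granted: list) -> bool:
    # Every required role must appear among granted roles; duplicates
    # and ordering are irrelevant, hence the set conversion.
    return set(required) <= set(granted)
```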
What it is: allow unless both dry_run and mutating_operation are true — see Dry-run gate in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Dry-run verify | python scripts/horizon53_dry_run_gate_smoke.py --verify | Loads texts/horizon53_dry_run_gate_sample.json; writes .tmp/horizon53-dry-run-gate/run.json (horizon53_dry_run_gate_run/1.0). Stdlib only. |
What is still not H53 (full exit): shadow environments, mixed batches, replay audits.
How to test (local): python scripts/horizon53_dry_run_gate_smoke.py --verify.
CI: .github/workflows/horizon53-smoke.yml.
What it is: compliant iff backup_age_hours ≤ max_allowed_age_hours — see Backup recency in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Backup recency verify | python scripts/horizon54_backup_recency_smoke.py --verify | Loads texts/horizon54_backup_recency_sample.json; writes .tmp/horizon54-backup-recency/run.json (horizon54_backup_recency_run/1.0). Stdlib only. |
What is still not H54 (full exit): replication lag proofs, immutable vaults, restore drills.
How to test (local): python scripts/horizon54_backup_recency_smoke.py --verify.
CI: .github/workflows/horizon54-smoke.yml.
What it is: allow if not sensitive or encryption at rest is enabled — see Encryption required in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Encryption tier verify | python scripts/horizon55_encryption_required_smoke.py --verify | Loads texts/horizon55_encryption_required_sample.json; writes .tmp/horizon55-encryption-required/run.json (horizon55_encryption_required_run/1.0). Stdlib only. |
What is still not H55 (full exit): CMKs, field-level crypto, tenant KMS isolation.
How to test (local): python scripts/horizon55_encryption_required_smoke.py --verify.
CI: .github/workflows/horizon55-smoke.yml.
What it is: allow iff offered_tls_version is listed under allowed_tls_versions — see TLS version allow-list in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| TLS version verify | python scripts/horizon56_tls_version_smoke.py --verify | Loads texts/horizon56_tls_version_sample.json; writes .tmp/horizon56-tls-version/run.json (horizon56_tls_version_run/1.0). Stdlib only. |
What is still not H56 (full exit): live handshake probes, per-listener matrices, downgrade monitors.
How to test (local): python scripts/horizon56_tls_version_smoke.py --verify.
CI: .github/workflows/horizon56-smoke.yml.
What it is: compliant iff severity does not require paging or pager_sent is true — see Severity routing in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Severity pager verify | python scripts/horizon57_sev_pager_smoke.py --verify | Loads texts/horizon57_sev_pager_sample.json; writes .tmp/horizon57-sev-pager/run.json (horizon57_sev_pager_run/1.0). Stdlib only. |
What is still not H57 (full exit): rotation calendars, multi-channel paging, customer status bridges.
How to test (local): python scripts/horizon57_sev_pager_smoke.py --verify.
CI: .github/workflows/horizon57-smoke.yml.
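The routing rule above is a simple implication: if the severity requires paging, a page must have gone out. A sketch with illustrative names:

```python
def sev_routing_ok(severity: str,
                   severities_requiring_page: set,
                   pager_sent: bool) -> bool:
    # Compliant when the severity does not page at all, or a page was sent.
    return severity not in severities_requiring_page or pager_sent
```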
What it is: compliant iff critical_open and high_open counts stay within max_critical_open / max_high_open — see Vulnerability budget in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Vuln budget verify | python scripts/horizon58_vuln_budget_smoke.py --verify | Loads texts/horizon58_vuln_budget_sample.json; writes .tmp/horizon58-vuln-budget/run.json (horizon58_vuln_budget_run/1.0). Stdlib only. |
What is still not H58 (full exit): scanner quorum, reachability proofs, waiver workflows.
How to test (local): python scripts/horizon58_vuln_budget_smoke.py --verify.
CI: .github/workflows/horizon58-smoke.yml.
What it is: allow iff the channel does not require signatures or signature_valid is true — see Signature gate in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Signature gate verify | python scripts/horizon59_signature_gate_smoke.py --verify | Loads texts/horizon59_signature_gate_sample.json; writes .tmp/horizon59-signature-gate/run.json (horizon59_signature_gate_run/1.0). Stdlib only. |
What is still not H59 (full exit): Sigstore transparency, threshold signing, attestations.
How to test (local): python scripts/horizon59_signature_gate_smoke.py --verify.
CI: .github/workflows/horizon59-smoke.yml.
What it is: compliant iff each dependency_license (normalized) appears in allowed_license_ids — see SPDX license allow-list in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| License allow verify | python scripts/horizon60_license_allow_smoke.py --verify | Loads texts/horizon60_license_allow_sample.json; writes .tmp/horizon60-license-allow/run.json (horizon60_license_allow_run/1.0). Stdlib only. |
What is still not H60 (full exit): composite SPDX expressions, transitive graphs, counsel workflows.
How to test (local): python scripts/horizon60_license_allow_smoke.py --verify.
CI: .github/workflows/horizon60-smoke.yml.
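The allow-list rule above, sketched with an assumed normalization. The exact normalization lives in the smoke script; this sketch assumes a simple case-insensitive, whitespace-trimmed compare:

```python
def licenses_compliant(dependency_licenses: list,
                       allowed_license_ids: list) -> bool:
    # Normalize both sides the same way, then require membership for
    # every dependency license. An empty dependency list is compliant.
    allowed = {lic.strip().lower() for lic in allowed_license_ids}
    return all(lic.strip().lower() in allowed for lic in dependency_licenses)
```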
What it is: compliant iff maintainer_count ≥ min_maintainers — see Maintainer quorum in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Maintainer quorum verify | python scripts/horizon61_maintainer_quorum_smoke.py --verify | Loads texts/horizon61_maintainer_quorum_sample.json; writes .tmp/horizon61-maintainer-quorum/run.json (horizon61_maintainer_quorum_run/1.0). Stdlib only. |
What is still not H61 (full exit): CODEOWNER coverage, verified identities, succession drills.
How to test (local): python scripts/horizon61_maintainer_quorum_smoke.py --verify.
CI: .github/workflows/horizon61-smoke.yml.
What it is: allow_merge iff the branch is not listed under protected_branches, or min_approvals_met and ci_green are both true — see Protected-branch merge gate in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Branch protect verify | python scripts/horizon62_branch_protect_smoke.py --verify | Loads texts/horizon62_branch_protect_sample.json; writes .tmp/horizon62-branch-protect/run.json (horizon62_branch_protect_run/1.0). Stdlib only. |
What is still not H62 (full exit): merge queues, required contexts lists, stacked PR semantics.
How to test (local): python scripts/horizon62_branch_protect_smoke.py --verify.
CI: .github/workflows/horizon62-smoke.yml.
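The merge gate above, in sketch form (illustrative names, not the script's API):

```python
def allow_merge(branch: str, protected_branches: list,
                min_approvals_met: bool, ci_green: bool) -> bool:
    # Unprotected branches merge freely; protected branches additionally
    # require BOTH the approval condition and green CI.
    if branch not in protected_branches:
        return True
    return min_approvals_met and ci_green
```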
What it is: compliant iff open_secret_findings ≤ max_open_secret_findings — see Secret-scan sweep ceiling in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Secret sweep verify | python scripts/horizon63_secret_sweep_smoke.py --verify | Loads texts/horizon63_secret_sweep_sample.json; writes .tmp/horizon63-secret-sweep/run.json (horizon63_secret_sweep_run/1.0). Stdlib only. |
What is still not H63 (full exit): entropy tuning, KMS-linked blast radius, waiver workflows.
How to test (local): python scripts/horizon63_secret_sweep_smoke.py --verify.
CI: .github/workflows/horizon63-smoke.yml.
What it is: compliant iff image_age_days ≤ max_image_age_days — see Container image age ceiling in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Image age verify | python scripts/horizon64_image_age_smoke.py --verify | Loads texts/horizon64_image_age_sample.json; writes .tmp/horizon64-image-age/run.json (horizon64_image_age_run/1.0). Stdlib only. |
What is still not H64 (full exit): rolling rebuild automation, digest-only pulls, mirror skew handling.
How to test (local): python scripts/horizon64_image_age_smoke.py --verify.
CI: .github/workflows/horizon64-smoke.yml.
What it is: compliant iff severity is not in severities_requiring_rca, or hours_to_rca_doc ≤ max_hours_to_rca_doc — see RCA documentation deadline in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| RCA deadline verify | python scripts/horizon65_rca_deadline_smoke.py --verify | Loads texts/horizon65_rca_deadline_sample.json; writes .tmp/horizon65-rca-deadline/run.json (horizon65_rca_deadline_run/1.0). Stdlib only. |
What is still not H65 (full exit): blameless templates, tracked corrective actions, legal holds.
How to test (local): python scripts/horizon65_rca_deadline_smoke.py --verify.
CI: .github/workflows/horizon65-smoke.yml.
What it is: compliant iff deprecated_dependency_count ≤ max_deprecated_dependencies — see Deprecated dependency ceiling in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Deprecated dep verify | python scripts/horizon66_deprecated_dep_smoke.py --verify | Loads texts/horizon66_deprecated_dep_sample.json; writes .tmp/horizon66-deprecated-dep/run.json (horizon66_deprecated_dep_run/1.0). Stdlib only. |
What is still not H66 (full exit): transitive graphs, codemods, vendor RFC timelines.
How to test (local): python scripts/horizon66_deprecated_dep_smoke.py --verify.
CI: .github/workflows/horizon66-smoke.yml.
What it is: compliant iff customer_tier is not in customer_tiers_requiring_fast_export, or hours_to_complete_export ≤ max_hours_to_complete_export — see DSAR export SLA in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| DSAR export verify | python scripts/horizon67_dsar_export_smoke.py --verify | Loads texts/horizon67_dsar_export_sample.json; writes .tmp/horizon67-dsar-export/run.json (horizon67_dsar_export_run/1.0). Stdlib only. |
What is still not H67 (full exit): identity proofs, jurisdictional carve-outs, portal workflows.
How to test (local): python scripts/horizon67_dsar_export_smoke.py --verify.
CI: .github/workflows/horizon67-smoke.yml.
What it is: compliant iff blocking_static_findings ≤ max_blocking_static_findings — see Blocking static-analysis ceiling in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Static block verify | python scripts/horizon68_static_block_smoke.py --verify | Loads texts/horizon68_static_block_sample.json; writes .tmp/horizon68-static-block/run.json (horizon68_static_block_run/1.0). Stdlib only. |
What is still not H68 (full exit): per-language packs, incremental ratchets, autofix bots.
How to test (local): python scripts/horizon68_static_block_smoke.py --verify.
CI: .github/workflows/horizon68-smoke.yml.
What it is: allow iff requested_host (normalized) is listed under allowed_api_hosts — see Vendor API host allow-list in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Vendor host verify | python scripts/horizon69_vendor_host_smoke.py --verify | Loads texts/horizon69_vendor_host_sample.json; writes .tmp/horizon69-vendor-host/run.json (horizon69_vendor_host_run/1.0). Stdlib only. |
What is still not H69 (full exit): suffix wildcards, mTLS pinning, regional VPC endpoints.
How to test (local): python scripts/horizon69_vendor_host_smoke.py --verify.
CI: .github/workflows/horizon69-smoke.yml.
What it is: compliant iff open_major_incidents ≤ max_open_major_incidents — see Major-incident backlog ceiling in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Incident backlog verify | python scripts/horizon70_incident_backlog_smoke.py --verify | Loads texts/horizon70_incident_backlog_sample.json; writes .tmp/horizon70-incident-backlog/run.json (horizon70_incident_backlog_run/1.0). Stdlib only. |
What is still not H70 (full exit): severity rubrics, dedup across regions, executive dashboards.
How to test (local): python scripts/horizon70_incident_backlog_smoke.py --verify.
CI: .github/workflows/horizon70-smoke.yml.
What it is: allow_service iff hours_past_due ≤ max_hours_past_due_allowed — see Payment overdue grace window in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Payment grace verify | python scripts/horizon71_payment_grace_smoke.py --verify | Loads texts/horizon71_payment_grace_sample.json; writes .tmp/horizon71-payment-grace/run.json (horizon71_payment_grace_run/1.0). Stdlib only. |
What is still not H71 (full exit): dunning ladders, processor webhooks, enterprise invoicing carve-outs.
How to test (local): python scripts/horizon71_payment_grace_smoke.py --verify.
CI: .github/workflows/horizon71-smoke.yml.
What it is: compliant iff open_p1_tickets ≤ max_open_p1_tickets — see P1 ticket backlog ceiling in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Ticket backlog verify | python scripts/horizon72_ticket_backlog_smoke.py --verify | Loads texts/horizon72_ticket_backlog_sample.json; writes .tmp/horizon72-ticket-backlog/run.json (horizon72_ticket_backlog_run/1.0). Stdlib only. |
What is still not H72 (full exit): tier-specific SLAs, holiday staffing, multi-product queues.
How to test (local): python scripts/horizon72_ticket_backlog_smoke.py --verify.
CI: .github/workflows/horizon72-smoke.yml.
What it is: compliant iff auto_renew is true or days_until_expiry ≥ min_notice_days_before_expiry — see Contract renewal notice in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Contract notice verify | python scripts/horizon73_contract_notice_smoke.py --verify | Loads texts/horizon73_contract_notice_sample.json; writes .tmp/horizon73-contract-notice/run.json (horizon73_contract_notice_run/1.0). Stdlib only. |
What is still not H73 (full exit): procurement workflows, BAAs, termination clauses.
How to test (local): python scripts/horizon73_contract_notice_smoke.py --verify.
CI: .github/workflows/horizon73-smoke.yml.
What it is: compliant iff open_critical_pen_test_findings ≤ max_open_critical_pen_test_findings — see Penetration-test critical-findings ceiling in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Pen-test findings verify | python scripts/horizon74_pentest_findings_smoke.py --verify | Loads texts/horizon74_pentest_findings_sample.json; writes .tmp/horizon74-pentest-findings/run.json (horizon74_pentest_findings_run/1.0). Stdlib only. |
What is still not H74 (full exit): retest SLAs, engagement scoping, bounty program links.
How to test (local): python scripts/horizon74_pentest_findings_smoke.py --verify.
CI: .github/workflows/horizon74-smoke.yml.
What it is: compliant iff months_since_last_drill ≤ max_months_between_drills — see Disaster-recovery drill recency in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| DR drill recency verify | python scripts/horizon75_drill_recency_smoke.py --verify | Loads texts/horizon75_drill_recency_sample.json; writes .tmp/horizon75-drill-recency/run.json (horizon75_drill_recency_run/1.0). Stdlib only. |
What is still not H75 (full exit): multi-region game days, fault injection, auditor evidence stores.
How to test (local): python scripts/horizon75_drill_recency_smoke.py --verify.
CI: .github/workflows/horizon75-smoke.yml.
What it is: compliant iff open_a11y_blockers ≤ max_open_a11y_blockers — see Accessibility blocker ceiling in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| A11y blocker verify | python scripts/horizon76_a11y_block_smoke.py --verify | Loads texts/horizon76_a11y_block_sample.json; writes .tmp/horizon76-a11y-block/run.json (horizon76_a11y_block_run/1.0). Stdlib only. |
What is still not H76 (full exit): axe-core CI wiring, manual audits, AT user panels.
How to test (local): python scripts/horizon76_a11y_block_smoke.py --verify.
CI: .github/workflows/horizon76-smoke.yml.
What it is: allow_mutate iff legal_hold_active is false or counsel_override is true — see Legal-hold mutation gate in texts/further-development-universe-brain.md.
| Piece | What you run | Why it helps |
|---|---|---|
| Legal hold verify | python scripts/horizon77_legal_hold_smoke.py --verify | Loads texts/horizon77_legal_hold_sample.json; writes .tmp/horizon77-legal-hold/run.json (horizon77_legal_hold_run/1.0). Stdlib only. |
What is still not H77 (full exit): custodian workflows, chain-of-custody logs, e-discovery bridges.
How to test (local): python scripts/horizon77_legal_hold_smoke.py --verify.
CI: .github/workflows/horizon77-smoke.yml.
The canonical training implementation is scripts/train_tinymodel1_classifier.py. scripts/train_tinymodel1_agnews.py is a thin wrapper that calls the same main() with AG News–friendly defaults.
| Function / area | Role |
|---|---|
| parse_args() | CLI for dataset id, splits, text/label columns, caps, hyperparameters, --seed, and Hub card metadata. |
| set_seed() | Sets Python, NumPy, and PyTorch RNGs so runs are repeatable for a given --seed. |
| load_splits() | Loads the Hub dataset, selects train/eval split names, shuffles each split with seed, then takes the first N rows (--max-train-samples, --max-eval-samples). |
| infer_text_column() | Picks the text column if you do not pass --text-column. |
| resolve_label_names() / build_label_maps() / rows_to_model_inputs() | Resolve class names, map raw labels to contiguous ids, and build Dataset columns for training. |
| build_tokenizer() | Trains a WordPiece tokenizer on training texts and writes tokenizer files under the output dir. |
| evaluate() / evaluate_with_details() | Runs eval, builds the confusion matrix and EvalMetrics; evaluate_with_details also records per-example max softmax (winner) probability for calibration histograms. |
| write_eval_report() | Writes eval_report.json: reproducibility, metrics, plus optional dataset_quality, error_analysis, calibration, routing (see Phase 2 section). |
| write_misclassified_jsonl() | Writes misclassified_sample.jsonl (up to N lines) for manual error review. |
| write_manifest() | Writes artifact.json: training config, labels, and summary metrics for downstream tooling. |
| write_model_card() | Writes Hub-style README.md next to the weights (model card with eval summary). |
| copy_model_card_image() | Optionally copies TinyModel1Image.png into the output dir for the card banner. |
How the eval subset is defined (same script, same seed → same rows) is documented in texts/eval-reproducibility.md.
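The subset rule that makes this reproducible (shuffle with a seed, then take the first N rows) can be sketched in pure Python. This is an illustrative stand-in; the real code uses the datasets library's seeded shuffle, not random.Random:

```python
import random

def select_eval_subset(rows: list, seed: int, max_samples: int) -> list:
    # Shuffle a copy with a seeded RNG, then take the first N rows.
    # Same input + same seed therefore always yields the same subset.
    rng = random.Random(seed)
    shuffled = rows[:]          # copy: the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:max_samples]
```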
Besides AG News (train_tinymodel1_agnews.py), this repo includes a second single-label task from the Hub: emotion (English short texts, 6 classes: sadness, joy, love, anger, fear, surprise). It uses the same training code path; only the dataset id, eval split, and label names are preset.
| Entry point | Dataset | Eval split (default) |
|---|---|---|
| scripts/train_tinymodel1_agnews.py | fancyzhx/ag_news | test |
| scripts/train_tinymodel1_emotion.py | emotion | validation |
| scripts/train_tinymodel1_sst2.py | glue (sst2) | validation |
Equivalent explicit CLI (if you prefer not to use the wrapper):
```shell
python scripts/train_tinymodel1_classifier.py \
  --dataset emotion \
  --eval-split validation \
  --labels sadness,joy,love,anger,fear,surprise \
  --output-dir .tmp/TinyModel-emotion
```

Instant smoke test (small samples, ~1 minute on CPU; needs network to download emotion once):
```shell
python scripts/train_tinymodel1_emotion.py \
  --output-dir artifacts/emotion-smoke \
  --max-train-samples 200 \
  --max-eval-samples 100 \
  --epochs 1 \
  --batch-size 8 \
  --seed 42
```

Then check artifacts/emotion-smoke/eval_report.json — reproducibility.dataset should be emotion and label_order should list the six emotion names. For other Hub datasets, pass --dataset, splits, and optional --labels / --text-column to train_tinymodel1_classifier.py directly.
scripts/embeddings_smoke_test.py runs TinyModelRuntime on a few queries: classification probabilities, pairwise similarity, and retrieval over a toy candidate list (support/triage scenario).
What these terms mean

- Classification probabilities — output of `TinyModelRuntime.classify(...)`: for each input text, the model returns a probability distribution across all labels (values sum to ~1.0). Use this for routing decisions and confidence-aware thresholds.
- Pairwise similarity — output of `TinyModelRuntime.similarity(text_a, text_b)`: cosine similarity between two sentence embeddings (from the encoder). Higher values mean semantically closer text under this model.
- Retrieval — output of `TinyModelRuntime.retrieve(query, candidates, top_k=...)`: ranks candidate texts by embedding similarity to a query and returns the top matches with scores and indices.
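The similarity and retrieval numbers are plain cosine similarity over embedding vectors. A dependency-free sketch of the underlying math on toy 2-d vectors (illustrative only — the runtime computes this over real model embeddings):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product of the vectors over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query: list[float], candidates: list[list[float]], top_k: int = 3):
    """Rank candidates by cosine similarity to the query; (index, score) pairs."""
    scored = [(i, cosine(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]

# Toy "embeddings": the first candidate points the same way as the query.
q = [1.0, 0.0]
hits = retrieve(q, [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], top_k=2)
print(hits)  # [(0, 1.0), (1, 0.0)]
```

Because cosine ignores vector length, the first candidate scores a perfect 1.0 even though it is twice as long as the query — which is why `embed(..., normalize=True)` does not change similarity results.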
Instant test (needs a checkpoint — train the tiny eval run first, or pass a Hub id):

```bash
python scripts/train_tinymodel1_classifier.py \
  --output-dir artifacts/eval-smoke --max-train-samples 120 --max-eval-samples 80 \
  --epochs 1 --batch-size 8 --seed 42
python scripts/embeddings_smoke_test.py --model artifacts/eval-smoke
# Or: python scripts/embeddings_smoke_test.py --model HyperlinksSpace/TinyModel1
```

`scripts/finetune_pretrained_classifier.py` fine-tunes `AutoModelForSequenceClassification` from `--base-model` (default `distilbert-base-uncased`) using the same splits and metrics as the scratch trainer. Use matching `--seed` and sample caps, then compare `eval_report.json` / `artifact.json` to a scratch run.
Instant test (downloads base weights once; CPU-friendly small run):

```bash
python scripts/finetune_pretrained_classifier.py \
  --output-dir artifacts/finetune-smoke \
  --base-model distilbert-base-uncased \
  --max-train-samples 400 --max-eval-samples 200 \
  --epochs 1 --batch-size 8 --seed 42
```

For proprietary or weakly labeled data: use a short label guide, versioned snapshots, and leakage-safe splits. See `texts/labeling-and-data-hygiene.md`.
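Comparing the scratch and fine-tuned runs amounts to diffing the shared metrics in their two `eval_report.json` files. A minimal sketch (the metric names `accuracy` and `macro_f1` come from this README; the helper itself is hypothetical, and the numbers below are made up):

```python
def diff_metrics(scratch: dict, finetuned: dict, keys=("accuracy", "macro_f1")):
    """Return {metric: (scratch, finetuned, delta)} for the shared top-level keys."""
    return {
        k: (scratch[k], finetuned[k], round(finetuned[k] - scratch[k], 4))
        for k in keys
        if k in scratch and k in finetuned
    }

# Illustrative numbers only — load the real reports with json.load(...) instead.
scratch_report = {"accuracy": 0.71, "macro_f1": 0.69}
finetune_report = {"accuracy": 0.88, "macro_f1": 0.87}
print(diff_metrics(scratch_report, finetune_report))
```

Keeping `--seed` and the sample caps identical across the two runs is what makes this delta meaningful rather than noise.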
This section summarizes the currently implemented components and their practical purpose.

| Part | What it is for | How to launch | What to verify |
|---|---|---|---|
| Scratch baseline training (`scripts/train_tinymodel1_classifier.py`) | Build a small from-scratch text classifier baseline and export all model artifacts. | `python scripts/train_tinymodel1_classifier.py --output-dir artifacts/eval-smoke --max-train-samples 120 --max-eval-samples 80 --epochs 1 --batch-size 8 --seed 42` | `artifacts/eval-smoke/eval_report.json` exists and includes `accuracy`, `macro_f1`, `per_class_f1`, `confusion_matrix`. |
| Second dataset path (`scripts/train_tinymodel1_emotion.py`) | Prove the same pipeline works on another Hub dataset without forking core training code. | `python scripts/train_tinymodel1_emotion.py --output-dir artifacts/emotion-smoke --max-train-samples 200 --max-eval-samples 100 --epochs 1 --batch-size 8 --seed 42` | `reproducibility.dataset == "emotion"` and 6 labels in `label_order`. |
| Embeddings/runtime smoke (`scripts/embeddings_smoke_test.py`) | Validate product-shaped runtime behavior: classify, similarity, retrieval. | `python scripts/embeddings_smoke_test.py --model artifacts/eval-smoke` (or `--model HyperlinksSpace/TinyModel1`) | Script prints all 3 blocks and ends with `Embeddings smoke test completed`. |
| Pretrained fine-tune path (`scripts/finetune_pretrained_classifier.py`) | Compare a pretrained encoder baseline (DistilBERT/BERT family) against scratch training using the same eval reporting format. | `python scripts/finetune_pretrained_classifier.py --output-dir artifacts/finetune-smoke --base-model distilbert-base-uncased --max-train-samples 400 --max-eval-samples 200 --epochs 1 --batch-size 8 --seed 42` | `artifacts/finetune-smoke/eval_report.json` + `artifact.json` exist; compare metrics to a scratch run with the same caps/seed. |
| Data hygiene guide (`texts/labeling-and-data-hygiene.md`) | Lightweight rules for label quality, versioning, and leakage prevention when moving to custom/proprietary data. | Read the file and apply before collecting custom labels. | Label guide versioning and split hygiene rules are defined before annotation scale-up. |
| Kaggle→HF training workflow hardening (`.github/workflows/train-via-kaggle-to-hf.yml`) | Make the CI training/publish flow robust: stable auth handling, unique kernel slugs, resilient status polling, and clearer diagnostics. | Trigger the workflow from GitHub Actions with `version`, `namespace`, and train hyperparameters. | Workflow reaches the model publish step and uploads `{namespace}/TinyModel{version}`. |
| Phase 3: ONNX, bench, API (`scripts/phase3_*.py`, `texts/phase3-serving-profile.md`) | Export to ONNX, verify parity, CPU latency report, reference HTTP API. | Install once (`pip install -r optional-requirements-phase3.txt`); then, as separate shell commands, run `python scripts/phase3_export_onnx.py --model <dir>`, `python scripts/phase3_onnx_parity.py`, and `python scripts/phase3_benchmark.py` (see the Phase 3 section). Never append `then python` to the same line as `pip install`. | `onnx/*.onnx` present; benchmark under `artifacts/phase3/reports/`; parity exits 0. |
Load the published model by id (no local files required):

```bash
python -c "from transformers import pipeline; p=pipeline('text-classification', model='HyperlinksSpace/TinyModel1', tokenizer='HyperlinksSpace/TinyModel1'); print(p('Stocks rallied after central bank comments', top_k=None))"
```

Use the general-purpose runtime helpers (classification + embeddings + semantic search):
```python
from scripts.tinymodel_runtime import TinyModelRuntime

rt = TinyModelRuntime("HyperlinksSpace/TinyModel1")

# 1) Classification
print(rt.classify(["Oil prices fell after a demand forecast update."])[0])

# 2) Embeddings (shape: [batch, hidden_size])
emb = rt.embed(
    [
        "The team won the cup final in extra time.",
        "Central bank policy affected bond yields.",
    ]
)
print(emb.shape)

# 3) Pairwise semantic similarity
score = rt.similarity(
    "Stocks rose after inflation cooled.",
    "Markets rallied as price growth slowed.",
)
print("similarity:", round(score, 4))

# 4) Retrieval: nearest texts to a query
hits = rt.retrieve(
    "Chipmaker launches a new AI processor.",
    [
        "Parliament debated tax policy in the capital.",
        "Semiconductor company unveils next-gen accelerator.",
        "Team signs striker before the derby.",
    ],
    top_k=2,
)
for h in hits:
    print(h.index, round(h.score, 4), h.text)
```

| Function | Return type | Output values |
|---|---|---|
| `classify(texts)` | `list[dict[str, float]]` | One dict per input text. Keys are label names from `model.config.id2label`; values are probabilities in [0, 1] that sum to ~1.0 for each text. |
| `embed(texts, normalize=True)` | `torch.Tensor` | Shape `[batch_size, hidden_size]` (default TinyModel hidden size is 128). If `normalize=True`, each row is L2-normalized (vector norm ~1.0). |
| `similarity(text_a, text_b)` | `float` | Cosine similarity between the two embeddings. Typical range is [-1, 1]; higher means more semantically similar under this model. |
| `retrieve(query, candidates, top_k=3)` | `list[RetrievalHit]` | Ranked top matches. Each item has `index` (position in candidates), `text` (candidate string), and `score` (cosine similarity; higher is closer). Length is `min(top_k, len(candidates))`. |
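Since `classify(...)` returns one probability dict per text, a confidence-aware router is only a few lines on top of it. A sketch with a hypothetical helper name, an illustrative 0.6 threshold, and made-up distributions shaped like the classifier output:

```python
def route(probs: dict[str, float], threshold: float = 0.6,
          fallback: str = "human_review") -> str:
    """Pick the top label if its probability clears the threshold, else fall back."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else fallback

# Confident prediction: routed straight to its top label.
print(route({"World": 0.05, "Sports": 0.02, "Business": 0.88, "Sci/Tech": 0.05}))
# → Business

# Flat distribution: no label clears 0.6, so it falls back.
print(route({"World": 0.30, "Sports": 0.25, "Business": 0.25, "Sci/Tech": 0.20}))
# → human_review
```

Tune the threshold per class against a held-out split rather than guessing; the Phase 2 routing scenario doc covers this in more depth.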
Or open the demo: direct app · on the Hub.
Quick checks:
- Space loads; inference returns labels and scores; no errors in Space logs.
Workflow definitions live under .github/workflows/. Trigger them from Actions → select the workflow → Run workflow. Runners use ubuntu-latest unless you change the workflow.
Configure these once per repository (or organization). They are not committed to git.
| Secret | Used by | Purpose |
|---|---|---|
| `HF_TOKEN` | Workflows below | Hugging Face access token with write permission to create/update models and Spaces in the target namespace. |
| `KAGGLE_USERNAME` | `train-via-kaggle-to-hf.yml` only | Your Kaggle username (same value as in Kaggle Account → API). |
| `KAGGLE_KEY` | `train-via-kaggle-to-hf.yml` only | Kaggle API key from Account → Create New API Token. |
No other GitHub secrets are read by these workflows. Internal step outputs (GITHUB_ENV) such as KAGGLE_OWNER / KAGGLE_KERNEL_SLUG are set automatically during the Kaggle run.
| Workflow | File |
|---|---|
| PR smoke: Phase 1 matrix (scratch, small caps) | phase1-smoke.yml |
| PR smoke: Phase 3 (train tiny → ONNX → parity → bench) | phase3-smoke.yml |
| Deploy versioned Space to Hugging Face | deploy-hf-space-versioned.yml |
| Train on Hugging Face Jobs and publish versioned model | train-hf-job-versioned.yml |
- `deploy-hf-space-versioned.yml` — Builds the Gradio Space with `scripts/build_space_artifact.py` and uploads `{namespace}/TinyModel{version}Space`.
  - Secrets: `HF_TOKEN`.
  - Workflow inputs: `version`, `namespace`, `model_id` (for example `HyperlinksSpace/TinyModel1`).
- `train-hf-job-versioned.yml` — Submits training on Hugging Face Jobs, then publishes `{namespace}/TinyModel{version}`.
  - Secrets: `HF_TOKEN` (also passed into the remote job so it can run `publish_hf_artifact.py`).
  - Workflow inputs: `version`, `namespace`, optional `commit_sha` (empty = current workflow SHA), `flavor`, `timeout`, `max_train_samples`, `max_eval_samples`, `epochs`, `batch_size`, `learning_rate`.
  - If Hugging Face returns 402 Payment Required for Jobs, add billing/credits on your HF account, or train locally and publish with `scripts/publish_hf_artifact.py` (see `texts/HUGGING_FACE_DEPLOYMENT_INTERNAL.md`).
| Workflow | File |
|---|---|
| Train via Kaggle and publish to Hugging Face | `train-via-kaggle-to-hf.yml` |

- `train-via-kaggle-to-hf.yml` — Creates a Kaggle kernel run, trains, downloads outputs, and pushes `{namespace}/TinyModel{version}` to the Hub.
  - Secrets: `KAGGLE_USERNAME`, `KAGGLE_KEY`, and `HF_TOKEN` (for upload to Hugging Face).
  - Workflow inputs: `version`, `namespace`, `max_train_samples`, `max_eval_samples`, `epochs`, `batch_size`, `learning_rate`.
  - External quota: Kaggle GPU/CPU weekly limits and any Kaggle compute credits your account uses; not covered by GitHub Actions alone.
Illustrative directions for evolving the TinyModel line (pick what matches your product goals):
- Accuracy and capacity — Train on more AG News samples or epochs; adjust the tiny BERT config (depth, width, sequence length); add LR schedules, warmup, or regularization suited to your budget.
- Domains and label sets — Fine-tune on proprietary or niche corpora; replace the four AG News classes with your own taxonomy and a labeled dataset.
- Shipping inference — Document ONNX or quantized exports for edge and serverless; add batch-inference examples; optionally wire a Hugging Face Inference Endpoint for a stable HTTP API.
- Space and API UX — Batch inputs, per-class thresholds, richer examples, or client snippets (Python and JavaScript) for integrators.
- Evaluation discipline — Fixed test split, confusion matrix, calibration, and versioned eval reports alongside `artifact.json`.
- Repository hygiene — Lightweight CI (lint, script smoke tests) that never pulls large weights; optional Hub Collections or docs that link model, Space, and release notes.
Nothing here is committed on a fixed timeline; treat it as a backlog of sensible next steps for a small text understanding stack.
The living plan is in `texts/further-development-plan.md`. Recent updates there:

- Exit steps (verification) for Phases 1–3, optional R&D, and each decision gate (concrete commands, exit status 0, artifacts).
- Phase 2 routing: `texts/phase2-routing-threshold-scenario.md`.
- Phase 3 (done in repo): ONNX export, parity, CPU benchmark, reference API, serving doc — see Phase 3 in this README and `texts/phase3-serving-profile.md`. CI: `.github/workflows/phase3-smoke.yml`.
- Optional R&D backlog: `texts/optional-rd-backlog.md`.
- Plan status and "What is left (if any)" at the end of the plan file (mostly optional follow-ups).
Quick Phase 1 exit check (local, matches CI):

```bash
python scripts/phase1_compare.py \
  --preset smoke \
  --models scratch \
  --datasets ag_news,emotion \
  --seed 42
echo $?
# Expect: 0; reports under artifacts/phase1/reports/phase1_smoke_seed42.*
```

For the up-to-date list of optional or future work, see "What is left (if any)" at the end of the same plan file.