From 08d55b0277c695a5365194d20d775cb675d52f6f Mon Sep 17 00:00:00 2001
From: che cheng
Date: Sat, 2 May 2026 01:30:29 +0800
Subject: [PATCH] docs: propose macdoc docx workflow spec

Refs #92
---
 .../macdoc-docx-workflow-cli/.openspec.yaml   |   4 +
 .../macdoc-docx-workflow-cli/design.md        |  74 +++++++
 .../macdoc-docx-workflow-cli/proposal.md      |  52 +++++
 .../specs/docx-workflow-cli/spec.md           | 197 ++++++++++++++++++
 .../changes/macdoc-docx-workflow-cli/tasks.md |  29 +++
 5 files changed, 356 insertions(+)
 create mode 100644 openspec/changes/macdoc-docx-workflow-cli/.openspec.yaml
 create mode 100644 openspec/changes/macdoc-docx-workflow-cli/design.md
 create mode 100644 openspec/changes/macdoc-docx-workflow-cli/proposal.md
 create mode 100644 openspec/changes/macdoc-docx-workflow-cli/specs/docx-workflow-cli/spec.md
 create mode 100644 openspec/changes/macdoc-docx-workflow-cli/tasks.md

diff --git a/openspec/changes/macdoc-docx-workflow-cli/.openspec.yaml b/openspec/changes/macdoc-docx-workflow-cli/.openspec.yaml
new file mode 100644
index 0000000..2e10e34
--- /dev/null
+++ b/openspec/changes/macdoc-docx-workflow-cli/.openspec.yaml
@@ -0,0 +1,4 @@
+schema: spec-driven
+created: 2026-05-02
+created_by: che cheng
+created_with: claude
diff --git a/openspec/changes/macdoc-docx-workflow-cli/design.md b/openspec/changes/macdoc-docx-workflow-cli/design.md
new file mode 100644
index 0000000..c38543f
--- /dev/null
+++ b/openspec/changes/macdoc-docx-workflow-cli/design.md
@@ -0,0 +1,74 @@
+## Context
+
+macdoc already exposes `convert --to docx` routes for format conversion and imports `word-builder-swift` as a dependency. It also has adjacent OOXML packages in the workspace, and `che-word-mcp` already proves the lower-level `.docx` mutation engine for interactive agent workflows. The missing layer is a deterministic CLI workflow that lets an agent declare a desired document edit as data, inspect the planned operations, run the edit, and verify the result without owning an MCP session.
+
+Issue #92 changed direction from a standalone `dxedit` executable to an integrated `macdoc docx` command family. That direction keeps the user-facing product compact while preserving the option to add a thin wrapper after the manifest schema has production mileage.
+
+## Goals / Non-Goals
+
+**Goals:**
+
+- Add one `macdoc docx` namespace covering build, patch, apply, plan, verify, and diff.
+- Define one Codable JSON manifest model shared by CLI commands and tests.
+- Put parsing, planning, execution, readback verification, and structural diff logic in an importable Swift library target.
+- Preserve source documents by writing a separate output file by default.
+- Keep generated fixtures synthetic and small enough for stable CI.
+
+**Non-Goals:**
+
+- Ship a public `dxedit` binary in Phase 1.
+- Add YAML parsing or a YAML dependency in Phase 1.
+- Commit private thesis, manuscript, advisor-review, or client documents as fixtures.
+- Replace `che-word-mcp` session tools, autosave behaviour, comment-thread workflows, or interactive MCP editing.
+- Guarantee Microsoft Word visual rendering fidelity in Phase 1.
+
+## Decisions
+
+### Use `macdoc docx` as the Phase 1 product surface
+
+The CLI surface will be `macdoc docx build`, `macdoc docx patch`, `macdoc docx apply`, `macdoc docx plan`, `macdoc docx verify`, and `macdoc docx diff`. A future `dxedit` wrapper can delegate to the same library after users have exercised the schema. This avoids asking agents to choose between two tools while the contract is still being shaped.
+
+Alternative considered: create `dxedit` first and wire macdoc later. Rejected because it would force documentation, issue triage, and plugin instructions to teach a second command before the manifest model is stable.
+
+### Keep workflow logic in `DocxWorkflowLib`
+
+Create a new library target with the public entry points `DocxManifest`, `DocxWorkflowPlanner`, `DocxWorkflowExecutor`, `DocxWorkflowVerifier`, and `DocxWorkflowDiffer`. `MacDocCLI` translates ArgumentParser options into URLs and calls the library. Tests exercise the library directly first; CLI smoke tests then cover argument routing and exit behaviour.
+
+Alternative considered: implement everything inside `Sources/MacDocCLI`. Rejected because manifest execution and verification need importable APIs for tests, future MCP/tool wrappers, and a future `dxedit` wrapper.
+
+### Use JSON-first Codable manifests
+
+Phase 1 accepts JSON manifests only. The manifest includes `schemaVersion`, `workflow`, optional `input`, optional `template`, optional `output`, ordered `steps`, and `checks`. CLI arguments can override `input`, `template`, and `output`; the effective resolved paths appear in `plan` output so automation logs are reproducible.
+
+Alternative considered: accept YAML immediately for author comfort. Rejected because the repo does not need another parser dependency before the schema stabilizes, and JSON maps cleanly to Swift Codable tests.
+
+### Separate build, patch, and apply semantics
+
+Build starts from an empty document model and uses `word-builder-swift` concepts for sections, paragraphs, runs, tables, images, and equations where the package exposes stable model coverage. Patch starts from a template document and targets explicit placeholders such as text tokens or content controls. Apply starts from an existing document and runs ordered operations such as text replacement or insertion against resolved anchors.
+
+Alternative considered: make every operation a generic apply step. Rejected because creation, placeholder filling, and source mutation have different safety checks and error messages.
+
+### Make planning mandatory inside every execution path
+
+Execution first builds the same operation plan that `macdoc docx plan` prints, then runs that plan. Anchor misses, duplicate placeholder matches that are not explicitly allowed, unsupported step types, and invalid path resolution fail before writing the output file.
+
+Alternative considered: execute directly and report partial results. Rejected because agent workflows need deterministic failure before file writes.
+
+### Verify by OOXML readback and manifest checks
+
+`verify` reads the output `.docx` after execution and enforces checks declared in the manifest: required text, forbidden text, expected replacement counts, required image relationship count, and successful readback. It also reports the executed operation summary from the plan when available.
+
+Alternative considered: rely on process exit code and non-empty output. Rejected because #92 is about agent quality; a non-empty document is not enough evidence that the intended edit landed.
+
+### Diff at a Word-aware structural level
+
+`diff` compares extracted document text and selected structure metadata rather than raw zip bytes. The first pass reports paragraph text additions/removals, table count changes, image relationship count changes, and field/equation count changes when the reader exposes those values. Raw ZIP byte diff is not the default because unrelated archive ordering or relationship id churn would create noisy output.
+
+Alternative considered: shell out to a generic binary diff. Rejected because it is unusable for agent review of `.docx` edits.
+
+## Risks / Trade-offs
+
+- [Risk] OOXML reader/writer coverage is lower than the full Microsoft Word surface. → Mitigation: Phase 1 supports a constrained operation set, returns explicit unsupported-step errors, and relies on existing OOXML preservation tests for source round-trip safety.
+- [Risk] JSON manifests are less pleasant to hand-write than YAML. → Mitigation: keep schemaVersion 1 compact, document examples, and defer YAML until the workflow has stable users.
+- [Risk] Build, patch, and apply can drift into three separate engines. → Mitigation: all three produce a normalized `DocxOperationPlan` before execution, and verify/diff consume shared readback summaries.
+- [Risk] Directly depending on transitive OOXML products can create package resolution ambiguity. → Mitigation: add explicit root package dependencies for the Word/OOXML products used by `DocxWorkflowLib` instead of relying on transitive imports.
diff --git a/openspec/changes/macdoc-docx-workflow-cli/proposal.md b/openspec/changes/macdoc-docx-workflow-cli/proposal.md
new file mode 100644
index 0000000..9bbd12b
--- /dev/null
+++ b/openspec/changes/macdoc-docx-workflow-cli/proposal.md
@@ -0,0 +1,52 @@
+## Why
+
+Agent-driven Word editing currently spans multiple mental models: `macdoc convert --to docx` creates converted documents, `word-builder-swift` can build new documents from Swift code, and `che-word-mcp` exposes lower-level mutation tools. Issue #92 needs a first-class `macdoc docx` workflow so agents can build, patch, apply, plan, verify, and diff `.docx` edits from a stable manifest contract instead of choosing between a separate `dxedit` binary and ad-hoc MCP calls.
+
+## What Changes
+
+- Add an integrated `macdoc docx` command namespace for deterministic Word document workflows.
+- Define JSON-first Codable manifests for three workflow families:
+  - build: create a new `.docx` from declarative sections, paragraphs, tables, images, and equations using the `word-builder-swift` model where applicable.
+  - patch: fill or replace placeholders in a template `.docx` without treating the document as a blank build.
+  - apply: mutate an existing `.docx` through ordered manifest steps and write a separate output document by default.
+- Add companion planning and validation commands under the same namespace:
+  - plan: parse the manifest and source document, resolve anchors/placeholders, and print the operations that would run.
+  - verify: read back the output document and enforce manifest-declared checks.
+  - diff: compare two `.docx` files at a Word-aware structural/text level suitable for CLI review.
+- Keep manifest decoding, planning, execution, verification, and diffing in an importable Swift library target; keep `MacDocCLI` as the thin command layer.
+- Treat a standalone `dxedit` executable as a future compatibility wrapper, not the Phase 1 product surface.
+
+## Non-Goals
+
+- No standalone public `dxedit` binary in Phase 1.
+- No YAML manifest dependency in Phase 1; JSON is the required input format.
+- No private thesis, manuscript, or advisor-review documents committed as fixtures.
+- No attempt to make `macdoc docx` replace `che-word-mcp` session tools, autosave semantics, or interactive MCP editing workflows.
+- No visual Microsoft Word rendering verification in Phase 1; verification is based on OOXML readback and structural assertions.
+
+## Capabilities
+
+### New Capabilities
+
+- `docx-workflow-cli`: The integrated `macdoc docx` command namespace, manifest contract, library/CLI boundary, dry-run planning, execution, verification, and structural diff behaviour.
+
+### Modified Capabilities
+
+(none)
+
+## Impact
+
+- Affected specs: docx-workflow-cli
+- Affected code:
+  - New: Sources/DocxWorkflowLib/
+  - New: Tests/DocxWorkflowLibTests/
+  - New: Sources/MacDocCLI/MacDoc+Docx.swift
+  - New: Tests/MacDocCLITests/DocxWorkflowCommandTests.swift
+  - Modified: Package.swift
+  - Modified: Sources/MacDocCLI/MacDoc.swift
+  - Modified: README.md
+  - Modified: CONVERSIONS.md
+- Related dependencies and systems:
+  - Uses existing `word-builder-swift` for new document build semantics where it already covers the requested document model.
+  - Uses existing OOXML/Word packages for readback, mutation, and preservation instead of inventing a second `.docx` engine inside the CLI target.
+  - Tracks GitHub issue #92.
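The JSON-first Codable manifest and the CLI path-override rule described in the design and proposal above can be sketched in Swift. This is a hypothetical, non-authoritative sketch, not the proposed `DocxManifest` API: the type names (`ManifestSketch`, `Step`, `Check`, `ResolvedPaths`, `ManifestValidationError`), the `resolve(json:cliInput:cliTemplate:cliOutput:)` helper, and the `expected` field for `replacementCount` checks are all assumptions. The JSON keys mirror the `apply` manifest example in `spec.md`. It shows how synthesized `Codable` conformance alone rejects a missing `schemaVersion`, an unknown `workflow`, or an unknown step `op`, and how a non-nil CLI argument takes precedence over the manifest's own path.

```swift
import Foundation

// Hypothetical schemaVersion 1 model. JSON keys mirror the spec's example;
// the Swift type names are illustrative, not the shipped DocxWorkflowLib API.
enum Workflow: String, Codable { case build, patch, apply }

enum StepOp: String, Codable {
    case replaceText, insertParagraphAfterText, insertImageAfterText
}

enum CheckType: String, Codable {
    case containsText, notContainsText, replacementCount, readbackSucceeds
}

struct Step: Codable, Equatable {
    let op: StepOp               // an unknown "op" string fails decoding
    let find: String?
    let replacement: String?
    let scope: String?

    // The JSON key is "with"; expose it under a more descriptive Swift name.
    enum CodingKeys: String, CodingKey {
        case op, find, scope
        case replacement = "with"
    }
}

struct Check: Codable, Equatable {
    let type: CheckType
    let text: String?
    let expected: Int?           // assumed field for replacementCount checks
}

struct ManifestSketch: Codable {
    let schemaVersion: Int       // non-optional, so a missing key throws
    let workflow: Workflow       // an unknown workflow string throws
    let input: String?
    let template: String?
    let output: String?
    let steps: [Step]
    let checks: [Check]?
}

struct ResolvedPaths: Equatable {
    let input: String?
    let template: String?
    let output: String?
}

enum ManifestValidationError: Error, Equatable {
    case unsupportedSchemaVersion(Int)
    case buildDoesNotAcceptInput
}

/// Decodes and validates a manifest, then applies CLI path overrides.
/// A non-nil CLI argument always wins over the manifest's own path field.
func resolve(
    json: String,
    cliInput: String? = nil,
    cliTemplate: String? = nil,
    cliOutput: String? = nil
) throws -> (manifest: ManifestSketch, paths: ResolvedPaths) {
    let manifest = try JSONDecoder().decode(ManifestSketch.self, from: Data(json.utf8))
    guard manifest.schemaVersion == 1 else {
        throw ManifestValidationError.unsupportedSchemaVersion(manifest.schemaVersion)
    }
    let paths = ResolvedPaths(
        input: cliInput ?? manifest.input,
        template: cliTemplate ?? manifest.template,
        output: cliOutput ?? manifest.output
    )
    // `build` starts from an empty document, so a source input is rejected.
    if manifest.workflow == .build && paths.input != nil {
        throw ManifestValidationError.buildDoesNotAcceptInput
    }
    return (manifest: manifest, paths: paths)
}

let applyJSON = """
{
  "schemaVersion": 1,
  "workflow": "apply",
  "input": "input.docx",
  "output": "manifest-output.docx",
  "steps": [{ "op": "replaceText", "find": "{{advisor}}", "with": "Dr. Chen", "scope": "body" }],
  "checks": [{ "type": "containsText", "text": "Dr. Chen" }]
}
"""

let resolved = try! resolve(json: applyJSON, cliOutput: "cli-output.docx")
print(resolved.paths.output ?? "none")  // cli-output.docx
```

Keeping decoding, validation, and override resolution in one throwing function means a CLI layer only has to catch the error and map it to a non-zero exit, which fits the thin `MacDocCLI` boundary the proposal describes.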
diff --git a/openspec/changes/macdoc-docx-workflow-cli/specs/docx-workflow-cli/spec.md b/openspec/changes/macdoc-docx-workflow-cli/specs/docx-workflow-cli/spec.md
new file mode 100644
index 0000000..2822b25
--- /dev/null
+++ b/openspec/changes/macdoc-docx-workflow-cli/specs/docx-workflow-cli/spec.md
@@ -0,0 +1,197 @@
+## ADDED Requirements
+
+### Requirement: Integrated docx command namespace
+
+The system SHALL expose a `docx` subcommand under the existing `macdoc` executable. The namespace SHALL include `build`, `patch`, `apply`, `plan`, `verify`, and `diff` subcommands. The system SHALL NOT require a separate `dxedit` executable for Phase 1 workflows.
+
+#### Scenario: List docx subcommands
+
+- **WHEN** the user runs `macdoc docx --help`
+- **THEN** the help output lists `build`, `patch`, `apply`, `plan`, `verify`, and `diff`
+
+#### Scenario: No separate executable requirement
+
+- **WHEN** a Phase 1 workflow is documented or tested
+- **THEN** the command starts with `macdoc docx` and does not require `dxedit`
+
+### Requirement: Importable workflow library boundary
+
+The system SHALL keep manifest decoding, path resolution, operation planning, execution, verification, and structural diffing in an importable Swift library target named `DocxWorkflowLib`. The `MacDocCLI` target SHALL only parse command-line arguments, call `DocxWorkflowLib`, print results, and map thrown errors to non-zero exits.
+
+#### Scenario: Library APIs are testable without CLI process execution
+
+- **WHEN** `DocxWorkflowLibTests` import `DocxWorkflowLib`
+- **THEN** tests instantiate manifest, planner, executor, verifier, and differ types without launching the `macdoc` executable
+
+#### Scenario: CLI delegates workflow execution
+
+- **WHEN** a CLI smoke test invokes `macdoc docx apply input.docx manifest.json --output output.docx`
+- **THEN** the CLI routes parsed arguments into `DocxWorkflowLib` and does not duplicate manifest execution logic in the CLI target
+
+### Requirement: JSON Codable manifest contract
+
+The system SHALL accept JSON manifests decoded through Swift Codable types. A manifest SHALL include `schemaVersion` and `workflow`. `workflow` SHALL be one of `build`, `patch`, or `apply`. A manifest SHALL contain ordered `steps`; verification checks SHALL be represented as an ordered `checks` array when verification is requested. CLI arguments SHALL override `input`, `template`, and `output` path fields from the manifest during path resolution.
+
+#### Scenario: Valid apply manifest decodes
+
+- **WHEN** the user provides a JSON manifest with `schemaVersion: 1`, `workflow: "apply"`, one `replaceText` step, one `containsText` check, and one `notContainsText` check
+- **THEN** the system decodes the manifest into `DocxManifest` and preserves the step and check order
+
+##### Example: apply manifest
+
+```json
+{
+  "schemaVersion": 1,
+  "workflow": "apply",
+  "input": "input.docx",
+  "output": "output.docx",
+  "steps": [
+    { "op": "replaceText", "find": "{{advisor}}", "with": "Dr. Chen", "scope": "body" }
+  ],
+  "checks": [
+    { "type": "containsText", "text": "Dr. Chen" },
+    { "type": "notContainsText", "text": "{{advisor}}" }
+  ]
+}
+```
+
+#### Scenario: Invalid manifest fails before document write
+
+- **WHEN** the manifest is not valid JSON, omits `schemaVersion`, omits `workflow`, uses an unknown workflow, or contains an unknown step operation
+- **THEN** the system returns a typed validation error and does not create or overwrite an output `.docx` file
+
+#### Scenario: CLI path overrides are reflected in the plan
+
+- **WHEN** the manifest declares `output: "manifest-output.docx"` and the user passes `--output cli-output.docx`
+- **THEN** the effective operation plan records `cli-output.docx` as the output path
+
+### Requirement: Build workflow creates new documents
+
+The `build` workflow SHALL start from an empty document model and write a new `.docx` file from declarative manifest content. The workflow SHALL support sections containing paragraphs and tables in Phase 1. Paragraph content SHALL support plain text runs and basic run properties that map to the existing Word builder model.
+
+#### Scenario: Build a document from manifest content
+
+- **WHEN** the user runs `macdoc docx build build.json --output built.docx` and `build.json` contains one section with one paragraph containing `Hello from macdoc`
+- **THEN** the system writes `built.docx`
+- **AND** OOXML readback of `built.docx` contains `Hello from macdoc`
+
+#### Scenario: Build rejects source document arguments
+
+- **WHEN** the user runs `macdoc docx build build.json --input existing.docx --output built.docx`
+- **THEN** the system returns a validation error because `build` does not mutate a source document
+
+### Requirement: Patch workflow fills template placeholders
+
+The `patch` workflow SHALL read a template `.docx`, resolve explicit placeholders, apply manifest-provided replacements, and write a separate output `.docx` file by default. Supported placeholder anchors SHALL include literal text tokens in the form `{{name}}`. A placeholder that matches zero locations SHALL fail. A placeholder that matches more than one location SHALL fail unless the manifest explicitly declares replacement of all matches for that placeholder.
+
+#### Scenario: Patch a single placeholder
+
+- **WHEN** the user runs `macdoc docx patch template.docx patch.json --output patched.docx`, `template.docx` contains `Advisor: {{advisor}}`, and `patch.json` replaces `{{advisor}}` with `Dr. Chen`
+- **THEN** the system writes `patched.docx`
+- **AND** OOXML readback contains `Advisor: Dr. Chen`
+- **AND** OOXML readback does not contain `{{advisor}}`
+
+#### Scenario: Duplicate placeholder fails without all-matches opt-in
+
+- **WHEN** the template contains two `{{advisor}}` placeholders and the manifest replacement does not declare all-matches replacement
+- **THEN** the system returns an anchor ambiguity error and does not write the output document
+
+### Requirement: Apply workflow mutates existing documents through ordered steps
+
+The `apply` workflow SHALL read an existing `.docx`, execute ordered manifest steps, and write a separate output `.docx` file by default. Phase 1 SHALL support `replaceText`, `insertParagraphAfterText`, and `insertImageAfterText` operations when their anchors resolve exactly according to the step options. The workflow SHALL preserve unchanged parts through the underlying OOXML writer rather than rebuilding the document from scratch.
+
+#### Scenario: Apply ordered replacement and insertion
+
+- **WHEN** the user runs `macdoc docx apply input.docx apply.json --output edited.docx` and the manifest first replaces `{{status}}` with `Approved` and then inserts a paragraph after `Approved`
+- **THEN** OOXML readback of `edited.docx` shows `Approved` before the inserted paragraph text
+
+#### Scenario: Apply fails on missing anchor
+
+- **WHEN** an `insertParagraphAfterText` step targets an anchor text that does not exist in the source document
+- **THEN** the system returns an anchor resolution error and does not write the output document
+
+### Requirement: Plan reports deterministic operations without writing output
+
+The `plan` command SHALL decode the manifest, resolve effective input/template/output paths, resolve anchors when a source or template document is involved, and print the ordered operations that execution would run. The `plan` command SHALL NOT write an output `.docx` file.
+
+#### Scenario: Plan an apply workflow
+
+- **WHEN** the user runs `macdoc docx plan apply.json --input input.docx --output edited.docx`
+- **THEN** stdout contains the workflow, effective input path, effective output path, ordered step identifiers, resolved anchor counts, and planned operation count
+- **AND** `edited.docx` is not created
+
+#### Scenario: Plan fails on invalid manifest
+
+- **WHEN** the user runs `macdoc docx plan invalid.json`
+- **THEN** the command exits non-zero and prints the manifest validation error
+
+### Requirement: Verify enforces manifest checks by OOXML readback
+
+The `verify` command SHALL read the output `.docx` and enforce manifest-declared checks. Phase 1 checks SHALL include `containsText`, `notContainsText`, `replacementCount`, and `readbackSucceeds`. Verification SHALL exit zero only when every check passes.
+
+#### Scenario: Verify successful edit
+
+- **WHEN** the user runs `macdoc docx verify edited.docx --manifest apply.json` and `edited.docx` contains every required text, lacks every forbidden text, matches the expected replacement count, and can be read back
+- **THEN** the command exits zero and prints a passed check summary
+
+#### Scenario: Verify reports failed checks
+
+- **WHEN** the manifest requires `containsText: "Approved"` and the output document does not contain `Approved`
+- **THEN** the command exits non-zero and prints the failing check identifier
+
+### Requirement: Diff reports Word-aware document changes
+
+The `diff` command SHALL compare two `.docx` files using OOXML readback summaries rather than raw zip bytes. Phase 1 output SHALL include paragraph text additions and removals, table count changes, image relationship count changes, and field or equation count changes when the reader exposes those counts.
+
+#### Scenario: Diff text changes
+
+- **WHEN** the user runs `macdoc docx diff before.docx after.docx` and `after.docx` replaces `Pending` with `Approved`
+- **THEN** stdout reports removal of `Pending` and addition of `Approved`
+
+#### Scenario: Diff ignores archive byte ordering noise
+
+- **WHEN** two `.docx` files have the same readback text and structure summary but different zip entry ordering
+- **THEN** `macdoc docx diff` reports no semantic changes
+
+### Requirement: Safe output write behaviour
+
+The system SHALL write to a separate output path by default. The system SHALL fail before writing when manifest validation, path resolution, planning, anchor resolution, or source readback fails. The system SHALL only overwrite an existing output file when the user passes an explicit overwrite option.
+
+#### Scenario: Existing output requires overwrite option
+
+- **WHEN** `edited.docx` already exists and the user runs `macdoc docx apply input.docx apply.json --output edited.docx` without the overwrite option
+- **THEN** the command exits non-zero and leaves `edited.docx` unchanged
+
+#### Scenario: Planning failure leaves output absent
+
+- **WHEN** execution fails because a manifest anchor is missing
+- **THEN** the requested output path does not exist after the command exits
+
+##### Example: missing anchor during apply
+
+- **GIVEN** `input.docx` contains `Status: Pending`
+- **AND** `apply.json` inserts a paragraph after anchor text `Status: Approved`
+- **WHEN** the user runs `macdoc docx apply input.docx apply.json --output edited.docx`
+- **THEN** the command exits non-zero
+- **AND** `edited.docx` does not exist
+
+### Requirement: Synthetic fixture policy for docx workflow tests
+
+The system SHALL test docx workflow behaviour using synthetic fixtures created in tests or committed minimal fixtures that contain no private thesis, manuscript, advisor-review, client, or personal document content.
+
+#### Scenario: Tests build fixtures programmatically
+
+- **WHEN** a unit test needs a source `.docx` with a placeholder, a paragraph, a table, or an image relationship
+- **THEN** the test creates the fixture programmatically or uses a minimal committed fixture with synthetic text
+
+#### Scenario: Private fixtures are rejected
+
+- **WHEN** a proposed test fixture contains private thesis, manuscript, advisor-review, client, or personal content
+- **THEN** the fixture is out of scope for the docx workflow test suite
+
+##### Example: rejected private fixture
+
+- **GIVEN** a proposed fixture file is named `advisor-review-real-manuscript.docx`
+- **AND** its paragraphs contain a real student name, advisor comments, or manuscript content
+- **WHEN** the fixture is proposed for DocxWorkflowLibTests
+- **THEN** the fixture is rejected from the repository test suite
diff --git a/openspec/changes/macdoc-docx-workflow-cli/tasks.md b/openspec/changes/macdoc-docx-workflow-cli/tasks.md
new file mode 100644
index 0000000..8684185
--- /dev/null
+++ b/openspec/changes/macdoc-docx-workflow-cli/tasks.md
@@ -0,0 +1,29 @@
+## 1. Package and CLI Surface
+
+- [ ] 1.1 Update Package.swift to add the DocxWorkflowLib library target, DocxWorkflowLibTests test target, explicit Word/OOXML dependencies, and the MacDocCLI dependency on DocxWorkflowLib for the Importable workflow library boundary.
+- [ ] 1.2 Create Sources/DocxWorkflowLib with public DocxManifest, DocxWorkflowPlanner, DocxWorkflowExecutor, DocxWorkflowVerifier, DocxWorkflowDiffer, and typed error models to Keep workflow logic in `DocxWorkflowLib`.
+- [ ] 1.3 Add Sources/MacDocCLI/MacDoc+Docx.swift and register it from Sources/MacDocCLI/MacDoc.swift so the Integrated docx command namespace implements Use `macdoc docx` as the Phase 1 product surface.
+
+## 2. Manifest, Planning, and Safety
+
+- [ ] 2.1 Implement the JSON Codable manifest contract in DocxWorkflowLib with schemaVersion, workflow, steps, checks, and CLI path override resolution to Use JSON-first Codable manifests.
+- [ ] 2.2 Implement Safe output write behaviour with preflight validation, missing-anchor failure before writes, existing-output protection, and explicit overwrite handling.
+- [ ] 2.3 Implement DocxOperationPlan generation so Plan reports deterministic operations without writing output and Make planning mandatory inside every execution path.
+
+## 3. Workflow Execution
+
+- [ ] 3.1 Implement Build workflow creates new documents from manifest sections, paragraphs, text runs, and tables using word-builder-swift model coverage.
+- [ ] 3.2 Implement Patch workflow fills template placeholders with literal `{{name}}` token resolution, zero-match failure, duplicate-match failure, and all-matches opt-in.
+- [ ] 3.3 Implement Apply workflow mutates existing documents through ordered steps for replaceText, insertParagraphAfterText, and insertImageAfterText while preserving unchanged OOXML parts.
+- [ ] 3.4 Normalize build, patch, and apply plans through shared operation records so implementation preserves Separate build, patch, and apply semantics without creating three unrelated engines.
+
+## 4. Verification and Diff
+
+- [ ] 4.1 Implement Verify enforces manifest checks by OOXML readback for containsText, notContainsText, replacementCount, and readbackSucceeds, following Verify by OOXML readback and manifest checks.
+- [ ] 4.2 Implement Diff reports Word-aware document changes by comparing readback summaries for paragraph text additions/removals, table count, image relationship count, and field/equation count where available, following Diff at a Word-aware structural level.
+
+## 5. Tests and Documentation
+
+- [ ] 5.1 Add DocxWorkflowLibTests covering Synthetic fixture policy for docx workflow tests by generating source/template `.docx` fixtures programmatically or using minimal synthetic committed fixtures only.
+- [ ] 5.2 Add MacDocCLITests for `macdoc docx --help`, build, patch, apply, plan no-output behaviour, verify pass/fail, diff output, invalid manifest failure, and existing-output protection.
+- [ ] 5.3 Update README.md and CONVERSIONS.md with JSON manifest examples, command examples, non-goals for YAML and dxedit, and guidance on when to use `macdoc docx` instead of `macdoc convert --to docx` or che-word-mcp.
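Tasks 3.2 and 4.1 (literal `{{name}}` resolution with zero-match failure, duplicate-match failure, and all-matches opt-in, followed by text checks on readback) can be sketched as a plan-then-execute pass. This is a hypothetical sketch under simplifying assumptions: document readback is modelled as an array of plain paragraph strings, and `PlaceholderFill`, `planPlaceholders(_:paragraphs:)`, `execute(_:on:)`, and `failedChecks(_:mustContain:mustNotContain:)` are illustrative names, not APIs from `DocxWorkflowLib`, `word-builder-swift`, or `che-word-mcp`. Real OOXML can split a `{{name}}` token across several runs, which this sketch does not handle.

```swift
import Foundation

// Hypothetical sketch: document readback is modelled as plain paragraph
// strings. Real OOXML may split "{{name}}" across runs; not handled here.
struct PlaceholderFill {
    let name: String
    let value: String
    var allMatches: Bool = false  // explicit opt-in for duplicate tokens
}

struct PlannedReplacement: Equatable {
    let token: String
    let value: String
    let matchCount: Int
}

enum AnchorError: Error, Equatable {
    case notFound(token: String)
    case ambiguous(token: String, matches: Int)
}

/// Counts non-overlapping occurrences of `token` in `text`.
func occurrences(of token: String, in text: String) -> Int {
    text.components(separatedBy: token).count - 1
}

/// Resolves every placeholder before anything is executed; a zero-match or
/// an unapproved duplicate match throws and aborts the whole plan.
func planPlaceholders(_ fills: [PlaceholderFill], paragraphs: [String]) throws -> [PlannedReplacement] {
    try fills.map { fill -> PlannedReplacement in
        let token = "{{\(fill.name)}}"
        let matches = paragraphs.reduce(0) { $0 + occurrences(of: token, in: $1) }
        if matches == 0 {
            throw AnchorError.notFound(token: token)
        }
        if matches > 1 && !fill.allMatches {
            throw AnchorError.ambiguous(token: token, matches: matches)
        }
        return PlannedReplacement(token: token, value: fill.value, matchCount: matches)
    }
}

/// Runs an already validated plan against the paragraph text.
func execute(_ plan: [PlannedReplacement], on paragraphs: [String]) -> [String] {
    paragraphs.map { paragraph in
        plan.reduce(paragraph) { text, step in
            text.replacingOccurrences(of: step.token, with: step.value)
        }
    }
}

/// Returns identifiers for failed containsText / notContainsText checks.
func failedChecks(_ paragraphs: [String], mustContain: [String], mustNotContain: [String]) -> [String] {
    let body = paragraphs.joined(separator: "\n")
    let missing = mustContain.filter { !body.contains($0) }.map { "containsText:\($0)" }
    let forbidden = mustNotContain.filter { body.contains($0) }.map { "notContainsText:\($0)" }
    return missing + forbidden
}

let template = ["Synthetic Report", "Advisor: {{advisor}}", "Status: {{status}}"]
let plan = try! planPlaceholders(
    [PlaceholderFill(name: "advisor", value: "Dr. Chen"),
     PlaceholderFill(name: "status", value: "Approved")],
    paragraphs: template
)
let edited = execute(plan, on: template)
print(edited)  // ["Synthetic Report", "Advisor: Dr. Chen", "Status: Approved"]
print(failedChecks(edited, mustContain: ["Approved"], mustNotContain: ["{{advisor}}"]))  // []
```

Because `planPlaceholders` throws before `execute` is ever called, a missing or ambiguous placeholder stops the workflow before any output text is produced, which is the same ordering the design's "Make planning mandatory inside every execution path" decision asks for.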