Skip to content

Switch LFS backend from GitHub LFS to lfs.dimensionalos.com#2022

Open
spomichter wants to merge 2 commits intolegacy-dev-dont-mergefrom
lfs-server-cutover
Open

Switch LFS backend from GitHub LFS to lfs.dimensionalos.com#2022
spomichter wants to merge 2 commits intolegacy-dev-dont-mergefrom
lfs-server-cutover

Conversation

@spomichter
Copy link
Copy Markdown
Contributor

Summary

  • Adds .lfsconfig pointing git lfs at our self-hosted server lfs.dimensionalos.com (see dimensionalOS/dimensional-lfs) — a giftless instance on EC2 that signs presigned URLs to s3://dimos-github-lfs/lfs/dimensionalOS/dimos/<sha>. We're cutting over from GitHub LFS to escape its bandwidth budget cap.
  • All 75 LFS objects already-migrated to the new S3 layout (verified end-to-end with a roundtrip SHA check against cafe.jpg.tar.gzb8cf30...7603).
  • Adds dimos/utils/test_lfs.py with three batch-API smoke tests so CI gives a clear "LFS server X is broken" signal rather than the looser "git lfs pull failed".

What changes for users

Before After
git lfs pull GitHub LFS, counted against repo's LFS bandwidth budget lfs.dimensionalos.com -> S3, no budget
git lfs push GitHub LFS giftless validates a GitHub PAT against dimensionalOS/dimos write permission, then signs an S3 PUT
Auth surface GitHub-managed Same model: anyone can pull, only repo collaborators can push

For pull-only consumers, nothing changesgit lfs pull is still anonymous.

For push, devs need their git credential helper to supply a GitHub PAT with repo scope (or fine-grained PAT with read/write contents on this repo). gh auth token works.

Test plan

  • CI green on dev rebase
  • dimos/utils/test_lfs.py (new) passes — exercises the LFS server's batch API directly
  • dimos/utils/test_data.py::test_pull_file (existing slow test) passes — exercises the full git lfs pull -> smudge -> decompress pipeline against the new server
  • After merge, do a manual git lfs push from a fresh checkout to confirm the GitHub-PAT auth path works end-to-end (can't fully test in CI since CI doesn't push)

Rollback

If the LFS server has issues post-merge, revert this PR. git lfs pull will go back to GitHub LFS automatically — the LFS objects still exist there from before this migration. We haven't disabled or pruned GitHub LFS storage.

Related

  • Server source: dimensionalOS/dimensional-lfs (private)
  • Migration script: ran one-shot from local LFS cache; documented in the dimensional-lfs README.

Adds .lfsconfig pointing `git lfs` at our self-hosted giftless instance
(see dimensionalOS/dimensional-lfs) backed by S3. All 75 LFS objects
were migrated to the new layout before this change merged.

Effect:
- `git lfs pull` (no auth) — matches GitHub LFS public-repo behavior
- `git lfs push` — requires a GitHub PAT with write access to this repo,
  validated by giftless against the GitHub API

Adds dimos/utils/test_lfs.py with three direct batch-API smoke tests
(reachability, anonymous-write rejection, known-object roundtrip) so a
broken LFS server gives a clearer signal in CI than a smudge failure.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 8, 2026

Greptile Summary

This PR switches the Git LFS backend from GitHub LFS to a self-hosted giftless instance (lfs.dimensionalos.com) backed by S3, escaping GitHub's bandwidth cap. It adds a two-line .lfsconfig and a new batch-API smoke test file to give CI a clear signal when the LFS server is unhealthy.

  • .lfsconfig — minimal, correctly formatted; URL matches the batch endpoint used in tests and the PR description.
  • dimos/utils/test_lfs.py — three @pytest.mark.slow tests covering anonymous download (presigned URL shape), anonymous upload rejection, and a SHA-verified roundtrip download; marked correctly so they don't run in fast CI passes.

Confidence Score: 5/5

Safe to merge — the change is a two-line config redirect and three network smoke tests with no impact on application logic.

Both changed files are additive and self-contained: .lfsconfig is a standard git config that only affects LFS transport, and the test file only talks to the new LFS server. The rollback path is clean because GitHub LFS storage is preserved. No application code is touched.

No files require special attention beyond the minor diagnostic-quality note in dimos/utils/test_lfs.py.

Important Files Changed

Filename Overview
.lfsconfig New config file pointing git LFS at self-hosted giftless server; correct INI format and URL matches the batch-API base used in the smoke tests.
dimos/utils/test_lfs.py Three batch-API smoke tests for the new LFS backend; overall sound, but test_known_object_roundtrip accesses obj["actions"]["download"]["href"] without a guarding assertion, so an LFS-spec object-level error yields a KeyError rather than a clear diagnostic.

Sequence Diagram

sequenceDiagram
    participant Client as git lfs / test
    participant LFS as lfs.dimensionalos.com (giftless)
    participant GitHub as api.github.com
    participant S3 as s3://dimos-github-lfs

    Note over Client,S3: Anonymous download
    Client->>LFS: POST /dimensionalOS/dimos/objects/batch
    LFS-->>Client: 200 presigned-S3-URL
    Client->>S3: GET presigned-S3-URL
    S3-->>Client: object bytes

    Note over Client,S3: Authenticated upload
    Client->>LFS: POST /dimensionalOS/dimos/objects/batch + GitHub PAT
    LFS->>GitHub: Verify PAT write permission
    GitHub-->>LFS: 200 OK
    LFS-->>Client: 200 presigned-S3-PUT-URL
    Client->>S3: PUT presigned-S3-PUT-URL
    S3-->>Client: 200 OK

    Note over Client,S3: Unauthenticated upload blocked
    Client->>LFS: POST /dimensionalOS/dimos/objects/batch (no auth)
    LFS-->>Client: 403 Forbidden
Loading

Reviews (2): Last reviewed commit: "Merge branch 'dev' into lfs-server-cutov..." | Re-trigger Greptile

Comment thread dimos/utils/test_lfs.py
Comment on lines +67 to +71
@pytest.mark.slow
def test_anonymous_upload_is_forbidden():
"""An unauthenticated upload returns 403 — only repo collaborators can push."""
response = _batch("upload", "0" * 64, 1)
assert response.status_code == 403, response.text
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The test pins 403, but HTTP semantics permit a server to return 401 for unauthenticated requests (signalling "please authenticate") rather than 403 ("authenticated but not permitted"). Giftless may return either code depending on configuration. Accepting both makes the test resilient to that variation without weakening the assertion that anonymous upload is rejected.

Suggested change
@pytest.mark.slow
def test_anonymous_upload_is_forbidden():
"""An unauthenticated upload returns 403 — only repo collaborators can push."""
response = _batch("upload", "0" * 64, 1)
assert response.status_code == 403, response.text
@pytest.mark.slow
def test_anonymous_upload_is_forbidden():
"""An unauthenticated upload returns 401/403 — only repo collaborators can push."""
response = _batch("upload", "0" * 64, 1)
assert response.status_code in (401, 403), response.text

Comment thread dimos/utils/test_lfs.py
Comment on lines +63 to +64
href = obj["actions"]["download"]["href"]
assert href.startswith("https://dimos-github-lfs.s3"), href
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The startswith("https://dimos-github-lfs.s3") check couples the test to the virtual-hosted S3 URL style. AWS presigned URLs can also be path-style (https://s3.amazonaws.com/dimos-github-lfs/…), and the format may also include a regional subdomain variant. Checking for the bucket name anywhere in the URL is less brittle while still confirming the object came from the right bucket.

Suggested change
href = obj["actions"]["download"]["href"]
assert href.startswith("https://dimos-github-lfs.s3"), href
href = obj["actions"]["download"]["href"]
assert "dimos-github-lfs" in href, href

Comment thread dimos/utils/test_lfs.py
Comment on lines +35 to +37

def _batch(operation: str, oid: str, size: int, *, auth=None):
return requests.post(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 auth parameter is defined but never exercised in any test

_batch accepts an auth kwarg intended for the authenticated-push path, but no test currently passes a real credential. If the giftless auth handler (PAT → GitHub permission check → S3 PUT signing) regresses, none of these tests would catch it. Consider adding a pytest.mark.slow test that uses a read-only test PAT (stored as a CI secret) to exercise at least the upload batch request, so the push auth path has some CI coverage.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

Comment thread dimos/utils/test_lfs.py

@pytest.mark.slow
def test_anonymous_upload_is_forbidden():
"""An unauthenticated upload returns 403 — only repo collaborators can push."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a normal dev process for a third party contributor?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants