
Add PromoTech promoter detection transform #1

Open
phy0x1a79ed wants to merge 7 commits into main from feat/promotech

Conversation

@phy0x1a79ed
Owner

Summary

  • Adds a new promotech.py Metasmith transform that runs PromoTech (RF-HOT model) bacterial promoter detection on assembly FASTA files
  • Adds promotech.oci container reference pointing to ghcr.io/phy0x1a79ed/promotech:1.0
  • Adds functional_annotation::promotech_predictions type definition (TSV output with BED-like promoter coordinates)
  • Updates all relevant indexes and type registrations across functionalAnnotation and logistics transform libraries

Context

This PR is part of the cyanoverse/run-promotech task: running PromoTech across ~2,844 cyanobacterial assemblies on the fir HPC cluster via Metasmith workflows.

Test plan

  • Verify Metasmith solver resolves sequences::assembly → functional_annotation::promotech_predictions path
  • Run --test 3 on HPC to confirm container pulls and transform executes
  • Verify output TSVs contain expected columns (chrom, start, end, score, strand, sequence)
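
The column check in the last item could be scripted along these lines. This is a minimal sketch, not part of the PR: the column names come from the test plan above, while the presence of a header row, the `+`/`-` strand encoding, and the helper name `check_promotech_tsv` are assumptions made for illustration.

```python
import csv
import io

# Column names from the test plan. A header row in the real output is an assumption.
EXPECTED_COLUMNS = ["chrom", "start", "end", "score", "strand", "sequence"]


def check_promotech_tsv(handle):
    """Return a list of problems found in a predictions TSV (empty if none)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        return [f"unexpected header: {header}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(f"line {line_no}: expected 6 fields, got {len(row)}")
            continue
        chrom, start, end, score, strand, sequence = row
        try:
            # BED-like coordinates: start must come before end.
            if int(start) >= int(end):
                problems.append(f"line {line_no}: start is not less than end")
            float(score)
        except ValueError:
            problems.append(f"line {line_no}: non-numeric start, end, or score")
        # Assumed strand encoding; adjust if the real output differs.
        if strand not in {"+", "-"}:
            problems.append(f"line {line_no}: unexpected strand {strand!r}")
    return problems


# Hypothetical sample row, for illustration only.
sample = io.StringIO(
    "chrom\tstart\tend\tscore\tstrand\tsequence\n"
    "contig_1\t10\t50\t0.91\t+\tACGT\n"
)
print(check_promotech_tsv(sample))  # prints []
```

On the cluster, the same function could be called with an open file handle for each output TSV from the `--test 3` run.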

🤖 Generated with Claude Code

phy0x1a79ed and others added 7 commits March 19, 2026 18:12
- Add promotech.py transform: two-step workflow (parse 40nt windows → RF-HOT predict)
- Add promotech.oci container reference (ghcr.io/phy0x1a79ed/promotech:1.0)
- Add functional_annotation::promotech_predictions type definition (TSV output)
- Register promotech.oci in container types and indexes
- Register promotech.py in functionalAnnotation transform index

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
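
To illustrate the "parse 40nt windows" step named in this commit, the sketch below enumerates fixed-length windows over a sequence. It is not PromoTech's actual parser: the step size of 1, the forward-strand-only handling, and the function name `sliding_windows` are assumptions.

```python
def sliding_windows(sequence, window=40, step=1):
    """Yield (start, end, subsequence) for each full-length window.

    Coordinates are 0-based and half-open. Sequences shorter than
    the window yield nothing.
    """
    for start in range(0, len(sequence) - window + 1, step):
        yield start, start + window, sequence[start:start + window]


# A 42 nt sequence yields 42 - 40 + 1 = 3 windows at step 1.
windows = list(sliding_windows("ACGT" * 10 + "AC"))
print(len(windows))  # prints 3
```

At step 1, a sequence of length n produces n - 39 windows, so the window count grows roughly linearly with assembly size.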
Local testing on a 3.7 MB assembly showed:
- Peak RSS: ~30 GB (parse step)
- Wall time: ~43 min (11 min parse + 32 min predict)

Bump memory from 16 GB to 32 GB and duration from 2h to 4h to
accommodate larger assemblies with a safety margin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The promotech.oci entry was missing from _metadata/types/containers.yml,
which caused DataInstanceLibrary to fail resolving containers::promotech.oci
at load time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Metasmith library loader auto-sorted index entries and normalized
YAML formatting when loading the libraries during workflow generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PromoTech needs CWD=/opt/promotech for relative model paths, but
iasm.container is relative to /ws (the container workdir bind), so it
no longer resolves after the directory change. Prefix it with /ws/ to
make the path absolute inside the container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
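
The path fix in this commit amounts to anchoring a workdir-relative path at the bind mount. A minimal sketch of that idea, assuming the /ws bind from the message above; the helper name `to_container_abs` and the example path are hypothetical, and the real transform may build the path differently.

```python
import posixpath

WORKDIR_BIND = "/ws"  # container workdir bind, per the commit message


def to_container_abs(path, workdir=WORKDIR_BIND):
    """Return an absolute in-container path for a workdir-relative path.

    Paths that are already absolute are returned unchanged.
    """
    if posixpath.isabs(path):
        return path
    return posixpath.normpath(posixpath.join(workdir, path))


# Hypothetical relative input path, for illustration only.
print(to_container_abs("inputs/assembly.fna"))  # prints /ws/inputs/assembly.fna
```

An absolute path keeps resolving to the same file no matter which directory the process has changed into.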
The bounce script writes exitcode.* to CWD after each ExecWithContainer.
After 'cd /opt/promotech', CWD is inside the read-only container image,
causing 'Read-only file system' errors. Restore CWD to /ws (writable
workdir bind) after each step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
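
The "change directory, run a step, restore CWD" pattern from this commit can be expressed as a context manager. This is an illustrative sketch only: the PR restores CWD in its own way, and `working_directory` is a hypothetical helper name.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit.

    The previous directory is restored even if the body raises, so later
    writes relative to CWD land where they did before.
    """
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


before = os.getcwd()
with working_directory(os.path.abspath(os.sep)):
    pass  # a step that needs a specific CWD would run here
print(os.getcwd() == before)  # prints True
```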
In the previous full run, 526/2844 assemblies succeeded at a 32 GB base.
Larger assemblies hit many OOM kills (peak RSS reached 32 GB on 3.7 MB
assemblies; larger ones need more). A 48 GB base with Metasmith auto-retry
at 96 GB should cover all assemblies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
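
The 48 GB base with a retry at 96 GB corresponds to a doubling schedule. The sketch below shows that idea in isolation. It does not reproduce Metasmith's retry mechanism, which this PR does not describe beyond those two figures; `memory_schedule`, `run_with_retry`, and `fake_job` are hypothetical names, and `MemoryError` stands in for an OOM kill.

```python
def memory_schedule(base_gb, max_attempts=2, factor=2):
    """Return the memory request (GB) for each attempt, growing geometrically."""
    return [base_gb * factor ** attempt for attempt in range(max_attempts)]


def run_with_retry(job, base_gb=48, max_attempts=2):
    """Call job(mem_gb) with increasing memory until it stops raising MemoryError."""
    for mem_gb in memory_schedule(base_gb, max_attempts):
        try:
            return job(mem_gb)
        except MemoryError:
            continue  # simulated OOM kill: try the next, larger allocation
    raise MemoryError(f"job failed at every allocation up to {mem_gb} GB")


def fake_job(mem_gb):
    # Stand-in for a job that needs more than 60 GB (illustrative number).
    if mem_gb < 60:
        raise MemoryError
    return mem_gb


print(memory_schedule(48))       # prints [48, 96]
print(run_with_retry(fake_job))  # prints 96
```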