Add PromoTech promoter detection transform #1
Open
phy0x1a79ed wants to merge 7 commits into
- Add promotech.py transform: two-step workflow (parse 40nt windows → RF-HOT predict)
- Add promotech.oci container reference (ghcr.io/phy0x1a79ed/promotech:1.0)
- Add functional_annotation::promotech_predictions type definition (TSV output)
- Register promotech.oci in container types and indexes
- Register promotech.py in functionalAnnotation transform index

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
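The "parse 40nt windows" step can be pictured with a minimal sketch. It assumes a step of 1 nt and a single strand; `sliding_windows` is a hypothetical helper written for illustration, not PromoTech's own parser, which may handle strands and ambiguous bases differently.

```python
from typing import Iterator, Tuple


def sliding_windows(sequence: str, size: int = 40, step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for every full-length window in the sequence."""
    # A sequence shorter than `size` produces an empty range, so nothing is yielded.
    for start in range(0, len(sequence) - size + 1, step):
        yield start, sequence[start:start + size]


contig = "ACGT" * 11  # 44 nt toy contig
windows = list(sliding_windows(contig))
```

With a step of 1, the number of windows is close to the number of bases in the assembly, which is one plausible reason the parse step dominates peak memory.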
Local testing on a 3.7 MB assembly showed:
- Peak RSS: ~30 GB (parse step)
- Wall time: ~43 min (11 min parse + 32 min predict)

Bump memory from 16 GB → 32 GB and duration from 2h → 4h to accommodate larger assemblies with a safety margin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The promotech.oci entry was missing from _metadata/types/containers.yml, which caused DataInstanceLibrary to fail resolving containers::promotech.oci at load time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Metasmith library loader auto-sorted index entries and normalized YAML formatting when loading the libraries during workflow generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PromoTech needs CWD=/opt/promotech for relative model paths, but iasm.container is relative to /ws (the container workdir bind). Prefix with /ws/ to make the path absolute inside the container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
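The prefixing described above amounts to joining a workdir-relative path onto /ws. A minimal sketch, where `to_container_path` and the example input path are hypothetical and not part of the Metasmith API:

```python
from pathlib import PurePosixPath

# /ws is the writable workdir bind named in the commit message.
CONTAINER_WORKDIR = PurePosixPath("/ws")


def to_container_path(relative_path: str) -> str:
    """Turn a path relative to the workdir into an absolute in-container path."""
    # Joining onto an already-absolute path returns that path unchanged.
    return str(CONTAINER_WORKDIR / relative_path)


assembly_path = to_container_path("inputs/assembly.fna")
```

Once the path is absolute, it stays valid after the process changes directory to /opt/promotech.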
The bounce script writes exitcode.* to CWD after each ExecWithContainer. After 'cd /opt/promotech', CWD is inside the read-only container image, causing 'Read-only file system' errors. Restore CWD to /ws (writable workdir bind) after each step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
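The same "change directory, then always restore it" pattern can be sketched in Python with a context manager. This illustrates the idea only; the actual transform presumably restores CWD with shell commands inside the container, and `working_directory` is a hypothetical helper.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change CWD, restoring the previous one even on error."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs on success and on exception, so later writes land in the original CWD.
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as scratch:
    with working_directory(scratch):
        inside = os.getcwd()
after = os.getcwd()
```

Restoring in a `finally` clause matters here because a failing step still needs its exit code written to the writable directory.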
Previous full run: 526/2844 succeeded at 32 GB base. Many OOM kills on larger assemblies (peak RSS hit 32 GB on 3.7 MB assemblies; larger ones need more). 48 GB base with Metasmith auto-retry at 96 GB should cover all assemblies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
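The 48 GB → 96 GB plan can be checked against an expected peak RSS with a small sketch. The doubling rule is an assumption inferred from the two figures above, not documented Metasmith behavior, and `first_fitting_tier` is a hypothetical helper.

```python
from typing import Optional


def first_fitting_tier(peak_rss_gb: float, base_gb: int = 48, max_gb: int = 96) -> Optional[int]:
    """Return the smallest memory tier (GB) that holds the peak RSS, or None."""
    tier = base_gb
    while tier <= max_gb:
        if peak_rss_gb <= tier:
            return tier
        tier *= 2  # assumed retry policy: double the request after an OOM kill
    return None


small = first_fitting_tier(30)   # fits in the 48 GB base request
large = first_fitting_tier(60)   # needs the 96 GB retry
```

An assembly whose peak exceeds 96 GB would return `None` here, meaning it would still fail under this plan.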
Summary
- promotech.py: Metasmith transform that runs PromoTech (RF-HOT model) bacterial promoter detection on assembly FASTA files
- promotech.oci: container reference pointing to ghcr.io/phy0x1a79ed/promotech:1.0
- functional_annotation::promotech_predictions: type definition (TSV output with BED-like promoter coordinates)
- Registered in the functionalAnnotation and logistics transform libraries

Context
Part of the cyanoverse/run-promotech task — running PromoTech across ~2,844 cyanobacterial assemblies on the fir HPC cluster via Metasmith workflows.
Test plan
- Check the sequences::assembly → functional_annotation::promotech_predictions path
- Run with --test 3 on HPC to confirm the container pulls and the transform executes

🤖 Generated with Claude Code