feat(ribocode,ribotish): pyfasta indexes, prefix-scoped outputs, optional ribotish -a by pinin4fjords · Pull Request #11684 · nf-core/modules

pinin4fjords · 2026-05-18T15:08:38Z

Bundles three in-place module changes carried in nf-core/riboseq#174. Each is self-contained and addresses a different pain point we hit running RiboCode / Ribo-TISH at scale.

ribocode/prepare

Pre-build the pyfasta .gdx/.flat indexes for annotation/transcripts_sequence.fa immediately after prepare_transcripts, using the same key_fn RiboCode applies internally (split on first space, otherwise split on |). The stub also touches the two new sidecars.
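
As a rough illustration of the key rule described above, the sketch below reduces a FASTA header to a record key. It is not copied from RiboCode: the name `key_fn`, the example headers, and the choice to keep the leading piece after each split are assumptions made for this example.

```python
def key_fn(header: str) -> str:
    """Reduce a FASTA header to a record key.

    Assumption: the leading piece is kept after either split.
    """
    if " " in header:
        # Header contains a space: keep everything before the first one.
        return header.split(" ", 1)[0]
    # No space: fall back to splitting on the pipe character.
    return header.split("|", 1)[0]


# Hypothetical headers, for illustration only.
print(key_fn("ENST00000456328.2 gene=DDX11L1"))
print(key_fn("ENST00000456328.2|ENSG00000223972.5|DDX11L1"))
```

A function of this shape would be handed to pyfasta as `Fasta(filename, key_fn=key_fn)`. The pre-built `.gdx`/`.flat` files are only useful if the key function matches the one the downstream reader uses.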

Why: downstream RiboCode steps open the FASTA with pyfasta, which lazily writes .gdx/.flat next to the input on first read. Under Fusion staging those writes land back at the upstream task's S3 prefix and silently corrupt the staged copy on retries. Building the indexes inside the producing task fixes it.
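
The write-back mechanism can be reproduced locally with plain symlinks. The sketch below assumes the `annotation` directory is staged into the consumer task as a symlink to the producer's copy; the directory layout and the function name are hypothetical, and no Fusion or S3 behaviour is modelled.

```python
import os
import tempfile


def sidecar_leaks_to_producer() -> bool:
    """Write a sidecar through a symlinked input dir and report where it lands."""
    with tempfile.TemporaryDirectory() as root:
        producer = os.path.join(root, "producer", "annotation")
        consumer = os.path.join(root, "consumer")
        os.makedirs(producer)
        os.makedirs(consumer)

        # The producing task writes the FASTA into its own work directory.
        fasta = os.path.join(producer, "transcripts_sequence.fa")
        with open(fasta, "w") as fh:
            fh.write(">tx1\nACGT\n")

        # The consuming task sees the directory only through a symlink.
        staged_dir = os.path.join(consumer, "annotation")
        os.symlink(producer, staged_dir, target_is_directory=True)

        # A lazy index write "next to the input" goes through the symlink.
        staged_fasta = os.path.join(staged_dir, "transcripts_sequence.fa")
        with open(staged_fasta + ".flat", "w") as fh:
            fh.write("ACGT")

        # The sidecar now exists in the producer's directory.
        return os.path.exists(fasta + ".flat")


print(sidecar_leaks_to_producer())
```

Building the sidecars in the producing task means the consumer only ever reads them, so nothing is written through the staged path.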

ribocode/ribocode

Switch the orf_txt and orf_txt_collapsed output globs from *.txt / *_collapsed.txt to ${prefix}.txt / ${prefix}_collapsed.txt so multi-instance publication is unambiguous (*.txt previously matched both the all-ORFs and collapsed files into the same emit). The prefix binding is promoted out of def in both script: and stub: so it resolves at the output-glob stage; the Nextflow 26 strict parser rejects re-declaring the same local with def across the two blocks.

The existing stub assertion that indexed process.out.orf_txt[0][1][0] is corrected to the new single-file shape (process.out.orf_txt[0][1]).
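
The overlap between the old globs can be checked with a small sketch. It uses Python's `fnmatch` as an approximation of Nextflow's output glob matching (an assumption), with the file names taken from the `test` prefix used in the snapshots.

```python
from fnmatch import fnmatchcase

# Files a single task with prefix "test" leaves in its work directory.
files = ["test.txt", "test_collapsed.txt"]
prefix = "test"


def match(pattern, names):
    """Return the names that match a glob pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]


old_orf_txt = match("*.txt", files)                       # both files
new_orf_txt = match(f"{prefix}.txt", files)               # all-ORFs file only
new_collapsed = match(f"{prefix}_collapsed.txt", files)   # collapsed file only

print(old_orf_txt, new_orf_txt, new_collapsed)
```

With the prefix-scoped pattern each output resolves to exactly one file, which is consistent with the stub assertion changing from indexing into a list (`[0][1][0]`) to reading a single path (`[0][1]`).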

ribotish/predict

Breaking signature change. The third input tuple gains an optional fourth element, reference_gtf, plumbed through to ribotish predict as -a <gtf> when populated:

tuple val(meta3), path(fasta), path(gtf), path(reference_gtf, stageAs: 'secondary.gtf')

Callers must supply a fourth element on every emit. Pass [] for the no-op case (no secondary annotation). The existing test cases in this PR are migrated that way; positive-coverage tests for the populated path will land in a follow-up.

Why: Ribo-TISH's -a argument is the documented hook for layering a secondary annotation (e.g. MANE/RefSeq) on top of the primary GTF, and we want to expose it from the module without a second optional input tuple.
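
The "when populated" behaviour amounts to emitting the `-a` flag only for a non-empty value. A minimal sketch of that logic follows; the helper name `optional_annotation_args` is hypothetical, and the module itself expresses this in Nextflow rather than Python.

```python
def optional_annotation_args(reference_gtf):
    """Return the extra CLI arguments for an optional secondary annotation.

    An empty value ([], "" or None) stands for the no-op case and adds nothing.
    """
    if not reference_gtf:
        return []
    return ["-a", str(reference_gtf)]


# No-op case: callers pass [] and no flag is added.
print(optional_annotation_args([]))
# Populated case: the staged file name from the input declaration.
print(optional_annotation_args("secondary.gtf"))
```

Treating `[]` as "absent" keeps the no-op call identical to the previous command line, in line with the unchanged snapshot for the migrated tests.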

Test plan

All three modules pass under Docker on a c5.9xlarge VM with nf-core 4.0.2 / nextflow 26.04.1 / nf-test 0.9.5:

nf-core modules test --profile docker ribocode/prepare
nf-core modules test --profile docker ribocode/ribocode
nf-core modules test --profile docker ribotish/predict

Snapshot deltas:

ribocode/prepare: non-stub snapshot gains the two new file md5s (transcripts_sequence.fa.flat, transcripts_sequence.fa.gdx); existing files' md5s unchanged.
ribocode/ribocode: orf_outputs snapshot drops the duplicate test_collapsed.txt entry that the old *.txt glob had pulled into orf_txt; everything else unchanged.
ribotish/predict: no snapshot change (the [] migration is a no-op at runtime).

Source: nf-core/riboseq#174

…onal ribotish -a

Bundles three in-place module changes carried in nf-core/riboseq#174.

ribocode/prepare: pre-build the pyfasta .gdx/.flat indexes for annotation/transcripts_sequence.fa using the same key_fn RiboCode applies internally (split on first space, else split on '|'). Downstream RiboCode tasks otherwise lazily build those sidecars inside the staged input directory, which fails under Fusion staging because writes leak back to the upstream task's S3 prefix.

ribocode/ribocode: scope the orf_txt and orf_txt_collapsed output globs to ${prefix}.txt and ${prefix}_collapsed.txt rather than *.txt/*_collapsed.txt so multi-instance publication is unambiguous. The prefix binding is promoted out of `def` in both the script and stub blocks so it resolves at the output-glob stage (Nextflow 26 strict parser rejects redeclaration of the same name across script/stub if either uses `def`). The existing stub-test assertion that indexed orf_txt[0][1][0] is adjusted to the new single-file shape.

ribotish/predict: extend the fasta/gtf input tuple with an optional fourth path, reference_gtf, plumbed to ribotish predict as `-a <gtf>` when populated. BREAKING signature change for callers: every emitter must supply a fourth element in the third tuple (use `[]` for the no-op case).

Source: nf-core/riboseq#174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pinin4fjords · 2026-05-18T15:22:10Z

Superseded by the per-module splits:

ribocode (prepare + ribocode): feat(ribocode): pre-build pyfasta indexes + prefix-scoped outputs #11685
ribotish/predict: feat(ribotish/predict): optional reference_gtf input + topics versions + 0.2.8 #11686

Closing this bundled draft. Branch ribocode-ribotish-bundled-fixes will be pruned once the splits land.

…-core#11685)

* feat(ribocode): pre-build pyfasta indexes + prefix-scoped outputs

Two related changes carried in nf-core/riboseq#174 and split out of the bundled PR nf-core#11684.

ribocode/prepare: pre-build the pyfasta `.gdx`/`.flat` indexes for `annotation/transcripts_sequence.fa` immediately after `prepare_transcripts`, using the same `key_fn` RiboCode applies internally (split on first space, otherwise split on `|`). Stub touches the two new sidecars.

Why: downstream RiboCode steps open the FASTA with pyfasta, which lazily writes `.gdx`/`.flat` next to the input on first read. Under Fusion staging those writes land back at the upstream task's S3 prefix and silently corrupt the staged copy on retries. Building the indexes inside the producing task fixes it.

ribocode/ribocode: switch the `orf_txt` and `orf_txt_collapsed` output globs from `*.txt` / `*_collapsed.txt` to `${prefix}.txt` / `${prefix}_collapsed.txt` so multi-instance publication is unambiguous (`*.txt` previously matched both files into the same emit). The `prefix` binding is promoted out of `def` in both `script:` and `stub:` so it resolves at the output-glob stage; the Nextflow 26 strict parser rejects re-declaring the same local with `def` across both blocks.

The existing stub assertion at `process.out.orf_txt[0][1][0]` is corrected to the new single-file shape (`process.out.orf_txt[0][1]`).

Source: nf-core/riboseq#174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ribocode/prepare): reframe pyfasta pre-build comment

The lazy pyfasta sidecar write isn't Fusion-specific - it's a Nextflow symlink-staging concern that affects any backend (writes leak back to the producer task's work dir via the staged-input symlink). Rewording the inline comment to match. No code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(ribocode/prepare): use RiboCode's GenomeSeq for pyfasta pre-build

Replace the inline 8-line python heredoc (which replicated RiboCode's `get_chrom` key_fn verbatim) with a single `python -c` line that imports and instantiates `RiboCode.prepare_transcripts.GenomeSeq` directly. The class constructor itself runs `Fasta(filename, key_fn=get_chrom)` with the same key function, so we drop the replication while producing byte-identical .gdx/.flat sidecars (md5-verified on the realistic FASTA format prepare_transcripts emits). No snapshot change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…s + 0.2.8 (nf-core#11686)

* feat(ribotish/predict): add optional secondary reference GTF for -a

Carried in nf-core/riboseq#174 and split out of the bundled PR nf-core#11684.

**Breaking signature change.** The third input tuple gains an optional fourth element, `reference_gtf`, plumbed through to `ribotish predict` as `-a <gtf>` when populated:

tuple val(meta3), path(fasta), path(gtf), path(reference_gtf, stageAs: 'secondary.gtf')

Callers must supply a fourth element on every emit. Pass `[]` for the no-op case (no secondary annotation). The existing test cases in this PR are migrated that way; positive-coverage tests for the populated path will land in a follow-up.

Why: Ribo-TISH's `-a` argument is the documented hook for layering a secondary annotation (e.g. MANE/RefSeq) on top of the primary GTF, and we want to expose it from the module without a second optional input tuple.

Source: nf-core/riboseq#174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(ribotish/predict): optional reference_gtf in its own tuple + topics versions + bump 0.2.8

Three coupled cleanups in response to the lint feedback on PR nf-core#11686:

1. Move the new `reference_gtf` input out of the existing fasta/gtf tuple and into its own optional input tuple (meta7) - the convention this module already uses for `bam_ti`, `candidate_orfs`, `para_ribo`, and `para_ti`. The existing `(meta3, fasta, gtf)` signature is preserved, so callers no longer need to grow that tuple; they wire a separate `Channel.of([[], []])` (or a populated channel) into the new slot.

2. Migrate version reporting from the legacy `versions.yml` heredoc to the new topic-based emission (`tuple val("${task.process}"), val('ribotish'), eval('...'), topic: versions, emit: versions_ribotish`). The `versions.yml` heredoc is removed from both `script:` and `stub:`. `meta.yml` regenerated by `nf-core modules lint --fix` to add the `topics:` block and reshape the `versions_ribotish` output entry.

3. Bump ribotish from 0.2.7 to 0.2.8 (bioconda; build hash unchanged). Test snapshot regenerated under `--update`: versions snapshot key renamed from `versions_*` to `versions_ribotish_*`, version string updated to `0.2.8`. Prediction-table assertions unchanged - 0.2.8 is a patch release.

Source: nf-core/riboseq#174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ribotish/predict): consolidate to one unnamed snapshot per test

Per SPPearce's review comment on nf-core#11686: each test should have a single anonymous snapshot() call rather than multiple named ones.

Non-stub tests roll `transprofile` + the topic-versions findAll into one snapshot; the existing `predictions` / `all` contains() row checks are kept as separate assertions (they pin specific known-good output rows and aren't redundant with the snapshot). Stub tests roll `predictions` + `all` + `transprofile` + versions into one snapshot.

Versions are referenced via the canonical `process.out.findAll { key, val -> key.startsWith('versions') }` pattern (653 modules in nf-core/modules use it vs 53 with explicit `process.out.versions_<tool>`). Snapshot keys are now the test names directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot added the size/s label May 18, 2026

This was referenced May 18, 2026

feat(ribocode): pre-build pyfasta indexes + prefix-scoped outputs #11685

Merged

feat(ribotish/predict): optional reference_gtf input + topics versions + 0.2.8 #11686

Merged

pinin4fjords closed this May 18, 2026

pinin4fjords deleted the ribocode-ribotish-bundled-fixes branch May 18, 2026 15:48

pinin4fjords wants to merge 1 commit into master from ribocode-ribotish-bundled-fixes
