[MLE-5159] docs(audio-ws): correct response format from WAV to Raw PCM + fix sample voice #252

Open
rishabh-bhargava wants to merge 2 commits into main from fix/MLE-5159-ws-audio-format-docs

Conversation

@rishabh-bhargava
Contributor

Summary

The WS docs at https://docs.together.ai/reference/audio-speech-websocket are misleading on three independent points; this PR fixes all three.

  1. Audio format claim — the docs advertise Format: WAV (PCM s16le), but the WS streams raw PCM s16le bytes with no RIFF/WAVE header. A developer who saves the bytes with a .wav extension gets a file that no standard player will open (afplay returns Error: AudioFileOpen failed ('typ?')). Updated to Format: Raw PCM (s16le, mono).
  2. Sample save extension — both Python and JavaScript samples wrote to output.wav. Updated to output.pcm (and the print/console messages match).
  3. Sample voice — both samples used voice=tara, which belongs to Orpheus, not Kokoro. Running the docs sample literally returns immediately with Voice 'tara' is not available for model 'hexgrad/Kokoro-82M'. Available voices: af_heart, .... Updated to voice=af_heart. Also added a session.created guard in the Python sample so a future failure-on-first-event doesn't crash the script with KeyError: 'session' before the user can see what went wrong.
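The first point can be checked mechanically. As a minimal sketch (the helper name `has_wav_header` is hypothetical and not part of the docs samples), a RIFF/WAVE container always starts with the ASCII tag `RIFF`, a 4-byte size field, and then the tag `WAVE`; raw PCM s16le has no such prefix:

```python
def has_wav_header(data: bytes) -> bool:
    """Return True if data begins with a RIFF/WAVE container header.

    Hypothetical helper for illustration; not part of the docs samples.
    """
    # Bytes 0-3 are b"RIFF", bytes 4-7 are the chunk size, bytes 8-11 are b"WAVE".
    return len(data) >= 12 and data[0:4] == b"RIFF" and data[8:12] == b"WAVE"
```

Running a check like this on the bytes collected from the WebSocket should return `False` if the stream is raw PCM, which is consistent with players rejecting the file when it is saved as `.wav`.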

Linear: MLE-5159

Test Plan

  • Reproduced the original failure: copy-paste docs Python sample → crashes with KeyError: 'session' because the first server event is tts.failed. No output file produced.
  • After fixing voice + adding the session guard: sample runs end-to-end. Writes 257012 bytes to output.pcm for the three example sentences (≈ 5.35 s of audio at 24 kHz s16le mono).
  • Wrapped via ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav → plays cleanly via afplay (exit 0). Confirms the fix matches reality.
  • Empirically confirmed (Cartesia + attempted Kokoro) that the WS adapter never produces RIFF/WAVE-framed bytes — the format claim was wrong for every model on this endpoint, not just Minimax.
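For readers without ffmpeg, the wrapping step above can be approximated with Python's standard-library `wave` module. This is a sketch under the parameters stated in the test plan (24 kHz, mono, 16-bit little-endian); the function name `wrap_pcm_as_wav` is hypothetical:

```python
import wave


def wrap_pcm_as_wav(pcm_path: str, wav_path: str,
                    sample_rate: int = 24000,
                    channels: int = 1,
                    sample_width: int = 2) -> float:
    """Write raw PCM bytes into a WAV container and return the duration in seconds.

    Assumes s16le samples (sample_width=2), as described in the test plan.
    """
    with open(pcm_path, "rb") as f:
        pcm = f.read()

    # wave.open in "wb" mode writes the RIFF/WAVE header for us.
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)

    # Duration = total bytes / (bytes per second).
    return len(pcm) / (sample_rate * channels * sample_width)
```

With these defaults, the 257012 bytes reported above work out to 257012 / 48000 ≈ 5.35 s, matching the figure in the test plan.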

Deploy chain

After this PR merges, the existing sync-openapi-spec-to-docs.yml workflow auto-opens a sync PR against togethercomputer/mintlify-docs. Once that secondary PR is approved and merged, Mintlify rebuilds and the changes go live at docs.together.ai.

🤖 Generated with Claude Code

rishabh-bhargava and others added 2 commits April 27, 2026 00:00
… PCM"

The Together WS endpoint streams raw PCM s16le samples with no RIFF/WAVE
header, base64-wrapped per audio_output.delta event. The previous
"WAV (PCM s16le)" claim led developers to write the bytes to a .wav
file and find that no player accepts them (afplay, QuickTime, VLC all
reject the file because there is no WAV magic).

Updates the audio format description and the two code samples
(Python, Node.js) to save to .pcm rather than .wav, matching the
actual on-the-wire format.
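A rough sketch of what "base64-wrapped per audio_output.delta event" implies for a client follows. The exact event type string and the `delta` field name are assumptions made for illustration; the docs samples are the source of truth for the real payload shape:

```python
import base64
import json


def append_audio_delta(raw_message: str, pcm: bytearray) -> bool:
    """Decode one WebSocket message and append its audio bytes to pcm.

    Returns True if the message carried audio, False otherwise.
    Assumptions: audio events have a type ending in "audio_output.delta"
    and carry base64-encoded PCM in a "delta" field.
    """
    event = json.loads(raw_message)
    if not event.get("type", "").endswith("audio_output.delta"):
        return False
    # The decoded bytes are headerless s16le samples, so they can be
    # concatenated directly and written to a .pcm file.
    pcm.extend(base64.b64decode(event["delta"]))
    return True
```

Because each chunk is headerless, concatenating the decoded chunks in arrival order yields a valid raw PCM stream with no further framing.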

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d event

The voice 'tara' belongs to Orpheus, not Kokoro. Kokoro's default
voice 'af_heart' is the popular choice and exists in the catalog.
Running the sample as written produced an immediate
conversation.item.tts.failed (Voice 'tara' is not available for
model 'hexgrad/Kokoro-82M').

The Python sample compounded that with an unconditional
session_data['session']['id'] access on the first message — when
the first message is tts.failed instead of session.created, that
crashes with KeyError before any code can react. Added a guard so
the sample fails gracefully with the actual error message.

JS sample already gated on message.type === 'session.created' so
no event-handling change is needed there.
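The guard described above might look roughly like the following. This is a sketch, not the code in the PR: the function name `read_session_id` is hypothetical, and the only payload details taken from this description are the `session.created` type and the `['session']['id']` path:

```python
import json


def read_session_id(first_message: str) -> str:
    """Return the session id from the first server event, or fail with context.

    Hypothetical helper: raises RuntimeError carrying the full event when the
    first message is anything other than session.created (for example a
    conversation.item.tts.failed event), instead of a bare KeyError.
    """
    event = json.loads(first_message)
    if event.get("type") != "session.created":
        raise RuntimeError(
            f"expected session.created, got {event.get('type')!r}: {event}"
        )
    return event["session"]["id"]
```

Including the whole event in the exception means the server's own message (such as the unavailable-voice text) reaches the user rather than being hidden behind a `KeyError`.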

Verified end-to-end: with the fixes applied, the sample now writes
257012 bytes (≈ 5.35 s of raw PCM s16le @ 24 kHz mono) to output.pcm.
ffmpeg wraps it cleanly and afplay plays it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Apr 27, 2026

✱ Stainless preview builds

This PR will update the togetherai SDKs with the following commit messages.

go

chore(internal): regenerate SDK with no functional changes

openapi

docs(api): update audio speech websocket format and code samples

python

chore(internal): regenerate SDK with no functional changes

terraform

chore(internal): add together-go SDK dependency, update dependencies

typescript

chore(internal): regenerate SDK with no functional changes

Edit this comment to update them. They will appear in their respective SDK's changelogs.

togetherai-openapi studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅

togetherai-go studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅ → build ⏭️ (prev: build ✅) → lint ✅ → test ❗

go get github.com/stainless-sdks/togetherai-go@18da1221afec14da0abe016bb9bcc4239846ea46

togetherai-python studio · code · diff

Your SDK build had at least one "warning" diagnostic, but this did not represent a regression.
generate ⚠️ → build ⏭️ (prev: build ✅) → lint ⏭️ (prev: lint ✅) → test ⏭️

togetherai-terraform studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅ → lint ✅ → test ✅

togetherai-typescript studio · code · diff

Your SDK build had at least one "warning" diagnostic, but this did not represent a regression.
generate ⚠️ → build ⏭️ (prev: build ✅) → lint ⏭️ (prev: lint ✅) → test ❗


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-04-27 07:11:30 UTC
