[MLE-5159] docs(audio-ws): correct response format from WAV to Raw PCM + fix sample voice #252
rishabh-bhargava wants to merge 2 commits into main from
Conversation
… PCM" The Together WS endpoint streams raw PCM s16le samples with no RIFF/WAVE header, base64-wrapped per audio_output.delta event. The previous "WAV (PCM s16le)" claim led developers to write the bytes to a .wav file and find that no player accepts them (afplay, QuickTime, VLC all reject the file because there is no WAV magic). Updates the audio format description and the two code samples (Python, Node.js) to save to .pcm rather than .wav, matching the actual on-the-wire format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d event The voice 'tara' belongs to Orpheus, not Kokoro. Kokoro's default voice 'af_heart' is the popular choice and exists in the catalog. Running the sample as written produced an immediate conversation.item.tts.failed (Voice 'tara' is not available for model 'hexgrad/Kokoro-82M'). The Python sample compounded that with an unconditional session_data['session']['id'] access on the first message — when the first message is tts.failed instead of session.created, that crashes with KeyError before any code can react. Added a guard so the sample fails gracefully with the actual error message. JS sample already gated on message.type === 'session.created' so no event-handling change is needed there. Verified end-to-end: with the fixes applied, the sample now writes 257012 bytes (≈ 5.35 s of raw PCM s16le @ 24 kHz mono) to output.pcm. ffmpeg wraps it cleanly and afplay plays it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
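The guard and the byte-count arithmetic above can be sketched as follows. The event shapes are assumptions based on the field names in this commit message, not copied from the docs sample:

```python
import json

def session_id_from_first_message(raw: str) -> str:
    # Guard instead of an unconditional session_data['session']['id']:
    # if the first event is conversation.item.tts.failed, surface the
    # server's error rather than crashing with KeyError: 'session'.
    data = json.loads(raw)
    if data.get("type") != "session.created":
        raise RuntimeError(f"expected session.created, got: {raw}")
    return data["session"]["id"]

def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000,
                         channels: int = 1, sample_width: int = 2) -> float:
    # Raw PCM s16le duration: bytes / (rate * channels * bytes-per-sample).
    return num_bytes / (sample_rate * channels * sample_width)
```

Plugging in the verified byte count, `pcm_duration_seconds(257012)` comes out to roughly 5.35 s at 24 kHz mono s16le, matching the figure above.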
✱ Stainless preview builds
This PR will update the go, openapi, python, terraform, and typescript SDKs. Edit this comment to update them. They will appear in their respective SDKs' changelogs.
✅ togetherai-openapi studio · code · diff
✅ togetherai-go studio · code · diff
✅ togetherai-python studio · code · diff
✅ togetherai-terraform studio · code · diff
✅ togetherai-typescript studio · code · diff
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Summary
The WS docs at https://docs.together.ai/reference/audio-speech-websocket are misleading on three independent points; this PR fixes all three.
- Format: the docs said "WAV (PCM s16le)", but the WS streams raw PCM s16le bytes with no RIFF/WAVE header. A developer who saves the bytes with a `.wav` extension gets a file that no standard player will open (`afplay` returns `Error: AudioFileOpen failed ('typ?')`). Updated to "Format: Raw PCM (s16le, mono)".
- Output filename: the code samples saved to `output.wav`. Updated to `output.pcm` (and the print/console messages match).
- Voice: the samples used `voice=tara`, which belongs to Orpheus, not Kokoro. Running the docs sample literally returns immediately with `Voice 'tara' is not available for model 'hexgrad/Kokoro-82M'. Available voices: af_heart, ...`. Updated to `voice=af_heart`. Also added a `session.created` guard in the Python sample so a future failure-on-first-event doesn't crash the script with `KeyError: 'session'` before the user can see what went wrong.

Linear: MLE-5159
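The afplay failure above comes down to the missing RIFF/WAVE magic. As a sketch (paths and defaults are assumptions, not from the docs sample), the check and the fix can both be done with Python's stdlib `wave` module:

```python
import wave

def looks_like_wav(data: bytes) -> bool:
    # A playable WAV starts with a RIFF chunk whose form type is WAVE;
    # the raw PCM stream from the WS endpoint has neither.
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"

def wrap_pcm_as_wav(pcm_path: str, wav_path: str,
                    sample_rate: int = 24000, channels: int = 1,
                    sampwidth: int = 2) -> None:
    # Stdlib alternative to wrapping with ffmpeg: prepend a RIFF/WAVE
    # header so players like afplay and QuickTime accept the file.
    with open(pcm_path, "rb") as f:
        frames = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)   # 2 bytes per sample = s16le
        w.setframerate(sample_rate)
        w.writeframes(frames)
```

After wrapping, the output file passes `looks_like_wav`, while the raw `.pcm` bytes do not.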
Test Plan
- Before the fix: running the published Python sample crashes with `KeyError: 'session'` because the first server event is `tts.failed`. No output file produced.
- After the fix: the sample writes `output.pcm` for the three example sentences (≈ 5.35 s of audio at 24 kHz s16le mono).
- `ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav` → plays cleanly via `afplay` (exit 0). Confirms the fix matches reality.

Deploy chain
After this PR merges, the existing `sync-openapi-spec-to-docs.yml` workflow auto-opens a sync PR against `togethercomputer/mintlify-docs`. Once that secondary PR is approved and merged, Mintlify rebuilds and the changes go live at docs.together.ai.

🤖 Generated with Claude Code