[MLE-5159] docs(audio-ws): correct response format from WAV to Raw PCM + fix sample voice #252
rishabh-bhargava wants to merge 2 commits into main from
Conversation
… PCM" The Together WS endpoint streams raw PCM s16le samples with no RIFF/WAVE header, base64-wrapped per audio_output.delta event. The previous "WAV (PCM s16le)" claim led developers to write the bytes to a .wav file and find that no player accepts them (afplay, QuickTime, VLC all reject the file because there is no WAV magic). Updates the audio format description and the two code samples (Python, Node.js) to save to .pcm rather than .wav, matching the actual on-the-wire format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d event The voice 'tara' belongs to Orpheus, not Kokoro. Kokoro's default voice 'af_heart' is the popular choice and exists in the catalog. Running the sample as written produced an immediate conversation.item.tts.failed (Voice 'tara' is not available for model 'hexgrad/Kokoro-82M'). The Python sample compounded that with an unconditional session_data['session']['id'] access on the first message — when the first message is tts.failed instead of session.created, that crashes with KeyError before any code can react. Added a guard so the sample fails gracefully with the actual error message. JS sample already gated on message.type === 'session.created' so no event-handling change is needed there. Verified end-to-end: with the fixes applied, the sample now writes 257012 bytes (≈ 5.35 s of raw PCM s16le @ 24 kHz mono) to output.pcm. ffmpeg wraps it cleanly and afplay plays it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
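The guard and the byte-count arithmetic above can be sketched as follows. The event shapes are assumptions based on the field names in this commit message, not copied from the docs sample:

```python
import json

def session_id_from_first_message(raw: str) -> str:
    # Guard instead of an unconditional session_data['session']['id']:
    # if the first event is conversation.item.tts.failed, surface the
    # server's error rather than crashing with KeyError: 'session'.
    data = json.loads(raw)
    if data.get("type") != "session.created":
        raise RuntimeError(f"expected session.created, got: {raw}")
    return data["session"]["id"]

def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000,
                         channels: int = 1, sample_width: int = 2) -> float:
    # Raw PCM s16le duration: bytes / (rate * channels * bytes-per-sample).
    return num_bytes / (sample_rate * channels * sample_width)
```

Plugging in the verified byte count, `pcm_duration_seconds(257012)` comes out to roughly 5.35 s at 24 kHz mono s16le, matching the figure above.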
✱ Stainless preview builds
This PR will update the go, openapi, python, terraform, and typescript SDKs. Edit this comment to update them. They will appear in their respective SDKs' changelogs.
✅ togetherai-openapi studio · code · diff
✅ togetherai-go studio · code · diff
✅ togetherai-python studio · code · diff
✅ togetherai-terraform studio · code · diff
✅ togetherai-typescript studio · code · diff
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Summary
The WS docs at https://docs.together.ai/reference/audio-speech-websocket are misleading on three independent points; this PR fixes all three.
- Format: the docs said "WAV (PCM s16le)", but the WS streams raw PCM s16le bytes with no RIFF/WAVE header. A developer who saves the bytes with a `.wav` extension gets a file that no standard player will open (`afplay` returns `Error: AudioFileOpen failed ('typ?')`). Updated to "Format: Raw PCM (s16le, mono)".
- Output filename: the code samples saved to `output.wav`. Updated to `output.pcm` (and the print/console messages match).
- Voice: the samples used `voice=tara`, which belongs to Orpheus, not Kokoro. Running the docs sample literally returns immediately with `Voice 'tara' is not available for model 'hexgrad/Kokoro-82M'. Available voices: af_heart, ...`. Updated to `voice=af_heart`. Also added a `session.created` guard in the Python sample so a future failure-on-first-event doesn't crash the script with `KeyError: 'session'` before the user can see what went wrong.

Linear: MLE-5159
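The afplay failure above comes down to the missing RIFF/WAVE magic. As a sketch (paths and defaults are assumptions, not from the docs sample), the check and the fix can both be done with Python's stdlib `wave` module:

```python
import wave

def looks_like_wav(data: bytes) -> bool:
    # A playable WAV starts with a RIFF chunk whose form type is WAVE;
    # the raw PCM stream from the WS endpoint has neither.
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"

def wrap_pcm_as_wav(pcm_path: str, wav_path: str,
                    sample_rate: int = 24000, channels: int = 1,
                    sampwidth: int = 2) -> None:
    # Stdlib alternative to wrapping with ffmpeg: prepend a RIFF/WAVE
    # header so players like afplay and QuickTime accept the file.
    with open(pcm_path, "rb") as f:
        frames = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)   # 2 bytes per sample = s16le
        w.setframerate(sample_rate)
        w.writeframes(frames)
```

After wrapping, the output file passes `looks_like_wav`, while the raw `.pcm` bytes do not.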
Test Plan
- Before the fix: running the published Python sample crashes with `KeyError: 'session'` because the first server event is `tts.failed`. No output file produced.
- After the fix: the sample writes `output.pcm` for the three example sentences (≈ 5.35 s of audio at 24 kHz s16le mono).
- `ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav` → plays cleanly via `afplay` (exit 0). Confirms the fix matches reality.

Deploy chain
After this PR merges, the existing `sync-openapi-spec-to-docs.yml` workflow auto-opens a sync PR against `togethercomputer/mintlify-docs`. Once that secondary PR is approved and merged, Mintlify rebuilds and the changes go live at docs.together.ai.

🤖 Generated with Claude Code