
feat!: major performance & accuracy improvements in speech-to-text module #1132

Open
IgorSwat wants to merge 10 commits into main from @is/speech-to-text-ultimate

IgorSwat (Contributor) commented on May 8, 2026

Description

This PR introduces several changes to the Whisper-based speech-to-text module:

  • CoreML integration: models re-exported for the CoreML backend, bringing a significant performance improvement on iOS devices.
  • New streaming algorithm: eliminates duplicates in the streaming output, which significantly improves the quality of the live streaming mode.
  • Changes in demo apps: removed the faulty 'voice mode' screen from the LLM demo app; refactored the speech-to-text screen in the 'speech' app, adding the new CoreML models to the selection bar and changing the default model for iOS devices.
  • Minor code improvements in the speech-to-text module.
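The description does not spell out how the new streaming algorithm works. As a rough illustration of the kind of duplication it targets, here is a minimal TypeScript sketch of one common technique, word-level overlap removal between consecutive transcription chunks. It assumes each chunk is decoded from an audio window that overlaps the previous one, so the start of a new chunk may repeat the end of the committed transcript. The helpers (`normalize`, `overlapLength`, `mergeChunk`) are hypothetical, are not part of the react-native-executorch API, and are not necessarily the approach taken in this PR.

```typescript
// Hypothetical sketch: NOT the react-native-executorch implementation.
// Merges a newly decoded chunk into the committed transcript, dropping the
// words that repeat the tail of the transcript because the audio windows overlapped.

/** Normalizes a word for comparison: lowercase, common punctuation stripped. */
function normalize(word: string): string {
  return word.toLowerCase().replace(/[.,!?;:"]/g, "");
}

/**
 * Length of the longest suffix of `committed` that equals a prefix of
 * `incoming` (after normalization), looking back at most `maxOverlap` words.
 */
function overlapLength(committed: string[], incoming: string[], maxOverlap: number): number {
  const limit = Math.min(committed.length, incoming.length, maxOverlap);
  // Try the longest candidate overlap first, then shrink.
  for (let k = limit; k > 0; k--) {
    let match = true;
    for (let i = 0; i < k; i++) {
      if (normalize(committed[committed.length - k + i]) !== normalize(incoming[i])) {
        match = false;
        break;
      }
    }
    if (match) return k;
  }
  return 0;
}

/** Appends `chunkText` to `transcript`, skipping words already present at its end. */
function mergeChunk(transcript: string, chunkText: string, maxOverlap = 8): string {
  const committed = transcript.split(/\s+/).filter((w) => w.length > 0);
  const incoming = chunkText.split(/\s+/).filter((w) => w.length > 0);
  const k = overlapLength(committed, incoming, maxOverlap);
  return [...committed, ...incoming.slice(k)].join(" ");
}
```

For example, merging "brown fox jumps over" into "the quick brown fox" yields "the quick brown fox jumps over". Matching on words alone can also drop a phrase the speaker genuinely repeats across the chunk boundary; capping the look-back at `maxOverlap` words bounds both that risk and the search cost.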

Introduces a breaking change?

  • Yes
  • No

Change: removes the predefined constants for quantized models.
Justification: the quantized models differ only slightly from the original ones, so keeping separate constants for them added unnecessary complexity.

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Run the demo app to test the live streaming mode.

Screenshots

Related issues

#1124

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

IgorSwat requested review from benITo47, chmjkb and msluszniak on May 8, 2026 08:26
IgorSwat added the labels model (Issues related to exporting, improving, fixing ML models) and improvement (PRs or issues focused on improvements in the current codebase) on May 8, 2026
Review comment thread on apps/speech/screens/SpeechToTextScreen.tsx (outdated)
Review comment thread on apps/speech/package.json (outdated)
Review comment thread on packages/react-native-executorch/src/constants/modelUrls.ts
IgorSwat changed the title from "feat: major performance & accuracy improvements in speech-to-text module" to "feat!: major performance & accuracy improvements in speech-to-text module" on May 8, 2026
msluszniak (Member) commented:

Also, if this PR adds a breaking change, please describe it directly below the "Introduces a breaking change?" section in the PR body.

IgorSwat force-pushed the @is/speech-to-text-ultimate branch from c5d3c14 to a91344c on May 19, 2026 11:17
msluszniak (Member) commented:

Side note: after the TTS PR is merged and this branch is rebased, please make sure the native tests still pass with all the changes.

