Whisper Hinglish
SotA Open Weights Model with Code-switching Capabilities
Trelis has released Whisper Hinglish Preview, a state-of-the-art open-weights model for transcribing Hindi with interspersed English.
The model significantly outperforms existing open-weights models and approaches the performance of Sarvam and ElevenLabs for mixed-script (Roman + Devanagari) transcription. The model is available on Hugging Face as whisper-hinglish-preview and via API and UI on Trelis Router.
The key design feature is a <|mixedcode|> token that can be injected after the language token to have the model transcribe in mixed Roman (Latin) and Devanagari scripts.
Access the model via API or UI with Trelis Router, or find it on Hugging Face.
Cheers, Ronan
Trelis Links:
🎙️ Enterprise Voice AI Services (ASR, TTS, Agents)
Timestamps:
0:00 Introduction and overview
0:04 Whisper Hinglish model for Hindi-English code-switched transcription
1:06 Code-switch mode feature: mixes Devanagari and Roman scripts
2:12 Pure Hindi transcription performance vs commercial models
2:35 Model trained from Vani base model by Art Park
3:15 Mixed code token requirement for code-switching functionality
3:30 Credits to team members Akhila and Aman
Trelis Releases Whisper Hinglish Transcription Model
Trelis has released a preview of Whisper Hinglish, a transcription model for mixed Hindi and English speech, commonly known as Hinglish. The model is an open-weights release, so anyone can download and run it.
Performance Compared to Commercial Models
Trelis states that the model is state of the art among open-weights models for transcribing Hinglish. The company reports that the model is competitive with commercial alternatives, including ElevenLabs’ Scribe V2 and Sarvam’s offering.
Access and Availability
The model can be found on Hugging Face at Trelis/whisper-hinglish-preview. Users can test the model in two ways:
Via the web interface at router.trelis.com, where it appears in the list of Trelis models
Through API access with free monthly requests available
Code Switch Functionality
The model includes a code switch mode that handles mixed language transcription differently from standard transcription models. Standard transcription produces a single script type: Roman script for English or Devanagari script for Hindi. The code switch mode transcribes Hindi words in Devanagari script while rendering English loanwords in Roman script, reflecting how Hinglish is actually spoken.
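To make the mixed-script output format concrete, here is a minimal sketch that tags each word of a transcript by script, using the Unicode Devanagari block (U+0900 to U+097F). The function name `script_of` and the sample sentence are hypothetical illustrations, not actual model output.

```python
def script_of(word):
    """Classify a word as 'devanagari', 'roman', 'mixed', or 'other'."""
    # Devanagari characters live in the Unicode block U+0900..U+097F.
    has_devanagari = any("\u0900" <= ch <= "\u097F" for ch in word)
    # Treat ASCII letters as Roman script (enough for plain English words).
    has_roman = any(ch.isascii() and ch.isalpha() for ch in word)
    if has_devanagari and has_roman:
        return "mixed"
    if has_devanagari:
        return "devanagari"
    if has_roman:
        return "roman"
    return "other"


# Hypothetical example of a code-switched transcript (not real model output).
sample = "मुझे कल meeting attend करनी है"

for word in sample.split():
    print(word, script_of(word))
```

In this sample the Hindi words are tagged as Devanagari and the two English loanwords as Roman, which is the pattern the code switch mode is described as producing.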
Benchmark Results on Hinglish Audio
For audio containing both Hindi and English loanwords, Trelis evaluated the model on several benchmarks:
Koshi benchmark: Word error rate approaches Sarvam and Scribe V2 performance levels
Code Switch FLEURS: The model performs ahead of Sarvam’s code mixing functionality and trails Scribe V2
HIACC adult and child datasets: The model scores ahead of both Sarvam and Scribe V2
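For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a plain implementation under the assumption of whitespace tokenization and exact string matching; the normalization Trelis applied in its benchmarks is not described here and may differ. The example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the hypothesis writes "meeting" in Devanagari.
reference = "मुझे कल meeting attend करनी है"
hypothesis = "मुझे कल मीटिंग attend करनी है"
print(round(word_error_rate(reference, hypothesis), 3))
```

With exact matching, a word that is correct in sound but written in the other script counts as a substitution, so script choice directly affects scores on mixed-script references.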
Performance on Pure Hindi
While designed primarily for mixed code transcription, the model also handles pure Hindi audio. On pure Hindi transcription, the model achieves word error rates close to those of Sarvam and Scribe V2. On FLEURS Hindi, performance is comparable to both commercial models.
English Transcription Capability
The model was trained from a base Whisper model, specifically the Vani model from Art Park. The Vani model provides reasonable Hindi performance but loses some English capability. The Trelis model maintains strength in both Hindi and English transcription, though it scores somewhat lower than Sarvam and Scribe V2 on English-only benchmarks.
Technical Implementation
To use the model, users can download it and run it locally or deploy it to a cloud environment. The code switch functionality depends on one implementation detail: users must pass the <|mixedcode|> token after the language token. Without this token, the model transcribes Hindi entirely in Devanagari script or English entirely in Roman script. With the token, the model outputs each word in its native script.
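As a rough illustration of where the token sits, the sketch below assembles the special-token prefix that a Whisper-style decoder receives before it starts transcribing. The `<|startoftranscript|>`, language, `<|transcribe|>`, and `<|notimestamps|>` tokens follow the standard Whisper prompt layout. Placing `<|mixedcode|>` immediately after the language token is an assumption based on the description above, and `build_decoder_prompt` is a hypothetical helper, not part of any library.

```python
MIXED_CODE_TOKEN = "<|mixedcode|>"  # token named in the model description


def build_decoder_prompt(language="hi", mixed_code=True, timestamps=False):
    """Return the list of special tokens that prefix the decoder input."""
    prompt = ["<|startoftranscript|>", f"<|{language}|>"]
    if mixed_code:
        # Assumption: the token goes directly after the language token.
        prompt.append(MIXED_CODE_TOKEN)
    prompt.append("<|transcribe|>")
    if not timestamps:
        prompt.append("<|notimestamps|>")
    return prompt


print("".join(build_decoder_prompt()))
print("".join(build_decoder_prompt(mixed_code=False)))
```

How these tokens are converted to IDs and passed to generation depends on the inference library in use; the model card on Hugging Face is the authoritative source for the exact invocation.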
Development Credits
Akhila led the model development work. Aman contributed to the early stages of data preparation for the project.
Model Design Focus
The primary design goal centers on mixed code transcription rather than pure Hindi or pure English performance. This focus reflects the actual usage patterns of Hinglish speakers who naturally incorporate English loanwords into Hindi speech.

