Whisper Hinglish

SotA Open Weights Model with Code-switching Capabilities

Jun 18, 2026

Trelis has released Whisper Hinglish Preview, a state-of-the-art open-weights model for transcribing Hindi with interspersed English.

The model significantly outperforms existing open-weights models and approaches the performance of Sarvam and ElevenLabs for mixed-script (Roman + Devanagari) transcription. The model is available on Hugging Face as whisper-hinglish-preview and via API and UI on Trelis Router.

The key design feature is a <|mixedcode|> token that can be injected after the language token so that the model transcribes in mixed Roman (Latin) and Devanagari scripts.

Access the model via API or UI with Trelis Router, or find it on Hugging Face.

Cheers, Ronan

Trelis Links:

🎙️ Enterprise Voice AI Services (ASR, TTS, Agents)

💼 Join the Trelis Team

💸 Apply for a Trelis Grant

Timestamps:

0:00 Introduction and overview

0:04 Whisper Hinglish model for Hindi-English code-switched transcription

1:06 Code-switch mode feature: mixes Devanagari and Roman scripts

2:12 Pure Hindi transcription performance vs commercial models

2:35 Model trained from Vaani base model by ARTPARK

3:15 Mixed code token requirement for code-switching functionality

3:30 Credits to team members Akhila and Aman

Trelis Releases Whisper Hinglish Transcription Model

Trelis has released a preview of Whisper Hinglish, a transcription model for mixed Hindi and English speech, commonly known as Hinglish. The model is an open-weights release, so anyone can download and run it.

Performance Compared to Commercial Models

Trelis states that the model is state of the art among open-weights models for transcribing Hinglish. The company reports that it is competitive with commercial alternatives, including ElevenLabs’ Scribe V2 and Sarvam’s offering.

Access and Availability

The model can be found on Hugging Face at Trelis/Whisper-Hinglish-Preview. Users can test the model through two methods:

Via the web interface at router.Trelis.com, where it appears in the list of Trelis models
Through API access with free monthly requests available

Code Switch Functionality

The model includes a code switch mode that handles mixed-language transcription differently from standard transcription models. Standard transcription produces a single script: Roman for English or Devanagari for Hindi. The code switch mode transcribes Hindi words in Devanagari while rendering English loanwords in Roman script, matching the way Hinglish speakers mix the two languages.
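To make the mixed-script output concrete, here is a minimal sketch that labels each word of a transcript by script. It is not part of the Trelis release: the function names script_of and tag_scripts and the sample sentence are illustrative assumptions, and the check simply tests whether letters fall in the Unicode Devanagari block (U+0900 to U+097F) or are ASCII letters.

```python
def script_of(word: str) -> str:
    """Label a word as 'devanagari', 'roman', 'mixed', or 'other'."""
    scripts = set()
    for ch in word:
        if "\u0900" <= ch <= "\u097F":
            # Unicode Devanagari block (also covers the danda punctuation marks)
            scripts.add("devanagari")
        elif ch.isascii() and ch.isalpha():
            scripts.add("roman")
    if not scripts:
        return "other"  # digits, punctuation, or other scripts only
    return scripts.pop() if len(scripts) == 1 else "mixed"


def tag_scripts(transcript: str) -> list[tuple[str, str]]:
    """Split on whitespace and pair each word with its script label."""
    return [(word, script_of(word)) for word in transcript.split()]


# Hypothetical code-switched output: Hindi in Devanagari, English loanwords in Roman.
sample = "मुझे कल meeting attend करनी है"
tags = tag_scripts(sample)
```

For the sample sentence, "meeting" and "attend" are tagged roman and the remaining four words devanagari, which is the pattern the code switch mode is described as producing.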

Benchmark Results on Hinglish Audio

For audio containing Hindi speech with English loanwords, Trelis evaluated the model on several benchmarks:

Koshi benchmark: Word error rate approaches Sarvam and Scribe V2 performance levels

Code Switch FLEURS: The model performs ahead of Sarvam’s code mixing functionality and trails Scribe V2

HIACC adult and child datasets: The model scores ahead of both Sarvam and Scribe V2
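The comparisons above are stated in terms of word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and sample sentences are illustrative, and the text normalization Trelis applied before scoring is not described in this post, so this sketch does none.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the hypothesis writes "meeting" in Devanagari
# while the mixed-script reference keeps it in Roman script.
reference = "मुझे कल meeting attend करनी है"
hypothesis = "मुझे कल मीटिंग attend करनी है"
wer = word_error_rate(reference, hypothesis)  # one substitution over six words
```

With exact string matching, a word transcribed in the wrong script counts as a substitution even when it is the right word, which is one reason script choice matters when scoring against mixed-script references.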

Performance on Pure Hindi

While designed primarily for mixed code transcription, the model also handles pure Hindi audio. On pure Hindi transcription, the model achieves word error rates close to those of Sarvam and Scribe V2. On FLEURS Hindi, performance is comparable to both commercial models.

English Transcription Capability

The model was trained from an existing Whisper model, specifically the Vaani model from ARTPARK. The Vaani model provides reasonable Hindi performance but loses some English capability. The Trelis model maintains strength in both Hindi and English transcription, though it scores somewhat lower than Sarvam and Scribe V2 on English-only benchmarks.

Technical Implementation

To use the model, users can download it and run it locally or deploy it to a cloud environment. The code switch functionality depends on one implementation detail: users must pass the <|mixedcode|> token after the language token. Without this token, the model transcribes Hindi entirely in Devanagari script or English entirely in Roman script. With the token, the model outputs each word in its native script.
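The token placement described above can be sketched as follows. This is a minimal illustration, not Trelis's published code: the function name build_decoder_prompt is hypothetical, the surrounding tokens follow the standard Whisper prompt layout, and placing <|mixedcode|> immediately after the language token (before the task token) is an assumption based on the description above.

```python
def build_decoder_prompt(
    language: str = "hi",
    mixed_code: bool = True,
    timestamps: bool = False,
) -> list[str]:
    """Return the special-token prefix for a Whisper-style decoder prompt."""
    prompt = ["<|startoftranscript|>", f"<|{language}|>"]
    if mixed_code:
        # Assumed position: directly after the language token.
        prompt.append("<|mixedcode|>")
    prompt.append("<|transcribe|>")
    if not timestamps:
        prompt.append("<|notimestamps|>")
    return prompt


mixed_prompt = build_decoder_prompt()                     # code switch mode
standard_prompt = build_decoder_prompt(mixed_code=False)  # single-script output
```

With mixed_code=False the prompt reduces to the standard Whisper layout. How these strings are converted to token ids and passed to the decoder depends on the inference library, so the model card on Hugging Face is the place to confirm the supported usage.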

Development Credits

Akhila led the model development work. Aman contributed to the early stages of data preparation for the project.

Model Design Focus

The primary design goal centers on mixed code transcription rather than pure Hindi or pure English performance. This focus reflects the actual usage patterns of Hinglish speakers who naturally incorporate English loanwords into Hindi speech.

Trelis Research
