Back to Blog

Building a Multilingual Meeting Notes System for Indian Business: OdiaMeet

How I built a system that transcribes meetings in 10 Indian languages including code-mixed conversations, and generates clean English minutes of meeting at near-zero cost.

5 min read

Building a Multilingual Meeting Notes System for Indian Business: OdiaMeet

Indian business meetings don't happen in one language. They happen in Hindi and English and the regional language of whoever's on the call, often switching mid-sentence. "Sales target toh achieve ho gaya, but next quarter ra plan ki heba?" That sentence mixes Hindi, English, and Odia. Any transcription system trained on monolingual data will mangle it.

This is the problem OdiaMeet solves. I built it to record Google Meet sessions, transcribe across 10 Indian languages including code-mixed conversations, and generate clean, structured English minutes of meeting.

The Real Problem With Existing Tools

Otter.ai, Fireflies, Google's built-in transcription — these tools work reasonably well for English-dominant meetings. In an Indian business context, they produce transcripts that look like this:

"Sales target toh [inaudible] but next quarter [inaudible] plan [inaudible]"

The "inaudible" markers aren't audio quality issues. The model genuinely cannot process the Odia and Hindi sections. You end up with a transcript that captures the English words and loses the actual discussion.

For meetings where decisions are made — and decisions in Indian business are frequently made in code-mixed conversation — this is a meaningful failure.

Architecture Overview

Google Meet → Chrome Extension → Audio Stream
                                       │
                              ┌────────▼────────┐
                              │   Sarvam AI     │
                              │  (Indic ASR)    │
                              │  10 languages   │
                              └────────┬────────┘
                                       │
                              Raw transcript (multilingual)
                                       │
                              ┌────────▼────────┐
                              │   DeepSeek      │
                              │  (MoM Format)   │
                              │  Structure +    │
                              │  Summarize      │
                              └────────┬────────┘
                                       │
                              Structured English MoM
                              (Summary, Decisions,
                               Action Items, Next Steps)

Three components: a Chrome extension for recording, Sarvam AI for transcription, DeepSeek for formatting.

Why Sarvam AI for Indic Language Transcription

Sarvam AI is a Bengaluru-based AI company that has invested specifically in Indic language models. Their ASR (Automatic Speech Recognition) system is trained on Indian language data — including code-mixed speech — rather than adapted from monolingual English models.

The languages supported: Odia, Hindi, Kannada, Tamil, Telugu, Bengali, Gujarati, Malayalam, Marathi, and Punjabi.

The critical capability is code-mixed transcription. The model understands that a single sentence can contain words from three languages and transcribes them as a coherent unit rather than treating language switches as audio anomalies.

Sarvam provides free trial credits sufficient for a meaningful pilot. For production use, the pricing is consumption-based and reasonable for business use cases.

Why DeepSeek for MoM Formatting

Once you have a raw transcript — potentially in multiple languages, with filler words, cross-talk, and repetition — you need to turn it into a useful document. This is a formatting and summarization task, not a transcription task.

DeepSeek is an excellent choice here for two reasons: quality and cost.

On quality: formatting a meeting transcript into structured output (summary, decisions, action items with assignees, next steps) is a well-defined task that DeepSeek handles reliably. You're not asking it to reason about novel problems — you're asking it to extract and organize information that's already in the transcript.

On cost: DeepSeek's API pricing is a fraction of GPT-4 or Claude for this type of task. A typical 60-minute meeting transcript produces roughly 5,000-8,000 tokens of text. At DeepSeek's rates, you're spending pennies per meeting. For a business running 100 meetings a month, the monthly cost for MoM formatting is under $2.

The MoM prompt is structured to extract:

Given this meeting transcript, generate:

1. SUMMARY (2-3 sentences)
2. KEY DISCUSSION POINTS (bullet list)
3. DECISIONS MADE (numbered list with context)
4. ACTION ITEMS (assignee + task + deadline if mentioned)
5. NEXT STEPS / FOLLOW-UP MEETINGS

The Chrome Extension

The extension integrates with Google Meet's audio stream via the Web Audio API. It buffers audio in segments, sends them to Sarvam's transcription API, and accumulates the raw transcript. At meeting end (or on demand), it packages the transcript and sends it to DeepSeek for MoM generation.

The extension interface is minimal: a small recording indicator, a "Generate MoM" button, and a copy-to-clipboard output. No backend required — the extension calls Sarvam and DeepSeek APIs directly using user-provided API keys stored in Chrome's secure storage.

The Code-Mixed Language Challenge

Code-mixing isn't just vocabulary borrowing. It involves grammatical patterns from multiple languages applied to a single utterance. "Sales target toh achieve ho gaya, but next quarter ra plan ki heba?" uses:

  • Hindi grammatical particles (toh, ho gaya)
  • English vocabulary (sales target, achieve, next quarter, plan)
  • Odia grammatical structure (ra, ki heba)

A model that processes language word by word, assigning each word to a single language, fails at this. Sarvam's approach treats the utterance holistically — the model learned from actual code-mixed Indian speech data, so it has seen these patterns.

The resulting transcript isn't perfect, but it's dramatically more useful than what general-purpose ASR produces.

Cost Analysis

ComponentCost
Sarvam AI transcriptionFree trial credits; ~$0.01-0.02/minute at scale
DeepSeek MoM formatting~$0.002-0.005 per meeting
Chrome ExtensionFree (self-hosted)
Total per 100 meetings/monthUnder $5

Compare this to Otter.ai Business at $20/user/month, or Fireflies Business at $19/user/month — and neither of those handles Indic language transcription accurately.

Phase 2: The AI Meeting Assistant

The next iteration stores all transcripts in a searchable index. An AI assistant layer lets you query across meetings: "What did we decide about the Bangalore vendor in Q1?" or "What action items does Priya have outstanding?"

This turns your meeting archive from a document graveyard into institutional memory.

If you're building communication infrastructure for a team that works in Indian languages — or any multilingual context — this architecture is directly applicable. Let's talk.

Discussion

Loading…