Plavno developed Project Khutba, an innovative mobile and web platform enabling mosques to broadcast prayers in real time with instant translation into multiple languages. The system allows mosque administrators to schedule prayer sessions, automatically launch live streams, and provide both live and on-demand access to translated prayers, making religious content accessible to global audiences.
Plug-and-play Solutions
Sub‑second feel • RTF < 1 across languages
Modular ASR → NMT → TTS (or speech‑to‑speech)
Scales from 1 speaker to thousands of concurrent listeners
Multi‑region WebRTC/WebSocket streaming built for hostile networks
Designed for quick implementation with REST/gRPC/WebRTC SDKs

Organizations need real‑time multilingual access for sermons, lectures, town halls, and events, without spinning up a bespoke speech stack. Khutba’s goal: deliver human‑friendly latency, domain‑faithful translation, and industrial reliability in a form factor that product teams can integrate quickly.

Plavno identified four challenges that consistently break live multilingual experiences:
Latency that destroys the live, conversational feel
Translations that drop domain‑specific terminology
Streaming that degrades on hostile networks and at scale
Bespoke integration work that stalls product teams

Plavno’s solution is a modular, low‑latency pipeline you can drop in fast — engineered to preserve a live feel, keep domain terms correct, and scale across regions. Before the deep dive, here’s what it delivers:
Speaker → Cloud ingest: The speaker streams audio over WebRTC to an EC2 cluster.
Outer Scaler: Returns the optimal machine URL and pre‑warms the pipeline.
Inner Scaler: Fans listeners across the RTC worker pool (ports 5001…5000+n) and spins up STT.
Front: VAD/diarization to segment speech.
ASR: Conformer‑Transducer with CTC alignment & auto‑punctuation.
MT: Prefix‑to‑prefix Simultaneous Translation (low look‑ahead).
Aggregation: The Vocalizer channel aggregator renders each language via Azure Neural TTS and sends live transcripts over WebSockets.
Output: Auto‑generated audio + VTT/SRT subtitles are written to S3 when the session ends.
Pluggable modules: LEGO‑like pipeline components you can swap in/out.
Modes: Choose ASR → NMT → TTS or end‑to‑end speech‑to‑speech when prosody matters.
Integrations: Ship via REST, gRPC, or WebRTC SDKs.
Features
Online re‑beaming + reliability‑based segmentation
RAG lexicons with a reviewer feedback loop
Model pre‑warm and audio pre‑buffer to avoid first‑word lag (sketched after this list)
STUN/TURN, adaptive bitrate, jitter buffers, and health checks
Value
Advanced AI models and specialized techniques ensure accurate, contextually aware translations across languages and domains; a rough sketch follows this list.
NLLB / M2M models with mixture of experts (MoE) heads for specialized language pairs
Fine-tuned low-rank adaptation for language families and regional dialects
Precise handling of proper nouns, places, and formatting consistency
Retrieval-augmented generation for domain-specific terminology and context
Benchmarks
Enterprise-grade infrastructure designed to handle massive concurrent loads while maintaining consistent sub-second response times
Intelligent worker sharding based on language pairs and processing requirements (see the sketch after this list)
Distributed jitter buffers and health checks across geographic regions
Real-time QoS guardrails with automatic scaling and performance tracking
Maintains RTF < 1 performance even during traffic spikes and peak usage
RTF < 1 real‑time factor maintained under load
Uptime across regions
1,000+ concurrent listeners supported
Application
Sermons, community gatherings
Summits, expos, academic forums
Councils, courts, emergency broadcasts
Universities, MOOCs, virtual classrooms
Global town halls, training, investor briefings
Sports commentary, theatre, live shows
Delivery Crew
High-performing developers for growing companies
Competitive Advantage
Real-world performance metrics that demonstrate the system's capabilities in production environments
Audio Ingestion: < 50 ms
Speech Recognition: < 200 ms
Translation & TTS: < 300 ms
Total Delivery: < 550 ms
16x Cerebras Acceleration
1000+ Concurrent Users
50+ Language Pairs
1K req/sec Peak Throughput
NLLB Base Translation Model
MoE Expert Specialization
LoRA Dialect Adaptation
RAG Domain Terms
Live Audio Stream
VTT: WebVTT Captions
SRT: Subtitle Files
Auto-generated Post-event
Leading developers driving success for dynamic businesses
Sub‑second perceived latency that feels conversational
Maintains domain fidelity during fast speech and code‑switching
Plug‑in SDKs; deploy without backend surgery
Thousands of concurrent listeners per session, multi‑region
Clean audio + VTT/SRT delivered automatically
Project Estimator
The estimated time to launch the product
Clear vision of functionality you need
15% discount on your first sprint

Frequently Asked Questions
Find answers to your common concerns
How many participants can a room hold?
Up to 50 in regular rooms; in Conference Mode, multiple speakers with thousands of listeners.
Are sessions recorded or stored?
Session artifacts can be stored in AWS S3 when enabled; retention is configurable.
What latency should listeners expect?
Sub‑second perceived delay in typical networks, thanks to WebRTC and streaming STT/NMT/TTS.
How is the platform secured?
TLS, token‑based auth, RBAC, Cloudflare WAF/CDN, and isolated rooms; access is scoped by roles.
What do listeners receive?
Translated audio (TTS) and on‑screen captions; listeners can switch languages.
Can it host large multi‑speaker events?
Use Conference Mode to assign speaker roles and broadcast to thousands with live translation.
Which platforms are supported?
Web app (React) and mobile app (React Native, details TBD).
About Plavno

Senior engineers + proven AI components to accelerate time-to-value.

From MVPs to enterprise platforms at global scale.

From extension UX to GPU pipelines and global scale.
Contact Us
We can sign an NDA for complete confidentiality
Discuss your project details
Plavno experts will contact you within 24 hours
Receive a comprehensive project proposal with estimates, timelines, and team composition
Plavno has a team of experts ready to start your project. Ask me!

Vitaly Kovalev
Sales Manager