Self-Hosted Voice AI
Whissle Gateway
The complete voice AI stack — ASR, LLM, TTS, diarization, and voice agents — running on a single GPU. Self-hosted or cloud — your choice. Deploy on-prem today, cloud API returning soon.
$ docker run -d --gpus all -p 9000:9000 \
whissleasr/whissle-gateway:standardOr install with Lulu companion app:
$ curl -fsSL https://whissle.ai/install.sh | bashGateway API at localhost:9000 • Lulu at localhost:3000 • Ready in ~2 minutes
Everything on One GPU
ASR
23 Languages
440ms TTFT • TensorRT • KenLM • ITN
LLM
3B on GPU
265 tok/s • OpenAI-compatible API
TTS
Human-Quality
Orpheus EN + Hindi • 230ms TTFB
Diarization
ECAPA-TDNN
Multi-speaker separation • Speaker ID
Metadata
Per Utterance
Emotion • Intent • Age • Gender • Entities
Voice Agents
Full Pipeline
ASR → LLM → TTS on one GPU
Deploy Your Way
Run on your hardware for maximum privacy, or use our managed cloud when it returns. Same API, same models — your choice.
On-Prem
Your GPU, your network. Data never leaves your infrastructure.
Cloud API
Managed endpoints — no GPU needed. Coming back soon.
Hybrid
On-prem for sensitive data, cloud for scale. Best of both.
Runs on Any NVIDIA GPU
| GPU | VRAM | Recommended Variant |
|---|---|---|
| T4 | 16 GB | lite |
| RTX 3090 | 24 GB | standard |
| RTX 4090 | 24 GB | standard |
| A100 | 40–80 GB | full |
| RTX 6000 | 48 GB | enterprise |
| H100 | 80 GB | enterprise |
| H200 | 141 GB | enterprise |
