🔒 Service Notice: Cloud services temporarily down — reinforcing our on-prem AI. Contact: hello@whissle.ai🔒 Service Notice: Cloud services temporarily down — reinforcing our on-prem AI. Contact: hello@whissle.ai

Self-Hosted Voice AI

Whissle Gateway

The complete voice AI stack — ASR, LLM, TTS, diarization, and voice agents — running on a single GPU. Self-hosted or cloud — your choice. Deploy on-prem today, cloud API returning soon.

$ docker run -d --gpus all -p 9000:9000 \
whissleasr/whissle-gateway:standard

Or install with Lulu companion app:

$ curl -fsSL https://whissle.ai/install.sh | bash

Gateway API at localhost:9000 • Lulu at localhost:3000 • Ready in ~2 minutes

Everything on One GPU

🎙️

ASR

23 Languages

440ms TTFT • TensorRT • KenLM • ITN

🧠

LLM

3B on GPU

265 tok/s • OpenAI-compatible API

🗣️

TTS

Human-Quality

Orpheus EN + Hindi • 230ms TTFB

👥

Diarization

ECAPA-TDNN

Multi-speaker separation • Speaker ID

📊

Metadata

Per Utterance

Emotion • Intent • Age • Gender • Entities

🤖

Voice Agents

Full Pipeline

ASR → LLM → TTS on one GPU

Deploy Your Way

Run on your hardware for maximum privacy, or use our managed cloud when it returns. Same API, same models — your choice.

On-Prem

Your GPU, your network. Data never leaves your infrastructure.

Cloud API

Managed endpoints — no GPU needed. Coming back soon.

Hybrid

On-prem for sensitive data, cloud for scale. Best of both.

Runs on Any NVIDIA GPU

GPUVRAMRecommended Variant
T416 GBlite
RTX 309024 GBstandard
RTX 409024 GBstandard
A10040–80 GBfull
RTX 600048 GBenterprise
H10080 GBenterprise
H200141 GBenterprise

Ready to meet your personal AI?

Download the browser, try the web app, or build with our APIs — open source, self-hostable, and privacy-first.