Self-Hosting Setup Guide

Run a Whisper server on your own hardware. Your audio stays on your network. Three commands get you started, and the setup works on any machine that runs Docker.

Quick Start

bash
git clone https://github.com/omachala/diction.git
cd diction
docker compose up -d whisper-small

That's it. Whisper is now running at http://<your-server-ip>:9002.

Open the Diction app, switch to Self-Hosted, paste the URL, and start dictating.

Choosing a Model

The Docker Compose setup includes several models at different sizes. Pick the one that fits your hardware:

Model                  Port   RAM       Speed     Best for
whisper-tiny           9001   ~350 MB   ~1-2s     Low-power hardware, quick tests
whisper-small          9002   ~800 MB   ~3-4s     Recommended starting point
whisper-medium         9003   ~1.8 GB   ~8-12s    Better accuracy, needs more RAM
whisper-large          9004   ~3.5 GB   ~20-30s   Best accuracy, needs serious hardware
whisper-distil-large   9005   ~2 GB     ~4-6s     Near-large accuracy, much faster

Start any model with:

bash
docker compose up -d whisper-small    # or whisper-tiny, whisper-medium, etc.

You can run multiple models at the same time on different ports.
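
When several models run side by side, it can help to keep the service-to-port mapping from the table in one place. The sketch below encodes that mapping in a small shell helper. The function names `port_for` and `endpoints` are made up for illustration and are not part of the Diction repository.

```shell
# port_for <service>: print the port listed for a Compose service in the table above.
port_for() {
  case "$1" in
    whisper-tiny)         echo 9001 ;;
    whisper-small)        echo 9002 ;;
    whisper-medium)       echo 9003 ;;
    whisper-large)        echo 9004 ;;
    whisper-distil-large) echo 9005 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# endpoints <host> <service>...: print one "service URL" line per model.
endpoints() {
  local host="$1"
  shift
  local model port
  for model in "$@"; do
    port="$(port_for "$model")" || return 1
    echo "$model http://$host:$port"
  done
}
```

For example, after `docker compose up -d whisper-tiny whisper-small`, running `endpoints 192.168.1.100 whisper-tiny whisper-small` prints one URL per model, ready to paste into the app.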

Connecting the App

  1. Open the Diction app
  2. Switch to the Self-Hosted tab
  3. Paste your server URL into the Endpoint URL field:
http://192.168.1.100:9002

Replace 192.168.1.100 with your server's actual IP address. A green dot in the app confirms the endpoint is reachable.
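
If the dot does not turn green, a quick check from another computer on the same network can show whether the server answers at all. The sketch below uses curl and only tests connectivity. What the app checks internally is not documented here, and the function names `is_reachable` and `report` are hypothetical.

```shell
# is_reachable <url>
# Succeeds if the server sends any HTTP response within 5 seconds.
# Without --fail, curl exits 0 even on a 404, so this tests connectivity only.
is_reachable() {
  curl --silent --output /dev/null --max-time 5 "$1"
}

# report <url>: print a one-line verdict for the given endpoint.
report() {
  if is_reachable "$1"; then
    echo "reachable: $1"
  else
    echo "unreachable: $1"
  fi
}
```

Call it as `report http://192.168.1.100:9002`, substituting your server's address. An "unreachable" result usually points to a wrong IP, a stopped container, or a firewall between the two machines.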

No Public IP?

You do not need to open ports on your router. Several free options let you connect your phone to a home server from anywhere:

  • Cloudflare Tunnel -- free, outbound-only connection. No port forwarding needed.
  • Tailscale -- free WireGuard mesh VPN. Install on server and phone, connect from anywhere.
  • ngrok -- instant public URL, useful for quick testing.
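
As a rough sketch, each of the three options above comes down to one command on the server once the tool is installed and signed in. The `expose` wrapper is hypothetical and exists only to show the commands side by side. The default port 9002 assumes the whisper-small service.

```shell
# expose <tool> [port]
# Hypothetical wrapper; run it on the machine that hosts Whisper.
expose() {
  local tool="$1"
  local port="${2:-9002}"
  case "$tool" in
    tailscale)
      # Prints this machine's tailnet IPv4 address; use http://<that-ip>:<port> in the app.
      tailscale ip -4
      ;;
    cloudflared)
      # Quick tunnel: prints a temporary public URL that forwards to the local port.
      cloudflared tunnel --url "http://localhost:$port"
      ;;
    ngrok)
      # Opens a public forwarding URL to the local port.
      ngrok http "$port"
      ;;
    *)
      echo "usage: expose tailscale|cloudflared|ngrok [port]" >&2
      return 2
      ;;
  esac
}
```

With Tailscale the server stays private to your own devices. The cloudflared and ngrok URLs are public, so anyone who has the URL can reach the endpoint; pairing them with an API key is worth considering.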

Any Whisper Endpoint Works

Diction is not locked to our Docker setup. It works with any OpenAI-compatible speech-to-text endpoint:

  • faster-whisper-server (what the Docker Compose setup uses)
  • whisper.cpp with the HTTP server
  • OpenAI's own Whisper API
  • Any future model that speaks the same protocol

If it accepts POST /v1/audio/transcriptions with a file upload and returns a JSON transcript, Diction can use it.
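
That contract can be exercised from the command line with curl, which is a handy way to test an endpoint before pointing the app at it. This is a minimal sketch: the `transcribe` function name and the `sample.wav` file are placeholders, and the `model` form field follows OpenAI's API (`whisper-1`). A self-hosted server may ignore that field or expect a different model name.

```shell
# transcribe <base-url> <audio-file> [model]
# Uploads the audio file as multipart form data and prints the server's response.
transcribe() {
  local base_url="${1%/}"        # drop a trailing slash, if any
  local audio="$2"
  local model="${3:-whisper-1}"  # assumption: OpenAI's model name; adjust for your server
  # -F sends a multipart/form-data POST; --fail makes curl exit non-zero on HTTP errors.
  curl --silent --show-error --fail \
    -F "file=@$audio" \
    -F "model=$model" \
    "$base_url/v1/audio/transcriptions"
}
```

For example, `transcribe http://192.168.1.100:9002 sample.wav` should print a JSON document. OpenAI's API returns an object with a `text` field by default, and compatible servers are expected to respond similarly.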

Optional: API Key

If your server is behind an API key (common with reverse proxies or hosted endpoints), enter it in the API Key field in the Self-Hosted settings. It is sent as a Bearer token in the Authorization header of every request.
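
To check that a key is accepted before entering it in the app, the same kind of request can be sent by hand. The sketch below assumes the server reads the key from the standard `Authorization: Bearer` header. The `transcribe_auth` function name is hypothetical, and the `model` value follows OpenAI's API and may need adjusting for your server.

```shell
# transcribe_auth <base-url> <audio-file> <api-key>
# Same upload as an unauthenticated request, plus the Bearer token header.
transcribe_auth() {
  local base_url="${1%/}"   # drop a trailing slash, if any
  local audio="$2"
  local api_key="$3"
  curl --silent --show-error --fail \
    -H "Authorization: Bearer $api_key" \
    -F "file=@$audio" \
    -F "model=whisper-1" \
    "$base_url/v1/audio/transcriptions"
}
```

A call might look like `transcribe_auth https://whisper.example.com sample.wav "$DICTION_API_KEY"`, where both the host and the environment variable name are placeholders. With `--fail`, a rejected key typically shows up as a non-zero exit status and an HTTP 401 or 403 error message.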

Requirements

  • Any machine that can run Docker (home server, NAS, cloud VM, Raspberry Pi for tiny models)
  • iPhone on the same network, or reachable via tunnel/VPN

Full Documentation

The complete Docker Compose configuration, model details, and advanced setup options are in the GitHub repository at https://github.com/omachala/diction.