Four real options for adding voice to your web app, comparing cost, quality, setup time, and when each one is the right call. Built from a real dev session with Toby.
Option 1
Self-hosted Piper TTS
No API cost
Privacy-first
Local neural TTS that runs on a $5/mo VPS. No API calls, no per-character billing, and no data leaving your server. Best for privacy, offline use, or high-volume scenarios where per-character API costs would add up. Setup takes an afternoon.
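A rough idea of what the afternoon of setup produces: a small Node service on the VPS that pipes text into Piper and gets a WAV file back. This is a minimal sketch that assumes the original rhasspy/piper command-line binary is installed as piper and a voice model such as en_US-lessac-medium.onnx has been downloaded. Newer Piper releases ship a Python CLI that may use different flags, so check the version you install. The helper names buildPiperArgs and synthesizeWithPiper are hypothetical.

```typescript
import { spawn } from "node:child_process";

// Builds the argument list for the rhasspy/piper CLI, which reads the text
// to speak on stdin and writes a WAV file to the path given by --output_file.
function buildPiperArgs(modelPath: string, outputFile: string): string[] {
  return ["--model", modelPath, "--output_file", outputFile];
}

// Synthesizes `text` into `outputFile` by piping it into the piper binary.
// Resolves when piper exits with code 0; rejects if piper is missing or fails.
function synthesizeWithPiper(
  text: string,
  modelPath: string,
  outputFile: string,
  piperBinary = "piper", // assumed to be on PATH
): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(piperBinary, buildPiperArgs(modelPath, outputFile), {
      stdio: ["pipe", "ignore", "inherit"], // stdin carries the text; stderr shows piper's logs
    });
    child.on("error", reject); // e.g. ENOENT when the binary is not installed
    child.stdin?.on("error", reject); // e.g. EPIPE if piper exits before reading stdin
    child.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`piper exited with code ${code}`));
    });
    child.stdin?.end(text); // send the text and close stdin so piper starts synthesizing
  });
}
```

For example, an HTTP route on the VPS could call synthesizeWithPiper(script, "/opt/piper/en_US-lessac-medium.onnx", "/tmp/clip.wav") and send the resulting WAV to the browser; the paths are illustrative. Nothing leaves the server, and the only running cost is the VPS itself.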
Option 2
OpenAI TTS API
Toby's pick · Best quality
The Nova and Alloy voices are among the best-sounding TTS voices you can play in a browser right now. Pricing is $0.015 per 1,000 characters for the standard tts-1 model, so a 30-second walkthrough (roughly 75 words, or about 450 characters) costs under $0.01. Plug in your API key and it works immediately. It is already wired into SideGuy's two-tier engine as the Tier 1 path.
Voices
Nova, Alloy, Shimmer +
API key needed
Yes — yours
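To make the pricing concrete and show the shape of the call, here is a minimal sketch. It assumes OpenAI's POST /v1/audio/speech endpoint with the tts-1 model, its 4,096-character input limit, and the $0.015 per 1,000 characters rate quoted above. The helper names and the injected FetchLike type are hypothetical, and fetch is passed in so the logic can run without a network. Because the request carries your secret key, it belongs on a server or in a local script rather than in code shipped to visitors.

```typescript
// Rate quoted for OpenAI's standard tts-1 model: $0.015 per 1,000 characters.
const USD_PER_1K_CHARS = 0.015;
// Documented maximum input length per request for the speech endpoint.
const MAX_INPUT_CHARS = 4096;

// Approximate cost: String.length counts UTF-16 units, which is close to characters for English text.
function estimateTtsCostUSD(text: string): number {
  return (text.length / 1000) * USD_PER_1K_CHARS;
}

// Minimal shape of fetch that this sketch needs; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// JSON body for POST https://api.openai.com/v1/audio/speech.
function buildSpeechBody(text: string, voice = "nova") {
  return { model: "tts-1", voice, input: text, response_format: "mp3" };
}

// Requests MP3 bytes for `text`. Run server-side: the request carries your secret key.
async function fetchOpenAISpeech(
  text: string,
  apiKey: string,
  fetchImpl: FetchLike,
  voice = "nova",
): Promise<ArrayBuffer> {
  if (text.length > MAX_INPUT_CHARS) {
    throw new Error(`input exceeds ${MAX_INPUT_CHARS} characters; split it into chunks`);
  }
  const res = await fetchImpl("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildSpeechBody(text, voice)),
  });
  if (!res.ok) throw new Error(`OpenAI TTS request failed with HTTP ${res.status}`);
  return res.arrayBuffer();
}
```

For example, a 450-character narration works out to (450 / 1,000) × $0.015 ≈ $0.0068, which is where the "under $0.01" figure comes from.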
Option 3
pocket-tts proxy endpoint
No key for users
A serverless function (Vercel or Netlify) that proxies requests to any TTS model. PJ's API key stays server-side, so visitors never need one. You get OpenAI-quality audio with zero friction for end users, and the endpoint is a natural place to add per-request x402 micropayment gating.
Quality
Excellent (proxied)
API key needed
No — server-side
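The core of such a proxy can be written in a platform-neutral way and then wrapped in whatever handler signature Vercel or Netlify expects. This sketch assumes the proxied model is OpenAI's tts-1 via POST /v1/audio/speech. The names handleTtsProxy, ProxyResult, and FetchLike, the 1,000-character cap, the voice allowlist, and the checkPayment hook are hypothetical choices for illustration. checkPayment only marks where an x402 verification step could plug in (answering HTTP 402 Payment Required when it fails); it does not implement x402.

```typescript
// Minimal shape of fetch that this sketch needs; the platform's global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Platform-neutral response that a Vercel or Netlify handler would translate.
interface ProxyResult {
  status: number;
  headers: Record<string, string>;
  body: ArrayBuffer | string;
}

// Illustrative limits (assumptions) that bound the cost of each anonymous request.
const MAX_CHARS = 1000;
const ALLOWED_VOICES = new Set(["nova", "alloy", "shimmer"]);

function jsonError(status: number, message: string): ProxyResult {
  return { status, headers: { "Content-Type": "application/json" }, body: JSON.stringify({ error: message }) };
}

async function handleTtsProxy(
  payload: unknown, // parsed JSON body from the visitor, e.g. { text, voice }
  apiKey: string | undefined, // server-side secret; never sent to the browser
  fetchImpl: FetchLike,
  checkPayment?: () => Promise<boolean>, // hypothetical hook for x402-style gating
): Promise<ProxyResult> {
  if (!apiKey) return jsonError(500, "TTS is not configured on this server");
  const { text, voice } = (payload ?? {}) as { text?: unknown; voice?: unknown };
  if (typeof text !== "string" || text.trim() === "" || text.length > MAX_CHARS) {
    return jsonError(400, `text must be a non-empty string of at most ${MAX_CHARS} characters`);
  }
  if (checkPayment && !(await checkPayment())) return jsonError(402, "payment required");
  // Unknown or missing voices fall back to nova instead of being forwarded upstream.
  const chosenVoice = typeof voice === "string" && ALLOWED_VOICES.has(voice) ? voice : "nova";
  const upstream = await fetchImpl("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model: "tts-1", voice: chosenVoice, input: text, response_format: "mp3" }),
  });
  if (!upstream.ok) return jsonError(502, `upstream TTS failed with HTTP ${upstream.status}`);
  return { status: 200, headers: { "Content-Type": "audio/mpeg" }, body: await upstream.arrayBuffer() };
}
```

A thin platform handler would parse the request JSON, pass process.env.OPENAI_API_KEY as apiKey and the global fetch as fetchImpl, then map ProxyResult onto the platform's response object. Capping input length and restricting voices keeps a public, key-free endpoint from being used to run up the owner's bill.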
Option 4
Browser speechSynthesis (polished)
Zero cost · Zero setup
Uses the Web Speech API built into every modern browser. Free, instant, and no API key required. Voice quality varies by device and OS: macOS Siri voices sound great, while Android voices can sound robotic. SideGuy uses this as the Tier 2 fallback when no API endpoint is configured.
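Most of the "polish" in a speechSynthesis setup comes from choosing a good voice instead of taking whatever the browser defaults to. Below is one hypothetical way to do that, plus a tier switch in the spirit of the two-tier engine described above. The scoring heuristic (exact locale match first, then names containing words like "Enhanced" or "Natural"), the interfaces, and chooseTier are assumptions, not SideGuy's actual code. The synth object and utterance factory are injected so the logic can run without a browser.

```typescript
// Minimal shapes of the Web Speech API objects we touch; the real
// SpeechSynthesisVoice, SpeechSynthesisUtterance, and speechSynthesis satisfy them.
interface VoiceLike { name: string; lang: string; default: boolean; }
interface UtteranceLike { text: string; voice: VoiceLike | null; lang: string; rate: number; }
interface SynthLike {
  getVoices(): VoiceLike[];
  cancel(): void;
  speak(utterance: UtteranceLike): void;
}

// Name fragments that often mark higher-quality voices (a heuristic, not a standard).
const QUALITY_HINTS = ["enhanced", "premium", "natural", "neural"];

// Picks the best voice for `lang`, or undefined if no voice speaks that language.
function pickVoice(voices: readonly VoiceLike[], lang = "en-US"): VoiceLike | undefined {
  const wanted = lang.toLowerCase();
  let best: VoiceLike | undefined;
  let bestScore = 0;
  for (const voice of voices) {
    // Normalize "en_US" to "en-us" in case a platform reports underscores.
    const voiceLang = voice.lang.replace("_", "-").toLowerCase();
    let score: number;
    if (voiceLang === wanted) score = 4; // exact locale match
    else if (voiceLang.split("-")[0] === wanted.split("-")[0]) score = 2; // same language, other region
    else continue; // wrong language: never pick it
    if (QUALITY_HINTS.some((hint) => voice.name.toLowerCase().includes(hint))) score += 2;
    if (voice.default) score += 1;
    if (score > bestScore) {
      best = voice;
      bestScore = score;
    }
  }
  return best;
}

// Speaks `text` with the best available voice and returns the utterance.
function speakWithBrowser(
  text: string,
  synth: SynthLike,
  makeUtterance: (text: string) => UtteranceLike,
  lang = "en-US",
  rate = 1, // 1 is the normal speaking rate
): UtteranceLike {
  const utterance = makeUtterance(text);
  utterance.lang = lang;
  // If no voice matches (or the list is still empty), null means "use the browser default".
  utterance.voice = pickVoice(synth.getVoices(), lang) ?? null;
  utterance.rate = rate;
  synth.cancel(); // drop anything still queued so clips never overlap
  synth.speak(utterance);
  return utterance;
}

// Hypothetical tier switch: Tier 1 = configured API/proxy endpoint,
// Tier 2 = browser speechSynthesis, otherwise no voice at all.
type Tier = "api" | "browser" | "none";
function chooseTier(endpoint: string | undefined, hasSpeechSynthesis: boolean): Tier {
  if (endpoint !== undefined && endpoint.trim() !== "") return "api";
  return hasSpeechSynthesis ? "browser" : "none";
}
```

In a browser you would call speakWithBrowser(text, window.speechSynthesis, (t) => new SpeechSynthesisUtterance(t)) and chooseTier(config.ttsEndpoint, "speechSynthesis" in window), where config.ttsEndpoint is a hypothetical setting. Chrome may return an empty voice list until the voiceschanged event fires; in that case this sketch falls back to the browser's default voice, so a real page might re-run voice selection after that event.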