Nearly every dictation app falls into one of two categories: it either sends your audio to a server or processes everything on your computer. The marketing rarely makes this clear, so here's a plain breakdown of what each approach actually does, what you give up, and what you gain.
How Cloud Dictation Works
When you use a cloud dictation service, such as Google Cloud Speech-to-Text, Amazon Transcribe, Deepgram, or Apple's server-side dictation, your Mac records audio, compresses it, and uploads it to a remote data center. A large model running on specialized hardware transcribes the audio and sends the text back.
The round trip takes time. Even on a fast connection, expect roughly 200–500 ms of network latency on top of the processing time, because the audio has to leave your machine, cross the internet, get processed, and come back as text.
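To make the latency point concrete, here is a rough sketch of the delay budget for a single utterance. The function names and the processing figure are illustrative assumptions, not measurements of any particular service; only the 200 to 500 ms network range comes from the paragraph above.

```python
def cloud_delay_ms(network_ms: float, processing_ms: float) -> float:
    """Perceived delay for cloud dictation: network round trip plus server processing."""
    return network_ms + processing_ms


def local_delay_ms(processing_ms: float) -> float:
    """Perceived delay for local dictation: on-device processing only, no network leg."""
    return processing_ms


# Hypothetical figure: assume the same 300 ms of processing on both paths.
ASSUMED_PROCESSING_MS = 300.0

best_case = cloud_delay_ms(network_ms=200.0, processing_ms=ASSUMED_PROCESSING_MS)
worst_case = cloud_delay_ms(network_ms=500.0, processing_ms=ASSUMED_PROCESSING_MS)
local = local_delay_ms(ASSUMED_PROCESSING_MS)

print(f"cloud: {best_case:.0f} to {worst_case:.0f} ms, local: {local:.0f} ms")
```

In practice the two processing times will differ, since a data-center GPU and a laptop CPU are not equally fast. The sketch only shows that the network term disappears when everything runs locally.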
What that means in practice:
- Your voice data exists on someone else's server. Even if it's deleted after processing, it was transmitted and handled by infrastructure you don't control.
- You need an internet connection. No Wi-Fi, no dictation. Airplane mode, coffee shop with bad signal, VPN blocking — all break the workflow.
- You pay per minute. Cloud APIs charge by audio duration. Heavy users can spend $20–50/month or more.
- Accuracy depends on the provider's model. You get whatever model version they're running. It could improve overnight — or get worse.
How Local Dictation Works
Local dictation runs a speech model directly on your Mac. The audio never leaves your machine — it goes from your microphone to memory, gets transcribed, and the result appears at your cursor. No network calls, no accounts, no API keys.
SpokenKey uses NVIDIA's Parakeet TDT model running through sherpa-onnx, an inference runtime optimized for on-device speech recognition. The model downloads once (~540MB) and runs entirely on your CPU.
What that means in practice:
- Your audio never leaves your Mac. There's no server to trust, no privacy policy to read, no data retention to worry about. The audio exists in a temp file during transcription, then gets cleaned up.
- Works offline. Airplane, VPN, no Wi-Fi — doesn't matter. The model is already on your machine.
- No ongoing cost. SpokenKey is $29 one-time. No subscriptions, no per-minute charges, no usage caps. Dictate all day.
- You control the model. The exact model version you downloaded is the one running. Updates happen when you choose to update.
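The cost difference can be sketched as a simple break-even calculation. The per-minute rate and the daily usage below are hypothetical values chosen for illustration, not a quote from any provider; the $29 price is the one stated above.

```python
import math


def monthly_cloud_cost(minutes_per_day: float, rate_per_minute: float,
                       days_per_month: int = 30) -> float:
    """Monthly spend under per-minute billing."""
    return minutes_per_day * rate_per_minute * days_per_month


def breakeven_months(one_time_price: float, monthly_cost: float) -> int:
    """Whole months of cloud billing needed to reach the one-time price."""
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    return math.ceil(one_time_price / monthly_cost)


ONE_TIME_PRICE = 29.0           # SpokenKey's one-time price from the text
ASSUMED_RATE_PER_MINUTE = 0.02  # hypothetical cloud rate, not a real quote
ASSUMED_MINUTES_PER_DAY = 60    # hypothetical heavy user

monthly = monthly_cloud_cost(ASSUMED_MINUTES_PER_DAY, ASSUMED_RATE_PER_MINUTE)
months = breakeven_months(ONE_TIME_PRICE, monthly)
print(f"cloud: ${monthly:.2f}/month, break-even after {months} month(s)")
```

With these assumed inputs the cloud bill lands at about $36 a month, inside the $20–50 range mentioned earlier, so the one-time price is recovered within the first month. Lighter users should plug in their own numbers; at a few minutes a day the break-even point moves out considerably.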
The Trade-offs Are Real
Local dictation isn't better in every dimension. Here's where cloud still wins:
- Multilingual support. Cloud services such as Google's, and hosted versions of OpenAI's Whisper, cover around 100 languages or more. SpokenKey's Parakeet model is English-only. If you dictate in multiple languages, cloud is the better choice today.
- Model size vs. accuracy. The largest cloud models (billions of parameters running on GPU clusters) can edge out local models on difficult audio: heavy accents, noisy environments, overlapping speakers. For typical dictation in a quiet room, the gap is usually small.
- First-run setup. Cloud dictation works immediately — sign up, get an API key, start talking. Local dictation requires downloading a model first. SpokenKey handles this automatically on first launch, but it's still a ~540MB download.
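How much the first-run download matters depends on your connection. A minimal sketch of the arithmetic, assuming the roughly 540MB figure above and a few hypothetical connection speeds (real throughput is usually somewhat below the advertised rate):

```python
def download_seconds(size_mb: float, bandwidth_mbps: float) -> float:
    """Seconds to download size_mb megabytes over a link of bandwidth_mbps megabits per second."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    return size_mb * 8 / bandwidth_mbps  # 8 bits per byte


MODEL_SIZE_MB = 540  # approximate model size quoted in the text

for mbps in (10, 50, 100):  # hypothetical connection speeds
    seconds = download_seconds(MODEL_SIZE_MB, mbps)
    print(f"{mbps:>3} Mbps: about {seconds:.0f} s")
```

On a typical home connection this is a one-time wait of a minute or two; on a slow link it can stretch to several minutes.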
What About Apple's Built-In Dictation?
Apple's dictation sits in an interesting middle ground. On recent Macs (Apple Silicon), some processing happens on-device. But Apple's documentation is vague about exactly what gets sent to their servers and when.
The bigger issue is functionality. Apple dictation has a 30-second timeout by default, limited punctuation handling, and no customization. You can't choose the model, can't add vocabulary corrections, and can't control how text gets inserted.
For quick messages, it's fine. For anything longer, it's the wrong tool.
Who Should Use Local Dictation
If any of these apply, local is the better fit:
- You dictate sensitive content — legal notes, medical records, journal entries, client work.
- You work in environments without reliable internet — travel, remote locations, restricted networks.
- You dictate frequently enough that per-minute pricing adds up.
- You simply prefer that your voice data stays on your computer.
SpokenKey was built for these use cases. Hold a key, speak, release — your words appear wherever your cursor is, and the audio never leaves your Mac.
SpokenKey costs $29 one-time on Gumroad. The terminal workflow is free forever. Learn more about SpokenKey.