
Local vs. Cloud Dictation on Mac — What Actually Happens to Your Voice Data

March 2026

Nearly every dictation app falls into one of two categories: it either sends your audio to a server, or it processes everything on your computer. The marketing rarely makes this clear, so here's a plain breakdown of what each approach actually does, what you give up, and what you gain.


How Cloud Dictation Works

When you use a cloud dictation service — Google's Speech API, Amazon Transcribe, Deepgram, or Apple's own server-side dictation — your Mac records audio, compresses it, and uploads it to a remote data center. A large model running on specialized hardware transcribes the audio and sends text back.

The round trip takes time. Even on a fast connection, you're looking at roughly 200–500 ms of network latency on top of the server's processing time, because the audio has to cross the internet before it can be transcribed and the text has to travel back.
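To make the arithmetic concrete, here is a minimal sketch of the latency budget. The 200–500 ms network range comes from the paragraph above. The processing times in the example are hypothetical placeholders rather than measurements, and the function names are made up for illustration.

```python
# Network overhead range quoted in the article, in milliseconds.
NETWORK_MS_RANGE = (200, 500)


def cloud_latency_range(server_processing_ms, network_ms_range=NETWORK_MS_RANGE):
    """Best-case and worst-case end-to-end delay for a cloud round trip."""
    low, high = network_ms_range
    return (low + server_processing_ms, high + server_processing_ms)


def local_is_faster(local_processing_ms, server_processing_ms,
                    network_ms_range=NETWORK_MS_RANGE):
    """True when on-device transcription beats even the best-case cloud round trip."""
    best_cloud, _ = cloud_latency_range(server_processing_ms, network_ms_range)
    return local_processing_ms < best_cloud


if __name__ == "__main__":
    # Hypothetical figures: 100 ms on a server, 250 ms on a laptop CPU.
    print(cloud_latency_range(100))
    print(local_is_faster(250, 100))
```

The point of the comparison is that a faster server only helps if its advantage is larger than the network overhead it has to pay on every utterance.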

What that means in practice:


How Local Dictation Works

Local dictation runs a speech model directly on your Mac. The audio never leaves your machine — it goes from your microphone to memory, gets transcribed, and the result appears at your cursor. No network calls, no accounts, no API keys.

SpokenKey uses NVIDIA's Parakeet TDT model running through sherpa-onnx, an inference runtime optimized for on-device speech recognition. The model downloads once (~540MB) and runs entirely on your CPU.
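For readers curious what an on-device pipeline like this can look like, here is a rough sketch using the sherpa-onnx Python bindings. It is not SpokenKey's actual code. The `transcribe` and `pcm16_to_float` helpers are hypothetical, the model file names are assumptions about how a downloaded Parakeet TDT package might be laid out, and the thread count is arbitrary.

```python
import struct
from pathlib import Path


def pcm16_to_float(pcm_bytes):
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(pcm_bytes) // 2
    ints = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    return [sample / 32768.0 for sample in ints]


def transcribe(pcm_bytes, model_dir, sample_rate=16000):
    """Transcribe a mono PCM buffer in-process, with no network calls."""
    # Imported lazily so the helper above works without these packages installed.
    import numpy as np
    import sherpa_onnx

    model_dir = Path(model_dir)
    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=str(model_dir / "encoder.int8.onnx"),  # assumed file name
        decoder=str(model_dir / "decoder.int8.onnx"),  # assumed file name
        joiner=str(model_dir / "joiner.int8.onnx"),    # assumed file name
        tokens=str(model_dir / "tokens.txt"),          # assumed file name
        num_threads=2,                                 # arbitrary choice
        model_type="nemo_transducer",
        provider="cpu",
    )
    stream = recognizer.create_stream()
    samples = np.asarray(pcm16_to_float(pcm_bytes), dtype=np.float32)
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text
```

Everything here happens inside one process: the audio buffer is converted in memory, handed to the recognizer, and the text comes back as a plain string. In a real app you would presumably build the recognizer once at startup rather than on every call, since loading the model is the expensive part.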

What that means in practice:


The Trade-offs Are Real

Local dictation isn't better in every dimension. Here's where cloud still wins:


What About Apple's Built-In Dictation?

Apple's dictation sits in an interesting middle ground. On recent Macs (Apple Silicon), some processing happens on-device. But Apple's documentation is vague about exactly what gets sent to its servers and when.

The bigger issue is functionality. Apple dictation has a 30-second timeout by default, limited punctuation handling, and no customization. You can't choose the model, can't add vocabulary corrections, and can't control how text gets inserted.

For quick messages, it's fine. For anything longer, it's the wrong tool.


Who Should Use Local Dictation

If any of these apply, local is the better fit:

SpokenKey was built for these use cases. Hold a key, speak, release — your words appear wherever your cursor is, and the audio never leaves your Mac.

SpokenKey is a $29 one-time purchase on Gumroad. The terminal workflow is free forever. Learn more about SpokenKey.