Dictation
Clear definitions for Mac dictation, voice typing, speech recognition, cleanup passes, and the workflow vocabulary you need to use Mallo well.
Start here
Dictation is a speech-to-text workflow where spoken words are converted into written text.
Local-first speech recognition keeps audio processing on your device by default instead of sending every utterance to a remote server.
Speech recognition is the system that turns spoken audio into text tokens a computer can work with.
Voice typing means speaking instead of pressing keys so spoken words become typed text inside an app.
All terms
26 published terms
Accessibility permission lets a macOS app interact with interface elements, which can be necessary for reliable cross-app text insertion.
Cursor insertion means generated text lands directly at the active caret position inside the app you are already using.
Dictation is a speech-to-text workflow where spoken words are converted into written text.
Dictation history is the record of recent voice input results that users can revisit, copy, or recover later.
A dictation HUD is the on-screen status layer that shows whether the app is listening, processing, or ready to insert text.
Dictionary replacement is a rule-based text cleanup step that swaps known terms into the forms you want after speech is recognized.
Focus-safe insertion means inserting dictated text only when the app is confident about the active target field.
A fullscreen overlay is UI that can remain visible and useful even when the user is working inside a fullscreen app.
A global hotkey is a shortcut that works across apps, so dictation can start without focusing a specific window first.
History retention is the rule that decides how long past dictation results remain stored and accessible.
Hold-to-Talk means dictation runs only while you keep a shortcut pressed, giving you tight start-and-stop control.
Input Monitoring is the macOS permission that can allow an app to observe keyboard input for features like global shortcuts.
Local-first speech recognition keeps audio processing on your device by default instead of sending every utterance to a remote server.
Microphone permission is the macOS privacy permission that allows an app to capture audio input from the user’s mic.
Model selection is the product decision of choosing which speech model should handle the current dictation job.
A modifier-only hotkey starts dictation with keys like Command or Control alone instead of a full key combination.
Multilingual dictation means a speech-to-text workflow can handle more than one language in real writing use.
Parakeet is NVIDIA’s ASR model family, often discussed as a high-performance speech recognition option in modern model lineups.
Qwen ASR refers to the Qwen family of automatic speech recognition models, used for multilingual and modern open-model dictation setups.
A speech model is the engine that predicts text from audio and largely determines speed, language fit, and accuracy tradeoffs.
Speech recognition is the system that turns spoken audio into text tokens a computer can work with.
Speech-to-text is the process of converting spoken audio into written text.
Toggle dictation starts with one shortcut press and keeps listening until the user stops it with another action.
Voice input is the broad idea of using speech as an input method instead of typing by hand.
Voice typing means speaking instead of pressing keys so spoken words become typed text inside an app.
whisper.cpp is an on-device inference runtime used to run Whisper-family speech models locally.
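The dictionary replacement entry above describes a rule-based cleanup pass that runs after speech is recognized. A minimal sketch in Python, with a hypothetical rule table (these are not Mallo's actual rules or APIs):

```python
import re

# Hypothetical rule table: recognized form -> preferred form.
RULES = {
    "mac os": "macOS",
    "hud": "HUD",
}

def apply_dictionary(text: str, rules: dict[str, str] = RULES) -> str:
    """Apply each rule as a whole-word, case-insensitive replacement."""
    for recognized, preferred in rules.items():
        pattern = r"\b" + re.escape(recognized) + r"\b"
        text = re.sub(pattern, preferred, text, flags=re.IGNORECASE)
    return text

print(apply_dictionary("open the hud on mac os"))  # open the HUD on macOS
```

Whole-word matching matters here: without the `\b` boundaries, a rule for "hud" would also rewrite the inside of words like "shudder".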
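The history retention entry above is a rule over timestamps: results older than the retention window are dropped. A minimal sketch, assuming entries are (text, created_at) tuples and an illustrative 30-day default (not Mallo's actual policy):

```python
from datetime import datetime, timedelta

def prune_history(entries, max_age_days=30, now=None):
    """Keep only dictation results newer than the retention window."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [(text, created_at) for text, created_at in entries
            if created_at >= cutoff]

history = [
    ("recent note", datetime(2024, 1, 20)),
    ("old note", datetime(2023, 11, 1)),
]
# With now = Jan 31, 2024 the cutoff is Jan 1, so only the recent entry survives.
kept = prune_history(history, max_age_days=30, now=datetime(2024, 1, 31))
```

Passing `now` explicitly keeps the rule deterministic and easy to test; a real store would also handle the user clearing history manually.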
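Hold-to-Talk and toggle dictation, defined above, differ only in how hotkey events map to the listening state. A minimal illustrative state machine, not Mallo's implementation:

```python
class DictationController:
    """Sketch of the two activation modes: 'hold' and 'toggle'."""

    def __init__(self, mode: str = "hold"):
        assert mode in ("hold", "toggle")
        self.mode = mode
        self.listening = False

    def key_down(self):
        if self.mode == "hold":
            self.listening = True                # listen only while held
        else:
            self.listening = not self.listening  # each press flips the state

    def key_up(self):
        if self.mode == "hold":
            self.listening = False               # releasing the key stops dictation
        # toggle mode ignores key release entirely
```

In hold mode the key release is the stop signal; in toggle mode only presses matter, which is why a stuck or missed key-up event is harmless there but ends dictation in hold mode.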