Dictation
Clear definitions for Mac dictation, voice typing, speech recognition, cleanup passes, and the workflow vocabulary you need to use Mallo well.
Start here
Dictation is a speech-to-text workflow where spoken words are converted into written text.
Local-first speech recognition keeps audio processing on your device by default instead of sending every utterance to a remote server.
Speech recognition is the system that turns spoken audio into text tokens a computer can work with.
Voice typing means speaking instead of pressing keys so spoken words become typed text inside an app.
All terms
26 published terms
Accessibility permission lets a macOS app interact with interface elements, which can be necessary for reliable cross-app text insertion.
Cursor insertion means generated text lands directly at the active caret position inside the app you are already using.
Dictation is a speech-to-text workflow where spoken words are converted into written text.
Dictation history is the record of recent voice input results that users can revisit, copy, or recover later.
A dictation HUD is the on-screen status layer that shows whether the app is listening, processing, or ready to insert text.
Dictionary replacement is a rule-based text cleanup step that swaps known terms into the forms you want after speech is recognized.
Focus-safe insertion means inserting dictated text only when the app is confident about the active target field.
A fullscreen overlay is UI that can remain visible and useful even when the user is working inside a fullscreen app.
A global hotkey is a shortcut that works across apps, so dictation can start without focusing a specific window first.
History retention is the rule that decides how long past dictation results remain stored and accessible.
Hold-to-Talk means dictation runs only while you keep a shortcut pressed, giving you tight start-and-stop control.
Input Monitoring is the macOS permission that can allow an app to observe keyboard input for features like global shortcuts.
Local-first speech recognition keeps audio processing on your device by default instead of sending every utterance to a remote server.
Microphone permission is the macOS privacy permission that allows an app to capture audio input from the user’s mic.
Model selection is the product decision of choosing which speech model should handle the current dictation job.
A modifier-only hotkey starts dictation with keys like Command or Control alone instead of a full key combination.
Multilingual dictation means a speech-to-text workflow can handle more than one language in real writing use.
Parakeet is NVIDIA’s ASR model family, often discussed as a high-performance speech recognition option in modern model lineups.
Qwen ASR refers to the Qwen family of automatic speech recognition models, used for multilingual and modern open-model dictation setups.
A speech model is the engine that predicts text from audio and largely determines speed, language fit, and accuracy tradeoffs.
Speech recognition is the system that turns spoken audio into text tokens a computer can work with.
Speech-to-text is the process of converting spoken audio into written text.
Toggle dictation starts with one shortcut press and keeps listening until the user stops it with another action.
Voice input is the broad idea of using speech as an input method instead of typing by hand.
Voice typing means speaking instead of pressing keys so spoken words become typed text inside an app.
whisper.cpp is an on-device inference runtime used to run Whisper-family speech models locally.
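The dictionary replacement entry above describes a rule-based cleanup pass that runs after speech is recognized. A minimal sketch in Python, with a hypothetical rule table (these are not Mallo's actual rules or APIs):

```python
import re

# Hypothetical rule table: recognized form -> preferred form.
RULES = {
    "mac os": "macOS",
    "hud": "HUD",
}

def apply_dictionary(text: str, rules: dict[str, str] = RULES) -> str:
    """Apply each rule as a whole-word, case-insensitive replacement."""
    for recognized, preferred in rules.items():
        pattern = r"\b" + re.escape(recognized) + r"\b"
        text = re.sub(pattern, preferred, text, flags=re.IGNORECASE)
    return text

print(apply_dictionary("open the hud on mac os"))  # open the HUD on macOS
```

Whole-word matching matters here: without the `\b` boundaries, a rule for "hud" would also rewrite the inside of words like "shudder".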
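The history retention entry above is a rule over timestamps: results older than the retention window are dropped. A minimal sketch, assuming entries are (text, created_at) tuples and an illustrative 30-day default (not Mallo's actual policy):

```python
from datetime import datetime, timedelta

def prune_history(entries, max_age_days=30, now=None):
    """Keep only dictation results newer than the retention window."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [(text, created_at) for text, created_at in entries
            if created_at >= cutoff]

history = [
    ("recent note", datetime(2024, 1, 20)),
    ("old note", datetime(2023, 11, 1)),
]
# With now = Jan 31, 2024 the cutoff is Jan 1, so only the recent entry survives.
kept = prune_history(history, max_age_days=30, now=datetime(2024, 1, 31))
```

Passing `now` explicitly keeps the rule deterministic and easy to test; a real store would also handle the user clearing history manually.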
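Hold-to-Talk and toggle dictation, defined above, differ only in how hotkey events map to the listening state. A minimal illustrative state machine, not Mallo's implementation:

```python
class DictationController:
    """Sketch of the two activation modes: 'hold' and 'toggle'."""

    def __init__(self, mode: str = "hold"):
        assert mode in ("hold", "toggle")
        self.mode = mode
        self.listening = False

    def key_down(self):
        if self.mode == "hold":
            self.listening = True                # listen only while held
        else:
            self.listening = not self.listening  # each press flips the state

    def key_up(self):
        if self.mode == "hold":
            self.listening = False               # releasing the key stops dictation
        # toggle mode ignores key release entirely
```

In hold mode the key release is the stop signal; in toggle mode only presses matter, which is why a stuck or missed key-up event is harmless there but ends dictation in hold mode.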