How can transcription be under 50 milliseconds?

Hearsy uses the Parakeet TDT v2 model optimized for Apple Silicon's Neural Engine, running as a compiled CoreML graph with streaming audio processing.

Does it work on Intel Macs?

Hearsy requires Apple Silicon (M1 or later) for the Neural Engine acceleration that makes sub-50ms transcription possible.

How large is the model download?

The Parakeet TDT v2 model is approximately 450MB, downloaded once during setup.

Does transcription speed vary with recording length?

For typical dictation, processing time is nearly constant. Parakeet's streaming inference processes audio in chunks while you speak, so little work remains once you stop.

Does Hearsy affect battery life?

The Neural Engine is designed for efficiency. Active dictation has minimal battery impact, and idle consumption is effectively zero.

Is Whisper slower than Parakeet?

Yes. Whisper takes 1-2 seconds versus under 200ms for Parakeet, but it supports 99+ languages with automatic language detection.

Transcription in under 50 milliseconds

Powered by Parakeet TDT v2 on Apple Silicon's Neural Engine. No cloud round-trip, no waiting. Speak and see your words appear instantly.

Download for Mac

Three steps, zero lag

The entire pipeline runs locally — from hotkey press to pasted text.

Press your hotkey

A single keyboard shortcut starts recording. No menus, no clicks, no delays.

Speak naturally

Audio streams directly to the on-device Neural Engine. Processing begins while you are still speaking.

Text appears

Transcribed text is pasted right where your cursor is. The entire flow takes under 200ms.

Default Engine

Parakeet TDT v2

Parakeet is a transducer-based speech recognition model built by NVIDIA and optimized by FluidAudio for Apple Silicon. It runs as a compiled CoreML graph on the Neural Engine, bypassing Python and CPU-bound inference entirely.

The model processes audio in streaming chunks, so transcription begins before you finish speaking. For a typical sentence, the Neural Engine completes inference in under 50 milliseconds, faster than the blink of an eye.
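A minimal sketch of why chunked streaming keeps the wait after you stop speaking roughly constant. It is written in Python purely for illustration and is not Hearsy's implementation: the 80 ms chunk length, `chunk_stream`, `streaming_transcribe`, `pending_audio_ms`, and the `fake_decode` stand-in for the model are all hypothetical, and it assumes each chunk decodes faster than it takes to record.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

CHUNK_MS = 80  # hypothetical chunk length; the real value is not stated on this page

# A decoder takes one chunk plus the state carried over from earlier chunks
# and returns (text for this chunk, updated state).
Decoder = Callable[[List[float], Optional[int]], Tuple[str, int]]


def chunk_stream(samples: List[float], sample_rate: int,
                 chunk_ms: int = CHUNK_MS) -> Iterator[List[float]]:
    """Yield fixed-size chunks, the way an audio callback would deliver them."""
    size = sample_rate * chunk_ms // 1000
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


def streaming_transcribe(chunks: Iterable[List[float]], decode: Decoder) -> str:
    """Decode each chunk as it arrives, carrying decoder state between chunks."""
    state: Optional[int] = None
    pieces: List[str] = []
    for chunk in chunks:
        text, state = decode(chunk, state)
        if text:
            pieces.append(text)
    return " ".join(pieces)


def pending_audio_ms(recording_ms: int, chunk_ms: int = CHUNK_MS) -> int:
    """Audio still undecoded when the speaker stops: at most one chunk."""
    if recording_ms <= 0:
        return 0
    remainder = recording_ms % chunk_ms
    return remainder or chunk_ms


def fake_decode(chunk: List[float], state: Optional[int]) -> Tuple[str, int]:
    """Stand-in for the model: labels each chunk with a running counter."""
    count = (state or 0) + 1
    return f"chunk{count}", count
```

Under these assumptions, `pending_audio_ms(2000)` and `pending_audio_ms(60000)` both return 80: a two-second clip and a one-minute clip leave the same amount of audio to decode once recording ends, because everything earlier was handled while the speaker was still talking.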

With support for 25 major world languages, Parakeet covers English, Spanish, French, German, Japanese, Chinese, and more. For less common languages, Hearsy also includes Whisper v3 Turbo with 99+ language support.

Live performance

Typical metrics on Apple Silicon

Transcription: < 50ms

End-to-end latency: ~200ms

Cloud dictation (typical): 500ms–2s

Model size: ~450MB

Languages

Local speed vs. cloud round-trip

Cloud dictation adds network latency, server processing time, and response transmission. Hearsy skips all of it.

Hearsy (Local)

✓ Sub-50ms Neural Engine inference
✓ Zero network latency
✓ Consistent performance every time
✓ Works offline and in airplane mode
✓ No server queue or rate limits

Cloud Dictation

✗ 50-100ms network round-trip
✗ 200-500ms server processing
✗ Variable latency based on load
✗ Requires stable internet
✗ Subject to API rate limits
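The figures on this page can be added up as a rough latency budget. The sketch below uses only the numbers quoted above. The `Range` helper and the constant names are illustrative, and splitting the local end-to-end time into inference and everything else is an assumption, since the page does not itemize it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A latency range in milliseconds."""
    low_ms: int
    high_ms: int

    def __add__(self, other: "Range") -> "Range":
        # Stages run one after another, so their bounds add.
        return Range(self.low_ms + other.low_ms, self.high_ms + other.high_ms)


# Cloud components as listed in the comparison above.
CLOUD_NETWORK = Range(50, 100)   # network round-trip
CLOUD_SERVER = Range(200, 500)   # server processing

# Local figures as quoted on this page.
LOCAL_INFERENCE_MAX_MS = 50      # "under 50 milliseconds"
LOCAL_END_TO_END_MS = 200        # "~200ms", hotkey press to pasted text

# Sum of the two itemized cloud stages.
cloud_listed = CLOUD_NETWORK + CLOUD_SERVER

# Assumption: whatever is not inference (capture, paste, and so on)
# accounts for the rest of the local end-to-end time.
local_other_ms = LOCAL_END_TO_END_MS - LOCAL_INFERENCE_MAX_MS
```

Here `cloud_listed` is `Range(250, 600)`, so the two itemized cloud stages alone exceed the quoted local end-to-end time of about 200ms. `local_other_ms` is 150, meaning at least roughly 150ms of the local figure is spent outside inference. The page's "typical" cloud figure of 500ms–2s is higher than the itemized sum, and the page does not break down the difference.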