Rift

Voice ↔ Text. Entirely on your Mac.

Download Learn more

Your voice. Your pace.

Ideas don't arrive in perfect sentences.

They pause. They revise. They find their way.

Rift is built for how people actually think —

patient, precise, and ready when you are.

01

Voice to Text

Speak naturally. Rift transcribes.

→

Listening...

2:34 and counting

You decide
when you're done.

No auto-endpointing

Speak. Pause. Think. Rift waits.
Other apps cut you off after 2 seconds of silence.

Others

"The quick brown—"

Cut off after pause

Rift

"The quick brown fox jumps over the lazy dog."

You press stop when ready

0ms

First-word capture

Your first word is never lost.
A 250ms lead-in buffer starts recording before you even finish pressing the button.

Buffered

Button pressed

Recording

"Hel—" is already captured

0s

Rolling context window

The model considers the last 25 seconds of audio.
It understands context, not just isolated words.

Context window

Now

-30s -25s -20s -15s -10s -5s 0s

Live paste

Text appears in your app as you speak.
Real-time streaming with final reconciliation when you stop.

The quick brown fox jumps over the lazy dog.

Auto-fix

Hallucination detection

If the first transcription guess is wrong, Rift detects it and auto-replaces.
No manual cleanup. No re-recording.

> The whether weather is nice today

detecting... fixed

Real-time

Streaming transcription

Audio is processed in chunks as you speak.
No waiting for you to finish.

Audio

↓

Text

The quick brown fox jumps

02

Text to Voice

Select text. Hear it spoken.

→

First word in
150 milliseconds.

0ms

First-word latency

You hear the first word before the sentence finishes generating.
No loading spinners. No waiting.

Hello, world

"Hello..."

150ms to first sound

Seamless

Clause-level streaming

The next sentence is synthesized while the current one plays.
No gaps. No stutters. Continuous audio.

Playing: "The quick brown fox..."

Buffered: "jumps over the lazy dog."

Generating: "The end."

0ms

Audio poll rate

The audio buffer is checked every 20 milliseconds.
Imperceptible latency between chunks.

0ms 200ms

50 checks per second

Pause anywhere

Tap to pause mid-syllable. Tap again to resume from the exact position.
Your place is never lost.

Tap to pause

0.5× – 2×

Playback speed

Speed up for skimming. Slow down for comprehension.
Adjust in real-time without restarting.

0.5× 1.0× 2×

How it works

Two pipelines. Zero cloud. Everything on your Mac.

01 Voice to Text

Ctrl + 2

Start dictation

1

Capture

Core Audio streams from your microphone with a 250ms lead-in buffer. Your first word is never lost.

2

Process

Parakeet runs on the Neural Engine and GPU via MLX. 25 seconds of rolling context. Real-time streaming.

3

Paste

Text appears at your cursor as you speak. Final reconciliation when you stop. Hallucinations auto-fixed.

02 Text to Voice

Ctrl + 1

Speak selected text

1

Select

Highlight text in any app or copy to clipboard. Rift reads whatever you give it.

2

Synthesize

Kokoro generates audio clause-by-clause. First word in 150ms. Next sentence ready before current ends.

3

Play

Audio streams to system output. Pause anywhere, resume from exact position. 0.5× to 2× speed.

Space Pause / Resume

Esc Stop

Privacy.
That's Rift.

Your voice never leaves your Mac. Ever.

100% on-device processing

No cloud. No servers.

No accounts required

Fully open source

Zero file I/O

Audio is synthesized directly to memory. Nothing is written to disk. Nothing persists after you close the app.

Performance

Tested on real hardware. Real workloads.

M1 MacBook Air

Voice→Text

0.8× realtime

Text→Voice

1.2× realtime

Memory

1.8 GB

M3 MacBook Pro

Voice→Text

1.5× realtime

Text→Voice

2.4× realtime

Memory

2.0 GB

M4 Mac Studio

Voice→Text

2.1× realtime

Text→Voice

3.4× realtime

Memory

2.1 GB

How Rift compares

Rift

Whisper.cpp

macOS Dictation

On-device

✓

Partial

No auto-cutoff

✓

✗

Live paste

✓

✗

✓

First-word buffer

250ms

None

TTS included

✓

✗

Basic

TTS latency

150ms

N/A

~500ms

Privacy

100%

Cloud fallback

Requirements

macOS Sonoma 14.0+
Chip Apple Silicon
RAM 8GB minimum
Disk ~2GB

The visual metaphor

Nothing escapes.

A black hole where your data goes in — and stays in.

The Singularity

Your Mac is the center of gravity. All processing happens here — voice recognition, text synthesis, everything. No servers. No cloud. One machine.

The Accretion Disk

Your voice flows in like matter spiraling toward the event horizon. It gets captured, processed, transformed. The warm glow is energy being released as computation.

→

The Event Horizon

The point of no return — but in a good way. Once your words enter Rift, they never leave your machine. No telemetry, no uploads, no exceptions.

Gravitational Lensing

Just as light bends around a black hole, your voice bends into text. Text bends into voice. Transformation through the most powerful force — local compute.

How the visualization works +

Raymarching

Volumetric rendering via signed distance functions. The sphere-traced shader calculates 128 iterations per pixel to simulate photon paths.

Schwarzschild geodesics

Light follows the curved spacetime geometry of a non-rotating black hole. The photon sphere appears as a bright ring at 1.5× the event horizon radius.

Keplerian disk

Accretion disk particles orbit according to Kepler's laws. Inner particles orbit faster, creating the characteristic spiral structure.

ACES tonemapping

Film-industry-standard color grading compresses the HDR luminance into displayable range while preserving the fiery accretion glow.

Visualization based on Singularity by MisterPrada

Frequently asked

Does it work offline?

Yes, 100%. Rift never connects to the internet. All processing happens locally on your Mac using the MLX framework.

What languages are supported?

Currently English only. The underlying Parakeet model supports multiple languages, and we're working on enabling them in future updates.

Can I use my own voice for text-to-speech?

Not yet. Rift uses the Kokoro model's built-in voices. Custom voice cloning may be added in the future.

Is my voice data stored anywhere?

Never. Audio is processed in memory and discarded immediately. Nothing is written to disk or sent anywhere.

Why is the first run slow?

On first launch, Rift downloads and caches the ML models (~2GB). Subsequent launches are instant.

Does it work on Intel Macs?

No. Rift requires Apple Silicon (M1 or later) for the MLX machine learning framework.

Is Rift open source?

Yes. The full source code is available on GitHub under the MIT license.

How do I install it?

Download the DMG, drag to Applications, and launch. Apple Silicon (M1+) required. If macOS shows a security warning, check the installation guide for a quick fix. First launch downloads ~2GB of ML models.

The Technology

Built different.

Three technologies working together. All running locally on Apple Silicon. No cloud, no latency, no compromises.

The Foundation

MLX

Apple's machine learning framework. Runs entirely on your Mac's Neural Engine and GPU.

Apple Silicon On-device Open source

Voice to Text

Parakeet

NVIDIA's state-of-the-art speech recognition, optimized for Apple Silicon.

0.6B params TDT arch ~800MB

Text to Voice

Kokoro

Neural text-to-speech with natural-sounding voices. Real-time synthesis.

82M params Memory output ~1.2GB

Rift

Your voice. Your Mac. Nothing else.

Download for macOS

Free · Open Source · View on GitHub

Loading version...

Rift

Your voice. Your pace.

Voice to Text

Text to Voice

How it works

Capture

Process

Paste

Select

Synthesize

Play

Privacy.That's Rift.

Performance

How Rift compares

Requirements

Nothing escapes.

The Singularity

The Accretion Disk

The Event Horizon

Gravitational Lensing

Frequently asked

Built different.

MLX

Parakeet

Kokoro

Rift

Privacy.
That's Rift.