Features — Brethof Voice Pro

🔒

Complete Privacy

Every word you speak is processed on your device. No audio, text, or metadata is ever transmitted to any server. There is no cloud backend, no telemetry, no analytics, and no phone-home.

Zero network calls during transcription
Models stored locally after one-time download
Open-source Qwen3-ASR engine — fully auditable

⚡

GPU Acceleration

Brethof Voice Pro uses a GGUF-optimized engine with llama.cpp for fast inference, and it supports all three major GPU vendors out of the box.

NVIDIA — Vulkan acceleration (GTX 10-series and newer)
AMD — Vulkan acceleration (RX 500-series and newer)
Intel — Vulkan acceleration (Arc GPUs and integrated graphics)
CPU fallback — runs without a GPU, just slower

🌐

Offline Transcription — 30 Languages + 22 Chinese Dialects

Powered by Qwen3-ASR via llama.cpp. Lock to a specific language for maximum accuracy, or let the engine auto-detect. Every word stays on your machine.

English, Chinese, Cantonese, Arabic, German, French, Spanish, Portuguese, Italian, Dutch, Russian, Indonesian, Korean, Thai, Vietnamese, Japanese, Turkish, Hindi, Malay, Swedish, Danish, Finnish, Polish, Czech, Filipino, Persian, Greek, Romanian, Hungarian, Macedonian

Plus 22 Chinese regional dialects (Anhui, Dongbei, Fujian, Henan, Hunan, Shandong, Sichuan, Wu, Minnan, and more) recognised automatically when the language is set to Chinese or auto-detect.

💬

Offline Translation: 38 Languages (New in v2.0.0)

Translate any transcription, voice-keyboard output, plain text, or subtitle file entirely on your machine. Translation is powered by Tencent Hunyuan MT2. On FLORES-200 (XCOMET-XXL), the Quality tier reaches 97.9% of Google Gemini 3.1 Pro's score and the compact Fast tier reaches 89.9%. Hunyuan MT2 also surpasses Gemini 3.1 Pro on real-world (WildMTBench) and minority-language translation.

Chinese, English, French, Portuguese, Spanish, Japanese, Turkish, Russian, Arabic, Korean, Thai, Italian, German, Vietnamese, Malay, Indonesian, Filipino, Hindi, Traditional Chinese, Polish, Czech, Dutch, Khmer, Burmese, Persian, Gujarati, Urdu, Telugu, Marathi, Hebrew, Bengali, Tamil, Ukrainian, Cantonese, Tibetan, Kazakh, Mongolian, Uyghur

Transcribe + translate — pick a target language in the Transcribe popup; ASR transcribes, MT translates, both render side-by-side
Voice keyboard translation — speak, pick targets from a 3-column language grid, the keyboard types the translation
Subtitle translator — SRT/VTT files in any of the 38 languages, with optional bilingual mode (source + translation per cue)
Two model tiers — Fast (~1 GB) sub-second on CPU or GPU; Quality (~4.3 GB) sub-second on GPU
Independent device picker — run ASR on Vulkan 0, translation on Vulkan 1, or both on CPU
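
To make the bilingual subtitle mode concrete, here is a minimal Python sketch of how a translated SRT file could be assembled, with the source line and its translation sharing one cue. The `Cue` class and `render_srt` function are hypothetical names used for illustration only; they are not Brethof Voice Pro's code, and the translations are assumed to be supplied by whatever MT step runs beforehand.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One SRT cue (hypothetical helper type, not the app's own)."""
    index: int
    start: str  # SRT timestamp, e.g. "00:00:01,000"
    end: str
    text: str


def render_srt(cues, translations, bilingual=False):
    """Render cues as SRT text; translations[i] is the translation of cues[i].

    In bilingual mode each cue carries the source line followed by the
    translated line. Otherwise only the translation is written.
    """
    if len(cues) != len(translations):
        raise ValueError("need exactly one translation per cue")
    blocks = []
    for cue, translated in zip(cues, translations):
        body = f"{cue.text}\n{translated}" if bilingual else translated
        blocks.append(f"{cue.index}\n{cue.start} --> {cue.end}\n{body}\n")
    # A blank line separates consecutive cues, as SRT expects.
    return "\n".join(blocks)


if __name__ == "__main__":
    cues = [Cue(1, "00:00:01,000", "00:00:02,500", "Hello")]
    print(render_srt(cues, ["Cześć"], bilingual=True))
```

Timestamps are copied through untouched in this sketch, so the translated file stays in sync with the original audio or video.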

📈

Two Model Sizes

Pick the balance of accuracy, speed, and VRAM that suits your machine. Both run the same Qwen3-ASR architecture; switch any time from Settings → Models.

0.6B — small, fast, runs on integrated GPUs or any 4 GB+ Vulkan card. Recommended default for laptops.
1.7B — larger, higher accuracy on accented or noisy audio. Comfortable on 6 GB+ VRAM. State-of-the-art among open ASR.

Optional add-ons download on demand from Settings → Models: Forced Aligner (~540 MB) for word-level timestamps, Hunyuan MT2 Fast (~1 GB) or Quality (~4.3 GB) for translation.
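
The sizing guidance above can be read as a simple rule of thumb. The Python sketch below encodes it using only the figures quoted in this section (6 GB+ for 1.7B, 0.6B otherwise). The function name `suggest_asr_model` is hypothetical, and this is an illustrative heuristic, not the logic the app uses in Settings → Models.

```python
def suggest_asr_model(vram_gb):
    """Suggest "1.7B" or "0.6B" from available VRAM in GB.

    Pass None when no GPU is available (CPU fallback). The thresholds come
    from the guidance above: 1.7B is comfortable on 6 GB+ of VRAM, and 0.6B
    is the recommended default for everything smaller.
    """
    if vram_gb is not None and vram_gb >= 6:
        return "1.7B"
    return "0.6B"


if __name__ == "__main__":
    for vram in (None, 4, 6, 12):
        print(vram, "->", suggest_asr_model(vram))
```

If the Quality translation model (~4.3 GB) shares the same GPU, the headroom left for ASR shrinks accordingly, so a real picker would probably account for that as well.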

🎵

AI Noise Reduction

Optional DeepFilter noise suppression for recordings made in noisy rooms — off by default, enable from the Noise popup. Skipping it on clean mic clips actually helps quality (DeepFilter can over-process short, clean audio).

Removes background noise, keyboard clicks, and room echo
Configurable attenuation
No extra hardware needed
Off by default — toggle per-recording or always-on

🎓

Personal Voice Training

Fine-tune the model on your own voice with LoRA — runs end-to-end on your machine. Every time you correct a misrecognised word, the {clip, correction} pair is saved to your local training dataset. The main window's training card shows total samples and minutes captured at a glance — click it to open the dataset browser, then "Start training" in the Training tab.

Adapt to your accent, dialect, and speaking rhythm
Corrections auto-saved — just keep using the app
LoRA fine-tuning — fast, efficient, no full retrain
Auto-picks NVIDIA CUDA (cu128 PyTorch) or CPU backend
Auto-exports the trained model to GGUF when done
Your voice data never leaves your machine
Free for every paid licence
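
As an illustration of the correction loop described above, the Python sketch below appends {clip, correction} pairs to a local JSON Lines file and counts the samples collected so far. The file layout, field names, and both function names are assumptions made for this example. The page does not document the app's actual on-disk dataset format.

```python
import json
from pathlib import Path


def append_correction(dataset_path, clip_path, correction):
    """Append one {clip, correction} pair to a local JSONL dataset file."""
    record = {"clip": str(clip_path), "correction": correction}
    path = Path(dataset_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-Latin corrections human-readable.
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def count_samples(dataset_path):
    """Return how many pairs the dataset holds (0 if the file is missing)."""
    path = Path(dataset_path)
    if not path.exists():
        return 0
    with path.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())
```

An append-only file like this keeps every correction local and makes the running sample count cheap to compute.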

⌨️

Voice Keyboard & Direct Text Injection

Hold the hotkey, speak, and the text lands wherever your cursor is — like a keyboard. Works in browsers, IDEs, terminals, chat apps, anywhere a text field accepts keyboard input.

Default hotkey F9 — configurable, hold-to-record or toggle
Optional Right-Mouse-Button trigger for hands-free recording
Live translation chip — speak in one language, the keyboard types the translation. Pick one or more targets from a 3-column grid: one per line, inline (EN: … || PL: …), or first target only.
Works with any text field, editor, terminal, or chat
X11 and Wayland on Linux, native input on Windows
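
The three output layouts for multi-target translation can be sketched in a few lines of Python. The function name, the mode names, and the choice to omit language labels in the one-per-line layout are assumptions for illustration. Only the inline `EN: … || PL: …` shape is taken from the description above.

```python
def format_translations(translations, mode="per_line"):
    """Join translated strings for typing into the focused text field.

    translations: list of (language_code, text) pairs, in the order the
    targets were picked. mode is one of "per_line", "inline", "first_only".
    """
    if not translations:
        return ""
    if mode == "per_line":
        # One translation per line (labels omitted here by assumption).
        return "\n".join(text for _, text in translations)
    if mode == "inline":
        # Labelled and separated on a single line, e.g. "EN: ... || PL: ...".
        return " || ".join(f"{code}: {text}" for code, text in translations)
    if mode == "first_only":
        return translations[0][1]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    picked = [("EN", "Good morning"), ("PL", "Dzień dobry")]
    print(format_translations(picked, mode="inline"))
```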

📚

Hotword Context & Terminology

One field, two uses. First, it biases the ASR toward proper nouns, brand names, and jargon, so that "VFIO" is less likely to be mistranscribed as "VEAF1". Second, the same field doubles as the translation terminology dictionary: pin "Brethof Voice" and it stays "Brethof Voice" in every target language.

Add terms in Settings — one per line
Improves recognition of proper nouns and abbreviations
Preserves brand names and technical terms in translations
No retraining needed — applied at inference time

🤖

MCP Server for AI Agents (paid plans)

The same binary that runs the GUI can run as a Model Context Protocol server — 19 tools exposing ASR and MT to Claude Desktop, Claude Code, Cursor, Cline, or any MCP-compatible agent. Transport is stdio: no port, no firewall, no localhost binding. The agent owns the lifecycle.

Transcribe audio/video files, mic recordings, or system audio
Translate text, SRT, or VTT (bilingual mode supported)
Switch ASR or MT compute device on the fly
List and switch personal voice profiles
Read/write any app setting from the agent

Run brethof-voice --mcp and the agent connects over stdio. Paid licence required — trial users can't start the server.
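
Many MCP clients register stdio servers through a JSON entry that names the command to launch and its arguments. The Python sketch below prints such an entry for the command shown above. It assumes the `brethof-voice` binary is on your PATH and that your client reads the common `mcpServers` layout. The server key name and the `mcp_server_entry` helper are hypothetical, so check your client's documentation for where the entry belongs.

```python
import json


def mcp_server_entry(command="brethof-voice"):
    """Build a client config entry that launches the app as an MCP stdio server."""
    return {
        "mcpServers": {
            # The key is a display name chosen for this example.
            "brethof-voice": {
                "command": command,  # assumed to be on PATH; use a full path otherwise
                "args": ["--mcp"],   # flag taken from the instructions above
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(mcp_server_entry(), indent=2))
```

Because the transport is stdio, the entry needs no host or port: the client starts the process and talks to it over its standard input and output.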
