NEW · v2.0.0

Everything in One App

Transcribe: 30 languages + 22 Chinese dialects

Translate: 38 languages, fully offline

Timestamp: make your own subtitles

MCP server: talk to it from your AI stack

Runs locally, even on laptops · No subscription · 14-day free trial

Screenshots: Brethof Voice Pro main screen and recording view

🔒

Complete Privacy

Every word you speak is processed on your device. No audio, text, or metadata is ever transmitted to any server. There is no cloud backend, no telemetry, no analytics, and no phone-home.

  • Zero network calls during transcription
  • Models stored locally after one-time download
  • Open-source Qwen3-ASR engine — fully auditable

GPU Acceleration

Brethof Voice Pro runs the GGUF-optimized engine on llama.cpp for fast local inference. It supports all three major GPU vendors out of the box.

  • NVIDIA — Vulkan acceleration (GTX 10-series and newer)
  • AMD — Vulkan acceleration (RX 500-series and newer)
  • Intel — Vulkan acceleration (Arc GPUs and integrated graphics)
  • CPU fallback — runs without a GPU, just slower
🌐

Offline Transcription — 30 Languages + 22 Chinese Dialects

Powered by Qwen3-ASR via llama.cpp. Lock to a specific language for maximum accuracy, or let the engine auto-detect. Every word stays on your machine.

English · Chinese · Cantonese · Arabic · German · French · Spanish · Portuguese · Italian · Dutch · Russian · Indonesian · Korean · Thai · Vietnamese · Japanese · Turkish · Hindi · Malay · Swedish · Danish · Finnish · Polish · Czech · Filipino · Persian · Greek · Romanian · Hungarian · Macedonian

Plus 22 Chinese regional dialects (Anhui, Dongbei, Fujian, Henan, Hunan, Shandong, Sichuan, Wu, Minnan, and more) recognised automatically when the language is set to Chinese or auto-detect.

💬

Offline Translation — 38 Languages (New in v2.0.0)

Translate any transcription, voice-keyboard output, plain text, or subtitle file, entirely on your machine. Powered by Tencent Hunyuan MT2. On FLORES-200 (scored with XCOMET-XXL), the Quality tier reaches 97.9% of Google Gemini 3.1 Pro's score and the compact Fast tier reaches 89.9%. Hunyuan MT2 also surpasses Gemini 3.1 Pro on real-world (WildMTBench) and minority-language translation.

Chinese · English · French · Portuguese · Spanish · Japanese · Turkish · Russian · Arabic · Korean · Thai · Italian · German · Vietnamese · Malay · Indonesian · Filipino · Hindi · Trad. Chinese · Polish · Czech · Dutch · Khmer · Burmese · Persian · Gujarati · Urdu · Telugu · Marathi · Hebrew · Bengali · Tamil · Ukrainian · Cantonese · Tibetan · Kazakh · Mongolian · Uyghur

  • Transcribe + translate — pick a target language in the Transcribe popup; ASR transcribes, MT translates, both render side-by-side
  • Voice keyboard translation — speak, pick targets from a 3-column language grid, the keyboard types the translation
  • Subtitle translator — SRT/VTT files in any of the 38 languages, with optional bilingual mode (source + translation per cue)
  • Two model tiers — Fast (~1 GB) sub-second on CPU or GPU; Quality (~4.3 GB) sub-second on GPU
  • Independent device picker — run ASR on Vulkan 0, translation on Vulkan 1, or both on CPU
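
Bilingual mode, as described in the list above, pairs each cue's source text with its translation. The Python sketch below illustrates that output shape only. The `translate` callable is a hypothetical stand-in for the MT engine, and the choice to join multi-line cues into one line is an assumption, not the app's documented behaviour.

```python
import re
from typing import Callable

def bilingual_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Return an SRT where each cue holds the source text followed by its translation."""
    out_blocks = []
    # SRT cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            out_blocks.append(block)  # malformed cue: pass through untouched
            continue
        index, timing, text_lines = lines[0], lines[1], lines[2:]
        # Assumption: multi-line source text is joined before translation.
        source = " ".join(line.strip() for line in text_lines)
        out_blocks.append("\n".join([index, timing, source, translate(source)]))
    return "\n\n".join(out_blocks) + "\n"
```

Cue numbers and timings pass through unchanged, so the result should still load in any player that accepts two-line cues.
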
📈

Two Model Sizes

Pick the balance of accuracy, speed, and VRAM that suits your machine. Both run the same Qwen3-ASR architecture; switch any time from Settings → Models.

  • 0.6B — small, fast, runs on integrated GPUs or any 4 GB+ Vulkan card. Recommended default for laptops.
  • 1.7B — larger, higher accuracy on accented or noisy audio. Comfortable on 6 GB+ VRAM. State-of-the-art among open ASR models.

Optional add-ons download on demand from Settings → Models: Forced Aligner (~540 MB) for word-level timestamps, Hunyuan MT2 Fast (~1 GB) or Quality (~4.3 GB) for translation.
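
Word-level timestamps are the raw material for subtitles. The Python sketch below groups timed words into SRT cues. It assumes the aligner output can be represented as `(text, start_seconds, end_seconds)` tuples; that shape, and the `max_chars` and `max_gap` thresholds, are illustrative choices rather than the app's actual format or defaults.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_s, end_s): assumed shape

def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words: List[Word], max_chars: int = 42, max_gap: float = 0.6) -> str:
    """Group words into cues; start a new cue after a long pause or when the line is full."""
    cues: List[List[Word]] = []
    for word in words:
        if cues:
            current = cues[-1]
            length = len(" ".join(w[0] for w in current)) + 1 + len(word[0])
            gap = word[1] - current[-1][2]
            if length <= max_chars and gap <= max_gap:
                current.append(word)
                continue
        cues.append([word])
    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w[0] for w in cue)
        blocks.append(f"{i}\n{srt_time(cue[0][1])} --> {srt_time(cue[-1][2])}\n{text}")
    return "\n\n".join(blocks) + "\n"
```
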

🎵

AI Noise Reduction

Optional DeepFilter noise suppression for recordings made in noisy rooms — off by default, enable from the Noise popup. Skipping it on clean mic clips actually helps quality (DeepFilter can over-process short, clean audio).

  • Removes background noise, keyboard clicks, and room echo
  • Configurable attenuation
  • No extra hardware needed
  • Off by default — toggle per-recording or always-on
🎓

Personal Voice Training

Fine-tune the model on your own voice with LoRA — runs end-to-end on your machine. Every time you correct a misrecognised word, the {clip, correction} pair is saved to your local training dataset. The main window's training card shows total samples and minutes captured at a glance — click it to open the dataset browser, then "Start training" in the Training tab.

  • Adapt to your accent, dialect, and speaking rhythm
  • Corrections auto-saved — just keep using the app
  • LoRA fine-tuning — fast, efficient, no full retrain
  • Auto-picks NVIDIA CUDA (cu128 PyTorch) or CPU backend
  • Auto-exports the trained model to GGUF when done
  • Your voice data never leaves your machine
  • Free for every paid licence
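
The "total samples and minutes" figure on the training card is a simple aggregate over the saved {clip, correction} pairs. The Python sketch below shows that calculation. The on-disk format is not documented here, so the `Sample` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass(frozen=True)
class Sample:
    clip_path: str     # hypothetical field: path to the recorded audio clip
    duration_s: float  # hypothetical field: clip length in seconds
    correction: str    # the text the user corrected the transcript to

def dataset_summary(samples: Iterable[Sample]) -> Tuple[int, float]:
    """Return (sample count, total minutes), the two figures a training card would show."""
    count = 0
    total_s = 0.0
    for sample in samples:
        count += 1
        total_s += sample.duration_s
    return count, round(total_s / 60, 1)
```
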
⌨️

Voice Keyboard & Direct Text Injection

Hold the hotkey, speak, and the text lands wherever your cursor is — like a keyboard. Works in browsers, IDEs, terminals, chat apps, anywhere a text field accepts keyboard input.

  • Default hotkey F9 — configurable, hold-to-record or toggle
  • Optional Right-Mouse-Button trigger for hands-free recording
  • Live translation chip — speak in one language, the keyboard types the translation. Pick one or more targets from a 3-column grid: one per line, inline (EN: … || PL: …), or first target only.
  • Works with any text field, editor, terminal, or chat
  • X11 and Wayland on Linux, native input on Windows
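
The three output layouts for multi-target translation (one per line, inline, first target only) can be sketched in Python as follows. The layout names `per_line`, `inline`, and `first_only` are made up for illustration, and whether the per-line layout carries a language prefix is not stated in the text, so it is omitted here.

```python
from typing import Dict

def format_targets(translations: Dict[str, str], layout: str) -> str:
    """Lay out per-language translations (code -> text) in one of three ways."""
    if not translations:
        return ""
    items = list(translations.items())  # dicts preserve insertion order
    if layout == "per_line":
        return "\n".join(text for _, text in items)
    if layout == "inline":
        return " || ".join(f"{code}: {text}" for code, text in items)
    if layout == "first_only":
        return items[0][1]
    raise ValueError(f"unknown layout: {layout!r}")
```
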
📚

Hotword Context & Terminology

One field, two uses. It biases the ASR toward proper nouns, brand names, and jargon, so "VFIO" is less likely to be mistranscribed as "VEAF1". The same field doubles as the translation terminology dictionary: pin "Brethof Voice" and it stays "Brethof Voice" in every target language.

  • Add terms in Settings — one per line
  • Improves recognition of proper nouns and abbreviations
  • Preserves brand names and technical terms in translations
  • No retraining needed — applied at inference time
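
If you keep the term list in a file of your own, a small helper can tidy it before pasting it into Settings and spot-check a translation afterwards. The Python sketch below is a user-side convenience under those assumptions. It does not reflect how the app applies terms internally, and both function names are hypothetical.

```python
from typing import List

def parse_terms(field: str) -> List[str]:
    """Split a one-term-per-line field into a clean list, dropping blanks and duplicates."""
    seen = set()
    terms = []
    for line in field.splitlines():
        term = line.strip()
        # Duplicates are compared case-insensitively; the first spelling wins.
        if term and term.casefold() not in seen:
            seen.add(term.casefold())
            terms.append(term)
    return terms

def missing_terms(terms: List[str], source: str, translation: str) -> List[str]:
    """Return pinned terms that appear in the source but not verbatim in the translation."""
    return [t for t in terms if t in source and t not in translation]
```
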
🤖

MCP Server for AI Agents (Paid plans)

The same binary that runs the GUI can run as a Model Context Protocol server — 19 tools exposing ASR and MT to Claude Desktop, Claude Code, Cursor, Cline, or any MCP-compatible agent. Transport is stdio: no port, no firewall, no localhost binding. The agent owns the lifecycle.

  • Transcribe audio/video files, mic recordings, or system audio
  • Translate text, SRT, or VTT (bilingual mode supported)
  • Switch ASR or MT compute device on the fly
  • List and switch personal voice profiles
  • Read/write any app setting from the agent

Run brethof-voice --mcp and the agent connects over stdio. Paid licence required — trial users can't start the server.
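
Agents that read a JSON config usually register a stdio server by naming a command and its arguments. The Python sketch below builds such an entry for `brethof-voice --mcp`, plus the first JSON-RPC message an MCP client sends. The `mcpServers` layout follows the convention used by Claude Desktop. The server key, client name, and protocol version string are assumptions, and the binary is assumed to be on your PATH.

```python
import json

def mcp_server_entry(command: str = "brethof-voice") -> dict:
    """Config entry telling an MCP client to launch the server over stdio."""
    return {"mcpServers": {"brethof-voice": {"command": command, "args": ["--mcp"]}}}

def initialize_request(request_id: int = 1) -> str:
    """First JSON-RPC 2.0 message of an MCP session, as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed; use the version your client negotiates
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholder
        },
    }
    # The stdio transport frames messages by newline, so the JSON must stay on one line.
    return json.dumps(message) + "\n"
```

Printing `json.dumps(mcp_server_entry(), indent=2)` gives a block you can merge into the agent's config file. In normal use the agent sends the `initialize` line itself once it has spawned the process.
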

Ready to try it?

14-day free trial. All features unlocked. No credit card.
