awesome-local-ai — AI tools that run on your machine

ExLlamaV2

Fast inference for quantised LLMs on consumer NVIDIA GPUs. EXL2 format outperforms GGUF on the same hardware in many benchmarks.

LinuxWindowsopen sourcefreePython

GPT4All

Privacy-first desktop chat with curated quantised models. Strong CPU performance.

LinuxWindowsmacOSopen sourcefreeC/C++

Jan

Open-source ChatGPT alternative. Bundles llama.cpp + a clean UI.

LinuxWindowsmacOSopen sourcefreeTypeScript

KoboldCpp

Single-binary llama.cpp wrapper with KoboldAI UI for chat, story-writing, RP.

LinuxWindowsmacOSopen sourcefreeC/C++

llama.cpp

Reference C++ implementation for running LLaMA-family and other transformer models with GGUF quantization. Powers most of the others in this section.

LinuxWindowsmacOSopen sourceC/C++

LM Studio

Polished desktop app for discovering, downloading, and running local LLMs. OpenAI-compatible server mode. Free for personal + commercial.

LinuxWindowsmacOSclosed sourcefreeC/C++

LocalAI

Self-hosted, OpenAI-compatible inference server. Text, image, audio, embeddings — all on your machine.

LinuxWindowsmacOSopen sourcefreeGo

Mistral.rs

Rust LLM inference platform with quantization, vision, MoE, and speculative decoding.

LinuxWindowsmacOSopen sourcefreeRust

MLC LLM

Compile-once, deploy-anywhere LLM runtime. Targets WebGPU, Vulkan, CUDA, Metal, iOS, and Android from a single source.

LinuxWindowsmacOSmobilewebopen sourcefreePython

Ollama

Single-binary server with a built-in model library. Pull, run, and swap models with one command.

LinuxWindowsmacOSopen sourcefreeGo

SGLang

Fast LLM and VLM serving runtime with RadixAttention cache and structured-output support.

Linuxopen sourcefreePython

Text Generation WebUI

Gradio-based web UI for local LLMs. Supports GGUF, GPTQ, AWQ, EXL2.

LinuxWindowsmacOSopen sourcefreePython

vLLM

High-throughput inference engine with PagedAttention. Designed for serving, not desktop chat — pair with Open WebUI or LiteLLM.

Linuxopen sourcefreePython

Anything LLM

Workspace-style chat with built-in RAG. Works fully offline with a local LLM provider.

LinuxWindowsmacOSopen sourcefreeTypeScript

Faraday

Local-only character / role-play chat. Bundles inference, no API key needed.

LinuxWindowsmacOSclosed sourcefreeC/C++

Msty

Fast desktop chat with branching conversations and parallel-model comparison. Free tier covers personal local use.

LinuxWindowsmacOSclosed sourcefreeTypeScript

Open WebUI

Self-hosted "ChatGPT clone" of the open-source world. Pair with Ollama or any OpenAI-compatible local server.

LinuxWindowsmacOSwebopen sourcefreePython

Desktop dictation app built on Qwen3-ASR + GGUF + llama.cpp. 36 languages, hotkey-anywhere transcription, file/microphone/system-audio input, LoRA personal voice training. 100% offline, no account required to transcribe. Disclosure: maintained by us.

LinuxWindowsclosed sourcefreepaidC/C++

faster-whisper

CTranslate2-based reimplementation. ~4× faster than reference Whisper at the same accuracy.

LinuxWindowsmacOSopen sourcefreePython

OpenAI Whisper

Reference Python implementation. Accurate but slower than the C++ ports; useful when you need the exact research behaviour.

LinuxWindowsmacOSopen sourcefreePython

RealtimeSTT

Low-latency streaming wrapper around faster-whisper for live dictation pipelines.

LinuxWindowsmacOSopen sourcefreePython

Vosk

Lightweight offline speech recognizer with 20+ language models. Real-time on CPU.

LinuxWindowsmacOSmobileopen sourcefreePython

Whisper.cpp

C++ port of OpenAI Whisper with GGUF quantization. Runs on CPU, Metal, CUDA, Vulkan.

LinuxWindowsmacOSmobileopen sourcefreeC/C++

WhisperX

faster-whisper plus forced alignment, voice-activity detection, and speaker diarization.

LinuxWindowsmacOSopen sourcefreePython

Bark

Multilingual generative audio. Speech, sound effects, and music cues from text prompts.

LinuxWindowsmacOSopen sourcefreePython

Coqui TTS

Comprehensive TTS toolkit. Multiple architectures (Tacotron, VITS, XTTS) and voice cloning.

LinuxWindowsmacOSopen sourcefreePython

Kokoro

Tiny ~80M-param TTS model, surprisingly natural for the size. Suitable for low-end hardware.

LinuxWindowsmacOSopen sourcefreePython

Mimic 3

Mycroft's neural TTS engine. Lightweight, multilingual.

LinuxWindowsmacOSmobileopen sourcefreePython

Piper

Fast neural TTS. ONNX runtime, dozens of voices and languages. Designed for Raspberry Pi-class hardware.

LinuxWindowsmacOSmobileopen sourcefreeC/C++

StyleTTS 2

High-fidelity expressive TTS with style transfer. Strong reference voice cloning.

LinuxWindowsmacOSopen sourcefreePython

AUTOMATIC1111 / Stable Diffusion WebUI

The original ergonomic SD UI. Heavy plugin ecosystem.

LinuxWindowsmacOSopen sourcefreePython

ComfyUI

Node-graph workflow editor for diffusion models. Powers most modern local image and video pipelines.

LinuxWindowsmacOSopen sourcefreePython

Fooocus

Image generator with sane defaults — minimal knobs for great results. Built on top of Stable Diffusion.

LinuxWindowsmacOSopen sourcefreePython

Forge

Performance-tuned A1111 fork by lllyasviel. Lower VRAM, faster on modern GPUs.

LinuxWindowsmacOSopen sourcefreePython

InvokeAI

Pro-grade SD UI with strong canvas / inpainting tools. Enterprise tier; free local install remains open source.

LinuxWindowsmacOSopen sourceclosed sourcefreepaidPython

SD.Next

All-in-one fork of A1111 with broader backend support (Diffusers, ONNX, ROCm).

LinuxWindowsmacOSopen sourcefreePython

SwarmUI

Modular UI built on top of ComfyUI. User-friendly mode out of the box, full node-graph available when you need it.

LinuxWindowsmacOSopen sourcefreeTypeScript

ComfyUI + LTX Video

ComfyUI nodes drive Lightricks LTX video models for text-to-video and image-to-video generation. The chunked-loop pattern (released in our [comfyui-workflows](https://github.com/BrethofAI/comfyui-workflows)) produces longer outputs than vanilla LTX allows.

LinuxWindowsmacOSopen sourcefreePython

Wan2GP

Stripped-down Wan2.2 video pipeline for low-VRAM consumer GPUs.

LinuxWindowsmacOSopen sourcefreePython

Aider

Terminal pair-programming. Bring-your-own-LLM via LiteLLM — run with Ollama or any OpenAI-compatible local endpoint.

LinuxWindowsmacOSopen sourcefreePython

Continue

IDE assistant with first-class local-LLM support. Defaults can be set to Ollama / LM Studio. VS Code + JetBrains.

LinuxWindowsmacOSopen sourcefreeTypeScript

Llama.vim

Vim plugin that streams llama.cpp completions inline. No cloud.

LinuxWindowsmacOSopen sourcefreeC/C++

Tabby

Self-hosted GitHub Copilot alternative. Local model serving with IDE plugins.

LinuxWindowsmacOSopen sourcefreeRust

twinny

Free local AI extension for VS Code. Chat + autocomplete via Ollama.

LinuxWindowsmacOSopen sourcefreeTypeScript

Aider in /architect mode

Aider's planning mode separates "decide" and "edit" steps; works well with strong local reasoning models.

LinuxWindowsmacOSopen sourcefreePython

Continue Agent mode

Agentic editing flow inside Continue. Pair with a local model for fully-offline coding agents.

LinuxWindowsmacOSopen sourcefreeTypeScript

Open Interpreter

Code-execution agent that runs Python/shell on your machine. Local-LLM friendly.

LinuxWindowsmacOSopen sourcefreePython

Chroma

Embedding database designed for local-first usage. SQLite-style single-file or client/server.

LinuxWindowsmacOSopen sourcefreePython

Faiss

Library for similarity search. The retrieval engine inside many of the others.

LinuxWindowsmacOSopen sourcefreeC/C++

LanceDB

Embedded, columnar vector DB. Single-file, no server.

LinuxWindowsmacOSopen sourcefreeRust

Marqo

End-to-end vector search; OSS core, paid hosted version.

LinuxWindowsmacOSopen sourceclosed sourcefreepaidPython

Qdrant

High-performance vector DB. Self-host the open-source binary.

LinuxWindowsmacOSopen sourcefreeRust

Weaviate

Hybrid (vector + keyword) DB. Self-host the OSS distribution; cloud is optional.

LinuxWindowsmacOSopen sourcefreeGo

BGE

BAAI's BGE family. Strong English + multilingual variants. Run via llama.cpp, sentence-transformers, or fastembed.

LinuxWindowsmacOSopen sourcefreePython

fastembed

Lightweight CPU-friendly embedding library by Qdrant.

LinuxWindowsmacOSopen sourcefreePython

Sentence Transformers

Reference Python library for sentence + paragraph embeddings.

LinuxWindowsmacOSopen sourcefreePython

Axolotl

Config-driven fine-tuning framework. LoRA, QLoRA, full fine-tunes.

Linuxopen sourcefreePython

diffusion-pipe

Pipeline-parallel trainer for diffusion models. Multi-GPU LoRA on large image / video models.

Linuxopen sourcefreePython

MLX

Apple's native ML framework for Apple Silicon. Train and infer on M-series Macs without CUDA workarounds.

macOSopen sourcefreePython

Ostris ai-toolkit

LoRA training UI for Flux, SD3, SDXL, LTX. Works on consumer hardware.

LinuxWindowsopen sourcefreePython

Unsloth

Fine-tune LLMs 2× faster with 70% less VRAM than reference HuggingFace pipelines.

LinuxWindowsmacOSopen sourcefreePython

Anything LLM

Self-hosted workspace tool with integrated RAG. Listed twice intentionally — strong both as a chat app and a RAG layer.

LinuxWindowsmacOSopen sourcefreeTypeScript

LlamaIndex

Toolkit for building RAG pipelines. Works fully offline with local models + vector DBs.

LinuxWindowsmacOSopen sourcefreePython

Perplexica

Open-source AI search powered by SearXNG + your local LLM.

LinuxWindowsmacOSopen sourcefreeTypeScript

PrivateGPT

Ingest documents locally and query them with an offline LLM.

LinuxWindowsmacOSopen sourcefreePython

SearXNG

Self-hosted meta-search engine. Pair with a local LLM for an offline Perplexity-style assistant.

LinuxWindowsmacOSopen sourcefreePython

Bazzite

Container-native gaming and AI distro. Steam Deck-friendly, latest drivers, easy CUDA.

Linuxopen sourcefree

Bluefin

Fedora-based, atomic, container-first. Good "drop you in a known state" workstation for AI work.

Linuxopen sourcefree

CachyOS

Arch-based desktop distro with a tuned kernel and recent NVIDIA / AMD drivers. Sane out-of-the-box for new GPUs (Blackwell, RDNA 4).

Linuxopen sourcefree

NixOS

Reproducible system config. Best when you need identical CUDA + ML toolchain across machines.

Linuxopen sourcefree

Pop!_OS

System76's NVIDIA-friendly desktop distro. ISO ships with proprietary drivers for plug-and-play GPU work.

Linuxopen sourcefree

MLX

Apple Silicon-native ML library. Already listed under training; it also ships an inference runtime competitive with llama.cpp on M-series.

macOSopen sourcefreePython

NVIDIA TensorRT-LLM

NVIDIA's optimised LLM runtime for their data-center and consumer GPUs. Closed-weights binary; fastest CUDA path for many models.

LinuxWindowsclosed sourcefreePython

OpenVINO

Intel's inference toolkit. CPU, iGPU, dGPU (Arc), and NPU support for Intel laptops.

LinuxWindowsmacOSopen sourcefreeC/C++

ROCm + llama.cpp HIP

AMD GPU inference path. Llama.cpp's HIP backend now reaches CUDA parity on RDNA 3/4 in many workloads.

LinuxWindowsmacOSopen sourceC/C++

awesome-local-ai AI tools that run entirely on your machine

Runs entirely on your machine

Inference Runtimes · 13

Desktop Chat Apps · 4

Voice — Speech-to-Text · 7

Voice — Text-to-Speech · 6

Image Generation · 7

Video Generation · 2

Code Assistants · 5

Local Agents · 3

Vector Databases · 6

Embeddings · 3

Training & Fine-tuning · 5

Local Search & RAG · 5

Operating Systems Tuned for AI · 5

Hardware-Specific Runtimes · 4

Run something great locally? Tell us.

Everything we build

Voice Pro

brethof-mind

3D Models

3D Prints

Nova

Awesome lists

Guides

ComfyUI workflows

Anti-dev tier list

About Brethof AI

awesome-local-ai
AI tools that run entirely on your machine