Live · curated

awesome-local-ai
AI tools that run entirely on your machine

Models, runtimes, UIs, integrations. Everything you can install once and run forever without an API key. Aggregating from real-world use, not feature lists.

Runs entirely on your machine

75 curated, receipts-backed across 14 categories.

Inference Runtimes · 13

ExLlamaV2

Fast inference for quantised LLMs on consumer NVIDIA GPUs. EXL2 format outperforms GGUF on the same hardware in many benchmarks.

LinuxWindowsopen sourcefreePython

GPT4All

Privacy-first desktop chat with curated quantised models. Strong CPU performance.

LinuxWindowsmacOSopen sourcefreeC/C++

Jan

Open-source ChatGPT alternative. Bundles llama.cpp + a clean UI.

LinuxWindowsmacOSopen sourcefreeTypeScript

KoboldCpp

Single-binary llama.cpp wrapper with KoboldAI UI for chat, story-writing, RP.

LinuxWindowsmacOSopen sourcefreeC/C++

llama.cpp

Reference C++ implementation for running LLaMA-family and other transformer models with GGUF quantization. Powers most of the others in this section.

LinuxWindowsmacOSopen sourceC/C++

LM Studio

Polished desktop app for discovering, downloading, and running local LLMs. OpenAI-compatible server mode. Free for personal + commercial.

LinuxWindowsmacOSclosed sourcefreeC/C++

LocalAI

Self-hosted, OpenAI-compatible inference server. Text, image, audio, embeddings — all on your machine.

LinuxWindowsmacOSopen sourcefreeGo

Mistral.rs

Rust LLM inference platform with quantization, vision, MoE, and speculative decoding.

LinuxWindowsmacOSopen sourcefreeRust

MLC LLM

Compile-once, deploy-anywhere LLM runtime. Targets WebGPU, Vulkan, CUDA, Metal, iOS, and Android from a single source.

LinuxWindowsmacOSmobilewebopen sourcefreePython

Ollama

Single-binary server with a built-in model library. Pull, run, and swap models with one command.

LinuxWindowsmacOSopen sourcefreeGo

SGLang

Fast LLM and VLM serving runtime with RadixAttention cache and structured-output support.

Linuxopen sourcefreePython

Text Generation WebUI

Gradio-based web UI for local LLMs. Supports GGUF, GPTQ, AWQ, EXL2.

LinuxWindowsmacOSopen sourcefreePython

vLLM

High-throughput inference engine with PagedAttention. Designed for serving, not desktop chat — pair with Open WebUI or LiteLLM.

Linuxopen sourcefreePython

Desktop Chat Apps · 4

Anything LLM

Workspace-style chat with built-in RAG. Works fully offline with a local LLM provider.

LinuxWindowsmacOSopen sourcefreeTypeScript

Faraday

Local-only character / role-play chat. Bundles inference, no API key needed.

LinuxWindowsmacOSclosed sourcefreeC/C++

Msty

Fast desktop chat with branching conversations and parallel-model comparison. Free tier covers personal local use.

LinuxWindowsmacOSclosed sourcefreeTypeScript

Open WebUI

Self-hosted "ChatGPT clone" of the open-source world. Pair with Ollama or any OpenAI-compatible local server.

LinuxWindowsmacOSwebopen sourcefreePython

Voice — Speech-to-Text · 7

Brethof Voice Pro

Desktop dictation app built on Qwen3-ASR + GGUF + llama.cpp. 36 languages, hotkey-anywhere transcription, file/microphone/system-audio input, LoRA personal voice training. 100% offline, no account required to transcribe. Disclosure: maintained by us.

LinuxWindowsclosed sourcefreepaidC/C++

faster-whisper

CTranslate2-based reimplementation. ~4× faster than reference Whisper at the same accuracy.

LinuxWindowsmacOSopen sourcefreePython

OpenAI Whisper

Reference Python implementation. Accurate but slower than the C++ ports; useful when you need the exact research behaviour.

LinuxWindowsmacOSopen sourcefreePython

RealtimeSTT

Low-latency streaming wrapper around faster-whisper for live dictation pipelines.

LinuxWindowsmacOSopen sourcefreePython

Vosk

Lightweight offline speech recognizer with 20+ language models. Real-time on CPU.

LinuxWindowsmacOSmobileopen sourcefreePython

Whisper.cpp

C++ port of OpenAI Whisper with GGUF quantization. Runs on CPU, Metal, CUDA, Vulkan.

LinuxWindowsmacOSmobileopen sourcefreeC/C++

WhisperX

faster-whisper plus forced alignment, voice-activity detection, and speaker diarization.

LinuxWindowsmacOSopen sourcefreePython

Voice — Text-to-Speech · 6

Bark

Multilingual generative audio. Speech, sound effects, and music cues from text prompts.

LinuxWindowsmacOSopen sourcefreePython

Coqui TTS

Comprehensive TTS toolkit. Multiple architectures (Tacotron, VITS, XTTS) and voice cloning.

LinuxWindowsmacOSopen sourcefreePython

Kokoro

Tiny ~80M-param TTS model, surprisingly natural for the size. Suitable for low-end hardware.

LinuxWindowsmacOSopen sourcefreePython

Mimic 3

Mycroft's neural TTS engine. Lightweight, multilingual.

LinuxWindowsmacOSmobileopen sourcefreePython

Piper

Fast neural TTS. ONNX runtime, dozens of voices and languages. Designed for Raspberry Pi-class hardware.

LinuxWindowsmacOSmobileopen sourcefreeC/C++

StyleTTS 2

High-fidelity expressive TTS with style transfer. Strong reference voice cloning.

LinuxWindowsmacOSopen sourcefreePython

Image Generation · 7

AUTOMATIC1111 / Stable Diffusion WebUI

The original ergonomic SD UI. Heavy plugin ecosystem.

LinuxWindowsmacOSopen sourcefreePython

ComfyUI

Node-graph workflow editor for diffusion models. Powers most modern local image and video pipelines.

LinuxWindowsmacOSopen sourcefreePython

Fooocus

Image generator with sane defaults — minimal knobs for great results. Built on top of Stable Diffusion.

LinuxWindowsmacOSopen sourcefreePython

Forge

Performance-tuned A1111 fork by lllyasviel. Lower VRAM, faster on modern GPUs.

LinuxWindowsmacOSopen sourcefreePython

InvokeAI

Pro-grade SD UI with strong canvas / inpainting tools. Enterprise tier; free local install remains open source.

LinuxWindowsmacOSopen sourceclosed sourcefreepaidPython

SD.Next

All-in-one fork of A1111 with broader backend support (Diffusers, ONNX, ROCm).

LinuxWindowsmacOSopen sourcefreePython

SwarmUI

Modular UI built on top of ComfyUI. User-friendly mode out of the box, full node-graph available when you need it.

LinuxWindowsmacOSopen sourcefreeTypeScript

Video Generation · 2

ComfyUI + LTX Video

ComfyUI nodes drive Lightricks LTX video models for text-to-video and image-to-video generation. The chunked-loop pattern (released in our [comfyui-workflows](https://github.com/BrethofAI/comfyui-workflows)) produces longer outputs than vanilla LTX allows.

LinuxWindowsmacOSopen sourcefreePython

Wan2GP

Stripped-down Wan2.2 video pipeline for low-VRAM consumer GPUs.

LinuxWindowsmacOSopen sourcefreePython

Code Assistants · 5

Aider

Terminal pair-programming. Bring-your-own-LLM via LiteLLM — run with Ollama or any OpenAI-compatible local endpoint.

LinuxWindowsmacOSopen sourcefreePython

Continue

IDE assistant with first-class local-LLM support. Defaults can be set to Ollama / LM Studio. VS Code + JetBrains.

LinuxWindowsmacOSopen sourcefreeTypeScript

Llama.vim

Vim plugin that streams llama.cpp completions inline. No cloud.

LinuxWindowsmacOSopen sourcefreeC/C++

Tabby

Self-hosted GitHub Copilot alternative. Local model serving with IDE plugins.

LinuxWindowsmacOSopen sourcefreeRust

twinny

Free local AI extension for VS Code. Chat + autocomplete via Ollama.

LinuxWindowsmacOSopen sourcefreeTypeScript

Local Agents · 3

Aider in /architect mode

Aider's planning mode separates "decide" and "edit" steps; works well with strong local reasoning models.

LinuxWindowsmacOSopen sourcefreePython

Continue Agent mode

Agentic editing flow inside Continue. Pair with a local model for fully-offline coding agents.

LinuxWindowsmacOSopen sourcefreeTypeScript

Open Interpreter

Code-execution agent that runs Python/shell on your machine. Local-LLM friendly.

LinuxWindowsmacOSopen sourcefreePython

Vector Databases · 6

Chroma

Embedding database designed for local-first usage. SQLite-style single-file or client/server.

LinuxWindowsmacOSopen sourcefreePython

Faiss

Library for similarity search. The retrieval engine inside many of the others.

LinuxWindowsmacOSopen sourcefreeC/C++

LanceDB

Embedded, columnar vector DB. Single-file, no server.

LinuxWindowsmacOSopen sourcefreeRust

Marqo

End-to-end vector search; OSS core, paid hosted version.

LinuxWindowsmacOSopen sourceclosed sourcefreepaidPython

Qdrant

High-performance vector DB. Self-host the open-source binary.

LinuxWindowsmacOSopen sourcefreeRust

Weaviate

Hybrid (vector + keyword) DB. Self-host the OSS distribution; cloud is optional.

LinuxWindowsmacOSopen sourcefreeGo

Embeddings · 3

BGE

BAAI's BGE family. Strong English + multilingual variants. Run via llama.cpp, sentence-transformers, or fastembed.

LinuxWindowsmacOSopen sourcefreePython

fastembed

Lightweight CPU-friendly embedding library by Qdrant.

LinuxWindowsmacOSopen sourcefreePython

Sentence Transformers

Reference Python library for sentence + paragraph embeddings.

LinuxWindowsmacOSopen sourcefreePython

Training & Fine-tuning · 5

Axolotl

Config-driven fine-tuning framework. LoRA, QLoRA, full fine-tunes.

Linuxopen sourcefreePython

diffusion-pipe

Pipeline-parallel trainer for diffusion models. Multi-GPU LoRA on large image / video models.

Linuxopen sourcefreePython

MLX

Apple's native ML framework for Apple Silicon. Train and infer on M-series Macs without CUDA workarounds.

macOSopen sourcefreePython

Ostris ai-toolkit

LoRA training UI for Flux, SD3, SDXL, LTX. Works on consumer hardware.

LinuxWindowsopen sourcefreePython

Unsloth

Fine-tune LLMs 2× faster with 70% less VRAM than reference HuggingFace pipelines.

LinuxWindowsmacOSopen sourcefreePython

Local Search & RAG · 5

Anything LLM

Self-hosted workspace tool with integrated RAG. Listed twice intentionally — strong both as a chat app and a RAG layer.

LinuxWindowsmacOSopen sourcefreeTypeScript

LlamaIndex

Toolkit for building RAG pipelines. Works fully offline with local models + vector DBs.

LinuxWindowsmacOSopen sourcefreePython

Perplexica

Open-source AI search powered by SearXNG + your local LLM.

LinuxWindowsmacOSopen sourcefreeTypeScript

PrivateGPT

Ingest documents locally and query them with an offline LLM.

LinuxWindowsmacOSopen sourcefreePython

SearXNG

Self-hosted meta-search engine. Pair with a local LLM for an offline Perplexity-style assistant.

LinuxWindowsmacOSopen sourcefreePython

Operating Systems Tuned for AI · 5

Bazzite

Container-native gaming and AI distro. Steam Deck-friendly, latest drivers, easy CUDA.

Linuxopen sourcefree

Bluefin

Fedora-based, atomic, container-first. Good "drop you in a known state" workstation for AI work.

Linuxopen sourcefree

CachyOS

Arch-based desktop distro with a tuned kernel and recent NVIDIA / AMD drivers. Sane out-of-the-box for new GPUs (Blackwell, RDNA 4).

Linuxopen sourcefree

NixOS

Reproducible system config. Best when you need identical CUDA + ML toolchain across machines.

Linuxopen sourcefree

Pop!_OS

System76's NVIDIA-friendly desktop distro. ISO ships with proprietary drivers for plug-and-play GPU work.

Linuxopen sourcefree

Hardware-Specific Runtimes · 4

MLX

Apple Silicon-native ML library. Already listed under training; it also ships an inference runtime competitive with llama.cpp on M-series.

macOSopen sourcefreePython

NVIDIA TensorRT-LLM

NVIDIA's optimised LLM runtime for their data-center and consumer GPUs. Closed-weights binary; fastest CUDA path for many models.

LinuxWindowsclosed sourcefreePython

OpenVINO

Intel's inference toolkit. CPU, iGPU, dGPU (Arc), and NPU support for Intel laptops.

LinuxWindowsmacOSopen sourcefreeC/C++

ROCm + llama.cpp HIP

AMD GPU inference path. Llama.cpp's HIP backend now reaches CUDA parity on RDNA 3/4 in many workloads.

LinuxWindowsmacOSopen sourceC/C++

Run something great locally? Tell us.

Open a PR — or email us the tool and we'll check it. Real VRAM/disk numbers, not vibes.

Open a PR Email a suggestion

Everything we build

External:   YouTube · GitHub