Build your own AI agent.

You can get a working AI agent running on your own hardware this weekend. The hard part isn't any single tool — it's that four different things all get called “an AI agent.” Here's the honest map, the tools worth knowing, and where paying someone actually earns its keep.

A self-hosted agent is just four layers stacked on each other. Confusing them is why most people stall before they start — so here's what each one is.

The four layers

The model is the brain — Llama, Qwen, DeepSeek or a Hermes fine-tune. You either run it on your own machine or call a hosted API with your own key (OpenAI, Anthropic, or AWS Bedrock).
The runtime loads and serves that model. On a Mac mini, Ollama or LM Studio is a one-command start.
The agent framework (or “harness”) is what turns a model that only talks into something that acts: it reads files, runs commands, calls your tools, and remembers.
The chat UI is the front door — the window or messenger you actually talk to it through.

Almost every tool below is free to self-host. Your real cost is the model: either a machine with enough memory, or a few cents per request to a hosted API. Keep that in mind as the price tag — not the software.

1 · Run a model on your own hardware

The bottleneck is memory. A Mac mini with Apple's unified memory is a capable single-user box: 16 GB runs a 3–7B model at 4-bit, 32 GB handles 14B comfortably and 24–32B with tighter quantization, and a 48–64 GB M4 Pro reaches 30B-class models at roughly 12–18 tokens a second — about reading speed, fine for chat, slow for bulk work. Many people at once, or the very largest models? Then a GPU server (an AWS EC2 GPU instance, say) — or just a hosted API key — is the saner path.

The fastest way to see it work: install Ollama, pull a model like Qwen or Llama with one command, and point a chat UI such as Open WebUI at it. Ten minutes later you have private AI running on your own machine — no agent yet, but the floor everything else stands on.

Runtimes — load and serve the model

Ollama — The simplest start: one command downloads and runs a quantized model, with a local API other tools can call. Open source (MIT). (GitHub)
LM Studio — A polished desktop app to find, download and run local models, with a chat window and an OpenAI-compatible server. Closed-source but free for personal and work use.
llama.cpp — The dependency-free C/C++ engine most local tools are built on; runs models efficiently on CPUs and Apple Silicon. Open source (MIT). (GitHub)
vLLM — When you outgrow a single user: a high-throughput serving engine built for production and concurrency. Open source (Apache-2.0). (GitHub)

The model itself is a separate download. A fair warning before you pick one: “open weights” rarely means “open source” in the strict sense — each family ships its own licence, and a few restrict commercial use or attach an acceptable-use policy.

Open-weight models

Meta Llama — The most widely supported open-weight family, from 8B up to the largest open-weight releases. Source-available under Meta's own Llama Community Licence — not OSI open source, with a clause for very large deployments. (GitHub)
Qwen — Alibaba's family, from tiny 0.6B models to large mixture-of-experts, strong at reasoning and code. Properly permissive (Apache-2.0). (GitHub)
DeepSeek — The V3 base and R1 reasoning models. R1 and the current V3 weights are MIT-licensed; the original V3 release used DeepSeek's own model licence — check the version you pull. (GitHub)
Mistral / Mixtral — Lean European models with strong cost/performance; the classic Mistral 7B and Mixtral are Apache-2.0, though some newer ones aren't. (GitHub)
Nous Hermes — Fine-tunes of open base models tuned for steerability and tool-calling — a good “agent brain.” Inherits its base model's licence (e.g. Llama's). (GitHub)

2 · The agent — frameworks that act

A model on its own just produces text. A harness gives it hands: tools, file access, a command line, memory. A new crop of “personal assistant you run yourself” projects let you drive an agent from a chat app you already use — WhatsApp, Telegram, Slack — and there's a mature set of developer-focused agents if your work is code.

Personal & general-purpose agents

OpenClaw — A self-hosted personal agent you run on your own machine and drive from chat apps — WhatsApp, Telegram, Slack, Signal and more; it reads your files, handles calendar and email, browses, and runs commands. Open source (MIT). (GitHub)
NanoClaw — A deliberately tiny, container-isolated take on the same idea — small enough to actually read the code, with each agent sandboxed in Docker. Open source (MIT). (GitHub)
Hermes Agent — Nous Research's self-improving CLI agent with a learning loop, ~40 built-in tools and multi-channel access. Model-agnostic and open source (MIT). Note: the agent, not the same-named model. (GitHub)
Goose — Block's extensible on-machine agent (desktop, CLI and API) that connects to tools via MCP and works with any LLM. Open source (Apache-2.0). (GitHub)
Open Interpreter — Lets an LLM run code on your machine through a natural-language prompt — the simplest “make my computer do things” agent. Open source (Apache-2.0). (GitHub)

If your work is code

OpenHands — If your “tools” are codebases: an agent that edits code, runs commands and calls APIs like a developer would. Open source (MIT), with a paid cloud. Formerly OpenDevin. (GitHub)
Aider — A terminal AI pair-programmer with deep git integration that edits across a whole repo and works with 100+ models. Open source (Apache-2.0). (GitHub)

A caution before you go further: these agents run commands and touch your files. That's what makes them useful, and it's why the security section below isn't optional.

3 · Orchestration & no-code builders

For multi-step processes, or to coordinate several sub-agents — or if you'd just rather wire logic on a canvas than write code — you want an orchestration layer. These run from code libraries to drag-and-drop builders; AG2 and Rivet are worth a look too.

Frameworks & builders

LangGraph — A low-level library for stateful, graph-shaped agent workflows with durable execution and human-in-the-loop checkpoints. Open source (MIT). (GitHub)
CrewAI — Orchestrates a “crew” of role-playing agents that split a task between them. Lean and standalone. Open source (MIT). (GitHub)
Letta — Built around persistent long-term memory, so agents remember across sessions. Formerly MemGPT. Open source (Apache-2.0). (GitHub)
smolagents — Hugging Face's ~1,000-line library whose agents write their actions as Python — minimal and model-agnostic. Open source (Apache-2.0). (GitHub)
Dify — A mostly visual platform for agentic workflows, RAG and LLM apps, self-hostable. Source-available under its own licence (Apache base plus a few restrictions). (GitHub)
Flowise — Drag-and-drop builder for chatbots and agents — a visual way in without much code. Open source (Apache-2.0). (GitHub)
n8n — Workflow automation with hundreds of integrations and native AI nodes — the bridge between “agent” and “plain automation.” Source-available (fair-code), free to self-host. (GitHub)

4 · A face for it — chat UIs

If you want a clean window to talk to your model — in a browser, on your phone, across your team — a self-hosted chat UI gives you that without sending a word to anyone else's servers.

Open WebUI — A feature-rich self-hosted UI for Ollama and OpenAI-compatible APIs, with RAG and tools. Source-available (BSD with a branding clause). (GitHub)
LibreChat — A multi-model, ChatGPT-style web UI with agents, MCP, search and multi-user support. Open source (MIT). (GitHub)
AnythingLLM — An all-in-one desktop/Docker app for private chat over your own documents, with agents and workspaces. Open source (MIT). (GitHub)
Khoj — A self-hostable “second brain” that answers from your docs or the web and schedules its own automations. Open source (AGPL-3.0). (GitHub)

How it plugs into your tools — MCP

The standard that connects an agent to the rest of your stack is the Model Context Protocol (MCP), Anthropic's open standard — now community-run — for connecting an agent to your tools and data. Gmail, your calendar, a database, your CRM: each becomes an “MCP server” the agent can call. Most of the frameworks above speak MCP, so a connector you set up once works across them.

Model Context Protocol — The official docs for the open standard that connects agents to tools and data — start here to understand the wiring.
awesome-mcp-servers — A large community-curated index of ready-made MCP servers, from Gmail and Slack to Postgres and filesystems. Open source (MIT). (GitHub)

The part most people skip — security

An agent that can act for you can also do the wrong thing in your name. A few rules before you point one at real accounts:

Keep secrets in a vault, never in a prompt.
Give it the fewest permissions it can actually work with.
Make it ask first before anything destructive, or anything that leaves your network.
Don't put it on the open internet.

OWASP publishes short risk lists written specifically for this — worth a read before you connect anything live.

OWASP Top 10 for LLM Applications — The canonical list of LLM risks — prompt injection, data leakage, excessive agency and more. The baseline checklist.
OWASP Top 10 for Agentic Applications — The agent-specific follow-up: goal hijacking, identity abuse, runaway autonomy, and where to keep a human in the loop.
OWASP Secrets Management Cheat Sheet — Practical guidance on storing, rotating and least-privileging the API keys and logins your agent needs.

Want to go deeper?

Anthropic — Building Effective AI Agents — The most-cited primer on the difference between workflows and agents, with patterns you'll reuse. Free to read.
OpenAI — A Practical Guide to Building Agents — A free PDF covering models, tools, guardrails and orchestration, drawn from real deployments.
Hugging Face — AI Agents Course — A free, hands-on course from the fundamentals up through smolagents and LangGraph. (GitHub)

Where I fit. Everything above is real and reachable — you can wire it together yourself, and if you want to learn, those links are a real place to start. What I sell is different: a clean build with your actual tools, running on a platform that's already in production for paying customers, with the security set up right and someone keeping it running. If you'd rather build it yourself and just want a sounding board, ask — happy to point the way.

Or have it built — your own AI agent

Common questions.

Do I need a GPU?

Not necessarily. A Mac mini with 16–32 GB of unified memory runs small-to-mid models fine for one person. You only need a dedicated GPU for large models or many simultaneous users — and even then, a hosted API key is often cheaper than buying hardware.

Which model should I start with?

Whatever your runtime makes easy. Ollama pulls Llama or Qwen in one command, and an 8–14B model is plenty to learn on. Move up only when you hit a real limit.

Is my data safe if I self-host?

Safer by default — nothing leaves your machine unless you send it out. But “self-hosted” isn't automatically “secure”: an agent holding your API keys with broad permissions is a real attack surface. Read the OWASP LLM Top 10 before you connect it to live accounts.

Open source or your platform — what's the real difference?

The open-source pieces are free and capable; the work is assembling them cleanly, securing them, and keeping them running. My platform is that work already done and kept running. If you enjoy tinkering, build it yourself. If you'd rather it were just handled, that's what I sell.

Can you help if I get stuck building it myself?

Yes. Plenty of people start solo and call when an integration or the security gets fiddly. A short call is usually enough to unblock you.