How to Deploy Your First AI Agent to a VPS with Docker

By Alex M · Posted on 1st June 2026

You don't need an agent framework to put an AI agent into production. A 30-line Python file, a Dockerfile, and a $5 VPS will get you running. This tutorial walks through building a minimal AI agent in FastAPI, packaging it with Docker, and shipping it to your own server with Git-based auto-deploys — so every push to main rolls out cleanly, with one-click rollback if it goes wrong.

By the end you'll have a working /ask endpoint that takes a prompt, calls an LLM, and returns a response — running on infrastructure you control, with a clear path to harden, monitor, and grow into something bigger.

What you'll build

A small HTTP service that wraps an LLM call:

FastAPI for the HTTP layer — lightweight, async, good docs
The OpenAI Python SDK for the model call — swap freely for Anthropic, OpenRouter, or any OpenAI-compatible API
Docker to package the app
A VPS running Ubuntu — Hetzner, DigitalOcean, Vultr, and Linode all work fine
DeployHQ to connect your GitHub repo to the server, so a git push becomes a deploy

We're deliberately skipping the agent frameworks (LangChain, LangGraph, CrewAI, Mastra). Frameworks become useful once you need tool calls, memory, or multi-step planning — but they obscure what's actually happening on your first deploy. Start raw, add what you need.

Why a VPS instead of a serverless platform? Serverless works well for very bursty agents, but it has three drawbacks here. It penalises long-running tool calls (some agent steps take 30+ seconds). Cold starts add latency. Per-invocation pricing gets expensive once traffic is sustained. A VPS costs around $5/month, has predictable latency, and runs background workers and persistent connections without workarounds.

Prerequisites

An OpenAI API key (or any compatible provider)
A GitHub account with a fresh empty repo
A VPS with Ubuntu 22.04 or 24.04 and SSH access
A DeployHQ account (free trial available — sign-up link at the end)
Python 3.12+ and Docker (with the Compose plugin, which Docker Desktop includes) installed locally

Step 1: Build a minimal agent

Create a project directory:

mkdir my-first-agent && cd my-first-agent
mkdir app

Add app/main.py:

import logging
import os

from fastapi import FastAPI, HTTPException
from openai import OpenAI, OpenAIError
from pydantic import BaseModel

logger = logging.getLogger("agent")

app = FastAPI(title="My First Agent")
# os.environ[...] raises KeyError at startup if the key is missing, so a misconfigured deploy fails fast.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

SYSTEM_PROMPT = (
    "You are a concise assistant. Answer in one short paragraph "
    "unless the user asks for more detail."
)

class AskRequest(BaseModel):
    prompt: str

class AskResponse(BaseModel):
    reply: str

@app.post("/ask", response_model=AskResponse)
def ask(req: AskRequest):
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": req.prompt},
            ],
        )
    except OpenAIError:
        # Log the full error server-side, but don't leak provider details to callers.
        logger.exception("LLM call failed")
        raise HTTPException(status_code=502, detail="Upstream model error")
    # message.content can be None (e.g. on a refusal), so fall back to an empty string.
    return AskResponse(reply=response.choices[0].message.content or "")

@app.get("/health")
def health():
    return {"status": "ok"}

Two things matter here:

The API key is read from the environment. Never hardcode secrets. We'll inject OPENAI_API_KEY from DeployHQ later.
There's a /health endpoint. Docker, load balancers, and DeployHQ all use health checks to know whether the agent is alive. Add one from day one.
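As the agent grows more settings, it can help to keep the "read from the environment, fail fast if missing" rule in one place. This is a minimal sketch and not part of the tutorial's app. Settings and load_settings are hypothetical names. AGENT_MODEL is an assumed optional variable for overriding the model, since main.py above hardcodes gpt-4o-mini.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, failing fast if the API key is absent."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Crash at startup with a clear message rather than on the first /ask request.
        raise RuntimeError(
            "OPENAI_API_KEY is not set: add it to .env locally or to the "
            "DeployHQ config file on the server"
        )
    # AGENT_MODEL is a hypothetical override; the default matches main.py.
    return Settings(openai_api_key=key, model=env.get("AGENT_MODEL", "gpt-4o-mini"))
```

In main.py you could call settings = load_settings() once at import time and pass settings.openai_api_key to OpenAI(...). Accepting any mapping, not just os.environ, makes the function easy to test with a plain dict.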

Install dependencies and freeze them:

python -m venv .venv
source .venv/bin/activate
pip install fastapi 'uvicorn[standard]' openai
pip freeze > requirements.txt

The pip freeze step pins exact versions so the container builds the same way every time. Skip it and you'll eventually get a working laptop build and a broken server build that nobody can reproduce.
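A dependency can still slip in without a pin, for example when someone edits requirements.txt by hand. A CI job or pre-commit hook could catch this by checking that every line uses an exact == version. This is a rough sketch with hypothetical helper names. It only understands simple name==version lines, optional extras, comments, and blank lines. Anything else (such as -e editable installs or environment markers) gets flagged for a human to look at.

```python
import re
from pathlib import Path

# Matches "package==1.2.3" and "package[extra]==1.2.3"; intentionally simple.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==\S+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    bad = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line and not PINNED.match(line):
            bad.append(line)
    return bad


def check_requirements_file(path: str = "requirements.txt") -> int:
    """Print each unpinned line and return a shell-style exit code (0 means all pinned)."""
    problems = unpinned_requirements(Path(path).read_text())
    for line in problems:
        print(f"unpinned: {line}")
    return 1 if problems else 0
```

A CI step could run sys.exit(check_requirements_file()) so the build fails whenever anything is unpinned.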

Step 2: Package it with Docker

Add a Dockerfile at the project root:

FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app/ ./app/

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

And a .dockerignore to keep the build context lean:

.venv
__pycache__
.env
.git

For local testing and the eventual deploy, add docker-compose.yml:

services:
  agent:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env
    restart: unless-stopped

Compose makes environment variables, port mapping, and restart policies declarative — easier to reason about than long docker run flags.

Step 3: Run it locally

Create a .env file (and make sure .env is in .gitignore):

OPENAI_API_KEY=sk-...

Then:

docker compose up --build

In another terminal:

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain Docker layers in one sentence."}'

You should get back JSON like {"reply": "..."} containing a one-sentence answer. If you do, the agent works. Everything from here is about getting it onto a VPS and keeping it there.
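Sending the same request from Python is handy for scripted smoke tests, first against localhost and later against the VPS. This is a minimal standard-library sketch. build_ask_request and ask_agent are hypothetical helpers. The "reply" key matches the AskResponse model in app/main.py.

```python
import json
import urllib.request


def build_ask_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST /ask request with a JSON body, mirroring the curl command above."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(base_url: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the reply text; raises on HTTP or network errors."""
    with urllib.request.urlopen(build_ask_request(base_url, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["reply"]
```

For example, ask_agent("http://localhost:8000", "Explain Docker layers in one sentence.") returns the reply string. If the endpoint responds with a 4xx or 5xx error, urlopen raises urllib.error.HTTPError.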

Step 4: Prepare the VPS

SSH into your VPS and install Docker:

ssh root@your-vps-ip
curl -fsSL https://get.docker.com | sh

Create a non-root user for deploys — DeployHQ will SSH in as this user:

adduser deploy
usermod -aG docker deploy
mkdir -p /home/deploy/agent
chown deploy:deploy /home/deploy/agent

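Copy your own SSH key to the deploy user so you can log in as that user without a password. From your local machine:

ssh-copy-id deploy@your-vps-ip

This relies on password login being enabled on the server. If your provider disables it, copy root's keys across instead, running this as root on the VPS: mkdir -p /home/deploy/.ssh && cp /root/.ssh/authorized_keys /home/deploy/.ssh/ && chown -R deploy:deploy /home/deploy/.ssh && chmod 700 /home/deploy/.ssh

You should now be able to ssh deploy@your-vps-ip without a password prompt. DeployHQ doesn't use your personal key, though. It authenticates with its own per-project public key, which DeployHQ displays when you add the server. Append that key to /home/deploy/.ssh/authorized_keys before the first deploy.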

Step 5: Wire it up with DeployHQ

Push your code to GitHub first:

printf '.env\n.venv/\n__pycache__/\n' > .gitignore
git init -b main && git add . && git commit -m "Initial agent"
git remote add origin git@github.com:you/my-first-agent.git
git push -u origin main

In DeployHQ:

Create a new project and point it at your GitHub repository. DeployHQ supports deploying directly from a GitHub repo to your server without you wrangling webhooks by hand.
Add a server. Use the VPS IP, port 22, username deploy, and the SSH key you just authorised. Set the deployment path to /home/deploy/agent.
Configure the SSH command. This is where DeployHQ earns its keep on Docker workflows. Under the server's SSH commands, add a single step that runs after the files are transferred: cd /home/deploy/agent && docker compose up -d --build
Set the OpenAI API key as a config file. In DeployHQ's server config, add a config file at /home/deploy/agent/.env containing OPENAI_API_KEY=sk-.... DeployHQ writes it to disk before every deploy, so Compose picks it up — and the secret never lives in your repo or your image.
Enable auto-deploy on push to main. Every push to main, including merged pull requests, now triggers a deploy.

Trigger the first deploy manually from the DeployHQ dashboard. The log shows DeployHQ checking out the repo, uploading the files to /home/deploy/agent, writing the .env config file, and running docker compose up -d --build over SSH. About a minute later, rerun the curl command from Step 3 against http://your-vps-ip:8000/ask, and your agent should respond.
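A script or CI job can poll the /health endpoint until it answers, rather than guessing when the build has finished. This is a minimal standard-library sketch. wait_for_health is a hypothetical helper. The 120-second timeout and 5-second interval are illustrative defaults, not DeployHQ settings.

```python
import http.client
import time
import urllib.request


def wait_for_health(url: str, timeout_s: float = 120.0, interval_s: float = 5.0) -> bool:
    """Poll url until it returns a 2xx status or timeout_s elapses; True means healthy."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # Not up yet: connection refused, reset, timed out, or an HTTP error status.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

wait_for_health("http://your-vps-ip:8000/health") returns True once the container is serving. After you bind port 8000 to localhost in Step 6, point it at your Nginx HTTPS URL instead.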

If a deploy ever goes wrong, DeployHQ's one-click rollback puts the previous version back on the server in seconds — no need to revert commits or rebuild containers manually.

Step 6: Production hardening

A working deploy is the floor, not the ceiling. Five things to add before the agent does anything real:

1. Put Nginx in front. Don't expose port 8000 to the internet. Bind the container to localhost in Compose:

ports:
  - "127.0.0.1:8000:8000"

Then run Nginx as a reverse proxy with Let's Encrypt for HTTPS. Same pattern works for any service on the box.

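2. Add a real health check so Docker can tell when the agent stops responding. After the configured number of failed checks, Docker marks the container unhealthy. Plain Docker and Compose won't restart it on their own, because restart: unless-stopped only reacts when the process exits. Pair this with monitoring or a watchdog if you need automatic recovery: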

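healthcheck:
  # python:3.12-slim ships without curl, so probe /health with Python's standard library
  test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=3)"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 10s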

3. Cap resource use under services.agent so a runaway agent can't take the VPS down:

deploy:
  resources:
    limits:
      memory: 512M
      cpus: "1.0"

4. Pin the base image by digest, not just by tag. Use python:3.12-slim@sha256:... so an upstream rebuild can't silently change your build. When you want the base image's security updates, bump the digest deliberately.
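A check like this can run in CI to flag FROM lines that still float on a tag. This is a rough sketch with a hypothetical unpinned_base_images helper. It reads one instruction per line. It skips scratch and references to earlier build stages (FROM <stage>). It treats build-argument images such as FROM ${BASE} as unpinned.

```python
import re

# FROM [--platform=...] image[:tag][@sha256:digest] [AS name]
FROM_LINE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
STAGE_ALIAS = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return base images referenced by FROM that are not pinned with @sha256:."""
    unpinned = []
    stage_names = set()
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        image = match.group(1)
        if image != "scratch" and image not in stage_names and "@sha256:" not in image:
            unpinned.append(image)
        # Remember "AS name" so later stages built FROM an earlier stage aren't flagged.
        alias = STAGE_ALIAS.search(line)
        if alias:
            stage_names.add(alias.group(1))
    return unpinned
```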

5. Watch your spend. LLM calls are the expensive part of an agent — far more than the server. Add per-request token logging now, so you know what each /ask costs before traffic grows. A 30-line wrapper around client.chat.completions.create is enough.
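Here is a sketch of that wrapper. It assumes the OpenAI Python SDK's response shape, where response.usage carries prompt_tokens and completion_tokens. logged_completion and estimate_cost_usd are hypothetical helpers. The per-million-token prices are parameters you fill in from your provider's current pricing page; none are hardcoded here. The demo at the bottom uses a fake client so it runs offline.

```python
import logging
import time
from types import SimpleNamespace
from typing import Any

logger = logging.getLogger("agent.usage")


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      input_per_million: float, output_per_million: float) -> float:
    """Estimate a request's cost from token counts and per-million-token prices you supply."""
    return (prompt_tokens * input_per_million
            + completion_tokens * output_per_million) / 1_000_000


def logged_completion(client: Any, *, input_per_million: float = 0.0,
                      output_per_million: float = 0.0, **kwargs: Any) -> Any:
    """Call client.chat.completions.create(**kwargs) and log tokens, latency, and estimated cost."""
    start = time.perf_counter()
    response = client.chat.completions.create(**kwargs)
    latency_ms = (time.perf_counter() - start) * 1000
    usage = getattr(response, "usage", None)  # some OpenAI-compatible providers omit usage
    if usage is None:
        logger.warning("model=%s returned no usage data", kwargs.get("model"))
        return response
    cost = estimate_cost_usd(usage.prompt_tokens, usage.completion_tokens,
                             input_per_million, output_per_million)
    logger.info(
        "model=%s prompt_tokens=%d completion_tokens=%d latency_ms=%.0f est_cost_usd=%.6f",
        kwargs.get("model"), usage.prompt_tokens, usage.completion_tokens, latency_ms, cost,
    )
    return response


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Offline demo: a stand-in client whose create() returns a canned usage object.
    fake_client = SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(
        create=lambda **kwargs: SimpleNamespace(
            usage=SimpleNamespace(prompt_tokens=120, completion_tokens=45)))))
    logged_completion(fake_client, model="gpt-4o-mini", messages=[],
                      input_per_million=1.0, output_per_million=2.0)
```

In app/main.py you would swap client.chat.completions.create(...) for logged_completion(client, model=..., messages=..., input_per_million=..., output_per_million=...). The pricing arguments are keyword-only parameters of the wrapper, so they are never forwarded to the SDK call.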

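For the deploys themselves, DeployHQ supports zero downtime deployments. Keep in mind that docker compose up -d --build recreates the container, so a single-container setup like this one has a brief gap in availability during each release. Once you graduate to multiple replicas behind a load balancer, the same Git-based rollout pattern still applies.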

Where to go from here

The agent you just deployed is intentionally tiny: one endpoint, one LLM call, no tools, no memory. That's a good starting point, but most useful agents need more. Two paths from here:

Add capabilities to your own code. Function calling (tools) is the natural next step. The OpenAI SDK supports it out of the box, with no framework required. You add a tools list to chat.completions.create, and the model replies with the tool it wants to call and its arguments. You execute the tool, send the result back as a tool message, and loop until the model answers in plain text. A few hundred more lines of Python turns the agent above into something genuinely useful.
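The loop described above can be sketched without a framework. This minimal sketch assumes the chat-completions tool-calling message shapes: a tools list of "function" entries, tool_calls on the assistant message, and "tool" role messages carrying results. The add tool, execute_tool, run_agent, and FakeCompletions are hypothetical. The fake client replays canned replies so you can exercise the loop offline; with a real key you would pass OpenAI() instead.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def add(a: float, b: float) -> float:
    """Hypothetical example tool: replace with functions your agent actually needs."""
    return a + b


# Maps tool names the model may request to local Python functions.
TOOL_FUNCTIONS: dict[str, Callable[..., Any]] = {"add": add}

# Tool schema in the chat-completions "function" format (JSON Schema parameters).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers.",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
}]


def execute_tool(name: str, arguments_json: str) -> str:
    """Run one requested tool and return a JSON string to send back to the model."""
    func = TOOL_FUNCTIONS.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool {name!r}"})
    try:
        result = func(**json.loads(arguments_json or "{}"))
    except Exception as exc:  # report bad arguments to the model instead of crashing
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


def run_agent(client: Any, prompt: str, model: str = "gpt-4o-mini", max_steps: int = 5) -> str:
    """Call the model, execute any tool calls it makes, and repeat until it answers in text."""
    messages: list[dict[str, Any]] = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        response = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content or ""
        # Echo the assistant's tool request so each tool result has a matching call id.
        messages.append({
            "role": "assistant",
            "content": msg.content,
            "tool_calls": [
                {"id": tc.id, "type": "function",
                 "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
                for tc in msg.tool_calls
            ],
        })
        for tc in msg.tool_calls:
            messages.append({"role": "tool", "tool_call_id": tc.id,
                             "content": execute_tool(tc.function.name, tc.function.arguments)})
    raise RuntimeError(f"agent did not finish within {max_steps} steps")


class FakeCompletions:
    """Offline stand-in for client.chat.completions that replays canned messages in order."""

    def __init__(self, messages: list[Any]) -> None:
        self._queue = list(messages)

    def create(self, **kwargs: Any) -> Any:
        return SimpleNamespace(choices=[SimpleNamespace(message=self._queue.pop(0))])


if __name__ == "__main__":
    call = SimpleNamespace(id="call_1",
                           function=SimpleNamespace(name="add", arguments='{"a": 2, "b": 3}'))
    fake = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions([
        SimpleNamespace(content=None, tool_calls=[call]),       # step 1: model asks for add(2, 3)
        SimpleNamespace(content="2 + 3 = 5", tool_calls=None),  # step 2: model answers in text
    ])))
    print(run_agent(fake, "What is 2 + 3?"))  # prints "2 + 3 = 5"
```

max_steps caps how many model round-trips a single request may take, which also caps its cost. Tool errors go back to the model as JSON, so it can retry or explain the problem rather than failing the whole request.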

Adopt an orchestrator when you start hand-rolling skills, memory, or multi-step workflows. A few we've covered in depth:

Self-host the Paperclip agent orchestrator on a VPS with Docker — a Docker-first, pre-built orchestrator that uses the same DeployHQ deploy pattern as this tutorial.
Deploy OpenClaw on a VPS with SSL and auto-deploys — a self-hosted AI assistant with a plugin (Skills) system you can extend.
Deploy Hermes Agent on a VPS — for self-improving agents that learn persistent skills over time.

If you want agents inside your deployment pipeline rather than as a deployed service, see how AI agents fit into CI/CD pipelines from GitHub issue to production deploy — same building blocks, very different shape.

And if you'd rather drive deploys from a terminal AI agent than build your own service, the DeployHQ CLI lets Claude Code, Cursor, or Codex trigger deploys for you, and you can pair it with Google's open-source Gemini CLI for an interactive terminal agent.

Ready to wire your first agent — or your tenth — to a Git-based deploy pipeline? Start a free DeployHQ trial and have your first agent deploying from main in under ten minutes.

Need a hand? Email us at support@deployhq.com or find us on X at @deployhq.