You don't need an agent framework to put an AI agent into production. A 30-line Python file, a Dockerfile, and a $5 VPS will get you running. This tutorial walks through building a minimal AI agent in FastAPI, packaging it with Docker, and shipping it to your own server with Git-based auto-deploys — so every push to main rolls out cleanly, with one-click rollback if it goes wrong.
By the end you'll have a working /ask endpoint that takes a prompt, calls an LLM, and returns a response — running on infrastructure you control, with a clear path to harden, monitor, and grow into something bigger.
What you'll build
A small HTTP service that wraps an LLM call:
- FastAPI for the HTTP layer — lightweight, async, good docs
- The OpenAI Python SDK for the model call — swap freely for Anthropic, OpenRouter, or any OpenAI-compatible API
- Docker to package the app
- A VPS running Ubuntu — Hetzner, DigitalOcean, Vultr, and Linode all work fine
- DeployHQ to connect your GitHub repo to the server, so a git push becomes a deploy
We're deliberately skipping the agent frameworks (LangChain, LangGraph, CrewAI, Mastra). Frameworks become useful once you need tool calls, memory, or multi-step planning — but they obscure what's actually happening on your first deploy. Start raw, add what you need.
Why a VPS instead of a serverless platform? Serverless is fine for very bursty agents, but it punishes long-running tool calls (some agent steps take 30+ seconds), cold starts hurt latency, and the per-invocation pricing gets expensive once traffic is sustained. A VPS costs ~$5/month, has predictable latency, and lets you run background workers and persistent connections without contortion.
Prerequisites
- An OpenAI API key (or any compatible provider)
- A GitHub account with a fresh empty repo
- A VPS with Ubuntu 22.04 or 24.04 and SSH access
- A DeployHQ account (free trial available — sign-up link at the end)
- Python 3.12+ and Docker installed locally
Step 1: Build a minimal agent
Create a project directory:
mkdir my-first-agent && cd my-first-agent
mkdir app
Add app/main.py:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI
import os

app = FastAPI(title="My First Agent")

# Read the key from the environment; a missing key fails at startup, not on the first request.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

SYSTEM_PROMPT = (
    "You are a concise assistant. Answer in one short paragraph "
    "unless the user asks for more detail."
)


class AskRequest(BaseModel):
    prompt: str


class AskResponse(BaseModel):
    reply: str


@app.post("/ask", response_model=AskResponse)
def ask(req: AskRequest):
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": req.prompt},
            ],
        )
        # message.content can be None, so fall back to an empty string.
        return AskResponse(reply=response.choices[0].message.content or "")
    except Exception as exc:
        # Surface provider errors to the caller as a 500 with the error text.
        raise HTTPException(status_code=500, detail=str(exc))


@app.get("/health")
def health():
    return {"status": "ok"}
Two things matter here:
- The API key is read from the environment. Never hardcode secrets. We'll inject OPENAI_API_KEY from DeployHQ later.
- There's a /health endpoint. Docker, load balancers, and DeployHQ all use health checks to know whether the agent is alive. Add one from day one.
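The first point can be made slightly more defensive. As written, a missing OPENAI_API_KEY shows up as a bare KeyError when the module is imported. The sketch below assumes a hypothetical helper called require_env (it is not part of FastAPI or the OpenAI SDK) that fails with a more readable message instead:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a readable error.

    require_env is a hypothetical helper for this tutorial, not a library function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "define it in .env locally or in the server config before starting the app."
        )
    return value
```

In app/main.py this would stand in for the direct lookup, for example client = OpenAI(api_key=require_env("OPENAI_API_KEY")).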
Install dependencies and freeze them:
python -m venv .venv
source .venv/bin/activate
pip install fastapi 'uvicorn[standard]' openai
pip freeze > requirements.txt
The pip freeze step pins exact versions so the container builds the same way every time. Skip it and you'll eventually get a working laptop build and a broken server build that nobody can reproduce.
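If you want to guard against an unpinned line slipping into requirements.txt later, a small check can run before the build. This is a minimal sketch: the function name unpinned_requirements is made up for illustration, and the pattern only recognises plain name==version lines, so editable installs or URL requirements would be reported as unpinned.

```python
import re

# Matches "name==version", optionally with extras such as "uvicorn[standard]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==\S+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned with '=='."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

Calling it on the contents of requirements.txt returns an empty list when every dependency is pinned, which makes it easy to wire into a pre-commit hook or a test.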
Step 2: Package it with Docker
Add a Dockerfile at the project root:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
And a .dockerignore to keep the build context lean:
.venv
__pycache__
.env
.git
For local testing and the eventual deploy, add docker-compose.yml:
services:
  agent:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env
    restart: unless-stopped
Compose makes environment variables, port mapping, and restart policies declarative — easier to reason about than long docker run flags.
Step 3: Run it locally
Create a .env file (and make sure .env is in .gitignore):
OPENAI_API_KEY=sk-...
Then:
docker compose up --build
In another terminal:
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"prompt": "Explain Docker layers in one sentence."}'
You should get JSON back with a one-sentence answer. If you do, the agent works — everything from here is about getting it onto a VPS and keeping it there.
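The same request can be made from Python, which is handy once you start scripting checks against the service. The sketch below uses only the standard library; the function names build_ask_request and ask are illustrative, and the URL and JSON shape are the ones defined in app/main.py above.

```python
import json
import urllib.request


def build_ask_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build the POST request for the /ask endpoint defined in app/main.py."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the "reply" field of the JSON response."""
    request = build_ask_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["reply"]
```

With the container running, ask("http://localhost:8000", "Explain Docker layers in one sentence.") should return the same text the curl command printed.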
Step 4: Prepare the VPS
SSH into your VPS and install Docker:
ssh root@your-vps-ip
curl -fsSL https://get.docker.com | sh
Create a non-root user for deploys — DeployHQ will SSH in as this user:
adduser deploy
usermod -aG docker deploy
mkdir -p /home/deploy/agent
chown deploy:deploy /home/deploy/agent
Copy your SSH key to the deploy user so DeployHQ can authenticate without a password. From your local machine:
ssh-copy-id deploy@your-vps-ip
You should now be able to ssh deploy@your-vps-ip without a password prompt. If that works, DeployHQ can do the same.
Step 5: Wire it up with DeployHQ
Push your code to GitHub first:
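git init && git add . && git commit -m "Initial agent"
# Older Git versions name the first branch master; rename it so the push below works.
git branch -M main
git remote add origin git@github.com:you/my-first-agent.git
git push -u origin main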
In DeployHQ:
- Create a new project and point it at your GitHub repository. DeployHQ supports deploying directly from a GitHub repo to your server without you wrangling webhooks by hand.
- Add a server. Use the VPS IP, port 22, username deploy, and the SSH key you just authorised. Set the deployment path to /home/deploy/agent.
- Configure the SSH command. This is where DeployHQ earns its keep on Docker workflows. Under the server's SSH commands, add a single deploy step:
cd /home/deploy/agent && docker compose up -d --build
- Set the OpenAI API key as a config file. In DeployHQ's server config, add a config file at /home/deploy/agent/.env containing OPENAI_API_KEY=sk-.... DeployHQ writes it to disk before every deploy, so Compose picks it up, and the secret never lives in your repo or your image.
- Enable auto-deploy on push to main. Every merge now triggers a build.
Trigger the first deploy manually from the DeployHQ dashboard. Watch the log: DeployHQ checks out the repo, writes the env file, SSHes in, and runs docker compose up -d --build. About a minute later, send a POST request to http://your-vps-ip:8000/ask (the curl command from Step 3, with the VPS address in place of localhost) and your agent responds.
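Because the build takes a variable amount of time, it can help to poll the /health endpoint rather than guess when the container is up. The following is a sketch of such a post-deploy smoke test using only the standard library. The function names, the retry count, and the delay are assumptions chosen for illustration, not anything DeployHQ provides.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_health(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    url: str,
    attempts: int = 12,
    delay_seconds: float = 5.0,
    probe: Callable[[str], bool] = probe_health,
) -> bool:
    """Poll the health URL until it responds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)  # the container may still be building or starting
    return False
```

Running wait_until_healthy("http://your-vps-ip:8000/health") after a deploy returns True as soon as the service answers, and False if it never comes up within the allotted attempts. The probe parameter exists so the loop can be exercised in tests without a network.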
If a deploy ever goes wrong, DeployHQ's one-click rollback puts the previous version back on the server in seconds — no need to revert commits or rebuild containers manually.
Step 6: Production hardening
A working deploy is the floor, not the ceiling. Five things to add before the agent does anything real:
1. Put Nginx in front. Don't expose port 8000 to the internet. Bind the container to localhost in Compose:
ports:
  - "127.0.0.1:8000:8000"
Then run Nginx as a reverse proxy with Let's Encrypt for HTTPS. Same pattern works for any service on the box.
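2. Add a real health check so Docker can tell when the container has stopped responding. The python:3.12-slim image does not ship curl, so the check below uses the Python interpreter that is already in the image. Note that plain Docker only marks the container as unhealthy; the restart policy reacts to the process exiting, so restarting on an unhealthy status needs something extra that watches the health state:
healthcheck:
  test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/health', timeout=3)"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 10s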
3. Cap resource use so a runaway agent doesn't take the VPS down:
deploy:
  resources:
    limits:
      memory: 512M
      cpus: "1.0"
4. Pin the base image by digest, not just by tag. Use python:3.12-slim@sha256:... so an upstream rebuild can't change your build silently.
5. Watch your spend. LLM calls are the expensive part of an agent — far more than the server. Add per-request token logging now, so you know what each /ask costs before traffic grows. A 30-line wrapper around client.chat.completions.create is enough.
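For the spend tracking in point 5, one possible shape for that wrapper is sketched below. It relies on the usage.prompt_tokens and usage.completion_tokens fields that chat completion responses carry. The function names are illustrative, and the prices are parameters with placeholder defaults of zero: they are not real prices and need to be filled in from your provider's current pricing.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("agent.usage")


def usage_record(response: Any, seconds: float,
                 price_in_per_1k: float, price_out_per_1k: float) -> dict:
    """Summarise token counts, latency, and estimated cost for one response."""
    usage = getattr(response, "usage", None)
    prompt_tokens = getattr(usage, "prompt_tokens", 0) or 0
    completion_tokens = getattr(usage, "completion_tokens", 0) or 0
    cost = (prompt_tokens / 1000 * price_in_per_1k
            + completion_tokens / 1000 * price_out_per_1k)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "seconds": seconds,
        "estimated_cost_usd": cost,
    }


def tracked_completion(create: Callable[..., Any], *,
                       price_in_per_1k: float = 0.0,   # placeholder, not a real price
                       price_out_per_1k: float = 0.0,  # placeholder, not a real price
                       **kwargs: Any) -> Any:
    """Call create(**kwargs), log a usage record, and return the response unchanged."""
    started = time.perf_counter()
    response = create(**kwargs)
    elapsed = time.perf_counter() - started
    record = usage_record(response, elapsed, price_in_per_1k, price_out_per_1k)
    logger.info("llm_usage model=%s %s", kwargs.get("model"), record)
    return response
```

Inside ask, the direct call would become tracked_completion(client.chat.completions.create, model="gpt-4o-mini", messages=[...]), with the two price arguments supplied once you have looked them up. Passing the create function in, rather than importing the client here, keeps the wrapper usable with any OpenAI-compatible provider.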
For the deploys themselves, DeployHQ runs zero downtime deployments by default — when you graduate to multiple replicas behind a load balancer, the rollout pattern stays the same.
Where to go from here
The agent you just deployed is intentionally tiny: one endpoint, one LLM call, no tools, no memory. That's a good starting point, but most useful agents need more. Two paths from here:
Add capabilities to your own code. Function calling (tools) is the natural next step — the OpenAI SDK supports it out of the box, no framework required. You add a tools list to chat.completions.create, the model returns the tool it wants to call, you execute it, and you loop. A few hundred more lines of Python turns the agent above into something genuinely useful.
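The loop described above can be sketched roughly as follows. The add_numbers tool, its schema, and the run_agent function are hypothetical examples, and max_steps is an arbitrary safety cap. The create argument is expected to be client.chat.completions.create, passed in so the loop itself has no hard dependency on the SDK.

```python
import json
from typing import Any, Callable

# Hypothetical example tool; the name and schema are illustrative only.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "add_numbers",
        "description": "Add two numbers and return the sum.",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
}]


def add_numbers(a: float, b: float) -> str:
    return str(a + b)


TOOL_IMPLS: dict[str, Callable[..., str]] = {"add_numbers": add_numbers}


def run_agent(create: Callable[..., Any], user_prompt: str,
              model: str = "gpt-4o-mini", max_steps: int = 5) -> str:
    """Call the model, execute any tools it requests, and loop until it answers."""
    messages: list[dict] = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        response = create(model=model, messages=messages, tools=TOOLS)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content or ""  # no tool requested: this is the final answer
        # Echo the assistant turn back so each tool result can be matched by id.
        messages.append({
            "role": "assistant",
            "content": message.content,
            "tool_calls": [
                {"id": call.id, "type": "function",
                 "function": {"name": call.function.name,
                              "arguments": call.function.arguments}}
                for call in message.tool_calls
            ],
        })
        for call in message.tool_calls:
            impl = TOOL_IMPLS.get(call.function.name)
            if impl is None:
                result = f"error: unknown tool {call.function.name}"
            else:
                try:
                    # Arguments arrive as a JSON string chosen by the model.
                    result = impl(**json.loads(call.function.arguments))
                except Exception as exc:
                    result = f"error: {exc}"
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    raise RuntimeError("agent did not finish within max_steps")
```

Tool errors are returned to the model as text rather than raised, so it gets a chance to correct its arguments on the next step. The step cap is there because a model that keeps requesting tools would otherwise loop, and spend, indefinitely.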
Adopt an orchestrator when you start hand-rolling skills, memory, or multi-step workflows. A few we've covered in depth:
- Self-host the Paperclip agent orchestrator on a VPS with Docker — a Docker-first, pre-built orchestrator that uses the same DeployHQ deploy pattern as this tutorial.
- Deploy OpenClaw on a VPS with SSL and auto-deploys — a self-hosted AI assistant with a plugin (Skills) system you can extend.
- Deploy Hermes Agent on a VPS — for self-improving agents that learn persistent skills over time.
If you want agents inside your deployment pipeline rather than as a deployed service, see how AI agents fit into CI/CD pipelines from GitHub issue to production deploy — same building blocks, very different shape.
And if you'd rather drive deploys from a terminal AI agent than build your own service, the DeployHQ CLI lets Claude Code, Cursor, or Codex trigger deploys for you, and you can pair it with Google's open-source Gemini CLI for an interactive terminal agent.
Ready to wire your first agent — or your tenth — to a Git-based deploy pipeline? Start a free DeployHQ trial and have your first agent deploying from main in under ten minutes.
Need a hand? Email us at support@deployhq.com or find us on X at @deployhq.