You don't need an agent framework to put an AI agent into production. A single short Python file, a Dockerfile, and a $5 VPS will get you running. This tutorial walks through building a minimal AI agent in FastAPI, packaging it with Docker, and shipping it to your own server with Git-based auto-deploys, so every push to `main` rolls out automatically, with one-click rollback if it goes wrong.

By the end you'll have a working `/ask` endpoint that takes a prompt, calls an LLM, and returns a response — running on infrastructure you control, with a clear path to harden, monitor, and grow into something bigger.

## What you'll build

A small HTTP service that wraps an LLM call:

- **FastAPI** for the HTTP layer — lightweight, async, good docs
- **The OpenAI Python SDK** for the model call — swap freely for Anthropic, [OpenRouter](https://www.deployhq.com/blog/openrouter-practical-guide-teams), or any OpenAI-compatible API
- **Docker** to package the app
- **A VPS** running Ubuntu — [Hetzner](https://www.deployhq.com/blog/how-to-deploy-django-on-a-budget-with-hetzner-and-deployhq), DigitalOcean, Vultr, and Linode all work fine
- **DeployHQ** to connect your GitHub repo to the server, so a `git push` becomes a deploy

We're deliberately skipping the agent frameworks (LangChain, LangGraph, CrewAI, Mastra). Frameworks become useful once you need tool calls, memory, or multi-step planning — but they obscure what's actually happening on your first deploy. Start raw, add what you need.

> **Why a VPS instead of a serverless platform?** Serverless is fine for very bursty agents, but it punishes long-running tool calls (some agent steps take 30+ seconds), cold starts hurt latency, and the per-invocation pricing gets expensive once traffic is sustained. A VPS costs ~$5/month, has predictable latency, and lets you run background workers and persistent connections without contortion.

## Prerequisites

- An OpenAI API key (or any compatible provider)
- A GitHub account with a fresh empty repo
- A VPS with Ubuntu 22.04 or 24.04 and SSH access
- A [DeployHQ](https://www.deployhq.com) account (free trial available — sign-up link at the end)
- Python 3.12+ and Docker installed locally

## Step 1: Build a minimal agent

Create a project directory:

```
mkdir my-first-agent && cd my-first-agent
mkdir app
```

Add `app/main.py`:

```
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI
import os

app = FastAPI(title="My First Agent")
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

SYSTEM_PROMPT = (
    "You are a concise assistant. Answer in one short paragraph "
    "unless the user asks for more detail."
)

class AskRequest(BaseModel):
    prompt: str

class AskResponse(BaseModel):
    reply: str

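@app.post("/ask", response_model=AskResponse)
def ask(req: AskRequest):
    # Plain `def` (not `async def`): FastAPI runs it in a worker thread,
    # so the blocking SDK call does not stall the event loop.
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": req.prompt},
            ],
        )
        # The SDK types `content` as optional, so fall back to an empty string.
        return AskResponse(reply=response.choices[0].message.content or "")
    except Exception as exc:
        # Any upstream failure becomes a 500. Tighten this before going public,
        # since str(exc) can expose provider error details to callers.
        raise HTTPException(status_code=500, detail=str(exc))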

@app.get("/health")
def health():
    return {"status": "ok"}
```

Two things matter here:

1. **The API key is read from the environment.** Never hardcode secrets. We'll inject `OPENAI_API_KEY` from [DeployHQ](https://www.deployhq.com) later.
2. **There's a `/health` endpoint.** Docker, load balancers, and [DeployHQ](https://www.deployhq.com) all use health checks to know whether the agent is alive. Add one from day one.

Install dependencies and freeze them:

```
python -m venv .venv
source .venv/bin/activate
pip install fastapi 'uvicorn[standard]' openai
pip freeze > requirements.txt
```

The `pip freeze` step pins exact versions so the container builds the same way every time. Skip it and you'll eventually get a working laptop build and a broken server build that nobody can reproduce.
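One way to keep that guarantee from eroding is a small check that every requirement is still pinned with `==`. The sketch below is an illustration rather than part of the tutorial's project; the function names and the regular expression are assumptions, and it only handles the simple `name==version` lines that `pip freeze` normally emits.

```python
import re
from pathlib import Path

# A pinned line looks like "name==1.2.3", optionally with extras such as "name[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned with '=='."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


def check_file(path: str = "requirements.txt") -> list[str]:
    """Convenience wrapper (hypothetical helper) that reads the file from disk."""
    return unpinned_requirements(Path(path).read_text())
```

Calling `check_file()` from the project root returns an empty list when everything is pinned, which makes it easy to run as a pre-commit or CI step.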

## Step 2: Package it with Docker

Add a `Dockerfile` at the project root:

```
FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app/ ./app/

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

And a `.dockerignore` to keep the build context lean:

```
.venv
__pycache__
.env
.git
```

For local testing and the eventual deploy, add `docker-compose.yml`:

```
services:
  agent:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env
    restart: unless-stopped
```

Compose makes environment variables, port mapping, and restart policies declarative — easier to reason about than long `docker run` flags.

## Step 3: Run it locally

Create a `.env` file (and make sure `.env` is in `.gitignore`):

```
OPENAI_API_KEY=sk-...
```

Then:

```
docker compose up --build
```

In another terminal:

```
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain Docker layers in one sentence."}'
```

You should get JSON back with a one-sentence answer. If you do, the agent works — everything from here is about getting it onto a VPS and keeping it there.
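If you would rather script that check than retype the `curl` command, the same request can be sketched with Python's standard library. The helper names `build_ask_request` and `ask` are illustrative, and the default base URL assumes the Compose port mapping shown earlier.

```python
import json
import urllib.request


def build_ask_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request the curl command sends."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = "http://localhost:8000", timeout: float = 60.0) -> str:
    """Send the prompt to the running agent and return the 'reply' field."""
    with urllib.request.urlopen(build_ask_request(prompt, base_url), timeout=timeout) as resp:
        return json.loads(resp.read())["reply"]
```

With the container running, `ask("Explain Docker layers in one sentence.")` should return the model's answer as a string.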

## Step 4: Prepare the VPS

SSH into your VPS and install Docker:

```
ssh root@your-vps-ip
curl -fsSL https://get.docker.com | sh
```

Create a non-root user for deploys — [DeployHQ](https://www.deployhq.com) will SSH in as this user:

```
adduser deploy
usermod -aG docker deploy
mkdir -p /home/deploy/agent
chown deploy:deploy /home/deploy/agent
```

Copy your SSH key to the `deploy` user so [DeployHQ](https://www.deployhq.com) can authenticate without a password. From your local machine:

```
ssh-copy-id deploy@your-vps-ip
```

You should now be able to `ssh deploy@your-vps-ip` without a password prompt. If that works, [DeployHQ](https://www.deployhq.com) can do the same.

## Step 5: Wire it up with DeployHQ

Push your code to GitHub first:

```
# Assumes .gitignore already lists .env and .venv so neither is committed
git init && git add . && git commit -m "Initial agent"
git branch -M main
git remote add origin git@github.com:you/my-first-agent.git
git push -u origin main
```

In DeployHQ:

1. **Create a new project** and point it at your GitHub repository. [DeployHQ](https://www.deployhq.com) supports [deploying directly from a GitHub repo to your server](https://www.deployhq.com/deploy-from-github) without you wrangling webhooks by hand.
2. **Add a server.** Use the VPS IP, port 22, username `deploy`, and the SSH key you just authorised. Set the deployment path to `/home/deploy/agent`.
3. **Configure the SSH command.** This is where [DeployHQ](https://www.deployhq.com) earns its keep on Docker workflows. Under the server's SSH commands, add a single deploy step: `cd /home/deploy/agent && docker compose up -d --build`
4. **Set the OpenAI API key as a config file.** In DeployHQ's server config, add a config file at `/home/deploy/agent/.env` containing `OPENAI_API_KEY=sk-...`. [DeployHQ](https://www.deployhq.com) writes it to disk before every deploy, so Compose picks it up — and the secret never lives in your repo or your image.
5. **Enable auto-deploy on push to `main`.** Every merge now triggers a build.

Trigger the first deploy manually from the [DeployHQ](https://www.deployhq.com) dashboard. Watch the log: [DeployHQ](https://www.deployhq.com) checks out the repo, writes the env file, SSHes in, and runs `docker compose up -d --build`. About a minute later, send a `POST` request to `http://your-vps-ip:8000/ask` (the same `curl` command as before, with the new host) and your agent responds.
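Because the build takes a little while, it can help to poll `/health` rather than guess when the container is up. The following is a minimal sketch using only the standard library; `wait_until_healthy` and `fetch_status` are hypothetical names, and the retry count and delay are arbitrary defaults.

```python
import json
import time
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> str:
    """GET the health URL and return its 'status' field ('' if absent)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read()).get("status", "")


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 3.0,
                       fetch=fetch_status, sleep=time.sleep) -> bool:
    """Poll the health endpoint until it reports 'ok' or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url) == "ok":
                return True
        except (OSError, ValueError):
            # Connection errors (URLError is an OSError) and malformed JSON
            # both mean "not ready yet", so keep trying.
            pass
        if attempt < attempts:
            sleep(delay)
    return False
```

For example, `wait_until_healthy("http://your-vps-ip:8000/health")` returns `True` once the new container answers and `False` if it never does within the retry budget.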

If a deploy ever goes wrong, [DeployHQ's one-click rollback](https://www.deployhq.com/features/one-click-rollback) puts the previous version back on the server in seconds — no need to revert commits or rebuild containers manually.

## Step 6: Production hardening

A working deploy is the floor, not the ceiling. Five things to add before the agent does anything real:

**1. Put [Nginx](https://www.deployhq.com/blog/nginx-vs-apache-vs-caddy-choosing-the-right-web-server) in front.** Don't expose port 8000 to the internet. Bind the container to localhost in Compose:

```
ports:
  - "127.0.0.1:8000:8000"
```

Then run Nginx as a reverse proxy with Let's Encrypt for HTTPS. Same pattern works for any service on the box.

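**2. Add a real health check** so Docker can tell a hung container from a healthy one. The `python:3.12-slim` image does not ship `curl`, so probe with Python instead. Note that Docker on its own only marks the container `unhealthy`; restarting it automatically takes an orchestrator or a watchdog process:

```
healthcheck:
  test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/health', timeout=3)"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 10s
```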

**3. Cap resource use** so a runaway agent doesn't take the VPS down:

```
deploy:
  resources:
    limits:
      memory: 512M
      cpus: "1.0"
```

**4. Pin the base image by digest**, not just by tag. Use `python:3.12-slim@sha256:...` so an upstream rebuild can't change your build silently.

**5. Watch your spend.** LLM calls are the expensive part of an agent — far more than the server. Add per-request token logging now, so you know what each `/ask` costs before traffic grows. A 30-line wrapper around `client.chat.completions.create` is enough.
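A wrapper along those lines might look like the sketch below. It assumes the response carries the `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens` that the Chat Completions API returns; the function names are illustrative, and the two price constants are placeholders, not real rates, so substitute your provider's current pricing.

```python
import logging
import time

logger = logging.getLogger("agent.usage")

# Placeholder prices in USD per 1M tokens (assumptions, not real rates).
PRICE_PER_1M_INPUT = 1.00
PRICE_PER_1M_OUTPUT = 2.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one call in USD from its token counts."""
    total = prompt_tokens * PRICE_PER_1M_INPUT + completion_tokens * PRICE_PER_1M_OUTPUT
    return total / 1_000_000


def logged_completion(client, **kwargs):
    """Call client.chat.completions.create(**kwargs) and log token usage."""
    started = time.perf_counter()
    response = client.chat.completions.create(**kwargs)
    elapsed_ms = (time.perf_counter() - started) * 1000
    usage = getattr(response, "usage", None)
    if usage is not None:  # some providers may omit usage data
        logger.info(
            "model=%s prompt_tokens=%d completion_tokens=%d total_tokens=%d "
            "est_cost_usd=%.6f elapsed_ms=%.0f",
            kwargs.get("model"),
            usage.prompt_tokens,
            usage.completion_tokens,
            usage.total_tokens,
            estimate_cost(usage.prompt_tokens, usage.completion_tokens),
            elapsed_ms,
        )
    return response
```

In `app/main.py`, the call inside `ask` would then become `logged_completion(client, model="gpt-4o-mini", messages=[...])`, with the rest of the handler unchanged.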

For the deploys themselves, [DeployHQ](https://www.deployhq.com) runs [zero downtime deployments](https://www.deployhq.com/features/zero-downtime-deployments) by default — when you graduate to multiple replicas behind a load balancer, the rollout pattern stays the same.

## Where to go from here

The agent you just deployed is intentionally tiny: one endpoint, one LLM call, no tools, no memory. That's a good starting point, but most useful agents need more. Two paths from here:

**Add capabilities to your own code.** Function calling (tools) is the natural next step — the OpenAI SDK supports it out of the box, no framework required. You add a `tools` list to `chat.completions.create`, the model returns the tool it wants to call, you execute it, and you loop. A few hundred more lines of Python turns the agent above into something genuinely useful.
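That loop can be sketched in a few dozen lines. The example below is a sketch under stated assumptions: `add_numbers` is a made-up tool, `run_agent` and `max_steps` are illustrative names, and `client` is expected to be an `OpenAI` client like the one in `app/main.py`. The `tools` schema and the `tool_calls` / `tool_call_id` fields follow the Chat Completions function-calling format.

```python
import json


# Hypothetical example tool; swap in whatever your agent needs.
def add_numbers(a: float, b: float) -> float:
    return a + b


TOOL_FUNCTIONS = {"add_numbers": add_numbers}

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "add_numbers",
            "description": "Add two numbers and return the sum.",
            "parameters": {
                "type": "object",
                "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
                "required": ["a", "b"],
            },
        },
    }
]


def run_agent(client, prompt: str, model: str = "gpt-4o-mini", max_steps: int = 5) -> str:
    """Call the model, execute any tools it requests, and loop until it answers."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        response = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content or ""  # no tool requested: this is the final answer
        messages.append(message)  # keep the assistant's tool request in the history
        for call in message.tool_calls:
            func = TOOL_FUNCTIONS.get(call.function.name)
            if func is None:
                result = {"error": f"unknown tool {call.function.name}"}
            else:
                args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
                result = {"result": func(**args)}
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
            )
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is a deliberate design choice: without it, a model that keeps requesting tools would loop, and spend, indefinitely.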

**Adopt an orchestrator** when you start hand-rolling skills, memory, or multi-step workflows. A few we've covered in depth:

- [Self-host the Paperclip agent orchestrator on a VPS with Docker](https://www.deployhq.com/blog/self-host-paperclip-vps-docker-deployhq) — a Docker-first, pre-built orchestrator that uses the same [DeployHQ](https://www.deployhq.com) deploy pattern as this tutorial.
- [Deploy OpenClaw on a VPS with SSL and auto-deploys](https://www.deployhq.com/blog/deploy-configure-openclaw-vps) — a self-hosted AI assistant with a plugin (Skills) system you can extend.
- [Deploy Hermes Agent on a VPS](https://www.deployhq.com/blog/deploy-hermes-agent-vps) — for self-improving agents that learn persistent skills over time.

If you want agents _inside_ your deployment pipeline rather than as a deployed service, see [how AI agents fit into CI/CD pipelines from GitHub issue to production deploy](https://www.deployhq.com/blog/ai-agents-cicd-pipelines-github-issue-to-production-deploy) — same building blocks, very different shape.

And if you'd rather drive deploys _from_ a terminal AI agent than build your own service, the [DeployHQ CLI](https://www.deployhq.com/blog/deployhq-cli-deploy-from-terminal) lets Claude Code, Cursor, or Codex trigger deploys for you, and you can pair it with [Google's open-source Gemini CLI for an interactive terminal agent](https://www.deployhq.com/blog/getting-started-with-google-gemini-cli-open-source-ai-agent-for-your-terminal).

* * *

Ready to wire your first agent, or your tenth, to a Git-based deploy pipeline? [Start a free DeployHQ trial](https://www.deployhq.com/signup) and have your first agent deploying from `main` in under ten minutes.

Need a hand? Email us at **[support@deployhq.com](mailto:support@deployhq.com)** or find us on X at [@deployhq](https://x.com/deployhq).