How to Self-Host Your Own AI Chat Interface on a VPS with Open WebUI and Ollama


Running your own AI chat interface on a VPS gives you full control over data privacy, model selection, and costs. Instead of paying per-token API fees or sending sensitive prompts to third-party servers, you can self-host an interface that connects to local open-source models or cloud APIs — your choice, your rules.

This guide walks through deploying Open WebUI with Ollama on a Linux VPS. Open WebUI is the most popular self-hosted ChatGPT alternative (50k+ GitHub stars), and Ollama makes running large language models locally as simple as ollama pull llama3. We will also set up automated deployments with DeployHQ so configuration changes and customisations flow through a proper CI/CD pipeline.

Self-hosted AI in 2026: what has changed

The self-hosted AI landscape has matured significantly. Here is how it compares to using hosted APIs directly:

| | Cloud API (OpenAI, Anthropic) | Self-hosted (Ollama + Open WebUI) |
| --- | --- | --- |
| Privacy | Prompts sent to third-party servers | Everything stays on your VPS |
| Cost model | Per-token billing, scales with usage | Fixed VPS cost, unlimited local inference |
| Model choice | Locked to provider's models | Run any open model (Llama 3, Mistral, Qwen, DeepSeek, Gemma) |
| Latency | Network round-trip + queue time | Local inference, no network dependency |
| Customisation | Limited to API parameters | Full control over system prompts, RAG pipelines, tools |
| Offline capability | None | Works without internet once models are downloaded |
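To make the cost trade-off concrete, here is a back-of-the-envelope break-even calculation. Both prices are hypothetical placeholders, not quotes from any provider; substitute your actual VPS price and your provider's real per-token rates.

```shell
# Hypothetical example rates -- replace with your real numbers.
VPS_MONTHLY=20     # USD per month for the VPS
API_PER_MTOK=15    # USD per million output tokens (illustrative only)

# Monthly token volume at which the fixed VPS cost breaks even
awk -v vps="$VPS_MONTHLY" -v rate="$API_PER_MTOK" \
    'BEGIN { printf "break-even: %.2f million tokens/month\n", vps / rate }'
```

Above that volume the fixed-cost VPS wins on price alone; below it, the deciding factors are privacy and control rather than cost.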

Important clarification: ChatGPT is OpenAI's proprietary hosted service — you cannot install ChatGPT itself on a VPS. What you can do is run an equivalent chat interface backed by open-source models that run locally, or connect to cloud APIs (OpenAI, Anthropic, Google) through a unified self-hosted interface. That is exactly what Open WebUI provides.


Architecture overview

flowchart LR
    Browser["Browser"]
    Nginx["Nginx\n(TLS + reverse proxy)"]
    OW["Open WebUI\n(:3000)"]
    Ollama["Ollama\n(model runtime)"]
    Models["Local Models\n(Llama 3, Mistral, etc.)"]
    CloudAPI["Cloud APIs\n(OpenAI, Anthropic)\n(optional)"]
    DeployHQ["DeployHQ"]
    Git["Git Repo"]

    Browser -->|HTTPS :443| Nginx
    Nginx -->|HTTP :3000| OW
    OW -->|HTTP :11434| Ollama
    Ollama --> Models
    OW -.->|optional| CloudAPI
    Git -->|push| DeployHQ
    DeployHQ -->|SSH deploy| OW

Prerequisites

  • A VPS with at least 4 vCPUs and 8 GB RAM (16 GB recommended for larger models)
  • Ubuntu 22.04 or 24.04
  • A domain name pointed at your VPS (e.g. chat.example.com)
  • SSH access with a sudo-capable user
  • Docker Engine and Docker Compose v2

GPU is optional. Ollama runs on CPU with quantised models (Q4/Q5). A small model like Llama 3.2 3B runs comfortably on 8 GB RAM without a GPU, and 7B-8B models such as Llama 3.1 8B or Mistral 7B still fit. For faster inference or larger models (70B+), a GPU with 24 GB+ VRAM is recommended.
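A quick preflight on the VPS confirms it meets those minimums. The 7 GB threshold below is deliberate: an "8 GB" machine reports slightly less in MemTotal.

```shell
# Preflight: compare vCPU count and total RAM against the minimums above.
cpus=$(nproc)
mem_gb=$(awk '/MemTotal/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)
echo "vCPUs: $cpus, RAM: ${mem_gb} GB"
if [ "$cpus" -ge 4 ] && [ "$mem_gb" -ge 7 ]; then
  echo "OK for 7B-class models"
else
  echo "Consider a smaller model or a larger VPS"
fi
```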


Step 1: Install Docker

sudo apt update && sudo apt upgrade -y
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER

Log out and back in, then verify:

docker compose version

Step 2: Create the project structure

mkdir -p ~/ai-chat/{nginx,ollama-data,webui-data}
cd ~/ai-chat

Step 3: Write docker-compose.yml

services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ./ollama-data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    ports:
      - "127.0.0.1:11434:11434"
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    depends_on:
      ollama:
        condition: service_healthy
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - ENABLE_SIGNUP=false
    volumes:
      - ./webui-data:/app/backend/data
    ports:
      - "127.0.0.1:3000:8080"

  nginx:
    image: nginx:alpine
    restart: unless-stopped
    depends_on:
      - open-webui
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro

Key decisions:

  • Ollama and Open WebUI bind to 127.0.0.1 only — Nginx handles all external traffic
  • ENABLE_SIGNUP=false prevents strangers from creating accounts on your instance
  • Persistent volumes ensure models and chat history survive container restarts
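The Compose file reads WEBUI_SECRET_KEY from the environment, so create a .env file next to docker-compose.yml before starting the stack:

```shell
# Run this in ~/ai-chat, next to docker-compose.yml.
# Generate a random 32-byte hex secret for session token signing.
echo "WEBUI_SECRET_KEY=$(openssl rand -hex 32)" > .env
chmod 600 .env   # keep the secret out of world-readable files
```

Docker Compose picks up .env automatically; there is no need to export the variable in your shell.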

Step 4: Configure Nginx with TLS

Obtain a certificate:

sudo apt install certbot -y
sudo certbot certonly --standalone -d chat.example.com --email you@example.com --agree-tos --no-eff-email

Create nginx/default.conf:

upstream webui {
    server open-webui:8080;
}

server {
    listen 80;
    server_name chat.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name chat.example.com;

    ssl_certificate     /etc/letsencrypt/live/chat.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.example.com/privkey.pem;

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    client_max_body_size 50m;

    location / {
        proxy_pass http://webui;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (required for streaming responses)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 300s;
    }
}

The WebSocket configuration is critical — without it, streaming chat responses will not work.


Step 5: Start the stack and pull your first model

cd ~/ai-chat
docker compose up -d

Wait for the containers to start, then pull a model:

docker exec ollama ollama pull llama3.2:3b

This downloads the Llama 3.2 3B model (~2 GB). For a more capable model:

docker exec ollama ollama pull llama3.1:8b     # 4.7 GB, good general purpose
docker exec ollama ollama pull mistral:7b       # 4.1 GB, strong at code
docker exec ollama ollama pull qwen2.5:7b       # 4.7 GB, multilingual

Model sizing guide

| Model | RAM needed | Best for |
| --- | --- | --- |
| Llama 3.2 3B | ~4 GB | Quick responses, light tasks, low-resource VPS |
| Llama 3.1 8B | ~8 GB | General purpose, good quality/speed balance |
| Mistral 7B | ~8 GB | Code generation, technical writing |
| Qwen 2.5 14B | ~12 GB | Complex reasoning, multilingual |
| Llama 3.1 70B | ~40 GB | Maximum quality (requires GPU) |
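These figures follow from the quantisation arithmetic: at Q4, each parameter costs roughly half a byte (a rule of thumb, not an official formula), and the runtime plus context window add a few GB on top of the weights.

```shell
# Weights-only footprint at Q4 quantisation: ~0.5 bytes per parameter.
# Rule of thumb only; actual file sizes vary with the exact quant used.
for params in 3 8 14 70; do
  awk -v p="$params" \
      'BEGIN { printf "%2dB model: ~%.1f GB of weights at Q4\n", p, p * 0.5 }'
done
```

This is why the llama3.1:8b download is about 4.7 GB while the table budgets ~8 GB of RAM for running it.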

Step 6: Access the interface

Open https://chat.example.com in your browser. On first visit:

  1. Create your admin account (the first account created becomes the administrator; with ENABLE_SIGNUP=false, no one else can register afterwards)
  2. Select a model from the dropdown (you should see llama3.2:3b or whichever you pulled)
  3. Start chatting

Open WebUI provides:

  • Multiple model switching — swap between models mid-conversation
  • Document upload with RAG — upload PDFs or text files and ask questions about them
  • Web search integration — augment model responses with live web results
  • System prompts — customise model behaviour per conversation
  • Chat history and export — full conversation management
  • Multi-user support — create accounts for your team with role-based access

Step 7: (Optional) Connect cloud APIs

Open WebUI can also act as a unified interface for cloud APIs. In the admin panel:

  1. Go to Settings > Connections
  2. Add an OpenAI-compatible endpoint:
    • URL: https://api.openai.com/v1
    • API Key: your OpenAI key
  3. You can also add Anthropic, Google, or any OpenAI-compatible API

This lets you compare local model responses against cloud models side-by-side, or fall back to cloud APIs for tasks that exceed your local model's capability.


Step 8: Automate deployments with DeployHQ

As you customise Open WebUI (system prompts, model configurations, Nginx rules, Docker Compose changes), you want those changes version-controlled and automatically deployed.

8a: Repository structure

ai-chat-config/
  docker-compose.yml
  nginx/
    default.conf
  scripts/
    deploy.sh
    pull-models.sh
  .env.example

8b: Connect to DeployHQ

  1. Sign up or log in to DeployHQ
  2. Create a new project and connect your GitHub or GitLab repository
  3. Add an SSH server pointing to your VPS
  4. Set the deploy path to /home/deploy/ai-chat/
  5. Add a config file for .env to keep secrets out of Git

8c: Post-deploy command

In DeployHQ's SSH Commands section:

cd /home/deploy/ai-chat && bash scripts/deploy.sh

Your scripts/deploy.sh:

#!/usr/bin/env bash
set -euo pipefail

# Pull latest images
docker compose pull

# Restart with updated configuration
docker compose up -d --remove-orphans

# Pull any new models defined in the model list
bash scripts/pull-models.sh

echo "AI chat stack deployed successfully"

Now every git push updates your configuration, restarts services if needed, and ensures new models are pulled.
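deploy.sh calls scripts/pull-models.sh, which is not shown above. A minimal sketch, assuming you keep the desired model names in a models.txt file in the repository, one Ollama tag per line; the file name is our convention for this guide, not an Ollama feature.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed convention: models.txt lists one Ollama model tag per line,
# e.g. "llama3.2:3b". Override the path with MODELS_FILE if you prefer.
MODELS_FILE="${MODELS_FILE:-models.txt}"

if [ -f "$MODELS_FILE" ]; then
  while read -r model; do
    [ -z "$model" ] && continue          # skip blank lines
    echo "Pulling $model ..."
    docker exec ollama ollama pull "$model"
  done < "$MODELS_FILE"
fi
```

Pulling is idempotent: models already present are verified and skipped, so running this on every deploy is cheap.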


Performance tuning

CPU inference optimisation

If running on CPU only, these environment variables can improve Ollama's performance:

# Add to the ollama service in docker-compose.yml
environment:
  - OLLAMA_HOST=0.0.0.0
  - OLLAMA_NUM_PARALLEL=2      # concurrent requests
  - OLLAMA_MAX_LOADED_MODELS=1 # keep 1 model in memory

Memory management

Ollama unloads models after 5 minutes of inactivity by default. On a RAM-constrained VPS, this is desirable. To keep models loaded longer:

environment:
  - OLLAMA_KEEP_ALIVE=30m  # keep model loaded for 30 minutes

Monitoring

Add a simple health check to your monitoring:

# Ollama health
curl -sf http://localhost:11434/api/tags > /dev/null && echo "Ollama OK" || echo "Ollama DOWN"

# Open WebUI health
curl -sf http://localhost:3000/health > /dev/null && echo "WebUI OK" || echo "WebUI DOWN"
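For unattended monitoring, wrap those two checks in a small script that reports an overall status; run it from cron and alert on non-zero. The endpoints are the ones configured in this guide.

```shell
# Combined health probe for both services. The --max-time cap keeps a
# hung service from stalling the whole check.
check() {
  if curl -sf --max-time 5 "$1" > /dev/null; then
    echo "OK:   $1"
  else
    echo "DOWN: $1"
    return 1
  fi
}

status=0
check http://localhost:11434/api/tags || status=1
check http://localhost:3000/health    || status=1
# From cron, end the script with: exit $status
echo "overall status: $status"
```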

Security checklist

  • Disable public signup (ENABLE_SIGNUP=false) — only you should create accounts
  • Set a strong WEBUI_SECRET_KEY — used for session token signing
  • Keep Ollama off the public internet — bind to 127.0.0.1 only (done in our Compose file)
  • Enable automatic TLS renewal: sudo certbot renew --deploy-hook "docker compose -f /home/deploy/ai-chat/docker-compose.yml restart nginx"
  • Update regularly: docker compose pull && docker compose up -d
  • Back up chat data: the webui-data/ volume contains all conversations and user data
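The last item can be a one-line cron job. A minimal sketch, assuming backups go to $HOME/backups (both paths are assumptions; adjust to your layout):

```shell
# Nightly backup of the Open WebUI data volume (conversations, users, config).
BACKUP_DIR="${BACKUP_DIR:-$HOME/backups}"
DATA_DIR="${DATA_DIR:-$HOME/ai-chat}"

mkdir -p "$BACKUP_DIR"
if [ -d "$DATA_DIR/webui-data" ]; then
  tar czf "$BACKUP_DIR/webui-data-$(date +%F).tar.gz" -C "$DATA_DIR" webui-data
  echo "backed up webui-data to $BACKUP_DIR"
fi
```

Restoring is the reverse: stop the stack, extract the archive back into ~/ai-chat, and start it again.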

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| No models available in Open WebUI | Models not pulled yet | Run docker exec ollama ollama pull llama3.2:3b |
| Open WebUI cannot connect to Ollama | OLLAMA_BASE_URL wrong | Verify it is http://ollama:11434 (Docker service name) |
| Streaming responses hang | Missing WebSocket proxy config | Add proxy_http_version 1.1 and Upgrade headers in Nginx |
| Out of memory when loading model | Model too large for available RAM | Use a smaller quantised model (3B or 7B) |
| Slow inference | CPU-only with large model | Switch to a smaller model or add GPU passthrough |

What to do next

  1. Experiment with models — try different models for different tasks (code, writing, analysis)
  2. Set up RAG — upload your documentation and create a knowledge-augmented assistant
  3. Create team accounts — Open WebUI supports multi-user with role-based access
  4. Explore function calling — Open WebUI supports tool use with compatible models
  5. Add GPU acceleration — if you need faster inference, look into NVIDIA Container Toolkit for Docker GPU passthrough

For more on automating your deployment pipelines and managing Docker-based deployments, check out the DeployHQ blog.

If you have questions or need help, reach out at support@deployhq.com or on Twitter/X.