Python Application Servers in 2026: From WSGI to Modern ASGI Solutions

The Python web server landscape has matured considerably. The shift from synchronous WSGI to asynchronous ASGI is no longer experimental — it is the default for new projects on FastAPI, Starlette, and modern async-capable Django (3.0+). But choosing the right application server still depends on your workload, your framework, and how much operational complexity you are willing to manage in production.

This guide compares the most widely used Python application servers in production today — Gunicorn, uWSGI, Uvicorn, Hypercorn, and Granian — with practical configuration examples, head-to-head trade-offs (Granian vs Uvicorn, Granian vs Gunicorn), and deployment patterns we see across DeployHQ customer pipelines.

A typical production topology: client → Nginx or load balancer → application server (Gunicorn / Uvicorn / Granian) → Python app (Django / FastAPI / Flask), which in turn talks to the database and a Redis / Memcached cache.

How to Choose: A 30-Second Decision Tree

Before diving into specifics, here is the practical filter most teams should apply:

  • Running Django or Flask with sync views? → Gunicorn (sync workers). Stop reading benchmarks.
  • FastAPI, Starlette, or async Django? → Uvicorn, ideally managed by Gunicorn (gunicorn -k uvicorn.workers.UvicornWorker).
  • Profiled the app, confirmed the server is the bottleneck (rare)? → Granian. Almost no application reaches this point.
  • Need HTTP/3, QUIC, or Trio? → Hypercorn.
  • Already running uWSGI in production and it works? → Leave it. There is no migration ROI.

Most "which server is fastest" debates ignore the fact that the application server is rarely the bottleneck. Your database queries, ORM serialization, and external API calls dominate response time. The decision that actually matters is matching the worker model to your concurrency pattern (request-response vs. WebSockets / streaming / long-polling).

Traditional WSGI Servers

Gunicorn

Gunicorn remains the default choice for Django, Flask, and other WSGI applications. Its pre-fork worker model has been battle-tested in production for over a decade and is simple to reason about.

Strengths:

  • Production-proven reliability across thousands of deployments
  • Simple configuration with sensible defaults
  • Excellent process management and graceful restarts
  • Works out of the box with Django and Flask

# gunicorn.conf.py
bind = "0.0.0.0:8000"
workers = 4                  # (2 × cores) + 1 is the classic sync default
worker_class = "sync"
max_requests = 1000          # recycle workers periodically to contain memory leaks
max_requests_jitter = 50     # stagger recycling so workers don't restart together

Limitations:

  • No native async support (though uvicorn.workers.UvicornWorker is a stable bridge)
  • No WebSocket support in sync mode
  • Each worker holds its own memory, so RAM usage scales linearly with worker count
  • Pre-fork model means a single slow request can occupy a worker until completion (use --timeout and an async worker class for long-running endpoints)

When to use it: If you are running a Django monolith or a Flask API, Gunicorn with sync workers is the right default. Do not switch to ASGI just because it is newer — unless you genuinely need async features (WebSockets, server-sent events, high-concurrency async I/O), Gunicorn is the correct choice and will save you operational complexity.

uWSGI

uWSGI is a full-featured application server that goes far beyond serving Python — it supports multiple languages, protocols, and deployment patterns. That power comes at the cost of complexity.

# uwsgi.ini
[uwsgi]
http = :8000
processes = 4
threads = 2
master = true         # master process supervises and respawns workers
vacuum = true         # clean up sockets and pid files on exit
die-on-term = true    # treat SIGTERM as shutdown (plays nicely with systemd and Docker)

Strengths: Multiple protocol support, built-in caching, load balancing, mature process management.

Limitations: Steep learning curve, hundreds of configuration options, higher memory footprint. Maintenance activity has slowed, and the Python community has largely settled on Gunicorn (sync) and Uvicorn (async) as the modern defaults.

When to use it: If you are already running uWSGI and it works, there is no urgent reason to migrate. For new projects, Gunicorn or a modern ASGI server is a simpler starting point.

Modern ASGI Servers

Uvicorn

Uvicorn is the most popular ASGI server and the default for FastAPI and Starlette. It uses uvloop (a Cython-based drop-in replacement for asyncio's event loop) and httptools for HTTP parsing — both contribute to its low per-request overhead.

import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app:app",
        host="0.0.0.0",
        port=8000,
        workers=4,
        log_level="info",
    )

Strengths:

  • High throughput for async workloads
  • Native WebSocket support
  • Low memory footprint (~20MB per worker)
  • Simple configuration and excellent documentation
  • First-class FastAPI integration

Limitations:

  • Process management is basic compared to Gunicorn — for production, most teams run Uvicorn workers under Gunicorn: gunicorn -k uvicorn.workers.UvicornWorker app:app. (Recent Uvicorn releases deprecate the built-in workers module in favor of the separate uvicorn-worker package, imported as uvicorn_worker.UvicornWorker.)
  • No graceful reload of in-flight WebSocket connections during deploys (you need a separate strategy for connection draining)

When to use it: Any async Python application. If you are using FastAPI, Uvicorn is the standard.
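
For reference, a typical production invocation looks like this (the app:app module path is a placeholder for your application):

gunicorn app:app \
    --workers 4 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000 \
    --timeout 60 \
    --graceful-timeout 30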

Hypercorn

Hypercorn supports HTTP/1.1, HTTP/2, HTTP/3 (QUIC), and WebSockets. It is the most protocol-complete ASGI server available.

import asyncio

from hypercorn.config import Config
from hypercorn.asyncio import serve

from app import app  # your ASGI application

config = Config()
config.bind = ["0.0.0.0:8000"]
config.workers = 4

asyncio.run(serve(app, config))

Strengths: HTTP/3 and QUIC support, multiple worker types (asyncio, uvloop, trio), built-in TLS configuration.

Limitations: Smaller community than Uvicorn, fewer production deployment examples, slightly lower throughput for standard HTTP/1.1 workloads in published benchmarks.

When to use it: If you need HTTP/3 / QUIC, or if you are using the trio async library instead of asyncio.
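
If you are on Trio, Hypercorn can run your app directly on the Trio event loop — a minimal sketch, with app:app standing in for your ASGI application:

# Select the Trio worker class from the CLI
hypercorn --worker-class trio app:app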

Performance-Focused Solutions

Granian

Granian is a Rust-based Python application server that focuses on raw performance. Its HTTP parsing, connection handling, and worker scheduling are implemented in Rust, while the Python application code runs through standard ASGI/WSGI interfaces — or through RSGI, Granian's own protocol designed to minimize Python ↔ Rust serialization overhead.

# Granian is typically configured via CLI or environment variables
granian --interface asgi \
        --host 0.0.0.0 \
        --port 8000 \
        --workers 4 \
        --threads 2 \
        --loop uvloop \
        app:app

Strengths:

  • Highest throughput in synthetic benchmarks due to Rust-based I/O
  • Low memory footprint (~15MB per worker)
  • Supports WSGI, ASGI, and RSGI in a single binary
  • Built-in HTTP/2, TLS, and graceful reloads
  • Single static binary — no system dependencies beyond Python

Limitations:

  • Newer project (1.0 shipped in early 2024) with a smaller community
  • Fewer production case studies — you are more likely to be the first person debugging a given edge case
  • Debugging is harder when issues cross the Rust/Python boundary; stack traces can be opaque
  • Library compatibility: some asyncio-deep libraries (especially older ones) assume specific event-loop internals and may behave differently under Granian's runtime

Granian vs Uvicorn: How to Decide

This is the comparison most teams actually have to make in 2026, so it is worth being concrete:

| Dimension | Granian | Uvicorn |
| --- | --- | --- |
| Synthetic req/sec (hello-world) | ~2× higher in most benchmarks | Baseline |
| Real-world API response time (DB-bound) | Within 5–10% of each other | Within 5–10% of each other |
| Memory per worker | ~15MB | ~20MB |
| Maturity | 1.x since early 2024 | Stable, ubiquitous since 2018 |
| Community / Stack Overflow answers | Small | Very large |
| Production case studies | Limited | Extensive (Netflix, Microsoft, etc.) |
| FastAPI tooling integration | Works, but not the default | First-class everywhere |
| Process management story | Built-in, simple | Usually run under Gunicorn |
| Best fit | High-RPS, low-logic edge services | Anything async, anything FastAPI |

Verdict: If you are choosing today and have no specific reason to push for raw req/sec, Uvicorn (under Gunicorn for prod) is the safer, lower-operational-cost choice. Pick Granian when you have measured a server-level bottleneck, when a 10–15% reduction in compute cost meaningfully changes your infra bill, or when you specifically want RSGI's lower per-request overhead for a high-volume internal service.

Granian vs Gunicorn: A Different Question

Gunicorn and Granian are not really competitors — they live at different layers. Gunicorn is a process manager that supervises worker classes (sync, threaded, gevent, or UvicornWorker); Granian bundles the worker and its supervisor into a single binary. The honest comparison:

  • Sync Django/Flask, no async needs: Gunicorn (sync workers) — Granian's WSGI support works, but you lose Gunicorn's mature process supervision and battle-tested deployment patterns.
  • Async FastAPI: Gunicorn + UvicornWorker is the conservative choice. Granian replaces this whole stack with one binary, which is appealing operationally but means betting on a younger project.

If you are already comfortable running Gunicorn, switching to Granian to save 5MB per worker is rarely worth the migration cost.

Performance: A Realistic Perspective

Synthetic benchmarks (hello world req/sec) are widely published but rarely reflect production performance. From what we see in real deployments:

  • ASGI servers are 2–4× faster than WSGI servers for async workloads with high concurrency (many simultaneous connections, WebSocket streams, long-polling). For a typical CRUD API where each request blocks on a database query, the gap is much smaller — often within 10%.
  • Granian's Rust-based I/O shows the largest gains in connection-heavy scenarios with minimal application logic. Once your app does real work — database queries, template rendering, JSON serialization — the I/O layer stops being the bottleneck.
  • The biggest performance win is usually not the server. Caching, query optimization, and reducing N+1 queries typically yield 10× the improvement of switching application servers.

Profile before you optimize. Tools that pay off: py-spy (sampling profiler, no code changes), cProfile (deterministic, for hot paths), and OpenTelemetry traces (production-safe — see our OpenTelemetry setup guide for spans and metrics).
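
As a quick example, py-spy can attach to a running worker without restarts or code changes (the PID is a placeholder — find yours with pgrep -f gunicorn):

# Live, top-style view of where a worker spends its time
py-spy top --pid 12345

# Record 60 seconds of samples into a flamegraph
py-spy record --pid 12345 --duration 60 -o profile.svg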

Approximate baseline memory per worker:

| Server | Base Memory |
| --- | --- |
| Granian | ~15MB |
| Uvicorn | ~20MB |
| Hypercorn | ~25MB |
| Gunicorn | ~30MB |
| uWSGI | ~40MB |

These are baseline figures with a minimal app. Your imports (especially Django's), data structures, and caches will add to this — typically 50–200MB extra in production.
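
Rather than trusting published baselines, measure your own workers. A quick way on any Linux box (the process-name pattern is an assumption — adjust it to your server):

# Resident memory (RSS, in KB) per worker process
ps -eo pid,rss,args | grep -E "gunicorn|uvicorn|granian" | grep -v grep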

Worker Sizing: The Rule Most Teams Get Wrong

The classic Gunicorn rule of workers = (2 × CPU cores) + 1 assumes sync, CPU-bound workloads. In 2026, most Python apps are I/O-bound (waiting on databases, APIs, caches), and the right worker count looks different:

  • Sync WSGI (Gunicorn sync): (2 × cores) + 1 is a sensible default.
  • Async ASGI (Uvicorn / Granian): Async workers handle thousands of concurrent connections each. cores (or cores + 1) is usually plenty — running 8 async workers on a 4-core box mostly wastes RAM.
  • Mixed (Gunicorn + UvicornWorker): Treat it like async. cores + 1 workers, one event loop each.
  • Memory-constrained containers: Workers × per-worker RAM must fit comfortably in the container limit, with headroom for traffic spikes — the sketch after this list makes the arithmetic concrete. Set --max-requests to recycle workers and contain memory leaks.
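
A back-of-the-envelope sizing helper — a minimal sketch, assuming you have measured per-worker RAM in production (the function name and the 25% headroom figure are illustrative, not a standard):

import multiprocessing

def suggested_workers(async_server: bool, container_ram_mb: int, per_worker_ram_mb: int) -> int:
    """Rough worker count: concurrency rule of thumb, capped by the memory budget."""
    cores = multiprocessing.cpu_count()
    by_cpu = cores + 1 if async_server else (2 * cores) + 1
    by_ram = int(container_ram_mb * 0.75) // per_worker_ram_mb  # keep ~25% headroom
    return max(1, min(by_cpu, by_ram))

# Async app in a 512MiB container at ~100MB per worker:
# memory, not CPU, is the binding constraint here
print(suggested_workers(True, 512, 100))  # -> 3 on a typical multi-core box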

Modern Features Implementation

WebSocket Support (FastAPI Example)

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_text()
            await websocket.send_text(f"Message received: {data}")
    except WebSocketDisconnect:
        pass  # client closed the connection

For WebSockets in production, watch out for: connection-draining on deploy (in-flight sockets get killed unless you handle SIGTERM gracefully), sticky load balancing, and per-worker connection limits.
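
One hedged pattern for draining — keep a registry of open sockets and close them when the server begins graceful shutdown (this builds on the app above; close code 1001 means "going away", and newer FastAPI versions would express this as a lifespan handler):

import asyncio
from fastapi import WebSocket

active_sockets: set[WebSocket] = set()
# In the endpoint: active_sockets.add(websocket) after accept(),
# and active_sockets.discard(websocket) in a finally block.

@app.on_event("shutdown")
async def drain_websockets():
    # Runs once SIGTERM arrives and the server stops accepting new connections
    await asyncio.gather(
        *(ws.close(code=1001) for ws in list(active_sockets)),
        return_exceptions=True,
    )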

HTTP/2 and HTTP/3 Configuration

# Hypercorn HTTP/2 + HTTP/3 config
config = Config()
config.bind = ["0.0.0.0:8000"]
config.quic_bind = ["0.0.0.0:8000"]  # UDP listener for HTTP/3 (QUIC)
config.alpn_protocols = ["h2", "http/1.1"]  # HTTP/2 is negotiated via ALPN over TLS
config.certfile = "cert.pem"
config.keyfile = "key.pem"

In most setups, HTTP/2 termination happens at your reverse proxy (Nginx, Caddy, or a CDN) rather than at the Python server. See our breakdown of reverse-proxy choices for Nginx, Apache, and Caddy and the Nginx vs Apache vs Caddy comparison.
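
For illustration, a minimal edge-termination sketch for Nginx (certificate paths and the upstream port are placeholders):

server {
    listen 443 ssl http2;                  # TLS + HTTP/2 terminated at the edge
    ssl_certificate     /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;  # plain HTTP/1.1 to the Python server
        proxy_set_header Host $host;
    }
}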

Production Deployment

Docker Configuration

FROM python:3.13-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

Common production refinements include multi-stage builds (compile wheels in a builder stage, copy them into a slim runtime), --no-binary pinning for security-sensitive packages, and a non-root USER directive — sketched below.
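
A hedged sketch of the multi-stage pattern (the stage name and unprivileged username are illustrative):

# Builder stage: compile wheels where build tooling is available
FROM python:3.13-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Runtime stage: no compilers, non-root user
FROM python:3.13-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY . .
RUN useradd --create-home appuser
USER appuser
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]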

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-app
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: python-app
        image: python-app:1.0
        resources:
          limits:
            memory: "512Mi"
            cpu: "500m"
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "10"]

The preStop hook gives the load balancer time to remove the pod from rotation before SIGTERM hits — a small detail that prevents brief 502 spikes during rollouts. For end-to-end Kubernetes pipelines, see our walkthrough on Kubernetes deployments using DeployHQ Shell Servers with kubectl.

If you are deploying with Docker or Kubernetes, you will need a Shell Server in DeployHQ to manage the deployment pipeline. DeployHQ connects to your server via SSH and runs the commands you define — pulling new images, restarting containers, or running migrations. For deploys without containers, our walkthrough on zero-downtime deployment without Docker or Kubernetes covers symlink-swap patterns that work cleanly with Gunicorn or Uvicorn under systemd.

Hybrid Worker Configuration

# gunicorn.conf.py — Gunicorn supervising Uvicorn workers
import multiprocessing

# Async workloads: cores + 1 is plenty (each worker runs one event loop);
# the threads setting only applies to gthread workers, so it is omitted here
workers = multiprocessing.cpu_count() + 1
worker_class = "uvicorn.workers.UvicornWorker"

This hybrid approach gives you Gunicorn's process management with Uvicorn's async performance — the most common production pattern for FastAPI in 2026.

Connection Pooling

# Database connection pooling with async
from databases import Database

database = Database(
    "postgresql://user:pass@localhost/db",
    min_size=5,
    max_size=20,
)

async def startup():
    await database.connect()

async def shutdown():
    await database.disconnect()

Pool sizing: each worker has its own pool, so total connections to your database = pods × workers × max_size. It is easy to exhaust PostgreSQL's max_connections (default 100) with a handful of pods — three pods running four workers each at max_size=20 can open 240 connections. For a deeper dive into database deployment patterns alongside your app server, see our guide on code-first database deployments with Flask and SQLAlchemy and our breakdown of SQLite vs PostgreSQL vs MySQL trade-offs.

Monitoring and Observability

Prometheus Metrics

from prometheus_client import Counter
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
# Auto-instruments request metrics and exposes them at /metrics
Instrumentator().instrument(app).expose(app)

# Custom counter for anything the auto-instrumentation doesn't cover.
# The instrumentator's defaults already include an http_requests_total
# counter, so a hand-rolled one needs its own name to avoid a duplicate
# timeseries in the registry.
APP_REQUEST_COUNT = Counter(
    "app_http_requests_total",
    "Total HTTP requests (custom labels)",
    ["method", "endpoint", "status"]
)

Track the four golden signals (latency, traffic, errors, saturation) plus per-worker memory — workers steadily growing in RSS over hours is the classic memory-leak signature, and --max-requests with jitter is the cheap fix.

OpenTelemetry Integration

from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.trace import TracerProvider

# Assumes the FastAPI `app` from the examples above. Without a span
# processor and exporter (e.g. OTLP), spans are created but never
# shipped anywhere — wire those up before relying on this in production.
trace.set_tracer_provider(TracerProvider())
FastAPIInstrumentor.instrument_app(app)

For a deeper dive into setting up metrics, traces, and logs, see our guide on OpenTelemetry in practice.

Deploying Python Apps with DeployHQ

DeployHQ deploys any Python application server setup — Gunicorn behind Nginx on a VPS, Uvicorn under systemd, or containers on Kubernetes. A typical FastAPI + Uvicorn workflow:

  1. Connect your repository — DeployHQ supports GitHub, GitLab, Bitbucket, and self-hosted Git servers
  2. Configure a build pipeline — install dependencies with pip install -r requirements.txt, run pytest, and build any static assets. See our overview of build pipelines in DeployHQ for advanced patterns
  3. Set up SSH commands — after file transfer, restart your application server (e.g., sudo systemctl restart myapp or a kubectl rollout restart)
  4. Deploy — push to your branch and DeployHQ handles the rest, with detailed logs and one-click rollback if a release misbehaves

For Django-specific deployments, see our guide on deploying Django on a budget with Hetzner and DeployHQ. For Python ERP platforms, our Odoo on Ubuntu deployment guide covers the systemd + Nginx pattern. To wire pytest into the pipeline so a failing test blocks the release, see our walkthrough on test automation in your deployment pipeline.

Deployment Considerations

Process Management

# Supervisor config
[program:python-app]
command=uvicorn app:app --host 0.0.0.0 --port 8000
directory=/app
user=www-data
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true

For systemd-based servers (most modern Linux distributions), a unit file is often simpler:

# /etc/systemd/system/myapp.service
[Unit]
Description=Python App
After=network.target

[Service]
User=www-data
WorkingDirectory=/app
ExecStart=/app/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000
Restart=always
KillSignal=SIGTERM
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target

KillSignal=SIGTERM plus a generous TimeoutStopSec lets in-flight requests finish before the worker exits — the difference between a clean rollout and a wave of 502s.

Load Balancing

# Nginx configuration
upstream python_servers {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://python_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

The forwarded headers matter for any framework or server that respects the original protocol/IP — Django's SECURE_PROXY_SSL_HEADER, Uvicorn's --proxy-headers and --forwarded-allow-ips flags, and rate limiters all need them.
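
On the application-server side, that trust has to be explicit. For Uvicorn (the proxy IP is a placeholder for wherever Nginx actually runs):

uvicorn app:app --host 127.0.0.1 --port 8000 \
    --proxy-headers --forwarded-allow-ips 127.0.0.1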

Conclusion

The right Python application server depends on what you are building. For traditional Django and Flask applications, Gunicorn remains the reliable default. For async applications built on FastAPI or Starlette, Uvicorn — typically managed by Gunicorn — is the standard. Granian and Hypercorn serve more specialised needs: maximum throughput on low-logic services, and advanced protocol support (HTTP/3, QUIC, Trio) respectively.

The honest take: most teams should pick the default for their framework, profile their application before chasing benchmark numbers, and spend the saved engineering time on database queries and caching — that is where real performance wins live.

Whatever server you choose, DeployHQ's automated deployment pipeline makes it straightforward to ship from your Git repository to your servers. Start a free DeployHQ trial and have your Python application deploying in minutes.


Have questions about deploying Python applications? Reach out to us at support@deployhq.com or find us on Twitter/X.