NGINX 502 Bad Gateway: Causes and Fixes

502 Bad Gateway from NGINX means NGINX accepted your request, tried to hand it off to an upstream process (PHP-FPM, Node.js, a backend API, a Docker container), and got nothing usable back — a connection refused, a timeout, a malformed response, or a process that died mid-reply. The 502 is what NGINX shows the user; the real cause is always upstream.

This post covers the seven causes that account for almost every 502 in production, in roughly the order of how often they show up, with the exact diagnostic commands and configuration fixes for each.

Always start with the error log

Before reading any further, run this:

sudo tail -n 100 /var/log/nginx/error.log

NGINX writes the actual cause of every 502 to the error log with a clear upstream line. A typical entry:

2026/05/23 14:23:01 [error] 1812#0: *5234 connect() failed (111: Connection refused)
  while connecting to upstream, client: 203.0.113.45, server: example.com,
  request: "GET /api/users HTTP/1.1", upstream: "http://127.0.0.1:3000/api/users",
  host: "example.com"

The bracketed error code (111: Connection refused, 110: Connection timed out, 104: Connection reset by peer) plus the upstream: URL tell you everything you need to start. The rest of this post is essentially a translation table for those error codes — keep the log open in another terminal while you work through the causes below.
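
As a rough sketch of that translation table, the helper below reads error-log lines on standard input and prints a short label for each one it recognises. The function name classify_502 and the label wording are illustrative assumptions, not anything NGINX provides; the matched strings are the ones quoted in this post.

```shell
# classify_502 (hypothetical helper): label NGINX error-log lines read from stdin.
# Lines that match none of the known patterns are skipped silently.
classify_502() {
    while IFS= read -r line; do
        case "$line" in
            *"SSL_do_handshake() failed"*)       echo "TLS handshake to the upstream failed" ;;
            *"too big header"*)                  echo "response headers larger than the proxy buffers" ;;
            *"(111: Connection refused)"*)       echo "upstream not listening (down, or wrong address in the config)" ;;
            *"(110: Connection timed out)"*)     echo "upstream too slow for the proxy timeouts" ;;
            *"prematurely closed"*)              echo "upstream closed the connection early" ;;
            *"(13: Permission denied)"*)         echo "permission problem (socket ownership, SELinux, or AppArmor)" ;;
            *"(2: No such file or directory)"*)  echo "PHP-FPM socket missing or at a different path" ;;
            *"(104: Connection reset by peer)"*) echo "upstream worker died mid-request" ;;
        esac
    done
}
```

A possible way to use it: sudo tail -n 100 /var/log/nginx/error.log | classify_502 | sort | uniq -c, which gives a quick tally of which failure modes dominate the recent log.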

If /var/log/nginx/error.log is empty or doesn't exist, your NGINX may be writing to journald instead — try sudo journalctl -u nginx -n 100. The Linux server commands guide covers the wider set of log-reading commands.

Cause 1: The upstream process isn't running

By a comfortable margin, the most common cause of a 502 is that your backend application crashed or never started. The error log line reads:

connect() failed (111: Connection refused) while connecting to upstream

Connection refused is the kernel saying nothing is listening on that port. NGINX did its job; there's just no application to forward to.

Confirm whether the upstream is up:

sudo ss -tlnp | grep :3000        # replace with whatever port NGINX is forwarding to

If you see nothing, the upstream is down. Start it (or check why it crashed) — for a systemd-managed service:

sudo systemctl status myapp
sudo systemctl start myapp
sudo journalctl -u myapp -n 200

For a Docker container:

docker ps -a | grep myapp                  # is it running, or exited?
docker logs myapp --tail 200                # what did it say before dying?
docker start myapp                          # bring it back

A surprising number of production 502s are exactly this: a deploy crashed silently, the kernel's OOM killer reaped the process, or a startup error wasn't caught. Fix the upstream and the 502 goes away.
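
To check the OOM-killer theory quickly, a small filter over the kernel log can list which processes were killed. This is a sketch that assumes the common kernel message form "Killed process <pid> (<name>)"; the exact wording varies between kernel versions, and the function name oom_victims is made up for illustration.

```shell
# oom_victims (hypothetical helper): read kernel log text on stdin and print
# the name of each process the OOM killer terminated, one per line.
# Assumes messages of the form "... Killed process 4242 (node) ...".
oom_victims() {
    sed -nE 's/.*Killed process [0-9]+ \(([^)]+)\).*/\1/p'
}
```

Typical usage would be sudo dmesg | oom_victims, or sudo journalctl -k | oom_victims on systems where the kernel log goes to journald. If your application's process name shows up, the crash was a memory problem rather than a bug in the deploy.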

Cause 2: The upstream is slow and NGINX gives up waiting

The error log line:

upstream timed out (110: Connection timed out) while reading response header from upstream

or:

upstream prematurely closed connection while reading response header from upstream

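NGINX has built-in timeouts for connecting to the upstream, sending the request, and reading the response. When one of them expires, NGINX gives up on the request even though the upstream is technically running; it just didn't respond in time. Strictly speaking, NGINX reports an expired timeout to the client as 504 Gateway Timeout and a prematurely closed connection as 502, but both point at an upstream that was too slow or gave up mid-request.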

The relevant settings live in the http or server block:

http {
    proxy_connect_timeout   75s;     # how long to wait to establish the TCP connection
    proxy_send_timeout      75s;     # max gap between two successive writes while sending the request
    proxy_read_timeout      75s;     # max gap between two successive reads of the response (the big one)
}

Defaults are 60 seconds. For an API call that does a slow database query, an LLM request, or anything that legitimately takes a while, bump proxy_read_timeout to whatever your real upper-bound is:

location /api/slow-report {
    proxy_pass http://backend;
    proxy_read_timeout 300s;
}

Apply the change and reload (not restart — reload picks up config without dropping in-flight connections):

sudo nginx -t                              # test the config first
sudo systemctl reload nginx

Watch out for the flip side: raising the timeout doesn't fix a slow query; it just hides the symptom. If you're hitting the 60-second timeout on what should be a 2-second endpoint, the upstream has a real performance problem to investigate, and adding 4 minutes of patience won't make it faster.
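
Before changing any timeout, it helps to measure how long the upstream really takes when called directly. The sketch below uses curl's time_starttransfer write-out variable (seconds until the first response byte) and compares it with a timeout. The function names are invented for illustration, and the URL in the usage note simply combines the example address and path used earlier in this post.

```shell
# time_upstream (hypothetical helper): print the seconds until the first
# response byte arrives from the given URL, bypassing NGINX.
time_upstream() {
    curl -s -o /dev/null -w '%{time_starttransfer}\n' "$1"
}

# over_budget (hypothetical helper): exit 0 if <seconds> exceeds <timeout>.
# awk handles the comparison because the shell cannot compare decimals.
over_budget() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 > limit + 0) }'
}
```

For example: over_budget "$(time_upstream http://127.0.0.1:3000/api/slow-report)" 60 && echo "slower than proxy_read_timeout". If the direct call is fast but requests through NGINX still time out, the delay is somewhere other than the application code.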

Cause 3: Buffer issues on large response headers

The error log line:

upstream sent too big header while reading response header from upstream

NGINX has fixed-size buffers for the upstream's response headers. Some applications (Laravel with verbose session cookies, some SSO implementations, anything that returns a lot of Set-Cookie headers) exceed the default and trigger a 502.

Bump the buffer sizes:

location / {
    proxy_pass http://backend;
    proxy_buffer_size          128k;
    proxy_buffers              4 256k;
    proxy_busy_buffers_size    256k;
}

proxy_buffer_size sets the buffer for the first part of the upstream response, which is where the response headers land; proxy_buffers sets the number and size of the buffers used for the rest of the response. The default proxy_buffer_size is 4k or 8k depending on platform (one memory page), which is often too small for modern apps.

This is the 502 that's hardest to diagnose without reading the log carefully — the upstream is healthy, response times are normal, but the response itself is malformed-looking to NGINX. The fix is purely configuration.
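
One way to size the buffers from evidence rather than guesswork is to measure the header block the upstream actually sends. The helpers below are a sketch with made-up names; they only count bytes and compare against a buffer size in kilobytes.

```shell
# header_bytes (hypothetical helper): count the bytes of an HTTP response
# header block read from stdin. tr strips the padding some wc builds add.
header_bytes() {
    wc -c | tr -d '[:space:]'
}

# fits_buffer (hypothetical helper): exit 0 if <bytes> fits in <size_kb> kilobytes.
fits_buffer() {
    [ "$1" -le $(( $2 * 1024 )) ]
}
```

Against a live upstream, something like curl -s -D - -o /dev/null http://127.0.0.1:3000/ | header_bytes prints the header size in bytes (the address is the example one from earlier in this post). If that number is above 4096 or 8192, the default proxy_buffer_size is the likely culprit, and the measured size tells you how much headroom to configure.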

Cause 4: The upstream address in nginx.conf is wrong

The error log line:

connect() failed (111: Connection refused) while connecting to upstream, upstream: "http://10.0.0.5:8080/..."

If the upstream IP or port is wrong — typo, a container that was renamed, a service that moved to a different port after a refactor — you get the same connection refused as Cause 1, but the actual upstream is fine. Compare the upstream: URL in the error log to what's actually listening:

sudo ss -tlnp                              # what's actually listening locally
docker ps                                  # what containers are running and on what ports
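
To get the other side of that comparison, the addresses NGINX is actually trying to reach can be pulled out of the error log. This is a sketch for TCP upstreams only (unix-socket upstreams are not handled), and the function name is an illustrative assumption.

```shell
# upstream_targets (hypothetical helper): read NGINX error-log lines on stdin
# and print each distinct host:port found in the upstream: "scheme://..." field.
upstream_targets() {
    sed -nE 's|.*upstream: "[a-z]+://([^/"]+).*|\1|p' | sort -u
}
```

Running sudo tail -n 500 /var/log/nginx/error.log | upstream_targets gives a short list to check against the ss and docker ps output. Any address in that list with nothing listening behind it is a candidate for a stale or mistyped upstream.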

For dynamic backends (upstream backend { server backend.example.com:8080; }), confirm DNS resolves correctly from the NGINX host:

dig backend.example.com
curl -v http://backend.example.com:8080/health

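A common variant: NGINX resolves upstream hostnames once, at config-load time, and caches the result. If your backend's IP changed (autoscaling, a container restart in Docker Compose), NGINX keeps using the old address until you reload it. Adding a resolver directive and using a variable in proxy_pass makes NGINX re-resolve the name at request time, caching each answer for the valid= period. Point resolver at a DNS server that can actually resolve the backend name; a public resolver such as 1.1.1.1 only knows public hostnames: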

location / {
    resolver 1.1.1.1 valid=30s;
    set $backend "backend.example.com";
    proxy_pass http://$backend:8080;
}

Cause 5: PHP-FPM specific — socket permissions and pool config

When NGINX is fronting PHP-FPM, the 502 patterns get specific. The error log line is one of these:

connect() to unix:/var/run/php/php8.3-fpm.sock failed (13: Permission denied)
connect() to unix:/var/run/php/php8.3-fpm.sock failed (2: No such file or directory)
recv() failed (104: Connection reset by peer) while reading response header from upstream

Permission denied means the NGINX worker user (www-data on Debian/Ubuntu, nginx on RHEL/AlmaLinux) isn't allowed to open the FPM socket. Check the socket's ownership and mode:

ls -la /var/run/php/php8.3-fpm.sock

The user/group on the socket should match the user NGINX runs as. The fix is in the FPM pool config (/etc/php/8.3/fpm/pool.d/www.conf or equivalent):

listen.owner = www-data
listen.group = www-data
listen.mode = 0660

Reload PHP-FPM (sudo systemctl reload php8.3-fpm) after changing.

No such file or directory means FPM isn't running, or it's using a different socket path than what NGINX expects. Check FPM status and confirm the socket path matches:

sudo systemctl status php8.3-fpm
sudo find /var/run /run -name "php*-fpm.sock" 2>/dev/null

Connection reset by peer usually means an FPM worker died mid-request — most often from a PHP fatal error, a memory limit being hit, or pm.max_children being exhausted. Check the FPM error log:

sudo tail -n 100 /var/log/php8.3-fpm.log

If pm.max_children is exhausted (server reached pm.max_children setting), raise it in the pool config — but understand why traffic exceeded the previous limit before adding capacity blindly.
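
To see how often the limit is being hit, and in which pool, the FPM log can be tallied. This sketch assumes the usual warning format, "[pool www] server reached pm.max_children setting (N), consider raising it"; the function name is invented for illustration.

```shell
# max_children_hits (hypothetical helper): read a PHP-FPM log on stdin and
# print "<pool>: <count>" for each pool that logged the max_children warning.
max_children_hits() {
    sed -nE 's/.*\[pool ([^]]+)\] server reached pm\.max_children setting.*/\1/p' |
        sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

For example, sudo cat /var/log/php8.3-fpm.log | max_children_hits. A handful of hits during a traffic spike is a different problem from hundreds spread across the day, which usually points at slow requests holding workers rather than at too few workers.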

Cause 6: SELinux or AppArmor blocking the upstream connection

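On AlmaLinux, Rocky, and RHEL, SELinux is enforcing by default. (Amazon Linux 2023 also ships with SELinux enabled, but in permissive mode by default, where denials are logged rather than enforced.) A 502 with this exact signature usually means SELinux is the culprit: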

connect() failed (13: Permission denied) while connecting to upstream

NGINX-to-upstream connections are governed by the httpd_can_network_connect boolean. By default, NGINX (which runs under the httpd_t context on RHEL-family systems) can only connect to a specific allowlist of ports — port 3000 for a Node.js backend is not on that list.

Confirm SELinux is the cause:

sudo ausearch -m AVC -ts recent | tail -20      # recent AVC denials

A line like denied { name_connect } for pid=1812 comm="nginx" dest=3000 confirms it. The fix:

sudo setsebool -P httpd_can_network_connect 1

-P makes the change persistent across reboots. There's no NGINX reload needed — the boolean takes effect immediately.
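
Before flipping the boolean, it can be useful to see exactly which ports NGINX was denied. The filter below is a sketch that assumes AVC lines shaped like the one quoted above; the function name is made up.

```shell
# denied_ports (hypothetical helper): read audit-log text on stdin and print
# each distinct port for which nginx was denied a name_connect.
denied_ports() {
    grep 'name_connect' | grep 'comm="nginx"' |
        sed -nE 's/.*dest=([0-9]+).*/\1/p' | sort -un
}
```

Usage would look like sudo ausearch -m AVC -ts recent | denied_ports. If the list is a single application port and opening every outbound connection feels broader than necessary, labelling just that port with semanage port -a -t http_port_t -p tcp 3000 is a commonly used narrower alternative; treat that as a suggestion to verify against your distro's SELinux policy rather than something this post prescribes.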

On Ubuntu, AppArmor is the equivalent system but typically has a more permissive NGINX profile. AppArmor denials show up in:

sudo dmesg | grep -i denied
sudo aa-status                                  # which AppArmor profiles are loaded

The Linux distros for deployment guide covers which distros ship SELinux enforcing by default — this is one of the cases where the distro choice has real operational consequences.

Cause 7: TLS handshake failures between NGINX and an HTTPS upstream

A 502 that only happens on HTTPS and only from some clients (often browsers, not curl) is sometimes an HSTS / TLS interaction. The pattern usually goes: NGINX terminates TLS, proxies to an upstream that also speaks HTTPS, and the upstream's certificate (or TLS version) doesn't match what NGINX expects.

Common manifestation in the error log:

SSL_do_handshake() failed (SSL: error:...) while SSL handshaking to upstream

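If proxy_ssl_verify on is set and the upstream presents a self-signed or expired certificate, NGINX rejects the handshake. Verification is off by default, so if you never enabled it, look for a protocol or SNI mismatch instead (proxy_ssl_server_name on makes NGINX send the upstream hostname via SNI). The right fix is to fix the upstream cert; the workaround for internal-only upstreams is to turn verification off: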

location / {
    proxy_pass https://upstream.internal;
    proxy_ssl_verify off;            # disable cert validation on the upstream connection
}

Use proxy_ssl_verify off only for trusted internal networks — disabling it on a public upstream defeats the security of TLS to that backend.
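
To inspect the upstream's certificate from the NGINX host before deciding between fixing it and disabling verification, openssl can fetch and check it. This is a sketch: the function names are invented, and the port in the usage note (443) is an assumption about where the upstream listens.

```shell
# upstream_cert (hypothetical helper): print the certificate an HTTPS upstream
# presents, as PEM. Arguments: host and port. -servername sends the host via SNI.
upstream_cert() {
    openssl s_client -connect "$1:$2" -servername "$1" </dev/null 2>/dev/null |
        openssl x509
}

# cert_not_expired (hypothetical helper): exit 0 if the PEM certificate on
# stdin has not yet expired (-checkend 0 means "valid for at least 0 more seconds").
cert_not_expired() {
    openssl x509 -noout -checkend 0 >/dev/null
}
```

For example: upstream_cert upstream.internal 443 | cert_not_expired || echo "upstream certificate has expired". If upstream_cert prints nothing at all, the handshake itself is failing, which points at a protocol or SNI mismatch rather than at the certificate.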

For the HSTS header itself (which controls the client-side, not NGINX-to-upstream), the typical correct configuration in the server block:

add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

HSTS doesn't cause 502s directly, but it does mean a downgrade-attack or misconfigured-TLS-upgrade flow can put a browser into a state where any TLS error becomes user-visible — which sometimes manifests as a 502-like experience even when the actual response was something else.

Diagnostic checklist

When a 502 appears in production, run through this in order:

  1. Read the error log. sudo tail -n 100 /var/log/nginx/error.log. The upstream: URL and the bracketed error code (111, 110, 13, SSL_do_handshake) tell you which cause above applies.
  2. Confirm the upstream is running. sudo ss -tlnp | grep :PORT. If nothing is listening, the upstream is down (Cause 1).
  3. Check the upstream's logs. sudo journalctl -u myapp -n 200 or docker logs myapp --tail 200. The real cause of the crash is usually here.
  4. Test the upstream directly from the NGINX host. curl -v http://127.0.0.1:3000/health. If this works, the problem is between NGINX and the upstream (config, SELinux, buffers). If it fails, the problem is the upstream itself.
  5. Test the NGINX config. sudo nginx -t. Reload with sudo systemctl reload nginx if you've changed anything.
  6. Check system resources. df -h, free -h, dmesg | tail -50. A 502 right after a deploy can be the OOM killer, a full disk, or open-file-descriptor limits; the Related guides cover these tools in more detail.

If you've done all six and still have 502s, the problem is almost always in the application — not NGINX itself.

For the deploy itself, DeployHQ's build pipelines can include a post-deploy nginx -t && systemctl reload nginx step so config drift is caught at release time, not at first 502 in production. Start a free trial to wire NGINX reloads into a Git-driven deploy workflow.

