If you're experiencing connection errors specifically during the "Copying cache files to release" step of your zero-downtime deployment, while regular deployments work fine, this guide will help you troubleshoot the issue.

## Symptom

Your deployments fail with an "Error connecting to server..." message during the "Copying cache files to release" phase, but:

- Regular deployments (without zero-downtime enabled) work perfectly
- The server configuration is correct
- You can connect via SSH/SFTP manually

## Understanding the Issue

Zero-downtime deployments use additional SFTP operations beyond regular deployments to manage the atomic directory structure (releases, cache, shared, current). When the "Copying cache files to release" step fails with a connection error, it typically indicates one of these issues:

### 1. SSH Connection Timeout

The SSH connection established at the start of the deployment may time out or drop before the deployment reaches the cache copy phase. This can happen when:

- The server has aggressive SSH timeout settings
- Previous deployment steps take longer than expected
- Network instability causes the connection to become stale
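
One way to check for an aggressive idle timeout is to hold an SSH session open without traffic and see whether it survives. The sketch below assumes OpenSSH's `ssh` client and key-based login; `idle_probe` and the `SSH_BIN` override are hypothetical names used for illustration, and `user@server` is a placeholder.

```bash
# Hypothetical helper: hold an SSH session idle and report whether it survives.
# ServerAliveInterval=0 turns off client keepalives, so only the server's and
# the network's idle handling are exercised.
idle_probe() {
  target=$1
  idle_seconds=${2:-300}
  if ${SSH_BIN:-ssh} -o ServerAliveInterval=0 "$target" \
      "sleep $idle_seconds && echo alive" | grep -qx 'alive'; then
    echo "connection survived ${idle_seconds}s idle"
  else
    echo "connection dropped within ${idle_seconds}s idle"
    return 1
  fi
}

# Example (placeholder host): idle_probe user@server 600
```

A failed login is also reported as a dropped connection, so confirm that a plain `ssh user@server` works before reading anything into the result.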

### 2. SFTP Subsystem Issues

Zero-downtime deployments rely heavily on SFTP operations for checking directories, resolving symlinks, and listing files. Some SSH servers have:

- Restrictive SFTP subsystem configurations
- Rate limiting on SFTP operations
- Different permission models for SSH vs SFTP

### 3. Chroot or Restricted Shell Environments

If your server uses a chroot jail or restricted shell environment, the SFTP subsystem may have different access to paths than regular SSH commands, causing operations like `realpath()` or `directory?()` to fail.
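
To see whether the SFTP subsystem is configured differently from the shell, it can help to list the directives that usually cause this. The sketch below assumes an OpenSSH server; `sftp_restrictions` is a hypothetical helper name, and it reads only the file you pass in, so settings pulled in through `Include` lines are not covered.

```bash
# Hypothetical helper: print, with line numbers, the directives in an
# sshd_config-style file that commonly change what an SFTP session can see.
sftp_restrictions() {
  grep -Ein '^[[:space:]]*(subsystem|chrootdirectory|forcecommand|match)[[:space:]]' "$1"
}

# Example (may need sudo if the file is not world-readable):
#   sftp_restrictions /etc/ssh/sshd_config
```

A `ChrootDirectory` or `ForceCommand internal-sftp` inside a `Match` block that applies to the deployment user suggests that SFTP paths are rooted differently from shell paths.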

## Solutions

### Check Your Atomic Deployment Strategy

The "Copying cache files to release" step only occurs when using the **cache-based atomic strategy**. You can try switching to the alternative strategy:

1. Edit your server settings in DeployHQ
2. Change "Atomic deployment strategy":

   - from "Upload changes to a cache directory and copy new release from there"
   - to "Copy previous release before uploading changes to new release"

This alternative strategy doesn't use a separate cache directory and may avoid the connection issue.

### Increase SSH Timeout Settings

On your server, adjust the SSH daemon configuration to prevent connections from timing out:

```bash
# Edit /etc/ssh/sshd_config
ClientAliveInterval 60
ClientAliveCountMax 10
TCPKeepAlive yes
```

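Then check the configuration for errors and restart the SSH service. On some distributions, such as Debian and Ubuntu, the unit is named `ssh` rather than `sshd`:

```bash
sudo sshd -t                  # validate sshd_config before restarting
sudo systemctl restart sshd   # or: sudo systemctl restart ssh
```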
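
As a rough guide, an OpenSSH server disconnects an unresponsive client after about `ClientAliveInterval` multiplied by `ClientAliveCountMax` seconds, which is 600 seconds for the values shown in this section. The sketch below computes that window from configuration text; `keepalive_window` is a hypothetical helper name, and the defaults it falls back to (0 and 3) are OpenSSH's documented defaults.

```bash
# Hypothetical helper: read sshd_config-style text on stdin and print
# ClientAliveInterval * ClientAliveCountMax in seconds. Like sshd, it keeps the
# first occurrence of each keyword; it does not evaluate Match blocks.
keepalive_window() {
  awk '
    tolower($1) == "clientaliveinterval" && !have_i { interval = $2; have_i = 1 }
    tolower($1) == "clientalivecountmax" && !have_c { count = $2; have_c = 1 }
    END {
      if (!have_i) interval = 0   # OpenSSH default: server probes disabled
      if (!have_c) count = 3      # OpenSSH default
      print interval * count
    }
  '
}

# Example: sudo sshd -T | keepalive_window
```

A result of 0 means the server sends no keepalive probes, so idle connections are left to TCP keepalives and to any firewall between DeployHQ and the server.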

### Verify SFTP Subsystem Access

Ensure the SFTP subsystem has access to your deployment paths:

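```bash
# Test SFTP access to your deployment directory
sftp user@server
sftp> ls /path/to/deployment/directory
sftp> ls /path/to/deployment/cache
# cd resolves the "current" symlink on the server; pwd then prints the real path
sftp> cd /path/to/deployment/directory/current
sftp> pwd
sftp> exit
```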

If any of these commands fail or return permission errors, you may need to adjust your SSH configuration.

### Check Path Consistency

Verify that the same paths are accessible via both SSH and SFTP:

```bash
# Via SSH
ssh user@server "ls -la /path/to/deployment/cache"

# Via SFTP
echo "ls -la /path/to/deployment/cache" | sftp user@server
```

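Both should list the same entries, although the output format may differ slightly. If they don't, or if only one of the commands fails with a "No such file or directory" error, the SFTP session probably sees a different filesystem root because of a chroot or path mapping.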
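
Comparing the two listings by eye is error-prone when the directory is large. The sketch below reduces each listing to sorted bare names so they can be compared with `diff`. `normalize_listing` is a hypothetical helper name, and the filtering is based on assumptions about the `sftp` client's output (command echoes starting with `sftp> ` and entries that may carry a directory prefix), so adjust it if your client prints something different.

```bash
# Hypothetical helper: reduce a directory listing on stdin to sorted bare names
# so that listings from ssh and sftp can be compared line by line.
# Steps: drop "sftp> ..." command echoes, strip any directory prefix,
# drop ".", ".." and blank lines, then sort.
normalize_listing() {
  grep -v '^sftp> ' \
    | sed 's|.*/||' \
    | grep -vx -e '\.' -e '\.\.' -e '' \
    | sort
}

# Example with placeholder host and path (assumes key-based login):
#   ssh user@server "ls -1A /path/to/deployment/cache" | normalize_listing > ssh.txt
#   echo "ls -1a /path/to/deployment/cache" | sftp -b - user@server | normalize_listing > sftp.txt
#   diff ssh.txt sftp.txt && echo "listings match"
```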

### Review Firewall and Network Settings

If the connection drops intermittently:

1. Check firewall rules for connection tracking timeouts
2. Review any load balancers or proxies between DeployHQ and your server
3. Ensure consistent routing during the deployment process

## Additional Notes

Zero-downtime deployments need both SSH command execution and the SFTP subsystem (which itself runs over SSH), because different operations use each one:

- **SSH**: Used for executing commands (cp, ln, rm, etc.)
- **SFTP**: Used for checking directories, resolving symlinks, listing files

Make sure your server configuration allows both for the same user, with the same credentials and the same view of the deployment paths.

## Still Having Issues?

If the problem persists after trying these solutions:

1. Check your deployment logs for specific error messages
2. Test a deployment from scratch to rule out directory state issues
3. Consider using a non-cache atomic strategy as a workaround
4. Contact DeployHQ support with your deployment logs and server SSH configuration details
