Multi-Worker Deployment Guide
Problem Statement
When running FastAPI with multiple workers (e.g., uvicorn app:app --workers 4), the lifespan function runs in every worker process. This causes singleton services to run multiple times:
- ❌ Email schedulers send duplicate notifications (4x emails if 4 workers)
- ❌ Background tasks run redundantly across all workers
- ❌ Database migrations/hashing may cause race conditions
Solution: File-Based Worker Coordination
We use file-based locking to ensure only ONE worker runs singleton services. This approach:
- ✅ Works across different process managers (uvicorn, gunicorn, systemd)
- ✅ No external dependencies (Redis, databases)
- ✅ Automatic failover (if primary worker crashes, another can acquire lock)
- ✅ Simple and reliable
Implementation
1. Worker Coordination Module
The worker_coordination.py module provides:
```python
from alpine_bits_python.worker_coordination import is_primary_worker

# In your lifespan function
is_primary, worker_lock = is_primary_worker()

if is_primary:
    # Start schedulers, background tasks, etc.
    start_email_scheduler()
else:
    # This is a secondary worker - skip singleton services
    pass
```
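For reference, here is a minimal sketch of what this kind of file-lock coordination can look like internally. It is illustrative only - the actual worker_coordination.py module may differ - and it assumes a POSIX system where fcntl.flock is available:

```python
"""Illustrative sketch only - the real worker_coordination.py may differ."""
import fcntl
import os


class WorkerLock:
    """Advisory file lock; the kernel releases it automatically if the process dies."""

    def __init__(self, path: str = "/tmp/alpinebits_primary_worker.lock") -> None:
        self.path = path
        self._fd: int | None = None

    def acquire(self) -> bool:
        self._fd = os.open(self.path, os.O_RDWR | os.O_CREAT)
        try:
            # LOCK_NB makes the attempt non-blocking: secondaries fail immediately.
            fcntl.flock(self._fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os.close(self._fd)
            self._fd = None
            return False
        # Record the holder's PID so `cat` on the lock file shows the primary.
        os.ftruncate(self._fd, 0)
        os.write(self._fd, str(os.getpid()).encode())
        return True

    def release(self) -> None:
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


def is_primary_worker() -> tuple[bool, WorkerLock]:
    lock = WorkerLock()
    return lock.acquire(), lock
```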
2. How It Works
```text
┌─────────────────────────────────────────────────────┐
│                uvicorn --workers 4                  │
└─────────────────────────────────────────────────────┘
              │
              ├─── Worker 0 (PID 1001) ─┐
              ├─── Worker 1 (PID 1002) ─┤
              ├─── Worker 2 (PID 1003) ─┤  All try to acquire
              └─── Worker 3 (PID 1004) ─┘  /tmp/alpinebits_primary_worker.lock
                          │
                          ▼
              Worker 0: ✓ Lock acquired → PRIMARY
              Worker 1: ✗ Lock busy    → SECONDARY
              Worker 2: ✗ Lock busy    → SECONDARY
              Worker 3: ✗ Lock busy    → SECONDARY
```
3. Lifespan Function
```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Determine the primary worker using the file lock
    is_primary, worker_lock = is_primary_worker()
    _LOGGER.info("Worker startup: pid=%d, primary=%s", os.getpid(), is_primary)

    # All workers: shared setup
    config = load_config()
    engine = create_async_engine(DATABASE_URL)

    # Only the primary worker runs singleton services
    if is_primary:
        # Start the email scheduler
        email_handler, report_scheduler = setup_logging(
            config, email_service, loop, enable_scheduler=True
        )
        report_scheduler.start()

        # Run database migrations/hashing
        await hash_existing_customers()
    else:
        # Secondary workers: skip the scheduler
        email_handler, report_scheduler = setup_logging(
            config, email_service, loop, enable_scheduler=False
        )

    yield

    # Cleanup
    if report_scheduler:
        report_scheduler.stop()

    # Release the lock
    if worker_lock:
        worker_lock.release()
```
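The decorated lifespan is then passed to the application in the standard FastAPI way:

```python
from fastapi import FastAPI

app = FastAPI(lifespan=lifespan)
```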
Deployment Scenarios
Development (Single Worker)
```bash
# No special configuration needed
uvicorn alpine_bits_python.api:app --reload
```
Result: Single worker becomes primary automatically.
Production (Multiple Workers)
```bash
# 4 workers for handling concurrent requests
uvicorn alpine_bits_python.api:app --workers 4 --host 0.0.0.0 --port 8000
```
Result:
- Worker 0 becomes PRIMARY → runs schedulers
- Workers 1-3 are SECONDARY → handle requests only
With Gunicorn
```bash
gunicorn alpine_bits_python.api:app \
    --workers 4 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000
```
Result: Same as uvicorn - one primary, rest secondary.
Docker Compose
```yaml
services:
  api:
    image: alpinebits-api
    command: uvicorn alpine_bits_python.api:app --workers 4 --host 0.0.0.0
    volumes:
      - /tmp:/tmp  # Important: share the lock file location
```
Important: When using multiple containers, ensure they share the same lock file location or use Redis-based coordination instead.
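flock locks live in the kernel and attach to the file's inode, so containers on the same host can coordinate through a shared volume. A hypothetical sketch (service names and the lock directory are illustrative; point WorkerLock at a file inside the shared path, as shown under Lock File Location below):

```yaml
volumes:
  worker-locks:

services:
  api-1:
    image: alpinebits-api
    volumes:
      - worker-locks:/var/run/alpinebits
  api-2:
    image: alpinebits-api
    volumes:
      - worker-locks:/var/run/alpinebits
```

Containers on different hosts do not share a kernel, so there a shared volume is not enough - use the Redis-based coordination described below instead.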
Monitoring & Debugging
Check Which Worker is Primary
Look for log messages at startup:
```text
Worker startup: pid=1001, primary=True
Worker startup: pid=1002, primary=False
Worker startup: pid=1003, primary=False
Worker startup: pid=1004, primary=False
```
Check Lock File
```bash
# See which PID holds the lock
cat /tmp/alpinebits_primary_worker.lock
# Output: 1001

# Verify the process is running
ps aux | grep 1001
```
Testing Worker Coordination
Run the test script:
```bash
uv run python test_worker_coordination.py
```
Expected output:
```text
Worker 0 (PID 30773): ✓ I am PRIMARY
Worker 1 (PID 30774): ✗ I am SECONDARY
Worker 2 (PID 30775): ✗ I am SECONDARY
Worker 3 (PID 30776): ✗ I am SECONDARY
```
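If you want to reproduce this behavior by hand, a minimal sketch of such a test might look like the following (it assumes only the is_primary_worker API shown earlier; the real test script may differ):

```python
import multiprocessing
import os
import time

from alpine_bits_python.worker_coordination import is_primary_worker


def worker(index: int) -> None:
    is_primary, lock = is_primary_worker()
    mark, role = ("✓", "PRIMARY") if is_primary else ("✗", "SECONDARY")
    print(f"Worker {index} (PID {os.getpid()}): {mark} I am {role}")
    # Hold the lock long enough for the other workers to try and fail.
    time.sleep(1)


if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```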
Failover Behavior
Primary Worker Crashes
- Primary worker holds the lock
- Primary worker crashes/exits → the OS releases the lock automatically (demonstrated in the sketch below)
- Existing secondary workers remain secondary (they only try to acquire the lock at startup)
- On the next restart, the first worker to start becomes the new primary
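The automatic release is a property of flock itself: the kernel drops the lock when the holding process dies, even on SIGKILL. A hypothetical demonstration, assuming the WorkerLock sketch from earlier (Unix only, since it uses os.fork):

```python
import os
import signal
import time

from alpine_bits_python.worker_coordination import WorkerLock

LOCK_PATH = "/tmp/failover_demo.lock"

child = os.fork()
if child == 0:
    # Child process: become primary, then hang as if stuck.
    assert WorkerLock(LOCK_PATH).acquire()
    time.sleep(60)
    os._exit(0)

time.sleep(0.5)                              # let the child grab the lock
assert not WorkerLock(LOCK_PATH).acquire()   # parent fails: it is secondary
os.kill(child, signal.SIGKILL)               # simulate a hard crash
os.waitpid(child, 0)
assert WorkerLock(LOCK_PATH).acquire()       # the kernel released the lock
print("Lock recovered after crash")
```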
Graceful Restart
- Send SIGTERM to workers
- Primary worker releases lock in shutdown
- New workers start, one becomes primary
Lock File Location
Default: /tmp/alpinebits_primary_worker.lock
Change Lock Location
```python
from alpine_bits_python.worker_coordination import WorkerLock

# Custom location
lock = WorkerLock("/var/run/alpinebits/primary.lock")
is_primary = lock.acquire()
```
Production recommendation: Use /var/run/ or /run/ for lock files (automatically cleaned on reboot).
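On systemd-based hosts, /run is a tmpfs owned by root, so an unprivileged app user needs its lock directory recreated at boot. One common way is a tmpfiles.d entry (path, user, and group here are hypothetical; adjust to your deployment):

```
# /etc/tmpfiles.d/alpinebits.conf
# Type  Path             Mode  UID        GID        Age
d       /run/alpinebits  0755  alpinebits alpinebits -
```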
Common Issues
Issue: All workers think they're primary
Cause: Lock file path not accessible or workers running in separate containers.
Solution:
- Check file permissions on lock directory
- For containers: Use shared volume or Redis-based coordination
Issue: No worker becomes primary
Cause: Lock file from previous run still exists.
Solution:
```bash
# Clean up the stale lock file
rm /tmp/alpinebits_primary_worker.lock

# Restart the application
```
Issue: Duplicate emails still being sent
Cause: The email alert handler runs on every worker by design, so error alerts can legitimately arrive from any worker.
Solution: This is usually correct behavior, not a bug: the alert handler runs on all workers (so errors from any worker are reported), while the email scheduler runs only on the primary. If scheduled reports are duplicated, check the startup logs and verify that exactly one worker reports primary=True.
Alternative Approaches
Redis-Based Coordination
For multi-container deployments, consider Redis-based locks:
```python
import redis
from redis.lock import Lock

redis_client = redis.Redis(host="redis", port=6379)

# Note: timeout=60 means the lock expires after 60 seconds, so the primary
# must periodically renew it (e.g. via lock.extend()) to stay primary.
lock = Lock(redis_client, "alpinebits_primary_worker", timeout=60)

if lock.acquire(blocking=False):
    # This is the primary worker
    start_schedulers()
```
Pros: Works across containers.
Cons: Requires a Redis dependency.
Environment Variable (Not Recommended)
```bash
# Manually designate the primary worker
ALPINEBITS_PRIMARY_WORKER=true uvicorn app:app
```
Pros: Simple.
Cons: Manual configuration, no automatic failover.
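For completeness, a minimal sketch of how an application could read such a flag (the variable name matches the hypothetical example above):

```python
import os

# Treat this worker as primary only when explicitly flagged.
is_primary = os.environ.get("ALPINEBITS_PRIMARY_WORKER", "").lower() == "true"
```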
Best Practices
- ✅ Use file locks for single-host deployments (our implementation)
- ✅ Use Redis locks for multi-container deployments
- ✅ Log primary/secondary status at startup
- ✅ Always release locks on shutdown
- ✅ Keep lock files in /var/run/ or /tmp/
- ❌ Don't rely on process names (unreliable with uvicorn)
- ❌ Don't use environment variables (no automatic failover)
- ❌ Don't skip coordination (will cause duplicate notifications)
Summary
With file-based worker coordination:
- ✅ Only ONE worker runs singleton services (schedulers, migrations)
- ✅ All workers handle HTTP requests normally
- ✅ Automatic failover if primary worker crashes
- ✅ No external dependencies needed
- ✅ Works with uvicorn, gunicorn, and other ASGI servers
This ensures you get the benefits of multiple workers (concurrency) without duplicate email notifications or race conditions.