# Multi-Worker Quick Reference

## TL;DR

**Problem**: Running with multiple workers (e.g. 4) causes duplicate emails and race conditions, because every worker starts its own schedulers.

**Solution**: File-based locking ensures that only ONE worker (the primary) runs the schedulers.

## Commands

```bash
# Development (1 worker - auto primary)
uvicorn alpine_bits_python.api:app --reload

# Production (4 workers - one becomes primary)
uvicorn alpine_bits_python.api:app --workers 4 --host 0.0.0.0 --port 8000

# Test worker coordination
uv run python test_worker_coordination.py

# Run all tests
uv run pytest tests/ -v
```

## Check Which Worker is Primary

Look for startup logs:

```
[INFO] Worker startup: pid=1001, primary=True ← PRIMARY
[INFO] Worker startup: pid=1002, primary=False ← SECONDARY
[INFO] Worker startup: pid=1003, primary=False ← SECONDARY
[INFO] Worker startup: pid=1004, primary=False ← SECONDARY
[INFO] Daily report scheduler started ← Only on PRIMARY
```

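To check this automatically rather than by eye, the startup lines can be scanned for workers that report `primary=True`. The following is a minimal sketch that assumes the log format matches the sample lines above exactly; `find_primaries` and `STARTUP_RE` are hypothetical names for illustration, not part of the project.

```python
import re

# Assumed format, copied from the sample startup lines in this document.
STARTUP_RE = re.compile(r"Worker startup: pid=(\d+), primary=(True|False)")


def find_primaries(log_text: str) -> list[int]:
    """Return the PIDs of all workers that logged primary=True."""
    return [
        int(pid)
        for pid, primary in STARTUP_RE.findall(log_text)
        if primary == "True"
    ]


sample = """\
[INFO] Worker startup: pid=1001, primary=True
[INFO] Worker startup: pid=1002, primary=False
[INFO] Worker startup: pid=1003, primary=False
[INFO] Worker startup: pid=1004, primary=False
"""

primaries = find_primaries(sample)
print(primaries)  # [1001]
```

A healthy deployment should yield exactly one PID; an empty list or several PIDs corresponds to the cases covered under Troubleshooting.
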
## Lock File

**Location**: `/tmp/alpinebits_primary_worker.lock`

**Check lock status**:
```bash
# See which PID holds the lock
cat /tmp/alpinebits_primary_worker.lock
# Output: 1001

# Verify the process is running (unlike `ps aux | grep`, this does not match the grep itself)
ps -p 1001
```

**Clean stale lock** (if needed):
```bash
rm /tmp/alpinebits_primary_worker.lock
# Then restart the application
```

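The two manual checks above (read the PID, then see whether that process exists) can be combined into one helper. This is a sketch under the assumption that the lock file contains only the holder's PID, as in the `cat` output shown; `pid_is_running` and `lock_is_stale` are hypothetical helpers, not functions from `worker_coordination.py`, and the approach is POSIX-only.

```python
import os
from pathlib import Path

LOCK_PATH = Path("/tmp/alpinebits_primary_worker.lock")


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one PID
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_is_stale(lock_path: Path = LOCK_PATH) -> bool:
    """Return True if the lock file names a PID that is no longer running."""
    try:
        pid = int(lock_path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no lock file, or no readable PID: nothing to judge
    return not pid_is_running(pid)
```

Calling `lock_is_stale()` before removing the file avoids deleting a lock that a live primary still holds.
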
## What Runs Where

| Service | Primary Worker | Secondary Workers |
|---------|---------------|-------------------|
| HTTP requests | ✓ Yes | ✓ Yes |
| Email scheduler | ✓ Yes | ✗ No |
| Error alerts | ✓ Yes | ✓ Yes (all workers can send) |
| DB migrations | ✓ Yes | ✗ No |
| Customer hashing | ✓ Yes | ✗ No |

## Troubleshooting

### All workers think they're primary
- **Cause**: The lock file is not accessible
- **Fix**: Check permissions on `/tmp/` or change the lock location

### No worker becomes primary
- **Cause**: Stale lock file
- **Fix**: `rm /tmp/alpinebits_primary_worker.lock` and restart

### Still getting duplicate emails
**Check**: Are you seeing duplicate **scheduled reports** or **error alerts**?
- Scheduled reports should only come from the primary ✓
- Error alerts can come from any worker (by design) ✓

## Code Example

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

from alpine_bits_python.worker_coordination import is_primary_worker


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Acquire lock - only one worker succeeds
    is_primary, worker_lock = is_primary_worker()

    if is_primary:
        # Start singleton services (`scheduler` is created elsewhere in the app)
        scheduler.start()

    # All workers handle requests
    yield

    # Release lock on shutdown
    if worker_lock:
        worker_lock.release()
```

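For readers curious how a file-based election like this can work, here is a minimal sketch using an advisory `fcntl.flock` lock. It is an illustration only: the real `is_primary_worker` in `worker_coordination.py` may be implemented differently, and `try_acquire_primary` and `release_primary` are hypothetical names. The sketch is POSIX-only and assumes the lock file lives on a local filesystem.

```python
import fcntl
import os


def try_acquire_primary(lock_path: str):
    """Return (True, fd) if this process got the lock, else (False, None)."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)  # another worker already holds the lock
        return False, None
    # We hold the lock: record our PID so it can be inspected with `cat`
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return True, fd


def release_primary(fd: int) -> None:
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Each worker would call `try_acquire_primary("/tmp/alpinebits_primary_worker.lock")` once at startup and keep the returned descriptor open for its whole lifetime; only the worker that receives `True` starts the schedulers. In this sketch the lock belongs to the open descriptor, so closing it (or exiting) releases the lock.
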
## Documentation

- **Full guide**: `docs/MULTI_WORKER_DEPLOYMENT.md`
- **Solution summary**: `SOLUTION_SUMMARY.md`
- **Implementation**: `src/alpine_bits_python/worker_coordination.py`
- **Test script**: `test_worker_coordination.py`