Mostly ready for the first test run, but there is one improvement I want to implement first

This commit is contained in:
Jonas Linter
2025-10-21 17:46:27 +02:00
parent 6e4cc7ed1d
commit ec10ca51e0
8 changed files with 1612 additions and 28 deletions

DEPLOYMENT.md (new file, 486 lines)
# Deployment Guide - Meta API Grabber
## Quick Start (Test Deployment for Tonight)
### 1. Get a Fresh Access Token
Run the OAuth flow to get a new long-lived token (60 days):
```bash
uv run python -m meta_api_grabber.auth
```
This will:
- Open browser for OAuth authorization
- Exchange short-lived token for long-lived token (60 days)
- Save token to `.env` and `.meta_token.json`
- Token will auto-refresh before expiry ✅
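Under the hood, the long-lived exchange is a single GET against Meta's Graph API using the `fb_exchange_token` grant. A minimal sketch of building that request (illustrative only — the project's auth module does this for you, and the Graph API version shown is an assumption):

```python
# Sketch of Meta's long-lived token exchange request (illustrative; the
# project's auth module handles this). The fb_exchange_token grant is
# documented by Meta; the API version here is an assumption.
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com/v19.0/oauth/access_token"

def build_exchange_url(app_id: str, app_secret: str, short_lived_token: str) -> str:
    """Build the GET URL that trades a short-lived user token for a ~60-day one."""
    params = {
        "grant_type": "fb_exchange_token",
        "client_id": app_id,
        "client_secret": app_secret,
        "fb_exchange_token": short_lived_token,
    }
    return f"{GRAPH_URL}?{urlencode(params)}"
```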
### 2. Verify Your `.env` File
Ensure your `.env` has these variables:
```bash
# Meta API Credentials
META_APP_ID=your_app_id
META_APP_SECRET=your_app_secret
META_ACCESS_TOKEN=your_long_lived_token # From step 1
# Database (docker-compose handles this)
DATABASE_URL=postgresql://meta_user:meta_password@localhost:5555/meta_insights
```
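A quick way to catch a misconfigured `.env` before starting the containers is to check the required variables up front. A minimal sketch (variable names taken from the list above; this helper is not part of the project):

```python
# Fail fast if any required environment variable is missing or empty.
import os

REQUIRED_VARS = ("META_APP_ID", "META_APP_SECRET", "META_ACCESS_TOKEN", "DATABASE_URL")

def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Usage: missing_vars() checks os.environ; pass a dict to test in isolation.
```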
### 3. Build and Start Everything
```bash
# Build the Docker image and start all services
docker-compose up -d --build
```
This starts:
- **timescaledb**: Database for storing insights
- **meta-grabber**: Your data collection service ⭐
- **grafana**: Visualization dashboard (optional)
### 4. Monitor the Logs
```bash
# Watch the grabber logs in real-time
docker-compose logs -f meta-grabber
# Expected output:
# ============================================================
# SCHEDULED INSIGHTS GRABBER STARTED
# ============================================================
# ✅ Token valid (X days remaining)
# Loading accessible ad accounts...
# Loaded X ad account(s)
# Collection interval: 2.0 hours
# ============================================================
#
# COLLECTION CYCLE - 2025-10-21T...
# ============================================================
# Processing X ad account(s)
# ...
```
### 5. Verify It's Running
```bash
# Check container status
docker-compose ps
# Should show:
# NAME                STATUS          PORTS
# meta_timescaledb    Up (healthy)    0.0.0.0:5555->5432/tcp
# meta_api_grabber    Up
# meta_grafana        Up              0.0.0.0:3555->3000/tcp
```
### 6. Let It Run Overnight
The service will:
- ✅ Collect "today" data every 2 hours
- ✅ Detect when a new day starts
- ✅ Fetch "yesterday" data immediately when new day is detected
- ✅ Update "yesterday" data every 12 hours
- ✅ Auto-refresh the access token before it expires
- ✅ Restart automatically if it crashes (`restart: unless-stopped`)
## Token Auto-Refresh
### How It Works
The system uses `MetaTokenManager` which:
1. **On startup**: Checks if token expires within 7 days
2. **If expiring soon**: Exchanges current token for a new long-lived token
3. **Saves new token**: Updates both `.env` and `.meta_token.json`
4. **Every cycle**: Re-checks token validity before fetching data
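The expiry check in step 1 boils down to simple date arithmetic. A minimal sketch (hypothetical — not the actual `MetaTokenManager` code):

```python
# Sketch of the "refresh when within 7 days of expiry" check.
from datetime import datetime, timedelta

def needs_refresh(expires_at: datetime, now: datetime, threshold_days: int = 7) -> bool:
    """True when the token expires within `threshold_days` of `now`."""
    return expires_at - now <= timedelta(days=threshold_days)
```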
### Token Lifecycle
```
New Token (via OAuth)
        ↓  60 days validity
Day 53 (7 days before expiry)
        ↓  Auto-refresh triggered
New 60-day token issued
        ↓
Cycle repeats indefinitely ♾️
```
### What If Token Expires?
If the token somehow expires (e.g., manual revocation):
- Container will **error out immediately** with clear message
- Logs will show: `❌ Fatal error - Token validation failed`
- Container stops (won't waste API calls)
- You'll see it in: `docker-compose logs meta-grabber`
**To fix**:
1. Stop the container: `docker-compose stop meta-grabber`
2. Get new token: `uv run python -m meta_api_grabber.auth`
3. Restart: `docker-compose up -d meta-grabber`
## Data Collection Schedule
### Normal Operation (Same Day)
```
00:00 - Cycle 1: Fetch "today" (2025-10-21)
02:00 - Cycle 2: Fetch "today" (2025-10-21)
04:00 - Cycle 3: Fetch "today" (2025-10-21)
...
22:00 - Cycle 12: Fetch "today" (2025-10-21)
```
### When New Day Starts
```
00:00 - Cycle 13:
        - Fetch "today" (2025-10-22)  ← New date detected!
        - 📅 New day detected: 2025-10-21 -> 2025-10-22
        - Fetch "yesterday" (2025-10-21) immediately
02:00 - Cycle 14:
        - Fetch "today" (2025-10-22)
        - Skip "yesterday" (< 12h since last fetch)
...
12:00 - Cycle 19:
        - Fetch "today" (2025-10-22)
        - Update "yesterday" (12h passed since last fetch)
```
## Checking Data in Database
### Connect to Database
```bash
# From host machine
docker exec -it meta_timescaledb psql -U meta_user -d meta_insights
# Or using psql directly
psql -h localhost -p 5555 -U meta_user -d meta_insights
# Password: meta_password
```
### Query Today's Data
```sql
SELECT
    time,
    account_id,
    date_preset,
    date_start,
    impressions,
    spend
FROM account_insights
WHERE date_preset = 'today'
ORDER BY time DESC
LIMIT 10;
```
### Query Yesterday's Data
```sql
SELECT
    time,
    account_id,
    date_preset,
    date_start,
    impressions,
    spend
FROM account_insights
WHERE date_preset = 'yesterday'
ORDER BY time DESC
LIMIT 10;
```
### Check Last Collection Time
```sql
SELECT
    date_preset,
    MAX(fetched_at) AS last_fetch,
    COUNT(*) AS total_records
FROM account_insights
GROUP BY date_preset;
```
## Stopping and Restarting
### Stop Everything
```bash
docker-compose down
```
This stops all containers but **preserves data**:
- ✅ Database data (in volume `timescale_data`)
- ✅ Token files (mounted from host: `.env`, `.meta_token.json`)
- ✅ Grafana dashboards (in volume `grafana_data`)
### Stop Just the Grabber
```bash
docker-compose stop meta-grabber
```
### Restart the Grabber
```bash
docker-compose restart meta-grabber
```
### View Logs
```bash
# Follow logs in real-time
docker-compose logs -f meta-grabber
# Last 100 lines
docker-compose logs --tail=100 meta-grabber
# All services
docker-compose logs -f
```
## Configuration
### Adjusting Collection Interval
Edit [scheduled_grabber.py](src/meta_api_grabber/scheduled_grabber.py) line 522:
```python
await grabber.run_scheduled(
interval_hours=2.0, # ← Change this (in hours)
refresh_metadata_every_n_cycles=12,
)
```
Then rebuild:
```bash
docker-compose up -d --build meta-grabber
```
### Adjusting Number of Accounts
Edit [scheduled_grabber.py](src/meta_api_grabber/scheduled_grabber.py) line 519:
```python
grabber = ScheduledInsightsGrabber(
max_accounts=3, # ← Change this (None = all accounts)
)
```
### Adjusting Yesterday Fetch Interval
The interval is currently hardcoded to 12 hours in the `_should_fetch_yesterday()` method at line 175. To change it, edit:
```python
return hours_since_last_fetch >= 12.0 # ← Change to 6.0 for 6 hours, etc.
```
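Rather than editing the source, this could be read from an environment variable. A small sketch (the `YESTERDAY_REFETCH_HOURS` variable name is an assumption — the project does not currently read it):

```python
# Sketch: read the yesterday re-fetch interval from an env var instead of
# hardcoding it. YESTERDAY_REFETCH_HOURS is a hypothetical variable name.
import os

def yesterday_refetch_hours(default: float = 12.0) -> float:
    """Return the configured interval in hours, falling back to the default."""
    raw = os.environ.get("YESTERDAY_REFETCH_HOURS", "").strip()
    return float(raw) if raw else default
```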
## Troubleshooting
### Container Keeps Restarting
```bash
# Check logs for error
docker-compose logs meta-grabber
# Common issues:
# 1. Token invalid → Get new token
# 2. Database not ready → Wait for timescaledb health check
# 3. Missing .env file → Create from .env.example
```
### No Data Being Collected
```bash
# Check if grabber is running
docker-compose ps
# Check logs for API errors
docker-compose logs meta-grabber | grep "Error"
# Verify token
uv run python -m meta_api_grabber.token_manager
```
### Database Connection Failed
```bash
# Check if TimescaleDB is healthy
docker-compose ps timescaledb
# Should show: "Up (healthy)"
# If not healthy, check TimescaleDB logs
docker-compose logs timescaledb
```
### Yesterday Data Not Appearing
Check logs for:
```
📅 New day detected: YYYY-MM-DD -> YYYY-MM-DD
Fetching yesterday's data (first time)
```
If you don't see this, the system hasn't detected a new day yet.
To force a test:
1. Stop grabber: `docker-compose stop meta-grabber`
2. Manually insert yesterday data (see manual testing section)
3. Restart: `docker-compose start meta-grabber`
## Manual Testing (Before Overnight Run)
### Test Token Validity
```bash
# This will check token and auto-refresh if needed
uv run python -m meta_api_grabber.token_manager
```
### Test Single Collection Cycle
```bash
# Run one cycle without Docker
uv run python -c "
import asyncio
from meta_api_grabber.scheduled_grabber import ScheduledInsightsGrabber

async def test():
    grabber = ScheduledInsightsGrabber(max_accounts=1)
    await grabber.db.connect()
    await grabber.db.initialize_schema()
    await grabber.load_ad_accounts()
    await grabber.run_collection_cycle()
    await grabber.db.close()

asyncio.run(test())
"
```
### Verify Database Schema
```bash
docker exec -it meta_timescaledb psql -U meta_user -d meta_insights -c "\dt"
# Should show:
# account_insights
# campaign_insights
# adset_insights
# ad_accounts
# campaigns
# adsets
```
## Monitoring in Production
### Health Checks
The container has a built-in health check:
```bash
docker inspect meta_api_grabber | grep -A 5 Health
```
### Resource Usage
```bash
# Monitor container resources
docker stats meta_api_grabber
```
### Log Rotation
Logs are automatically rotated (see docker-compose.yml):
- Max size: 10MB per file
- Max files: 3
- Total max: ~30MB of logs
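The corresponding stanza in `docker-compose.yml` looks roughly like this (a sketch using Docker's standard `json-file` driver options, matching the limits above — check the actual file for the exact values):

```yaml
logging:
  driver: json-file
  options:
    max-size: "10m"   # rotate after 10MB
    max-file: "3"     # keep at most 3 files
```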
## Backup Considerations
### What to Backup
1. **Database** (most important):
```bash
docker exec meta_timescaledb pg_dump -U meta_user meta_insights > backup.sql
```
2. **Token files**:
```bash
cp .env .env.backup
cp .meta_token.json .meta_token.json.backup
```
3. **Configuration**:
- `.env`
- `docker-compose.yml`
### Restore from Backup
```bash
# Restore database
docker exec -i meta_timescaledb psql -U meta_user meta_insights < backup.sql
# Restore token files
cp .env.backup .env
cp .meta_token.json.backup .meta_token.json
# Restart
docker-compose restart meta-grabber
```
## Production Checklist
Before leaving it running overnight:
- [ ] Fresh access token obtained (60 days validity)
- [ ] `.env` file has all required variables
- [ ] `.meta_token.json` exists with token metadata
- [ ] `docker-compose up -d --build` succeeded
- [ ] All containers show "Up" in `docker-compose ps`
- [ ] Logs show successful data collection
- [ ] Database contains data (`SELECT COUNT(*) FROM account_insights`)
- [ ] Token auto-refresh is enabled (`auto_refresh_token=True`)
- [ ] Restart policy is set (`restart: unless-stopped`)
## Summary
To deploy for overnight testing:
```bash
# 1. Get token
uv run python -m meta_api_grabber.auth
# 2. Start everything
docker-compose up -d --build
# 3. Verify it's working
docker-compose logs -f meta-grabber
# 4. Let it run!
# Come back tomorrow and check:
docker-compose logs meta-grabber | grep "New day detected"
```
The system will handle everything automatically:
- ✅ Data collection every 2 hours
- ✅ New day detection
- ✅ Yesterday data collection
- ✅ Token auto-refresh
- ✅ Auto-restart on failures
Sleep well! 😴