# Yesterday Data Collection Feature

## Overview

This implementation extends the Meta API grabber to automatically collect yesterday's data using Meta's `yesterday` date preset. The system detects when a new day starts and manages the collection of yesterday's data with the following logic:

## Key Features

### 1. New Day Detection

- Monitors the `date_start` field from "today" preset data (see the example row after this list)
- Detects when `date_start` changes to a new date
- Triggers immediate fetch of yesterday's data when a new day is detected

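For reference, a "today" insights row from the Graph API carries the reporting window alongside the metrics; the values below are purely illustrative:

```python
# Illustrative shape of one "today" insights row (values are made up).
today_row = {
    "date_start": "2025-10-22",  # this is the field monitored for day changes
    "date_stop": "2025-10-22",
    "impressions": "1234",       # Graph API returns metrics as strings
    "spend": "56.78",
}
```
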
### 2. Yesterday Data Collection Strategy

- **First fetch**: When a new day is detected, fetch yesterday's data immediately
- **Periodic updates**: Update yesterday's data every 12 hours
- **Rationale**: Meta updates historical data, so refreshing ensures accuracy

### 3. Token Validation & Error Handling

- Validates access token on every API request
- Catches OAuth token errors (error codes 190, 102); a minimal sketch of this check follows the list
- Raises `ValueError` with clear error messages for invalid tokens
- Stops execution immediately if token is invalid (fail-fast approach)

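The following is an illustrative sketch of that check, assuming the standard Graph API error payload shape (`{"error": {"code": ..., "message": ...}}`); the helper name is hypothetical and not the exact code in `scheduled_grabber.py`:

```python
# Hypothetical helper: translate Graph API OAuth error codes into a fail-fast ValueError.
OAUTH_ERROR_CODES = {190, 102}  # OAuth/token error codes listed above

def raise_on_token_error(error_payload: dict) -> None:
    """Raise ValueError when a Graph API error payload reports a token problem."""
    error = error_payload.get("error", {})
    if error.get("code") in OAUTH_ERROR_CODES:
        raise ValueError(
            f"Invalid Meta access token (error {error.get('code')}): "
            f"{error.get('message', 'no message')}"
        )
```
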
## Implementation Details

### Modified Files

#### `src/meta_api_grabber/scheduled_grabber.py`

**New State Tracking:**

```python
# Track current date for detecting day changes
self.current_date: Optional[date] = None

# Track when yesterday data was last fetched
self.yesterday_last_fetched: Optional[datetime] = None
```

**New Methods:**

1. `_check_new_day(today_date_start: Optional[date]) -> bool`
   - Compares today's `date_start` with the tracked `current_date`
   - Returns True if a new day has been detected
2. `_should_fetch_yesterday() -> bool`
   - Returns True if yesterday's data has never been fetched
   - Returns True if 12+ hours have passed since the last fetch (see the sketch below)
3. `_validate_token()`
   - Validates the access token using `token_manager`
   - Raises `ValueError` if the token is invalid

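A minimal sketch of the two scheduling helpers, based only on the descriptions above (the actual implementation in `scheduled_grabber.py` may differ; shown here as plain functions with an explicit `self` for brevity):

```python
from datetime import date, datetime, timedelta
from typing import Optional

def _check_new_day(self, today_date_start: Optional[date]) -> bool:
    """Return True when today's date_start differs from the tracked current_date."""
    if today_date_start is None:
        return False
    if self.current_date is None:
        # First cycle: initialize the tracked date, no "new day" yet.
        self.current_date = today_date_start
        return False
    if today_date_start != self.current_date:
        # date_start rolled over: a new day has started.
        self.current_date = today_date_start
        return True
    return False

def _should_fetch_yesterday(self) -> bool:
    """Fetch yesterday's data if never fetched or if the last fetch is 12+ hours old."""
    if self.yesterday_last_fetched is None:
        return True
    return datetime.now() - self.yesterday_last_fetched >= timedelta(hours=12)
```
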
**Enhanced Methods:**

- `grab_account_insights()`: Now returns the `date_start` value and handles token errors (a sketch of this pattern follows the list)
- `grab_campaign_insights()`: Added token error handling
- `grab_adset_insights()`: Added token error handling
- `refresh_token_if_needed()`: Now raises `ValueError` on token validation failure
- `run_collection_cycle()`: Implements the yesterday data collection logic

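A hypothetical sketch of how `grab_account_insights()` could surface `date_start` while letting token errors propagate; `_fetch_insights` and `store_account_insights` are illustrative helper names, not confirmed APIs, and the rows are assumed to carry `date_start` as an ISO `YYYY-MM-DD` string:

```python
import logging
from datetime import date
from typing import Optional

logger = logging.getLogger(__name__)

async def grab_account_insights(self, account_id: str, date_preset: str) -> Optional[date]:
    """Fetch and store account insights, returning the date_start seen in the data."""
    try:
        rows = await self._fetch_insights(account_id, date_preset)  # illustrative helper
    except ValueError:
        # Token problems surface as ValueError and must propagate so the run stops.
        raise
    except Exception as exc:
        logger.error("Insights fetch failed for %s: %s", account_id, exc)
        return None

    await self.db.store_account_insights(rows, date_preset)  # illustrative helper
    if rows:
        return date.fromisoformat(rows[0]["date_start"])
    return None
```
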
### Data Flow

```
Collection Cycle Start
        ↓
Fetch "today" data for all accounts
        ↓
Extract date_start from today's data
        ↓
Check if new day detected
        ↓ (if yes)
Reset yesterday_last_fetched
        ↓
Check if should fetch yesterday
        ↓ (if yes)
Fetch "yesterday" data for all accounts
        ↓
Update yesterday_last_fetched timestamp
        ↓
Collection Cycle Complete
```

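Read as code, the cycle above might look roughly like the sketch below; the per-account loop, `self.ad_accounts`, and the grab call signatures are assumptions based on the method names listed earlier, not the verbatim implementation:

```python
from datetime import datetime

async def run_collection_cycle(self) -> None:
    """One collection cycle: today's data always, yesterday's data when due."""
    today_date_start = None
    for account_id in self.ad_accounts:  # assumed attribute populated by load_ad_accounts()
        seen = await self.grab_account_insights(account_id, "today")
        today_date_start = seen or today_date_start  # keep the last non-None date_start
        await self.grab_campaign_insights(account_id, "today")
        await self.grab_adset_insights(account_id, "today")

    # A new day resets the timer so yesterday's data is fetched immediately.
    if self._check_new_day(today_date_start):
        self.yesterday_last_fetched = None

    if self._should_fetch_yesterday():
        for account_id in self.ad_accounts:
            await self.grab_account_insights(account_id, "yesterday")
            await self.grab_campaign_insights(account_id, "yesterday")
            await self.grab_adset_insights(account_id, "yesterday")
        self.yesterday_last_fetched = datetime.now()
```
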
## Testing Instructions

### 1. Local Testing (Without API Calls)

Check that the module compiles (syntax only):

```bash
uv run python -m py_compile src/meta_api_grabber/scheduled_grabber.py
```

### 2. Docker Container Test

Start the TimescaleDB container:

```bash
docker-compose up -d timescaledb
```

Wait for the database to be healthy:

```bash
docker-compose ps
```

### 3. Test Token Validation

Intentionally use an invalid token to verify error handling:

```bash
# Set an invalid token in .env temporarily
META_ACCESS_TOKEN=invalid_token_for_testing

# Run the grabber - it should error out immediately
uv run python src/meta_api_grabber/scheduled_grabber.py
```

Expected output: a clear error message about the invalid token.

### 4. Production Test Run

**Important**: Before running with a real token:

1. Ensure `.env` has valid credentials:

   ```
   META_ACCESS_TOKEN=<your_valid_token>
   META_APP_ID=<your_app_id>
   META_APP_SECRET=<your_app_secret>
   DATABASE_URL=postgresql://meta_user:meta_password@localhost:5555/meta_insights
   ```

2. Run a single cycle to verify:

   ```bash
   # This will run one collection cycle and exit
   uv run python -c "
   import asyncio
   from src.meta_api_grabber.scheduled_grabber import ScheduledInsightsGrabber

   async def test():
       grabber = ScheduledInsightsGrabber(max_accounts=1)
       async with grabber.db:
           await grabber.db.connect()
           await grabber.db.initialize_schema()
           await grabber.load_ad_accounts()
           await grabber.run_collection_cycle()

   asyncio.run(test())
   "
   ```

### 5. Monitor Yesterday Data Collection

The system will:

- **First run**: Collect today's data, detect the current date
- **Subsequent runs**: Continue collecting today's data every 2 hours
- **When a new day starts**:
  - Log message: "📅 New day detected: YYYY-MM-DD -> YYYY-MM-DD"
  - Immediately fetch yesterday's data
- **Every 12 hours**: Update yesterday's data

Check the database to verify that yesterday data is being stored. First connect to TimescaleDB:

```bash
psql -U meta_user -d meta_insights -h localhost -p 5555
```

Then query the yesterday rows:

```sql
SELECT
    time,
    account_id,
    date_preset,
    date_start,
    date_stop,
    impressions,
    spend
FROM account_insights
WHERE date_preset = 'yesterday'
ORDER BY time DESC
LIMIT 10;
```

## Expected Behavior

### Scenario 1: Fresh Start

- Cycle 1: Fetch today's data, initialize current_date
- Cycle 2: Fetch today's data, fetch yesterday's data (first time)
- Cycles 3-6: Fetch today's data only
- Cycle 7: Fetch today's data, update yesterday's data (12 hours have passed)

### Scenario 2: Day Change

- Cycle N: Today is 2025-10-21, fetch today's data
- Cycle N+1: Today is 2025-10-22 (new day!)
  - Log: "📅 New day detected: 2025-10-21 -> 2025-10-22"
  - Fetch today's data (2025-10-22)
  - Fetch yesterday's data (2025-10-21)

### Scenario 3: Invalid Token

- Any cycle with an invalid token:
  - Errors immediately with a clear message
  - Stops execution (doesn't continue to other accounts)
  - Exits with a non-zero status code

## Deployment Notes

### Docker Production Deployment

The implementation is designed to run continuously in a Docker container. If token authentication fails:

1. The container will error out and stop
2. This prevents unnecessary API calls with invalid credentials
3. You'll see clear error messages in the container logs
4. Fix the token issues before restarting

### Monitoring Recommendations

1. **Check logs regularly** for:
   - "📅 New day detected" messages
   - "Fetching yesterday's data" messages
   - Token validation errors

2. **Database monitoring**:
   - Verify yesterday data is being updated
   - Check for gaps in `date_start`/`date_stop` values

3. **Token expiry**:
   - System uses automatic token refresh (if enabled)
   - Monitor for token expiration warnings

## Configuration Options

In `async_main()` in `src/meta_api_grabber/scheduled_grabber.py`:

```python
grabber = ScheduledInsightsGrabber(
    max_accounts=3,           # Limit number of accounts for testing
    auto_refresh_token=True,  # Enable automatic token refresh
)

await grabber.run_scheduled(
    interval_hours=2.0,                  # How often to collect today's data
    refresh_metadata_every_n_cycles=12,  # How often to refresh metadata cache
)
```

The yesterday fetch interval is **hardcoded to 12 hours** in the `_should_fetch_yesterday()` method.

## Troubleshooting

### Yesterday data not being fetched

- Check logs for "Fetching yesterday's data" messages
- Verify `date_start` is being extracted from today's data
- Check `self.current_date` is being initialized

### Token errors not stopping execution

- Ensure `ValueError` is being raised in grab methods
- Check that `except ValueError` blocks are re-raising the exception

### Database issues

- Verify `date_start` and `date_stop` columns exist in all insights tables
- Run schema initialization: `await self.db.initialize_schema()`

## Summary

The implementation adds:

- ✅ New day detection via `date_start` monitoring
- ✅ Automatic yesterday data collection on day change
- ✅ 12-hour update cycle for yesterday data
- ✅ Token validation with fail-fast error handling
- ✅ Clear logging for debugging and monitoring

This ensures complete and accurate historical data collection with minimal API usage.