# Yesterday Data Collection Feature
## Overview
This implementation extends the Meta API grabber to automatically collect yesterday's data using Meta's `yesterday` date preset. The system detects when a new day starts, fetches yesterday's data at that point, and then refreshes it periodically.
## Key Features
### 1. New Day Detection
- Monitors the `date_start` field from "today" preset data
- Detects when `date_start` changes to a new date
- Triggers immediate fetch of yesterday's data when a new day is detected
### 2. Yesterday Data Collection Strategy
- **First fetch**: When a new day is detected, fetch yesterday's data immediately
- **Periodic updates**: Update yesterday's data every 12 hours
- **Rationale**: Meta can revise the metrics of past days after the fact, so refreshing keeps the stored values in line with the latest reported numbers
### 3. Token Validation & Error Handling
- Validates access token on every API request
- Catches OAuth token errors (error codes 190, 102)
- Raises `ValueError` with clear error messages for invalid tokens
- Stops execution immediately if token is invalid (fail-fast approach)
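
The token check described above can be sketched as follows. This is a minimal illustration, not the grabber's actual code: the helper name `raise_if_token_error` and the constant `TOKEN_ERROR_CODES` are hypothetical, and the payload is assumed to be the decoded JSON error body of a failed request, shaped like `{"error": {"code": 190, "message": "..."}}`.

```python
from typing import Any, Mapping

# Error codes this document treats as token failures.
TOKEN_ERROR_CODES = {190, 102}


def raise_if_token_error(payload: Mapping[str, Any]) -> None:
    """Raise ValueError if the error payload carries a token error code.

    Hypothetical helper: `payload` is assumed to be the decoded JSON body
    of a failed request. Other errors are left for the caller to handle.
    """
    error = payload.get("error") or {}
    code = error.get("code")
    if code in TOKEN_ERROR_CODES:
        message = error.get("message", "no message provided")
        raise ValueError(f"Invalid access token (code {code}): {message}")
```

Raising a single exception type for token problems lets callers separate "stop everything" failures from per-account errors that can be logged and skipped.
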
## Implementation Details
### Modified Files
#### `src/meta_api_grabber/scheduled_grabber.py`
**New State Tracking:**
```python
# Track current date for detecting day changes
self.current_date: Optional[date] = None
# Track when yesterday data was last fetched
self.yesterday_last_fetched: Optional[datetime] = None
```
**New Methods:**
1. `_check_new_day(today_date_start: Optional[date]) -> bool`
   - Compares today's `date_start` with the tracked `current_date`
   - Returns `True` if a new day has been detected
2. `_should_fetch_yesterday() -> bool`
   - Returns `True` if yesterday data has never been fetched
   - Returns `True` if 12+ hours have passed since the last fetch
3. `_validate_token()`
   - Validates the access token using `token_manager`
   - Raises `ValueError` if the token is invalid
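
The first two methods can be sketched in isolation as shown here. The `DayTracker` class is a hypothetical stand-in for the state kept on the grabber, and `now` is passed in explicitly to keep the example testable. It is also an assumption that the very first observed `date_start` only initializes `current_date` and does not count as a new day.

```python
from datetime import date, datetime, timedelta
from typing import Optional

# The document states the refresh interval is 12 hours.
YESTERDAY_REFRESH_INTERVAL = timedelta(hours=12)


class DayTracker:
    """Hypothetical stand-in for the grabber's day-tracking state."""

    def __init__(self) -> None:
        self.current_date: Optional[date] = None
        self.yesterday_last_fetched: Optional[datetime] = None

    def check_new_day(self, today_date_start: Optional[date]) -> bool:
        if today_date_start is None:
            return False  # no "today" data, nothing to compare
        if self.current_date is None:
            self.current_date = today_date_start  # first observation (assumed)
            return False
        if today_date_start != self.current_date:
            self.current_date = today_date_start
            return True
        return False

    def should_fetch_yesterday(self, now: datetime) -> bool:
        if self.yesterday_last_fetched is None:
            return True  # never fetched
        return now - self.yesterday_last_fetched >= YESTERDAY_REFRESH_INTERVAL
```
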
**Enhanced Methods:**
- `grab_account_insights()`: Now returns the `date_start` value and handles token errors
- `grab_campaign_insights()`: Added token error handling
- `grab_adset_insights()`: Added token error handling
- `refresh_token_if_needed()`: Now raises ValueError on token validation failure
- `run_collection_cycle()`: Implements the yesterday data collection logic
### Data Flow
```
Collection Cycle Start
↓
Fetch "today" data for all accounts
↓
Extract date_start from today's data
↓
Check if new day detected
↓ (if yes)
Reset yesterday_last_fetched
↓
Check if should fetch yesterday
↓ (if yes)
Fetch "yesterday" data for all accounts
↓
Update yesterday_last_fetched timestamp
↓
Collection Cycle Complete
```
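
The ordering of these steps can be illustrated with a small sketch that follows the flow above. Everything here is hypothetical: `CycleState`, `run_cycle`, and the `fetch_today` / `fetch_yesterday` callables stand in for the grabber's real state and API calls, and the real `run_collection_cycle()` may differ in detail.

```python
import asyncio
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Awaitable, Callable, Optional


@dataclass
class CycleState:
    current_date: Optional[date] = None
    yesterday_last_fetched: Optional[datetime] = None


async def run_cycle(
    state: CycleState,
    fetch_today: Callable[[], Awaitable[Optional[date]]],
    fetch_yesterday: Callable[[], Awaitable[None]],
    now: datetime,
) -> bool:
    """Run one cycle; return True if yesterday's data was fetched."""
    today_date_start = await fetch_today()

    if today_date_start is not None:
        # New day: forget the previous fetch so yesterday is refetched at once.
        if state.current_date is not None and today_date_start != state.current_date:
            state.yesterday_last_fetched = None
        state.current_date = today_date_start

    due = (
        state.yesterday_last_fetched is None
        or now - state.yesterday_last_fetched >= timedelta(hours=12)
    )
    if due:
        await fetch_yesterday()
        state.yesterday_last_fetched = now
    return due
```

In this sketch the very first cycle also fetches yesterday's data, because nothing has been fetched yet. The actual method may defer that first fetch.
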
## Testing Instructions
### 1. Local Testing (Without API Calls)
Check syntax and imports:
```bash
uv run python -m py_compile src/meta_api_grabber/scheduled_grabber.py
```
### 2. Docker Container Test
Start the TimescaleDB container:
```bash
docker-compose up -d timescaledb
```
Wait for the database to be healthy:
```bash
docker-compose ps
```
### 3. Test Token Validation
Intentionally use an invalid token to verify error handling:
```bash
# Set an invalid token in .env temporarily
META_ACCESS_TOKEN=invalid_token_for_testing
# Run the grabber - it should error out immediately
uv run python src/meta_api_grabber/scheduled_grabber.py
```
Expected output: a clear error message about the invalid token, followed by an immediate exit.
### 4. Production Test Run
**Important**: Before running with real token:
1. Ensure `.env` has valid credentials:
```
META_ACCESS_TOKEN=<your_valid_token>
META_APP_ID=<your_app_id>
META_APP_SECRET=<your_app_secret>
DATABASE_URL=postgresql://meta_user:meta_password@localhost:5555/meta_insights
```
2. Run a single cycle to verify:
```bash
# This will run one collection cycle and exit
uv run python -c "
import asyncio
from src.meta_api_grabber.scheduled_grabber import ScheduledInsightsGrabber

async def test():
    grabber = ScheduledInsightsGrabber(max_accounts=1)
    async with grabber.db:
        await grabber.db.connect()
        await grabber.db.initialize_schema()
        await grabber.load_ad_accounts()
        await grabber.run_collection_cycle()

asyncio.run(test())
"
```
### 5. Monitor Yesterday Data Collection
The system will:
- **First run**: Collect today's data, detect current date
- **Subsequent runs**: Continue collecting today's data every 2 hours
- **When new day starts**:
  - Log message: "📅 New day detected: YYYY-MM-DD -> YYYY-MM-DD"
  - Immediately fetch yesterday's data
- **Every 12 hours**: Update yesterday's data
Check database to verify yesterday data is being stored:
```sql
-- Connect to TimescaleDB first (run this from a shell, not inside psql):
--   psql -U meta_user -d meta_insights -h localhost -p 5555
-- Check yesterday data
SELECT
    time,
    account_id,
    date_preset,
    date_start,
    date_stop,
    impressions,
    spend
FROM account_insights
WHERE date_preset = 'yesterday'
ORDER BY time DESC
LIMIT 10;
```
## Expected Behavior
### Scenario 1: Fresh Start
- Cycle 1: Fetch today's data, initialize current_date
- Cycle 2: Fetch today's data, fetch yesterday's data (first time)
- Cycle 3-6: Fetch today's data only
- Cycle 7: Fetch today's data, update yesterday's data (12h passed)
### Scenario 2: Day Change
- Cycle N: Today is 2025-10-21, fetch today's data
- Cycle N+1: Today is 2025-10-22 (new day!)
  - Log: "📅 New day detected: 2025-10-21 -> 2025-10-22"
  - Fetch today's data (2025-10-22)
  - Fetch yesterday's data (2025-10-21)
### Scenario 3: Invalid Token
- Any cycle with invalid token:
  - Error immediately with clear message
  - Stop execution (don't continue to other accounts)
  - Exit with non-zero status code
## Deployment Notes
### Docker Production Deployment
The implementation is designed to run continuously in a Docker container. If token authentication fails:
1. Container will error out and stop
2. This prevents unnecessary API calls with invalid credentials
3. You'll see clear error messages in container logs
4. Fix token issues before restarting
### Monitoring Recommendations
1. **Check logs regularly** for:
   - "📅 New day detected" messages
   - "Fetching yesterday's data" messages
   - Token validation errors
2. **Database monitoring**:
   - Verify yesterday data is being updated
   - Check for gaps in `date_start`/`date_stop` values
3. **Token expiry**:
   - System uses automatic token refresh (if enabled)
   - Monitor for token expiration warnings
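
For the database check on gaps in `date_start` values, a small helper along these lines may be useful. It is only a sketch: `find_missing_dates` is a hypothetical name, and it assumes the `date_start` values for one account have already been read from the database into Python `date` objects.

```python
from datetime import date, timedelta
from typing import Iterable, List


def find_missing_dates(stored: Iterable[date]) -> List[date]:
    """Return the days missing between the earliest and latest stored date."""
    days = set(stored)
    if not days:
        return []
    first, last = min(days), max(days)
    missing = []
    current = first
    while current <= last:
        if current not in days:
            missing.append(current)
        current += timedelta(days=1)
    return missing
```

An empty result means every day between the first and last stored `date_start` is present. Days before the first or after the last stored date are not checked.
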
## Configuration Options
In `src/meta_api_grabber/scheduled_grabber.py` `async_main()`:
```python
grabber = ScheduledInsightsGrabber(
    max_accounts=3,  # Limit number of accounts for testing
    auto_refresh_token=True,  # Enable automatic token refresh
)
await grabber.run_scheduled(
    interval_hours=2.0,  # How often to collect today's data
    refresh_metadata_every_n_cycles=12,  # How often to refresh metadata cache
)
```
**Yesterday fetch interval is hardcoded to 12 hours** in the `_should_fetch_yesterday()` method.
## Troubleshooting
### Yesterday data not being fetched
- Check logs for "Fetching yesterday's data" messages
- Verify `date_start` is being extracted from today's data
- Check `self.current_date` is being initialized
### Token errors not stopping execution
- Ensure `ValueError` is being raised in grab methods
- Check that `except ValueError` blocks are re-raising the exception
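
The re-raise pattern these checks refer to can be sketched as follows. The function `collect_all` and its `grab` callable are hypothetical stand-ins for the per-account loop and the grab methods. The key point is that `except ValueError` must come before any broader `except Exception` handler and must re-raise.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def collect_all(account_ids: Iterable[str], grab: Callable[[str], None]) -> List[str]:
    """Call grab() for each account; return the ids that succeeded."""
    succeeded: List[str] = []
    for account_id in account_ids:
        try:
            grab(account_id)
        except ValueError:
            # Token problem: propagate so the whole run stops.
            raise
        except Exception:
            # Any other failure is logged and the loop moves on.
            logger.exception("Failed to collect %s", account_id)
        else:
            succeeded.append(account_id)
    return succeeded
```

If the broad handler were listed first, or if the `ValueError` branch only logged, a token error would be swallowed and the loop would keep calling the API with invalid credentials.
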
### Database issues
- Verify `date_start` and `date_stop` columns exist in all insights tables
- Run schema initialization: `await self.db.initialize_schema()`
## Summary
The implementation successfully adds:
- ✅ New day detection via `date_start` monitoring
- ✅ Automatic yesterday data collection on day change
- ✅ 12-hour update cycle for yesterday data
- ✅ Token validation with fail-fast error handling
- ✅ Clear logging for debugging and monitoring
Together, these changes keep yesterday's data complete and up to date while limiting the extra API usage to one fetch when a new day is detected plus a refresh every 12 hours.