Mostly ready for first test run but there is one improvement I want to implement first
New file: `YESTERDAY_DATA_FEATURE.md` (+268 lines)

# Yesterday Data Collection Feature

## Overview

This implementation extends the Meta API grabber to automatically collect yesterday's data using Meta's `yesterday` date preset. The system detects when a new day starts and manages yesterday data collection with the logic described below.

## Key Features

### 1. New Day Detection

- Monitors the `date_start` field from "today" preset data
- Detects when `date_start` changes to a new date
- Triggers immediate fetch of yesterday's data when a new day is detected
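
A minimal sketch of how this check might look (the method name matches the one listed under Implementation Details below; whether the method itself updates the tracked date is an assumption made here for illustration):

```python
from datetime import date
from typing import Optional


class ScheduledInsightsGrabber:  # illustrative fragment, not the full class
    def __init__(self) -> None:
        # Tracked date of the most recent "today" data seen
        self.current_date: Optional[date] = None

    def _check_new_day(self, today_date_start: Optional[date]) -> bool:
        """Return True when today's date_start has moved to a new date."""
        if today_date_start is None:
            return False  # nothing to compare against yet
        if self.current_date is None:
            self.current_date = today_date_start  # first cycle: initialize, no day change
            return False
        if today_date_start != self.current_date:
            self.current_date = today_date_start  # assumed: tracking is updated here
            return True
        return False
```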

### 2. Yesterday Data Collection Strategy

- **First fetch**: When a new day is detected, fetch yesterday's data immediately
- **Periodic updates**: Update yesterday's data every 12 hours
- **Rationale**: Meta updates historical data, so refreshing ensures accuracy
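
Under this strategy the refresh decision reduces to a timestamp comparison; a rough sketch (the class constant is an assumption for readability, since the document states the 12-hour interval is hardcoded inside the method):

```python
from datetime import datetime, timedelta
from typing import Optional


class ScheduledInsightsGrabber:  # illustrative fragment
    YESTERDAY_REFRESH_INTERVAL = timedelta(hours=12)  # periodic update window

    def __init__(self) -> None:
        self.yesterday_last_fetched: Optional[datetime] = None

    def _should_fetch_yesterday(self) -> bool:
        """True if yesterday data was never fetched or the last fetch is 12+ hours old."""
        if self.yesterday_last_fetched is None:
            return True
        return datetime.now() - self.yesterday_last_fetched >= self.YESTERDAY_REFRESH_INTERVAL
```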

### 3. Token Validation & Error Handling

- Validates access token on every API request
- Catches OAuth token errors (error codes 190, 102)
- Raises `ValueError` with clear error messages for invalid tokens
- Stops execution immediately if token is invalid (fail-fast approach)
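
A sketch of the error classification this implies, working directly on the Graph API error payload (the helper name is hypothetical; the real grab methods may inspect an SDK exception instead):

```python
# Hypothetical helper: turn a Graph API OAuth error payload into a fail-fast ValueError.
OAUTH_ERROR_CODES = {190, 102}  # invalid/expired access token, session errors


def raise_on_token_error(response_body: dict) -> None:
    error = response_body.get("error")
    if not error:
        return
    if error.get("code") in OAUTH_ERROR_CODES or error.get("type") == "OAuthException":
        raise ValueError(
            f"Meta access token rejected (code {error.get('code')}): "
            f"{error.get('message', 'unknown OAuth error')}"
        )
```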

## Implementation Details

### Modified Files

#### `src/meta_api_grabber/scheduled_grabber.py`

**New State Tracking:**

```python
# Track current date for detecting day changes
self.current_date: Optional[date] = None

# Track when yesterday data was last fetched
self.yesterday_last_fetched: Optional[datetime] = None
```

**New Methods:**

1. `_check_new_day(today_date_start: Optional[date]) -> bool`
   - Compares today's `date_start` with the tracked `current_date`
   - Returns True if a new day has been detected

2. `_should_fetch_yesterday() -> bool`
   - Returns True if yesterday data has never been fetched
   - Returns True if 12+ hours have passed since last fetch

3. `_validate_token()`
   - Validates the access token using `token_manager`
   - Raises `ValueError` if token is invalid

**Enhanced Methods:**

- `grab_account_insights()`: Now returns the `date_start` value and handles token errors
- `grab_campaign_insights()`: Added token error handling
- `grab_adset_insights()`: Added token error handling
- `refresh_token_if_needed()`: Now raises `ValueError` on token validation failure
- `run_collection_cycle()`: Implements the yesterday data collection logic
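
As a rough illustration of the `grab_account_insights()` change (heavily simplified and hypothetical: the fetch and storage helpers shown here stand in for whatever the real method uses), the key point is that each insights row carries its reporting window:

```python
from datetime import date
from typing import Any, Optional


async def grab_account_insights(self, account_id: str, date_preset: str = "today") -> Optional[date]:
    """Fetch account-level insights, store them, and return their date_start."""
    response = await self._fetch_insights(account_id, level="account", date_preset=date_preset)  # hypothetical helper
    raise_on_token_error(response)  # fail fast on OAuth errors (codes 190, 102)

    rows: list[dict[str, Any]] = response.get("data", [])
    if not rows:
        return None

    await self.db.store_account_insights(account_id, rows)  # hypothetical persistence call
    # Insights rows report their window as ISO date strings (date_start / date_stop)
    return date.fromisoformat(rows[0]["date_start"])
```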

### Data Flow

```
Collection Cycle Start
    ↓
Fetch "today" data for all accounts
    ↓
Extract date_start from today's data
    ↓
Check if new day detected
    ↓ (if yes)
Reset yesterday_last_fetched
    ↓
Check if should fetch yesterday
    ↓ (if yes)
Fetch "yesterday" data for all accounts
    ↓
Update yesterday_last_fetched timestamp
    ↓
Collection Cycle Complete
```
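
Put together, the cycle might look like the following skeleton (method and attribute names follow the lists above; per-account error handling, logging, and metadata refresh are omitted, and the real `run_collection_cycle()` may be structured differently):

```python
from datetime import datetime


async def run_collection_cycle(self) -> None:
    """One cycle: always collect 'today', collect 'yesterday' when the rules say so."""
    today_date_start = None

    # Fetch "today" data for all accounts; grab_account_insights() returns date_start
    for account_id in self.ad_accounts:  # assumed attribute filled by load_ad_accounts()
        today_date_start = await self.grab_account_insights(account_id)
        await self.grab_campaign_insights(account_id)
        await self.grab_adset_insights(account_id)

    # New day? Reset the yesterday timer so the next check triggers an immediate fetch
    if self._check_new_day(today_date_start):
        self.yesterday_last_fetched = None

    # Fetch "yesterday" data if it was never fetched or is 12+ hours old
    if self._should_fetch_yesterday():
        for account_id in self.ad_accounts:
            await self.grab_account_insights(account_id, date_preset="yesterday")  # assumed parameter
            await self.grab_campaign_insights(account_id, date_preset="yesterday")
            await self.grab_adset_insights(account_id, date_preset="yesterday")
        self.yesterday_last_fetched = datetime.now()
```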

## Testing Instructions

### 1. Local Testing (Without API Calls)

Check syntax and imports:

```bash
uv run python -m py_compile src/meta_api_grabber/scheduled_grabber.py
```

### 2. Docker Container Test

Start the TimescaleDB container:

```bash
docker-compose up -d timescaledb
```

Wait for the database to be healthy:

```bash
docker-compose ps
```

### 3. Test Token Validation

Intentionally use an invalid token to verify error handling:

```bash
# Set an invalid token in .env temporarily
META_ACCESS_TOKEN=invalid_token_for_testing

# Run the grabber - it should error out immediately
uv run python src/meta_api_grabber/scheduled_grabber.py
```

Expected output: a clear error message about the invalid token.

### 4. Production Test Run

**Important**: Before running with a real token:

1. Ensure `.env` has valid credentials:

```
META_ACCESS_TOKEN=<your_valid_token>
META_APP_ID=<your_app_id>
META_APP_SECRET=<your_app_secret>
DATABASE_URL=postgresql://meta_user:meta_password@localhost:5555/meta_insights
```

2. Run a single cycle to verify:

```bash
# This will run one collection cycle and exit
uv run python -c "
import asyncio
from src.meta_api_grabber.scheduled_grabber import ScheduledInsightsGrabber

async def test():
    grabber = ScheduledInsightsGrabber(max_accounts=1)
    async with grabber.db:
        await grabber.db.connect()
        await grabber.db.initialize_schema()
        await grabber.load_ad_accounts()
        await grabber.run_collection_cycle()

asyncio.run(test())
"
```

### 5. Monitor Yesterday Data Collection

The system will:

- **First run**: Collect today's data, detect the current date
- **Subsequent runs**: Continue collecting today's data every 2 hours
- **When a new day starts**:
  - Log message: "📅 New day detected: YYYY-MM-DD -> YYYY-MM-DD"
  - Immediately fetch yesterday's data
- **Every 12 hours**: Update yesterday's data

Check the database to verify yesterday data is being stored:

```sql
-- Connect to TimescaleDB
psql -U meta_user -d meta_insights -h localhost -p 5555

-- Check yesterday data
SELECT
    time,
    account_id,
    date_preset,
    date_start,
    date_stop,
    impressions,
    spend
FROM account_insights
WHERE date_preset = 'yesterday'
ORDER BY time DESC
LIMIT 10;
```

## Expected Behavior

### Scenario 1: Fresh Start

- Cycle 1: Fetch today's data, initialize current_date
- Cycle 2: Fetch today's data, fetch yesterday's data (first time)
- Cycles 3-6: Fetch today's data only
- Cycle 7: Fetch today's data, update yesterday's data (12h passed)

### Scenario 2: Day Change

- Cycle N: Today is 2025-10-21, fetch today's data
- Cycle N+1: Today is 2025-10-22 (new day!)
  - Log: "📅 New day detected: 2025-10-21 -> 2025-10-22"
  - Fetch today's data (2025-10-22)
  - Fetch yesterday's data (2025-10-21)

### Scenario 3: Invalid Token

- Any cycle with an invalid token:
  - Error immediately with a clear message
  - Stop execution (don't continue to other accounts)
  - Exit with a non-zero status code

## Deployment Notes

### Docker Production Deployment

The implementation is designed to run continuously in a Docker container. If token authentication fails:

1. The container will error out and stop
2. This prevents unnecessary API calls with invalid credentials
3. You'll see clear error messages in the container logs
4. Fix token issues before restarting
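
One way to get that behavior is to let the `ValueError` propagate out of the scheduler loop and convert it into a non-zero exit in the module's entrypoint; a sketch, assuming it lives next to `async_main()` in `scheduled_grabber.py`:

```python
import asyncio
import logging
import sys


def main() -> None:
    try:
        asyncio.run(async_main())  # async_main() is the existing scheduler entrypoint
    except ValueError as exc:
        # Token validation failures surface as ValueError from the grab/refresh methods
        logging.error("Fatal token error, stopping: %s", exc)
        sys.exit(1)  # non-zero exit so Docker marks the container as failed


if __name__ == "__main__":
    main()
```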

### Monitoring Recommendations

1. **Check logs regularly** for:
   - "📅 New day detected" messages
   - "Fetching yesterday's data" messages
   - Token validation errors

2. **Database monitoring**:
   - Verify yesterday data is being updated
   - Check for gaps in date_start/date_stop values

3. **Token expiry**:
   - System uses automatic token refresh (if enabled)
   - Monitor for token expiration warnings

## Configuration Options

In `async_main()` in `src/meta_api_grabber/scheduled_grabber.py`:

```python
grabber = ScheduledInsightsGrabber(
    max_accounts=3,  # Limit number of accounts for testing
    auto_refresh_token=True  # Enable automatic token refresh
)

await grabber.run_scheduled(
    interval_hours=2.0,  # How often to collect today's data
    refresh_metadata_every_n_cycles=12,  # How often to refresh metadata cache
)
```

**The yesterday fetch interval is hardcoded to 12 hours** in the `_should_fetch_yesterday()` method.

## Troubleshooting

### Yesterday data not being fetched

- Check logs for "Fetching yesterday's data" messages
- Verify `date_start` is being extracted from today's data
- Check that `self.current_date` is being initialized

### Token errors not stopping execution

- Ensure `ValueError` is being raised in the grab methods
- Check that `except ValueError` blocks are re-raising the exception
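
The pattern being checked for looks roughly like this (the method body is illustrative):

```python
import logging


async def grab_campaign_insights(self, account_id: str) -> None:
    try:
        ...  # call the Meta API and store the results
    except ValueError:
        # Token errors must propagate so the whole run stops (fail-fast),
        # not be swallowed before moving on to the next account
        raise
    except Exception as exc:
        # Other per-account errors can be logged and skipped
        logging.warning("Skipping account %s after error: %s", account_id, exc)
```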

### Database issues

- Verify `date_start` and `date_stop` columns exist in all insights tables
- Run schema initialization: `await self.db.initialize_schema()`

## Summary

The implementation successfully adds:

- ✅ New day detection via `date_start` monitoring
- ✅ Automatic yesterday data collection on day change
- ✅ 12-hour update cycle for yesterday data
- ✅ Token validation with fail-fast error handling
- ✅ Clear logging for debugging and monitoring

This ensures complete and accurate historical data collection with minimal API usage.