# Yesterday Data Collection Feature

## Overview

This implementation extends the Meta API grabber to automatically collect yesterday's data using Meta's `yesterday` date preset. The system detects when a new day starts and manages yesterday data collection with the following logic:

## Key Features

### 1. New Day Detection
- Monitors the `date_start` field from "today" preset data (a sketch of this extraction step follows this list)
- Detects when `date_start` changes to a new date
- Triggers an immediate fetch of yesterday's data when a new day is detected
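As a rough illustration of the extraction step: the Insights API reports `date_start` as a `YYYY-MM-DD` string, so turning it into a `date` for comparison can look like the following (a minimal sketch; `parse_date_start` is a hypothetical helper, not necessarily what the module uses):

```python
from datetime import date, datetime
from typing import Optional

def parse_date_start(insight_row: dict) -> Optional[date]:
    """Extract and parse the date_start field from one insights row.

    Hypothetical helper: assumes the row is a dict-like API result in which
    date_start is a 'YYYY-MM-DD' string (the format returned for presets
    such as 'today').
    """
    raw = insight_row.get("date_start")
    if not raw:
        return None
    return datetime.strptime(raw, "%Y-%m-%d").date()
```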
### 2. Yesterday Data Collection Strategy

- **First fetch:** When a new day is detected, fetch yesterday's data immediately
- **Periodic updates:** Update yesterday's data every 12 hours
- **Rationale:** Meta continues to revise recent historical data, so periodic refreshes keep the stored values accurate
### 3. Token Validation & Error Handling

- Validates the access token on every API request
- Catches OAuth token errors (error codes 190, 102); see the sketch below
- Raises `ValueError` with a clear error message for invalid tokens
- Stops execution immediately if the token is invalid (fail-fast approach)
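A minimal sketch of this fail-fast conversion, assuming the grabber talks to the Graph API through the `facebook_business` SDK (whose `FacebookRequestError` exposes the API error code); `raise_if_token_error` is an illustrative helper name, not the actual function:

```python
from facebook_business.exceptions import FacebookRequestError

# Graph API error codes that indicate an invalid or expired access token
OAUTH_ERROR_CODES = {190, 102}

def raise_if_token_error(exc: FacebookRequestError) -> None:
    """Re-raise Meta OAuth failures as ValueError so the grabber stops immediately."""
    if exc.api_error_code() in OAUTH_ERROR_CODES:
        raise ValueError(
            f"Invalid Meta access token (error code {exc.api_error_code()}): "
            f"{exc.api_error_message()}"
        ) from exc
```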
## Implementation Details

### Modified Files

`src/meta_api_grabber/scheduled_grabber.py`

**New State Tracking:**
```python
# Track current date for detecting day changes
self.current_date: Optional[date] = None

# Track when yesterday data was last fetched
self.yesterday_last_fetched: Optional[datetime] = None
```
**New Methods:**
- `_check_new_day(today_date_start: Optional[date]) -> bool` (see the sketch after this list)
  - Compares today's `date_start` with the tracked `current_date`
  - Returns `True` if a new day has been detected
- `_should_fetch_yesterday() -> bool`
  - Returns `True` if yesterday's data has never been fetched
  - Returns `True` if 12+ hours have passed since the last fetch
- `_validate_token()`
  - Validates the access token using the token manager
  - Raises `ValueError` if the token is invalid
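A simplified sketch of the two date-related checks, assuming they use the state fields shown under *New State Tracking* (the real implementation may differ in detail; `_validate_token()` is omitted because it delegates to the token manager):

```python
from datetime import date, datetime, timedelta
from typing import Optional

YESTERDAY_REFRESH_INTERVAL = timedelta(hours=12)

class ScheduledInsightsGrabber:
    def __init__(self) -> None:
        # Simplified: only the state relevant to yesterday collection
        self.current_date: Optional[date] = None
        self.yesterday_last_fetched: Optional[datetime] = None

    def _check_new_day(self, today_date_start: Optional[date]) -> bool:
        """Return True when today's date_start has rolled over to a new date."""
        if today_date_start is None:
            return False
        if self.current_date is None:
            # First cycle: just initialize tracking, no day change yet
            self.current_date = today_date_start
            return False
        if today_date_start != self.current_date:
            self.current_date = today_date_start
            return True
        return False

    def _should_fetch_yesterday(self) -> bool:
        """True if yesterday's data was never fetched or is 12+ hours old."""
        if self.yesterday_last_fetched is None:
            return True
        return datetime.now() - self.yesterday_last_fetched >= YESTERDAY_REFRESH_INTERVAL
```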
**Enhanced Methods:**
- `grab_account_insights()`: Now returns the `date_start` value and handles token errors
- `grab_campaign_insights()`: Added token error handling
- `grab_adset_insights()`: Added token error handling
- `refresh_token_if_needed()`: Now raises `ValueError` on token validation failure
- `run_collection_cycle()`: Implements the yesterday data collection logic
### Data Flow
```text
Collection Cycle Start
        ↓
Fetch "today" data for all accounts
        ↓
Extract date_start from today's data
        ↓
Check if new day detected
        ↓ (if yes)
Reset yesterday_last_fetched
        ↓
Check if should fetch yesterday
        ↓ (if yes)
Fetch "yesterday" data for all accounts
        ↓
Update yesterday_last_fetched timestamp
        ↓
Collection Cycle Complete
```
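Expressed as code, one cycle might look roughly like this (a sketch only: `self.ad_accounts` and the exact signatures of the `grab_*` calls are assumptions, and error handling is omitted):

```python
from datetime import datetime

class ScheduledInsightsGrabber:
    # ... state tracking and helper methods as sketched above ...

    async def run_collection_cycle(self) -> None:
        """One cycle: always fetch today's data; fetch yesterday's data when due."""
        newest_date_start = None

        for account in self.ad_accounts:
            # grab_account_insights() returns the date_start reported for "today"
            today_date_start = await self.grab_account_insights(account, date_preset="today")
            await self.grab_campaign_insights(account, date_preset="today")
            await self.grab_adset_insights(account, date_preset="today")
            newest_date_start = today_date_start or newest_date_start

        if self._check_new_day(newest_date_start):
            # Day rollover: force an immediate yesterday fetch
            self.yesterday_last_fetched = None

        if self._should_fetch_yesterday():
            for account in self.ad_accounts:
                await self.grab_account_insights(account, date_preset="yesterday")
                await self.grab_campaign_insights(account, date_preset="yesterday")
                await self.grab_adset_insights(account, date_preset="yesterday")
            self.yesterday_last_fetched = datetime.now()
```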
## Testing Instructions

### 1. Local Testing (Without API Calls)

Check syntax and imports:

```bash
uv run python -m py_compile src/meta_api_grabber/scheduled_grabber.py
```
### 2. Docker Container Test

Start the TimescaleDB container:

```bash
docker-compose up -d timescaledb
```

Wait for the database to be healthy:

```bash
docker-compose ps
```
### 3. Test Token Validation

Intentionally use an invalid token to verify error handling:

```bash
# Set an invalid token in .env temporarily
META_ACCESS_TOKEN=invalid_token_for_testing

# Run the grabber - it should error out immediately
uv run python src/meta_api_grabber/scheduled_grabber.py
```

**Expected output:** Clear error message about invalid token
### 4. Production Test Run

**Important:** Before running with a real token:

1. Ensure `.env` has valid credentials:

   ```bash
   META_ACCESS_TOKEN=<your_valid_token>
   META_APP_ID=<your_app_id>
   META_APP_SECRET=<your_app_secret>
   DATABASE_URL=postgresql://meta_user:meta_password@localhost:5555/meta_insights
   ```

2. Run a single cycle to verify:

   ```bash
   # This will run one collection cycle and exit
   uv run python -c "
   import asyncio
   from src.meta_api_grabber.scheduled_grabber import ScheduledInsightsGrabber

   async def test():
       grabber = ScheduledInsightsGrabber(max_accounts=1)
       async with grabber.db:
           await grabber.db.connect()
           await grabber.db.initialize_schema()
           await grabber.load_ad_accounts()
           await grabber.run_collection_cycle()

   asyncio.run(test())
   "
   ```
### 5. Monitor Yesterday Data Collection

The system will:

- **First run:** Collect today's data, detect the current date
- **Subsequent runs:** Continue collecting today's data every 2 hours
- **When a new day starts:**
  - Log message: "📅 New day detected: YYYY-MM-DD -> YYYY-MM-DD"
  - Immediately fetch yesterday's data
- **Every 12 hours:** Update yesterday's data
Check the database to verify that yesterday's data is being stored:

```bash
# Connect to TimescaleDB
psql -U meta_user -d meta_insights -h localhost -p 5555
```

```sql
-- Check yesterday data
SELECT
    time,
    account_id,
    date_preset,
    date_start,
    date_stop,
    impressions,
    spend
FROM account_insights
WHERE date_preset = 'yesterday'
ORDER BY time DESC
LIMIT 10;
```
## Expected Behavior

### Scenario 1: Fresh Start

- **Cycle 1:** Fetch today's data, initialize `current_date`
- **Cycle 2:** Fetch today's data, fetch yesterday's data (first time)
- **Cycles 3-6:** Fetch today's data only
- **Cycle 7:** Fetch today's data, update yesterday's data (12h passed)
### Scenario 2: Day Change

- **Cycle N:** Today is 2025-10-21, fetch today's data
- **Cycle N+1:** Today is 2025-10-22 (new day!)
  - Log: "📅 New day detected: 2025-10-21 -> 2025-10-22"
  - Fetch today's data (2025-10-22)
  - Fetch yesterday's data (2025-10-21)
### Scenario 3: Invalid Token

- **Any cycle with an invalid token:**
  - Errors immediately with a clear message
  - Stops execution (does not continue to other accounts)
  - Exits with a non-zero status code (see the entry-point sketch below)
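At the process level, that non-zero exit can come from an entry point along these lines (a sketch of what the tail of `scheduled_grabber.py` might look like; only `async_main()` is named elsewhere in this document, the rest is illustrative):

```python
import asyncio
import logging
import sys

def main() -> None:
    """Process entry point: exit non-zero when token validation fails."""
    try:
        # async_main() is the existing coroutine in scheduled_grabber.py
        # (see Configuration Options below)
        asyncio.run(async_main())
    except ValueError as exc:
        # Token validation failures propagate as ValueError (fail-fast)
        logging.error("Stopping grabber: %s", exc)
        sys.exit(1)

if __name__ == "__main__":
    main()
```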
## Deployment Notes

### Docker Production Deployment

The implementation is designed to run continuously in a Docker container. If token authentication fails:

- The container will error out and stop
- This prevents unnecessary API calls with invalid credentials
- You'll see clear error messages in the container logs
- Fix token issues before restarting
### Monitoring Recommendations

- **Check logs regularly for:**
  - "📅 New day detected" messages
  - "Fetching yesterday's data" messages
  - Token validation errors
- **Database monitoring:**
  - Verify yesterday's data is being updated
  - Check for gaps in `date_start`/`date_stop` values
- **Token expiry:**
  - The system uses automatic token refresh (if enabled)
  - Monitor for token expiration warnings
## Configuration Options

In `src/meta_api_grabber/scheduled_grabber.py`, `async_main()`:

```python
grabber = ScheduledInsightsGrabber(
    max_accounts=3,          # Limit number of accounts for testing
    auto_refresh_token=True  # Enable automatic token refresh
)

await grabber.run_scheduled(
    interval_hours=2.0,                   # How often to collect today's data
    refresh_metadata_every_n_cycles=12,   # How often to refresh the metadata cache
)
```
The yesterday fetch interval is hardcoded to 12 hours in the `_should_fetch_yesterday()` method.
## Troubleshooting

### Yesterday data not being fetched

- Check logs for "Fetching yesterday's data" messages
- Verify `date_start` is being extracted from today's data
- Check that `self.current_date` is being initialized
### Token errors not stopping execution

- Ensure `ValueError` is being raised in the grab methods
- Check that `except ValueError` blocks re-raise the exception
### Database issues

- Verify `date_start` and `date_stop` columns exist in all insights tables
- Run schema initialization: `await self.db.initialize_schema()`
## Summary

The implementation adds:

- ✅ New day detection via `date_start` monitoring
- ✅ Automatic yesterday data collection on day change
- ✅ A 12-hour update cycle for yesterday's data
- ✅ Token validation with fail-fast error handling
- ✅ Clear logging for debugging and monitoring

This ensures complete and accurate historical data collection with minimal API usage.