# Yesterday Data Collection Feature

## Overview

This implementation extends the Meta API grabber to automatically collect yesterday's data using Meta's `yesterday` date preset. The system detects when a new day starts and manages yesterday data collection with the following logic:

## Key Features

### 1. New Day Detection

- Monitors the `date_start` field from "today" preset data
- Detects when `date_start` changes to a new date
- Triggers an immediate fetch of yesterday's data when a new day is detected

### 2. Yesterday Data Collection Strategy

- **First fetch**: When a new day is detected, fetch yesterday's data immediately
- **Periodic updates**: Update yesterday's data every 12 hours
- **Rationale**: Meta revises historical data, so periodic refreshes keep stored values accurate

### 3. Token Validation & Error Handling

- Validates the access token on every API request
- Catches OAuth token errors (error codes 190, 102)
- Raises `ValueError` with a clear error message for invalid tokens
- Stops execution immediately if the token is invalid (fail-fast approach)

## Implementation Details

### Modified Files

#### `src/meta_api_grabber/scheduled_grabber.py`

**New State Tracking:**

```python
# Track current date for detecting day changes
self.current_date: Optional[date] = None

# Track when yesterday data was last fetched
self.yesterday_last_fetched: Optional[datetime] = None
```

**New Methods:**

1. `_check_new_day(today_date_start: Optional[date]) -> bool`
   - Compares today's `date_start` with the tracked `current_date`
   - Returns `True` if a new day has been detected
2. `_should_fetch_yesterday() -> bool`
   - Returns `True` if yesterday data has never been fetched
   - Returns `True` if 12+ hours have passed since the last fetch
3. `_validate_token()`
   - Validates the access token using `token_manager`
   - Raises `ValueError` if the token is invalid

**Enhanced Methods:**

- `grab_account_insights()`: Now returns the `date_start` value and handles token errors
- `grab_campaign_insights()`: Added token error handling
- `grab_adset_insights()`: Added token error handling
- `refresh_token_if_needed()`: Now raises `ValueError` on token validation failure
- `run_collection_cycle()`: Implements the yesterday data collection logic

### Data Flow

```
Collection Cycle Start
  ↓
Fetch "today" data for all accounts
  ↓
Extract date_start from today's data
  ↓
Check if new day detected
  ↓ (if yes)
Reset yesterday_last_fetched
  ↓
Check if should fetch yesterday
  ↓ (if yes)
Fetch "yesterday" data for all accounts
  ↓
Update yesterday_last_fetched timestamp
  ↓
Collection Cycle Complete
```

## Testing Instructions

### 1. Local Testing (Without API Calls)

Check syntax and imports:

```bash
uv run python -m py_compile src/meta_api_grabber/scheduled_grabber.py
```

### 2. Docker Container Test

Start the TimescaleDB container:

```bash
docker-compose up -d timescaledb
```

Wait for the database to be healthy:

```bash
docker-compose ps
```

### 3. Test Token Validation

Intentionally use an invalid token to verify the error handling:

```bash
# Set an invalid token in .env temporarily
META_ACCESS_TOKEN=invalid_token_for_testing

# Run the grabber - it should error out immediately
uv run python src/meta_api_grabber/scheduled_grabber.py
```

Expected output: a clear error message about the invalid token.
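For reference, the catch-and-re-raise behaviour described under "3. Token Validation & Error Handling" might look roughly like the sketch below. This is not the repository's code: the helper name `raise_if_token_error` and the `api_error_code()` accessor are assumptions (the accessor mirrors the shape of the `facebook-business` SDK's `FacebookRequestError`); only the error codes 190/102 and the `ValueError` fail-fast behaviour come from this document.

```python
from typing import Any, Optional

# OAuth-related Meta API error codes treated as fatal (per this document).
OAUTH_ERROR_CODES = {190, 102}


def raise_if_token_error(api_error: Any) -> None:
    """Translate an OAuth error from the Meta API into a ValueError.

    `api_error` is assumed to expose its numeric code via `api_error_code()`,
    similar to facebook-business's FacebookRequestError; adjust for your client.
    """
    code: Optional[int] = None
    try:
        code = api_error.api_error_code()
    except AttributeError:
        pass

    if code in OAUTH_ERROR_CODES:
        raise ValueError(
            f"Invalid or expired Meta access token (error code {code}). "
            "Fix META_ACCESS_TOKEN before restarting the grabber."
        ) from api_error
```

A grab method would call something like this inside its `except` block and let the resulting `ValueError` propagate, which is what makes a run stop immediately instead of continuing with the remaining accounts.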
### 4. Production Test Run

**Important**: Before running with a real token:

1. Ensure `.env` has valid credentials:

   ```
   META_ACCESS_TOKEN=
   META_APP_ID=
   META_APP_SECRET=
   DATABASE_URL=postgresql://meta_user:meta_password@localhost:5555/meta_insights
   ```

2. Run a single cycle to verify:

   ```bash
   # This will run one collection cycle and exit
   uv run python -c "
   import asyncio
   from src.meta_api_grabber.scheduled_grabber import ScheduledInsightsGrabber

   async def test():
       grabber = ScheduledInsightsGrabber(max_accounts=1)
       async with grabber.db:
           await grabber.db.connect()
           await grabber.db.initialize_schema()
           await grabber.load_ad_accounts()
           await grabber.run_collection_cycle()

   asyncio.run(test())
   "
   ```

### 5. Monitor Yesterday Data Collection

The system will:

- **First run**: Collect today's data and record the current date
- **Subsequent runs**: Continue collecting today's data every 2 hours
- **When a new day starts**:
  - Log message: "📅 New day detected: YYYY-MM-DD -> YYYY-MM-DD"
  - Immediately fetch yesterday's data
- **Every 12 hours**: Update yesterday's data

Check the database to verify that yesterday data is being stored:

```bash
# Connect to TimescaleDB
psql -U meta_user -d meta_insights -h localhost -p 5555
```

```sql
-- Check yesterday data
SELECT time, account_id, date_preset, date_start, date_stop, impressions, spend
FROM account_insights
WHERE date_preset = 'yesterday'
ORDER BY time DESC
LIMIT 10;
```

## Expected Behavior

### Scenario 1: Fresh Start

- Cycle 1: Fetch today's data, initialize `current_date`
- Cycle 2: Fetch today's data, fetch yesterday's data (first time)
- Cycles 3-6: Fetch today's data only
- Cycle 7: Fetch today's data, update yesterday's data (12h passed)

### Scenario 2: Day Change

- Cycle N: Today is 2025-10-21, fetch today's data
- Cycle N+1: Today is 2025-10-22 (new day!)
  - Log: "📅 New day detected: 2025-10-21 -> 2025-10-22"
  - Fetch today's data (2025-10-22)
  - Fetch yesterday's data (2025-10-21)

### Scenario 3: Invalid Token

- Any cycle with an invalid token:
  - Errors immediately with a clear message
  - Stops execution (does not continue to other accounts)
  - Exits with a non-zero status code

## Deployment Notes

### Docker Production Deployment

The implementation is designed to run continuously in a Docker container. If token authentication fails:

1. The container errors out and stops
2. This prevents unnecessary API calls with invalid credentials
3. Clear error messages appear in the container logs
4. Fix the token issue before restarting

### Monitoring Recommendations

1. **Check logs regularly** for:
   - "📅 New day detected" messages
   - "Fetching yesterday's data" messages
   - Token validation errors
2. **Database monitoring**:
   - Verify yesterday data is being updated
   - Check for gaps in `date_start`/`date_stop` values
3. **Token expiry**:
   - The system uses automatic token refresh (if enabled)
   - Monitor for token expiration warnings

## Configuration Options

In `async_main()` in `src/meta_api_grabber/scheduled_grabber.py`:

```python
grabber = ScheduledInsightsGrabber(
    max_accounts=3,          # Limit number of accounts for testing
    auto_refresh_token=True  # Enable automatic token refresh
)

await grabber.run_scheduled(
    interval_hours=2.0,                  # How often to collect today's data
    refresh_metadata_every_n_cycles=12,  # How often to refresh metadata cache
)
```

**The yesterday fetch interval is hardcoded to 12 hours** in the `_should_fetch_yesterday()` method.
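Since the day-change check and the 12-hour interval drive all of the behaviour above, a minimal sketch of the two decision helpers is included here for orientation. It assumes only the state attributes listed under "New State Tracking" and the hardcoded 12-hour interval; the `_YesterdayScheduling` class is a stand-in for the grabber, not the actual `ScheduledInsightsGrabber` implementation.

```python
from datetime import date, datetime, timedelta
from typing import Optional

# Hardcoded refresh interval for yesterday data, as described in this document.
YESTERDAY_REFRESH_INTERVAL = timedelta(hours=12)


class _YesterdayScheduling:
    """Stand-in for the grabber, holding only the scheduling state."""

    def __init__(self) -> None:
        self.current_date: Optional[date] = None
        self.yesterday_last_fetched: Optional[datetime] = None

    def _check_new_day(self, today_date_start: Optional[date]) -> bool:
        """Return True when today's date_start differs from the tracked date."""
        if today_date_start is None:
            return False
        if self.current_date is None:
            # First cycle: record the date without signalling a day change.
            self.current_date = today_date_start
            return False
        if today_date_start != self.current_date:
            self.current_date = today_date_start
            return True
        return False

    def _should_fetch_yesterday(self) -> bool:
        """Return True if yesterday data was never fetched or is older than 12h."""
        if self.yesterday_last_fetched is None:
            return True
        return datetime.now() - self.yesterday_last_fetched >= YESTERDAY_REFRESH_INTERVAL
```

If the interval ever needs to become configurable, the change would amount to replacing the module-level constant with a constructor argument.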
## Troubleshooting

### Yesterday data not being fetched

- Check logs for "Fetching yesterday's data" messages
- Verify `date_start` is being extracted from today's data
- Check `self.current_date` is being initialized

### Token errors not stopping execution

- Ensure `ValueError` is being raised in grab methods
- Check that `except ValueError` blocks are re-raising the exception

### Database issues

- Verify `date_start` and `date_stop` columns exist in all insights tables
- Run schema initialization: `await self.db.initialize_schema()`

## Summary

The implementation adds:

- ✅ New day detection via `date_start` monitoring
- ✅ Automatic yesterday data collection on day change
- ✅ 12-hour update cycle for yesterday data
- ✅ Token validation with fail-fast error handling
- ✅ Clear logging for debugging and monitoring

This ensures complete and accurate historical data collection with minimal API usage.
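For the "Token errors not stopping execution" item above, the re-raise pattern in question looks roughly like the following sketch. The helper name `_collect_for_account` and the call signature `grab_account_insights(account_id)` are illustrative assumptions; only the `ValueError` re-raise and the "stop instead of continuing to other accounts" behaviour come from this document.

```python
import logging

logger = logging.getLogger(__name__)


async def _collect_for_account(grabber, account_id: str) -> None:
    """Collect one account's insights, letting token errors propagate."""
    try:
        # Call signature is illustrative; the real method may differ.
        await grabber.grab_account_insights(account_id)
    except ValueError:
        # Token problem: log and re-raise so the collection cycle stops
        # instead of moving on to the next account (fail-fast).
        logger.error("Token validation failed for %s; aborting cycle", account_id)
        raise
    except Exception:
        # Non-token errors are logged and skipped so one bad account
        # does not block the rest of the cycle.
        logger.exception("Failed to collect insights for %s", account_id)
```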