Yesterday Data Collection Feature

Overview

This implementation extends the Meta API grabber to automatically collect yesterday's data using Meta's yesterday date preset. The system detects when a new day starts and manages the collection of yesterday's data as described below.

Key Features

1. New Day Detection

  • Monitors the date_start field from "today" preset data
  • Detects when date_start changes to a new date
  • Triggers immediate fetch of yesterday's data when a new day is detected
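
For reference, a stored "today" insights row might look like the illustrative sketch below; the field names match the account_insights columns queried later in this document, and the values are made up:

# Illustrative "today" row (values are made up); date_start is the field
# monitored for day changes.
{
    "account_id": "act_1234567890",
    "date_preset": "today",
    "date_start": "2025-10-22",
    "date_stop": "2025-10-22",
    "impressions": 1043,
    "spend": 12.34,
}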

2. Yesterday Data Collection Strategy

  • First fetch: When a new day is detected, fetch yesterday's data immediately
  • Periodic updates: Update yesterday's data every 12 hours
  • Rationale: Meta updates historical data, so refreshing ensures accuracy

3. Token Validation & Error Handling

  • Validates access token on every API request
  • Catches OAuth token errors (error codes 190, 102)
  • Raises ValueError with clear error messages for invalid tokens
  • Stops execution immediately if token is invalid (fail-fast approach)
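
As a rough illustration (not the exact code in scheduled_grabber.py), the fail-fast check can be sketched as below; the payload shape follows Meta's standard Graph API error format, and the helper name is hypothetical:

# Hedged sketch: raise ValueError when the Graph API reports an OAuth error.
# Error codes 190 and 102 are Meta's OAuth token / login status errors.
TOKEN_ERROR_CODES = {190, 102}

def raise_if_token_error(error_payload: dict) -> None:
    error = error_payload.get("error", {})
    if error.get("code") in TOKEN_ERROR_CODES:
        raise ValueError(
            f"Access token rejected by Meta API: {error.get('message', 'unknown error')}"
        )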

Implementation Details

Modified Files

src/meta_api_grabber/scheduled_grabber.py

New State Tracking:

# Track current date for detecting day changes
self.current_date: Optional[date] = None

# Track when yesterday data was last fetched
self.yesterday_last_fetched: Optional[datetime] = None

New Methods:

  1. _check_new_day(today_date_start: Optional[date]) -> bool

    • Compares today's date_start with tracked current_date
    • Returns True if a new day has been detected
  2. _should_fetch_yesterday() -> bool

    • Returns True if yesterday data has never been fetched
    • Returns True if 12+ hours have passed since last fetch
  3. _validate_token()

    • Validates the access token using token_manager
    • Raises ValueError if token is invalid
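
A minimal sketch of the first two helpers, assuming the state attributes introduced above; the actual implementation in scheduled_grabber.py may differ in detail:

from datetime import date, datetime, timedelta
from typing import Optional

# Minimal sketch of the day-change and refresh checks (illustrative only).
def _check_new_day(self, today_date_start: Optional[date]) -> bool:
    """Return True when today's date_start has moved to a new date."""
    if today_date_start is None:
        return False
    if self.current_date is None:
        # First cycle: remember the date, but don't treat it as a day change
        self.current_date = today_date_start
        return False
    if today_date_start != self.current_date:
        self.current_date = today_date_start
        return True
    return False

def _should_fetch_yesterday(self) -> bool:
    """Fetch yesterday data if never fetched, or if 12+ hours have passed."""
    if self.yesterday_last_fetched is None:
        return True
    return datetime.now() - self.yesterday_last_fetched >= timedelta(hours=12)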

Enhanced Methods:

  • grab_account_insights(): Now returns the date_start value and handles token errors
  • grab_campaign_insights(): Added token error handling
  • grab_adset_insights(): Added token error handling
  • refresh_token_if_needed(): Now raises ValueError on token validation failure
  • run_collection_cycle(): Implements the yesterday data collection logic

Data Flow

Collection Cycle Start
    ↓
Fetch "today" data for all accounts
    ↓
Extract date_start from today's data
    ↓
Check if new day detected
    ↓ (if yes)
Reset yesterday_last_fetched
    ↓
Check if should fetch yesterday
    ↓ (if yes)
Fetch "yesterday" data for all accounts
    ↓
Update yesterday_last_fetched timestamp
    ↓
Collection Cycle Complete
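
The cycle can be sketched roughly as follows. The method names come from this document, but the signatures, the per-account campaign/adset fetches, and the error handling are assumptions, so treat this as an outline rather than the actual run_collection_cycle():

from datetime import datetime

# Outline of run_collection_cycle() following the flow above (account level
# only; signatures are assumptions, logging and error handling are omitted).
async def run_collection_cycle(self) -> None:
    today_date_start = None
    for account in self.ad_accounts:
        # grab_account_insights() returns the date_start of the "today" data
        today_date_start = await self.grab_account_insights(account, date_preset="today")

    if self._check_new_day(today_date_start):
        # New day detected: force an immediate yesterday fetch this cycle
        self.yesterday_last_fetched = None

    if self._should_fetch_yesterday():
        for account in self.ad_accounts:
            await self.grab_account_insights(account, date_preset="yesterday")
        self.yesterday_last_fetched = datetime.now()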

Testing Instructions

1. Local Testing (Without API Calls)

Check that the module compiles (syntax check only):

uv run python -m py_compile src/meta_api_grabber/scheduled_grabber.py

2. Docker Container Test

Start the TimescaleDB container:

docker-compose up -d timescaledb

Wait for the database to be healthy:

docker-compose ps

3. Test Token Validation

Intentionally use an invalid token to verify error handling:

# Set an invalid token in .env temporarily
META_ACCESS_TOKEN=invalid_token_for_testing

# Run the grabber - it should error out immediately
uv run python src/meta_api_grabber/scheduled_grabber.py

Expected output: Clear error message about invalid token

4. Production Test Run

Important: Before running with real token:

  1. Ensure .env has valid credentials:

    META_ACCESS_TOKEN=<your_valid_token>
    META_APP_ID=<your_app_id>
    META_APP_SECRET=<your_app_secret>
    DATABASE_URL=postgresql://meta_user:meta_password@localhost:5555/meta_insights
    
  2. Run a single cycle to verify:

    # This will run one collection cycle and exit
    uv run python -c "
    import asyncio
    from src.meta_api_grabber.scheduled_grabber import ScheduledInsightsGrabber
    
    async def test():
        grabber = ScheduledInsightsGrabber(max_accounts=1)
        async with grabber.db:
            await grabber.db.connect()
            await grabber.db.initialize_schema()
            await grabber.load_ad_accounts()
            await grabber.run_collection_cycle()
    
    asyncio.run(test())
    "
    

5. Monitor Yesterday Data Collection

The system will:

  • First run: Collect today's data, detect current date
  • Subsequent runs: Continue collecting today's data every 2 hours
  • When new day starts:
    • Log message: "📅 New day detected: YYYY-MM-DD -> YYYY-MM-DD"
    • Immediately fetch yesterday's data
  • Every 12 hours: Update yesterday's data

Check database to verify yesterday data is being stored:

-- Connect to TimescaleDB
psql -U meta_user -d meta_insights -h localhost -p 5555

-- Check yesterday data
SELECT
    time,
    account_id,
    date_preset,
    date_start,
    date_stop,
    impressions,
    spend
FROM account_insights
WHERE date_preset = 'yesterday'
ORDER BY time DESC
LIMIT 10;

Expected Behavior

Scenario 1: Fresh Start

  • Cycle 1: Fetch today's data, initialize current_date
  • Cycle 2: Fetch today's data, fetch yesterday's data (first time)
  • Cycles 3-7: Fetch today's data only
  • Cycle 8: Fetch today's data, update yesterday's data (12 hours since the first yesterday fetch at the default 2-hour interval)

Scenario 2: Day Change

  • Cycle N: Today is 2025-10-21, fetch today's data
  • Cycle N+1: Today is 2025-10-22 (new day!)
    • Log: "📅 New day detected: 2025-10-21 -> 2025-10-22"
    • Fetch today's data (2025-10-22)
    • Fetch yesterday's data (2025-10-21)

Scenario 3: Invalid Token

  • Any cycle with invalid token:
    • Error immediately with clear message
    • Stop execution (don't continue to other accounts)
    • Exit with non-zero status code
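
As an illustration, the fail-fast exit at the entry point could look like the sketch below; the exact handling around async_main() in scheduled_grabber.py may differ:

import asyncio
import sys

# Hedged sketch: let the token ValueError propagate out of the event loop
# and exit with a non-zero status so the container stops.
# async_main() refers to the entry point in scheduled_grabber.py.
try:
    asyncio.run(async_main())
except ValueError as exc:
    print(f"Fatal token error: {exc}")
    sys.exit(1)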

Deployment Notes

Docker Production Deployment

The implementation is designed to run continuously in a Docker container. If token authentication fails:

  1. Container will error out and stop
  2. This prevents unnecessary API calls with invalid credentials
  3. You'll see clear error messages in container logs
  4. Fix token issues before restarting

Monitoring Recommendations

  1. Check logs regularly for:

    • "📅 New day detected" messages
    • "Fetching yesterday's data" messages
    • Token validation errors
  2. Database monitoring:

    • Verify yesterday data is being updated
    • Check for gaps in date_start/date_stop values
  3. Token expiry:

    • System uses automatic token refresh (if enabled)
    • Monitor for token expiration warnings

Configuration Options

In src/meta_api_grabber/scheduled_grabber.py async_main():

grabber = ScheduledInsightsGrabber(
    max_accounts=3,  # Limit number of accounts for testing
    auto_refresh_token=True  # Enable automatic token refresh
)

await grabber.run_scheduled(
    interval_hours=2.0,  # How often to collect today's data
    refresh_metadata_every_n_cycles=12,  # How often to refresh metadata cache
)

The yesterday fetch interval is hardcoded to 12 hours in the _should_fetch_yesterday() method.

Troubleshooting

Yesterday data not being fetched

  • Check logs for "Fetching yesterday's data" messages
  • Verify date_start is being extracted from today's data
  • Check self.current_date is being initialized

Token errors not stopping execution

  • Ensure ValueError is being raised in grab methods
  • Check that except ValueError blocks are re-raising the exception

Database issues

  • Verify date_start and date_stop columns exist in all insights tables
  • Run schema initialization: await self.db.initialize_schema()

Summary

The implementation successfully adds:

  • New day detection via date_start monitoring
  • Automatic yesterday data collection on day change
  • 12-hour update cycle for yesterday data
  • Token validation with fail-fast error handling
  • Clear logging for debugging and monitoring

This ensures complete and accurate historical data collection with minimal API usage.