# Meta API Grabber

Async data collection system for Meta's Marketing API with TimescaleDB time-series storage and dashboard support.

## Features

- **OAuth2 Authentication** - Automated token generation flow
- **TimescaleDB Integration** - Optimized time-series database for ad metrics
- **Scheduled Collection** - Periodic data grabbing (every 2 hours recommended)
- **Metadata Caching** - Smart caching of accounts, campaigns, and ad sets
- **Async/await architecture** for efficient API calls
- **Conservative rate limiting** (2s between requests, 1 concurrent request)
- **Multi-level insights** - Account, campaign, and ad set data
- **Dashboard Ready** - Includes Grafana setup for visualization
- **Continuous Aggregates** - Pre-computed hourly/daily rollups
- **Data Compression** - Automatic compression of older data

## Quick Start

### 1. Install Dependencies

```bash
uv sync
```

### 2. Start TimescaleDB

```bash
docker-compose up -d
```

This starts:

- **TimescaleDB** on port 5432 (PostgreSQL-compatible)
- **Grafana** on port 3000 (for dashboards)

### 3. Configure Credentials

```bash
cp .env.example .env
```

Edit `.env` and add:

- **META_APP_ID** and **META_APP_SECRET** from [Meta for Developers](https://developers.facebook.com/)
- **META_AD_ACCOUNT_ID** from Meta Ads Manager (format: `act_1234567890`)
- **DATABASE_URL** is pre-configured for the local Docker setup

### 4. Get Long-Lived Access Token

**OAuth2 Flow (Recommended - Gets 60-day token)**

```bash
uv run python src/meta_api_grabber/auth.py
```

This will:

1. Open OAuth2 authorization in your browser
2. Exchange the code for a short-lived token
3. **Automatically exchange it for a long-lived token (60 days)**
4. Save the token to `.env`
5. Save token metadata to `.meta_token.json` (for auto-refresh)

**Manual Token (Not Recommended)**

- Get a token from [Graph API Explorer](https://developers.facebook.com/tools/explorer/)
- Add it to `.env` as `META_ACCESS_TOKEN`
- Note: Manual tokens won't have auto-refresh capability

### 5. Start Scheduled Collection

```bash
uv run python src/meta_api_grabber/scheduled_grabber.py
```

This will:

- **Automatically refresh tokens** before they expire (checked every cycle)
- Collect data every 2 hours using the `today` date preset (recommended by Meta)
- Cache metadata (accounts, campaigns, ad sets) twice daily
- Store time-series data in TimescaleDB
- Use an upsert strategy to handle updates

## Usage Modes

### 1. Scheduled Collection (Recommended for Dashboards)

```bash
uv run python src/meta_api_grabber/scheduled_grabber.py
```

- Runs continuously, collecting data every 2 hours
- Stores data in TimescaleDB for dashboard visualization
- Uses the `today` date preset (recommended by Meta)
- Caches metadata to reduce API calls

### 2. One-Time Data Export (JSON)

```bash
uv run python src/meta_api_grabber/insights_grabber.py
```

- Fetches insights for the last 7 days
- Saves to `data/meta_insights_TIMESTAMP.json`
- Good for ad-hoc analysis or testing
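Conceptually, the one-time export boils down to a single paced request against the Marketing API insights endpoint. The following is a minimal sketch under stated assumptions: it uses `httpx` directly, an assumed Graph API version (`v19.0`), and an illustrative field list; the field names, date preset, and file layout may differ from what `insights_grabber.py` actually does.

```python
"""Minimal sketch of an async account-level insights pull.

Assumes httpx is installed and META_ACCESS_TOKEN / META_AD_ACCOUNT_ID are
set in the environment. Field list and API version are illustrative.
"""
import asyncio
import json
import os
from datetime import datetime, timezone

import httpx

GRAPH_URL = "https://graph.facebook.com/v19.0"  # assumed API version
REQUEST_DELAY_S = 2.0  # conservative pacing between requests
_semaphore = asyncio.Semaphore(1)  # one request in flight at a time


async def fetch_account_insights(client: httpx.AsyncClient) -> dict:
    """Fetch account-level insights for the last 7 days."""
    params = {
        "access_token": os.environ["META_ACCESS_TOKEN"],
        "date_preset": "last_7d",
        "level": "account",
        "fields": "impressions,clicks,spend,cpc,cpm,ctr,reach,frequency",
    }
    async with _semaphore:
        resp = await client.get(
            f"{GRAPH_URL}/{os.environ['META_AD_ACCOUNT_ID']}/insights",
            params=params,
        )
        resp.raise_for_status()
        await asyncio.sleep(REQUEST_DELAY_S)  # pace any follow-up calls
    return resp.json()


async def main() -> None:
    async with httpx.AsyncClient(timeout=30) as client:
        data = await fetch_account_insights(client)
    os.makedirs("data", exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    with open(f"data/meta_insights_{stamp}.json", "w") as fh:
        json.dump(data, fh, indent=2)


if __name__ == "__main__":
    asyncio.run(main())
```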
### 3. OAuth2 Authentication

```bash
uv run python src/meta_api_grabber/auth.py
```

- Interactive flow to get a long-lived token (60 days)
- Saves the token to `.env` and metadata to `.meta_token.json`

### 4. Check Token Status

```bash
uv run python src/meta_api_grabber/token_manager.py
```

- Shows token expiry and validity
- Manually refresh if needed

## Data Collected

### Account Level

- Impressions, clicks, spend
- CPC, CPM, CTR
- Reach, frequency
- Actions and cost per action

### Campaign Level (top 10)

- Campaign name and ID
- Impressions, clicks, spend
- CTR, CPC

### Ad Set Level (top 10)

- Ad set name and ID
- Impressions, clicks, spend
- CTR, CPM

## Database Schema

### Time-Series Tables (Hypertables)

- **account_insights** - Account-level metrics over time
- **campaign_insights** - Campaign-level metrics over time
- **adset_insights** - Ad set-level metrics over time

### Metadata Tables (Cached)

- **ad_accounts** - Account metadata
- **campaigns** - Campaign metadata
- **adsets** - Ad set metadata

### Continuous Aggregates

- **account_insights_hourly** - Hourly rollups
- **account_insights_daily** - Daily rollups

### Features

- **Automatic partitioning** by day (chunk_time_interval = 1 day)
- **Compression** for data older than 7 days
- **Indexes** on account_id, campaign_id, adset_id + time
- **Upsert strategy** to handle duplicate/updated data

## Dashboard Setup

### Access Grafana

1. Open http://localhost:3000
2. Log in with `admin` / `admin`
3. Add TimescaleDB as a data source:
   - Type: PostgreSQL
   - Host: `timescaledb:5432`
   - Database: `meta_insights`
   - User: `meta_user`
   - Password: `meta_password`
   - TLS/SSL Mode: disable

### Example Queries

**Latest Account Metrics:**

```sql
SELECT *
FROM latest_account_metrics
WHERE account_id = 'act_your_id';
```

**Campaign Performance (Last 24h):**

```sql
SELECT *
FROM campaign_performance_24h
ORDER BY total_spend DESC;
```

**Hourly Trend:**

```sql
SELECT bucket, avg_impressions, avg_clicks, avg_spend
FROM account_insights_hourly
WHERE account_id = 'act_your_id'
  AND bucket >= NOW() - INTERVAL '7 days'
ORDER BY bucket;
```

## Rate Limiting & Backoff

The system implements Meta's best practices for rate limiting:

### Intelligent Rate Limiting

- **Monitors the `x-fb-ads-insights-throttle` header** from every API response
- Tracks both app-level and account-level usage percentages
- **Auto-throttles** when usage exceeds 75%
- **Progressive delays** based on usage (75%: 2x, 85%: 3x, 90%: 5x, 95%: 10x)

### Exponential Backoff

- **Automatic retries** on rate limit errors (up to 5 attempts)
- **Exponential backoff**: 2s → 4s → 8s → 16s → 32s
- Max backoff: 5 minutes
- Recognizes Meta error codes 17 and 80004
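Putting the two mechanisms together, the request path looks roughly like the sketch below. The throttle-header keys (`app_id_util_pct`, `acc_id_util_pct`) and the error-body layout are assumptions based on Meta's documentation, and `get_with_backoff` is an illustrative helper, not the project's actual function.

```python
"""Sketch of throttle-aware pacing plus exponential backoff."""
import asyncio
import json
import random

import httpx

RATE_LIMIT_CODES = {17, 80004}  # Meta rate-limit error codes
MAX_RETRIES = 5
BASE_DELAY_S = 2.0
MAX_BACKOFF_S = 300.0  # 5 minutes


def throttle_multiplier(headers: httpx.Headers) -> float:
    """Scale the base delay using the x-fb-ads-insights-throttle header."""
    raw = headers.get("x-fb-ads-insights-throttle")
    if not raw:
        return 1.0
    usage = json.loads(raw)  # assumed JSON payload with *_util_pct keys
    pct = max(usage.get("app_id_util_pct", 0), usage.get("acc_id_util_pct", 0))
    if pct >= 95:
        return 10.0
    if pct >= 90:
        return 5.0
    if pct >= 85:
        return 3.0
    if pct >= 75:
        return 2.0
    return 1.0


async def get_with_backoff(client: httpx.AsyncClient, url: str, params: dict) -> dict:
    """GET with progressive pacing and exponential backoff on rate-limit errors."""
    for attempt in range(MAX_RETRIES):
        resp = await client.get(url, params=params)
        # Pace the next request according to current throttle usage.
        await asyncio.sleep(BASE_DELAY_S * throttle_multiplier(resp.headers))
        if resp.status_code < 400:
            return resp.json()
        error = resp.json().get("error", {})
        if error.get("code") not in RATE_LIMIT_CODES:
            resp.raise_for_status()  # not a rate limit: surface the error
        # 2s -> 4s -> 8s -> 16s -> 32s, capped at 5 minutes, with jitter.
        backoff = min(BASE_DELAY_S * 2 ** attempt, MAX_BACKOFF_S)
        await asyncio.sleep(backoff + random.uniform(0, 1))
    raise RuntimeError(f"Rate limited after {MAX_RETRIES} attempts: {url}")
```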
### Conservative Defaults

- **2 seconds base delay** between API requests
- **1 concurrent request** at a time
- **Top 50 campaigns/ad sets** per collection
- **2-hour intervals** between scheduled collections

### Best Practices Applied

Based on [Meta's official recommendations](https://developers.facebook.com/docs/marketing-api/insights/best-practices/):

- ✅ Monitor rate limit headers
- ✅ Pace queries with wait times
- ✅ Implement backoff when approaching limits
- ✅ Use date presets (e.g., `today`) instead of custom ranges
- ✅ Limit query scope and metrics

## Token Management

### Automatic Token Refresh

The system automatically manages the token lifecycle.

**Token Types:**

- **Short-lived tokens**: Valid for 1-2 hours (obtained from OAuth)
- **Long-lived tokens**: Valid for 60 days (automatically exchanged)

**Auto-Refresh Logic:**

1. The OAuth flow automatically exchanges the short-lived token for a 60-day token
2. Token metadata is saved to `.meta_token.json` (includes expiry)
3. The scheduled grabber checks the token before each cycle
4. The token is auto-refreshed when fewer than 7 days remain
5. The new token is saved and the API client is reinitialized seamlessly

**Files Created:**

- `.env` - Contains `META_ACCESS_TOKEN` (updated on refresh)
- `.meta_token.json` - Token metadata (expiry, issued_at, etc.)
- Both files are gitignored for security

**Manual Token Operations:**

Check token status:

```bash
uv run python src/meta_api_grabber/token_manager.py
```

Re-authenticate (if the token expires):

```bash
uv run python src/meta_api_grabber/auth.py
```

**Long-Running Collection:**

The scheduled grabber runs indefinitely without manual intervention. Token refresh happens automatically roughly every 53 days (7 days before the 60-day expiry).
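For reference, the "refresh when fewer than 7 days remain" check can be sketched as below. The layout of `.meta_token.json` (an `expires_at` ISO timestamp) is an assumption, as is the `v19.0` API version; `token_manager.py` is the authoritative implementation, and the exchange uses Meta's standard `fb_exchange_token` grant.

```python
"""Sketch of the token expiry check and long-lived token exchange."""
import json
import os
from datetime import datetime, timedelta, timezone

import httpx

TOKEN_FILE = ".meta_token.json"
REFRESH_THRESHOLD = timedelta(days=7)
GRAPH_URL = "https://graph.facebook.com/v19.0"  # assumed API version


def needs_refresh() -> bool:
    """Return True when fewer than 7 days remain before the token expires."""
    with open(TOKEN_FILE) as fh:
        meta = json.load(fh)
    expires_at = datetime.fromisoformat(meta["expires_at"])  # assumed field
    if expires_at.tzinfo is None:
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return expires_at - datetime.now(timezone.utc) < REFRESH_THRESHOLD


def exchange_for_long_lived(current_token: str) -> dict:
    """Exchange the current token for a fresh 60-day token."""
    resp = httpx.get(
        f"{GRAPH_URL}/oauth/access_token",
        params={
            "grant_type": "fb_exchange_token",
            "client_id": os.environ["META_APP_ID"],
            "client_secret": os.environ["META_APP_SECRET"],
            "fb_exchange_token": current_token,
        },
    )
    resp.raise_for_status()
    return resp.json()  # contains access_token and expires_in (seconds)


if __name__ == "__main__":
    if needs_refresh():
        new = exchange_for_long_lived(os.environ["META_ACCESS_TOKEN"])
        print(f"Refreshed; new token expires in {new.get('expires_in')} s")
    else:
        print("Token still valid; no refresh needed")
```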