# Meta API Grabber

Async data collection system for Meta's Marketing API with TimescaleDB time-series storage and dashboard support.

## Features

- **OAuth2 Authentication** - Automated token generation flow
- **TimescaleDB Integration** - Optimized time-series database for ad metrics
- **Scheduled Collection** - Periodic data grabbing (every 2 hours recommended)
- **Metadata Caching** - Smart caching of accounts, campaigns, and ad sets
- **Async/await architecture** for efficient API calls
- **Conservative rate limiting** (2s between requests, 1 concurrent request)
- **Multi-level insights** - Account, campaign, and ad set data
- **Dashboard Ready** - Includes Grafana setup for visualization
- **Continuous Aggregates** - Pre-computed hourly/daily rollups
- **Data Compression** - Automatic compression of older data

## Quick Start

### 1. Install Dependencies

```bash
uv sync
```

### 2. Start TimescaleDB

```bash
docker-compose up -d
```

This starts:

- **TimescaleDB** on port 5432 (PostgreSQL-compatible)
- **Grafana** on port 3000 (for dashboards)

### 3. Configure Credentials

```bash
cp .env.example .env
```

Edit `.env` and add:

- **META_APP_ID** and **META_APP_SECRET** from [Meta for Developers](https://developers.facebook.com/)
- **META_AD_ACCOUNT_ID** from Meta Ads Manager (format: `act_1234567890`)
- **DATABASE_URL** is pre-configured for local Docker setup

### 4. Get Access Token

**Option A: OAuth2 Flow (Recommended)**

```bash
uv run python src/meta_api_grabber/auth.py
```

Follow the prompts to authorize and save your token.

**Option B: Manual Token**

- Get a token from [Graph API Explorer](https://developers.facebook.com/tools/explorer/)
- Add it to `.env` as `META_ACCESS_TOKEN`

### 5. Start Scheduled Collection

```bash
uv run python src/meta_api_grabber/scheduled_grabber.py
```

This will:

- Collect data every 2 hours using the `today` date preset (recommended by Meta)
- Cache metadata (accounts, campaigns, ad sets) twice daily
- Store time-series data in TimescaleDB
- Use upsert strategy to handle updates

## Usage Modes

### 1. Scheduled Collection (Recommended for Dashboards)

```bash
uv run python src/meta_api_grabber/scheduled_grabber.py
```

- Runs continuously, collecting data every 2 hours
- Stores data in TimescaleDB for dashboard visualization
- Uses `today` date preset (recommended by Meta)
- Caches metadata to reduce API calls

### 2. One-Time Data Export (JSON)

```bash
uv run python src/meta_api_grabber/insights_grabber.py
```

- Fetches insights for the last 7 days
- Saves to `data/meta_insights_TIMESTAMP.json`
- Good for ad-hoc analysis or testing

### 3. OAuth2 Authentication

```bash
uv run python src/meta_api_grabber/auth.py
```

- Interactive flow to get access token
- Saves token to `.env` automatically

## Data Collected

### Account Level

- Impressions, clicks, spend
- CPC, CPM, CTR
- Reach, frequency
- Actions and cost per action

### Campaign Level (top 10)

- Campaign name and ID
- Impressions, clicks, spend
- CTR, CPC

### Ad Set Level (top 10)

- Ad set name and ID
- Impressions, clicks, spend
- CTR, CPM

## Database Schema

### Time-Series Tables (Hypertables)

- **account_insights** - Account-level metrics over time
- **campaign_insights** - Campaign-level metrics over time
- **adset_insights** - Ad set level metrics over time

### Metadata Tables (Cached)

- **ad_accounts** - Account metadata
- **campaigns** - Campaign metadata
- **adsets** - Ad set metadata

### Continuous Aggregates

- **account_insights_hourly** - Hourly rollups
- **account_insights_daily** - Daily rollups

### Features

- **Automatic partitioning** by day (chunk_time_interval = 1 day)
- **Compression** for data older than 7 days
- **Indexes** on account_id, campaign_id, adset_id + time
- **Upsert strategy** to handle duplicate/updated data

## Dashboard Setup

### Access Grafana

1. Open http://localhost:3000
2. Login with `admin` / `admin`
3. Add TimescaleDB as data source:
   - Type: PostgreSQL
   - Host: `timescaledb:5432`
   - Database: `meta_insights`
   - User: `meta_user`
   - Password: `meta_password`
   - TLS/SSL Mode: disable

### Example Queries

**Latest Account Metrics:**

```sql
SELECT * FROM latest_account_metrics
WHERE account_id = 'act_your_id';
```

**Campaign Performance (Last 24h):**

```sql
SELECT * FROM campaign_performance_24h
ORDER BY total_spend DESC;
```

**Hourly Trend:**

```sql
SELECT bucket, avg_impressions, avg_clicks, avg_spend
FROM account_insights_hourly
WHERE account_id = 'act_your_id'
  AND bucket >= NOW() - INTERVAL '7 days'
ORDER BY bucket;
```

## Rate Limiting

The system is configured to be very conservative:

- **2 seconds delay** between API requests
- **Only 1 concurrent request** at a time
- **Top 50 campaigns/adsets** per collection
- **2 hour intervals** between collections

This ensures you stay well within Meta's API rate limits.
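
The delay and concurrency settings above can be illustrated with a minimal `asyncio` sketch. This is not the grabber's actual implementation: `ConservativeLimiter`, `fake_request`, and `demo` are hypothetical names introduced here for illustration, and the request is a stand-in for a real Marketing API call.

```python
import asyncio
import time


class ConservativeLimiter:
    """Hypothetical helper (not part of this project's code).

    Caps the number of in-flight requests and enforces a minimum
    delay between the start of consecutive requests.
    """

    def __init__(self, delay_seconds: float = 2.0, max_concurrent: int = 1) -> None:
        self._delay = delay_seconds
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._lock = asyncio.Lock()
        self._last_start = None  # monotonic time of the previous request start

    async def run(self, make_request):
        # The semaphore is held for the whole request, limiting concurrency.
        async with self._semaphore:
            # The lock serializes the spacing check so starts never bunch up.
            async with self._lock:
                if self._last_start is not None:
                    wait = self._delay - (time.monotonic() - self._last_start)
                    if wait > 0:
                        await asyncio.sleep(wait)
                self._last_start = time.monotonic()
            return await make_request()


async def demo() -> None:
    # Create the limiter inside the running event loop.
    limiter = ConservativeLimiter(delay_seconds=2.0, max_concurrent=1)

    async def fake_request() -> str:
        # Placeholder for a real API call (assumption for illustration).
        await asyncio.sleep(0.1)
        return "ok"

    results = await asyncio.gather(*(limiter.run(fake_request) for _ in range(3)))
    print(results)
```

Running `asyncio.run(demo())` issues three placeholder requests one at a time, each starting roughly 2 seconds after the previous one. Keeping the spacing check behind its own lock means the delay still holds if `max_concurrent` is later raised above 1.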