# Meta API Grabber

Async data collection system for Meta's Marketing API with TimescaleDB time-series storage and dashboard support.

## Features

- **OAuth2 Authentication** - Automated token generation flow
- **TimescaleDB Integration** - Optimized time-series database for ad metrics
- **Scheduled Collection** - Periodic data grabbing (every 2 hours recommended)
- **Metadata Caching** - Smart caching of accounts, campaigns, and ad sets
- **Async/await architecture** for efficient API calls
- **Conservative rate limiting** (2s between requests, 1 concurrent request)
- **Multi-level insights** - Account, campaign, and ad set data
- **Dashboard Ready** - Includes Grafana setup for visualization
- **Continuous Aggregates** - Pre-computed hourly/daily rollups
- **Data Compression** - Automatic compression of older data

## Quick Start

### 1. Install Dependencies

```bash
uv sync
```

### 2. Start TimescaleDB

```bash
docker-compose up -d
```

This starts:
- **TimescaleDB** on port 5432 (PostgreSQL-compatible)
- **Grafana** on port 3000 (for dashboards)

### 3. Configure Credentials

```bash
cp .env.example .env
```

Edit `.env` and add:
- **META_APP_ID** and **META_APP_SECRET** from [Meta for Developers](https://developers.facebook.com/)
- **META_AD_ACCOUNT_ID** from Meta Ads Manager (format: `act_1234567890`)
- **DATABASE_URL** is pre-configured for local Docker setup

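Before starting the grabber, it can help to confirm that the values listed above are present and that the ad account ID has the `act_` prefix. The following is a minimal sketch, assuming the variables have already been loaded into a mapping such as `os.environ`; `check_meta_env` is a hypothetical helper for illustration and not part of this project.

```python
import re

# Variable names come from the list above; treating all four as required is an assumption.
REQUIRED_KEYS = ("META_APP_ID", "META_APP_SECRET", "META_AD_ACCOUNT_ID", "DATABASE_URL")


def check_meta_env(env):
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    problems = [f"{key} is missing or empty" for key in REQUIRED_KEYS if not env.get(key)]
    account_id = env.get("META_AD_ACCOUNT_ID", "")
    # The documented format is `act_` followed by digits, e.g. act_1234567890.
    if account_id and not re.fullmatch(r"act_\d+", account_id):
        problems.append("META_AD_ACCOUNT_ID should look like act_1234567890")
    return problems
```

Calling `check_meta_env(os.environ)` after loading `.env` would return an empty list when everything is set.
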
### 4. Get Long-Lived Access Token

**OAuth2 Flow (Recommended - Gets 60-day token)**
```bash
uv run python src/meta_api_grabber/auth.py
```

This will:
1. Open OAuth2 authorization in your browser
2. Exchange the code for a short-lived token
3. **Automatically exchange for a long-lived token (60 days)**
4. Save token to `.env`
5. Save token metadata to `.meta_token.json` (for auto-refresh)

**Manual Token (Not Recommended)**
- Get a token from [Graph API Explorer](https://developers.facebook.com/tools/explorer/)
- Add it to `.env` as `META_ACCESS_TOKEN`
- Note: Manual tokens won't have auto-refresh capability

### 5. Start Scheduled Collection

```bash
uv run python src/meta_api_grabber/scheduled_grabber.py
```

This will:
- **Automatically refresh tokens** before they expire (checks every cycle)
- Collect data every 2 hours using the `today` date preset (recommended by Meta)
- Cache metadata (accounts, campaigns, ad sets) twice daily
- Store time-series data in TimescaleDB
- Use upsert strategy to handle updates

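The timing rules in the list above can be expressed as two small functions. This is only a sketch under stated assumptions: "twice daily" is read as "every 12 hours", and the names `metadata_refresh_due` and `next_collection` are hypothetical, not taken from `scheduled_grabber.py`.

```python
from datetime import datetime, timedelta, timezone

COLLECTION_INTERVAL = timedelta(hours=2)   # "every 2 hours"
METADATA_INTERVAL = timedelta(hours=12)    # "twice daily", assumed to mean every 12 hours


def metadata_refresh_due(last_refresh, now):
    """True when metadata was never cached or the cache is at least 12 hours old."""
    return last_refresh is None or now - last_refresh >= METADATA_INTERVAL


def next_collection(after):
    """Start time of the cycle that follows a cycle started at `after`."""
    return after + COLLECTION_INTERVAL
```

A scheduler loop built on these would check `metadata_refresh_due` at the start of each cycle and then sleep until `next_collection(...)`.
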
## Usage Modes

### 1. Scheduled Collection (Recommended for Dashboards)

```bash
uv run python src/meta_api_grabber/scheduled_grabber.py
```

- Runs continuously, collecting data every 2 hours
- Stores data in TimescaleDB for dashboard visualization
- Uses `today` date preset (recommended by Meta)
- Caches metadata to reduce API calls

### 2. One-Time Data Export (JSON)

```bash
uv run python src/meta_api_grabber/insights_grabber.py
```

- Fetches insights for the last 7 days
- Saves to `data/meta_insights_TIMESTAMP.json`
- Good for ad-hoc analysis or testing

### 3. OAuth2 Authentication

```bash
uv run python src/meta_api_grabber/auth.py
```

- Interactive flow to get long-lived token (60 days)
- Saves token to `.env` and metadata to `.meta_token.json`

### 4. Check Token Status

```bash
uv run python src/meta_api_grabber/token_manager.py
```

- Shows token expiry and validity
- Manually refresh if needed

## Data Collected

### Account Level

- Impressions, clicks, spend
- CPC, CPM, CTR
- Reach, frequency
- Actions and cost per action

### Campaign Level (top 10)

- Campaign name and ID
- Impressions, clicks, spend
- CTR, CPC

### Ad Set Level (top 10)

- Ad set name and ID
- Impressions, clicks, spend
- CTR, CPM

## Database Schema

### Time-Series Tables (Hypertables)

- **account_insights** - Account-level metrics over time
- **campaign_insights** - Campaign-level metrics over time
- **adset_insights** - Ad set level metrics over time

### Metadata Tables (Cached)

- **ad_accounts** - Account metadata
- **campaigns** - Campaign metadata
- **adsets** - Ad set metadata

### Continuous Aggregates

- **account_insights_hourly** - Hourly rollups
- **account_insights_daily** - Daily rollups

### Features

- **Automatic partitioning** by day (chunk_time_interval = 1 day)
- **Compression** for data older than 7 days
- **Indexes** on account_id, campaign_id, adset_id + time
- **Upsert strategy** to handle duplicate/updated data

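An upsert in PostgreSQL (and therefore TimescaleDB) is usually written as `INSERT ... ON CONFLICT ... DO UPDATE`. The sketch below builds such a statement; it is an illustration, not the project's actual query. The helper `build_upsert_sql`, the column names, and the `$1`-style placeholders are assumptions, and the statement only works if a unique index exists on the conflict columns (on a hypertable that index must include the time column).

```python
def build_upsert_sql(table, key_columns, value_columns):
    """Build an INSERT ... ON CONFLICT statement that overwrites value columns on conflict."""
    columns = [*key_columns, *value_columns]
    # Positional placeholders ($1, $2, ...); other drivers use %s instead.
    placeholders = ", ".join(f"${i}" for i in range(1, len(columns) + 1))
    updates = ", ".join(f"{col} = EXCLUDED.{col}" for col in value_columns)
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT ({', '.join(key_columns)}) DO UPDATE SET {updates}"
    )


# Hypothetical column names, chosen only for the example.
sql = build_upsert_sql("account_insights", ["time", "account_id"], ["impressions", "spend"])
```

Re-collecting the same time bucket then updates the existing row instead of inserting a duplicate.
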
## Dashboard Setup

### Access Grafana

1. Open http://localhost:3000
2. Log in with `admin` / `admin`
3. Add TimescaleDB as data source:
   - Type: PostgreSQL
   - Host: `timescaledb:5432`
   - Database: `meta_insights`
   - User: `meta_user`
   - Password: `meta_password`
   - TLS/SSL Mode: disable

### Example Queries

**Latest Account Metrics:**
```sql
SELECT * FROM latest_account_metrics WHERE account_id = 'act_your_id';
```

**Campaign Performance (Last 24h):**
```sql
SELECT * FROM campaign_performance_24h ORDER BY total_spend DESC;
```

**Hourly Trend:**
```sql
SELECT bucket, avg_impressions, avg_clicks, avg_spend
FROM account_insights_hourly
WHERE account_id = 'act_your_id'
  AND bucket >= NOW() - INTERVAL '7 days'
ORDER BY bucket;
```

## Rate Limiting & Backoff

The system implements Meta's best practices for rate limiting:

### Intelligent Rate Limiting

- **Monitors `x-fb-ads-insights-throttle` header** from every API response
- Tracks both app-level and account-level usage percentages
- **Auto-throttles** when usage exceeds 75%
- **Progressive delays** based on usage (75%: 2x, 85%: 3x, 90%: 5x, 95%: 10x)

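The progressive delay table above can be sketched as a lookup on the parsed header. This is an illustration rather than the project's code: `delay_multiplier` is a hypothetical name, the JSON keys `app_id_util_pct` and `acc_id_util_pct` are assumed to match the header format Meta documents, and treating each threshold as inclusive is also an assumption.

```python
import json

# (usage threshold in %, delay multiplier), checked from highest to lowest.
DELAY_STEPS = ((95, 10), (90, 5), (85, 3), (75, 2))


def delay_multiplier(throttle_header):
    """Map the x-fb-ads-insights-throttle header value to a multiplier for the base delay."""
    if not throttle_header:
        return 1  # header absent: keep the base delay
    usage = json.loads(throttle_header)
    # Throttle on whichever of app-level or account-level usage is higher.
    pct = max(usage.get("app_id_util_pct", 0), usage.get("acc_id_util_pct", 0))
    for threshold, multiplier in DELAY_STEPS:
        if pct >= threshold:
            return multiplier
    return 1
```

With the 2-second base delay, a header reporting 91% usage would stretch the wait to 10 seconds.
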
### Exponential Backoff

- **Automatic retries** on rate limit errors (up to 5 attempts)
- **Exponential backoff**: 2s → 4s → 8s → 16s → 32s
- Max backoff: 5 minutes
- Recognizes Meta error codes 17 and 80004

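The retry schedule above could look roughly like the following. It is a sketch under assumptions: `MetaApiError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and "up to 5 attempts" is interpreted here as up to five retries, one after each delay in the schedule.

```python
import asyncio

RATE_LIMIT_CODES = {17, 80004}  # Meta error codes treated as rate limiting


class MetaApiError(Exception):
    """Stand-in for whatever exception the API client raises; carries Meta's error code."""

    def __init__(self, code):
        super().__init__(f"Meta API error {code}")
        self.code = code


def backoff_delays(max_retries=5, base=2.0, cap=300.0):
    """Delays in seconds: base doubled per retry, capped at 5 minutes."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]


async def call_with_backoff(request, max_retries=5, sleep=asyncio.sleep):
    """Await request(); on a rate-limit error, wait and retry up to max_retries times."""
    for delay in backoff_delays(max_retries):
        try:
            return await request()
        except MetaApiError as exc:
            if exc.code not in RATE_LIMIT_CODES:
                raise  # not a rate-limit problem: do not retry
            await sleep(delay)
    return await request()  # last try; any error propagates to the caller
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.
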
### Conservative Defaults

- **2 seconds base delay** between API requests
- **1 concurrent request** at a time
- **Top 50 campaigns/adsets** per collection
- **2 hour intervals** between scheduled collections

### Best Practices Applied

Based on [Meta's official recommendations](https://developers.facebook.com/docs/marketing-api/insights/best-practices/):
- ✅ Monitor rate limit headers
- ✅ Pace queries with wait times
- ✅ Implement backoff when approaching limits
- ✅ Use date presets (e.g., `today`) instead of custom ranges
- ✅ Limit query scope and metrics

## Token Management

### Automatic Token Refresh

The system automatically manages token lifecycle:

**Token Types:**
- **Short-lived tokens**: Valid for 1-2 hours (obtained from OAuth)
- **Long-lived tokens**: Valid for 60 days (automatically exchanged)

**Auto-Refresh Logic:**
1. OAuth flow automatically exchanges for 60-day token
2. Token metadata saved to `.meta_token.json` (includes expiry)
3. Scheduled grabber checks token before each cycle
4. Auto-refreshes when < 7 days remaining
5. New token saved and API reinitialized seamlessly

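The check in step 4 amounts to comparing the remaining lifetime against a 7-day threshold. The sketch below assumes the expiry has already been read from `.meta_token.json` into a timezone-aware `datetime`; `needs_refresh` is a hypothetical name and the metadata file's exact keys are not shown here.

```python
from datetime import datetime, timedelta, timezone

REFRESH_THRESHOLD = timedelta(days=7)  # refresh when fewer than 7 days remain


def needs_refresh(expires_at, now=None):
    """True when the token expires in less than 7 days (or has already expired)."""
    now = now or datetime.now(timezone.utc)
    return expires_at - now < REFRESH_THRESHOLD


# A 60-day token crosses the threshold after day 53 of its lifetime.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=60)
day_52 = needs_refresh(expires, now=issued + timedelta(days=52))
day_54 = needs_refresh(expires, now=issued + timedelta(days=54))
```

In this example `day_52` is false and `day_54` is true, which matches a refresh roughly every 53 days.
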
**Files Created:**
- `.env` - Contains `META_ACCESS_TOKEN` (updated on refresh)
- `.meta_token.json` - Token metadata (expiry, issued_at, etc.)
- Both files are gitignored for security

**Manual Token Operations:**

Check token status:
```bash
uv run python src/meta_api_grabber/token_manager.py
```

Re-authenticate (if token expires):
```bash
uv run python src/meta_api_grabber/auth.py
```

**Long-Running Collection:**
The scheduled grabber runs indefinitely without manual intervention. Token refresh happens automatically every ~53 days (7 days before the 60-day expiry).