Jonas Linter c5fa92c4ec Fixed stuff
2025-11-10 13:07:08 +01:00

Meta API Grabber

Async data collection system for Meta's Marketing API with TimescaleDB time-series storage and dashboard support.

docker build . -t gitea.99tales.net/jonas/meta_grabber:latest

Features

  • OAuth2 Authentication - Automated token generation flow
  • TimescaleDB Integration - Optimized time-series database for ad metrics
  • Scheduled Collection - Periodic data grabbing (every 2 hours recommended)
  • Metadata Caching - Smart caching of accounts, campaigns, and ad sets
  • Async Architecture - async/await for efficient API calls
  • Rate Limiting - Conservative defaults (2s between requests, 1 concurrent request)
  • Multi-Level Insights - Account, campaign, and ad set data
  • Dashboard Ready - Includes Grafana setup for visualization
  • Continuous Aggregates - Pre-computed hourly/daily rollups
  • Data Compression - Automatic compression of older data

Quick Start

1. Install Dependencies

uv sync

2. Start TimescaleDB

docker-compose up -d

This starts:

  • TimescaleDB on port 5432 (PostgreSQL-compatible)
  • Grafana on port 3000 (for dashboards)

3. Configure Credentials

cp .env.example .env

Edit .env and add:

  • META_APP_ID and META_APP_SECRET from Meta for Developers
  • META_AD_ACCOUNT_ID from Meta Ads Manager (format: act_1234567890)
  • DATABASE_URL is pre-configured for local Docker setup
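
A quick sanity check of the values in .env can catch typos before the first API call. The following is a minimal sketch, not part of the project: validate_settings is a hypothetical helper, and it assumes the variables have already been loaded into a mapping such as os.environ.

```python
import re

# Variable names taken from the list above.
REQUIRED_VARS = ("META_APP_ID", "META_APP_SECRET", "META_AD_ACCOUNT_ID", "DATABASE_URL")
ACCOUNT_ID_PATTERN = re.compile(r"^act_\d+$")


def validate_settings(env):
    """Return a list of human-readable problems found in the given mapping."""
    problems = [f"{name} is missing" for name in REQUIRED_VARS if not env.get(name)]
    account_id = env.get("META_AD_ACCOUNT_ID", "")
    if account_id and not ACCOUNT_ID_PATTERN.match(account_id):
        problems.append("META_AD_ACCOUNT_ID should look like act_1234567890")
    return problems
```

Calling validate_settings(os.environ) after loading .env returns an empty list when everything looks plausible.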

4. Get Long-Lived Access Token

OAuth2 Flow (Recommended - Gets 60-day token)

uv run python src/meta_api_grabber/auth.py

This will:

  1. Open OAuth2 authorization in your browser
  2. Exchange the code for a short-lived token
  3. Automatically exchange for a long-lived token (60 days)
  4. Save token to .env
  5. Save token metadata to .meta_token.json (for auto-refresh)
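
Step 3 relies on Meta's token exchange endpoint with grant_type=fb_exchange_token. The sketch below only builds the request URL and makes no network call. The function name and the pinned Graph API version are assumptions for illustration; auth.py may construct the request differently.

```python
from urllib.parse import urlencode

GRAPH_API_VERSION = "v21.0"  # assumption: use whichever version the app targets


def long_lived_token_url(app_id, app_secret, short_lived_token):
    """Build the URL that exchanges a short-lived token for a long-lived one."""
    params = {
        "grant_type": "fb_exchange_token",
        "client_id": app_id,
        "client_secret": app_secret,
        "fb_exchange_token": short_lived_token,
    }
    base = f"https://graph.facebook.com/{GRAPH_API_VERSION}/oauth/access_token"
    return f"{base}?{urlencode(params)}"
```

The URL embeds the app secret, so it should never be logged or shared.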

Manual Token (Not Recommended)

  • Get a token from Graph API Explorer
  • Add it to .env as META_ACCESS_TOKEN
  • Note: Manual tokens won't have auto-refresh capability

5. Start Scheduled Collection

uv run python src/meta_api_grabber/scheduled_grabber.py

This will:

  • Automatically refresh tokens before they expire (checks every cycle)
  • Collect data every 2 hours using the 'today' date preset (recommended by Meta)
  • Cache metadata (accounts, campaigns, ad sets) twice daily
  • Store time-series data in TimescaleDB
  • Use upsert strategy to handle updates
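
The collection cycle described above can be pictured as a simple asyncio loop. This is a sketch under stated assumptions: run_scheduled and its max_cycles parameter are hypothetical and exist only to make the loop easy to test; scheduled_grabber.py may be structured differently.

```python
import asyncio


async def run_scheduled(collect, interval_seconds=2 * 60 * 60, max_cycles=None):
    """Call `collect` repeatedly, sleeping `interval_seconds` between cycles."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            await collect()
        except Exception as exc:  # keep the loop alive after a failed cycle
            print(f"collection cycle failed: {exc}")
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            await asyncio.sleep(interval_seconds)
    return cycles
```

Catching exceptions per cycle means one failed API call does not stop a long-running collector.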

Usage Modes

1. Scheduled Collection (Recommended)

uv run python src/meta_api_grabber/scheduled_grabber.py
  • Runs continuously, collecting data every 2 hours
  • Stores data in TimescaleDB for dashboard visualization
  • Uses the 'today' date preset (recommended by Meta)
  • Caches metadata to reduce API calls

2. One-Time Data Export (JSON)

uv run python src/meta_api_grabber/insights_grabber.py
  • Fetches insights for the last 7 days
  • Saves to data/meta_insights_TIMESTAMP.json
  • Good for ad-hoc analysis or testing

3. OAuth2 Authentication

uv run python src/meta_api_grabber/auth.py
  • Interactive flow to get long-lived token (60 days)
  • Saves token to .env and metadata to .meta_token.json

4. Check Token Status

uv run python src/meta_api_grabber/token_manager.py
  • Shows token expiry and validity
  • Lets you refresh the token manually if needed

Data Collected

Account Level

  • Impressions, clicks, spend
  • CPC, CPM, CTR
  • Reach, frequency
  • Actions and cost per action

Campaign Level (top 10)

  • Campaign name and ID
  • Impressions, clicks, spend
  • CTR, CPC

Ad Set Level (top 10)

  • Ad set name and ID
  • Impressions, clicks, spend
  • CTR, CPM
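
The ratio metrics are returned by the API, but it helps to know how they relate to the raw counts. The sketch below uses the standard advertising definitions; derived_metrics is a hypothetical helper, and Meta's own figures may differ slightly depending on which click types are counted.

```python
def derived_metrics(impressions, clicks, spend, reach):
    """Compute the usual ratio metrics from raw counts, guarding against zero."""
    return {
        # Click-through rate as a percentage of impressions
        "ctr": 100.0 * clicks / impressions if impressions else 0.0,
        # Cost per click
        "cpc": spend / clicks if clicks else 0.0,
        # Cost per 1,000 impressions
        "cpm": 1000.0 * spend / impressions if impressions else 0.0,
        # Average number of times each reached person saw an ad
        "frequency": impressions / reach if reach else 0.0,
    }
```

For example, 10,000 impressions, 200 clicks, 50.00 spend, and a reach of 4,000 give a CTR of 2%, a CPC of 0.25, a CPM of 5.00, and a frequency of 2.5.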

Database Schema

Time-Series Tables (Hypertables)

  • account_insights - Account-level metrics over time
  • campaign_insights - Campaign-level metrics over time
  • adset_insights - Ad-set-level metrics over time

Metadata Tables (Cached)

  • ad_accounts - Account metadata
  • campaigns - Campaign metadata
  • adsets - Ad set metadata

Continuous Aggregates

  • account_insights_hourly - Hourly rollups
  • account_insights_daily - Daily rollups

Schema Features

  • Automatic partitioning by day (chunk_time_interval = 1 day)
  • Compression for data older than 7 days
  • Indexes on account_id, campaign_id, adset_id + time
  • Upsert strategy to handle duplicate/updated data
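
The upsert strategy can be illustrated with an INSERT ... ON CONFLICT statement. The sketch below uses an in-memory SQLite database as a stand-in for TimescaleDB so that it runs without a server; PostgreSQL accepts the same clause, but with %s or $1-style placeholders depending on the driver. The column names and the (time, account_id) key are assumptions, not the project's actual schema.

```python
import sqlite3

# In-memory stand-in for the account_insights hypertable (columns are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account_insights (
        time TEXT NOT NULL,
        account_id TEXT NOT NULL,
        impressions INTEGER,
        spend REAL,
        PRIMARY KEY (time, account_id)
    )
    """
)

UPSERT_SQL = """
INSERT INTO account_insights (time, account_id, impressions, spend)
VALUES (?, ?, ?, ?)
ON CONFLICT (time, account_id) DO UPDATE SET
    impressions = excluded.impressions,
    spend = excluded.spend
"""


def upsert_insight(time, account_id, impressions, spend):
    """Insert a row, or overwrite the metrics if the key already exists."""
    conn.execute(UPSERT_SQL, (time, account_id, impressions, spend))
    conn.commit()
```

Writing the same (time, account_id) pair twice leaves a single row holding the most recent values, which is what makes repeated collection of the same day's data safe.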

Dashboard Setup

Access Grafana

  1. Open http://localhost:3000
  2. Login with admin / admin
  3. Add TimescaleDB as data source:
    • Type: PostgreSQL
    • Host: timescaledb:5432
    • Database: meta_insights
    • User: meta_user
    • Password: meta_password
    • TLS/SSL Mode: disable

Example Queries

Latest Account Metrics:

SELECT * FROM latest_account_metrics WHERE account_id = 'act_your_id';

Campaign Performance (Last 24h):

SELECT * FROM campaign_performance_24h ORDER BY total_spend DESC;

Hourly Trend:

SELECT bucket, avg_impressions, avg_clicks, avg_spend
FROM account_insights_hourly
WHERE account_id = 'act_your_id'
  AND bucket >= NOW() - INTERVAL '7 days'
ORDER BY bucket;

Rate Limiting & Backoff

The system implements Meta's best practices for rate limiting:

Intelligent Rate Limiting

  • Monitors x-fb-ads-insights-throttle header from every API response
  • Tracks both app-level and account-level usage percentages
  • Auto-throttles when usage exceeds 75%
  • Progressive delay multipliers based on usage (75%: 2x, 85%: 3x, 90%: 5x, 95%: 10x)
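
The usage thresholds above map naturally onto a small lookup. This is a sketch, not the project's code: throttle_multiplier is a hypothetical name, the header is assumed to be a JSON object with app_id_util_pct and acc_id_util_pct fields, and treating a threshold as reached at "greater than or equal" is also an assumption.

```python
import json

# (usage threshold in percent, delay multiplier), highest threshold first
THROTTLE_STEPS = ((95, 10.0), (90, 5.0), (85, 3.0), (75, 2.0))


def throttle_multiplier(header_value):
    """Map the x-fb-ads-insights-throttle header to a delay multiplier."""
    if not header_value:
        return 1.0
    try:
        usage = json.loads(header_value)
    except ValueError:
        return 1.0  # unparseable header: fall back to the base delay
    if not isinstance(usage, dict):
        return 1.0
    # Use whichever of app-level and account-level usage is higher.
    pct = max(usage.get("app_id_util_pct", 0), usage.get("acc_id_util_pct", 0))
    for threshold, multiplier in THROTTLE_STEPS:
        if pct >= threshold:
            return multiplier
    return 1.0
```

Multiplying the 2-second base delay by the returned value yields waits of 4s, 6s, 10s, and 20s at the four thresholds.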

Exponential Backoff

  • Automatic retries on rate limit errors (up to 5 attempts)
  • Exponential backoff: 2s → 4s → 8s → 16s → 32s
  • Max backoff: 5 minutes
  • Recognizes Meta error codes 17 and 80004
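
The retry schedule can be sketched as follows. MetaApiError and call_with_backoff are hypothetical names used for illustration, and "up to 5 attempts" is read here as up to five retries, matching the five delays listed above; the project's implementation may count differently.

```python
import asyncio

RATE_LIMIT_CODES = {17, 80004}  # Meta error codes treated as rate limiting


class MetaApiError(Exception):
    """Hypothetical error type carrying Meta's numeric error code."""

    def __init__(self, code, message=""):
        super().__init__(message or f"Meta API error {code}")
        self.code = code


def backoff_delay(attempt, base=2.0, cap=300.0):
    """Delay in seconds before retry number `attempt` (1-based), capped at 5 minutes."""
    return min(base * 2 ** (attempt - 1), cap)


async def call_with_backoff(request, max_retries=5, sleep=asyncio.sleep):
    """Await `request()`, retrying on rate-limit errors with exponential backoff."""
    attempt = 0
    while True:
        try:
            return await request()
        except MetaApiError as exc:
            attempt += 1
            # Re-raise anything that is not a rate limit, or once retries run out.
            if exc.code not in RATE_LIMIT_CODES or attempt > max_retries:
                raise
            await sleep(backoff_delay(attempt))
```

Passing sleep as a parameter keeps the retry logic testable without real waiting.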

Conservative Defaults

  • 2 seconds base delay between API requests
  • 1 concurrent request at a time
  • Top 50 campaigns/adsets per collection
  • 2 hour intervals between scheduled collections
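
One way to enforce "1 concurrent request" plus a fixed pause is a semaphore that is held through the delay. The class below is a hypothetical sketch, not the project's implementation.

```python
import asyncio


class PacedClient:
    """Runs requests one at a time with a fixed pause after each one."""

    def __init__(self, max_concurrent=1, delay_seconds=2.0):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._delay = delay_seconds

    async def run(self, request):
        async with self._semaphore:
            result = await request()
            # Pause while still holding the slot so the next request waits too.
            await asyncio.sleep(self._delay)
            return result
```

Create the client inside the running event loop (for example at the top of the main coroutine) so the semaphore is bound to the right loop on older Python versions.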

Best Practices Applied

Based on Meta's official recommendations:

  • Monitor rate limit headers
  • Pace queries with wait times
  • Implement backoff when approaching limits
  • Use date presets (e.g., 'today') instead of custom ranges
  • Limit query scope and metrics

Token Management

Automatic Token Refresh

The system automatically manages token lifecycle:

Token Types:

  • Short-lived tokens: Valid for 1-2 hours (obtained from OAuth)
  • Long-lived tokens: Valid for 60 days (automatically exchanged)

Auto-Refresh Logic:

  1. OAuth flow automatically exchanges for 60-day token
  2. Token metadata saved to .meta_token.json (includes expiry)
  3. Scheduled grabber checks token before each cycle
  4. Auto-refreshes when < 7 days remaining
  5. New token saved and API reinitialized seamlessly
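
The refresh decision in step 4 reduces to a date comparison. In this sketch, needs_refresh and load_expiry are hypothetical helpers, and the expires_at key (Unix seconds) is an assumption about the layout of .meta_token.json.

```python
import json
from datetime import datetime, timedelta, timezone

REFRESH_THRESHOLD = timedelta(days=7)


def needs_refresh(expires_at, now=None):
    """Return True when fewer than 7 days remain before the token expires."""
    now = now or datetime.now(timezone.utc)
    return expires_at - now < REFRESH_THRESHOLD


def load_expiry(path=".meta_token.json"):
    """Read the expiry from the metadata file (the key name is an assumption)."""
    with open(path, encoding="utf-8") as handle:
        metadata = json.load(handle)
    return datetime.fromtimestamp(metadata["expires_at"], tz=timezone.utc)
```

With a 60-day token, needs_refresh first returns True shortly after day 53, which is where the ~53-day refresh cadence comes from.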

Files Created:

  • .env - Contains META_ACCESS_TOKEN (updated on refresh)
  • .meta_token.json - Token metadata (expiry, issued_at, etc.)
  • Both files are gitignored for security

Manual Token Operations:

Check token status:

uv run python src/meta_api_grabber/token_manager.py

Re-authenticate (if token expires):

uv run python src/meta_api_grabber/auth.py

Long-Running Collection: The scheduled grabber runs indefinitely without manual intervention. Token refresh happens automatically every ~53 days (7 days before the 60-day expiry).
