Updated data collection system

Jonas Linter
2025-10-21 11:55:14 +02:00
parent 0d754846ce
commit 6ba8a0dba2
9 changed files with 1418 additions and 57 deletions

181
README.md

@@ -1,70 +1,94 @@
# Meta API Grabber
Async script to grab ad insights data from Meta's Marketing API with conservative rate limiting.
Async data collection system for Meta's Marketing API with TimescaleDB time-series storage and dashboard support.
## Features
- **OAuth2 Authentication** - Automated token generation flow
- **TimescaleDB Integration** - Optimized time-series database for ad metrics
- **Scheduled Collection** - Periodic data grabbing (every 2 hours recommended)
- **Metadata Caching** - Smart caching of accounts, campaigns, and ad sets
- **Async/await architecture** for efficient API calls
- **Conservative rate limiting** (2s between requests, 1 concurrent request)
- **Multi-level insights** - Account, campaign, and ad set data
- **JSON output** with timestamps
- **Configurable date ranges**
- **Dashboard Ready** - Includes Grafana setup for visualization
- **Continuous Aggregates** - Pre-computed hourly/daily rollups
- **Data Compression** - Automatic compression of older data
## Setup
## Quick Start
1. Install dependencies using uv:
### 1. Install Dependencies
```bash
uv sync
```
2. Configure your Meta API credentials:
### 2. Start TimescaleDB
```bash
docker-compose up -d
```
This starts:
- **TimescaleDB** on port 5432 (PostgreSQL-compatible)
- **Grafana** on port 3000 (for dashboards)
### 3. Configure Credentials
```bash
cp .env.example .env
```
3. Edit `.env` and add your App ID, App Secret, and Ad Account ID:
- Get your App credentials from [Meta for Developers](https://developers.facebook.com/)
- Find your ad account ID in Meta Ads Manager (format: `act_1234567890`)
Edit `.env` and add:
- **META_APP_ID** and **META_APP_SECRET** from [Meta for Developers](https://developers.facebook.com/)
- **META_AD_ACCOUNT_ID** from Meta Ads Manager (format: `act_1234567890`)
- **DATABASE_URL** is pre-configured for local Docker setup
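A quick sanity check can catch a missing credential before any API call is made. The sketch below is illustrative only: the variable names come from the list above, while `missing_settings` and `looks_like_ad_account_id` are hypothetical helpers, not part of this project. It assumes the values from `.env` have already been loaded into the process environment.

```python
import os

# Variable names taken from the README; the helpers are hypothetical.
REQUIRED_KEYS = (
    "META_APP_ID",
    "META_APP_SECRET",
    "META_AD_ACCOUNT_ID",
    "DATABASE_URL",
)


def missing_settings(env=None):
    """Return the required keys that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def looks_like_ad_account_id(value):
    """Check the `act_<digits>` shape shown in the README example."""
    return value.startswith("act_") and value[4:].isdigit()
```

Calling `missing_settings()` with no argument inspects `os.environ`; passing a plain dict makes the check easy to test. `META_ACCESS_TOKEN` is left out on purpose because it is only obtained in the next step.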
4. Get an access token (choose one method):
### 4. Get Access Token
**Option A: OAuth2 Flow (Recommended)**
```bash
uv run python src/meta_api_grabber/auth.py
```
This will:
- Generate an authorization URL
- Walk you through the OAuth2 flow
- Offer to save the access token to `.env` automatically
**Option B: Manual Token**
- Get a token from [Graph API Explorer](https://developers.facebook.com/tools/explorer/)
- Add it manually to `.env` as `META_ACCESS_TOKEN`
## Usage
Run the insights grabber:
```bash
uv run python src/meta_api_grabber/insights_grabber.py
```
This will:
- Fetch insights for the last 7 days
- Grab account-level, campaign-level (top 10), and ad set-level (top 10) data
- Save results to `data/meta_insights_TIMESTAMP.json`
### Authentication Scripts
**Get OAuth2 Access Token:**
**Option A: OAuth2 Flow (Recommended)**
```bash
uv run python src/meta_api_grabber/auth.py
```
Follow the prompts to authorize and save your token.
**Grab Insights Data:**
**Option B: Manual Token**
- Get a token from [Graph API Explorer](https://developers.facebook.com/tools/explorer/)
- Add it to `.env` as `META_ACCESS_TOKEN`
### 5. Start Scheduled Collection
```bash
uv run python src/meta_api_grabber/scheduled_grabber.py
```
This will:
- Collect data every 2 hours using the `today` date preset (recommended by Meta)
- Cache metadata (accounts, campaigns, ad sets) twice daily
- Store time-series data in TimescaleDB
- Use an upsert strategy to handle updates
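The collection cycle described above could be driven by a small `asyncio` loop. This is a minimal sketch under stated assumptions, not the code in `scheduled_grabber.py`: `run_periodically` and its `max_runs` parameter are hypothetical, and `job` stands in for whatever coroutine fetches the insights and writes them to the database.

```python
import asyncio


async def run_periodically(job, interval_seconds, max_runs=None):
    """Await `job()` repeatedly, sleeping `interval_seconds` between runs.

    `max_runs` exists only for demos and tests; None means run until cancelled.
    Returns the number of runs attempted.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await job()
        except Exception as exc:
            # A failed run is reported and skipped so the schedule keeps going.
            print(f"collection failed: {exc!r}")
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
    return runs
```

With a two-hour interval this would be started as `asyncio.run(run_periodically(collect_once, 2 * 60 * 60))`, where `collect_once` is a placeholder name for the real collection coroutine. Catching exceptions inside the loop is a design choice: one failed API call should not stop later collections.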
## Usage Modes
### 1. Scheduled Collection (Recommended for Dashboards)
```bash
uv run python src/meta_api_grabber/scheduled_grabber.py
```
- Runs continuously, collecting data every 2 hours
- Stores data in TimescaleDB for dashboard visualization
- Uses `today` date preset (recommended by Meta)
- Caches metadata to reduce API calls
### 2. One-Time Data Export (JSON)
```bash
uv run python src/meta_api_grabber/insights_grabber.py
```
- Fetches insights for the last 7 days
- Saves to `data/meta_insights_TIMESTAMP.json`
- Good for ad-hoc analysis or testing
### 3. OAuth2 Authentication
```bash
uv run python src/meta_api_grabber/auth.py
```
- Interactive flow to get an access token
- Offers to save the token to `.env` automatically
## Data Collected
@@ -84,23 +108,68 @@ uv run python src/meta_api_grabber/insights_grabber.py
- Impressions, clicks, spend
- CTR, CPM
## Database Schema
### Time-Series Tables (Hypertables)
- **account_insights** - Account-level metrics over time
- **campaign_insights** - Campaign-level metrics over time
- **adset_insights** - Ad set-level metrics over time
### Metadata Tables (Cached)
- **ad_accounts** - Account metadata
- **campaigns** - Campaign metadata
- **adsets** - Ad set metadata
### Continuous Aggregates
- **account_insights_hourly** - Hourly rollups
- **account_insights_daily** - Daily rollups
### Features
- **Automatic partitioning** by day (chunk_time_interval = 1 day)
- **Compression** for data older than 7 days
- **Indexes** on account_id, campaign_id, adset_id + time
- **Upsert strategy** to handle duplicate/updated data
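In PostgreSQL an upsert is usually written as `INSERT ... ON CONFLICT ... DO UPDATE`. The sketch below shows one way such a statement could be assembled. It is an assumption about the approach, not the project's actual query: `build_upsert_sql` is a hypothetical helper, and the column names in the usage note (`time`, `account_id`, `impressions`, `clicks`, `spend`) are illustrative rather than taken from the schema file.

```python
def build_upsert_sql(table, key_columns, value_columns):
    """Build an INSERT ... ON CONFLICT DO UPDATE statement.

    Uses PostgreSQL-style numbered placeholders ($1, $2, ...). Identifiers are
    interpolated directly, so they must come from trusted constants and never
    from user input.
    """
    columns = list(key_columns) + list(value_columns)
    placeholders = ", ".join(f"${i}" for i in range(1, len(columns) + 1))
    updates = ", ".join(f"{col} = EXCLUDED.{col}" for col in value_columns)
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT ({', '.join(key_columns)}) "
        f"DO UPDATE SET {updates}"
    )
```

For example, `build_upsert_sql("account_insights", ["time", "account_id"], ["impressions", "clicks", "spend"])` yields a statement that overwrites the metric columns when a row for the same time and account already exists. `ON CONFLICT` needs a unique index on the key columns, and on a TimescaleDB hypertable that index must include the time partitioning column.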
## Dashboard Setup
### Access Grafana
1. Open http://localhost:3000
2. Login with `admin` / `admin`
3. Add TimescaleDB as data source:
- Type: PostgreSQL
- Host: `timescaledb:5432`
- Database: `meta_insights`
- User: `meta_user`
- Password: `meta_password`
- TLS/SSL Mode: disable
### Example Queries
**Latest Account Metrics:**
```sql
SELECT * FROM latest_account_metrics WHERE account_id = 'act_your_id';
```
**Campaign Performance (Last 24h):**
```sql
SELECT * FROM campaign_performance_24h ORDER BY total_spend DESC;
```
**Hourly Trend:**
```sql
SELECT bucket, avg_impressions, avg_clicks, avg_spend
FROM account_insights_hourly
WHERE account_id = 'act_your_id'
AND bucket >= NOW() - INTERVAL '7 days'
ORDER BY bucket;
```
## Rate Limiting
The script is configured to be very conservative:
- 2 seconds delay between API requests
- Only 1 concurrent request at a time
- Limited to top 10 campaigns and ad sets
The system is configured to be very conservative:
- **2-second delay** between API requests
- **Only 1 concurrent request** at a time
- **Top 50 campaigns/ad sets** per collection
- **2-hour intervals** between collections
You can adjust these settings in the `MetaInsightsGrabber` class if needed.
## Output
Data is saved to `data/meta_insights_TIMESTAMP.json` with the following structure:
```json
{
"account": { ... },
"campaigns": { ... },
"ad_sets": { ... },
"summary": { ... }
}
```
This is intended to keep you well within Meta's API rate limits.
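The two request-level limits (one request at a time, a fixed pause between requests) can be combined in a single helper. The following is a rough sketch of the idea, not the logic inside `MetaInsightsGrabber`: the `SerialRateLimiter` class is hypothetical, and it measures the pause from the moment the previous request finished, which is an assumption about how the delay is defined.

```python
import asyncio
import time


class SerialRateLimiter:
    """Run requests one at a time with a minimum pause between them."""

    def __init__(self, delay_seconds=2.0):
        self._delay = delay_seconds
        self._lock = asyncio.Lock()  # serializes requests: 1 in flight
        self._last_finished = None   # monotonic time the last request ended

    async def run(self, make_request):
        """Await `make_request()` once the lock and the pause allow it."""
        async with self._lock:
            if self._last_finished is not None:
                elapsed = time.monotonic() - self._last_finished
                remaining = self._delay - elapsed
                if remaining > 0:
                    await asyncio.sleep(remaining)
            try:
                return await make_request()
            finally:
                # Record the finish time even if the request raised.
                self._last_finished = time.monotonic()
```

Each API call would then be wrapped as `await limiter.run(lambda: fetch(...))`, where `fetch` is a placeholder for the real request coroutine. Creating the limiter inside the running event loop avoids loop-binding issues on older Python versions.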