Mostly ready for first test run but there is one improvement I want to implement first

This commit is contained in:
Jonas Linter
2025-10-21 17:46:27 +02:00
parent 6e4cc7ed1d
commit ec10ca51e0
8 changed files with 1612 additions and 28 deletions

TIMESTAMP_LOGIC.md Normal file

@@ -0,0 +1,302 @@
# Timestamp Logic for Meta Insights Data
## Overview
The system assigns each data point a timestamp derived from its `date_preset` and the ad account's timezone. This keeps day-by-day plots accurate even though Meta reports data in each account's own timezone.
## Key Concepts
### Meta's Timezone Behavior
The Meta API reports data based on the **ad account's timezone**:
- "today" = today in the account's timezone
- "yesterday" = yesterday in the account's timezone
- An account in `America/Los_Angeles` (PST/PDT) will have different "today" dates than an account in `Europe/London` (GMT/BST)
### The Timestamp Challenge
When storing time-series data, we need timestamps that:
1. Reflect the actual date of the data (not when we fetched it)
2. Account for the ad account's timezone
3. Allow for accurate day-by-day plotting
4. Use current time for "today" (live, constantly updating data)
5. Use historical timestamps for past data (fixed point in time)
## Implementation
### The `_compute_timestamp()` Method
Located in [scheduled_grabber.py](src/meta_api_grabber/scheduled_grabber.py), this method computes the appropriate timestamp for each data point:
```python
def _compute_timestamp(
    self,
    date_preset: str,
    date_start_str: Optional[str],
    account_timezone: str
) -> datetime:
    """
    Compute the appropriate timestamp for storing insights data.

    For 'today': Use current time (data is live, constantly updating)
    For historical presets: Use noon of that date in the account's timezone,
    then convert to UTC for storage
    """
```
### Logic Flow
#### For "today" Data:
```
date_preset = "today"
Use datetime.now(timezone.utc)
Store with current timestamp
Each fetch during the day gets its own timestamp and is stored as a new row
(no ON CONFLICT collision, because the time values differ)
```
**Why**: Today's data changes throughout the day. Using the current time ensures we can see when data was last updated.
#### For Historical Data (e.g., "yesterday"):
```
date_preset = "yesterday"
date_start = "2025-10-20"
account_timezone = "America/Los_Angeles"
Create datetime: 2025-10-20 12:00:00 PDT
Convert to UTC: 2025-10-20 19:00:00 UTC (PDT is UTC-7; daylight saving time is still in effect on this date)
Store with this timestamp
Data point will plot on the correct day
```
**Why**: Historical data is fixed. Using noon in the account's timezone ensures:
1. The timestamp falls on the correct calendar day
2. Timezone differences don't cause data to appear on wrong days
3. Consistent time (noon) for all historical data points
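The two branches above can be sketched as a standalone function. This is a minimal sketch, not the code in `scheduled_grabber.py`: the function name `compute_timestamp`, the injectable `now` parameter, and the choice to use the current time when `date_start_str` is missing are assumptions made for illustration.

```python
from datetime import datetime, time, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def compute_timestamp(
    date_preset: str,
    date_start_str: Optional[str],
    account_timezone: str,
    now: Optional[datetime] = None,
) -> datetime:
    """Return the UTC timestamp under which an insights row would be stored."""
    # `now` is injectable only to make the sketch easy to test (assumption).
    current = now or datetime.now(timezone.utc)

    # Live data, or no reported date (assumption): use the fetch time.
    if date_preset == "today" or not date_start_str:
        return current

    # Historical data: noon on the reported date in the account's timezone.
    day = datetime.strptime(date_start_str, "%Y-%m-%d").date()
    local_noon = datetime.combine(day, time(12, 0), tzinfo=ZoneInfo(account_timezone))

    # Convert to UTC for storage.
    return local_noon.astimezone(timezone.utc)


print(compute_timestamp("yesterday", "2025-10-20", "America/Los_Angeles"))
# 2025-10-20 19:00:00+00:00
```

Because the UTC offset is looked up by `ZoneInfo` for the specific date, daylight saving transitions are handled without hardcoding offsets.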
### Timezone Handling
Account timezones are:
1. **Cached during metadata collection** in the `ad_accounts` table
2. **Retrieved from database** using `_get_account_timezone()`
3. **Cached in memory** to avoid repeated database queries
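The lookup order described above (memory first, then database) could look roughly like the following. The class name `AccountTimezoneCache`, the `load_from_db` callable, and the `"UTC"` default for accounts without a stored timezone are hypothetical; the real `_get_account_timezone()` may be structured differently.

```python
from typing import Callable, Dict, Optional


class AccountTimezoneCache:
    """In-memory cache in front of a (hypothetical) database lookup."""

    def __init__(
        self,
        load_from_db: Callable[[str], Optional[str]],
        default: str = "UTC",
    ) -> None:
        self._load_from_db = load_from_db
        self._default = default
        self._cache: Dict[str, str] = {}

    def get(self, account_id: str) -> str:
        # Query the database only the first time an account is seen.
        if account_id not in self._cache:
            tz_name = self._load_from_db(account_id)
            # Assumption: accounts with no stored timezone are treated as UTC.
            self._cache[account_id] = tz_name or self._default
        return self._cache[account_id]
```

In a test, `load_from_db` can be a plain function backed by a dict, which makes it easy to check that repeated calls for the same account hit the database only once.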
Example timezone conversion:
```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Account in Los Angeles (PST/PDT = UTC-8/UTC-7)
date_start = "2025-10-20"  # Yesterday in account timezone
account_tz = ZoneInfo("America/Los_Angeles")
# Create datetime at noon LA time
timestamp_local = datetime(2025, 10, 20, 12, 0, 0, tzinfo=account_tz)
# Result: 2025-10-20 12:00:00-07:00 (PDT)
# Convert to UTC for storage
timestamp_utc = timestamp_local.astimezone(timezone.utc)
# Result: 2025-10-20 19:00:00+00:00 (UTC)
```
## Examples
### Example 1: Same Account, Multiple Days
**Ad Account**: `act_123` in `America/New_York` (EDT = UTC-4 on this date)
**Scenario**:
- Fetch "yesterday" data on Oct 21, 2025
- `date_start` from API: `"2025-10-20"`
**Timestamp Calculation**:
```
2025-10-20 12:00:00 EDT (noon in NY)
↓ convert to UTC
2025-10-20 16:00:00 UTC (stored in database)
```
**Result**: Data plots on October 20 when the timestamp is read in UTC or in the account's timezone
### Example 2: Different Timezones
**Account A**: `America/Los_Angeles` (PDT = UTC-7)
**Account B**: `Europe/London` (BST = UTC+1)
Both fetch "yesterday" on Oct 21, 2025:
| Account | date_start | Local Time | UTC Stored |
|---------|-----------|------------|------------|
| A (LA) | 2025-10-20 | 12:00 PDT | 19:00 UTC |
| B (London) | 2025-10-20 | 12:00 BST | 11:00 UTC |
**Result**: Both plot on October 20, even though stored at different UTC times
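The UTC values in the table can be reproduced with the standard library. The helper name `noon_utc` is made up for this sketch. The January rows are added only to show that the offset depends on the date, since both zones are on standard time then.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def noon_utc(date_start: str, tz_name: str) -> datetime:
    """Noon on `date_start` in `tz_name`, expressed in UTC."""
    local_noon = datetime.fromisoformat(date_start).replace(
        hour=12, tzinfo=ZoneInfo(tz_name)
    )
    return local_noon.astimezone(timezone.utc)


for date_start in ("2025-10-20", "2025-01-20"):
    for tz_name in ("America/Los_Angeles", "Europe/London"):
        print(date_start, tz_name, noon_utc(date_start, tz_name).strftime("%H:%M UTC"))
# 2025-10-20 America/Los_Angeles 19:00 UTC
# 2025-10-20 Europe/London 11:00 UTC
# 2025-01-20 America/Los_Angeles 20:00 UTC
# 2025-01-20 Europe/London 12:00 UTC
```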
### Example 3: "Today" Data Updates
**Account**: Any timezone
**Fetches**: Every 2 hours
| Fetch Time (UTC) | date_preset | date_start | Stored Timestamp |
|-----------------|-------------|------------|------------------|
| 08:00 UTC | "today" | 2025-10-21 | 08:00 UTC (current) |
| 10:00 UTC | "today" | 2025-10-21 | 10:00 UTC (current) |
| 12:00 UTC | "today" | 2025-10-21 | 12:00 UTC (current) |
**Result**: Latest data always has the most recent timestamp, showing when it was fetched
## Database Schema Implications
### Primary Key Constraint
All insights tables use:
```sql
PRIMARY KEY (time, account_id) -- or (time, campaign_id), etc.
```
With `ON CONFLICT DO UPDATE`:
```sql
INSERT INTO account_insights (time, account_id, ...)
VALUES (...)
ON CONFLICT (time, account_id)
DO UPDATE SET
impressions = EXCLUDED.impressions,
spend = EXCLUDED.spend,
...
```
### Behavior by Date Preset
**"today" data**:
- Multiple fetches in same day have different timestamps
- No conflicts (different `time` values)
- Creates multiple rows, building time-series
- Can see data evolution throughout the day
**"yesterday" data**:
- All fetches use same timestamp (noon in account TZ)
- Conflicts occur (same `time` value)
- Updates existing row with fresh data
- Only keeps latest version
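The difference between the two presets can be reproduced in a few lines. This sketch uses an in-memory SQLite database as a stand-in for the real one (an assumption; the production database and column types may differ), with a reduced `account_insights` table and a made-up account ID `act_1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_insights ("
    " time TEXT NOT NULL, account_id TEXT NOT NULL, spend REAL,"
    " PRIMARY KEY (time, account_id))"
)

UPSERT = (
    "INSERT INTO account_insights (time, account_id, spend) VALUES (?, ?, ?) "
    "ON CONFLICT (time, account_id) DO UPDATE SET spend = excluded.spend"
)

# "today": each fetch carries its own fetch-time timestamp, so rows accumulate.
conn.execute(UPSERT, ("2025-10-21T08:00:00+00:00", "act_1", 10.0))
conn.execute(UPSERT, ("2025-10-21T10:00:00+00:00", "act_1", 14.5))

# "yesterday": every fetch maps to the same noon timestamp, so the row is updated.
conn.execute(UPSERT, ("2025-10-20T19:00:00+00:00", "act_1", 40.0))
conn.execute(UPSERT, ("2025-10-20T19:00:00+00:00", "act_1", 42.0))

rows = conn.execute(
    "SELECT time, spend FROM account_insights ORDER BY time"
).fetchall()
for row in rows:
    print(row)
# ('2025-10-20T19:00:00+00:00', 42.0)
# ('2025-10-21T08:00:00+00:00', 10.0)
# ('2025-10-21T10:00:00+00:00', 14.5)
```

Four inserts leave three rows: two for "today" and a single, refreshed row for "yesterday".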
## Querying Data
### Query by Day (Recommended)
```sql
-- Get all data for a specific date range
SELECT
DATE(time AT TIME ZONE 'America/Los_Angeles') as data_date,
account_id,
AVG(spend) as avg_spend,
MAX(impressions) as max_impressions
FROM account_insights
WHERE time >= '2025-10-15' AND time < '2025-10-22'
GROUP BY data_date, account_id
ORDER BY data_date DESC;
```
### Filter by Date Preset
```sql
-- Get only historical (yesterday) data
SELECT * FROM account_insights
WHERE date_preset = 'yesterday'
ORDER BY time DESC;
-- Get only live (today) data
SELECT * FROM account_insights
WHERE date_preset = 'today'
ORDER BY time DESC;
```
## Plotting Considerations
When creating day-by-day plots:
### Option 1: Use `date_start` Field
```sql
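SELECT
    date_start, -- Already a DATE type
    SUM(spend) as total_spend
FROM account_insights
WHERE date_preset = 'yesterday' -- one row per account and day; "today" rows repeat within a day and would be double-counted
GROUP BY date_start
ORDER BY date_start;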
```
### Option 2: Extract Date from Timestamp
```sql
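SELECT
    DATE(time) as data_date, -- Convert timestamp to date
    SUM(spend) as total_spend
FROM account_insights
WHERE date_preset = 'yesterday' -- one row per account and day; "today" rows repeat within a day and would be double-counted
GROUP BY data_date
ORDER BY data_date;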
```
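One caveat when extracting the date from `time`: the stored value is in UTC, and for accounts more than 12 hours ahead of UTC, noon local time falls on the previous UTC day. The sketch below illustrates this with `Pacific/Auckland`, chosen purely as an example, and shows the conversion back to the account's local date. The helper name `account_local_date` is hypothetical.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo


def account_local_date(stored_utc: datetime, account_timezone: str) -> date:
    """Calendar date of a stored UTC timestamp in the account's timezone."""
    return stored_utc.astimezone(ZoneInfo(account_timezone)).date()


# Noon on 2025-10-20 in Auckland (UTC+13 on that date) is 23:00 UTC the day before.
stored = datetime(
    2025, 10, 20, 12, tzinfo=ZoneInfo("Pacific/Auckland")
).astimezone(timezone.utc)

print(stored.date())                                   # 2025-10-19
print(account_local_date(stored, "Pacific/Auckland"))  # 2025-10-20
```

For such accounts, grouping by `date_start` or by `time AT TIME ZONE <account timezone>` is the safer choice.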
### For "Today" Data (Multiple Points Per Day)
```sql
-- Get latest "today" data for each account
SELECT DISTINCT ON (account_id)
account_id,
time,
spend,
impressions
FROM account_insights
WHERE date_preset = 'today'
ORDER BY account_id, time DESC;
```
## Benefits
1. **Accurate Day Assignment**: Historical data always plots on correct calendar day
2. **Timezone Aware**: Respects Meta's timezone-based reporting
3. **Live Updates**: "Today" data shows progression throughout the day
4. **Historical Accuracy**: Yesterday data uses consistent timestamp
5. **Update Tracking**: Can see when "today" data was last fetched
6. **Query Flexibility**: Can query by date_start or extract date from time
## Troubleshooting
### Data Appears on Wrong Day
**Symptom**: Yesterday's data shows on wrong day in graphs
**Cause**: Timezone not being considered
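**Solution**: `_compute_timestamp()` already anchors historical data to noon in the account's timezone. If data still lands on the wrong day, check that the query groups by `date_start` or converts `time` to the account's timezone before extracting the date.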
### Multiple Entries for Yesterday
**Symptom**: Multiple rows for same account and yesterday's date
**Cause**: Database conflict resolution not working
**Check**:
- Primary key includes `time` and `account_id`
- ON CONFLICT clause exists in insert statements
- Timestamp is actually the same (should be: noon in account TZ)
### Timezone Errors
**Symptom**: `ZoneInfo` errors or invalid timezone names
**Cause**: Invalid timezone in database or missing timezone data
**Solution**: If the timezone cannot be parsed, the code logs a warning and falls back to the current UTC time
```python
# Excerpt from _compute_timestamp(); the matching try block is omitted
except Exception as e:
    print(f"Warning: Could not parse timezone '{account_timezone}': {e}")
    return datetime.now(timezone.utc)
```
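To catch bad timezone names before they reach the fallback, a name can be validated up front, for example when it is cached during metadata collection. This is a minimal sketch; the helper name `is_valid_timezone` is hypothetical and not part of the codebase described here.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def is_valid_timezone(name: str) -> bool:
    """Return True if `name` resolves to an IANA timezone on this system."""
    if not name:
        return False
    try:
        ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError, OSError):
        # Unknown key, malformed key, or a key that maps to a non-file path.
        return False
    return True


print(is_valid_timezone("Europe/London"))  # True
print(is_valid_timezone("Not/A_Zone"))     # False
```

Note that `ZoneInfo` depends on the system's timezone database (or the `tzdata` package), so a valid name can still fail on a host where that data is missing.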
## Summary
The timestamp logic ensures:
- ✅ "Today" data uses current time (live updates)
- ✅ Historical data uses noon in account's timezone
- ✅ Timezone conversions handled automatically
- ✅ Data plots correctly day-by-day
- ✅ Account timezone cached for performance
- ✅ Fallback handling for missing/invalid timezones
This provides accurate, timezone-aware time-series data ready for visualization!