Mostly ready for first test run but there is one improvement I want to implement first
302
TIMESTAMP_LOGIC.md
Normal file
@@ -0,0 +1,302 @@
# Timestamp Logic for Meta Insights Data

## Overview

The system assigns each insights row a timestamp based on its `date_preset` and the ad account's timezone. This keeps day-by-day plots accurate even though Meta reports data relative to each account's timezone.

## Key Concepts

### Meta's Timezone Behavior

Meta API reports data based on the **ad account's timezone**:
- "today" = today in the account's timezone
- "yesterday" = yesterday in the account's timezone
- An account in `America/Los_Angeles` (PST/PDT) will have different "today" dates than an account in `Europe/London` (GMT/BST)

### The Timestamp Challenge

When storing time-series data, we need timestamps that:
1. Reflect the actual date of the data (not when we fetched it)
2. Account for the ad account's timezone
3. Allow for accurate day-by-day plotting
4. Use current time for "today" (live, constantly updating data)
5. Use historical timestamps for past data (fixed point in time)

## Implementation

### The `_compute_timestamp()` Method

Located in [scheduled_grabber.py](src/meta_api_grabber/scheduled_grabber.py), this method computes the appropriate timestamp for each data point:

```python
from datetime import datetime
from typing import Optional

def _compute_timestamp(
    self,
    date_preset: str,
    date_start_str: Optional[str],
    account_timezone: str
) -> datetime:
    """
    Compute the appropriate timestamp for storing insights data.

    For 'today': Use current time (data is live, constantly updating)
    For historical presets: Use noon of that date in the account's timezone,
    then convert to UTC for storage
    """
```
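
The docstring above only describes the behavior, so here is a minimal sketch of how such logic could look as a standalone function. It is not the project's actual implementation: the name `compute_timestamp`, the injectable `now` parameter, and the choice to fall back to the current time when `date_start_str` is missing are all assumptions made for illustration.

```python
from datetime import datetime, time, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def compute_timestamp(
    date_preset: str,
    date_start_str: Optional[str],
    account_timezone: str,
    now: Optional[datetime] = None,
) -> datetime:
    """Hypothetical standalone version of the documented timestamp logic."""
    # `now` is injectable for testing; it is not part of the documented signature.
    current = now or datetime.now(timezone.utc)

    # Live data, or no date to anchor to (assumed behavior): use the fetch time.
    if date_preset == "today" or not date_start_str:
        return current

    try:
        tz = ZoneInfo(account_timezone)
        day = datetime.strptime(date_start_str, "%Y-%m-%d").date()
    except Exception:
        # Mirrors the documented fallback: current UTC time on bad input.
        return current

    # Noon on the reporting day in the account's timezone, converted to UTC.
    local_noon = datetime.combine(day, time(12, 0), tzinfo=tz)
    return local_noon.astimezone(timezone.utc)
```

Calling `compute_timestamp("yesterday", "2025-10-20", "America/Los_Angeles")` yields `2025-10-20 19:00:00+00:00`, because Los Angeles is on PDT (UTC-7) on that date.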

### Logic Flow

#### For "today" Data:
```
date_preset = "today"
    ↓
Use datetime.now(timezone.utc)
    ↓
Store with current timestamp
    ↓
Each fetch during the day gets a distinct timestamp
(no ON CONFLICT collision, so each fetch adds a new row)
```

**Why**: Today's data changes throughout the day. Using the current time ensures we can see when data was last updated.

#### For Historical Data (e.g., "yesterday"):
```
date_preset = "yesterday"
date_start = "2025-10-20"
account_timezone = "America/Los_Angeles"
    ↓
Create datetime: 2025-10-20 12:00:00 in PDT
    ↓
Convert to UTC: 2025-10-20 19:00:00 UTC (PDT is UTC-7, in effect on this date)
    ↓
Store with this timestamp
    ↓
Data point will plot on the correct day
```

**Why**: Historical data is fixed. Using noon in the account's timezone ensures:
1. The timestamp falls on the correct calendar day
2. Timezone differences don't cause data to appear on wrong days
3. Consistent time (noon) for all historical data points

### Timezone Handling

Account timezones are:
1. **Cached during metadata collection** in the `ad_accounts` table
2. **Retrieved from database** using `_get_account_timezone()`
3. **Cached in memory** to avoid repeated database queries
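
The lookup-then-cache pattern in the list above could be sketched roughly as follows. The class name `TimezoneCache`, the `load_from_db` callable (standing in for a query against `ad_accounts`), and the `"UTC"` default for unknown accounts are hypothetical; the real `_get_account_timezone()` may be structured differently.

```python
from typing import Callable, Dict, Optional


class TimezoneCache:
    """Hypothetical in-memory cache in front of a database lookup."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        # `load_from_db` stands in for a query against the `ad_accounts` table.
        self._load_from_db = load_from_db
        self._cache: Dict[str, str] = {}

    def get(self, account_id: str) -> str:
        # Serve from memory when possible to avoid repeated queries.
        if account_id in self._cache:
            return self._cache[account_id]
        # Assumed default: treat accounts without a stored timezone as UTC.
        tz_name = self._load_from_db(account_id) or "UTC"
        self._cache[account_id] = tz_name
        return tz_name
```

With this shape, the database is queried at most once per account for the lifetime of the cache object, which is the property the third list item describes.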

Example timezone conversion:
```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Account in Los Angeles (PST/PDT = UTC-8/UTC-7)
date_start = "2025-10-20"  # Yesterday in account timezone
account_tz = ZoneInfo("America/Los_Angeles")

# Create datetime at noon LA time
timestamp_local = datetime(2025, 10, 20, 12, 0, 0, tzinfo=account_tz)
# Result: 2025-10-20 12:00:00-07:00 (PDT)

# Convert to UTC for storage
timestamp_utc = timestamp_local.astimezone(timezone.utc)
# Result: 2025-10-20 19:00:00+00:00 (UTC)
```

## Examples

### Example 1: Same Account, Multiple Days

**Ad Account**: `act_123` in `America/New_York` (EDT = UTC-4 on this date)

**Scenario**:
- Fetch "yesterday" data on Oct 21, 2025
- `date_start` from API: `"2025-10-20"`

**Timestamp Calculation**:
```
2025-10-20 12:00:00 EDT (noon in NY)
    ↓ convert to UTC
2025-10-20 16:00:00 UTC (stored in database)
```

**Result**: Data plots on October 20 in UTC and in the account's timezone. A viewer whose offset pushes 16:00 UTC past midnight (for example, `Asia/Tokyo`) would see October 21 unless the chart groups by `date_start`.

### Example 2: Different Timezones

**Account A**: `America/Los_Angeles` (PDT = UTC-7)
**Account B**: `Europe/London` (BST = UTC+1)

Both fetch "yesterday" on Oct 21, 2025:

| Account | date_start | Local Time | UTC Stored |
|---------|-----------|------------|------------|
| A (LA) | 2025-10-20 | 12:00 PDT | 19:00 UTC |
| B (London) | 2025-10-20 | 12:00 BST | 11:00 UTC |

**Result**: Both plot on October 20, even though stored at different UTC times
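
The stored values in the table can be reproduced with the standard library. This is a small verification sketch; `noon_utc` is a helper name invented here, not a function from the project.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo


def noon_utc(date_str: str, tz_name: str) -> datetime:
    """Noon on `date_str` in `tz_name`, expressed in UTC."""
    day = datetime.strptime(date_str, "%Y-%m-%d").date()
    local_noon = datetime.combine(day, time(12, 0), tzinfo=ZoneInfo(tz_name))
    return local_noon.astimezone(timezone.utc)


for tz_name in ("America/Los_Angeles", "Europe/London"):
    stored = noon_utc("2025-10-20", tz_name)
    print(tz_name, stored.isoformat())
# America/Los_Angeles 2025-10-20T19:00:00+00:00
# Europe/London 2025-10-20T11:00:00+00:00
```

The offsets depend on the date: for a `date_start` in January, when neither zone observes daylight saving time, the same call gives 20:00 UTC and 12:00 UTC.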

### Example 3: "Today" Data Updates

**Account**: Any timezone
**Fetches**: Every 2 hours

| Fetch Time (UTC) | date_preset | date_start | Stored Timestamp |
|-----------------|-------------|------------|------------------|
| 08:00 UTC | "today" | 2025-10-21 | 08:00 UTC (current) |
| 10:00 UTC | "today" | 2025-10-21 | 10:00 UTC (current) |
| 12:00 UTC | "today" | 2025-10-21 | 12:00 UTC (current) |

**Result**: Latest data always has the most recent timestamp, showing when it was fetched

## Database Schema Implications

### Primary Key Constraint

All insights tables use:
```sql
PRIMARY KEY (time, account_id)  -- or (time, campaign_id), etc.
```

With `ON CONFLICT DO UPDATE`:
```sql
INSERT INTO account_insights (time, account_id, ...)
VALUES (...)
ON CONFLICT (time, account_id)
DO UPDATE SET
    impressions = EXCLUDED.impressions,
    spend = EXCLUDED.spend,
    ...
```

### Behavior by Date Preset

**"today" data**:
- Multiple fetches in same day have different timestamps
- No conflicts (different `time` values)
- Creates multiple rows, building time-series
- Can see data evolution throughout the day

**"yesterday" data**:
- All fetches use same timestamp (noon in account TZ)
- Conflicts occur (same `time` value)
- Updates existing row with fresh data
- Only keeps latest version
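
The difference between the two presets can be demonstrated with a toy table. The sketch below uses an in-memory SQLite database purely as a stand-in (the project's actual database is not SQLite, and the real tables have more columns); SQLite accepts the same `ON CONFLICT ... DO UPDATE` form from version 3.24 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account_insights (
        time TEXT NOT NULL,
        account_id TEXT NOT NULL,
        spend REAL,
        PRIMARY KEY (time, account_id)
    )
    """
)

UPSERT = """
    INSERT INTO account_insights (time, account_id, spend)
    VALUES (?, ?, ?)
    ON CONFLICT (time, account_id)
    DO UPDATE SET spend = excluded.spend
"""

# "today": each fetch carries its own timestamp, so rows accumulate.
conn.execute(UPSERT, ("2025-10-21T08:00:00+00:00", "act_123", 10.0))
conn.execute(UPSERT, ("2025-10-21T10:00:00+00:00", "act_123", 14.5))

# "yesterday": every fetch maps to the same noon timestamp, so the row is updated.
conn.execute(UPSERT, ("2025-10-20T19:00:00+00:00", "act_123", 40.0))
conn.execute(UPSERT, ("2025-10-20T19:00:00+00:00", "act_123", 42.0))

rows = conn.execute(
    "SELECT time, spend FROM account_insights ORDER BY time"
).fetchall()
for row in rows:
    print(row)
```

Four inserts leave three rows: two snapshots for October 21 and a single October 20 row holding the latest value (42.0).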

## Querying Data

### Query by Day (Recommended)

```sql
-- Get all data for a specific date range
SELECT
    DATE(time AT TIME ZONE 'America/Los_Angeles') as data_date,
    account_id,
    AVG(spend) as avg_spend,
    MAX(impressions) as max_impressions
FROM account_insights
WHERE time >= '2025-10-15' AND time < '2025-10-22'
GROUP BY data_date, account_id
ORDER BY data_date DESC;
```

### Filter by Date Preset

```sql
-- Get only historical (yesterday) data
SELECT * FROM account_insights
WHERE date_preset = 'yesterday'
ORDER BY time DESC;

-- Get only live (today) data
SELECT * FROM account_insights
WHERE date_preset = 'today'
ORDER BY time DESC;
```

## Plotting Considerations

When creating day-by-day plots:

### Option 1: Use `date_start` Field
```sql
SELECT
    date_start,  -- Already a DATE type
    SUM(spend) as total_spend
FROM account_insights
WHERE date_preset <> 'today'  -- "today" rows are repeated snapshots; summing them would overcount
GROUP BY date_start
ORDER BY date_start;
```

### Option 2: Extract Date from Timestamp
```sql
SELECT
    DATE(time) as data_date,  -- Convert timestamp to date (in the session time zone)
    SUM(spend) as total_spend
FROM account_insights
WHERE date_preset <> 'today'  -- "today" rows are repeated snapshots; summing them would overcount
GROUP BY data_date
ORDER BY data_date;
```

### For "Today" Data (Multiple Points Per Day)

```sql
-- Get latest "today" data for each account
SELECT DISTINCT ON (account_id)
    account_id,
    time,
    spend,
    impressions
FROM account_insights
WHERE date_preset = 'today'
ORDER BY account_id, time DESC;
```

## Benefits

1. **Accurate Day Assignment**: Historical data always plots on correct calendar day
2. **Timezone Aware**: Respects Meta's timezone-based reporting
3. **Live Updates**: "Today" data shows progression throughout the day
4. **Historical Accuracy**: Yesterday data uses consistent timestamp
5. **Update Tracking**: Can see when "today" data was last refreshed
6. **Query Flexibility**: Can query by date_start or extract date from time

## Troubleshooting

### Data Appears on Wrong Day

**Symptom**: Yesterday's data shows on wrong day in graphs
**Cause**: Timezone not being considered
**Solution**: Already handled: `_compute_timestamp()` uses the account timezone

### Multiple Entries for Yesterday

**Symptom**: Multiple rows for same account and yesterday's date
**Cause**: Database conflict resolution not working
**Check**:
- Primary key includes `time` and `account_id`
- ON CONFLICT clause exists in insert statements
- Timestamp is actually the same (should be: noon in account TZ)

### Timezone Errors

**Symptom**: `ZoneInfo` errors or invalid timezone names
**Cause**: Invalid timezone in database or missing timezone data
**Solution**: Code falls back to the current UTC time if the timezone cannot be parsed

```python
except Exception as e:
    print(f"Warning: Could not parse timezone '{account_timezone}': {e}")
    return datetime.now(timezone.utc)
```
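
The excerpt above returns the current time when the timezone cannot be loaded. A possible variation, shown only as a sketch and not taken from the project code, resolves the zone in a separate step and substitutes UTC, so that a historical row would still receive a deterministic noon timestamp. `resolve_timezone` is a hypothetical helper name.

```python
from datetime import timezone, tzinfo
from zoneinfo import ZoneInfo


def resolve_timezone(name: str) -> tzinfo:
    """Return the named zone, or UTC if the name cannot be loaded."""
    try:
        return ZoneInfo(name)
    except Exception as exc:  # broad on purpose, matching the excerpt above
        print(f"Warning: could not load timezone '{name}': {exc}")
        return timezone.utc
```

If every timezone name fails to load, including valid ones, the host may lack a timezone database. On such platforms (Windows in particular), installing the `tzdata` package from PyPI is the usual remedy.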

## Summary

The timestamp logic ensures:
- ✅ "Today" data uses current time (live updates)
- ✅ Historical data uses noon in account's timezone
- ✅ Timezone conversions handled automatically
- ✅ Data plots correctly day-by-day
- ✅ Account timezone cached for performance
- ✅ Fallback handling for missing/invalid timezones

This provides accurate, timezone-aware time-series data ready for visualization!