alpinebits_python/LEADS_EXTRACTION.md
2025-11-19 09:55:54 +01:00

# Email Leads Extraction and Import
This document describes the lead extraction and CSV import functionality for the Alpine Bits Python application.
## Overview
The system extracts lead information from email MBOX files and imports the resulting structured data into the application. The importer accepts both the original landing page form CSV format and the new email lead export format.
## Lead Extraction (`extract_leads.py`)
### Purpose
Extracts structured lead information from email MBOX files (such as Google Takeout exports) and writes it to CSV and JSON files.
### Usage
```bash
python extract_leads.py
```
### Input Format
MBOX files containing emails with structured lead data in the following format:
```
Name: Martina
Nachname: Contarin
Mail: martinacontarin.mc@gmail.com
Tel: 3473907005
Anreise: 30.12.2025
Abreise: 04.01.2026
Erwachsene: 2
Kinder: 3
Alter Kind 1: 3
Alter Kind 2: 6
Alter Kind 3: 10
Apartment: Peonia
Verpflegung: Halbpension
```
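The `Label: value` layout above lends itself to line-by-line parsing. The following is a minimal sketch, not the actual code in `extract_leads.py`: the helper name `parse_lead_body` and the label-to-column mapping are assumptions inferred from the sample email and the CSV columns documented below.

```python
import re

# One "Label: value" line; the label is everything before the first colon.
FIELD_RE = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.*?)\s*$")

# Assumed mapping from email labels (lowercased) to CSV column names.
LABEL_TO_COLUMN = {
    "name": "name",
    "nachname": "lastname",
    "mail": "mail",
    "tel": "tel",
    "anreise": "anreise",
    "abreise": "abreise",
    "erwachsene": "erwachsene",
    "kinder": "kinder",
    "apartment": "apartments",
    "verpflegung": "verpflegung",
}


def parse_lead_body(body: str) -> dict:
    """Parse 'Label: value' lines into a flat dict (hypothetical helper)."""
    lead = {}
    child_ages = []
    for line in body.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # not a "Label: value" line
        label, value = match.group(1).lower(), match.group(2)
        if label.startswith("alter kind"):
            # "Alter Kind 1", "Alter Kind 2", ... are collected in order
            child_ages.append(value)
        elif label in LABEL_TO_COLUMN:
            lead[LABEL_TO_COLUMN[label]] = value
    # Child ages end up as one comma-separated string, e.g. "3,6,10"
    lead["kind_ages"] = ",".join(child_ages)
    return lead
```

Labels that are not in the mapping are ignored in this sketch; the real extractor may handle them differently.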
### Output Formats
#### CSV Export (`leads_export.csv`)
Tabular format with the following columns:
- `name` - First name
- `lastname` - Last name
- `mail` - Email address
- `tel` - Phone number
- `anreise` - Check-in date (DD.MM.YYYY)
- `abreise` - Check-out date (DD.MM.YYYY)
- `erwachsene` - Number of adults
- `kinder` - Number of children
- `kind_ages` - Child ages as comma-separated string (e.g., "3,6,10")
- `apartments` - Comma-separated apartment preferences
- `verpflegung` - Meal plan preference
- `sprache` - Language
- `device` - Device information
- `anrede` - Salutation/title
- `land` - Country
- `privacy` - Privacy consent (Yes/No)
#### JSON Export (`leads_export.json`)
Same data in JSON format for programmatic access.
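As an illustration of the two export formats, the sketch below writes a list of lead dicts to CSV and JSON text using only the standard library. The function names `leads_to_csv` and `leads_to_json` are hypothetical, and the column order is assumed to match the list above; the actual script may differ.

```python
import csv
import io
import json

# Column order assumed to follow the CSV column list in this document.
COLUMNS = [
    "name", "lastname", "mail", "tel", "anreise", "abreise",
    "erwachsene", "kinder", "kind_ages", "apartments", "verpflegung",
    "sprache", "device", "anrede", "land", "privacy",
]


def leads_to_csv(leads: list) -> str:
    """Render leads as CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(leads)
    return buffer.getvalue()


def leads_to_json(leads: list) -> str:
    """Render the same leads as JSON text, keeping umlauts readable."""
    return json.dumps(leads, ensure_ascii=False, indent=2)
```

Because `kind_ages` and `apartments` contain commas, the `csv` module quotes those cells automatically, so they survive a round trip through any standard CSV reader.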
## CSV Import Integration
### Enhanced CSV Importer
The `CSVImporter` class in `csv_import.py` now supports both:
1. **German Landing Page Form Format** (original)
   - Column names in German (Zeit der Einreichung, Anreisedatum, etc.)
   - Child ages in individual columns (Alter Kind 1, Alter Kind 2, etc.)
2. **English Email Lead Export Format** (new)
   - Short lowercase column names, partly English and partly German (name, lastname, anreise, abreise, etc.)
   - Child ages as comma-separated string in `kind_ages` column
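One plausible way to tell the two formats apart is to inspect the CSV header row. This is a sketch under that assumption; the detection logic inside `CSVImporter` is not shown in this document, and the function name and return labels here are hypothetical.

```python
def detect_csv_format(header: list) -> str:
    """Guess the CSV format from its header row (hypothetical helper)."""
    columns = {name.strip() for name in header}

    # The email lead export uses short lowercase names and `kind_ages`.
    if "kind_ages" in columns or {"anreise", "abreise"} <= columns:
        return "email_lead_export"

    # The landing page form uses long German column names.
    if "Anreisedatum" in columns or "Zeit der Einreichung" in columns:
        return "landing_page_form"

    raise ValueError(f"Unrecognized CSV header: {sorted(columns)}")
```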
### API Endpoint
The existing CSV import endpoint now handles both formats:
```http
PUT /api/admin/import-csv/{hotel_code}/{filename:path}
```
**Example with leads CSV:**
```bash
curl -X PUT \
  -u user:pass \
  --data-binary @leads_export.csv \
  http://localhost:8000/api/admin/import-csv/bemelmans/leads.csv
```
### Features
- **Automatic Format Detection**: The importer automatically detects which format is being used
- **Child Age Handling**: Supports both individual age columns and comma-separated age format
- **Duplicate Detection**: Uses name, email, dates, and tracking IDs (fbclid/gclid) to prevent duplicates
- **Dry-Run Mode**: Test imports without committing data
- **Pre-Acknowledgement**: Optionally pre-acknowledge all imported reservations
- **Transaction Safety**: Rolls back on any error, maintaining data integrity
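To illustrate the duplicate detection idea, the sketch below builds a comparison key from the fields named above and filters repeated rows in memory. It is an assumption about how such a check could work; the real importer presumably compares against existing database records, and the helper names are hypothetical.

```python
def duplicate_key(row: dict) -> tuple:
    """Build a comparison key from name, email, dates, and click IDs."""

    def norm(field: str) -> str:
        # Case-insensitive comparison, ignoring surrounding whitespace
        return (row.get(field) or "").strip().lower()

    def raw(field: str) -> str:
        # Click IDs are case-sensitive tokens, so only trim them
        return (row.get(field) or "").strip()

    return (
        norm("name"), norm("lastname"), norm("mail"),
        norm("anreise"), norm("abreise"),
        raw("fbclid"), raw("gclid"),
    )


def split_duplicates(rows: list) -> tuple:
    """Return (unique_rows, skipped_count) for a batch of rows."""
    seen = set()
    unique = []
    skipped = 0
    for row in rows:
        key = duplicate_key(row)
        if key in seen:
            skipped += 1
            continue
        seen.add(key)
        unique.append(row)
    return unique, skipped
```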
### Supported Columns
#### Required Fields
- `name` (or `Vorname`) - First name
- `lastname` (or `Nachname`) - Last name
#### Date Fields (required)
- `anreise` (or `Anreisedatum`) - Check-in date
- `abreise` (or `Abreisedatum`) - Check-out date
#### Guest Information
- `mail` (or `Email`) - Email address
- `tel` (or `Phone`) - Phone number
- `erwachsene` (or `Anzahl Erwachsene`) - Number of adults
- `kinder` (or `Anzahl Kinder`) - Number of children
- `kind_ages` (or individual columns `Alter Kind 1` through `Alter Kind 10`) - Child ages
#### Preferences
- `apartments` (or `Angebot auswählen`) - Room/apartment preferences
- `verpflegung` - Meal plan preference
- `sprache` - Language preference
#### Metadata
- `device` - Device information
- `anrede` - Salutation/title
- `land` - Country
- `privacy` - Privacy consent
#### Tracking (optional)
- `utm_Source`, `utm_Medium`, `utm_Campaign`, `utm_Term`, `utm_Content` - UTM parameters
- `fbclid` - Facebook click ID
- `gclid` - Google click ID
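The alias pairs listed above can be read as a simple rename table. The following sketch normalizes a landing page row to the short column names and folds the individual `Alter Kind N` columns into `kind_ages`. The table and the helper `normalize_row` are illustrative assumptions built from this list, not the importer's actual code.

```python
import re

# Landing page column name -> short column name, as listed above.
COLUMN_ALIASES = {
    "Vorname": "name",
    "Nachname": "lastname",
    "Anreisedatum": "anreise",
    "Abreisedatum": "abreise",
    "Email": "mail",
    "Phone": "tel",
    "Anzahl Erwachsene": "erwachsene",
    "Anzahl Kinder": "kinder",
    "Angebot auswählen": "apartments",
}

CHILD_AGE_COLUMN = re.compile(r"Alter Kind (\d+)")


def normalize_row(row: dict) -> dict:
    """Rename aliased columns and merge child age columns (hypothetical)."""
    normalized = {}
    indexed_ages = []
    for key, value in row.items():
        key = key.strip()
        value = (value or "").strip()
        age_match = CHILD_AGE_COLUMN.fullmatch(key)
        if age_match:
            if value:  # empty age cells are skipped
                indexed_ages.append((int(age_match.group(1)), value))
            continue
        normalized[COLUMN_ALIASES.get(key, key)] = value
    if indexed_ages and not normalized.get("kind_ages"):
        # Sort by child number so "Alter Kind 10" comes after "Alter Kind 2"
        normalized["kind_ages"] = ",".join(age for _, age in sorted(indexed_ages))
    return normalized
```

Rows that already use the short names pass through unchanged, so the same function can be applied regardless of which format a file uses.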
### Import Examples
**Python:**
```python
from src.alpine_bits_python.csv_import import CSVImporter
from src.alpine_bits_python.db import AsyncSession


async def main(config):
    # `config` is the application configuration passed to CSVImporter.
    # Run with: asyncio.run(main(config))
    async with AsyncSession() as session:
        importer = CSVImporter(session, config)

        # Test import (dry-run)
        result = await importer.import_csv_file(
            csv_file_path="leads_export.csv",
            hotel_code="bemelmans",
            dryrun=True,
        )

        # Actual import
        stats = await importer.import_csv_file(
            csv_file_path="leads_export.csv",
            hotel_code="bemelmans",
            pre_acknowledge=True,
            client_id="my_client",
            username="hotel_user",
        )
        print(f"Created {stats['created_reservations']} reservations")
```
**Command Line (via API):**
```bash
# Copy CSV to logs directory (endpoint expects it there)
cp leads_export.csv /logs/csv_imports/leads.csv
# Import via API
curl -X PUT \
  -u username:password \
  http://localhost:8000/api/admin/import-csv/bemelmans/leads.csv
```
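HTTP Basic Authentication sends the credentials as a base64-encoded `username:password` pair in the `Authorization` header. For callers that prefer Python over `curl`, the sketch below builds the equivalent request with the standard library. The helper name `build_import_request` is hypothetical, and the request is only constructed here, not sent.

```python
import base64
import urllib.request


def build_import_request(
    base_url: str, hotel_code: str, filename: str, username: str, password: str
) -> urllib.request.Request:
    """Build the PUT request for the CSV import endpoint (hypothetical helper)."""
    url = f"{base_url}/api/admin/import-csv/{hotel_code}/{filename}"
    # Basic auth: base64("username:password"), per the HTTP Basic scheme
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would perform the call against a running server.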
### Return Values
The importer returns statistics:
```python
{
    'total_rows': 576,
    'skipped_empty': 0,
    'created_customers': 45,
    'existing_customers': 531,
    'created_reservations': 576,
    'skipped_duplicates': 0,
    'pre_acknowledged': 576,
    'errors': []
}
```
## Data Flow
```
Email MBOX Files
        ↓
extract_leads.py
        ↓
leads_export.csv / leads_export.json
        ↓
CSV Import API
        ↓
CSVImporter.import_csv_file()
        ↓
Database (Customers & Reservations)
```
## Notes
- Dates can be in formats: `YYYY-MM-DD`, `DD.MM.YYYY`, or `DD/MM/YYYY`
- Child ages are validated to be between 0 and 17 years
- If child count doesn't match the number of ages provided, the system will attempt to match them
- All imports are wrapped in database transactions for safety
- The API endpoint requires HTTP Basic Authentication
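The date and age rules in these notes can be sketched with the standard library as follows. This is an illustration only: the helper names are hypothetical, and whether the real importer rejects or drops out-of-range ages is not stated in this document; the sketch raises an error.

```python
from datetime import date, datetime

# The three date formats listed in the notes above.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def parse_date(text: str) -> date:
    """Try each supported format in turn (hypothetical helper)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unsupported date format: {text!r}")


def parse_child_ages(text: str) -> list:
    """Parse a comma-separated `kind_ages` string and check the 0-17 range."""
    ages = [int(part) for part in text.split(",") if part.strip()]
    for age in ages:
        if not 0 <= age <= 17:
            # Assumption: out-of-range ages are treated as errors here
            raise ValueError(f"Child age out of range: {age}")
    return ages
```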