# Email Leads Extraction and Import

This document describes the lead extraction and CSV import functionality for the Alpine Bits Python application.

## Overview

The system supports extracting lead information from email MBOX files and importing the structured data into the application. The importer accepts both the original landing page form CSV format and the new email lead export format.

## Lead Extraction (`extract_leads.py`)

### Purpose

Extracts structured lead information from email MBOX files (such as Google Takeout exports) and writes it to CSV and JSON files.

### Usage

```bash
python extract_leads.py
```

### Input Format

MBOX files containing emails with structured lead data in the following format:

```
Name: Martina
Nachname: Contarin
Mail: martinacontarin.mc@gmail.com
Tel: 3473907005
Anreise: 30.12.2025
Abreise: 04.01.2026
Erwachsene: 2
Kinder: 3
Alter Kind 1: 3
Alter Kind 2: 6
Alter Kind 3: 10
Apartment: Peonia
Verpflegung: Halbpension
```

### Output Formats

#### CSV Export (`leads_export.csv`)

Tabular format with the following columns:

- `name` - First name
- `lastname` - Last name
- `mail` - Email address
- `tel` - Phone number
- `anreise` - Check-in date (DD.MM.YYYY)
- `abreise` - Check-out date (DD.MM.YYYY)
- `erwachsene` - Number of adults
- `kinder` - Number of children
- `kind_ages` - Child ages as a comma-separated string (e.g., "3,6,10")
- `apartments` - Comma-separated apartment preferences
- `verpflegung` - Meal plan preference
- `sprache` - Language
- `device` - Device information
- `anrede` - Salutation/title
- `land` - Country
- `privacy` - Privacy consent (Yes/No)

#### JSON Export (`leads_export.json`)

The same data in JSON format for programmatic access.

## CSV Import Integration

### Enhanced CSV Importer

The `CSVImporter` class in `csv_import.py` supports both formats:

1. **German Landing Page Form Format** (original)
   - Column names in German (Zeit der Einreichung, Anreisedatum, etc.)
   - Child ages in individual columns (Alter Kind 1, Alter Kind 2, etc.)
2. **English Email Lead Export Format** (new)
   - Column names in English (name, lastname, anreise, abreise, etc.)
   - Child ages as a comma-separated string in the `kind_ages` column

### API Endpoint

The existing CSV import endpoint handles both formats:

```http
PUT /api/admin/import-csv/{hotel_code}/{filename:path}
```

**Example with leads CSV:**

```bash
# -u sends HTTP Basic credentials (curl base64-encodes them into the Authorization header)
curl -X PUT \
  -u user:pass \
  --data-binary @leads_export.csv \
  http://localhost:8000/api/admin/import-csv/bemelmans/leads.csv
```

### Features

- **Automatic Format Detection**: The importer detects which of the two formats is being used
- **Child Age Handling**: Supports both individual age columns and the comma-separated age format
- **Duplicate Detection**: Uses name, email, dates, and tracking IDs (fbclid/gclid) to prevent duplicates
- **Dry-Run Mode**: Test imports without committing data
- **Pre-Acknowledgement**: Optionally pre-acknowledge all imported reservations
- **Transaction Safety**: Rolls back on any error, maintaining data integrity

### Supported Columns

#### Required Fields

- `name` (or `Vorname`) - First name
- `lastname` (or `Nachname`) - Last name

#### Date Fields (required)

- `anreise` (or `Anreisedatum`) - Check-in date
- `abreise` (or `Abreisedatum`) - Check-out date

#### Guest Information

- `mail` (or `Email`) - Email address
- `tel` (or `Phone`) - Phone number
- `erwachsene` (or `Anzahl Erwachsene`) - Number of adults
- `kinder` (or `Anzahl Kinder`) - Number of children
- `kind_ages` (or the individual columns `Alter Kind 1` to `Alter Kind 10`) - Child ages

#### Preferences

- `apartments` (or `Angebot auswählen`) - Room/apartment preferences
- `verpflegung` - Meal plan preference
- `sprache` - Language preference

#### Metadata

- `device` - Device information
- `anrede` - Salutation/title
- `land` - Country
- `privacy` - Privacy consent

#### Tracking (optional)

- `utm_Source`, `utm_Medium`, `utm_Campaign`, `utm_Term`, `utm_Content` - UTM parameters
- `fbclid` - Facebook click ID
- `gclid` - Google click ID

### Import Examples

**Python:**

```python
import asyncio

from src.alpine_bits_python.csv_import import CSVImporter
from src.alpine_bits_python.db import AsyncSession


async def main(config):
    # `config` is the application configuration object (loading it is not shown here)
    async with AsyncSession() as session:
        importer = CSVImporter(session, config)

        # Test import (dry-run): nothing is committed
        result = await importer.import_csv_file(
            csv_file_path="leads_export.csv",
            hotel_code="bemelmans",
            dryrun=True,
        )

        # Actual import
        stats = await importer.import_csv_file(
            csv_file_path="leads_export.csv",
            hotel_code="bemelmans",
            pre_acknowledge=True,
            client_id="my_client",
            username="hotel_user",
        )
        print(f"Created {stats['created_reservations']} reservations")


# Pass your loaded application config here
asyncio.run(main(config))
```

**Command Line (via API):**

```bash
# Copy the CSV to the logs directory (the endpoint expects it there)
cp leads_export.csv /logs/csv_imports/leads.csv

# Import via the API (-u sends HTTP Basic credentials)
curl -X PUT \
  -u username:password \
  http://localhost:8000/api/admin/import-csv/bemelmans/leads.csv
```

### Return Values

The importer returns a statistics dictionary, for example:

```python
{
    'total_rows': 576,
    'skipped_empty': 0,
    'created_customers': 45,
    'existing_customers': 531,
    'created_reservations': 576,
    'skipped_duplicates': 0,
    'pre_acknowledged': 576,
    'errors': []
}
```

## Data Flow

```
Email MBOX Files
    ↓
extract_leads.py
    ↓
leads_export.csv / leads_export.json
    ↓
CSV Import API
    ↓
CSVImporter.import_csv_file()
    ↓
Database (Customers & Reservations)
```

## Notes

- Dates may use any of these formats: `YYYY-MM-DD`, `DD.MM.YYYY`, or `DD/MM/YYYY`
- Child ages are validated to be between 0 and 17 years
- If the number of children does not match the number of ages provided, the system attempts to reconcile them
- All imports are wrapped in database transactions for safety
- The API endpoint requires HTTP Basic Authentication
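The date and child-age rules in the notes above can be sketched in plain Python. This is a minimal illustration, not the actual `CSVImporter` code: the helper names `parse_lead_date`, `parse_kind_ages`, and `extract_child_ages` are hypothetical, and treating the 0 to 17 range as inclusive is an assumption.

```python
from datetime import date, datetime

# The three date formats listed in the notes, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")
MAX_CHILD_AGE = 17  # assumption: the 0-17 range is inclusive


def parse_lead_date(value: str) -> date:
    """Parse a check-in/check-out date in any supported format (hypothetical helper)."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized date: {value!r}")


def parse_kind_ages(value: str) -> list[int]:
    """Parse a comma-separated age string such as "3,6,10" and validate each age."""
    ages: list[int] = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty entries, e.g. from unused age columns
        age = int(part)
        if not 0 <= age <= MAX_CHILD_AGE:
            raise ValueError(f"Child age out of range: {age}")
        ages.append(age)
    return ages


def extract_child_ages(row: dict[str, str]) -> list[int]:
    """Read child ages from either CSV format (hypothetical helper)."""
    if row.get("kind_ages"):
        # English email lead export format
        return parse_kind_ages(row["kind_ages"])
    # German landing page format: one column per child, "Alter Kind 1" .. "Alter Kind 10"
    columns = (row.get(f"Alter Kind {i}") or "" for i in range(1, 11))
    return parse_kind_ages(",".join(columns))
```

With this sketch, `2026-01-04`, `04.01.2026`, and `04/01/2026` all parse to `date(2026, 1, 4)`, and both `{"kind_ages": "3,6"}` and `{"Alter Kind 1": "3", "Alter Kind 2": "6"}` yield `[3, 6]`.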
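Automatic format detection and the German/English column aliases from the Supported Columns list could work roughly as follows. This sketch assumes detection by header names; `COLUMN_ALIASES`, `detect_format`, and `read_normalized_rows` are hypothetical names, and the real importer may decide differently.

```python
import csv
import io
from collections.abc import Sequence

# German column names mapped to the English export names, taken from the
# "Supported Columns" list (hypothetical helper, not the CSVImporter code).
COLUMN_ALIASES = {
    "Vorname": "name",
    "Nachname": "lastname",
    "Anreisedatum": "anreise",
    "Abreisedatum": "abreise",
    "Email": "mail",
    "Phone": "tel",
    "Anzahl Erwachsene": "erwachsene",
    "Anzahl Kinder": "kinder",
    "Angebot auswählen": "apartments",
}


def detect_format(fieldnames: Sequence[str]) -> str:
    """Guess the CSV format from its header row (assumed heuristic)."""
    headers = set(fieldnames)
    if "Anreisedatum" in headers or "Vorname" in headers:
        return "landing_page"
    if "anreise" in headers or "kind_ages" in headers:
        return "email_leads"
    raise ValueError("Unrecognized CSV header")


def read_normalized_rows(csv_text: str) -> tuple[str, list[dict[str, str]]]:
    """Detect the format, then rename German columns to their English equivalents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fmt = detect_format(reader.fieldnames or [])
    rows = [
        {COLUMN_ALIASES.get(key, key): value for key, value in row.items()}
        for row in reader
    ]
    return fmt, rows
```

Normalizing headers first lets the rest of an import pipeline work with one set of column names; columns without an alias (such as `Alter Kind 1` or the UTM fields) pass through unchanged.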
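On the extraction side, the `Key: Value` email body shown under Input Format can be read with the standard-library `mailbox` module. This is a sketch, not the code in `extract_leads.py`: the function names are hypothetical, and the mapping from email labels to CSV columns (for example `Apartment` to `apartments`) is inferred from the input and output formats above.

```python
import mailbox
from collections.abc import Iterator

# Assumed mapping from email labels to CSV column names (inferred from the
# documented input/output formats, not taken from extract_leads.py).
FIELD_MAP = {
    "Name": "name",
    "Nachname": "lastname",
    "Mail": "mail",
    "Tel": "tel",
    "Anreise": "anreise",
    "Abreise": "abreise",
    "Erwachsene": "erwachsene",
    "Kinder": "kinder",
    "Apartment": "apartments",
    "Verpflegung": "verpflegung",
}


def parse_lead_body(body: str) -> dict[str, str]:
    """Turn 'Key: Value' lines into a lead dict keyed by CSV column names."""
    lead: dict[str, str] = {}
    ages: list[str] = []
    for line in body.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # not a 'Key: Value' line
        key, value = key.strip(), value.strip()
        if key.startswith("Alter Kind"):
            ages.append(value)  # assumes ages appear in child order
        elif key in FIELD_MAP:
            lead[FIELD_MAP[key]] = value
    if ages:
        lead["kind_ages"] = ",".join(ages)  # e.g. "3,6,10"
    return lead


def iter_plain_text_bodies(mbox_path: str) -> Iterator[str]:
    """Yield the decoded text/plain parts of every message in an MBOX file."""
    for message in mailbox.mbox(mbox_path):
        for part in message.walk():
            if part.get_content_type() != "text/plain":
                continue
            payload = part.get_payload(decode=True)
            if not payload:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in the message headers
                text = payload.decode("utf-8", errors="replace")
            yield text
```

A pipeline would then call `parse_lead_body` on each body from `iter_plain_text_bodies(...)`, keep the non-empty results, and write them out with `csv.DictWriter` and `json.dump`.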