Email Leads Extraction and Import
This document describes the lead extraction and CSV import functionality for the Alpine Bits Python application.
Overview
The system now extracts lead information from email MBOX files and imports the structured data into the application. The CSV importer accepts both the original landing page form CSV format and the new email lead export format.
Lead Extraction (extract_leads.py)
Purpose
Extracts structured lead information from email MBOX files (such as Google Takeout exports) and writes it to CSV and JSON files.
Usage
```shell
python extract_leads.py
```
Input Format
MBOX files containing emails with structured lead data in the following format:
```
Name: Martina
Nachname: Contarin
Mail: martinacontarin.mc@gmail.com
Tel: 3473907005
Anreise: 30.12.2025
Abreise: 04.01.2026
Erwachsene: 2
Kinder: 3
Alter Kind 1: 3
Alter Kind 2: 6
Alter Kind 3: 10
Apartment: Peonia
Verpflegung: Halbpension
```
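A minimal sketch of how the "Key: Value" lines above could be pulled out of an MBOX file with Python's standard mailbox module. The helpers parse_lead_body and iter_mbox_leads are hypothetical and not taken from extract_leads.py; the sketch assumes each lead sits in the first text/plain part of a message.

```python
import mailbox
import re

# Matches "Key: Value" lines such as "Anreise: 30.12.2025" or "Alter Kind 1: 3".
# The key stops at the first colon, so values like "12:30" stay intact.
FIELD_RE = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.*?)\s*$")


def parse_lead_body(body: str) -> dict[str, str]:
    """Collect non-empty 'Key: Value' pairs from a plain-text email body."""
    fields: dict[str, str] = {}
    for line in body.splitlines():
        match = FIELD_RE.match(line)
        if match and match.group(2):
            fields[match.group(1)] = match.group(2)
    return fields


def iter_mbox_leads(path: str):
    """Yield one parsed field dict per message that contains lead fields."""
    for message in mailbox.mbox(path):
        # Hypothetical rule: only the first text/plain part is inspected.
        for part in message.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True) or b""
                charset = part.get_content_charset() or "utf-8"
                fields = parse_lead_body(payload.decode(charset, errors="replace"))
                if fields:
                    yield fields
                break
```

Lines without a colon (greetings, signatures) are ignored, so free text around the structured block does not end up in the output.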
Output Formats
CSV Export (leads_export.csv)
Tabular format with the following columns:
- name: First name
- lastname: Last name
- mail: Email address
- tel: Phone number
- anreise: Check-in date (DD.MM.YYYY)
- abreise: Check-out date (DD.MM.YYYY)
- erwachsene: Number of adults
- kinder: Number of children
- kind_ages: Child ages as a comma-separated string (e.g., "3,6,10")
- apartments: Comma-separated apartment preferences
- verpflegung: Meal plan preference
- sprache: Language
- device: Device information
- anrede: Salutation/title
- land: Country
- privacy: Privacy consent (Yes/No)
JSON Export (leads_export.json)
Same data in JSON format for programmatic access.
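The sketch below shows one way parsed email fields could be turned into rows with the columns listed above and written to both leads_export.csv and leads_export.json. The label-to-column mapping is an assumption based on the sample email; labels for sprache, device, anrede, land and privacy are not shown in this document, so the sketch leaves those columns empty. to_row and export_leads are hypothetical names.

```python
import csv
import json
import re

# Column order as listed for leads_export.csv.
COLUMNS = [
    "name", "lastname", "mail", "tel", "anreise", "abreise",
    "erwachsene", "kinder", "kind_ages", "apartments", "verpflegung",
    "sprache", "device", "anrede", "land", "privacy",
]

# Assumed mapping from the email labels in the sample to CSV columns.
LABEL_TO_COLUMN = {
    "Name": "name", "Nachname": "lastname", "Mail": "mail", "Tel": "tel",
    "Anreise": "anreise", "Abreise": "abreise", "Erwachsene": "erwachsene",
    "Kinder": "kinder", "Apartment": "apartments", "Verpflegung": "verpflegung",
}

CHILD_AGE_RE = re.compile(r"^Alter Kind (\d+)$")


def to_row(fields: dict[str, str]) -> dict[str, str]:
    """Map one parsed email to a CSV row; unknown columns stay empty."""
    row = {column: "" for column in COLUMNS}
    ages: list[tuple[int, str]] = []
    for label, value in fields.items():
        if label in LABEL_TO_COLUMN:
            row[LABEL_TO_COLUMN[label]] = value
        elif match := CHILD_AGE_RE.match(label):
            ages.append((int(match.group(1)), value))
    # Join child ages in "Alter Kind N" order, e.g. "3,6,10".
    row["kind_ages"] = ",".join(age for _, age in sorted(ages))
    return row


def export_leads(leads, csv_path="leads_export.csv", json_path="leads_export.json"):
    """Write the same rows to a CSV file and a JSON file."""
    rows = [to_row(fields) for fields in leads]
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    with open(json_path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps characters such as "ä" readable in the file.
        json.dump(rows, fh, ensure_ascii=False, indent=2)
    return rows
```

Writing both files from the same list of row dicts keeps the CSV and JSON exports identical in content.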
CSV Import Integration
Enhanced CSV Importer
The CSVImporter class in csv_import.py now supports both:
- German Landing Page Form Format (original)
  - Column names in German (Zeit der Einreichung, Anreisedatum, etc.)
  - Child ages in individual columns (Alter Kind 1, Alter Kind 2, etc.)
- English Email Lead Export Format (new)
  - Lowercase column names (name, lastname, anreise, abreise, etc.)
  - Child ages as a comma-separated string in the kind_ages column
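How CSVImporter actually tells the two formats apart is not documented here. A plausible approach is to inspect the header row; the sketch below assumes that the German marker columns and the lowercase column names listed above are enough to decide. detect_csv_format, detect_file_format and the returned labels are hypothetical.

```python
import csv

# Marker columns taken from the two formats described above (assumption:
# the real importer may use a different or stricter rule).
GERMAN_MARKERS = {"Zeit der Einreichung", "Anreisedatum"}
EMAIL_LEAD_MARKERS = {"name", "lastname", "anreise", "abreise"}


def detect_csv_format(header: list[str]) -> str:
    """Classify a CSV header as landing-page or email-lead format."""
    columns = {column.strip() for column in header}
    if GERMAN_MARKERS & columns:
        return "german_landing_page"
    if EMAIL_LEAD_MARKERS <= columns:
        return "email_leads"
    raise ValueError(f"Unrecognised CSV header: {sorted(columns)}")


def detect_file_format(path: str) -> str:
    # utf-8-sig strips a byte-order mark that spreadsheet exports often add.
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return detect_csv_format(next(csv.reader(fh)))
```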
API Endpoint
The existing CSV import endpoint now handles both formats:
PUT /api/admin/import-csv/{hotel_code}/{filename:path}
Example with leads CSV:
```shell
curl -X PUT \
  -u user:pass \
  --data-binary @leads_export.csv \
  http://localhost:8000/api/admin/import-csv/bemelmans/leads.csv
```
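HTTP Basic Authentication sends base64("user:password") in the Authorization header, not the raw credentials, which is what curl's -u option does for you. The sketch below builds the same PUT request with Python's standard urllib; build_import_request is a hypothetical helper, and the base URL, hotel code and credentials are the placeholder values from the curl example.

```python
import base64
import urllib.request


def build_import_request(base_url: str, hotel_code: str, filename: str,
                         csv_bytes: bytes, user: str, password: str):
    """Build a PUT request for the CSV import endpoint with Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/api/admin/import-csv/{hotel_code}/{filename}",
        data=csv_bytes,
        method="PUT",
        headers={"Authorization": f"Basic {token}"},
    )
```

Usage against a running server would look like this:

```python
with open("leads_export.csv", "rb") as fh:
    req = build_import_request("http://localhost:8000", "bemelmans",
                               "leads.csv", fh.read(), "user", "pass")
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```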
Features
- Automatic Format Detection: The importer automatically detects which format is being used
- Child Age Handling: Supports both individual age columns and comma-separated age format
- Duplicate Detection: Uses name, email, dates, and tracking IDs (fbclid/gclid) to prevent duplicates
- Dry-Run Mode: Test imports without committing data
- Pre-Acknowledgement: Optionally pre-acknowledge all imported reservations
- Transaction Safety: Rolls back on any error, maintaining data integrity
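The exact matching rules of the duplicate check are not spelled out above. As an illustration only, the sketch below builds a comparison key from name, email, stay dates and a click ID; the case-insensitive comparison and the "fbclid first, then gclid" choice are assumptions, and duplicate_key and filter_duplicates are hypothetical names.

```python
from datetime import date
from typing import NamedTuple, Optional


class DuplicateKey(NamedTuple):
    first_name: str
    last_name: str
    email: str
    start: date
    end: date
    click_id: Optional[str]


def duplicate_key(first, last, email, start, end, fbclid=None, gclid=None):
    # Assumption: names and email are compared case-insensitively, and
    # whichever tracking ID is present (fbclid, else gclid) is part of the key.
    return DuplicateKey(first.strip().casefold(), last.strip().casefold(),
                        email.strip().casefold(), start, end,
                        fbclid or gclid or None)


def filter_duplicates(rows, seen):
    """Return (new rows, number of skipped duplicates); `seen` is updated."""
    fresh, skipped = [], 0
    for row in rows:
        key = duplicate_key(**row)
        if key in seen:
            skipped += 1
            continue
        seen.add(key)
        fresh.append(row)
    return fresh, skipped
```

In a real import, `seen` would be populated from reservations already in the database, so re-importing the same file yields only skipped duplicates.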
Supported Columns
Required Fields
- name (or Vorname): First name
- lastname (or Nachname): Last name
Date Fields (required)
- anreise (or Anreisedatum): Check-in date
- abreise (or Abreisedatum): Check-out date
Guest Information
- mail (or Email): Email address
- tel (or Phone): Phone number
- erwachsene (or Anzahl Erwachsene): Number of adults
- kinder (or Anzahl Kinder): Number of children
- kind_ages (or the individual Alter Kind 1-10 columns): Child ages
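Because child ages can arrive either as one kind_ages string or spread over Alter Kind 1-10, an importer has to normalise both into a single list. A minimal sketch, assuming kind_ages takes precedence when present and reusing the 0-17 range stated in the notes of this document; parse_child_ages is a hypothetical name.

```python
def parse_child_ages(row: dict) -> list[int]:
    """Return child ages from kind_ages or from Alter Kind 1-10."""
    raw = (row.get("kind_ages") or "").strip()
    if raw:
        # Email lead format: "3,6,10"
        values = [part.strip() for part in raw.split(",") if part.strip()]
    else:
        # Landing page format: one column per child, empty cells ignored
        values = [(row.get(f"Alter Kind {i}") or "").strip() for i in range(1, 11)]
        values = [value for value in values if value]
    ages = [int(value) for value in values]
    for age in ages:
        if not 0 <= age <= 17:
            raise ValueError(f"child age out of range 0-17: {age}")
    return ages
```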
Preferences
- apartments (or Angebot auswählen): Room/apartment preferences
- verpflegung: Meal plan preference
- sprache: Language preference
Metadata
- device: Device information
- anrede: Salutation/title
- land: Country
- privacy: Privacy consent
Tracking (optional)
- utm_Source, utm_Medium, utm_Campaign, utm_Term, utm_Content: UTM parameters
- fbclid: Facebook click ID
- gclid: Google click ID
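The alternative column names listed above can be folded into one canonical set before any further processing. The sketch below builds an alias table from exactly those pairs and rejects rows missing a required field; COLUMN_ALIASES, normalize_row and read_normalized are hypothetical names, not part of csv_import.py.

```python
import csv
import io

# Aliases taken from the "(or ...)" names in the column lists above.
COLUMN_ALIASES = {
    "Vorname": "name", "Nachname": "lastname",
    "Anreisedatum": "anreise", "Abreisedatum": "abreise",
    "Email": "mail", "Phone": "tel",
    "Anzahl Erwachsene": "erwachsene", "Anzahl Kinder": "kinder",
    "Angebot auswählen": "apartments",
}
REQUIRED = ("name", "lastname", "anreise", "abreise")


def normalize_row(row: dict) -> dict:
    """Rename aliased columns and check that required fields are present."""
    out = {}
    for key, value in row.items():
        if key is None:  # extra unnamed cells from ragged CSV rows
            continue
        out[COLUMN_ALIASES.get(key.strip(), key.strip())] = (value or "").strip()
    missing = [column for column in REQUIRED if not out.get(column)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return out


def read_normalized(text: str) -> list[dict]:
    return [normalize_row(row) for row in csv.DictReader(io.StringIO(text))]
```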
Import Examples
Python:
```python
import asyncio

from src.alpine_bits_python.csv_import import CSVImporter
from src.alpine_bits_python.db import AsyncSession


async def main(config):
    async with AsyncSession() as session:
        importer = CSVImporter(session, config)

        # Test import (dry-run): nothing is committed
        result = await importer.import_csv_file(
            csv_file_path="leads_export.csv",
            hotel_code="bemelmans",
            dryrun=True,
        )

        # Actual import, pre-acknowledging every created reservation
        stats = await importer.import_csv_file(
            csv_file_path="leads_export.csv",
            hotel_code="bemelmans",
            pre_acknowledge=True,
            client_id="my_client",
            username="hotel_user",
        )
        print(f"Created {stats['created_reservations']} reservations")


if __name__ == "__main__":
    config = ...  # load the application config here (not covered in this document)
    asyncio.run(main(config))
```
Command Line (via API):
```shell
# Copy the CSV to the logs directory (the endpoint expects it there)
cp leads_export.csv /logs/csv_imports/leads.csv
# Import via API
curl -X PUT \
  -u username:password \
  http://localhost:8000/api/admin/import-csv/bemelmans/leads.csv
```
Return Values
The importer returns a statistics dictionary, for example:
```python
{
    'total_rows': 576,
    'skipped_empty': 0,
    'created_customers': 45,
    'existing_customers': 531,
    'created_reservations': 576,
    'skipped_duplicates': 0,
    'pre_acknowledged': 576,
    'errors': []
}
```
Data Flow
```
Email MBOX Files
      ↓
extract_leads.py
      ↓
leads_export.csv / leads_export.json
      ↓
CSV Import API
      ↓
CSVImporter.import_csv_file()
      ↓
Database (Customers & Reservations)
```
Notes
- Dates can be in the formats YYYY-MM-DD, DD.MM.YYYY, or DD/MM/YYYY
- Child ages are validated to be between 0 and 17 years old
- If the number of children doesn't match the number of ages provided, the system will attempt to match them
- All imports are wrapped in database transactions for safety
- The API endpoint requires HTTP Basic Authentication
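The three accepted date formats can be tried in turn with datetime.strptime. A minimal sketch; parse_date and parse_stay are hypothetical helpers, and the check that check-out falls after check-in is an added assumption, not a rule stated above.

```python
from datetime import date, datetime

# The three formats listed in the notes above, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def parse_date(value: str) -> date:
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unsupported date format: {value!r}")


def parse_stay(anreise: str, abreise: str) -> tuple[date, date]:
    start, end = parse_date(anreise), parse_date(abreise)
    # Assumption: reject stays where check-out is not after check-in.
    if end <= start:
        raise ValueError("check-out must be after check-in")
    return start, end
```

Since the separators differ (-, . and /), the formats cannot be confused with one another; only the day-first order of the last two has to be known in advance.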