alpinebits_python/LEADS_EXTRACTION.md
2025-11-19 09:55:54 +01:00

Email Leads Extraction and Import

This document describes the lead extraction and CSV import functionality for the Alpine Bits Python application.

Overview

The system can extract lead information from email MBOX files and import the resulting structured data into the application. The CSV importer accepts both the original landing page form CSV format and the new email lead export format.

Lead Extraction (extract_leads.py)

Purpose

Extracts structured lead information from email MBOX files (such as Google Takeout exports) and writes it to CSV and JSON files.
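The reading step can be sketched with Python's standard `mailbox` module. This is a minimal illustration, assuming each lead email carries its fields in an inline text/plain body; `iter_plain_bodies` and `_plain_text` are hypothetical names, not functions taken from `extract_leads.py`.

```python
import mailbox
from email.message import Message
from typing import Iterator, Optional


def _plain_text(part: Message) -> Optional[str]:
    """Decode a single text/plain part, or return None if it has no payload."""
    payload = part.get_payload(decode=True)
    if payload is None:
        return None
    charset = part.get_content_charset() or "utf-8"
    return payload.decode(charset, errors="replace")


def iter_plain_bodies(mbox_path: str) -> Iterator[str]:
    """Yield the first inline text/plain body of each message (hypothetical helper)."""
    box = mailbox.mbox(mbox_path, create=False)  # raises if the file does not exist
    try:
        for message in box:
            # walk() yields the message itself for single-part mails
            for part in message.walk():
                # Skip multipart containers and attachments
                if part.get_content_type() == "text/plain" and not part.get_filename():
                    text = _plain_text(part)
                    if text is not None:
                        yield text
                    break
    finally:
        box.close()
```

Each yielded string is the raw body text, ready to be split into the `Key: Value` fields described under Input Format.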

Usage

python extract_leads.py

Input Format

MBOX files containing emails with structured lead data in the following format:

Name: Martina
Nachname: Contarin
Mail: martinacontarin.mc@gmail.com
Tel: 3473907005
Anreise: 30.12.2025
Abreise: 04.01.2026
Erwachsene: 2
Kinder: 3
Alter Kind 1: 3
Alter Kind 2: 6
Alter Kind 3: 10
Apartment: Peonia
Verpflegung: Halbpension
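The body above is a sequence of `Key: Value` lines. A minimal parsing sketch, assuming one field per line and splitting on the first colon so that values containing colons survive; `parse_lead_body` is a hypothetical name, not the script's actual function.

```python
import re

# "Key: Value" on one line; keys may contain spaces (e.g. "Alter Kind 1")
FIELD_RE = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.*?)\s*$")


def parse_lead_body(body: str) -> dict[str, str]:
    """Parse 'Key: Value' lines from an email body into a dict (hypothetical helper)."""
    fields: dict[str, str] = {}
    for line in body.splitlines():
        match = FIELD_RE.match(line)
        if match:  # lines without a colon (greetings, signatures) are ignored
            key, value = match.groups()
            fields[key] = value
    return fields
```

For the sample, this yields keys such as `"Name"`, `"Alter Kind 1"` and `"Verpflegung"` with their string values.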

Output Formats

CSV Export (leads_export.csv)

Tabular format with the following columns:

  • name - First name
  • lastname - Last name
  • mail - Email address
  • tel - Phone number
  • anreise - Check-in date (DD.MM.YYYY)
  • abreise - Check-out date (DD.MM.YYYY)
  • erwachsene - Number of adults
  • kinder - Number of children
  • kind_ages - Child ages as a comma-separated string (e.g., "3,6,10")
  • apartments - Comma-separated apartment preferences
  • verpflegung - Meal plan preference
  • sprache - Language
  • device - Device information
  • anrede - Salutation/title
  • land - Country
  • privacy - Privacy consent (Yes/No)
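A sketch of turning parsed email fields into a row with the columns above. The label-to-column mapping is an assumption inferred from the sample email and this column list; only labels shown in the sample are mapped, so `sprache`, `device`, `anrede`, `land` and `privacy` stay empty here. `lead_to_row` and `write_csv` are hypothetical helpers.

```python
import csv
import re

# Column order of leads_export.csv as listed above
CSV_COLUMNS = [
    "name", "lastname", "mail", "tel", "anreise", "abreise", "erwachsene",
    "kinder", "kind_ages", "apartments", "verpflegung", "sprache", "device",
    "anrede", "land", "privacy",
]

# Assumed mapping from the labels in the sample email to CSV columns
EMAIL_KEY_TO_COLUMN = {
    "Name": "name", "Nachname": "lastname", "Mail": "mail", "Tel": "tel",
    "Anreise": "anreise", "Abreise": "abreise", "Erwachsene": "erwachsene",
    "Kinder": "kinder", "Apartment": "apartments", "Verpflegung": "verpflegung",
}

CHILD_AGE_KEY = re.compile(r"^Alter Kind (\d+)$")


def lead_to_row(fields: dict[str, str]) -> dict[str, str]:
    """Map parsed email fields onto the export columns (hypothetical helper)."""
    row = {column: "" for column in CSV_COLUMNS}
    for key, column in EMAIL_KEY_TO_COLUMN.items():
        if key in fields:
            row[column] = fields[key]
    # Collect "Alter Kind N" values in numeric order, e.g. "3,6,10"
    ages = []
    for key, value in fields.items():
        match = CHILD_AGE_KEY.match(key)
        if match:
            ages.append((int(match.group(1)), value))
    ages.sort()
    row["kind_ages"] = ",".join(value for _, value in ages)
    return row


def write_csv(rows: list[dict[str, str]], path: str) -> None:
    """Write rows with a header line in the documented column order."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=CSV_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Sorting on the numeric suffix keeps `Alter Kind 10` after `Alter Kind 2`, which a plain string sort would not.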

JSON Export (leads_export.json)

Same data in JSON format for programmatic access.
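A sketch of the JSON side, assuming the file is a plain array of the same row dictionaries used for the CSV; `write_json` is a hypothetical helper. `ensure_ascii=False` keeps non-ASCII characters such as German umlauts readable instead of escaping them.

```python
import json


def write_json(rows: list[dict[str, str]], path: str) -> None:
    """Write lead rows as a JSON array (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        # Keep characters like "ä" as-is rather than "\u00e4"
        json.dump(rows, handle, ensure_ascii=False, indent=2)
```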

CSV Import Integration

Enhanced CSV Importer

The CSVImporter class in csv_import.py now supports both:

  1. German Landing Page Form Format (original)

    • Column names in German (Zeit der Einreichung, Anreisedatum, etc.)
    • Child ages in individual columns (Alter Kind 1, Alter Kind 2, etc.)
  2. English Email Lead Export Format (new)

    • Column names in English (name, lastname, anreise, abreise, etc.)
    • Child ages as a comma-separated string in the kind_ages column
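How the importer tells the two formats apart is not spelled out here. One plausible sketch, assuming detection keys off the header row (`anreise`/`abreise` or `kind_ages` for the English export, `Anreisedatum` or `Zeit der Einreichung` for the German form); the function names and the exact rule are hypothetical.

```python
import csv
import io


def detect_format(header: list[str]) -> str:
    """Guess the CSV flavour from its header row (hypothetical rule)."""
    columns = {name.strip() for name in header}
    if {"anreise", "abreise"} <= columns or "kind_ages" in columns:
        return "email_leads"
    if "Anreisedatum" in columns or "Zeit der Einreichung" in columns:
        return "landing_page"
    raise ValueError(f"Unrecognised CSV header: {sorted(columns)}")


def detect_format_from_text(csv_text: str) -> str:
    """Read only the first row of the CSV text and classify it."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return detect_format(header)
```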

API Endpoint

The existing CSV import endpoint now handles both formats:

PUT /api/admin/import-csv/{hotel_code}/{filename:path}

Example with leads CSV:

curl -X PUT \
  -u user:pass \
  --data-binary @leads_export.csv \
  http://localhost:8000/api/admin/import-csv/bemelmans/leads.csv
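The same call can be prepared from Python with only the standard library. HTTP Basic authentication sends the base64 encoding of `username:password`, not the raw credentials. `build_import_request` is a hypothetical helper; the request body is optional because this document also shows the endpoint being called without one.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional


def build_import_request(
    base_url: str,
    hotel_code: str,
    filename: str,
    username: str,
    password: str,
    body: Optional[bytes] = None,
) -> urllib.request.Request:
    """Build a PUT request for /api/admin/import-csv/{hotel_code}/{filename} (hypothetical helper)."""
    # Quote path segments; "/" stays in filename because the route uses {filename:path}
    path = "/api/admin/import-csv/{}/{}".format(
        urllib.parse.quote(hotel_code, safe=""),
        urllib.parse.quote(filename, safe="/"),
    )
    # HTTP Basic auth: "Basic " + base64("username:password")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + path, data=body, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Sending it is then `urllib.request.urlopen(request)`; for `user:pass` the header value is `Basic dXNlcjpwYXNz`.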

Features

  • Automatic Format Detection: The importer automatically detects which format is being used
  • Child Age Handling: Supports both individual age columns and comma-separated age format
  • Duplicate Detection: Uses name, email, dates, and tracking IDs (fbclid/gclid) to prevent duplicates
  • Dry-Run Mode: Test imports without committing data
  • Pre-Acknowledgement: Optionally pre-acknowledge all imported reservations
  • Transaction Safety: Rolls back on any error, maintaining data integrity
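The duplicate check can be pictured as a normalised key built from the fields named above. This is only a sketch: the column names follow the English export, and the normalisation (trimming and lower-casing, dates compared as text) is an assumption rather than the importer's documented rule. `duplicate_key` and `drop_duplicates` are hypothetical.

```python
from typing import NamedTuple


class DuplicateKey(NamedTuple):
    name: str
    lastname: str
    mail: str
    anreise: str
    abreise: str
    fbclid: str
    gclid: str


def duplicate_key(row: dict[str, str]) -> DuplicateKey:
    """Normalised key for duplicate detection (hypothetical helper)."""
    def norm(column: str) -> str:
        # Missing or None values become ""; dates are compared as text here
        return (row.get(column) or "").strip().lower()
    return DuplicateKey(*(norm(c) for c in DuplicateKey._fields))


def drop_duplicates(rows: list[dict[str, str]]) -> tuple[list[dict[str, str]], int]:
    """Keep the first occurrence of each key; return (unique_rows, skipped_count)."""
    seen: set[DuplicateKey] = set()
    unique: list[dict[str, str]] = []
    for row in rows:
        key = duplicate_key(row)
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique, len(rows) - len(unique)
```

Because dates are compared as text, `30.12.2025` and `2025-12-30` would not match in this sketch; comparing parsed dates would close that gap.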

Supported Columns

Required Fields

  • name (or Vorname) - First name
  • lastname (or Nachname) - Last name

Date Fields (required)

  • anreise (or Anreisedatum) - Check-in date
  • abreise (or Abreisedatum) - Check-out date

Guest Information

  • mail (or Email) - Email address
  • tel (or Phone) - Phone number
  • erwachsene (or Anzahl Erwachsene) - Number of adults
  • kinder (or Anzahl Kinder) - Number of children
  • kind_ages (or individual Alter Kind 1-10) - Child ages
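Both child-age layouts can be reduced to one list of integers. A sketch, assuming ages arrive either as a `kind_ages` string such as `"3,6,10"` or in `Alter Kind 1` to `Alter Kind 10` columns, and applying the 0 to 17 range given in the Notes; `parse_child_ages` is hypothetical.

```python
def parse_child_ages(row: dict[str, str]) -> list[int]:
    """Return child ages from either the English or the German layout (hypothetical helper)."""
    raw = (row.get("kind_ages") or "").strip()
    if raw:
        # English export: "3,6,10"
        values = [part.strip() for part in raw.split(",") if part.strip()]
    else:
        # German form: one column per child, "Alter Kind 1" .. "Alter Kind 10"
        values = [(row.get(f"Alter Kind {i}") or "").strip() for i in range(1, 11)]
        values = [v for v in values if v]
    ages = [int(v) for v in values]
    for age in ages:
        if not 0 <= age <= 17:
            raise ValueError(f"Child age out of range 0-17: {age}")
    return ages
```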

Preferences

  • apartments (or Angebot auswählen) - Room/apartment preferences
  • verpflegung - Meal plan preference
  • sprache - Language preference

Metadata

  • device - Device information
  • anrede - Salutation/title
  • land - Country
  • privacy - Privacy consent

Tracking (optional)

  • utm_Source, utm_Medium, utm_Campaign, utm_Term, utm_Content - UTM parameters
  • fbclid - Facebook click ID
  • gclid - Google click ID
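The "(or …)" alternatives above amount to an alias table. A sketch that renames German headers to the English names and checks the required fields; only aliases listed in this section are included, child-age columns are left untouched, and `normalize_row` is a hypothetical helper.

```python
COLUMN_ALIASES = {
    "Vorname": "name",
    "Nachname": "lastname",
    "Anreisedatum": "anreise",
    "Abreisedatum": "abreise",
    "Email": "mail",
    "Phone": "tel",
    "Anzahl Erwachsene": "erwachsene",
    "Anzahl Kinder": "kinder",
    "Angebot auswählen": "apartments",
}

REQUIRED_COLUMNS = ("name", "lastname", "anreise", "abreise")


def normalize_row(row: dict[str, str]) -> dict[str, str]:
    """Rename known German headers to English ones; keep all other columns."""
    normalized = {
        COLUMN_ALIASES.get(key.strip(), key.strip()): value
        for key, value in row.items()
        if key is not None  # csv.DictReader uses None as the key for surplus fields
    }
    missing = [c for c in REQUIRED_COLUMNS if not (normalized.get(c) or "").strip()]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    return normalized
```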

Import Examples

Python:

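import asyncio

from src.alpine_bits_python.csv_import import CSVImporter
from src.alpine_bits_python.db import AsyncSession


async def main(config):
    # `config` is the loaded application configuration that CSVImporter expects
    async with AsyncSession() as session:
        importer = CSVImporter(session, config)

        # Test import (dry-run): nothing is committed
        result = await importer.import_csv_file(
            csv_file_path="leads_export.csv",
            hotel_code="bemelmans",
            dryrun=True
        )

        # Actual import, pre-acknowledging the created reservations
        stats = await importer.import_csv_file(
            csv_file_path="leads_export.csv",
            hotel_code="bemelmans",
            pre_acknowledge=True,
            client_id="my_client",
            username="hotel_user"
        )
        print(f"Created {stats['created_reservations']} reservations")


# `async with` must run inside a coroutine; once `config` is loaded, start it with:
# asyncio.run(main(config))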

Command Line (via API):

# Copy CSV to logs directory (endpoint expects it there)
cp leads_export.csv /logs/csv_imports/leads.csv

# Import via API
curl -X PUT \
  -u username:password \
  http://localhost:8000/api/admin/import-csv/bemelmans/leads.csv

Return Values

import_csv_file() returns a dictionary of import statistics, for example:

{
    'total_rows': 576,
    'skipped_empty': 0,
    'created_customers': 45,
    'existing_customers': 531,
    'created_reservations': 576,
    'skipped_duplicates': 0,
    'pre_acknowledged': 576,
    'errors': []
}
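A common pattern is to run the dry-run first and only commit when it reports no errors. The sketch below assumes the dry-run returns the same statistics dictionary as the real import, including `errors`; `import_if_clean` is a hypothetical wrapper around the `import_csv_file()` calls shown in the Python example.

```python
from typing import Any, Optional


async def import_if_clean(importer: Any, csv_file_path: str, hotel_code: str) -> Optional[dict]:
    """Dry-run first; run the real import only if the dry-run reported no errors."""
    preview = await importer.import_csv_file(
        csv_file_path=csv_file_path, hotel_code=hotel_code, dryrun=True
    )
    if preview.get("errors"):
        for error in preview["errors"]:
            print(f"Dry-run error: {error}")
        return None  # nothing committed
    return await importer.import_csv_file(
        csv_file_path=csv_file_path, hotel_code=hotel_code
    )
```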

Data Flow

Email MBOX Files
      ↓
extract_leads.py
      ↓
leads_export.csv / leads_export.json
      ↓
CSV Import API
      ↓
CSVImporter.import_csv_file()
      ↓
Database (Customers & Reservations)

Notes

  • Dates may use any of these formats: YYYY-MM-DD, DD.MM.YYYY, or DD/MM/YYYY
  • Child ages must be between 0 and 17
  • If the number of children does not match the number of ages provided, the importer attempts to reconcile the two
  • All imports are wrapped in database transactions for safety
  • The API endpoint requires HTTP Basic Authentication
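The accepted date formats map directly onto `datetime.strptime` patterns. A minimal sketch that tries each format in turn; `parse_date` is a hypothetical helper and the importer's own parsing may differ.

```python
from datetime import date, datetime

# YYYY-MM-DD, DD.MM.YYYY, DD/MM/YYYY
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def parse_date(value: str) -> date:
    """Parse a check-in/check-out date in any of the supported formats."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unsupported date format: {value!r}")
```

Applied to the sample email, `parse_date("30.12.2025")` gives `date(2025, 12, 30)`.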