13 KiB
Database Schema Analysis
Overview
This document analyzes the database schema for normalization issues, redundancy, and potential improvements.
Schema Summary
The database contains 13 tables organized around several core concepts:
- Customer/Guest Management:
customers,hashed_customers,conversion_guests - Reservations:
reservations,conversions,conversion_rooms - Hotels:
hotels,hotel_inventory,room_availability - Webhooks:
webhook_endpoints,webhook_requests - Tracking:
acked_requests
Major Issues Identified
1. CRITICAL: Dual Customer Systems (Data Duplication)
Problem: The schema maintains two parallel customer tracking systems:
customers+hashed_customers(from Wix forms)conversion_guests(from PMS)
Impact:
- Same person can exist in both systems with no linkage
conversion_guests.hashed_customer_idattempts to link but this is backward (many-to-one instead of one-to-one)- Data inconsistency when same guest appears in both sources
Details:
customers (id=1, email="john@example.com")
└─ hashed_customers (id=1, customer_id=1, hashed_email="abc123...")
conversion_guests (hotel_id="HOTEL1", guest_id=42, guest_email="john@example.com")
└─ hashed_customer_id = NULL (or points to hashed_customers.id=1 after matching)
Recommendation:
- Create a unified
personstable with asourcefield ("wix", "pms", "merged") - Both
customersandconversion_guestsshould reference this unified entity - Implement proper guest matching/merging logic
2. Data Redundancy: Hashed Values Stored Separately
Problem: hashed_customers and conversion_guests store hashed values in separate columns alongside originals.
Current Structure:
customers:
- email_address (plaintext)
- phone (plaintext)
hashed_customers:
- customer_id (FK to customers)
- hashed_email
- hashed_phone
- hashed_given_name
...
Issues:
- Violates 3NF (derived data stored in separate table)
- Synchronization required between
customersandhashed_customers - If customer data changes, hashed version can become stale
- Extra JOIN required for every Meta Conversion API call
Better Approach:
Option A: Store hashed values directly in customers table as additional columns
Option B: Compute hashes on-the-fly (SHA256 is fast, ~1-2ms per hash)
Recommendation:
- Short term: Keep current structure but add triggers to auto-update hashed values
- Long term: Move hashed columns into
customerstable directly
3. Advertising Account IDs Duplicated Across Tables
Problem: meta_account_id and google_account_id appear in 3 places:
hotelstable (canonical source)reservationstable (copied at creation time)- Derived from
fbclid/gclidtracking parameters
Current Flow:
hotels.meta_account_id = "123456"
↓
reservation created with fbclid
↓
reservations.meta_account_id = "123456" (copied from hotels)
Issues:
- Denormalization without clear benefit
- If hotel's account ID changes, old reservations have stale data
- Mixed source of truth (sometimes from hotels, sometimes from tracking params)
Recommendation:
- Remove
meta_account_idandgoogle_account_idfromreservations - Always derive from
hotelstable via JOIN - If tracking-derived account differs from hotel's account, log a warning
4. Hotel Information Duplicated in Reservations
Problem: reservations table stores hotel_code and hotel_name but has no FK to hotels table.
Issues:
- Data can become inconsistent if hotel name changes
- No referential integrity
- Unclear if
hotel_codematcheshotels.hotel_id
Recommendation:
- Add
hotel_idFK column toreservationspointing tohotels.hotel_id - Remove
hotel_codeandhotel_namecolumns - Derive hotel information via JOIN when needed
5. Weak Foreign Key Consistency
Problem: Mixed use of ON DELETE policies:
- Some FKs use
SET NULL(appropriate for nullable relationships) - Some use
CASCADE(appropriate for child records) - Some use
NO ACTION(prevents deletion, may cause issues) conversionstable has confusing composite FK setup withhotel_idandguest_id
Examples:
-- Good: Child data should be deleted with parent
hotel_inventory.hotel_id → hotels.hotel_id (ON DELETE CASCADE)
-- Questionable: Should webhook requests survive hotel deletion?
webhook_requests.hotel_id → hotels.hotel_id (ON DELETE NO ACTION)
-- Inconsistent: Why SET NULL vs CASCADE?
reservations.customer_id → customers.id (ON DELETE SET NULL)
reservations.hashed_customer_id → hashed_customers.id (ON DELETE CASCADE)
Recommendation: Review each FK and establish consistent policies:
- Core data (hotels, customers): SET NULL to preserve historical records
- Supporting data (hashed_customers, inventory): CASCADE
- Transactional data (webhooks, conversions): Decide on retention policy
6. Confusing Composite Foreign Key in Conversions
Problem: The conversions table has a composite FK that's incorrectly mapped:
# In db.py lines 650-655
__table_args__ = (
ForeignKeyConstraint(
["hotel_id", "guest_id"],
["conversion_guests.hotel_id", "conversion_guests.guest_id"],
ondelete="SET NULL",
),
)
But the database shows:
Foreign Keys:
hotel_id -> conversion_guests.hotel_id (ON DELETE SET NULL)
guest_id -> conversion_guests.hotel_id (ON DELETE SET NULL) # ← WRONG!
guest_id -> conversion_guests.guest_id (ON DELETE SET NULL)
hotel_id -> conversion_guests.guest_id (ON DELETE SET NULL) # ← WRONG!
Impact:
- Database has 4 FKs instead of 1 composite FK
- Mapping is incorrect (guest_id → hotel_id doesn't make sense)
- Could cause constraint violations or allow orphaned records
Recommendation:
- Fix the composite FK definition in SQLAlchemy
- Run a migration to drop incorrect FKs and recreate properly
7. Unclear Relationship Between Reservations and Conversions
Problem: The relationship between reservations (from Wix forms) and conversions (from PMS) is complex:
conversions:
- reservation_id (FK to reservations) - matched by tracking IDs
- customer_id (FK to customers) - matched by guest details
- hashed_customer_id (FK to hashed_customers) - matched by hashed guest details
- guest_id (FK to conversion_guests) - the actual PMS guest
Issues:
- Three different FK fields to three different customer/guest tables
- Matching logic is unclear from schema alone
directly_attributableandguest_matchedflags indicate matching quality, but this should be more explicit
Recommendation:
- Add a
match_confidenceenum field: "exact_id", "high_confidence", "medium_confidence", "no_match" - Add
match_methodfield to explain how the link was made - Consider a separate
reservation_conversion_linkstable to make the many-to-many relationship explicit
8. Room Type Information Scattered
Problem: Room information appears in multiple places:
reservations.room_type_code,room_classification_code,room_typeconversion_rooms.room_type,room_numberhotel_inventory.inv_type_code,inv_code,room_name
Issues:
- No clear master data for room types
- Room type codes not standardized across sources
- No FK between
reservations.room_type_codeandhotel_inventory.inv_type_code
Recommendation:
- Create a
room_typesreference table linked to hotels - Add FKs from reservations and conversion_rooms to room_types
- Standardize room type codes across all sources
Normalization Analysis
1st Normal Form (1NF): ✅ PASS
- All columns contain atomic values
- Exception:
reservations.children_agesstores comma-separated values- Should be: separate
reservation_childrentable with age column
- Should be: separate
2nd Normal Form (2NF): ⚠️ MOSTLY PASS
- All non-key attributes depend on the full primary key
- Issue: Some denormalized data exists (hotel names, account IDs in reservations)
3rd Normal Form (3NF): ❌ FAIL
Multiple violations:
hashed_customersstores derived data (hashes) that depend oncustomersreservations.meta_account_iddepends onhotelsvia hotel_codereservations.hotel_namedepends onhotelsvia hotel_code
Data Integrity Issues
Missing Foreign Keys
- reservations.hotel_code → should FK to hotels.hotel_id
- reservations.room_type_code → should FK to hotel_inventory
- acked_requests.unique_id → should FK to reservations.unique_id (or be nullable)
Missing Indexes
Consider adding for query performance:
customers.email_address- for lookups during conversion matchingconversions.reservation_date- for time-based queriesconversion_rooms.total_revenue- for revenue analyticsreservations.start_date,end_date- for date range queries
Missing Constraints
-
Check constraints for date logic:
reservations.end_date > start_dateconversion_rooms.departure_date > arrival_date
-
Check constraints for counts:
num_adults >= 0,num_children >= 0
-
NOT NULL constraints on critical fields:
customers.contact_idshould be NOT NULL (it's the natural key)conversions.hotel_idis NOT NULL ✓ (good)
Recommendations Priority
HIGH PRIORITY (Data Integrity)
- Fix composite FK in
conversionstable (lines 650-655 in db.py) - Add
hotel_idFK toreservationstable - Add missing NOT NULL constraints on natural keys
- Add check constraints for date ranges and counts
MEDIUM PRIORITY (Normalization)
- Unify customer/guest systems into a single
personsentity - Remove duplicate account ID fields from
reservations - Remove
hotel_namefromreservations(derive via JOIN) - Create
reservation_childrentable for children_ages
LOW PRIORITY (Performance & Cleanup)
- Move hashed fields into
customerstable (removehashed_customers) - Add indexes for common query patterns
- Create
room_typesreference table - Add
match_confidenceandmatch_methodtoconversions
Positive Aspects
✅ Good use of composite keys (conversion_guests, hotel_inventory)
✅ Unique constraints on natural keys (contact_id, webhook_secret)
✅ Proper use of indexes on frequently queried fields
✅ Cascade deletion for child records (inventory, rooms)
✅ Tracking metadata (created_at, updated_at, first_seen, last_seen)
✅ Webhook deduplication via payload_hash
✅ JSON storage for flexible data (conversion_rooms.daily_sales)
Suggested Refactoring Path
Phase 1: Fix Critical Issues (1-2 days)
- Fix composite FK in conversions
- Add hotel_id FK to reservations
- Add missing constraints
Phase 2: Normalize Customer Data (3-5 days)
- Create unified persons/guests table
- Migrate existing data
- Update matching logic
Phase 3: Clean Up Redundancy (2-3 days)
- Remove duplicate account IDs
- Merge hashed_customers into customers
- Create room_types reference
Phase 4: Enhance Tracking (1-2 days)
- Add match_confidence fields
- Improve conversion attribution
- Add missing indexes
Query Examples Affected by Current Issues
Issue: Duplicate Customer Data
-- Current: Find all reservations for a guest (requires checking both systems)
SELECT r.* FROM reservations r
WHERE r.customer_id = ?
OR r.hashed_customer_id IN (
SELECT id FROM hashed_customers WHERE contact_id = ?
);
-- After fix: Simple unified query
SELECT r.* FROM reservations r
WHERE r.person_id = ?;
Issue: Missing Hotel FK
-- Current: Get hotel info for reservation (unreliable)
SELECT r.*, r.hotel_name
FROM reservations r
WHERE r.id = ?;
-- After fix: Reliable JOIN
SELECT r.*, h.hotel_name, h.meta_account_id
FROM reservations r
JOIN hotels h ON r.hotel_id = h.hotel_id
WHERE r.id = ?;
Issue: Hashed Data in Separate Table
-- Current: Get customer for Meta API (requires JOIN)
SELECT hc.hashed_email, hc.hashed_phone
FROM reservations r
JOIN hashed_customers hc ON r.hashed_customer_id = hc.id
WHERE r.id = ?;
-- After fix: Direct access
SELECT c.hashed_email, c.hashed_phone
FROM reservations r
JOIN customers c ON r.customer_id = c.id
WHERE r.id = ?;
Conclusion
The schema is functional but has significant normalization and consistency issues. The main problems are:
- Dual customer tracking systems that should be unified
- Redundant storage of derived data (hashes, account IDs)
- Missing foreign key relationships (hotels, room types)
- Inconsistent deletion policies across foreign keys
- Broken composite foreign key in conversions table
The database violates 3NF in several places and could benefit from a refactoring effort. However, the issues are primarily architectural rather than critical bugs, so the system can continue operating while improvements are made incrementally.
Estimated effort to fix all issues: 1-2 weeks of development + testing Risk level: Medium (requires data migration and careful FK updates) Recommended approach: Incremental fixes starting with high-priority items