Email Monitoring and Alerting
This document describes the email monitoring and alerting system for the AlpineBits Python server.
Overview
The email monitoring system provides two main features:
- Error Alerts: Automatic email notifications when errors occur in the application
- Daily Reports: Scheduled daily summary emails with statistics and error logs
Architecture
Components
- EmailService (email_service.py): Core SMTP email sending functionality
- EmailAlertHandler (email_monitoring.py): Custom logging handler that captures errors and sends alerts
- DailyReportScheduler (email_monitoring.py): Background task that sends daily reports
How It Works
Error Alerts (Hybrid Approach)
The EmailAlertHandler uses a hybrid threshold + time-based approach:
- Immediate Alerts: If the error threshold is reached (e.g., 5 errors), an alert email is sent immediately
- Buffered Alerts: Otherwise, errors accumulate in a buffer and are sent after the buffer duration (e.g., 15 minutes)
- Cooldown Period: After sending an alert, the system waits for a cooldown period before sending another alert to prevent spam
Flow Diagram:
Error occurs
     ↓
Add to buffer
     ↓
Buffer >= threshold? ──Yes──> Send immediate alert
     ↓ No                            ↓
Wait for buffer time            Reset buffer
     ↓                               ↓
Send buffered alert             Enter cooldown
     ↓
Reset buffer
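The flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the actual EmailAlertHandler: the class name `AlertBuffer`, its methods, and the injectable `clock` are hypothetical. It also assumes that the cooldown applies after both immediate and buffered alerts, and that errors arriving during the cooldown keep accumulating in the buffer.

```python
import time
from typing import Callable, List, Optional


class AlertBuffer:
    """Illustrative threshold + time-based buffering (hypothetical, not the real handler)."""

    def __init__(
        self,
        error_threshold: int = 5,
        buffer_seconds: float = 15 * 60,
        cooldown_seconds: float = 15 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.error_threshold = error_threshold
        self.buffer_seconds = buffer_seconds
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._buffer: List[str] = []
        self._first_error_at: Optional[float] = None
        self._last_sent_at: Optional[float] = None

    def _in_cooldown(self, now: float) -> bool:
        return (
            self._last_sent_at is not None
            and now - self._last_sent_at < self.cooldown_seconds
        )

    def _flush(self, now: float) -> List[str]:
        # Hand the buffered errors to the caller, reset the buffer, start the cooldown.
        batch, self._buffer = self._buffer, []
        self._first_error_at = None
        self._last_sent_at = now
        return batch

    def add(self, message: str) -> Optional[List[str]]:
        """Record an error; return a batch to send if the threshold is reached."""
        now = self._clock()
        if not self._buffer:
            self._first_error_at = now
        self._buffer.append(message)
        if len(self._buffer) >= self.error_threshold and not self._in_cooldown(now):
            return self._flush(now)
        return None

    def poll(self) -> Optional[List[str]]:
        """Call periodically; return a batch once the buffer window has elapsed."""
        now = self._clock()
        if not self._buffer or self._first_error_at is None or self._in_cooldown(now):
            return None
        if now - self._first_error_at >= self.buffer_seconds:
            return self._flush(now)
        return None
```

A caller would invoke `add()` for every captured error and `poll()` from a periodic timer, sending an email whenever either returns a non-empty batch.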
Daily Reports
The DailyReportScheduler runs as a background task that:
- Waits until the configured send time (e.g., 8:00 AM)
- Collects statistics from the application
- Gathers errors that occurred during the day
- Formats and sends an email report
- Clears the error log
- Schedules the next report for the following day
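The "wait until the configured send time" step comes down to computing how long to sleep. The helper below is a sketch under assumptions: the function name `seconds_until_next_send` is hypothetical, `send_time` is taken to be an `"HH:MM"` string as in the configuration, and daylight-saving transitions are ignored.

```python
from datetime import datetime, timedelta


def seconds_until_next_send(send_time: str, now: datetime) -> float:
    """Return seconds from `now` until the next occurrence of `send_time` ("HH:MM")."""
    hour, minute = (int(part) for part in send_time.split(":"))
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()
```

A background task could then `await asyncio.sleep(seconds_until_next_send("08:00", datetime.now()))` before building and sending each report.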
Configuration
Email Configuration Keys
Add the following to your config.yaml:
email:
  # SMTP server configuration
  smtp:
    host: "smtp.gmail.com"  # Your SMTP server hostname
    port: 587  # SMTP port (587 for TLS, 465 for SSL)
    username: !secret EMAIL_USERNAME  # SMTP username (use !secret for env vars)
    password: !secret EMAIL_PASSWORD  # SMTP password (use !secret for env vars)
    use_tls: true  # Use STARTTLS encryption
    use_ssl: false  # Use SSL/TLS from start (mutually exclusive with use_tls)

  # Sender information
  from_address: "noreply@99tales.com"
  from_name: "AlpineBits Monitor"

  # Monitoring and alerting
  monitoring:
    # Daily report configuration
    daily_report:
      enabled: true  # Enable/disable daily reports
      recipients:
        - "admin@99tales.com"
        - "dev@99tales.com"
      send_time: "08:00"  # Time to send (24h format, local time)
      include_stats: true  # Include application statistics
      include_errors: true  # Include error summary

    # Error alert configuration
    error_alerts:
      enabled: true  # Enable/disable error alerts
      recipients:
        - "alerts@99tales.com"
        - "oncall@99tales.com"
      error_threshold: 5  # Send immediate alert after N errors
      buffer_minutes: 15  # Wait N minutes before sending buffered errors
      cooldown_minutes: 15  # Wait N minutes before sending another alert
      log_levels:  # Log levels to monitor
        - "ERROR"
        - "CRITICAL"
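Because `use_tls` and `use_ssl` are mutually exclusive and tied to the port, a small sanity check on the loaded `smtp` mapping can catch misconfiguration early. This is a sketch only: the function `check_smtp_settings` is hypothetical and not part of the server, and the port rules reflect the common 587/465 convention rather than anything the server enforces.

```python
from typing import Any, Dict, List


def check_smtp_settings(smtp: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    use_tls = bool(smtp.get("use_tls", False))
    use_ssl = bool(smtp.get("use_ssl", False))
    if use_tls and use_ssl:
        problems.append("use_tls and use_ssl are mutually exclusive")
    if not use_tls and not use_ssl:
        problems.append("neither use_tls nor use_ssl is set")
    port = smtp.get("port")
    if use_tls and port == 465:
        problems.append("port 465 normally expects use_ssl, not use_tls")
    if use_ssl and port == 587:
        problems.append("port 587 normally expects use_tls, not use_ssl")
    if not smtp.get("host"):
        problems.append("smtp.host is missing")
    return problems
```

Returning a list instead of raising on the first problem lets a startup check report every issue at once.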
Environment Variables
For security, store sensitive credentials in environment variables:
# Create a .env file (never commit this!)
EMAIL_USERNAME=your-smtp-username@gmail.com
EMAIL_PASSWORD=your-smtp-app-password
The annotatedyaml library automatically loads values marked with !secret from environment variables.
Gmail Configuration
If using Gmail, you need to:
- Enable 2-factor authentication on your Google account
- Generate an "App Password" for SMTP access
- Use the app password as EMAIL_PASSWORD
Gmail Settings:
smtp:
  host: "smtp.gmail.com"
  port: 587
  use_tls: true
  use_ssl: false
Other SMTP Providers
SendGrid:
smtp:
  host: "smtp.sendgrid.net"
  port: 587
  username: "apikey"
  password: !secret SENDGRID_API_KEY
  use_tls: true
AWS SES:
smtp:
  host: "email-smtp.us-east-1.amazonaws.com"
  port: 587
  username: !secret AWS_SES_USERNAME
  password: !secret AWS_SES_PASSWORD
  use_tls: true
Usage
Automatic Error Monitoring
Once configured, the system automatically captures all ERROR and CRITICAL log messages:
from alpine_bits_python.logging_config import get_logger

_LOGGER = get_logger(__name__)

# This error will be captured and sent via email
_LOGGER.error("Database connection failed")

# This will also be captured
try:
    risky_operation()
except Exception:
    _LOGGER.exception("Operation failed")  # Includes stack trace
Triggering Test Alerts
To test your email configuration, you can manually trigger errors:
import logging

_LOGGER = logging.getLogger(__name__)

# Generate multiple errors to trigger immediate alert (if threshold = 5)
for i in range(5):
    _LOGGER.error("Test error %d", i + 1)
Daily Report Statistics
To include custom statistics in daily reports, set a stats collector function:
async def collect_stats():
    """Collect application statistics for daily report."""
    return {
        "total_reservations": await count_reservations(),
        "new_customers": await count_new_customers(),
        "active_hotels": await count_active_hotels(),
        "api_requests": get_request_count(),
    }

# Register the collector
report_scheduler = app.state.report_scheduler
if report_scheduler:
    report_scheduler.set_stats_collector(collect_stats)
Email Templates
Error Alert Email
Subject: 🚨 AlpineBits Error Alert: 5 errors (threshold exceeded)
Body:
Error Alert - 2025-10-15 14:30:45
======================================================================
Alert Type: Immediate Alert
Error Count: 5
Time Range: 14:25:00 to 14:30:00
Reason: (threshold of 5 exceeded)
======================================================================
Errors:
----------------------------------------------------------------------
[2025-10-15 14:25:12] ERROR: Database connection timeout
Module: db:245 (alpine_bits_python.db)
[2025-10-15 14:26:34] ERROR: Failed to process reservation
Module: api:567 (alpine_bits_python.api)
Exception:
Traceback (most recent call last):
...
----------------------------------------------------------------------
Generated by AlpineBits Email Monitoring at 2025-10-15 14:30:45
Daily Report Email
Subject: AlpineBits Daily Report - 2025-10-15
Body (HTML):
AlpineBits Daily Report
Date: 2025-10-15
Statistics
┌────────────────────────┬────────┐
│ Metric │ Value │
├────────────────────────┼────────┤
│ total_reservations │ 42 │
│ new_customers │ 15 │
│ active_hotels │ 4 │
│ api_requests │ 1,234 │
└────────────────────────┴────────┘
Errors (3)
┌──────────────┬──────────┬─────────────────────────┐
│ Time │ Level │ Message │
├──────────────┼──────────┼─────────────────────────┤
│ 08:15:23 │ ERROR │ Connection timeout │
│ 12:45:10 │ ERROR │ Invalid form data │
│ 18:30:00 │ CRITICAL │ Database unavailable │
└──────────────┴──────────┴─────────────────────────┘
Generated by AlpineBits Server
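As an illustration of how a statistics dictionary could become the HTML table shown in the daily report, here is a minimal sketch. The function names `render_stats_table` and `_format_value` are hypothetical and the markup is deliberately bare; the server's actual template may differ.

```python
from html import escape
from typing import Any, Mapping


def _format_value(value: Any) -> str:
    # Use thousands separators for integers, e.g. 1234 becomes "1,234".
    if isinstance(value, int) and not isinstance(value, bool):
        return f"{value:,}"
    return str(value)


def render_stats_table(stats: Mapping[str, Any]) -> str:
    """Render metric/value pairs as a simple HTML table, escaping all text."""
    rows = "".join(
        f"<tr><td>{escape(str(name))}</td><td>{escape(_format_value(value))}</td></tr>"
        for name, value in stats.items()
    )
    return f"<table><tr><th>Metric</th><th>Value</th></tr>{rows}</table>"
```

Escaping every cell matters here because metric names and values may originate from application data.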
Monitoring and Troubleshooting
Check Email Configuration
from alpine_bits_python.email_service import create_email_service
from alpine_bits_python.config_loader import load_config

config = load_config()
email_service = create_email_service(config)

if email_service:
    print("✓ Email service configured")
else:
    print("✗ Email service not configured")
Test Email Sending
import asyncio

from alpine_bits_python.email_service import EmailService, EmailConfig


async def test_email():
    config = EmailConfig({
        "smtp": {
            "host": "smtp.gmail.com",
            "port": 587,
            "username": "your-email@gmail.com",
            "password": "your-app-password",
            "use_tls": True,
        },
        "from_address": "sender@example.com",
        "from_name": "Test",
    })

    service = EmailService(config)

    result = await service.send_email(
        recipients=["recipient@example.com"],
        subject="Test Email",
        body="This is a test email from AlpineBits server.",
    )

    if result:
        print("✓ Email sent successfully")
    else:
        print("✗ Email sending failed")


asyncio.run(test_email())
Common Issues
Issue: "Authentication failed"
- Verify SMTP username and password are correct
- For Gmail, ensure you're using an App Password, not your regular password
- Check that 2FA is enabled on Gmail
Issue: "Connection timeout"
- Verify SMTP host and port are correct
- Check firewall rules allow outbound SMTP connections
- Try using port 465 with SSL instead of 587 with TLS
Issue: "No email alerts received"
- Check that enabled: true is set in config
- Verify recipient email addresses are correct
- Check application logs for email sending errors
- Ensure errors are being logged at ERROR or CRITICAL level
Issue: "Too many emails being sent"
- Increase cooldown_minutes to reduce alert frequency
- Increase buffer_minutes to batch more errors together
- Increase error_threshold to only alert on serious issues
Performance Considerations
SMTP is Blocking
Email sending uses the standard Python smtplib, which performs blocking I/O. To prevent blocking the async event loop:
- Email operations are automatically run in a thread pool executor
- This happens transparently via loop.run_in_executor()
- Request handling is not blocked while an email is being sent
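The pattern described above can be sketched as follows. This is an assumption-laden illustration, not the EmailService implementation: `send_blocking` and `send_in_executor` are hypothetical names, authentication is omitted for brevity, and only the `smtplib` and `asyncio` calls are real library APIs.

```python
import asyncio
import smtplib
from email.message import EmailMessage
from typing import Callable


def send_blocking(host: str, port: int, message: EmailMessage) -> None:
    """Blocking SMTP send with STARTTLS; meant to run in a worker thread."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.send_message(message)


async def send_in_executor(blocking_send: Callable[[], None]) -> bool:
    """Run a blocking send function in the default thread pool and report success."""
    loop = asyncio.get_running_loop()
    try:
        # None selects the loop's default ThreadPoolExecutor.
        await loop.run_in_executor(None, blocking_send)
    except (smtplib.SMTPException, OSError):
        return False
    return True
```

A caller would bind the arguments first, for example `await send_in_executor(functools.partial(send_blocking, host, port, message))`, so the event loop stays free while the worker thread talks to the SMTP server.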
Memory Usage
- Error buffer size is limited by the buffer_minutes duration
- Old errors are automatically cleared after sending
- Daily report error log is cleared after each report
- Typical memory usage: <1 MB for error buffering
Error Handling
- Email sending failures are logged but never crash the application
- If SMTP is unavailable, errors are logged to console/file as normal
- The logging handler is exception-safe: a failure inside it never propagates to the application
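One common way to get this kind of exception safety in a custom handler is to wrap the body of `emit()` and delegate failures to `logging.Handler.handleError()`. The sketch below shows that technique; `SafeAlertHandler` and its `on_error` callback are hypothetical and do not mirror EmailAlertHandler's real interface.

```python
import logging
from typing import Callable


class SafeAlertHandler(logging.Handler):
    """Illustrative handler: forwards records to a callback and contains its failures."""

    def __init__(self, on_error: Callable[[logging.LogRecord], None]) -> None:
        super().__init__(level=logging.ERROR)
        self._on_error = on_error

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._on_error(record)
        except Exception:
            # Use logging's own error hook instead of propagating to the caller.
            self.handleError(record)
```

With this shape, a failing SMTP connection inside the callback is reported on stderr by the logging module while the code that called `_LOGGER.error(...)` continues normally.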
Security Considerations
- Never commit credentials to git
  - Use the !secret annotation in YAML
  - Store credentials in environment variables
  - Add .env to .gitignore
- Use TLS/SSL encryption
  - Always set use_tls: true or use_ssl: true
  - Never send credentials in plaintext
- Limit email recipients
  - Only send alerts to authorized personnel
  - Use dedicated monitoring email addresses
  - Consider using distribution lists
- Sensitive data in logs
  - Be careful not to log passwords, API keys, or PII
  - Error messages in emails may contain sensitive context
  - Review log messages before enabling email alerts
Testing
Run the test suite:
# Test email service only
uv run pytest tests/test_email_service.py -v
# Test with coverage
uv run pytest tests/test_email_service.py --cov=alpine_bits_python.email_service --cov=alpine_bits_python.email_monitoring
Future Enhancements
Potential improvements for future versions:
- Support for email templates (Jinja2)
- Configurable retry logic for failed sends
- Email queuing for high-volume scenarios
- Integration with external monitoring services (PagerDuty, Slack)
- Weekly/monthly report options
- Custom alert rules based on error patterns
- Email attachments for detailed logs
- HTML email styling improvements