# Email Monitoring and Alerting

This document describes the email monitoring and alerting system for the AlpineBits Python server.

## Overview

The email monitoring system provides two main features:

1. **Error Alerts**: Automatic email notifications when errors occur in the application
2. **Daily Reports**: Scheduled daily summary emails with statistics and error logs

## Architecture

### Components

- **EmailService** ([email_service.py](../src/alpine_bits_python/email_service.py)): Core SMTP email sending functionality
- **EmailAlertHandler** ([email_monitoring.py](../src/alpine_bits_python/email_monitoring.py)): Custom logging handler that captures errors and sends alerts
- **DailyReportScheduler** ([email_monitoring.py](../src/alpine_bits_python/email_monitoring.py)): Background task that sends daily reports

### How It Works

#### Error Alerts (Hybrid Approach)

The `EmailAlertHandler` uses a **hybrid threshold + time-based** approach:

1. **Immediate Alerts**: If the error threshold is reached (e.g., 5 errors), an alert email is sent immediately
2. **Buffered Alerts**: Otherwise, errors accumulate in a buffer and are sent after the buffer duration (e.g., 15 minutes)
3. **Cooldown Period**: After sending an alert, the system waits for a cooldown period before sending another alert to prevent spam

**Flow Diagram:**
```
Error occurs
     ↓
Add to buffer
     ↓
Buffer >= threshold? ──Yes──> Send immediate alert
     ↓ No                            ↓
Wait for buffer time            Reset buffer
     ↓                               ↓
Send buffered alert             Enter cooldown
     ↓
Reset buffer
```

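The decision logic in the flow diagram can be modelled without any email code at all. The following is a minimal sketch, not the actual `EmailAlertHandler` implementation: the class name `AlertBuffer`, its methods, and the choice to apply the cooldown after every send and to keep buffering errors during the cooldown are assumptions made for illustration. Time is passed in explicitly so the behaviour is easy to test.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AlertBuffer:
    """Hypothetical model of the threshold + buffer + cooldown decision."""

    error_threshold: int = 5
    buffer_seconds: float = 15 * 60
    cooldown_seconds: float = 15 * 60
    errors: List[str] = field(default_factory=list)
    first_error_at: Optional[float] = None
    last_sent_at: Optional[float] = None

    def _in_cooldown(self, now: float) -> bool:
        # True while less than cooldown_seconds have passed since the last send.
        return (
            self.last_sent_at is not None
            and now - self.last_sent_at < self.cooldown_seconds
        )

    def add(self, message: str, now: float) -> Optional[List[str]]:
        """Buffer an error; return a batch if the threshold triggers a send."""
        if not self.errors:
            self.first_error_at = now
        self.errors.append(message)
        if len(self.errors) >= self.error_threshold and not self._in_cooldown(now):
            return self._flush(now)
        return None

    def tick(self, now: float) -> Optional[List[str]]:
        """Call periodically; return a batch once the buffer window has elapsed."""
        if not self.errors or self._in_cooldown(now):
            return None
        if now - self.first_error_at >= self.buffer_seconds:
            return self._flush(now)
        return None

    def _flush(self, now: float) -> List[str]:
        # Hand back the buffered errors, reset the buffer, and start the cooldown.
        batch, self.errors = self.errors, []
        self.first_error_at = None
        self.last_sent_at = now
        return batch
```

A caller would send one email whenever `add()` or `tick()` returns a batch. Whether the real handler drops, keeps, or merges errors that arrive during the cooldown is not specified here, so treat that part of the sketch as one possible policy.
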
#### Daily Reports

The `DailyReportScheduler` runs as a background task that:

1. Waits until the configured send time (e.g., 8:00 AM)
2. Collects statistics from the application
3. Gathers errors that occurred during the day
4. Formats and sends an email report
5. Clears the error log
6. Schedules the next report for the following day

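Steps 1 and 6 both come down to computing how long to sleep until the next occurrence of the configured send time. A minimal sketch of that calculation follows; the function name `seconds_until_next_send` is hypothetical and the scheduler may compute this differently. It uses naive local datetimes, so it does not account for daylight saving transitions.

```python
from datetime import datetime, time, timedelta


def seconds_until_next_send(now: datetime, send_time: str) -> float:
    """Return seconds from `now` until the next occurrence of HH:MM `send_time`."""
    hour, minute = (int(part) for part in send_time.split(":"))
    target = datetime.combine(now.date(), time(hour, minute))
    if target <= now:
        # Today's send time has already passed, so schedule for tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()


if __name__ == "__main__":
    delay = seconds_until_next_send(datetime.now(), "08:00")
    print(f"Next report in {delay / 3600:.1f} hours")
```

A background task could pass the result to `asyncio.sleep()` before building and sending the report, then repeat.
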
## Configuration

### Email Configuration Keys

Add the following to your [config.yaml](../config/config.yaml):

```yaml
email:
  # SMTP server configuration
  smtp:
    host: "smtp.gmail.com"            # Your SMTP server hostname
    port: 587                         # SMTP port (587 for STARTTLS, 465 for SSL/TLS from the start)
    username: !secret EMAIL_USERNAME  # SMTP username (use !secret for env vars)
    password: !secret EMAIL_PASSWORD  # SMTP password (use !secret for env vars)
    use_tls: true                     # Use STARTTLS encryption
    use_ssl: false                    # Use SSL/TLS from start (mutually exclusive with use_tls)

  # Sender information
  from_address: "noreply@99tales.com"
  from_name: "AlpineBits Monitor"

  # Monitoring and alerting
  monitoring:
    # Daily report configuration
    daily_report:
      enabled: true                   # Enable/disable daily reports
      recipients:
        - "admin@99tales.com"
        - "dev@99tales.com"
      send_time: "08:00"              # Time to send (24h format, local time)
      include_stats: true             # Include application statistics
      include_errors: true            # Include error summary

    # Error alert configuration
    error_alerts:
      enabled: true                   # Enable/disable error alerts
      recipients:
        - "alerts@99tales.com"
        - "oncall@99tales.com"
      error_threshold: 5              # Send immediate alert after N errors
      buffer_minutes: 15              # Wait N minutes before sending buffered errors
      cooldown_minutes: 15            # Wait N minutes before sending another alert
      log_levels:                     # Log levels to monitor
        - "ERROR"
        - "CRITICAL"
```

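Several of these keys constrain each other (`use_tls` versus `use_ssl`, the `send_time` format, non-empty recipient lists), so it can help to check a loaded config before starting the server. The sketch below is a hypothetical helper, not part of the project: `validate_email_config` and its messages are invented for illustration, and it assumes `monitoring` is nested under the `email` key and that the function receives the already-parsed `email` mapping.

```python
from datetime import datetime
from typing import List


def validate_email_config(email_cfg: dict) -> List[str]:
    """Return a list of human-readable problems found in the `email` section."""
    problems = []

    smtp = email_cfg.get("smtp") or {}
    if smtp.get("use_tls") and smtp.get("use_ssl"):
        problems.append("smtp.use_tls and smtp.use_ssl are mutually exclusive")

    monitoring = email_cfg.get("monitoring") or {}

    report = monitoring.get("daily_report") or {}
    if report.get("enabled"):
        try:
            # strptime raises ValueError for anything that is not a valid HH:MM time.
            datetime.strptime(str(report.get("send_time", "")), "%H:%M")
        except ValueError:
            problems.append("daily_report.send_time must be in 24h HH:MM format")
        if not report.get("recipients"):
            problems.append("daily_report.recipients must not be empty")

    alerts = monitoring.get("error_alerts") or {}
    if alerts.get("enabled"):
        threshold = alerts.get("error_threshold", 0)
        if not isinstance(threshold, int) or threshold < 1:
            problems.append("error_alerts.error_threshold must be a positive integer")
        if not alerts.get("recipients"):
            problems.append("error_alerts.recipients must not be empty")

    return problems
```

An empty list means none of these particular checks failed; it does not prove that the SMTP credentials or host are correct.
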
### Environment Variables

For security, store sensitive credentials in environment variables:

```bash
# Create a .env file (never commit this!)
EMAIL_USERNAME=your-smtp-username@gmail.com
EMAIL_PASSWORD=your-smtp-app-password
```

The `annotatedyaml` library automatically loads values marked with `!secret` from environment variables.

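Because a missing variable may only surface later as an authentication error, a quick startup check can make the failure easier to diagnose. This is a small sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and is independent of how `annotatedyaml` resolves `!secret` values.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names from `names` that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(["EMAIL_USERNAME", "EMAIL_PASSWORD"])
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```
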
### Gmail Configuration

If using Gmail, you need to:

1. Enable 2-factor authentication on your Google account
2. Generate an "App Password" for SMTP access
3. Use the app password as `EMAIL_PASSWORD`

**Gmail Settings:**
```yaml
smtp:
  host: "smtp.gmail.com"
  port: 587
  use_tls: true
  use_ssl: false
```

### Other SMTP Providers

**SendGrid:**
```yaml
smtp:
  host: "smtp.sendgrid.net"
  port: 587
  username: "apikey"
  password: !secret SENDGRID_API_KEY
  use_tls: true
```

**AWS SES:**
```yaml
smtp:
  host: "email-smtp.us-east-1.amazonaws.com"
  port: 587
  username: !secret AWS_SES_USERNAME
  password: !secret AWS_SES_PASSWORD
  use_tls: true
```

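The `use_tls` and `use_ssl` flags in these examples correspond to two different ways of opening a connection with Python's `smtplib`: connect in plaintext and upgrade with STARTTLS (typically port 587), or wrap the socket in TLS from the first byte (typically port 465). The sketch below illustrates the distinction; `open_smtp_connection` is a hypothetical helper and not necessarily how `EmailService` is written.

```python
import smtplib
import ssl


def open_smtp_connection(host, port, use_tls=False, use_ssl=False, timeout=10.0):
    """Open an SMTP connection using STARTTLS, implicit TLS, or no encryption."""
    if use_tls and use_ssl:
        raise ValueError("use_tls and use_ssl are mutually exclusive")

    context = ssl.create_default_context()
    if use_ssl:
        # TLS from the start of the connection (commonly port 465).
        return smtplib.SMTP_SSL(host, port, timeout=timeout, context=context)

    server = smtplib.SMTP(host, port, timeout=timeout)
    if use_tls:
        # Plaintext connection upgraded to TLS (commonly port 587).
        server.starttls(context=context)
    return server
```

The returned object supports `login()`, `send_message()`, and `quit()` in both cases, so the rest of the sending code does not need to know which mode was chosen.
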
## Usage

### Automatic Error Monitoring

Once configured, the system automatically captures all `ERROR` and `CRITICAL` log messages:

```python
from alpine_bits_python.logging_config import get_logger

_LOGGER = get_logger(__name__)

# This error will be captured and sent via email
_LOGGER.error("Database connection failed")

# This will also be captured
try:
    risky_operation()  # Placeholder for your own code
except Exception:
    _LOGGER.exception("Operation failed")  # Includes stack trace
```

### Triggering Test Alerts

To test your email configuration, you can manually trigger errors:

```python
import logging

_LOGGER = logging.getLogger(__name__)

# Generate multiple errors to trigger immediate alert (if threshold = 5)
for i in range(5):
    _LOGGER.error("Test error %d", i + 1)
```

### Daily Report Statistics

To include custom statistics in daily reports, set a stats collector function:

```python
async def collect_stats():
    """Collect application statistics for daily report."""
    return {
        "total_reservations": await count_reservations(),
        "new_customers": await count_new_customers(),
        "active_hotels": await count_active_hotels(),
        "api_requests": get_request_count(),
    }

# Register the collector
report_scheduler = app.state.report_scheduler
if report_scheduler:
    report_scheduler.set_stats_collector(collect_stats)
```

## Email Templates

### Error Alert Email

**Subject:** 🚨 AlpineBits Error Alert: 5 errors (threshold exceeded)

**Body:**
```
Error Alert - 2025-10-15 14:30:45
======================================================================

Alert Type: Immediate Alert
Error Count: 5
Time Range: 14:25:00 to 14:30:00
Reason: (threshold of 5 exceeded)

======================================================================

Errors:
----------------------------------------------------------------------

[2025-10-15 14:25:12] ERROR: Database connection timeout
  Module: db:245 (alpine_bits_python.db)

[2025-10-15 14:26:34] ERROR: Failed to process reservation
  Module: api:567 (alpine_bits_python.api)
  Exception:
    Traceback (most recent call last):
    ...

----------------------------------------------------------------------
Generated by AlpineBits Email Monitoring at 2025-10-15 14:30:45
```

### Daily Report Email

**Subject:** AlpineBits Daily Report - 2025-10-15

**Body (HTML, shown here as rendered text):**
```
AlpineBits Daily Report
Date: 2025-10-15

Statistics
┌────────────────────────┬────────┐
│ Metric                 │ Value  │
├────────────────────────┼────────┤
│ total_reservations     │ 42     │
│ new_customers          │ 15     │
│ active_hotels          │ 4      │
│ api_requests           │ 1,234  │
└────────────────────────┴────────┘

Errors (3)
┌──────────────┬──────────┬─────────────────────────┐
│ Time         │ Level    │ Message                 │
├──────────────┼──────────┼─────────────────────────┤
│ 08:15:23     │ ERROR    │ Connection timeout      │
│ 12:45:10     │ ERROR    │ Invalid form data       │
│ 18:30:00     │ CRITICAL │ Database unavailable    │
└──────────────┴──────────┴─────────────────────────┘

Generated by AlpineBits Server
```

## Monitoring and Troubleshooting

### Check Email Configuration

```python
from alpine_bits_python.email_service import create_email_service
from alpine_bits_python.config_loader import load_config

config = load_config()
email_service = create_email_service(config)

if email_service:
    print("✓ Email service configured")
else:
    print("✗ Email service not configured")
```

### Test Email Sending

```python
import asyncio

from alpine_bits_python.email_service import EmailService, EmailConfig


async def test_email():
    config = EmailConfig({
        "smtp": {
            "host": "smtp.gmail.com",
            "port": 587,
            "username": "your-email@gmail.com",
            "password": "your-app-password",
            "use_tls": True,
        },
        "from_address": "sender@example.com",
        "from_name": "Test",
    })

    service = EmailService(config)

    result = await service.send_email(
        recipients=["recipient@example.com"],
        subject="Test Email",
        body="This is a test email from AlpineBits server.",
    )

    if result:
        print("✓ Email sent successfully")
    else:
        print("✗ Email sending failed")


asyncio.run(test_email())
```

### Common Issues

**Issue: "Authentication failed"**
- Verify that the SMTP username and password are correct
- For Gmail, ensure you're using an App Password, not your regular password
- Check that 2FA is enabled on the Google account (App Passwords require it)

**Issue: "Connection timeout"**
- Verify that the SMTP host and port are correct
- Check that firewall rules allow outbound SMTP connections
- Try port 465 with `use_ssl: true` (and `use_tls: false`) instead of port 587 with `use_tls: true`

**Issue: "No email alerts received"**
- Check that `enabled: true` is set under `error_alerts` in the config
- Verify that the recipient email addresses are correct
- Check the application logs for email sending errors
- Ensure errors are being logged at `ERROR` or `CRITICAL` level

**Issue: "Too many emails being sent"**
- Increase `cooldown_minutes` to reduce alert frequency
- Increase `buffer_minutes` to batch more errors together
- Increase `error_threshold` so that immediate alerts fire only on larger bursts of errors

## Performance Considerations

### SMTP is Blocking

Email sending uses the standard Python `smtplib`, which performs blocking I/O. To prevent blocking the async event loop:

- Email operations are automatically run in a thread pool executor
- This happens transparently via `loop.run_in_executor()`
- Request handling is not blocked while an email is being sent

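The general pattern of moving a blocking `smtplib` call off the event loop looks roughly like the sketch below. It is an illustration under stated assumptions rather than the project's code: `send_blocking`, `send_async`, and the injectable `sender` parameter are hypothetical names, and the sketch omits login and TLS for brevity.

```python
import asyncio
import smtplib
from email.message import EmailMessage


def send_blocking(host: str, port: int, message: EmailMessage) -> None:
    """Blocking send via smtplib; intended to run in a worker thread."""
    with smtplib.SMTP(host, port, timeout=10) as server:
        server.send_message(message)


async def send_async(host, port, message, sender=send_blocking) -> bool:
    """Run the blocking sender in the default thread pool; return True on success."""
    loop = asyncio.get_running_loop()
    try:
        # None selects the loop's default ThreadPoolExecutor.
        await loop.run_in_executor(None, sender, host, port, message)
    except (smtplib.SMTPException, OSError):
        return False
    return True
```

While the worker thread waits on the SMTP server, the event loop stays free to serve other coroutines. Returning `False` instead of raising mirrors the "failures are logged, never fatal" behaviour described in this document, although real code would also log the exception.
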
### Memory Usage

- The error buffer is bounded by the `buffer_minutes` window, because buffered errors are sent and cleared once it elapses
- Old errors are automatically cleared after sending
- The daily report error log is cleared after each report
- Typical memory usage: <1 MB for error buffering

### Error Handling

- Email sending failures are logged but never crash the application
- If SMTP is unavailable, errors are still logged to console/file as normal
- The logging handler is exception-safe: failures while handling a log record are caught and never propagate to the application

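An exception-safe handler follows the convention used by the standard library's own handlers: wrap the body of `emit()` in `try`/`except` and delegate failures to `logging.Handler.handleError()`, which reports the problem to stderr (when `logging.raiseExceptions` is true) without re-raising. The class below is a hypothetical sketch of that convention, not the actual `EmailAlertHandler`.

```python
import logging


class SafeAlertHandler(logging.Handler):
    """Hypothetical handler that forwards formatted records and never raises."""

    def __init__(self, on_record, level=logging.ERROR):
        super().__init__(level=level)
        # Callback that receives each formatted record, e.g. a buffer's append.
        self._on_record = on_record

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._on_record(self.format(record))
        except Exception:
            # handleError() reports the failure but does not re-raise it,
            # so a broken alert path cannot break the code that logged.
            self.handleError(record)
```

Setting the handler level to `ERROR` also means that `WARNING` and lower records are filtered out before `emit()` is called.
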
## Security Considerations

1. **Never commit credentials to git**
   - Use `!secret` annotation in YAML
   - Store credentials in environment variables
   - Add `.env` to `.gitignore`

2. **Use TLS/SSL encryption**
   - Always set `use_tls: true` or `use_ssl: true`
   - Never send credentials in plaintext

3. **Limit email recipients**
   - Only send alerts to authorized personnel
   - Use dedicated monitoring email addresses
   - Consider using distribution lists

4. **Sensitive data in logs**
   - Be careful not to log passwords, API keys, or PII
   - Error messages in emails may contain sensitive context
   - Review log messages before enabling email alerts

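For the last point, one possible safeguard is a `logging.Filter` that masks obvious `key=value` secrets before a record reaches any handler. The sketch below is an assumption-laden illustration: the class name `RedactSecretsFilter` and the regular expression are hypothetical, and pattern-based redaction only catches the formats it was written for, so it complements rather than replaces reviewing log messages.

```python
import logging
import re

# Matches e.g. "password=abc", "api_key: abc", "token = abc" (case-insensitive).
_SECRET_PATTERN = re.compile(
    r"(?i)\b(password|passwd|api[_-]?key|token|secret)\b(\s*[=:]\s*)(\S+)"
)


class RedactSecretsFilter(logging.Filter):
    """Hypothetical filter that masks simple key=value secrets in log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        redacted = _SECRET_PATTERN.sub(r"\1\2[REDACTED]", message)
        if redacted != message:
            # Store the already-formatted, redacted text and drop the args
            # so the message is not formatted a second time.
            record.msg = redacted
            record.args = None
        return True  # Never drop the record, only rewrite it.
```

Attaching the filter to a handler with `handler.addFilter(RedactSecretsFilter())` applies it only to that handler's output, for example only to emailed alerts.
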
## Testing

Run the test suite:

```bash
# Test email service only
uv run pytest tests/test_email_service.py -v

# Test with coverage
uv run pytest tests/test_email_service.py --cov=alpine_bits_python.email_service --cov=alpine_bits_python.email_monitoring
```

## Future Enhancements

Potential improvements for future versions:

- [ ] Support for email templates (Jinja2)
- [ ] Configurable retry logic for failed sends
- [ ] Email queuing for high-volume scenarios
- [ ] Integration with external monitoring services (PagerDuty, Slack)
- [ ] Weekly/monthly report options
- [ ] Custom alert rules based on error patterns
- [ ] Email attachments for detailed logs
- [ ] HTML email styling improvements

## References

- [Python smtplib Documentation](https://docs.python.org/3/library/smtplib.html)
- [Python logging Documentation](https://docs.python.org/3/library/logging.html)
- [Gmail SMTP Settings](https://support.google.com/mail/answer/7126229)
- [annotatedyaml Documentation](https://github.com/yourusername/annotatedyaml)