Added email monitoring
423
docs/EMAIL_MONITORING.md
Normal file
@@ -0,0 +1,423 @@
# Email Monitoring and Alerting

This document describes the email monitoring and alerting system for the AlpineBits Python server.

## Overview

The email monitoring system provides two main features:

1. **Error Alerts**: Automatic email notifications when errors occur in the application
2. **Daily Reports**: Scheduled daily summary emails with statistics and error logs

## Architecture

### Components

- **EmailService** ([email_service.py](../src/alpine_bits_python/email_service.py)): Core SMTP email sending functionality
- **EmailAlertHandler** ([email_monitoring.py](../src/alpine_bits_python/email_monitoring.py)): Custom logging handler that captures errors and sends alerts
- **DailyReportScheduler** ([email_monitoring.py](../src/alpine_bits_python/email_monitoring.py)): Background task that sends daily reports

### How It Works

#### Error Alerts (Hybrid Approach)

The `EmailAlertHandler` uses a **hybrid threshold + time-based** approach:

1. **Immediate Alerts**: If the error threshold is reached (e.g., 5 errors), an alert email is sent immediately
2. **Buffered Alerts**: Otherwise, errors accumulate in a buffer and are sent after the buffer duration (e.g., 15 minutes)
3. **Cooldown Period**: After sending an alert, the system waits for a cooldown period before sending another alert to prevent spam

**Flow Diagram:**

```
Error occurs
    ↓
Add to buffer
    ↓
Buffer >= threshold? ──Yes──> Send immediate alert
    ↓ No                              ↓
Wait for buffer time            Reset buffer
    ↓                                 ↓
Send buffered alert             Enter cooldown
    ↓
Reset buffer
```
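
The hybrid policy can be sketched in a few lines of Python. This is a minimal illustration, not the actual `EmailAlertHandler` code: the `AlertBuffer` class, its method names, and the choice to keep errors buffered while a cooldown is active are all assumptions made for the example.

```python
import time
from typing import List, Optional


class AlertBuffer:
    """Hypothetical sketch of the threshold + time-based policy."""

    def __init__(self, error_threshold=5, buffer_seconds=900.0, cooldown_seconds=900.0):
        self.error_threshold = error_threshold
        self.buffer_seconds = buffer_seconds
        self.cooldown_seconds = cooldown_seconds
        self._errors: List[str] = []
        self._first_error_at: Optional[float] = None
        self._last_sent_at: Optional[float] = None

    def add(self, message: str, now: Optional[float] = None) -> Optional[List[str]]:
        """Buffer one error; return a batch to send if the threshold is reached."""
        now = time.monotonic() if now is None else now
        if not self._errors:
            self._first_error_at = now
        self._errors.append(message)
        if len(self._errors) >= self.error_threshold:
            return self._flush(now)
        return None

    def tick(self, now: Optional[float] = None) -> Optional[List[str]]:
        """Call periodically; return a batch once the buffer window has elapsed."""
        now = time.monotonic() if now is None else now
        if self._errors and now - self._first_error_at >= self.buffer_seconds:
            return self._flush(now)
        return None

    def _flush(self, now: float) -> Optional[List[str]]:
        # Assumption: during cooldown nothing is sent and errors stay buffered.
        in_cooldown = (
            self._last_sent_at is not None
            and now - self._last_sent_at < self.cooldown_seconds
        )
        if in_cooldown:
            return None
        batch, self._errors = self._errors, []
        self._first_error_at = None
        self._last_sent_at = now
        return batch
```

Passing `now` explicitly keeps the sketch easy to test without waiting for real time to pass; a caller would hand any returned batch to the email service.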

#### Daily Reports

The `DailyReportScheduler` runs as a background task that:

1. Waits until the configured send time (e.g., 8:00 AM)
2. Collects statistics from the application
3. Gathers errors that occurred during the day
4. Formats and sends an email report
5. Clears the error log
6. Schedules the next report for the following day
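
Steps 1 and 6 both come down to computing how long to sleep until the next occurrence of the send time. The helper below is a hedged sketch of that calculation; `seconds_until` is a hypothetical name, and the real scheduler may handle time zones and daylight saving changes differently.

```python
from datetime import datetime, timedelta


def seconds_until(send_time: str, now: datetime) -> float:
    """Seconds from `now` to the next occurrence of `send_time` ("HH:MM", 24h)."""
    hour, minute = (int(part) for part in send_time.split(":"))
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()
```

A background task could then `await asyncio.sleep(seconds_until("08:00", datetime.now()))` before building and sending each report.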

## Configuration

### Email Configuration Keys

Add the following to your [config.yaml](../config/config.yaml):

```yaml
email:
  # SMTP server configuration
  smtp:
    host: "smtp.gmail.com"            # Your SMTP server hostname
    port: 587                         # SMTP port (587 for TLS, 465 for SSL)
    username: !secret EMAIL_USERNAME  # SMTP username (use !secret for env vars)
    password: !secret EMAIL_PASSWORD  # SMTP password (use !secret for env vars)
    use_tls: true                     # Use STARTTLS encryption
    use_ssl: false                    # Use SSL/TLS from start (mutually exclusive with use_tls)

  # Sender information
  from_address: "noreply@99tales.com"
  from_name: "AlpineBits Monitor"

  # Monitoring and alerting
  monitoring:
    # Daily report configuration
    daily_report:
      enabled: true                   # Enable/disable daily reports
      recipients:
        - "admin@99tales.com"
        - "dev@99tales.com"
      send_time: "08:00"              # Time to send (24h format, local time)
      include_stats: true             # Include application statistics
      include_errors: true            # Include error summary

    # Error alert configuration
    error_alerts:
      enabled: true                   # Enable/disable error alerts
      recipients:
        - "alerts@99tales.com"
        - "oncall@99tales.com"
      error_threshold: 5              # Send immediate alert after N errors
      buffer_minutes: 15              # Wait N minutes before sending buffered errors
      cooldown_minutes: 15            # Wait N minutes before sending another alert
      log_levels:                     # Log levels to monitor
        - "ERROR"
        - "CRITICAL"
```
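
To make the `use_tls` / `use_ssl` distinction concrete, here is a rough sketch of how the two flags could map onto the standard library's `smtplib`. The `open_smtp` function is hypothetical and is not taken from `email_service.py`; only the `smtplib` calls themselves are real.

```python
import smtplib


def open_smtp(host, port, use_tls=False, use_ssl=False, timeout=10.0):
    """Open an SMTP connection according to the two encryption flags."""
    if use_tls and use_ssl:
        raise ValueError("use_tls and use_ssl are mutually exclusive")
    if use_ssl:
        # Implicit TLS: the socket is encrypted from the first byte (typically port 465).
        return smtplib.SMTP_SSL(host, port, timeout=timeout)
    client = smtplib.SMTP(host, port, timeout=timeout)
    if use_tls:
        # STARTTLS: connect in plaintext, then upgrade (typically port 587).
        client.starttls()
    return client
```

Rejecting the invalid combination before connecting means a misconfiguration surfaces as a clear error rather than as a confusing handshake failure.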

### Environment Variables

For security, store sensitive credentials in environment variables:

```bash
# Create a .env file (never commit this!)
EMAIL_USERNAME=your-smtp-username@gmail.com
EMAIL_PASSWORD=your-smtp-app-password
```

The `annotatedyaml` library automatically loads values marked with `!secret` from environment variables.

### Gmail Configuration

If using Gmail, you need to:

1. Enable 2-factor authentication on your Google account
2. Generate an "App Password" for SMTP access
3. Use the app password as `EMAIL_PASSWORD`

**Gmail Settings:**

```yaml
smtp:
  host: "smtp.gmail.com"
  port: 587
  use_tls: true
  use_ssl: false
```

### Other SMTP Providers

**SendGrid:**

```yaml
smtp:
  host: "smtp.sendgrid.net"
  port: 587
  username: "apikey"
  password: !secret SENDGRID_API_KEY
  use_tls: true
```

**AWS SES:**

```yaml
smtp:
  host: "email-smtp.us-east-1.amazonaws.com"
  port: 587
  username: !secret AWS_SES_USERNAME
  password: !secret AWS_SES_PASSWORD
  use_tls: true
```

## Usage

### Automatic Error Monitoring

Once configured, the system automatically captures all `ERROR` and `CRITICAL` log messages:

```python
from alpine_bits_python.logging_config import get_logger

_LOGGER = get_logger(__name__)

# This error will be captured and sent via email
_LOGGER.error("Database connection failed")

# This will also be captured
try:
    risky_operation()
except Exception:
    _LOGGER.exception("Operation failed")  # Includes stack trace
```

### Triggering Test Alerts

To test your email configuration, you can manually trigger errors:

```python
import logging

_LOGGER = logging.getLogger(__name__)

# Generate multiple errors to trigger immediate alert (if threshold = 5)
for i in range(5):
    _LOGGER.error(f"Test error {i + 1}")
```

### Daily Report Statistics

To include custom statistics in daily reports, set a stats collector function:

```python
async def collect_stats():
    """Collect application statistics for daily report."""
    return {
        "total_reservations": await count_reservations(),
        "new_customers": await count_new_customers(),
        "active_hotels": await count_active_hotels(),
        "api_requests": get_request_count(),
    }

# Register the collector
report_scheduler = app.state.report_scheduler
if report_scheduler:
    report_scheduler.set_stats_collector(collect_stats)
```

## Email Templates

### Error Alert Email

**Subject:** 🚨 AlpineBits Error Alert: 5 errors (threshold exceeded)

**Body:**

```
Error Alert - 2025-10-15 14:30:45
======================================================================

Alert Type: Immediate Alert
Error Count: 5
Time Range: 14:25:00 to 14:30:00
Reason: (threshold of 5 exceeded)

======================================================================

Errors:
----------------------------------------------------------------------

[2025-10-15 14:25:12] ERROR: Database connection timeout
Module: db:245 (alpine_bits_python.db)

[2025-10-15 14:26:34] ERROR: Failed to process reservation
Module: api:567 (alpine_bits_python.api)
Exception:
Traceback (most recent call last):
...

----------------------------------------------------------------------
Generated by AlpineBits Email Monitoring at 2025-10-15 14:30:45
```

### Daily Report Email

**Subject:** AlpineBits Daily Report - 2025-10-15

**Body (HTML):**

```html
AlpineBits Daily Report
Date: 2025-10-15

Statistics
┌────────────────────────┬────────┐
│ Metric                 │ Value  │
├────────────────────────┼────────┤
│ total_reservations     │ 42     │
│ new_customers          │ 15     │
│ active_hotels          │ 4      │
│ api_requests           │ 1,234  │
└────────────────────────┴────────┘

Errors (3)
┌──────────────┬──────────┬─────────────────────────┐
│ Time         │ Level    │ Message                 │
├──────────────┼──────────┼─────────────────────────┤
│ 08:15:23     │ ERROR    │ Connection timeout      │
│ 12:45:10     │ ERROR    │ Invalid form data       │
│ 18:30:00     │ CRITICAL │ Database unavailable    │
└──────────────┴──────────┴─────────────────────────┘

Generated by AlpineBits Server
```

## Monitoring and Troubleshooting

### Check Email Configuration

```python
from alpine_bits_python.email_service import create_email_service
from alpine_bits_python.config_loader import load_config

config = load_config()
email_service = create_email_service(config)

if email_service:
    print("✓ Email service configured")
else:
    print("✗ Email service not configured")
```

### Test Email Sending

```python
import asyncio
from alpine_bits_python.email_service import EmailService, EmailConfig

async def test_email():
    config = EmailConfig({
        "smtp": {
            "host": "smtp.gmail.com",
            "port": 587,
            "username": "your-email@gmail.com",
            "password": "your-app-password",
            "use_tls": True,
        },
        "from_address": "sender@example.com",
        "from_name": "Test",
    })

    service = EmailService(config)

    result = await service.send_email(
        recipients=["recipient@example.com"],
        subject="Test Email",
        body="This is a test email from AlpineBits server.",
    )

    if result:
        print("✓ Email sent successfully")
    else:
        print("✗ Email sending failed")

asyncio.run(test_email())
```

### Common Issues

**Issue: "Authentication failed"**
- Verify SMTP username and password are correct
- For Gmail, ensure you're using an App Password, not your regular password
- Check that 2FA is enabled on your Google account

**Issue: "Connection timeout"**
- Verify SMTP host and port are correct
- Check firewall rules allow outbound SMTP connections
- Try port 465 with `use_ssl: true` instead of port 587 with `use_tls: true`

**Issue: "No email alerts received"**
- Check that `enabled: true` is set in the config
- Verify recipient email addresses are correct
- Check application logs for email sending errors
- Ensure errors are being logged at ERROR or CRITICAL level

**Issue: "Too many emails being sent"**
- Increase `cooldown_minutes` to reduce alert frequency
- Increase `buffer_minutes` to batch more errors together
- Increase `error_threshold` to only alert on serious issues

## Performance Considerations

### SMTP is Blocking

Email sending uses the standard Python `smtplib`, which performs blocking I/O. To prevent blocking the async event loop:

- Email operations are automatically run in a thread pool executor
- This happens transparently via `loop.run_in_executor()`
- Request handling is not blocked while an email is being sent
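
The pattern looks roughly like the sketch below. `send_blocking` is a stand-in for the real `smtplib` call and `send_async` is a hypothetical wrapper; the heartbeat task exists only to show that the event loop keeps running while the blocking call is in progress.

```python
import asyncio
import time


def send_blocking(message: str) -> bool:
    """Stand-in for a blocking smtplib send (sleeps instead of talking to a server)."""
    time.sleep(0.05)
    return True


async def send_async(message: str) -> bool:
    """Run the blocking send in the default thread pool executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, send_blocking, message)


async def main():
    ticks = 0

    async def heartbeat():
        # Keeps ticking only if the event loop is not blocked.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(heartbeat())
    ok = await send_async("hello")
    task.cancel()
    return ok, ticks
```

Running `asyncio.run(main())` returns the send result together with the number of heartbeat ticks that happened while the send was in flight.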

### Memory Usage

- Error buffer size is limited by `buffer_minutes` duration
- Old errors are automatically cleared after sending
- Daily report error log is cleared after each report
- Typical memory usage: <1 MB for error buffering

### Error Handling

- Email sending failures are logged but never crash the application
- If SMTP is unavailable, errors are logged to console/file as normal
- The logging handler has exception safety - it will never cause application failures
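
One common way to get this kind of exception safety in a `logging.Handler` is to wrap the body of `emit()` and delegate failures to `handleError()`, which reports the problem without raising. The sketch below illustrates the idea; `SafeAlertHandler` and its `deliver` callback are hypothetical names, not the project's actual handler.

```python
import logging


class SafeAlertHandler(logging.Handler):
    """Hypothetical handler that forwards ERROR+ records and never raises."""

    def __init__(self, deliver):
        super().__init__(level=logging.ERROR)
        self._deliver = deliver  # callable that takes the formatted message

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._deliver(self.format(record))
        except Exception:
            # Standard logging hook: reports the failure to stderr
            # (when logging.raiseExceptions is true) and swallows it.
            self.handleError(record)
```

With this shape, a broken SMTP connection inside `deliver` costs one alert but leaves the calling code unaffected.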

## Security Considerations

1. **Never commit credentials to git**
   - Use `!secret` annotation in YAML
   - Store credentials in environment variables
   - Add `.env` to `.gitignore`

2. **Use TLS/SSL encryption**
   - Always set `use_tls: true` or `use_ssl: true`
   - Never send credentials in plaintext

3. **Limit email recipients**
   - Only send alerts to authorized personnel
   - Use dedicated monitoring email addresses
   - Consider using distribution lists

4. **Sensitive data in logs**
   - Be careful not to log passwords, API keys, or PII
   - Error messages in emails may contain sensitive context
   - Review log messages before enabling email alerts
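
As an extra safeguard for point 4, a `logging.Filter` can mask obvious secrets before a record reaches the email handler. This is only a sketch under assumptions: the `RedactSecretsFilter` class and the regular expression are illustrative, they are not part of the documented system, and a pattern like this will not catch every form of sensitive data.

```python
import logging
import re

# Illustrative pattern: matches "password=...", "api_key: ...", "token=..." and similar.
_SECRET_PATTERN = re.compile(r"(?i)\b(password|api[_-]?key|token)\b(\s*[=:]\s*)(\S+)")


class RedactSecretsFilter(logging.Filter):
    """Hypothetical filter that replaces secret-looking values with [REDACTED]."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so secrets passed as args are covered too.
        message = record.getMessage()
        record.msg = _SECRET_PATTERN.sub(r"\1\2[REDACTED]", message)
        record.args = ()
        return True  # never drop the record, only rewrite it
```

The filter can be attached with `handler.addFilter(RedactSecretsFilter())`. Note that it rewrites the record in place, so handlers that process the same record afterwards see the redacted text as well.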

## Testing

Run the test suite:

```bash
# Test email service only
uv run pytest tests/test_email_service.py -v

# Test with coverage
uv run pytest tests/test_email_service.py --cov=alpine_bits_python.email_service --cov=alpine_bits_python.email_monitoring
```

## Future Enhancements

Potential improvements for future versions:

- [ ] Support for email templates (Jinja2)
- [ ] Configurable retry logic for failed sends
- [ ] Email queuing for high-volume scenarios
- [ ] Integration with external monitoring services (PagerDuty, Slack)
- [ ] Weekly/monthly report options
- [ ] Custom alert rules based on error patterns
- [ ] Email attachments for detailed logs
- [ ] HTML email styling improvements

## References

- [Python smtplib Documentation](https://docs.python.org/3/library/smtplib.html)
- [Python logging Documentation](https://docs.python.org/3/library/logging.html)
- [Gmail SMTP Settings](https://support.google.com/mail/answer/7126229)
- [annotatedyaml Documentation](https://github.com/yourusername/annotatedyaml)