Database
MenoTime uses Amazon RDS PostgreSQL as the primary relational database for patient records, clinical data, appointments, and platform state. This document covers database configuration, connection management, backup strategy, and migration procedures.
RDS PostgreSQL Configuration
Instance Specifications
All three environments (Dev, Staging, Production) use the same instance class to ensure parity and realistic staging testing.
| Configuration | Value |
|---|---|
| Engine | PostgreSQL 15.x |
| Instance Class | db.m7g.large |
| vCPU | 2 cores |
| Memory | 8 GB RAM |
| Storage Type | gp3 (General Purpose SSD) |
| Storage Size | Dev: 100GB, Staging: 500GB, Prod: 1TB |
| Storage Auto-Scaling | Enabled (max 20% beyond allocated) |
| IOPS | 3,000 baseline (configurable) |
| Throughput | 125 MB/s baseline (configurable) |
| Region | us-west-1 (N. California) |
| Availability | Single-AZ (Multi-AZ upgrade available) |
Why db.m7g.large?
Capacity Analysis:
- vCPU: 2 cores, sufficient for ~500 concurrent connections
- Memory: 8 GB provides 25% overhead for caching and working set
- Network: Up to 10 Gbps burst capability
- Cost: ~$460/month per instance (3 environments = ~$1,380 baseline)
- Performance: consistent (non-burstable) Graviton3 CPU, suited to steady OLTP workloads
Upgrade Path:
- If production patient volume exceeds 5,000: upgrade to db.m7g.xlarge (4 vCPU, 16 GB)
- Multi-AZ recommended for production at scale (adds 50% cost)
Network Configuration
Subnet Group: menotime-db-subnet-group
- Spans private subnets in us-west-1a and us-west-1b
- No public internet access (secure by default)
Security Group: menotime-db-sg
Inbound Rules:
- PostgreSQL (5432) from menotime-ecs-sg (ECS security group)
- PostgreSQL (5432) from menotime-bastion-sg (bastion host, if used)
Outbound Rules:
- Deny all (RDS doesn't initiate outbound connections)
Encryption:
- At Rest: KMS-managed encryption (customer-managed key alias/menotime-master)
- In Transit: SSL/TLS required for all connections
- Connection String: postgresql://user:password@host:5432/menotime_prod?sslmode=require
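TLS can also be enforced at the driver level so connections stay encrypted even if a connection URL is missing the sslmode parameter. A minimal sketch, assuming psycopg2 as the driver and a DATABASE_URL environment variable (the variable name matches the pool configuration later in this document):

```python
# Minimal sketch: enforce TLS at the driver level. connect_args takes
# precedence over URL query parameters, so this guarantees encryption
# even if the URL is misconfigured.
import os

from sqlalchemy import create_engine

DATABASE_URL = os.environ["DATABASE_URL"]

engine = create_engine(
    DATABASE_URL,
    connect_args={
        "sslmode": "require",  # refuse unencrypted connections
        # For certificate verification as well, use "verify-full" plus
        # "sslrootcert": "<path to the RDS CA bundle>".
    },
)
```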
Parameter Group
Name: menotime-postgresql-15-params
Key Parameters:
```
max_connections = 500
shared_buffers = 2GB                      # 25% of total memory
effective_cache_size = 6GB                # 75% of total memory
maintenance_work_mem = 512MB
work_mem = 8MB                            # kept small: each of up to 500 connections can allocate this per sort/hash
wal_buffers = 16MB
min_wal_size = 2GB
max_wal_size = 8GB
checkpoint_completion_target = 0.9
log_min_duration_statement = 1000         # log queries >1s
log_connections = on
log_disconnections = on
log_statement = 'mod'                     # 'all' in development; 'mod' (modifications only) in production
log_duration = on
shared_preload_libraries = 'pgaudit,pg_stat_statements'
```
pgAudit Configuration (HIPAA compliance):
```
pgaudit.log = 'ALL'
pgaudit.log_level = 'LOG'
pgaudit.role = 'audit'
```
Logs all database activity including:
- SELECT (with pgaudit.log = 'ALL')
- INSERT, UPDATE, DELETE
- DDL (CREATE, ALTER, DROP)
- User authentication
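To confirm the parameter group and pgAudit settings actually took effect on a given instance, the live values can be read back over a normal connection. A short sketch, assuming the same DATABASE_URL setup shown above; the pgaudit.* settings are only readable once pgaudit is in shared_preload_libraries:

```python
# Sketch: read back live settings to verify the parameter group applied.
import os

from sqlalchemy import create_engine, text

engine = create_engine(os.environ["DATABASE_URL"])

CHECKS = [
    "max_connections",
    "shared_buffers",
    "shared_preload_libraries",
    "pgaudit.log",
    "pgaudit.log_level",
]

with engine.connect() as conn:
    for name in CHECKS:
        value = conn.execute(
            text("SELECT current_setting(:name)"), {"name": name}
        ).scalar()
        print(f"{name} = {value}")
```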
Maintenance Windows
Backup Window: 03:00 UTC daily (8:00 PM PDT)
- Automated snapshots retained for 7 days
- No downtime (backups are taken from the standby when Multi-AZ is enabled)
Maintenance Window: Sunday 04:00-05:00 UTC (Saturday 9:00-10:00 PM PDT)
- Applied automatically
- Minor version upgrades (e.g., 15.2 → 15.3)
- Security patches applied with minimal downtime
- Coordinate with the on-call engineer for production
Backup Strategy
Automated Backups:
- Retention: 7 days
- Frequency: daily at 03:00 UTC
- Location: AWS-managed (us-west-1 region)
- RPO (Recovery Point Objective): 24 hours
- RTO (Recovery Time Objective): 5-10 minutes (restore from backup)
Point-in-Time Recovery:
- Enabled (leverages transaction logs)
- Can restore to any point within the 7-day retention window
- Useful for recovering from accidental data deletion
Manual Snapshots:
- Created before major application changes
- Created before environment promotions (for easy rollback)
- Command:
```bash
aws rds create-db-snapshot \
  --db-instance-identifier menotime-prod \
  --db-snapshot-identifier menotime-prod-pre-v1.2.3-20240201
```
Snapshot Lifecycle:
```
Automated Backup (7 days) → Manual Snapshot (indefinite)
            ↓
   Copy to menotime-backups S3
            ↓
   Lifecycle: move to Glacier after 30 days
```
Copy to S3:
- Automated Lambda function triggers nightly (see the sketch below)
- Creates an RDS snapshot export to S3 in Parquet format
- Retention: 2 years in Glacier (lowest-cost archival)
- Use case: long-term compliance, historical analysis
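A hypothetical sketch of that nightly Lambda using boto3: it finds the newest automated snapshot of menotime-prod and exports it to S3 as Parquet. The IAM role ARN is a placeholder, and the use of the alias/menotime-master key is an assumption based on the encryption section above:

```python
# Hypothetical nightly export Lambda (sketch, not the production function).
import datetime

import boto3

rds = boto3.client("rds", region_name="us-west-1")

def handler(event, context):
    # Find the most recent automated snapshot for the prod instance.
    snapshots = rds.describe_db_snapshots(
        DBInstanceIdentifier="menotime-prod",
        SnapshotType="automated",
    )["DBSnapshots"]
    if not snapshots:
        raise RuntimeError("No automated snapshots found for menotime-prod")
    latest = max(snapshots, key=lambda s: s["SnapshotCreateTime"])

    # Kick off the snapshot-to-S3 export (lands in Parquet format).
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d")
    rds.start_export_task(
        ExportTaskIdentifier=f"menotime-prod-export-{stamp}",
        SourceArn=latest["DBSnapshotArn"],
        S3BucketName="menotime-backups",
        S3Prefix="exports/",
        IamRoleArn="arn:aws:iam::ACCOUNT_ID:role/EXPORT_ROLE",  # placeholder
        KmsKeyId="alias/menotime-master",  # key from the encryption section
    )
```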
Database Schema & Migrations
Schema Overview
Primary Tables:
- users — Patient accounts and clinic staff
- patients — Detailed patient profiles with HIPAA PII
- appointments — Clinic visits and consultations
- clinical_notes — Medical records and observations
- medications — Prescribed treatments
- measurements — Vital signs and symptom tracking
- audit_log — All data access and modifications (HIPAA)
Migration Tool: Alembic
Purpose: Database schema versioning and migration management
Location: /migrations/ directory (in application repository)
Structure:
migrations/
├── alembic.ini
├── env.py
├── script.py.mako
└── versions/
├── 001_initial_schema.py
├── 002_add_patient_preferences.py
├── 003_add_audit_logging.py
└── ...
Migration Naming Convention:
```
{sequence}_{short_description}.py

001_initial_schema.py
002_add_patient_preferences.py
003_add_audit_logging.py
```
Sequence numbers auto-increment.
Migration Workflow
Before Deployment:

1. Create Migration:
   ```bash
   alembic revision --autogenerate -m "Add patient preferences table"
   ```
2. Test Migration (local & dev):
   ```bash
   alembic upgrade head
   ```
3. Test Rollback:
   ```bash
   alembic downgrade -1
   alembic upgrade head
   ```
4. Include in Commit:
   - Migration files in /migrations/versions/
   - Application code using the new schema
   - Database tests validating the migration

During Deployment:

1. ECS Task Definition: includes a migration step
   ```bash
   # In container startup
   alembic upgrade head
   gunicorn main:app
   ```
2. Health Check: waits for migration completion before running the app
3. Timeout: 60 seconds for migrations (increase for large migrations)

Post-Deployment:

1. Validation: the revision in the alembic_version table matches the expected migration (see the sketch below)
2. Rollback (if needed):
   ```bash
   alembic downgrade {previous_revision}
   ```
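A minimal sketch of the validation step, assuming the DATABASE_URL from earlier sections; EXPECTED_REVISION is a hypothetical stand-in for the revision id shipped with the release:

```python
# Sketch: confirm the live schema revision matches what the release expects.
import os

from sqlalchemy import create_engine, text

EXPECTED_REVISION = "003"  # hypothetical: the revision this release ships

engine = create_engine(os.environ["DATABASE_URL"])
with engine.connect() as conn:
    current = conn.execute(text("SELECT version_num FROM alembic_version")).scalar()

if current != EXPECTED_REVISION:
    raise RuntimeError(
        f"Schema at revision {current}, expected {EXPECTED_REVISION}; "
        "roll back or re-run migrations"
    )
```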
Example Migration File
```python
# migrations/versions/003_add_audit_logging.py
from alembic import op
import sqlalchemy as sa

revision = '003'
down_revision = '002'
branch_labels = None
depends_on = None

def upgrade():
    op.create_table(
        'audit_log',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('user_id', sa.Integer, sa.ForeignKey('users.id')),
        sa.Column('table_name', sa.String(255)),
        sa.Column('action', sa.String(10)),  # INSERT, UPDATE, DELETE
        sa.Column('old_values', sa.JSON),
        sa.Column('new_values', sa.JSON),
        # server_default (not default) so the database itself fills the timestamp
        sa.Column('timestamp', sa.DateTime, server_default=sa.func.now()),
    )
    op.create_index('ix_audit_log_timestamp', 'audit_log', ['timestamp'])

def downgrade():
    op.drop_table('audit_log')
```
Connection Management
Connection Strings by Environment
Development:
postgresql://menotime_user:${DB_PASSWORD}@menotime-dev.xxxxx.us-west-1.rds.amazonaws.com:5432/menotime_dev?sslmode=require
Staging:
postgresql://menotime_user:${DB_PASSWORD}@menotime-staging.xxxxx.us-west-1.rds.amazonaws.com:5432/menotime_staging?sslmode=require
Production:
postgresql://menotime_user:${DB_PASSWORD}@menotime-prod.xxxxx.us-west-1.rds.amazonaws.com:5432/menotime_prod?sslmode=require
In these strings, xxxxx is a placeholder for the environment-specific RDS instance identifier suffix; ${DB_PASSWORD} is injected from AWS Secrets Manager at deploy time.
Connection Pool Configuration
Application: FastAPI + SQLAlchemy
Pool Settings (in main.py):
```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=10,        # connections kept open in the pool
    max_overflow=20,     # additional connections under load (30 max per task)
    pool_pre_ping=True,  # verify a connection is alive before using it
    pool_recycle=3600,   # recycle connections every hour
    echo_pool=False,     # disable verbose pool logging
)
```
Pool Monitoring:
- Monitor via CloudWatch: DatabaseConnections metric
- Alert if connections exceed 400 (80% of 500 max)
- Idle connections recycled after 1 hour
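Per-task pool usage can also be surfaced from the application itself, so it can be correlated with the CloudWatch DatabaseConnections metric. A sketch using the engine configured above; the /health/db route name is an assumption:

```python
# Sketch: expose connection-pool stats from the FastAPI app.
from fastapi import FastAPI

app = FastAPI()

@app.get("/health/db")
def db_pool_status():
    pool = engine.pool
    return {
        "pool_size": pool.size(),          # persistent connections (pool_size)
        "checked_out": pool.checkedout(),  # connections currently in use
        "overflow": pool.overflow(),       # connections beyond pool_size
        "status": pool.status(),           # human-readable summary
    }
```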
Direct Database Access (for operations)
Via AWS Console:
1. RDS Dashboard → Databases → menotime-prod
2. Click "Create database proxy" for connection pooling (optional)
3. Database cannot be publicly accessed; use AWS Secrets Manager for credentials
Via Bastion Host (recommended for production access):
```bash
# SSH to bastion (jump host in public subnet)
ssh ec2-user@bastion.menotime.ai

# Connect to database
psql -h menotime-prod.xxxxx.us-west-1.rds.amazonaws.com -U menotime_user -d menotime_prod
```
Via AWS Systems Manager Session Manager (zero-trust):
```bash
aws ssm start-session --target i-xxxxxxxxx

# Inside the session:
psql -h menotime-prod.xxxxx.us-west-1.rds.amazonaws.com -U menotime_user -d menotime_prod
```
Credentials from Secrets Manager:
```bash
aws secretsmanager get-secret-value \
  --secret-id menotime/db/prod \
  --query SecretString \
  --output text | jq .
```
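The same lookup in Python, for services that assemble the connection URL at startup. The JSON field names (username, password, host, dbname) are assumptions about the secret's layout:

```python
# Sketch: build DATABASE_URL from the Secrets Manager secret.
import json

import boto3

sm = boto3.client("secretsmanager", region_name="us-west-1")
secret = json.loads(
    sm.get_secret_value(SecretId="menotime/db/prod")["SecretString"]
)

DATABASE_URL = (
    f"postgresql://{secret['username']}:{secret['password']}"
    f"@{secret['host']}:5432/{secret['dbname']}?sslmode=require"
)
```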
Performance Insights & Monitoring
Enable Performance Insights
Already enabled in staging and production.
What it tracks:
- Active session analytics (database load over time)
- SQL queries and execution plans
- Wait events (where database time is spent)
- Resource utilization (CPU, I/O, lock contention)
CloudWatch Metrics (Primary Monitoring)
| Metric | Threshold | Action |
|---|---|---|
| DatabaseConnections | >400 | Investigate connection pool; consider scale-up |
| CPUUtilization | >75% sustained | Scale up to db.m7g.xlarge or optimize queries |
| FreeableMemory | \<1 GB | Investigate memory pressure; consider scale-up or tuning shared_buffers |
| DiskQueueDepth | >10 | Investigate slow queries; increase IOPS |
| ReadLatency | >5ms sustained | Check I/O utilization; consider provisioned IOPS |
| WriteLatency | >10ms sustained | Check WAL activity; increase max_wal_size |
| TransactionLogsDiskUsage | >80GB | Investigate long-running transactions or stalled replication slots |
| ReplicaLag | >1s | Not applicable (Single-AZ currently) |
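The DatabaseConnections threshold from this table can be codified as a CloudWatch alarm. A sketch with boto3; the SNS topic ARN is a placeholder:

```python
# Sketch: alarm when connections exceed 80% of max_connections (500).
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-1")

cloudwatch.put_metric_alarm(
    AlarmName="menotime-prod-db-connections-high",
    Namespace="AWS/RDS",
    MetricName="DatabaseConnections",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "menotime-prod"}],
    Statistic="Average",
    Period=300,               # 5-minute windows
    EvaluationPeriods=3,      # sustained for 15 minutes
    Threshold=400,            # 80% of max_connections = 500
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-west-1:ACCOUNT_ID:ONCALL_TOPIC"],  # placeholder
)
```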
Performance Insights Queries
Top Queries by Load (sample):
```sql
SELECT
  queryid,
  query,
  calls,
  total_exec_time,
  mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
Identifying Lock Contention:
```sql
SELECT
  pid,
  usename,
  query_start,
  state_change,
  query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY query_start;
```
Table Sizes (to identify bloat):
```sql
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```
Scaling & Capacity Planning
Current Capacity
db.m7g.large supports:
- ~500 active connections
- ~50,000 transactions/second (OLTP workload)
- ~1TB of data (expandable)
Growth Milestones
| Patient Volume | Recommendations | Action |
|---|---|---|
| \<500 | Current config (db.m7g.large) | Monitor CPU/memory |
| 500-2,000 | Consider Multi-AZ (high availability) | Enable Multi-AZ in production |
| 2,000-5,000 | Monitor closely; prepare upgrade plan | Test upgrade to db.m7g.xlarge in staging |
| >5,000 | Upgrade to db.m7g.xlarge or larger | Implement upgrade during maintenance |
Upgrade Path
Scaling Up (minimal downtime with Multi-AZ):
1. Upgrade the RDS instance class in the AWS Console
2. If Multi-AZ: the upgrade is applied to the standby first, then failover (30-60 sec downtime)
3. If Single-AZ: downtime during the upgrade (1-2 minutes)
4. Monitor for 30 minutes post-upgrade
Command:
```bash
aws rds modify-db-instance \
  --db-instance-identifier menotime-prod \
  --db-instance-class db.m7g.xlarge \
  --apply-immediately  # or --no-apply-immediately to wait for the maintenance window
```
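After issuing the modification, boto3's built-in waiter can block until the instance is back in the available state before the 30-minute post-upgrade watch begins. A minimal sketch:

```python
# Sketch: wait for the instance to return to `available` after a modification.
import boto3

rds = boto3.client("rds", region_name="us-west-1")

waiter = rds.get_waiter("db_instance_available")
waiter.wait(
    DBInstanceIdentifier="menotime-prod",
    WaiterConfig={"Delay": 30, "MaxAttempts": 60},  # poll every 30s, up to 30 min
)
print("menotime-prod is available again")
```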
Multi-AZ Migration (Recommended for Production)
Current: Single-AZ (cost: baseline)
Multi-AZ: synchronous replication across availability zones (cost: +50%)
Benefits:
- Automatic failover if an AZ fails (RTO: 60-120 seconds)
- Backups taken from the standby (no performance impact)
- Planned maintenance with minimal downtime
Process:
```bash
aws rds modify-db-instance \
  --db-instance-identifier menotime-prod \
  --multi-az \
  --apply-immediately
```
Backup & Disaster Recovery
Backup Verification Checklist
Weekly Task:
□ Verify automated backup completed (CloudWatch Backup dashboard)
□ Test restore from backup to staging (every month)
□ Check backup size and retention policy
□ Review S3 snapshot copy status
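The first checklist item lends itself to automation. A sketch that fails if the newest automated snapshot is older than ~25 hours (one backup cycle plus slack; the threshold is an assumption):

```python
# Sketch: weekly backup-freshness check for menotime-prod.
import datetime

import boto3

rds = boto3.client("rds", region_name="us-west-1")

snapshots = rds.describe_db_snapshots(
    DBInstanceIdentifier="menotime-prod",
    SnapshotType="automated",
)["DBSnapshots"]
if not snapshots:
    raise RuntimeError("No automated snapshots found for menotime-prod")

# Ignore snapshots still being created (they have no creation time yet).
latest = max(
    s["SnapshotCreateTime"] for s in snapshots if "SnapshotCreateTime" in s
)
age = datetime.datetime.now(datetime.timezone.utc) - latest

if age > datetime.timedelta(hours=25):
    raise RuntimeError(f"Latest automated backup is {age} old - investigate")
print(f"Backup OK: latest snapshot is {age} old")
```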
Disaster Recovery Scenarios
Scenario 1: Accidental Data Deletion
Recovery time: \<5 minutes
```bash
# 1. Identify the time of deletion
# 2. Restore to a point in time just before the deletion
#    (--restore-time and --use-latest-restorable-time are mutually exclusive;
#     use the latter to restore to the most recent recoverable point instead)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier menotime-prod \
  --target-db-instance-identifier menotime-prod-recovery \
  --restore-time 2024-02-01T10:00:00Z

# 3. Verify data restored correctly
# 4. Promote recovery instance to primary (DNS cutover)
# 5. Monitor for 30 minutes
```
Scenario 2: RDS Instance Failure
Recovery time: \<2 minutes (Multi-AZ) or \<10 minutes (restore from backup)
With Multi-AZ:
- Automatic failover to the standby replica
- No manual intervention required
- DNS updated automatically

Without Multi-AZ:
- Restore from the latest automated backup
- May lose up to 24 hours of data
- Consider enabling Multi-AZ immediately
Scenario 3: Entire Region Failure (unlikely)
Recovery time: >30 minutes
```bash
# 1. Copy the RDS snapshot to an alternate region (run from the destination
#    region; the source snapshot must be referenced by ARN)
aws rds copy-db-snapshot \
  --region us-east-1 \
  --source-region us-west-1 \
  --source-db-snapshot-identifier arn:aws:rds:us-west-1:ACCOUNT_ID:snapshot:menotime-prod-backup \
  --target-db-snapshot-identifier menotime-prod-recovery-us-east-1

# 2. Restore from the snapshot in us-east-1
# 3. Update application DNS to the us-east-1 endpoint
# 4. Validate data integrity
```
Database Maintenance
Monthly Tasks
First Week:
- Review CPU/memory utilization trends
- Check connection pool health
- Verify no bottlenecks in Performance Insights

Second Week:
- Run VACUUM ANALYZE (reclaims dead tuples and refreshes planner statistics)
- Review and archive old audit logs
- Check index bloat (reindex if >20% bloated)

Third Week:
- Test backup restoration (staging environment)
- Review slow query logs
- Identify candidates for query optimization

Fourth Week:
- Update the parameter group if needed
- Plan capacity for the upcoming month
- Review cost allocation and instance utilization
Quarterly Tasks
- Full index rebuild (REINDEX) to clear accumulated bloat
- Major version compatibility testing (in staging)
- Disaster recovery drill (restore and validate)
- Security audit (IAM access, audit logs review)
Operational Commands
Common Operations
Check Database Size:
```bash
# Returns allocated storage in GiB (provisioned, not space used)
aws rds describe-db-instances \
  --db-instance-identifier menotime-prod \
  --query 'DBInstances[0].AllocatedStorage'
```
View Current Connections:
```bash
# Connect to database
psql -h menotime-prod.xxxxx.us-west-1.rds.amazonaws.com -U menotime_user -d menotime_prod
```
```sql
-- List active connections
SELECT count(*) FROM pg_stat_activity;
```
Kill a Long-Running Query:
```sql
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query LIKE 'SELECT %'
  AND state = 'active'
  AND query_start < now() - interval '1 hour';
```
Export Data to S3 (for analysis):
```bash
# Exports run against a snapshot, not the live instance; --iam-role-arn and
# --kms-key-id are required for export tasks.
aws rds start-export-task \
  --export-task-identifier menotime-prod-export-20240201 \
  --source-arn arn:aws:rds:us-west-1:ACCOUNT_ID:snapshot:menotime-prod-backup \
  --s3-bucket-name menotime-backups \
  --s3-prefix exports/ \
  --iam-role-arn arn:aws:iam::ACCOUNT_ID:role/EXPORT_ROLE \
  --kms-key-id alias/menotime-master \
  --export-only menotime_prod.public.patients menotime_prod.public.clinical_notes  # optional: specific tables
```
Summary
MenoTime's database foundation is built on RDS PostgreSQL with automated backups, encryption, and HIPAA-compliant audit logging. The db.m7g.large instance provides ample capacity for current and near-term growth, with clear upgrade and Multi-AZ migration paths for future scaling.
Key operational practices:
- Daily automated backups (7-day retention)
- Alembic-managed schema migrations
- Performance Insights for optimization
- Monthly maintenance and validation
- Regular disaster recovery testing
For deployment procedures, see ECS Fargate. For monitoring, see Monitoring.