Cost Management
This document details MenoTime's infrastructure costs, cost breakdown by service, optimization opportunities, budgeting strategy, and monthly review process. Cost management is critical to maintaining efficient healthcare delivery and maximizing reinvestment in product development.
Current Cost Profile
Monthly Costs by Patient Volume
| Patient Volume | Monthly Cost | Per-Patient Cost | Key Driver |
|---|---|---|---|
| 250 patients | ~$616 | $2.46/patient | Fixed infrastructure |
| 500 patients | ~$750 | $1.50/patient | Scaling efficiency |
| 1,000 patients | ~$896 | $0.90/patient | Economies of scale |
| 2,000 patients | ~$1,200 | $0.60/patient | Threshold for Multi-AZ |
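The per-patient column is the monthly total divided by patient volume. A minimal sketch that reproduces it from the table's figures (the dictionary and function names are illustrative, not part of any MenoTime tooling):

```python
# Monthly totals copied from the table above (USD, approximate).
MONTHLY_COST_BY_VOLUME = {250: 616, 500: 750, 1000: 896, 2000: 1200}


def cost_per_patient(monthly_cost: float, patients: int) -> float:
    """Return the monthly infrastructure cost per patient, rounded to cents."""
    if patients <= 0:
        raise ValueError("patients must be positive")
    return round(monthly_cost / patients, 2)


if __name__ == "__main__":
    for patients, cost in MONTHLY_COST_BY_VOLUME.items():
        print(f"{patients:>5} patients: ${cost_per_patient(cost, patients):.2f}/patient")
```

Because most of the bill is fixed infrastructure, the per-patient figure falls as volume grows even though the total rises.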
Cost Breakdown (at 250 patients)
Total Monthly Cost: ~$616
| Service | Cost | % of Total | Annual |
|---|---|---|---|
| RDS PostgreSQL (3 instances) | $425 | 69% | $5,100 |
| ECS Fargate Compute | $95 | 15% | $1,140 |
| NAT Gateways (2×) | $64 | 10% | $768 |
| ALB (1×) | $16 | 3% | $192 |
| CloudFront | $5 | 1% | $60 |
| CloudWatch/Logs | $8 | 1% | $96 |
| Miscellaneous | $3 | 1% | $36 |
RDS dominates at 69% of total spend, so it is the primary optimization focus.
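The share of each service can be recomputed from the raw costs. The sketch below assumes the 250-patient figures from the table; the names are hypothetical, and it keeps one decimal place so very small line items do not round to zero:

```python
# Service costs at 250 patients, copied from the breakdown table (USD/month).
COSTS_AT_250 = {
    "RDS PostgreSQL": 425,
    "ECS Fargate": 95,
    "NAT Gateways": 64,
    "ALB": 16,
    "CloudFront": 5,
    "CloudWatch/Logs": 8,
    "Miscellaneous": 3,
}


def cost_shares(costs: dict[str, float]) -> dict[str, float]:
    """Return each service's share of the total, in percent (one decimal)."""
    total = sum(costs.values())
    return {name: round(100 * amount / total, 1) for name, amount in costs.items()}


if __name__ == "__main__":
    for name, share in cost_shares(COSTS_AT_250).items():
        print(f"{name:<16} {share:>5.1f}%")
```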
Cost Breakdown at 1,000 Patients
Total Monthly Cost: ~$896
| Service | Cost | % of Total | Notes |
|---|---|---|---|
| RDS PostgreSQL (3 instances) | $425 | 47% | Still the largest share, but a lower percentage |
| ECS Fargate Compute | $250 | 28% | More tasks scaling |
| NAT Gateways (2×) | $64 | 7% | Fixed cost (data charges vary) |
| ALB (1×) | $16 | 2% | Fixed cost |
| Data Transfer (inter-AZ) | $75 | 8% | New cost at scale |
| CloudFront | $20 | 2% | Cache + static assets |
| CloudWatch/Logs | $30 | 3% | Increased logging |
| Miscellaneous | $16 | 2% | Various services |
Service-by-Service Cost Analysis
RDS PostgreSQL (~$425/month for 3 instances)
Current Configuration (all environments):
Instance Class: db.m7g.large
vCPU: 2 cores × 3 instances = 6 vCPUs
Memory: 8 GB × 3 instances = 24 GB
Storage: 100GB (dev) + 500GB (staging) + 1TB (prod) = 1.6 TB
Location: us-west-1 (N. California)
Cost per instance: ~$141/month
Cost Drivers:
1. Instance Class (largest cost): db.m7g.large = $141/month
2. Storage: ~$5/month per 100GB (gp3)
3. Backup Storage: ~$5-10/month (7-day retention)
4. Data Transfer: Minimal (internal traffic)
Optimization Opportunities:
Option 1: Reserved Instances (RI), ~22% Savings
All-Upfront 1-Year RI:
db.m7g.large: ~$1,320 per instance
3 instances: ~$3,960 for the 1-year term
Amortized: $425/month → $330/month (saves $95/month, $1,140/year)
All-Upfront 3-Year RI:
db.m7g.large: ~$3,180 per instance
3 instances: ~$9,540 for the 3-year term
Amortized: $425/month → $265/month (saves $160/month, $1,920/year)
Recommendation: Implement after 6 months of stable usage (validate growth assumptions)
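The savings figures follow from comparing the on-demand monthly cost with the RI's effective (amortized) monthly cost. A small sketch of that comparison, using the amortized figures quoted in this section; the function name is illustrative:

```python
def ri_savings(on_demand_monthly: float, ri_effective_monthly: float) -> dict[str, float]:
    """Compare on-demand spend with an RI's amortized monthly cost."""
    monthly = on_demand_monthly - ri_effective_monthly
    return {
        "monthly": monthly,
        "annual": monthly * 12,
        "percent": round(100 * monthly / on_demand_monthly, 1),
    }


if __name__ == "__main__":
    print("1-year:", ri_savings(425, 330))
    print("3-year:", ri_savings(425, 265))
```

The all-upfront payment for a term is the amortized monthly cost multiplied by the number of months in the term, which is a useful cross-check before approving a purchase.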
Option 2: Downsize Dev Instance
Current: Dev = db.m7g.large ($141/month)
Alternative: Dev = db.m7g.medium ($70/month)
Impact:
- Saves $71/month × 12 = $852/year
- Risk: May be slow for developers (mitigated by larger staging)
- Mitigation: Upgrade before dev team expands
Option 3: Single-AZ to Multi-AZ Trade-off
Current: Single-AZ = $141/month
Multi-AZ: Single-AZ + Standby = $282/month
For Production:
Current: $141/month (single-AZ, manual failover)
Multi-AZ: $282/month (automatic failover, RTO <2 min)
Premium: $141/month for high availability
Decision: Defer until 500+ patients, when the availability gain justifies the $141/month premium
Option 4: Migrate to Aurora PostgreSQL (Future)
Current RDS PostgreSQL: ~$425/month (3 instances)
Aurora PostgreSQL (comparable): ~$300/month
Savings: ~$125/month ($1,500/year)
Trade-offs:
+ Automatic scaling, better HA
+ Native read replicas for analytics
- More complex operational model
- Connection pooling behavior differs
- Migration requires careful testing
Timeline: Evaluate in Year 2 if significant data volume
ECS Fargate Compute (~$95/month at 250 patients)
Current Configuration:
Production: 2 tasks × (1 vCPU + 2 GB RAM) at ~$0.05/hour per task × 730 = $73/month
Staging: 2 tasks × (0.5 vCPU + 1 GB RAM) at ~$0.023/hour per task × 730 = $34/month
Development: 1 task × (0.5 vCPU + 1 GB RAM) at ~$0.023/hour per task × 730 = $17/month
Total: ~$124/month list price → actual ~$95/month (credits/discounts)
Optimization Opportunities:
Option 1: Fargate Spot (up to 70% Discount)
Production Service: Mix 70% Spot + 30% On-Demand
Before: 2 tasks × $0.05/hour × 730 = $73/month
After: (1.4 Spot tasks × $0.05 × 0.3 + 0.6 On-Demand tasks × $0.05) × 730
= $15 + $22 = $37/month
Savings: ~$36/month
Risk: Spot can be interrupted (mitigated by multi-task setup)
Recommendation: Implement when comfortable with fault tolerance
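A minimal sketch of the blended-cost arithmetic, assuming Spot tasks are billed at (1 − discount) of the on-demand rate while the remaining tasks pay the full rate. The function and parameter names are hypothetical, and the 70% discount is the upper bound quoted above rather than a guaranteed price:

```python
HOURS_PER_MONTH = 730


def blended_fargate_cost(
    tasks: float,
    hourly_rate: float,
    spot_fraction: float,
    spot_discount: float,
    hours: int = HOURS_PER_MONTH,
) -> float:
    """Monthly cost when a fraction of tasks runs on Fargate Spot."""
    if not 0 <= spot_fraction <= 1 or not 0 <= spot_discount <= 1:
        raise ValueError("spot_fraction and spot_discount must be in [0, 1]")
    spot_tasks = tasks * spot_fraction
    on_demand_tasks = tasks - spot_tasks
    spot_cost = spot_tasks * hourly_rate * (1 - spot_discount) * hours
    on_demand_cost = on_demand_tasks * hourly_rate * hours
    return spot_cost + on_demand_cost


if __name__ == "__main__":
    before = blended_fargate_cost(2, 0.05, spot_fraction=0.0, spot_discount=0.7)
    after = blended_fargate_cost(2, 0.05, spot_fraction=0.7, spot_discount=0.7)
    print(f"before=${before:.2f} after=${after:.2f} savings=${before - after:.2f}")
```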
Option 2: Reduce Non-Production Task Count
Staging: Currently 2 tasks → reduce to 1 task (off-hours) or 1 baseline + 1 bursting
Development: Currently 1 task → keep as-is
Savings: ~$17/month (minimal, staging useful for QA)
Recommendation: Not worth operational complexity; keep current
Option 3: Right-size Task Memory
Current: All dev/staging tasks = 1 GB memory
Review memory usage via CloudWatch
If consistently <500MB: downsize to 0.5 GB
Savings: 20-30% (minimal, ~$5/month)
Recommendation: Defer; current sizing is reasonable
NAT Gateways (~$64/month for 2×)
Current Configuration:
Primary NAT GW: us-west-1a = $32/month
Secondary NAT GW: us-west-1b = $32/month (redundancy)
Data Processing: ~$0-10/month
Cost Drivers:
1. Fixed hourly charge: $0.045/hour × 730 = $32.85
2. Data processing: $0.045 per GB (minimized by VPC Endpoints)
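The two cost drivers combine into a simple monthly estimate. The sketch below uses the rates listed above as defaults; the function name and the 100 GB example volume are assumptions for illustration:

```python
def nat_gateway_monthly_cost(
    gateways: int,
    data_gb: float,
    hourly_rate: float = 0.045,
    per_gb_rate: float = 0.045,
    hours: int = 730,
) -> float:
    """Fixed hourly charge for each gateway plus per-GB data processing."""
    fixed = gateways * hourly_rate * hours
    processing = data_gb * per_gb_rate
    return fixed + processing


if __name__ == "__main__":
    print(f"2 gateways, no traffic: ${nat_gateway_monthly_cost(2, 0):.2f}")
    print(f"2 gateways, 100 GB:     ${nat_gateway_monthly_cost(2, 100):.2f}")
```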
Optimization Opportunities:
Option 1: Single NAT Gateway (Cost) vs. Redundancy (Availability)
Current: 2 NAT Gateways = $64/month
Alternative: 1 NAT Gateway = $32/month (saves $32/month = $384/year)
Trade-off:
Current: If AZ fails → NAT GW in other AZ remains available
Alternative: If the NAT GW's AZ fails → NAT GW down, ECS tasks can't reach the internet
(ECR pulls, SES email blocked) → ~15-30 min impact
Recommendation: Keep 2 NAT Gateways for HA; savings not justified
Option 2: Expand VPC Endpoints (Reduce Data Charges)
Current Endpoints:
- S3 Gateway Endpoint (free)
- Secrets Manager Interface Endpoint (~$7/month)
- ECR Interface Endpoint (~$7/month)
Additional:
- SQS Interface Endpoint (if async messaging added) ~$7/month
- SNS Interface Endpoint (if direct SNS use) ~$7/month
Data Processing Savings:
- Secrets Manager: ~$0.045/GB NAT data processing → avoid by using endpoint
- ECR: ~$0.045/GB NAT data processing → avoid by using endpoint
Current estimated savings: $2-5/month (minimal)
Recommendation: Already optimized; these endpoints are standard
ALB (Application Load Balancer) (~$16/month)
Current Configuration:
ALB Instance Charge: $0.0225/hour × 730 = $16.43/month
Data Processing: ~$0.006 per LCU (Load Balancer Capacity Unit)
- Currently processing ~50 million requests/month
- Cost: 50M / 1M × $0.006 = $0.30/month (negligible)
Optimization Opportunities:
Option 1: Keep ALB (Recommended)
Benefits:
- Layer 7 routing (path-based rules)
- WAF integration
- Multi-AZ HA
- SSL/TLS termination
- CloudFront integration
Cost: $16/month is reasonable for these capabilities
Recommendation: Not a cost optimization target
Option 2: Use CloudFront as primary (If static-heavy)
If MenoTime becomes static-asset-heavy:
- ALB → CloudFront origin
- Reduce ALB requests (caching)
- More cost-effective if 50%+ cache hit rate
Current: Mixed API + static = ALB necessary
Recommendation: Revisit if content distribution needs change
CloudFront (CDN) (~$5/month at 250 patients)
Cost Drivers:
Data Transfer Out: $0.085 per GB
Current: ~60 GB/month = $5.10/month
Request Pricing:
HTTP/HTTPS: $0.0075 per 10k requests
Current: ~5 million requests/month = $3.75/month
(Both charges are billed together; the ~$5/month figure above corresponds to the data transfer component)
Optimization Opportunities:
Option 1: Increase Cache Hit Ratio
Current: ~40% cache hit rate
Target: >70% cache hit rate
Actions:
- Longer cache TTLs on static assets (currently 24h, increase to 7 days)
- Versioned asset filenames (cache forever)
- Compress responses (gzip, brotli) — enabled
Savings: If cache hit rate 40% → 70%
Data transfer: ~60 GB → ~25 GB = saves $3/month
Recommendation: Implement versioned asset names in CI/CD
Option 2: Origin Shield (Cost vs. Hit Ratio)
Feature: Additional caching layer between CloudFront and origin
Cost: ~$0.005 per 1,000 requests
Benefit: Reduces origin load; improves cache hit ratio
At 5M requests/month: $25/month additional
Payoff: Only if reducing ALB/ECS load matters (not currently)
Recommendation: Skip unless scaling beyond 2,000 patients
CloudWatch & Logs (~$8-30/month)
Cost Breakdown:
CloudWatch Metrics:
- Custom metrics (API latency): ~$0.30 per metric per month
- 10 custom metrics = $3/month
CloudWatch Logs Ingestion:
- $0.50 per GB ingested
- Current: ~15 GB/month = $7.50/month
CloudWatch Logs Storage:
- $0.03 per GB per month stored
- Current: 30-day retention = ~10 GB stored = $0.30/month
Total: ~$10-12/month (at 250 patients)
At 1,000 patients: ~$25-30/month (3-4× logs volume)
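The breakdown above sums three independent charges. A short sketch of that sum, with the rates from this section as defaults (the function name is illustrative):

```python
def cloudwatch_monthly_cost(
    custom_metrics: int,
    ingested_gb: float,
    stored_gb: float,
    metric_rate: float = 0.30,
    ingest_rate: float = 0.50,
    storage_rate: float = 0.03,
) -> float:
    """Sum of custom-metric, log-ingestion, and log-storage charges (USD/month)."""
    return (
        custom_metrics * metric_rate
        + ingested_gb * ingest_rate
        + stored_gb * storage_rate
    )


if __name__ == "__main__":
    # Figures quoted for 250 patients: 10 metrics, 15 GB ingested, 10 GB stored.
    print(f"${cloudwatch_monthly_cost(10, 15, 10):.2f}/month")
```

Ingestion is by far the largest term, so log volume rather than retention drives this line item.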
Optimization Opportunities:
Option 1: Reduce Log Verbosity
Current: DEBUG logs in all environments
Alternative: INFO in production, DEBUG in staging only
Reduction:
- Production logs: 5 GB/month → 2 GB/month
- Savings: $1.50/month (not significant)
Recommendation: Keep DEBUG for prod troubleshooting; value > cost
Option 2: Tiered Log Retention
Current: All logs 30 days (CloudWatch)
Alternative:
- CloudWatch: 7 days
- S3 Glacier: 2 years (long-term compliance)
Savings: ~$2/month (move some to S3)
Recommendation: Implement with compliance requirement
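One way to apply the 7-day CloudWatch tier is boto3's `put_retention_policy` call on the CloudWatch Logs client. The sketch below is an assumption about how this could be scripted, not the team's actual tooling: the helper name and the log group name in the usage comment are hypothetical, and archiving to S3 Glacier is a separate step not shown here.

```python
from typing import Iterable, List


def set_log_retention(client, log_groups: Iterable[str], days: int = 7) -> List[str]:
    """Apply the same retention period to each log group.

    `client` is expected to be a boto3 CloudWatch Logs client, i.e. an object
    exposing put_retention_policy(logGroupName=..., retentionInDays=...).
    """
    updated = []
    for name in log_groups:
        client.put_retention_policy(logGroupName=name, retentionInDays=days)
        updated.append(name)
    return updated


# Usage (requires boto3 and AWS credentials; the log group name is hypothetical):
#   import boto3
#   set_log_retention(boto3.client("logs"), ["/ecs/menotime-prod"], days=7)
```

Passing the client in as an argument keeps the helper easy to dry-run against a stub before touching real log groups.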
Cost Optimization Roadmap
Immediate (Month 1-3) — Low Risk
- Implement Reserved Instances (RDS)
  - Savings: $95/month ($1,140/year)
  - Timeline: 2 weeks to order and apply, after validating stable usage for 4 weeks
  - Risk: Low (1-year term is the shortest commitment; RDS RIs cannot be cancelled once purchased)
  - Impact: 15% infrastructure cost reduction
- Enable Versioned Asset Filenames (CloudFront)
  - Savings: $3/month ($36/year)
  - Timeline: 1 week (CI/CD change)
  - Risk: Low
  - Impact: Minimal savings but best practice
- Audit Current Usage
  - Savings: Identify $50-100/month waste
  - Timeline: 1 day (analysis)
  - Risk: None
  - Impact: Quick wins
Short-term (Month 3-6) — Medium Risk
- Downsize Dev RDS Instance (db.m7g.medium)
  - Savings: $71/month ($852/year)
  - Timeline: 2 weeks (test in staging, migrate dev)
  - Risk: Medium (slower dev environment)
  - Mitigation: Monitor feedback; upgrade if needed
- Implement Spot Instances (ECS)
  - Savings: $30-50/month ($360-600/year)
  - Timeline: 3 weeks (test Spot failure scenarios)
  - Risk: Medium (task interruption possible)
  - Mitigation: Ensure 2+ task redundancy, monitor interruptions
- Cost Allocation Tags (Governance)
  - Savings: ~$20/month (waste reduction via visibility)
  - Timeline: 2 weeks (tag all resources)
  - Risk: Low
  - Impact: Foundation for ongoing cost management
Medium-term (Month 6-12) — Higher Risk
- Enable Multi-AZ for Production RDS
  - Savings: None (cost increase $141/month)
  - Timeline: 4 weeks (failover testing)
  - Risk: High (complex migration)
  - ROI: HA + reduced RTO/RPO; justified at 500+ patients
  - Decision: Trigger when hitting 500 patients or P2 uptime target
- Consider Aurora PostgreSQL Migration
  - Savings: $125/month ($1,500/year)
  - Timeline: 8-12 weeks (extensive testing)
  - Risk: High (schema, connection behavior changes)
  - Mitigation: Migrate staging first; parallel run test
  - Decision: Evaluate in Year 2; low priority currently
Long-term (Year 2+)
- Advanced Optimization
  - Compute Savings Plans (instead of RI)
  - Reserved Capacity (Fargate)
  - Serverless optimization (Lambda for scheduled tasks)
  - Multi-region DR (if expansion warranted)
Cost Monitoring & Budgets
AWS Budgets Setup
Budget 1: Monthly Infrastructure Cost
Limit: $700/month (~14% buffer above $616)
Actions:
- Alert at 80%: $560 → Email ops team
- Alert at 100%: $700 → Page on-call engineer
Frequency: Daily notification
Goal: Stay below $650/month (10% savings buffer)
Budget 2: RDS Costs
Limit: $450/month (~6% above $425)
Actions:
- Alert at 90%: $405 → Review slow queries
- Alert at 100%: $450 → Consider scaling
Goal: Monitor RDS spending (largest cost driver)
Budget 3: Compute (ECS + EC2)
Limit: $150/month
Actions:
- Alert at 80%: $120 → Check auto-scaling policies
Goal: Monitor compute utilization
Budget 4: Data Transfer
Limit: $100/month
Actions:
- Alert at 80%: $80 → Check cross-region activity
Goal: Avoid surprise data transfer charges
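The alert amounts are each budget's limit multiplied by the alert percentage. The sketch below models the four budgets as plain data and reports which alerts a given spend would trigger; the `Budget` class and its method names are hypothetical and unrelated to the AWS Budgets API:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    name: str
    limit: float
    alert_percents: tuple[int, ...]

    def thresholds(self) -> dict[int, float]:
        """Map each alert percentage to its dollar threshold."""
        return {p: self.limit * p / 100 for p in self.alert_percents}

    def triggered(self, spend: float) -> list[int]:
        """Return the alert percentages that `spend` has reached."""
        return [p for p in self.alert_percents if spend >= self.limit * p / 100]


# Limits and alert levels copied from the budgets above.
BUDGETS = [
    Budget("Monthly Infrastructure", 700, (80, 100)),
    Budget("RDS", 450, (90, 100)),
    Budget("Compute", 150, (80,)),
    Budget("Data Transfer", 100, (80,)),
]

if __name__ == "__main__":
    for budget in BUDGETS:
        print(budget.name, budget.thresholds())
```

Running `triggered` against current spend before creating a budget shows whether an alert would fire immediately; for example, the RDS 90% threshold ($405) is already below the current $425 RDS spend.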
Cost Allocation Tags (Required)
All resources tagged with:
| Tag | Values | Purpose |
|---|---|---|
| Environment | prod, staging, dev | Cost by environment |
| CostCenter | engineering, operations, infrastructure | Cost by team |
| Application | menotime | Cost by product |
| Temporary | true, false | Identify test resources to clean up |
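A tag policy like this is straightforward to check mechanically. The following sketch validates a resource's tags against the table above; it assumes tags arrive as a plain key/value dictionary, and the function name is illustrative:

```python
# Required tags and their allowed values, copied from the table above.
REQUIRED_TAGS = {
    "Environment": {"prod", "staging", "dev"},
    "CostCenter": {"engineering", "operations", "infrastructure"},
    "Application": {"menotime"},
    "Temporary": {"true", "false"},
}


def tag_violations(tags: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the tags are compliant."""
    problems = []
    for key, allowed in REQUIRED_TAGS.items():
        if key not in tags:
            problems.append(f"missing tag: {key}")
        elif tags[key] not in allowed:
            problems.append(f"invalid value for {key}: {tags[key]!r}")
    return problems


if __name__ == "__main__":
    print(tag_violations({"Environment": "prod", "Application": "menotime"}))
```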
Monthly Cost Report Template:
Infrastructure Cost Report - February 2024
Total Cost: $625
Budget: $700 (89% utilization)
By Environment:
Production: $425 (68%)
Staging: $140 (22%)
Development: $60 (10%)
By Service:
RDS: $425 (68%)
Fargate: $95 (15%)
NAT GW: $64 (10%)
Other: $41 (7%)
Key Metrics:
Patients: 250
Cost/Patient: $2.50
Reserved Instance Savings: (not yet implemented)
Optimizations in Progress:
- RDS RI procurement (in progress)
- CloudFront cache tuning (in progress)
Next Month Action Items:
- Approve RDS RI purchase (savings $1,140/year)
- Monitor Spot instance pilot (if deployed)
Monthly Cost Review Process
First Thursday of Each Month (30 minutes, operations team)
Agenda
- Actual vs. Budget (5 min)
  ```
  Last month actual: $625
  Budget: $700
  Variance: -$75 (favorable)

  YTD: $3,750 (6 months × $625)
  YTD Budget: $4,200
  ```
- Cost Trends (5 min)
  - Month-over-month comparison
  - Identify spikes (new services, increased traffic)
  - Patient volume tracking
- Optimization Progress (10 min)
  - Review roadmap status
  - Measure savings from implemented optimizations
  - Identify new opportunities
- Anomalies (5 min)
  - Unusual service charges
  - Untagged resources (cleanup)
  - Forecast for next quarter
- Decisions (5 min)
  - Approve new purchases/upgrades
  - Adjust budgets if needed
  - Prioritize next optimizations
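The variance figures in the first agenda item are actual minus budget, so a negative number is favorable. A minimal sketch (the function name is illustrative):

```python
def budget_variance(actual: float, budget: float) -> tuple[float, float]:
    """Return (variance in dollars, variance as a percent of budget)."""
    variance = actual - budget
    return variance, round(100 * variance / budget, 1)


if __name__ == "__main__":
    print("Last month:", budget_variance(625, 700))
    print("YTD:       ", budget_variance(6 * 625, 6 * 700))
```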
Example Monthly Report
Date: 2024-02-01
Attendees: VP Ops, Eng Lead, Finance
Last Month Cost: $625 (Budget: $700, -10.7%)
RDS Spending: $425
- Consistent with forecast
- RI procurement approved (1-year, saves $95/month)
- Implementation timeline: 2 weeks
Fargate Spending: $95
- Within expected range
- Spot pilot deferred (lower priority)
NAT Gateway Spending: $64
- No changes planned
Patient Growth:
- 250 → 265 patients (+6%)
- Cost/patient trending down (economies of scale)
- Forecast: 500 patients Q3 2024
Action Items:
☑ Order RDS 1-year RI (due: 2024-02-10)
☐ Monitor RI discounts post-implementation
☐ Schedule RDS Multi-AZ eval for Q3 (500+ patient trigger)
☐ Review Fargate Spot readiness (defer 30 days)
Next Review: 2024-03-01
Cost Transparency to Leadership
Monthly Executive Dashboard
Intended Audience: CEO, CFO, VP Operations
Top-level Metrics:
┌─────────────────────────────────────────┐
│ Total Monthly Cost: $625 │
│ Cost per Patient: $2.50 │
│ Cost per Patient (trended) ↘ 5% month │
│ YTD Infrastructure Spend: $3,750 │
│ YTD Budget: $4,200 │
│ YTD Variance: -$450 (11%) │
│ │
│ Forecast (full year): $7,500 │
│ Forecast Savings (RI): -$1,140 │
│ Adjusted Forecast: $6,360 │
└─────────────────────────────────────────┘
Key Insights:
- Platform scales efficiently (cost/patient declining)
- Infrastructure sized appropriately for 250 patients
- Multiple optimization opportunities pending (RI, Spot)
- Estimated $1,140/year savings from reserved instances
Risk/Opportunities:
- Multi-AZ upgrade needed at 500 patients (+$141/month)
- Aurora migration opportunity (Year 2, $1,500/year savings)
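The dashboard's forecast lines are a flat run-rate projection less expected savings. A sketch under that assumption (it ignores patient growth and mid-year changes; the function name is hypothetical):

```python
def full_year_forecast(monthly_cost: float, annual_savings: float = 0.0, months: int = 12) -> float:
    """Project annual spend from a flat monthly run rate, less expected savings."""
    return monthly_cost * months - annual_savings


if __name__ == "__main__":
    baseline = full_year_forecast(625)
    adjusted = full_year_forecast(625, annual_savings=1140)
    print(f"baseline=${baseline:,.0f} adjusted=${adjusted:,.0f}")
```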
Summary
MenoTime's infrastructure cost at 250 patients is $616/month, dominated by RDS (69%) and ECS Fargate (15%). Key optimization opportunities include:
Quick Wins (Implement Now):
- RDS Reserved Instances: $1,140/year savings
- CloudFront cache tuning: $36/year savings

Medium-term (3-6 months):
- Spot instances: $360-600/year savings
- Dev RDS downsize: $852/year savings

Strategic (Year 2+):
- Multi-AZ: Availability upgrade, required at scale
- Aurora: $1,500/year savings, requires migration
Monthly Process: Review costs, track budgets, measure optimization ROI, forecast growth impact.
Cost management is an ongoing discipline with clear ROI on optimization efforts. Implement Reserved Instances first (highest impact, lowest risk).
For infrastructure planning and scaling, see Database, ECS Fargate, and Environments.