Cost Management

This document details MenoTime's infrastructure costs, cost breakdown by service, optimization opportunities, budgeting strategy, and monthly review process. Cost management is critical to maintaining efficient healthcare delivery and maximizing reinvestment in product development.

Current Cost Profile

Monthly Costs by Patient Volume

Patient Volume	Monthly Cost	Per-Patient Cost	Key Driver
250 patients	~$616	$2.46/patient	Fixed infrastructure
500 patients	~$750	$1.50/patient	Scaling efficiency
1,000 patients	~$896	$0.90/patient	Economies of scale
2,000 patients	~$1,200	$0.60/patient	Threshold for Multi-AZ

Cost Breakdown (at 250 patients)

Total Monthly Cost: ~$616

Service	Cost	% of Total	Annual
RDS PostgreSQL (3 instances)	$425	69%	$5,100
ECS Fargate Compute	$95	15%	$1,140
NAT Gateways (2×)	$64	10%	$768
ALB (1×)	$16	3%	$192
CloudFront	$5	1%	$60
CloudWatch/Logs	$8	1%	$96
Miscellaneous	$3	1%	$36

RDS dominates at 69%, making it the primary optimization focus.
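The shares and annual figures in the table follow from simple arithmetic on the monthly line items. Here is a minimal Python sketch that recomputes them. The service names and amounts are copied from the table, and nothing here calls AWS.

```python
# Monthly cost per service at 250 patients, copied from the table above (USD).
BREAKDOWN_250 = {
    "RDS PostgreSQL (3 instances)": 425,
    "ECS Fargate Compute": 95,
    "NAT Gateways (2x)": 64,
    "ALB (1x)": 16,
    "CloudFront": 5,
    "CloudWatch/Logs": 8,
    "Miscellaneous": 3,
}
PATIENTS = 250


def summarize(breakdown: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the monthly total and each service's share of it, in percent."""
    total = sum(breakdown.values())
    shares = {name: 100 * cost / total for name, cost in breakdown.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = summarize(BREAKDOWN_250)
    print(f"Total: ${total:,.0f}/month, ${total * 12:,.0f}/year, "
          f"${total / PATIENTS:.2f}/patient")
    # Largest cost first: that is where optimization effort pays off most.
    for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"  {name:<30} {pct:5.1f}%")
```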

Cost Breakdown at 1,000 Patients

Total Monthly Cost: ~$896

Service	Cost	% of Total	Notes
RDS PostgreSQL (3 instances)	$425	47%	Still the largest share, but a lower percentage
ECS Fargate Compute	$250	28%	More tasks scaling
NAT Gateways (2×)	$64	7%	Fixed cost (data charges vary)
ALB (1×)	$16	2%	Fixed cost
Data Transfer (inter-AZ)	$75	8%	New cost at scale
CloudFront	$20	2%	Cache + static assets
CloudWatch/Logs	$30	3%	Increased logging
Miscellaneous	$16	2%	Various services

Service-by-Service Cost Analysis

RDS PostgreSQL (~$425/month for 3 instances)

Current Configuration (all environments):

Instance Class: db.m7g.large
vCPU: 2 cores × 3 instances = 6 vCPUs
Memory: 8 GB × 3 instances = 24 GB
Storage: 100GB (dev) + 500GB (staging) + 1TB (prod) = 1.6 TB
Location: us-west-1 (N. California)
Cost per instance: ~$141/month

Cost Drivers:
1. Instance Class (largest cost): db.m7g.large = $141/month
2. Storage: ~$5/month per 100GB (gp3)
3. Backup Storage: ~$5-10/month (7-day retention)
4. Data Transfer: Minimal (internal traffic)

Optimization Opportunities:

Option 1: Reserved Instances (RI), ~22-38% Savings

All-Upfront 1-Year RI:
  db.m7g.large: ~$1,320 per instance
  3 instances: ~$3,960 upfront (1-year term)
  Amortized: $425/month → $330/month (saves $95/month, $1,140/year)

All-Upfront 3-Year RI:
  db.m7g.large: ~$3,180 per instance
  3 instances: ~$9,540 upfront (3-year term)
  Amortized: $425/month → $265/month (saves $160/month, $1,920/year)

Recommendation: Implement after 6 months of stable usage (validate growth assumptions)
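Working backward from the amortized monthly figures gives the percentage saving and the total prepaid over each term. This sketch assumes the full $425/month is instance-hour spend that the RIs would cover. RDS RIs do not discount storage or backup charges.

```python
ON_DEMAND_MONTHLY = 425.0  # current RDS spend for 3 instances (USD/month)

# Amortized monthly cost and term length for each all-upfront option above.
RI_OPTIONS = {
    "1-year": {"amortized_monthly": 330.0, "term_months": 12},
    "3-year": {"amortized_monthly": 265.0, "term_months": 36},
}


def ri_savings(on_demand: float, amortized: float, term_months: int) -> dict[str, float]:
    """Savings implied by moving from the on-demand rate to an amortized RI rate."""
    monthly = on_demand - amortized
    return {
        "monthly": monthly,
        "annual": monthly * 12,
        "percent": 100 * monthly / on_demand,
        # Total paid upfront for all instances over the whole term.
        "upfront_total": amortized * term_months,
    }


if __name__ == "__main__":
    for name, opt in RI_OPTIONS.items():
        s = ri_savings(ON_DEMAND_MONTHLY, opt["amortized_monthly"], opt["term_months"])
        print(f"{name}: saves ${s['monthly']:.0f}/month (${s['annual']:,.0f}/year, "
              f"{s['percent']:.1f}%), upfront ${s['upfront_total']:,.0f}")
```

Comparing `upfront_total` with a vendor quote is a quick check that an amortized figure was computed over the right term.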

Option 2: Downsize Dev Instance

Current: Dev = db.m7g.large ($141/month)
Alternative: Dev = db.m7g.medium ($70/month)

Impact:
- Saves $71/month × 12 = $852/year
- Risk: May be slow for developers (mitigated by larger staging)
- Mitigation: Upgrade before dev team expands

Option 3: Single-AZ to Multi-AZ Trade-off

Current: Single-AZ = $141/month
Multi-AZ: Single-AZ + Standby = $282/month

For Production:
  Current: $141/month (single-AZ, manual failover)
  Multi-AZ: $282/month (automatic failover, RTO <2 min)
  Premium: $141/month for high availability

Decision: Defer until 500+ patients, when the availability premium is justified

Option 4: Migrate to Aurora PostgreSQL (Future)

Current RDS PostgreSQL: ~$425/month (3 instances)
Aurora PostgreSQL (comparable): ~$300/month

Savings: ~$125/month ($1,500/year)
Trade-offs:
  + Automatic scaling, better HA
  + Native read replicas for analytics
  - More complex operational model
  - Connection pooling behavior differs
  - Migration requires careful testing

Timeline: Evaluate in Year 2 if data volume grows significantly

ECS Fargate Compute (~$95/month at 250 patients)

Current Configuration:

Production:  2 tasks × (1 vCPU + 2 GB)   = 2 × $0.05/hour × 730  = $73/month
Staging:     2 tasks × (0.5 vCPU + 1 GB) = 2 × $0.023/hour × 730 = $34/month
Development: 1 task × (0.5 vCPU + 1 GB)  = 1 × $0.023/hour × 730 = $17/month
Total: ~$124/month list price → actual ~$95/month (credits/discounts)
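Fargate bills per vCPU-hour and per GB-hour, so each task's hourly rate can be rebuilt from its size. This sketch assumes us-east-1 Linux/x86 on-demand list rates ($0.04048 per vCPU-hour, $0.004445 per GB-hour). The us-west-1 rates may differ and should be substituted.

```python
HOURS_PER_MONTH = 730

# ASSUMPTION: us-east-1 Linux/x86 on-demand Fargate list rates (USD).
# Replace with the us-west-1 rates from the Fargate pricing page.
VCPU_HOUR = 0.04048
GB_HOUR = 0.004445

# (vCPU per task, GB per task, task count) for each environment, as listed above.
SERVICES = {
    "production": (1.0, 2.0, 2),
    "staging": (0.5, 1.0, 2),
    "development": (0.5, 1.0, 1),
}


def task_hourly(vcpu: float, memory_gb: float) -> float:
    """On-demand hourly price of one running task."""
    return vcpu * VCPU_HOUR + memory_gb * GB_HOUR


def service_monthly(vcpu: float, memory_gb: float, tasks: int,
                    hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of a service that keeps `tasks` copies running all month."""
    return task_hourly(vcpu, memory_gb) * tasks * hours


if __name__ == "__main__":
    total = 0.0
    for env, (vcpu, mem, tasks) in SERVICES.items():
        cost = service_monthly(vcpu, mem, tasks)
        total += cost
        print(f"{env:<12} ${task_hourly(vcpu, mem):.4f}/task-hour  ${cost:6.2f}/month")
    print(f"{'total':<12} ${total:6.2f}/month (list price, before credits)")
```

At these rates a 1 vCPU / 2 GB task comes to about $0.049/hour, in line with the $0.05 used above, and a 0.5 vCPU / 1 GB task to about $0.025/hour.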

Optimization Opportunities:

Option 1: Fargate Spot (Up to 70% Discount)

Production Service: Mix 70% Spot + 30% On-Demand

Before: 2 tasks × $0.05/hour × 730 = $73/month
After: 1.4 tasks × $0.05 × 0.3 × 730 (Spot) + 0.6 tasks × $0.05 × 730 (On-Demand)
      = $15 + $22 = ~$37/month
Savings: ~$36/month

Risk: Spot can be interrupted (mitigated by multi-task setup)
Recommendation: Implement when comfortable with fault tolerance
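The blend is easiest to check as a function of the Spot fraction. This sketch assumes a flat 70% Spot discount (AWS advertises "up to" 70%, and real Fargate Spot prices vary) and the $0.05 per-task hourly rate used above. Fractional task counts are monthly averages.

```python
HOURS_PER_MONTH = 730


def blended_monthly(tasks: float, task_hourly: float, spot_fraction: float,
                    spot_discount: float, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost when `spot_fraction` of the task-hours run on Fargate Spot."""
    spot_tasks = tasks * spot_fraction
    on_demand_tasks = tasks - spot_tasks
    # Spot task-hours pay (1 - discount) of the on-demand rate; the rest pay full price.
    spot_cost = spot_tasks * task_hourly * (1 - spot_discount) * hours
    on_demand_cost = on_demand_tasks * task_hourly * hours
    return spot_cost + on_demand_cost


if __name__ == "__main__":
    before = blended_monthly(2, 0.05, spot_fraction=0.0, spot_discount=0.7)
    after = blended_monthly(2, 0.05, spot_fraction=0.7, spot_discount=0.7)
    print(f"before ${before:.2f}, after ${after:.2f}, saves ${before - after:.2f}/month")
```

With only two tasks the split is whole tasks in practice: one Spot plus one On-Demand task (`spot_fraction=0.5`) comes to about $47/month. In ECS the split is set with a capacity provider strategy over the `FARGATE` and `FARGATE_SPOT` providers.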

Option 2: Reduce Non-Production Task Count

Staging: Currently 2 tasks → reduce to 1 task (off-hours) or 1 baseline + 1 bursting
Development: Currently 1 task → keep as-is

Savings: ~$17/month (minimal, staging useful for QA)
Recommendation: Not worth operational complexity; keep current

Option 3: Right-size Task Memory

Current: All dev/staging tasks = 1 GB memory
Review memory usage via CloudWatch
If consistently <500MB: downsize to 0.5 GB (Fargate only allows 0.5 GB with 0.25 vCPU, so CPU is halved too)

Savings: 20-30% (minimal, ~$5/month)
Recommendation: Defer; current sizing is reasonable
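Fargate accepts only specific CPU/memory pairs, so a right-sizing check should pick from valid sizes rather than an arbitrary figure. This sketch covers the three smallest CPU sizes, with CPU in Fargate CPU units (1024 = 1 vCPU) and memory in MiB. The function names and the 25% headroom factor are assumptions, not a documented threshold.

```python
from typing import Optional

# Subset of Fargate's valid task sizes: CPU units -> allowed memory (MiB).
# 1024 CPU units = 1 vCPU. Larger CPU sizes are omitted here.
VALID_MEMORY_MIB = {
    256: {512, 1024, 2048},                  # 0.25 vCPU
    512: set(range(1024, 4096 + 1, 1024)),   # 0.5 vCPU: 1-4 GB
    1024: set(range(2048, 8192 + 1, 1024)),  # 1 vCPU: 2-8 GB
}


def is_valid_fargate_size(cpu_units: int, memory_mib: int) -> bool:
    """True if Fargate accepts this CPU/memory pair (within the subset above)."""
    return memory_mib in VALID_MEMORY_MIB.get(cpu_units, set())


def smallest_valid_memory(cpu_units: int, peak_mib: float,
                          headroom: float = 1.25) -> Optional[int]:
    """Smallest allowed memory for `cpu_units` covering peak usage plus headroom."""
    needed = peak_mib * headroom
    for mem in sorted(VALID_MEMORY_MIB.get(cpu_units, set())):
        if mem >= needed:
            return mem
    return None  # peak too large for this CPU size


if __name__ == "__main__":
    print(smallest_valid_memory(512, 380))  # 1024: 0.5 vCPU cannot go below 1 GB
    print(smallest_valid_memory(256, 380))  # 512
```

With 25% headroom, a 450 MiB peak already needs 1024 MiB, so the 0.5 GB option only suits tasks peaking around 400 MiB or less.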

NAT Gateways (~$64/month for 2×)

Current Configuration:

Primary NAT GW: us-west-1a = $32/month
Secondary NAT GW: us-west-1b = $32/month (redundancy)
Data Processing: ~$0-10/month

Cost Drivers:
1. Fixed hourly charge: $0.045/hour × 730 = $32.85
2. Data processing: $0.045 per GB (minimized by VPC Endpoints)

Optimization Opportunities:

Option 1: Single NAT Gateway (Cost) vs. Redundancy (Availability)

Current: 2 NAT Gateways = $64/month
Alternative: 1 NAT Gateway = $32/month (saves $32/month = $384/year)

Trade-off:
  Current: If AZ fails → NAT GW in other AZ remains available
  Single NAT: If its AZ fails → NAT GW down, ECS tasks can't reach internet
              (ECR pulls, SES email blocked) → ~15-30 min impact

Recommendation: Keep 2 NAT Gateways for HA; savings not justified
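The trade-off is easier to see with the fixed and per-GB parts separated. This sketch uses the $0.045/hour and $0.045/GB figures above, and the traffic volume in the example is hypothetical. With a single NAT Gateway, traffic from the other AZ also crosses AZs. That hop is billed as inter-AZ data transfer, assumed here at $0.01/GB on each side.

```python
HOURS_PER_MONTH = 730
NAT_HOURLY = 0.045       # per NAT Gateway, from the cost drivers above
NAT_PER_GB = 0.045       # data processing, from the cost drivers above
INTER_AZ_PER_GB = 0.02   # ASSUMPTION: $0.01/GB charged on each side of a cross-AZ hop


def nat_monthly(gateways: int, processed_gb: float, cross_az_gb: float = 0.0) -> float:
    """Monthly NAT cost: hourly charge per gateway + processing + any cross-AZ hops."""
    fixed = gateways * NAT_HOURLY * HOURS_PER_MONTH
    return fixed + processed_gb * NAT_PER_GB + cross_az_gb * INTER_AZ_PER_GB


if __name__ == "__main__":
    gb = 100.0  # hypothetical monthly NAT traffic, split evenly across two AZs
    two = nat_monthly(2, gb)
    one = nat_monthly(1, gb, cross_az_gb=gb / 2)  # half the traffic now crosses AZs
    print(f"2 NAT GWs: ${two:.2f}  1 NAT GW: ${one:.2f}  difference: ${two - one:.2f}")
```

Processing charges are the same either way, so the saving is essentially one $32.85 hourly charge minus a small inter-AZ cost.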

Option 2: Expand VPC Endpoints (Reduce Data Charges)

Current Endpoints:
  - S3 Gateway Endpoint (free)
  - Secrets Manager Interface Endpoint (~$7/month)
  - ECR Interface Endpoint (~$7/month)

Additional:
  - SQS Interface Endpoint (if async messaging added) ~$7/month
  - SNS Interface Endpoint (if direct SNS use) ~$7/month

Data Processing Savings:
  - Secrets Manager: NAT processing ($0.045/GB) → avoided by using endpoint
  - ECR: NAT processing ($0.045/GB) → avoided by using endpoint
  Current estimated savings: $2-5/month (minimal)

Recommendation: Already optimized; these endpoints are standard
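Whether an interface endpoint pays for itself on cost alone depends on how much traffic it takes off the NAT path. This sketch assumes interface endpoint list prices of $0.01 per endpoint per AZ-hour and $0.01/GB processed (the ~$7/month figures above match one AZ), compared against the $0.045/GB NAT processing charge.

```python
HOURS_PER_MONTH = 730


def endpoint_breakeven_gb(azs: int = 2, endpoint_hourly: float = 0.01,
                          endpoint_per_gb: float = 0.01, nat_per_gb: float = 0.045,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Monthly GB an interface endpoint must carry before it costs less than the NAT path.

    ASSUMPTION: list prices; the endpoint is deployed in `azs` Availability Zones.
    """
    fixed = azs * endpoint_hourly * hours           # endpoint hourly charge
    saving_per_gb = nat_per_gb - endpoint_per_gb    # NAT processing avoided, net of endpoint fee
    return fixed / saving_per_gb


if __name__ == "__main__":
    for azs in (1, 2):
        print(f"{azs} AZ(s): break-even at ~{endpoint_breakeven_gb(azs):.0f} GB/month")
```

The $2-5/month savings estimate above corresponds to roughly 44-111 GB/month at $0.045/GB, below either break-even point. The endpoints are therefore justified as standard practice rather than on cost.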

ALB (Application Load Balancer) (~$16/month)

Current Configuration:

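ALB Hourly Charge: $0.0225/hour × 730 = $16.43/month
Capacity Charge: ~$0.008 per LCU-hour (LCU = Load Balancer Capacity Unit)
  - Billed on the highest of new connections, active connections, processed bytes, and rule evaluations, not per request
  - Currently processing ~50 million requests/month (~19 requests/second average)
  - Each LCU averaged over the month costs ~$5.84 (730 × $0.008)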
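ALB capacity is billed in LCU-hours, and each hour is charged for the highest of four dimensions. This sketch assumes the published per-LCU dimensions for EC2/IP targets (25 new connections/second, 3,000 active connections/minute, 1 GB processed/hour, 1,000 rule evaluations/second) and a us-east-1 price of $0.008 per LCU-hour. The traffic inputs in the example are hypothetical.

```python
HOURS_PER_MONTH = 730
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600
ALB_HOURLY = 0.0225   # from the configuration above
LCU_HOURLY = 0.008    # ASSUMPTION: us-east-1 price per LCU-hour


def lcus(new_conn_per_sec: float, active_conn_per_min: float,
         processed_gb_per_hour: float, rule_evals_per_sec: float) -> float:
    """LCUs consumed in an hour: the maximum across the four ALB dimensions.

    rule_evals_per_sec counts only rules beyond the first 10 processed per
    request (the first 10 are free).
    """
    return max(
        new_conn_per_sec / 25,
        active_conn_per_min / 3000,
        processed_gb_per_hour / 1.0,
        rule_evals_per_sec / 1000,
    )


def alb_monthly(avg_lcus: float, hours: float = HOURS_PER_MONTH) -> float:
    """Hourly ALB charge plus LCU charge, assuming a steady average LCU level."""
    return (ALB_HOURLY + avg_lcus * LCU_HOURLY) * hours


if __name__ == "__main__":
    req_per_sec = 50_000_000 / SECONDS_PER_MONTH  # ~19 requests/second average
    # Hypothetical worst case for connections: every request opens a new one.
    usage = lcus(req_per_sec, active_conn_per_min=500,
                 processed_gb_per_hour=0.2, rule_evals_per_sec=0)
    print(f"~{usage:.2f} LCUs -> ${alb_monthly(usage):.2f}/month")
```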

Optimization Opportunities:

Option 1: Keep ALB (Recommended)

Benefits:
  - Layer 7 routing (path-based rules)
  - WAF integration
  - Multi-AZ HA
  - SSL/TLS termination
  - CloudFront integration

Cost: $16/month is reasonable for these capabilities
Recommendation: Not a cost optimization target

Option 2: Use CloudFront as primary (If static-heavy)

If MenoTime becomes static-asset-heavy:
  - ALB → CloudFront origin
  - Reduce ALB requests (caching)
  - More cost-effective if 50%+ cache hit rate

Current: Mixed API + static = ALB necessary
Recommendation: Revisit if content distribution needs change

CloudFront (CDN) (~$5/month at 250 patients)

Cost Drivers:

Data Transfer Out: $0.085 per GB
  Current: ~60 GB/month = $5.10/month

Request Pricing:
  HTTP: $0.0075 per 10k requests (HTTPS requests are billed at a higher rate)
  Current: ~5 million requests/month = $3.75/month

(Both charges apply and are billed together, before any free-tier allowance)
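Summing both charges at the rates above gives the combined monthly figure. The following short sketch uses the HTTP request rate and applies no free-tier allowance.

```python
def cloudfront_monthly(gb_out: float, requests: int,
                       per_gb: float = 0.085, per_10k: float = 0.0075) -> float:
    """Data transfer out plus request charges, at the rates listed above."""
    return gb_out * per_gb + requests / 10_000 * per_10k


if __name__ == "__main__":
    print(f"${cloudfront_monthly(60, 5_000_000):.2f}/month")
```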

Optimization Opportunities:

Option 1: Increase Cache Hit Ratio

Current: ~40% cache hit rate
Target: >70% cache hit rate

Actions:
  - Longer cache TTLs on static assets (currently 24h, increase to 7 days)
  - Versioned asset filenames (cache forever)
  - Compress responses (gzip, brotli) — enabled

Savings: If cache hit rate 40% → 70%
  Data transfer: ~60 GB → ~25 GB = saves $3/month

Recommendation: Implement versioned asset names in CI/CD
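Versioned filenames usually mean embedding a content hash, so a changed file gets a new URL and unchanged files can be cached indefinitely. Here is a minimal sketch of such a CI/CD step. The function name, the 8-character digest length, and the `static/` path are assumptions. `immutable` is the standard Cache-Control extension for this pattern.

```python
import hashlib
from pathlib import PurePosixPath

# Header to serve with hashed assets: cache for a year, never revalidate.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"


def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short SHA-256 content hash before the file extension.

    static/app.js -> static/app.<hash>.js; identical content always maps to
    the same name, and any change to the bytes produces a new one.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    name = versioned_name("static/app.js", b"console.log('v1');")
    print(name, "->", IMMUTABLE_CACHE_CONTROL)
```

The HTML that references these assets should itself use a short TTL so that new hashed names are picked up promptly.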

Option 2: Origin Shield (Cost vs. Hit Ratio)

Feature: Additional caching layer between CloudFront and origin
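Cost: ~$0.0075 per 10,000 requests that reach Origin Shield (US pricing)
Benefit: Reduces origin load; improves cache hit ratio

At 5M requests/month: at most ~$3.75/month additional (if every request reached Origin Shield)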
Payoff: Only if reducing ALB/ECS load matters (not currently)

Recommendation: Skip unless scaling beyond 2,000 patients

CloudWatch & Logs (~$8-30/month)

Cost Breakdown:

CloudWatch Metrics:
  - Custom metrics (API latency): ~$0.30 per metric per month
  - 10 custom metrics = $3/month

CloudWatch Logs Ingestion:
  - $0.50 per GB ingested
  - Current: ~15 GB/month = $7.50/month

CloudWatch Logs Storage:
  - $0.03 per GB per month stored
  - Current: 30-day retention = ~10 GB stored = $0.30/month

Total: ~$10-12/month (at 250 patients)
At 1,000 patients: ~$25-30/month (3-4× logs volume)
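The three components add up as follows. This sketch uses the first-tier rates listed above and makes it easy to see that log ingestion dominates.

```python
def cloudwatch_monthly(custom_metrics: int, ingested_gb: float, stored_gb: float,
                       per_metric: float = 0.30, per_gb_ingested: float = 0.50,
                       per_gb_stored: float = 0.03) -> float:
    """Custom metrics + log ingestion + log storage, at the rates listed above."""
    return (custom_metrics * per_metric
            + ingested_gb * per_gb_ingested
            + stored_gb * per_gb_stored)


if __name__ == "__main__":
    print(f"250 patients:  ${cloudwatch_monthly(10, 15, 10):.2f}/month")
    # Hypothetical 3x log volume (ingestion and storage), same metric count.
    print(f"3x log volume: ${cloudwatch_monthly(10, 45, 30):.2f}/month")
```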

Optimization Opportunities:

Option 1: Reduce Log Verbosity

Current: DEBUG logs in all environments
Alternative: INFO in production, DEBUG in staging only

Reduction:
  - Production logs: 5 GB/month → 2 GB/month
  - Savings: $1.50/month (not significant)

Recommendation: Keep DEBUG for prod troubleshooting; value > cost
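If the team later adopts the alternative (INFO in production, DEBUG elsewhere), tying the log level to the environment keeps that switch a one-line change. This sketch uses Python's standard `logging` module. The `APP_ENV` variable and the `menotime` logger name are hypothetical, and the environment names reuse the `Environment` tag values from the cost allocation tags.

```python
import logging
import os
from typing import Optional

# Per-environment log levels. "prod" is set to the INFO alternative here;
# set it to logging.DEBUG to keep the current behavior.
LEVEL_BY_ENV = {
    "prod": logging.INFO,
    "staging": logging.DEBUG,
    "dev": logging.DEBUG,
}


def configure_logging(env: Optional[str] = None) -> int:
    """Configure the root logger for `env` (default: $APP_ENV, else "dev")."""
    env = env or os.environ.get("APP_ENV", "dev")
    level = LEVEL_BY_ENV.get(env, logging.INFO)  # unknown environments get INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return level


if __name__ == "__main__":
    configure_logging("prod")
    logging.getLogger("menotime").debug("dropped at INFO")
    logging.getLogger("menotime").info("kept at INFO")
```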

Option 2: Tiered Log Retention

Current: All logs 30 days (CloudWatch)
Alternative:
  - CloudWatch: 7 days
  - S3 Glacier: 2 years (long-term compliance)

Savings: ~$2/month (move some to S3)
Recommendation: Implement with compliance requirement
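Both halves of the tiered scheme reduce to plain API parameters. This sketch only builds the request arguments for boto3's `logs.put_retention_policy` and `s3.put_bucket_lifecycle_configuration` and does not call AWS. The log group name and the `logs/` prefix are hypothetical. Moving logs from CloudWatch into S3 in the first place (an export task or a subscription delivery stream) is not shown.

```python
CLOUDWATCH_RETENTION_DAYS = 7      # must be one of CloudWatch's allowed retention values
ARCHIVE_RETENTION_DAYS = 2 * 365   # ~2 years in S3 Glacier


def retention_policy_params(log_group: str) -> dict:
    """kwargs for logs_client.put_retention_policy(**params)."""
    return {"logGroupName": log_group, "retentionInDays": CLOUDWATCH_RETENTION_DAYS}


def archive_lifecycle_configuration(prefix: str = "logs/") -> dict:
    """Value for s3_client.put_bucket_lifecycle_configuration(LifecycleConfiguration=...)."""
    return {
        "Rules": [
            {
                "ID": "archive-logs-to-glacier",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move archived log objects to Glacier a day after upload...
                "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}],
                # ...and delete them once the ~2-year retention window has passed.
                "Expiration": {"Days": ARCHIVE_RETENTION_DAYS},
            }
        ]
    }


if __name__ == "__main__":
    print(retention_policy_params("/ecs/menotime-prod"))
    print(archive_lifecycle_configuration())
```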

Cost Optimization Roadmap

Immediate (Month 1-3) — Low Risk

1. Implement Reserved Instances (RDS)
   - Savings: $95/month ($1,140/year)
   - Timeline: ~2 weeks to order and apply, after 4 weeks of stability validation
   - Risk: Low, but the 1-year RI is a commitment that cannot be canceled
   - Impact: 15% infrastructure cost reduction
2. Enable Versioned Asset Filenames (CloudFront)
   - Savings: $3/month ($36/year)
   - Timeline: 1 week (CI/CD change)
   - Risk: Low
   - Impact: Minimal savings but best practice
3. Audit Current Usage
   - Savings: Identify $50-100/month waste
   - Timeline: 1 day (analysis)
   - Risk: None
   - Impact: Quick wins

Short-term (Month 3-6) — Medium Risk

1. Downsize Dev RDS Instance (db.m7g.medium)
   - Savings: $71/month ($852/year)
   - Timeline: 2 weeks (test in staging, migrate dev)
   - Risk: Medium (slower dev environment)
   - Mitigation: Monitor feedback; upgrade if needed
2. Implement Fargate Spot (ECS)
   - Savings: $30-50/month ($360-600/year)
   - Timeline: 3 weeks (test Spot failure scenarios)
   - Risk: Medium (task interruption possible)
   - Mitigation: Ensure 2+ task redundancy, monitor interruptions
3. Cost Allocation Tags (Governance)
   - Savings: ~$20/month (waste reduction via visibility)
   - Timeline: 2 weeks (tag all resources)
   - Risk: Low
   - Impact: Foundation for ongoing cost management

Medium-term (Month 6-12) — Higher Risk

1. Enable Multi-AZ for Production RDS
   - Savings: None (cost increase $141/month)
   - Timeline: 4 weeks (failover testing)
   - Risk: High (complex migration)
   - ROI: HA + reduced RTO/RPO; justified at 500+ patients
   - Decision: Trigger when hitting 500 patients or P2 uptime target
2. Consider Aurora PostgreSQL Migration
   - Savings: $125/month ($1,500/year)
   - Timeline: 8-12 weeks (extensive testing)
   - Risk: High (schema, connection behavior changes)
   - Mitigation: Migrate staging first; parallel run test
   - Decision: Evaluate in Year 2; low priority currently

Long-term (Year 2+)

1. Advanced Optimization
   - Compute Savings Plans for Fargate (Fargate has no Reserved Instance or reserved-capacity option)
   - Serverless optimization (Lambda for scheduled tasks)
   - Multi-region DR (if expansion warranted)

Cost Monitoring & Budgets

AWS Budgets Setup

Budget 1: Monthly Infrastructure Cost

Limit: $700/month (~14% buffer above $616)
Actions:
  - Alert at 80%: $560 → Email ops team
  - Alert at 100%: $700 → Page on-call engineer
Frequency: Daily notification
Goal: Stay below $650/month (~7% headroom under the limit)

Budget 2: RDS Costs

Limit: $450/month (~6% above $425)
Actions:
  - Alert at 90%: $405 → Review slow queries
  - Alert at 100%: $450 → Consider scaling
Goal: Monitor RDS spending (largest cost driver)

Budget 3: Compute (ECS + EC2)

Limit: $150/month
Actions:
  - Alert at 80%: $120 → Check auto-scaling policies
Goal: Monitor compute utilization

Budget 4: Data Transfer

Limit: $100/month
Actions:
  - Alert at 80%: $80 → Check cross-AZ and cross-region activity
Goal: Avoid surprise data transfer charges
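The alert amounts are percentages of each limit, and AWS Budgets takes them in that form. This sketch builds the arguments for boto3's `budgets.create_budget` for Budget 1 without calling AWS. The account ID, budget name, and email are placeholders. An on-call page would use an `SNS` subscriber instead of `EMAIL`. The service-specific budgets (RDS, compute, data transfer) would also need a cost filter, which is not shown.

```python
def alert_amounts(limit_usd: float, thresholds_pct: list[float]) -> list[float]:
    """Dollar amount at which each percentage alert fires."""
    return [limit_usd * pct / 100 for pct in thresholds_pct]


def budget_request(account_id: str, name: str, limit_usd: float,
                   thresholds_pct: list[float], email: str) -> dict:
    """kwargs for budgets_client.create_budget(**request); nothing is sent to AWS here."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": float(pct),
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
            for pct in thresholds_pct
        ],
    }


if __name__ == "__main__":
    # Budget 1: $700 limit with alerts at 80% and 100% (placeholder account/email).
    request = budget_request("123456789012", "monthly-infrastructure", 700,
                             [80, 100], "ops-team@example.com")
    print(alert_amounts(700, [80, 100]))
    print(request["Budget"])
```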

Cost Allocation Tags (Required)

All resources tagged with:

Tag	Values	Purpose
`Environment`	`prod`, `staging`, `dev`	Cost by environment
`CostCenter`	`engineering`, `operations`, `infrastructure`	Cost by team
`Application`	`menotime`	Cost by product
`Temporary`	`true`, `false`	Identify test resources to clean up
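A tag policy is only useful if it is checked. This sketch validates one resource's tags against the table above. The function name is hypothetical, and the sketch leaves out how tags are collected (for example, from an inventory export).

```python
# Required tags and allowed values, from the table above.
REQUIRED_TAGS = {
    "Environment": {"prod", "staging", "dev"},
    "CostCenter": {"engineering", "operations", "infrastructure"},
    "Application": {"menotime"},
    "Temporary": {"true", "false"},
}


def tag_problems(tags: dict[str, str]) -> list[str]:
    """List missing or invalid cost allocation tags on one resource (empty = compliant)."""
    problems = []
    for key, allowed in REQUIRED_TAGS.items():
        value = tags.get(key)
        if value is None:
            problems.append(f"missing {key}")
        elif value not in allowed:
            problems.append(f"invalid {key}={value!r}")
    return problems


if __name__ == "__main__":
    print(tag_problems({"Environment": "production", "Application": "menotime"}))
```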

Monthly Cost Report Template:

Infrastructure Cost Report - February 2024

Total Cost: $625
Budget: $700 (89% utilization)

By Environment:
  Production:  $425 (68%)
  Staging:     $140 (22%)
  Development: $60 (10%)

By Service:
  RDS:         $425 (68%)
  Fargate:     $95  (15%)
  NAT GW:      $64  (10%)
  Other:       $41  (7%)

Key Metrics:
  Patients: 250
  Cost/Patient: $2.50
  Reserved Instance Savings: (not yet implemented)

Optimizations in Progress:
  - RDS RI procurement (in progress)
  - CloudFront cache tuning (in progress)

Next Month Action Items:
  - Approve RDS RI purchase (savings $1,140/year)
  - Monitor Spot instance pilot (if deployed)

Monthly Cost Review Process

First Thursday of Each Month (30 minutes, operations team)
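Calendar and reminder tooling needs the concrete date for each review. The following small standard-library sketch finds the first Thursday (or any weekday) of a month. The function name is illustrative.

```python
import calendar
from datetime import date


def first_weekday(year: int, month: int, weekday: int = calendar.THURSDAY) -> date:
    """First date in the month falling on `weekday` (Monday=0 ... Sunday=6)."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the target weekday
    return date(year, month, 1 + offset)


if __name__ == "__main__":
    print(first_weekday(2024, 2))  # 2024-02-01
    print(first_weekday(2024, 3))  # 2024-03-07
```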

Agenda

1. Actual vs. Budget (5 min)
   ```
   Last month actual: $625
   Budget:            $700
   Variance:          -$75 (favorable)

   YTD:        $3,750 (6 months × $625)
   YTD Budget: $4,200
   ```
2. Cost Trends (5 min)
   - Month-over-month comparison
   - Identify spikes (new services, increased traffic)
   - Patient volume tracking
3. Optimization Progress (10 min)
   - Review roadmap status
   - Measure savings from implemented optimizations
   - Identify new opportunities
4. Anomalies (5 min)
   - Unusual service charges
   - Untagged resources (cleanup)
   - Forecast for next quarter
5. Decisions (5 min)
   - Approve new purchases/upgrades
   - Adjust budgets if needed
   - Prioritize next optimizations

Example Monthly Report

Date: 2024-02-01
Attendees: VP Ops, Eng Lead, Finance

Last Month Cost: $625 (Budget: $700, -10.7%)

RDS Spending: $425
  - Consistent with forecast
  - RI procurement approved (1-year, saves $95/month)
  - Implementation timeline: 2 weeks

Fargate Spending: $95
  - Within expected range
  - Spot pilot deferred (lower priority)

NAT Gateway Spending: $64
  - No changes planned

Patient Growth:
  - 250 → 265 patients (+6%)
  - Cost/patient trending down (economies of scale)
  - Forecast: 500 patients Q3 2024

Action Items:
  ☑ Order RDS 1-year RI (due: 2024-02-10)
  ☐ Monitor RI discounts post-implementation
  ☐ Schedule RDS Multi-AZ eval for Q3 (500+ patient trigger)
  ☐ Review Fargate Spot readiness (defer 30 days)

Next Review: 2024-03-07 (first Thursday of March)

Cost Transparency to Leadership

Monthly Executive Dashboard

Intended Audience: CEO, CFO, VP Operations

Top-level Metrics:
┌─────────────────────────────────────────┐
│ Total Monthly Cost:        $625         │
│ Cost per Patient:          $2.50        │
│ Cost per Patient (trended) ↘ 5% month   │
│ YTD Infrastructure Spend:  $3,750       │
│ YTD Budget:                $4,200       │
│ YTD Variance:              -$450 (11%)  │
│                                         │
│ Forecast (full year):      $7,500       │
│ Forecast Savings (RI):     -$1,140      │
│ Adjusted Forecast:         $6,360       │
└─────────────────────────────────────────┘
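The dashboard's derived rows follow from the monthly run rate. This sketch assumes, as the dashboard does, a flat $625/month for the year and a full year of RI savings. The function name is illustrative.

```python
def dashboard(monthly_cost: float, months_elapsed: int, monthly_budget: float,
              annual_ri_savings: float, patients: int) -> dict[str, float]:
    """Derived dashboard figures from a flat monthly run rate."""
    ytd = monthly_cost * months_elapsed
    ytd_budget = monthly_budget * months_elapsed
    forecast = monthly_cost * 12  # flat run rate for the full year
    return {
        "cost_per_patient": monthly_cost / patients,
        "ytd": ytd,
        "ytd_budget": ytd_budget,
        "ytd_variance": ytd - ytd_budget,
        "ytd_variance_pct": 100 * (ytd - ytd_budget) / ytd_budget,
        "forecast": forecast,
        "adjusted_forecast": forecast - annual_ri_savings,
    }


if __name__ == "__main__":
    for key, value in dashboard(625, 6, 700, 1140, 250).items():
        print(f"{key:<18} {value:,.2f}")
```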

Key Insights:
- Platform scales efficiently (cost/patient declining)
- Infrastructure sized appropriately for 250 patients
- Multiple optimization opportunities pending (RI, Spot)
- Estimated $1,140/year savings from reserved instances

Risk/Opportunities:
- Multi-AZ upgrade needed at 500 patients (+$141/month)
- Aurora migration opportunity (Year 2, $1,500/year savings)

Summary

MenoTime's infrastructure cost at 250 patients is $616/month, dominated by RDS (69%) and ECS Fargate (15%). Key optimization opportunities include:

Quick Wins (Implement Now):
- RDS Reserved Instances: $1,140/year savings
- CloudFront cache tuning: $36/year savings

Medium-term (3-6 months):
- Spot instances: $360-600/year savings
- Dev RDS downsize: $852/year savings

Strategic (Year 2+):
- Multi-AZ: Availability upgrade, required at scale
- Aurora: $1,500/year savings, requires migration

Monthly Process: Review costs, track budgets, measure optimization ROI, forecast growth impact.

Cost management is an ongoing discipline with clear ROI on optimization efforts. Implement Reserved Instances first (highest impact, lowest risk).

For infrastructure planning and scaling, see Database, ECS Fargate, and Environments.