Environments

MenoTime operates three distinct environments, each with specific purposes, configurations, and deployment rules. This document details environment specifications, promotion workflows, and branch-to-environment mapping.

Environment Specifications

Development Environment

Identifier: menotime-dev

Purpose:
- Rapid feature development and experimentation
- Local testing before staging promotion
- Ad-hoc debugging and troubleshooting
- Feature branch deployments

Compute Configuration:
- ECS Cluster: menotime-dev-cluster
- Task Definition: menotime-backend-dev:latest
- Desired Tasks: 1 (manual scaling, no auto-scaling)
- vCPU: 0.5
- Memory: 1 GB
- Container Image: Pulled from ECR with develop branch tag

Database Configuration:
- Instance: menotime-dev (RDS PostgreSQL)
- Class: db.m7g.large
- Storage: 100 GB (gp3)
- Multi-AZ: No (single AZ)
- Backup Retention: 7 days (automated)
- Enhanced Monitoring: Disabled
- Performance Insights: Disabled
- Publicly Accessible: No (private subnet only)

Networking:
- ALB: Shared with staging (separate target group)
- DNS: dev-api.menotime.ai (optional; may use Route 53 weighted routing)
- WAF: Optional (not typically enabled for dev)
- Security Group: menotime-dev-sg (permissive rules for testing)

Note: Resource names follow the menotime-{env} pattern, where {env} is the environment's short name (dev, staging, or prod).
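A minimal sketch of the naming pattern, for illustration only. The helper `resource_names` and the `ResourceNames` type are hypothetical. The cluster, security group, task definition, and database names match those listed in this document; the dev service name is not stated anywhere and is assumed to follow the same pattern as the staging and production services.

```python
from dataclasses import dataclass

# Environment short names as they appear in the environment identifiers.
VALID_ENVS = ("dev", "staging", "prod")


@dataclass(frozen=True)
class ResourceNames:
    identifier: str
    ecs_cluster: str
    ecs_service: str
    task_family: str
    security_group: str
    db_instance: str
    db_name: str


def resource_names(env: str) -> ResourceNames:
    """Expand the menotime-{env} pattern for one environment (hypothetical helper)."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    return ResourceNames(
        identifier=f"menotime-{env}",
        ecs_cluster=f"menotime-{env}-cluster",
        ecs_service=f"menotime-{env}-service",  # assumed for dev
        task_family=f"menotime-backend-{env}",
        security_group=f"menotime-{env}-sg",
        db_instance=f"menotime-{env}",
        db_name=f"menotime_{env}",  # database names use an underscore
    )


if __name__ == "__main__":
    for env in VALID_ENVS:
        print(resource_names(env))
```

Rejecting unknown short names early keeps a typo such as "stage" from silently producing names for resources that do not exist.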

Scaling:
- Auto-Scaling: Disabled (manual scaling via AWS Console or CLI)
- Replicas: 1 (single task)
- Manual Scale-up: When testing load scenarios (up to 2 tasks)

Environment Variables:

ENVIRONMENT=development
DATABASE_HOST=menotime-dev.xxxxx.us-west-1.rds.amazonaws.com
DATABASE_PORT=5432
DATABASE_NAME=menotime_dev
LOG_LEVEL=DEBUG
API_DEBUG=true
SENTRY_ENABLED=true
EMAIL_SANDBOX_MODE=true  # SES sandbox, limited sending
STRIPE_MODE=test

Retention & Cost:
- Retention Policy: 30 days (dev data refreshed regularly)
- Monthly Cost: ~$200 (lowest tier compute + minimal database usage)
- Cleanup: Weekly cleanup of old test data and orphaned resources

Access:
- Developers: Full access to console, logs, and database
- CI/CD: Automated deployments from develop branch
- Secrets: Separate Secrets Manager entries (non-production credentials)

Staging Environment

Identifier: menotime-staging

Purpose:
- Pre-production validation before production release
- Performance and load testing
- Security testing and vulnerability scanning
- Demo environment for stakeholders and clients
- Integration testing with third-party services

Compute Configuration:
- ECS Cluster: menotime-staging-cluster
- Task Definition: menotime-backend-staging:latest
- Desired Tasks: 2 (manual or minimal auto-scaling)
- vCPU: 0.5
- Memory: 1 GB
- Container Image: Pulled from ECR with main branch tag (release candidates)

Database Configuration:
- Instance: menotime-staging (RDS PostgreSQL)
- Class: db.m7g.large
- Storage: 500 GB (gp3)
- Multi-AZ: No (single AZ; upgrade path available)
- Backup Retention: 7 days (automated)
- Enhanced Monitoring: Enabled
- Performance Insights: Enabled (30-day retention)
- Publicly Accessible: No (private subnet only)

Networking:
- ALB: Shared with dev (separate target group)
- DNS: staging-api.menotime.ai, or shared menotime.ai with path routing
- WAF: Enabled (test WAF rules and patterns)
- Security Group: menotime-staging-sg (mirrors production rules)

Note: Staging resources use the menotime-staging naming pattern.

Scaling:
- Auto-Scaling: Minimal (2-3 tasks max) to control costs
- CPU Threshold: 70% (trigger scale-up)
- Memory Threshold: 80% (trigger scale-up)
- Scale-down Cooldown: 300 seconds

Environment Variables:

ENVIRONMENT=staging
DATABASE_HOST=menotime-staging.xxxxx.us-west-1.rds.amazonaws.com
DATABASE_PORT=5432
DATABASE_NAME=menotime_staging
LOG_LEVEL=INFO
API_DEBUG=false
SENTRY_ENABLED=true
EMAIL_SANDBOX_MODE=false  # SES production, but sandbox account initially
STRIPE_MODE=test

Retention & Cost:
- Retention Policy: 90 days (mirrors production for realistic testing)
- Monthly Cost: ~$250 (shared ALB, larger database)
- Data Refresh: Weekly refresh from production snapshot (anonymized)

Access:
- Developers: Read-only console access, deploy via CI/CD
- QA Team: Read-only access, ability to trigger test scenarios
- Product: Demos and stakeholder testing
- CI/CD: Automated deployments from main branch pull requests

Production Environment

Identifier: menotime-prod

Purpose:
- Live patient data and real-world traffic
- Critical healthcare delivery platform
- HIPAA-compliant operations
- Revenue and patient care dependent

Compute Configuration:
- ECS Cluster: menotime-prod-cluster
- Task Definition: menotime-backend-prod:latest
- Desired Tasks: 2 (minimum for high availability)
- vCPU: 1
- Memory: 2 GB
- Container Image: Pulled from ECR with semantic version tag (e.g., v1.2.3)

Note: Production resources use the menotime-prod naming pattern.

Database Configuration:
- Instance: menotime-prod (RDS PostgreSQL)
- Class: db.m7g.large
- Storage: 1 TB (gp3, auto-expandable)
- Multi-AZ: No (currently single AZ for cost control; upgrade recommended)
- Backup Retention: 7 days (automated), daily snapshots to S3
- Enhanced Monitoring: Enabled
- Performance Insights: Enabled (7-day retention)
- Publicly Accessible: No (private subnet only)
- Encryption: KMS at rest, SSL in transit
- IAM Database Authentication: Enabled

Networking:
- ALB: Dedicated production ALB
- DNS: menotime.ai (primary domain via Route 53)
- CloudFront: Enabled for static asset delivery and API caching
- WAF: Enabled (production rule set)
- Security Group: menotime-prod-sg (restrictive, principle of least privilege)

Scaling:
- Auto-Scaling: Enabled (2-4 tasks)
- CPU Threshold: 60% (trigger scale-up)
- Memory Threshold: 75% (trigger scale-up)
- Scale-down Threshold: 30% (with 600-second cooldown)
- Scaling Policy: Target Tracking (preferred) or Step Scaling
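The thresholds above can be read as a simple decision rule. The sketch below is a simplified illustration, not how ECS target tracking actually computes capacity. `ScalingPolicy` and `desired_tasks` are hypothetical names, scaling by one task per evaluation is an assumption, the 30% scale-down threshold is assumed to apply to both CPU and memory, and cooldowns are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalingPolicy:
    min_tasks: int
    max_tasks: int
    cpu_up: float                       # percent CPU that triggers scale-up
    memory_up: float                    # percent memory that triggers scale-up
    scale_down: Optional[float] = None  # percent below which to scale down


# Values taken from the staging and production Scaling lists in this document.
STAGING = ScalingPolicy(min_tasks=2, max_tasks=3, cpu_up=70, memory_up=80)
PRODUCTION = ScalingPolicy(min_tasks=2, max_tasks=4, cpu_up=60, memory_up=75,
                           scale_down=30)


def desired_tasks(policy: ScalingPolicy, current: int,
                  cpu: float, memory: float) -> int:
    """Return the task count after one evaluation (one-task steps assumed)."""
    if cpu >= policy.cpu_up or memory >= policy.memory_up:
        target = current + 1
    elif (policy.scale_down is not None
          and cpu <= policy.scale_down and memory <= policy.scale_down):
        target = current - 1
    else:
        target = current
    # Never leave the configured task range.
    return max(policy.min_tasks, min(policy.max_tasks, target))


if __name__ == "__main__":
    print(desired_tasks(PRODUCTION, current=2, cpu=65, memory=40))
```

Clamping to the task range last means the rule can never drop production below the two tasks kept for availability, whatever the metrics say.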

Environment Variables:

ENVIRONMENT=production
DATABASE_HOST=menotime-prod.xxxxx.us-west-1.rds.amazonaws.com
DATABASE_PORT=5432
DATABASE_NAME=menotime_prod
LOG_LEVEL=WARNING
API_DEBUG=false
SENTRY_ENABLED=true
SENTRY_SAMPLE_RATE=0.1  # Send 10% of error events to Sentry to limit alert fatigue
EMAIL_SANDBOX_MODE=false  # SES production account
STRIPE_MODE=live
SECURE_COOKIES=true
HSTS_ENABLED=true
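One way an application might consume these variables is sketched below. This is an assumption about the backend, not its actual configuration code. `Settings` and `load_settings` are hypothetical names, the default values are illustrative, and the production checks simply mirror the values listed above (API_DEBUG=false, STRIPE_MODE=live).

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class Settings:
    environment: str
    database_host: str
    database_port: int
    database_name: str
    log_level: str
    api_debug: bool
    stripe_mode: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping and reject unsafe production values."""
    settings = Settings(
        environment=env["ENVIRONMENT"],
        database_host=env["DATABASE_HOST"],
        database_port=int(env.get("DATABASE_PORT", "5432")),
        database_name=env["DATABASE_NAME"],
        log_level=env.get("LOG_LEVEL", "INFO"),        # default is an assumption
        api_debug=_as_bool(env.get("API_DEBUG", "false")),
        stripe_mode=env.get("STRIPE_MODE", "test"),    # default is an assumption
    )
    if settings.environment == "production":
        if settings.api_debug:
            raise ValueError("API_DEBUG must be false in production")
        if settings.stripe_mode != "live":
            raise ValueError("STRIPE_MODE must be live in production")
    return settings


if __name__ == "__main__":
    dev = load_settings({
        "ENVIRONMENT": "development",
        "DATABASE_HOST": "menotime-dev.xxxxx.us-west-1.rds.amazonaws.com",
        "DATABASE_NAME": "menotime_dev",
        "LOG_LEVEL": "DEBUG",
        "API_DEBUG": "true",
    })
    print(dev)
```

Failing at startup on a debug flag or test payment mode in production is cheaper than discovering the misconfiguration from live traffic.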

Retention & Cost:
- Data Retention: Indefinite (patient records)
- Log Retention: CloudWatch (30 days), S3 Glacier (2+ years)
- Monthly Cost: ~$616 (250 patients), scaling to ~$896 (1,000 patients)
- Cost Drivers: RDS (~70%), ALB, NAT Gateway, data transfer
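For rough planning between the two quoted cost points, a straight-line interpolation can be sketched as follows. Linear growth is an assumption; real AWS costs tend to move in steps (an extra task, a larger instance). `estimated_monthly_cost` is a hypothetical helper, and only the ~$616 and ~$896 figures come from this document.

```python
# Quoted production cost points from this document (USD per month).
COST_AT_250 = 616.0
COST_AT_1000 = 896.0


def estimated_monthly_cost(patients: int) -> float:
    """Linearly interpolate between the two quoted points (assumption)."""
    slope = (COST_AT_1000 - COST_AT_250) / (1000 - 250)  # about $0.37 per patient
    return COST_AT_250 + slope * (patients - 250)


def cost_per_patient(patients: int) -> float:
    return estimated_monthly_cost(patients) / patients


if __name__ == "__main__":
    for n in (250, 500, 1000):
        print(n, round(estimated_monthly_cost(n), 2), round(cost_per_patient(n), 3))
```

Under this assumption the per-patient cost falls from about $2.46 at 250 patients to about $0.90 at 1,000, which is consistent with a large fixed component such as RDS dominating the bill.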

Access:
- Developers: Read-only logs and metrics; no direct console access
- On-Call Engineers: Full access during incidents (via SSM Session Manager)
- Operations: Monitoring, alerting, backup management
- CI/CD: Gated deployments requiring GitHub approvals and passing tests
- Audit: All API calls logged to CloudTrail; access controlled by IAM

Deployments:
- Frequency: Twice per week (controlled release schedule)
- Strategy: Rolling deployment (at least 1 task always running)
- Approval: Merge to main plus GitHub Actions approval required
- Rollback: Automated on health check failure; manual rollback available
- Change Window: Business hours (PST), so issues can be monitored

Environment Comparison Matrix

Aspect	Dev	Staging	Production
Compute (vCPU/RAM)	0.5 / 1GB	0.5 / 1GB	1 / 2GB
Desired Tasks	1	2	2
Auto-Scaling	No	Minimal (2-3 max)	Yes (2-4)
Database	db.m7g.large	db.m7g.large	db.m7g.large
Multi-AZ	No	No	Recommended upgrade
Enhanced Monitoring	No	Yes	Yes
Performance Insights	No	Yes (30-day)	Yes (7-day)
WAF	Optional	Yes	Yes
CloudFront	No	No	Yes
Backup Retention	7 days	7 days	7 days
Data Refresh	Manual	Weekly (anonymized)	N/A (live)
Cost/Month	~$200	~$250	~$616-896
Access Level	Full	QA/Demo	Restricted
Deployment	Automated	Automated (PR)	Gated approval
Rollback	Fast	Fast	Automated on failure

Promotion Workflow

Development → Staging

Trigger: Pull request to main branch

Process:
1. Developer creates a PR from develop (or a feature branch) to main
2. CI/CD pipeline:
   - Runs unit tests, linting, and security scans
   - Builds a container image tagged with the PR number and commit SHA
   - Pushes the image to ECR
   - Deploys to staging (automatic or manual trigger)
3. QA validates in the staging environment
4. Code review team approves the PR
5. Merge to main (triggers automated tagging of the image with the main branch tag in ECR)

Testing Checklist:
- Unit tests pass (>80% coverage)
- Integration tests pass (database migrations, API endpoints)
- Security scan passes (no critical vulnerabilities)
- Performance test passes (95th-percentile response time <500 ms)
- Manual QA sign-off (if applicable)
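The two numeric gates in the checklist (coverage above 80% and 95th-percentile response time under 500 ms) could be evaluated as sketched below. The nearest-rank percentile definition is an assumption, since the document does not say which method the pipeline uses, and `percentile` and `passes_gates` are hypothetical helpers.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def passes_gates(coverage_pct: float, latencies_ms: Sequence[float]) -> bool:
    """Apply the checklist's numeric thresholds."""
    return coverage_pct > 80 and percentile(latencies_ms, 95) < 500


if __name__ == "__main__":
    samples = [120, 180, 95, 240, 310, 150, 130, 480, 200, 170]
    print(percentile(samples, 95), passes_gates(84.5, samples))
```

A percentile gate tolerates a few slow outliers, whereas a gate on the maximum would fail a run on a single slow request.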

Staging → Production

Trigger: Git tag matching semantic version (e.g., v1.2.3)

Process:
1. Release Manager or CI/CD creates a git tag on the main branch:
       git tag v1.2.3
       git push origin v1.2.3
2. CI/CD pipeline:
   - Runs full test suite (unit, integration, security, performance)
   - Builds container image tagged with semantic version
   - Creates GitHub release with changelog
   - Requires GitHub Actions approval for production deployment
3. Deployment:
   - Rolling update to production ECS cluster
   - Health checks verify new tasks are healthy before old ones are removed
   - CloudWatch alarms monitored for the first 30 minutes
4. Post-deployment:
   - Smoke tests run (critical API endpoints, patient data access)
   - Monitoring for anomalies (error rates, latency, database connections)
   - Rollback available if issues are detected
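Because the production pipeline is triggered by tags of the form v1.2.3, a guard like the following could reject malformed tags before any build starts. This is a sketch under stated assumptions: `parse_release_tag` is hypothetical, and rejecting pre-release suffixes (such as v1.2.3-rc1) and leading zeros is a choice made here, not something the document specifies.

```python
import re
from typing import Optional, Tuple

# v{MAJOR}.{MINOR}.{PATCH} with no leading zeros and no suffix (assumption).
RELEASE_TAG = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_release_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for a release tag, or None if it is not one."""
    match = RELEASE_TAG.fullmatch(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


if __name__ == "__main__":
    for candidate in ("v1.2.3", "1.2.3", "v1.2", "v1.2.3-rc1"):
        print(candidate, parse_release_tag(candidate))
```

Returning integers rather than the raw string lets later steps compare versions numerically, so v1.10.0 sorts after v1.9.0.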

Production Deployment Requirements:
- All tests passing
- Code review approval
- GitHub Actions manual approval
- Change window compliance (PST business hours preferred)
- Runbook prepared for rollback

Branch-to-Environment Mapping

┌──────────────────────────────────────────────────────────┐
│                  GIT WORKFLOW                            │
└──────────────────────────────────────────────────────────┘

Feature Branch (feature/*)
    │
    └──> PR to develop
         │
         └──> Deploy to Dev (if enabled)
              │
              └──> Merge to develop
                   │
                   └──> Auto-deploy to Dev (latest)

develop Branch
    │
    └──> PR to main
         │
         └──> Build & Push (ECR: tag=PR-number, SHA)
              │
              └──> Deploy to Staging (manual or auto)
                   │
                   └──> QA/Testing
                        │
                        └──> PR Approved & Merged
                             │
                             └──> Merge to main
                                  │
                                  └──> ECR: tag=main
                                       │
                                       └──> Staging updated

main Branch
    │
    └──> Git tag: v1.2.3
         │
         └──> Build & Push (ECR: tag=v1.2.3)
              │
              └──> Create GitHub Release
                   │
                   └──> GitHub Actions Approval
                        │
                        └──> Deploy to Production
                             │
                             └──> Rolling update (2→4 tasks)
                                  │
                                  └──> Health checks
                                       │
                                       └──> Live!

Mapping Details:

Branch	Environment	Image Tag	Frequency	Approval
`feature/*`	Dev	`feature-name`	On-demand	None
`develop`	Dev	`develop`	Per commit	None
`main` (PR)	Staging	`PR-{number}`	Per PR	Code review
`main` (merged)	Staging	`main`	Per merge	Code review
`main` (tag)	Production	`v{MAJOR}.{MINOR}.{PATCH}`	Per release	GitHub Actions
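The mapping table can be expressed as a lookup function, sketched below. `deployment_target` and `Target` are hypothetical names; how the CI system supplies the branch, PR number, and tag is an assumption, as is deriving the feature image tag from the part of the branch name after `feature/`.

```python
import re
from typing import NamedTuple, Optional

SEMVER_TAG = re.compile(r"v\d+\.\d+\.\d+")


class Target(NamedTuple):
    environment: str
    image_tag: str
    approval: Optional[str]  # None means no approval step


def deployment_target(branch: str,
                      pr_number: Optional[int] = None,
                      git_tag: Optional[str] = None) -> Optional[Target]:
    """Map a CI event to (environment, image tag, approval) per the table.

    branch is the branch being built, or the PR's target branch when
    pr_number is set. Returns None when the event maps to no deployment.
    """
    if git_tag is not None:
        if branch == "main" and SEMVER_TAG.fullmatch(git_tag):
            return Target("production", git_tag, "GitHub Actions")
        return None
    if branch == "main":
        if pr_number is not None:
            return Target("staging", f"PR-{pr_number}", "Code review")
        return Target("staging", "main", "Code review")
    if branch == "develop":
        return Target("dev", "develop", None)
    if branch.startswith("feature/"):
        # Image tags cannot contain "/", so nested names are flattened.
        name = branch[len("feature/"):].replace("/", "-")
        return Target("dev", name, None) if name else None
    return None


if __name__ == "__main__":
    print(deployment_target("main", git_tag="v1.2.3"))
    print(deployment_target("main", pr_number=123))
    print(deployment_target("feature/login-form"))
```

Checking the tag case first mirrors the table's rule that only a semantic version tag on main reaches production; every other event resolves to dev, staging, or nothing.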

Environment Variables & Secrets

Secret Rotation Policy

Environment	Frequency	Rotation Method
Development	Never	Manual if leaked
Staging	Every 90 days	Lambda + Secrets Manager
Production	Every 30 days	Lambda + Secrets Manager
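A small sketch of how the rotation schedule in the table translates into due dates. In practice the Lambda and Secrets Manager setup handles the scheduling; `ROTATION_DAYS`, `next_rotation`, and `rotation_due` are hypothetical names used only to illustrate the policy.

```python
from datetime import date, timedelta
from typing import Optional

# Rotation interval in days per environment; None means no scheduled rotation.
ROTATION_DAYS = {"development": None, "staging": 90, "production": 30}


def next_rotation(environment: str, last_rotated: date) -> Optional[date]:
    """Date on which the next rotation falls due, or None if never scheduled."""
    days = ROTATION_DAYS[environment]
    if days is None:
        return None
    return last_rotated + timedelta(days=days)


def rotation_due(environment: str, last_rotated: date, today: date) -> bool:
    due = next_rotation(environment, last_rotated)
    return due is not None and today >= due


if __name__ == "__main__":
    print(next_rotation("production", date(2024, 1, 1)))
    print(rotation_due("staging", date(2024, 1, 1), date(2024, 3, 1)))
```

A check like this could back an audit report that flags secrets whose last rotation is older than the policy allows.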

Database Credentials

Stored in: AWS Secrets Manager

Format:

{
  "username": "menotime_user",
  "password": "auto-generated-strong-password",
  "engine": "postgresql",
  "host": "menotime-{env}.xxxxx.us-west-1.rds.amazonaws.com",
  "port": 5432,
  "dbname": "menotime_{env}"
}

Access: ECS tasks retrieve the secret through their task role. In production the secret is rotated automatically every 30 days.
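Once a task has the secret's JSON string (for example, the `SecretString` field returned by boto3's `get_secret_value`), it can be turned into a PostgreSQL connection URL. The sketch below assumes the secret has exactly the fields shown in the format above; `connection_url` is a hypothetical helper, and `sslmode=require` is an assumption based on the SSL-in-transit requirement listed for production.

```python
import json
from urllib.parse import quote


def connection_url(secret_string: str, sslmode: str = "require") -> str:
    """Build a PostgreSQL URL from a Secrets Manager JSON payload."""
    secret = json.loads(secret_string)
    # Percent-encode credentials so characters like "@" or "/" stay valid.
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    return (
        f"postgresql://{user}:{password}@{secret['host']}:{secret['port']}"
        f"/{secret['dbname']}?sslmode={sslmode}"
    )


if __name__ == "__main__":
    example = json.dumps({
        "username": "menotime_user",
        "password": "example-p@ss/word",  # placeholder, not a real credential
        "engine": "postgresql",
        "host": "menotime-dev.xxxxx.us-west-1.rds.amazonaws.com",
        "port": 5432,
        "dbname": "menotime_dev",
    })
    print(connection_url(example))
```

Reading the secret at startup rather than baking credentials into the image means a rotated password is picked up on the next task launch.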

Environment Promotion Requirements

Before Promotion to Staging

Code Quality:
- All tests passing
- Code review approval
- Linting and style checks

Security:
- No hardcoded secrets
- Security scan passing (no critical/high CVEs)
- Dependency audit passing

Documentation:
- Changelog entry
- Migration scripts (if database changes)
- API documentation updated

Before Promotion to Production

Validation:
- 24+ hours in staging (minimum)
- Passed QA sign-off
- Performance test results reviewed
- Database migration tested (if applicable)

Operational:
- Runbook prepared (if manual steps required)
- Rollback plan documented
- On-call engineer identified
- Monitoring alerts verified

Compliance:
- HIPAA assessment completed (if PHI-touching changes)
- Audit log entries reviewed
- Security team sign-off (if applicable)
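Part of this checklist can be checked mechanically. The sketch below covers the 24-hour minimum in staging and three of the yes/no items; the remaining items would be added the same way. `ReleaseCandidate` and `blocking_reasons` are hypothetical names, and where the sign-off flags come from (a ticket system, a release form) is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

MIN_TIME_IN_STAGING = timedelta(hours=24)


@dataclass(frozen=True)
class ReleaseCandidate:
    deployed_to_staging_at: datetime  # timezone-aware
    qa_signed_off: bool
    rollback_plan_documented: bool
    on_call_identified: bool


def blocking_reasons(rc: ReleaseCandidate, now: datetime) -> List[str]:
    """List unmet requirements; an empty list means promotion may proceed."""
    reasons = []
    if now - rc.deployed_to_staging_at < MIN_TIME_IN_STAGING:
        reasons.append("less than 24 hours in staging")
    if not rc.qa_signed_off:
        reasons.append("QA sign-off missing")
    if not rc.rollback_plan_documented:
        reasons.append("rollback plan not documented")
    if not rc.on_call_identified:
        reasons.append("no on-call engineer identified")
    return reasons


if __name__ == "__main__":
    rc = ReleaseCandidate(
        deployed_to_staging_at=datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc),
        qa_signed_off=True,
        rollback_plan_documented=False,
        on_call_identified=True,
    )
    print(blocking_reasons(rc, now=datetime(2024, 5, 1, 18, 0, tzinfo=timezone.utc)))
```

Returning every unmet item, rather than stopping at the first, gives the release manager the full list in one pass.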

Rollback Procedures

Staging Rollback

Trigger: Manual or automated (if health checks fail)

Process:
1. Identify the issue from CloudWatch logs and alarms
2. Retrieve the previous image tag from ECR (e.g., main vs PR-123)
3. Update the ECS service task definition to use the previous image
4. Monitor health checks (typically 2-3 minutes)
5. Verify in the staging environment

Command:

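# Set PREVIOUS_REVISION to the task definition revision number to restore.
aws ecs update-service \
  --cluster menotime-staging-cluster \
  --service menotime-staging-service \
  --task-definition "menotime-backend-staging:${PREVIOUS_REVISION:?set the revision to restore}" \
  --force-new-deployment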

Production Rollback

Trigger: Automated on health check failure OR manual if critical issue

Process:
1. Page the on-call engineer
2. Assess severity (patient impact, data loss risk)
3. If rollback is warranted:
   - Retrieve the previous production version (e.g., v1.2.2)
   - Update the ECS service to use the previous version
   - Monitor for 30+ minutes
4. Hold a post-incident review within 24 hours

Command:

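# Set PREVIOUS_REVISION to the task definition revision number to restore.
aws ecs update-service \
  --cluster menotime-prod-cluster \
  --service menotime-prod-service \
  --task-definition "menotime-backend-prod:${PREVIOUS_REVISION:?set the revision to restore}" \
  --force-new-deployment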
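Finding the "previous production version" can be done by sorting release tags numerically. The sketch below assumes the list of image tags has already been fetched from ECR by some other means; `previous_release` is a hypothetical helper, and non-release tags such as main or PR-123 are ignored.

```python
import re
from typing import Iterable, Optional, Tuple

_RELEASE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def _version(tag: str) -> Optional[Tuple[int, ...]]:
    match = _RELEASE.fullmatch(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def previous_release(tags: Iterable[str], current: str) -> Optional[str]:
    """Return the highest release tag strictly below current, if any."""
    current_version = _version(current)
    if current_version is None:
        raise ValueError(f"not a release tag: {current!r}")
    older = [
        tag for tag in tags
        if _version(tag) is not None and _version(tag) < current_version
    ]
    return max(older, key=_version) if older else None


if __name__ == "__main__":
    tags = ["main", "PR-123", "v1.1.9", "v1.2.2", "v1.2.3"]
    print(previous_release(tags, "v1.2.3"))
```

Comparing versions as integer tuples avoids the string-sorting trap where v1.10.0 would sort before v1.9.0.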

Monitoring by Environment

Development

- Basic CloudWatch logs (no aggregation)
- Manual review of errors
- No alerting

Staging

- CloudWatch dashboards for key metrics
- Alarms for critical failures (database down, service unhealthy)
- SNS → Slack for staging team

Production

- Comprehensive CloudWatch dashboards
- Real-time alarms for all critical metrics
- SNS → PagerDuty for on-call escalation
- GuardDuty findings reviewed weekly
- Monthly cost reviews and optimization

Summary

The three-environment approach provides:
- Safety: Staging validates changes before production
- Agility: Dev enables rapid iteration
- Compliance: Audit trails and controlled promotions
- Cost Control: Smaller compute allocations and limited scaling in non-production
- Scalability: Production auto-scaling for patient volume

For operational runbooks, see Monitoring and ECS Fargate.