Deploy to Production

This guide covers the procedures for deploying code to production with approval gates, verification, and rollback strategies.

Production Deployment Workflow

develop branch → release/vX.X.X → PR to main → Approval
→ Merge to main → Tag release → Build & Deploy → Verification → Monitor

Prerequisites

Before deploying to production, ensure:

✅ All code is merged to develop and tested in staging
✅ Staging deployment has been verified for 24+ hours
✅ No blocking issues reported
✅ Database migrations are tested in staging
✅ You have access to the production AWS account
✅ You have admin access to the GitHub repository

Step 1: Create a Release Branch

Create a release branch from develop:

git checkout develop
git pull origin develop
git checkout -b release/v1.2.0

Follow semantic versioning: vMAJOR.MINOR.PATCH
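
Choosing the next version number is mechanical once you know which part changes. The following is a minimal sketch, assuming a hypothetical helper named bump_version (it is not part of the repository) and plain MAJOR.MINOR.PATCH versions without pre-release suffixes:

```shell
# Hypothetical helper: print the next version for a given bump type.
# Usage: bump_version v1.2.0 minor
bump_version() (
  version=${1#v}   # accept both "1.2.0" and "v1.2.0"

  # Accept only MAJOR.MINOR.PATCH with no leading zeros.
  if ! printf '%s\n' "$version" |
      grep -Eq '^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'; then
    echo "invalid version: $1" >&2
    exit 1
  fi

  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  patch=${rest#*.}

  case "$2" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown bump type: $2" >&2; exit 1 ;;
  esac

  echo "v$major.$minor.$patch"
)
```

For example, `git checkout -b "release/$(bump_version v1.1.0 minor)"` would create the branch release/v1.2.0.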

Step 2: Pre-Deployment Checklist

Before creating the PR, complete this checklist:

✅ Update version number in pyproject.toml or app/__init__.py
✅ Update CHANGELOG.md with release notes
✅ Run full test suite: pytest
✅ Run security scan: bandit -r app/
✅ Run linting: flake8 app/
✅ Verify no hardcoded secrets: git log -p -S 'password=' --all
✅ Test database migrations on staging data clone
✅ Verify all critical endpoints in staging
✅ Check monitoring and alerting setup
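
The automatable items in this checklist can be chained so that the first failure stops the run. This is a sketch only: run_check and preflight are hypothetical names, and the commands are the ones listed above, assumed to be installed in the current environment.

```shell
# Hypothetical wrapper: run one check, report PASS/FAIL, and return its status.
# The check's own output goes to stderr so stdout holds only the PASS/FAIL lines.
run_check() {
  label=$1
  shift
  if "$@" >&2; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# The automatable checklist items; && stops at the first failure.
preflight() {
  run_check "test suite" pytest &&
  run_check "security scan" bandit -r app/ &&
  run_check "lint" flake8 app/
}
```

Running `preflight` from the repository root prints one line per check. The manual items (changelog, staging verification, monitoring setup) still need a human.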

Example CHANGELOG.md Update

## [1.2.0] - 2024-01-20

### Added
- User profile endpoints (GET, POST, PATCH)
- Patient symptom tracking feature
- Admin dashboard reporting

### Fixed
- Authorization header validation bug
- Database connection pool timeout

### Changed
- API response format now includes `meta` object
- Improved pagination limit to 100 items

### Security
- Updated dependencies to patch CVE-2024-1234
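
If the changelog follows the layout above, the notes for one version can be pulled out for reuse in the PR description. A minimal sketch, assuming a hypothetical helper named changelog_section and headings of the form "## [x.y.z] - date":

```shell
# Hypothetical helper: print one version's section of the changelog.
# Usage: changelog_section 1.2.0 [path]   (path defaults to CHANGELOG.md)
changelog_section() {
  awk -v ver="$1" '
    # Every "## [x.y.z]" heading starts a new section; print only the requested one.
    /^## \[/ { printing = (index($0, "## [" ver "]") == 1) }
    printing { print }
  ' "${2:-CHANGELOG.md}"
}
```

Calling `changelog_section 1.2.0` prints the heading for 1.2.0 and everything up to the next version heading.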

Step 3: Create Release PR

Push the release branch and create a PR to main:

git add .
git commit -m "chore: prepare release v1.2.0"
git push origin release/v1.2.0

Create the PR on GitHub:

- Base: main
- Compare: release/v1.2.0
- Title: Release: v1.2.0
- Description: Include release notes

PR Description Template

## Release Information

**Version**: v1.2.0
**Release Date**: 2024-01-20

## What's included

- User profile endpoints
- Patient symptom tracking
- Security patches

## Testing completed

- [x] Unit tests passed
- [x] Integration tests passed
- [x] Staging verified for 24 hours
- [x] Database migrations tested
- [x] No critical issues reported

## Deployment plan

- [ ] Staging deployment verified
- [ ] Production approval obtained
- [ ] Blue/green deployment ready
- [ ] Rollback plan reviewed

Step 4: Code Review and Approval

Code review process for production:

Request review from at least 2 senior engineers
Address all review comments
Obtain explicit approval from engineering lead
Obtain sign-off from product manager

Required Approvals

[ ] Engineering Lead approval
[ ] DevOps/Infrastructure approval
[ ] Product Manager approval
[ ] Security review (if changes affect auth/data)

Step 5: Merge to Main

Once all approvals are obtained:

# Option 1: Merge via GitHub UI
# Use "Create a merge commit" (do NOT squash for production)

Or via command line:

git checkout main
git pull origin main
git merge --no-ff release/v1.2.0  # --no-ff preserves history
git push origin main

Step 6: Create a Release Tag

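Tag the merge commit on main. If you merged through the GitHub UI, update your local main first:

git checkout main
git pull origin main
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0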

Verify the tag:

git tag -l v1.2.0
git show v1.2.0

Step 7: Automatic Production Deployment

When you push to main, GitHub Actions triggers the production deployment pipeline.

The .github/workflows/deploy-production.yml workflow:

Runs full test suite — Comprehensive testing
Security scanning — Checks for vulnerabilities
Builds Docker image — Creates production image
Pushes to ECR — Uploads to registry
Blue/Green Deployment — Deploys with zero downtime
Health checks — Verifies new tasks are healthy
Smoke tests — Tests critical endpoints
Notification — Alerts team of completion

Monitor the Pipeline

# View GitHub Actions
# Go to: Actions → Select workflow run → View details

# Or via CLI
gh run list --branch main
gh run view RUN_ID --log

Understanding Blue/Green Deployment

MenoTime uses blue/green deployment for zero-downtime updates:

[Blue Task - v1.1.0]          [Green Task - v1.2.0]
         ↓                              ↓
    [Load Balancer Routes Traffic]
         ↓                              ↓
    [ALB Health Check - PASSED]  [ALB Health Check - PASSED]

When green task is healthy:
Load Balancer switches traffic → All traffic to v1.2.0
Blue task is stopped automatically after a 5-minute hold

How Blue/Green Works in ECS

During a deployment, production runs tasks from two task definition revisions in parallel
The new version (green) launches alongside the old version (blue)
When green passes its health checks, the ALB routes traffic to it
The old version (blue) is kept running for 5 minutes after the switch
If green fails, traffic immediately reverts to blue
After 5 minutes, the blue tasks are stopped
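
The cutover rules above can be written down as a small decision table. This is a toy model for reasoning about the states, not ECS code; the function names and the state labels (serving, standby, stopped) are made up for illustration.

```shell
# Toy model of the cutover rules described above; it does not call ECS.
HOLD_SECONDS=300   # blue stays up for 5 minutes after the switch

# Which colour receives traffic, given green's health ("healthy" or anything else).
traffic_target() {
  if [ "$1" = "healthy" ]; then echo green; else echo blue; fi
}

# What happens to blue, given green's health and the seconds since the switch.
blue_state() {
  if [ "$1" != "healthy" ]; then
    echo serving          # green failed, so blue takes traffic again
  elif [ "$2" -lt "$HOLD_SECONDS" ]; then
    echo standby          # kept running in case a revert is needed
  else
    echo stopped
  fi
}
```

For instance, `blue_state healthy 60` prints standby, while `blue_state healthy 300` prints stopped.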

Step 8: Production Verification

After deployment, verify production is healthy:

Health Check

curl https://api.menotime-app.com/health
# {"status": "healthy", "timestamp": "2024-01-20T14:30:00Z"}
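
A single request right after a deployment can land while tasks are still registering, so it may help to retry a few times before treating the check as failed. A minimal sketch, assuming a hypothetical retry helper; the attempt count and delay in the example are arbitrary.

```shell
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1            # out of attempts
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example: allow up to 5 attempts, 10 seconds apart.
# curl -f turns HTTP error statuses into a non-zero exit code.
# retry 5 10 curl -fsS https://api.menotime-app.com/health
```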

Check Active Tasks

aws ecs list-tasks \
  --cluster production \
  --service-name menotime-api \
  --region us-west-1 \
  --desired-status RUNNING

# Should show multiple tasks running

Verify Deployment Status

aws ecs describe-services \
  --cluster production \
  --services menotime-api \
  --region us-west-1 | jq '.services[0].deployments'

Expected output:

[
  {
    "status": "PRIMARY",
    "taskDefinition": "menotime-api-prod:42",
    "desiredCount": 3,
    "runningCount": 3,
    "pendingCount": 0
  }
]
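
The three counts in this output are enough to decide whether the rollout has settled. A sketch under the assumption that "settled" means runningCount equals desiredCount with nothing pending; rollout_complete is a hypothetical name.

```shell
# Hypothetical check: succeed only when the rollout has settled.
# Arguments: desiredCount runningCount pendingCount
rollout_complete() {
  [ "$2" -eq "$1" ] && [ "$3" -eq 0 ]
}

# Example wiring (the --query path assumes the first deployment is the PRIMARY one):
# counts=$(aws ecs describe-services --cluster production --services menotime-api \
#   --region us-west-1 --output text \
#   --query 'services[0].deployments[0].[desiredCount,runningCount,pendingCount]')
# rollout_complete $counts && echo "rollout complete"
```

With the expected output above, `rollout_complete 3 3 0` succeeds.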

Check CloudWatch Metrics

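# Check error rate over the last 10 minutes (date -d requires GNU date)
# In the AWS/ApplicationELB namespace, dimension values are ARN suffixes, and
# target metrics are typically published with the LoadBalancer dimension as well.
# TARGET_GROUP_ID, ALB_NAME and ALB_ID are placeholders; copy them from the ARNs.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name HTTPCode_Target_5XX_Count \
  --dimensions \
    Name=TargetGroup,Value=targetgroup/menotime-api-prod/TARGET_GROUP_ID \
    Name=LoadBalancer,Value=app/ALB_NAME/ALB_ID \
  --start-time "$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Sum \
  --region us-west-1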

Test Key Endpoints

# Health check
curl https://api.menotime-app.com/health

# Authentication endpoint
curl -X POST https://api.menotime-app.com/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "password"}'

# Patient list endpoint (quote the URL so the shell does not interpret "?")
curl -H "Authorization: Bearer PROD_TOKEN" \
  "https://api.menotime-app.com/api/v1/patients?limit=10"

Monitor Logs

# View real-time logs
aws logs tail /ecs/menotime-api-production --follow --region us-west-1

# Search for errors in the last 15 minutes
# (--start-time takes epoch milliseconds; date -d requires GNU date)
aws logs filter-log-events \
  --log-group-name /ecs/menotime-api-production \
  --filter-pattern "ERROR" \
  --region us-west-1 \
  --start-time "$(date -d '15 minutes ago' +%s)000"

Monitoring Checklist

After deployment, monitor these metrics for 1 hour:

[ ] Error rate (5XX) < 0.1%
[ ] Response time p99 < 500ms
[ ] Database connection pool healthy
[ ] No CloudWatch alarms triggered
[ ] CPU utilization normal
[ ] Memory utilization normal
[ ] User reports look normal
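
The error-rate threshold is a ratio, while CloudWatch returns raw counts, so the check needs one division. A minimal sketch, assuming the 5XX sum comes from the metric query shown earlier and the total comes from the ALB RequestCount metric over the same window; error_rate_ok is a hypothetical name.

```shell
# Hypothetical gate for the first checklist item.
# Arguments: 5XX count, total request count, max error rate in percent (default 0.1)
error_rate_ok() {
  awk -v errors="$1" -v total="$2" -v max="${3:-0.1}" 'BEGIN {
    if (total == 0) exit 0            # no traffic in the window: nothing to fail on
    rate = (errors / total) * 100     # error rate in percent
    exit !(rate < max)                # exit 0 (success) only when below the threshold
  }'
}
```

For example, 5 errors in 10,000 requests is 0.05% and passes, while 20 errors in 10,000 requests is 0.2% and fails. Treating an empty window as a pass is a design choice; a stricter check could fail it instead.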

Rollback Procedure

If critical issues occur in production, follow this rollback procedure.

Immediate Rollback (< 5 minutes)

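If the deployment hasn't fully completed, point the service back at the task definition of the previous deployment, which ECS still lists as ACTIVE while the new one rolls out:

# Roll back to the task definition of the previous (ACTIVE) deployment.
# Do not set --desired-count 0 here: that stops every task in the service.
PREVIOUS_TASK_DEF=$(aws ecs describe-services \
  --cluster production --services menotime-api --region us-west-1 \
  --query "services[0].deployments[?status=='ACTIVE'] | [0].taskDefinition" \
  --output text)
aws ecs update-service \
  --cluster production \
  --service menotime-api \
  --task-definition "$PREVIOUS_TASK_DEF" \
  --region us-west-1

ECS then drains the green tasks and returns the service to the previous version. If the query prints None, the old deployment is already gone; use the fast rollback instead.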

Fast Rollback (5-30 minutes)

Rollback to the previous task definition:

# Get the task definition the service is currently running
CURRENT_TASK=$(aws ecs describe-services \
  --cluster production \
  --services menotime-api \
  --region us-west-1 \
  --query 'services[0].taskDefinition' \
  --output text)

# Extract the revision number and step back one revision.
# This assumes the previous revision is the last known-good one; verify before running.
CURRENT_REVISION=${CURRENT_TASK##*:}
PREVIOUS_REVISION=$((CURRENT_REVISION - 1))

# Revert to the previous revision
aws ecs update-service \
  --cluster production \
  --service menotime-api \
  --task-definition "menotime-api-prod:$PREVIOUS_REVISION" \
  --force-new-deployment \
  --region us-west-1

Wait for the service to stabilize:

aws ecs wait services-stable \
  --cluster production \
  --services menotime-api \
  --region us-west-1
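
The revision arithmetic in the fast rollback has two edge cases: a value that does not end in a numeric revision, and revision 1, which has no predecessor. A sketch of a guard for both, assuming a hypothetical helper named previous_task_def that takes the task definition ARN (or family:revision) returned by describe-services:

```shell
# Hypothetical helper: print "family:revision-1" for a task definition reference.
# Usage: previous_task_def arn:aws:ecs:...:task-definition/menotime-api-prod:42
previous_task_def() (
  current=$1
  family_path=${current%:*}      # everything before the last ":"
  family=${family_path##*/}      # strip the ARN prefix, if any
  revision=${current##*:}        # everything after the last ":"

  case "$revision" in
    ''|*[!0-9]*) echo "no numeric revision in: $current" >&2; exit 1 ;;
  esac
  if [ "$revision" -le 1 ]; then
    echo "revision $revision has no predecessor" >&2
    exit 1
  fi

  echo "$family:$((revision - 1))"
)
```

The result can be passed straight to `--task-definition`. It still assumes that the previous revision is the one you want to return to.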

Tag a Rollback Release

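# Revert the release on main first. v1.2.0 points at a merge commit (Step 5),
# so git revert needs -m 1 to keep main's side of the merge.
# Pushing to main triggers the production pipeline again.
git checkout main
git pull origin main
git revert -m 1 v1.2.0
git push origin main

# Then tag the reverted state
git tag -a v1.1.0-hotfix -m "Rollback from v1.2.0"
git push origin v1.1.0-hotfix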

Post-Rollback Analysis

Identify root cause — Review logs and error traces
Create issue — Log the problem in GitHub
Fix the bug — Develop a fix on a feature branch
Test thoroughly — Run extensive tests and staging validation
Re-deploy — Create a new release with the fix

Hotfix Process

For critical production bugs, use the hotfix workflow:

git checkout main
git pull origin main
git checkout -b hotfix/fix-critical-bug

# Make your fix
git add .
git commit -m "hotfix: critical bug in patient endpoint"
git push origin hotfix/fix-critical-bug

Create a PR to main with the same approval process:

- Label: hotfix
- Require: Engineering Lead + DevOps approval
- Quick turnaround (30 min approval target)

After merge and deployment:

# Also merge hotfix back to develop
git checkout develop
git pull origin develop
git merge hotfix/fix-critical-bug
git push origin develop

# Delete hotfix branch
git branch -d hotfix/fix-critical-bug
git push origin --delete hotfix/fix-critical-bug

Production Environment Details

Property	Value
ECS Cluster	`production`
ECS Service	`menotime-api`
Task Definition	`menotime-api-prod`
Desired Count	3 (high availability)
Container Port	8000
Load Balancer	Application Load Balancer (ALB)
Database	RDS PostgreSQL (`menotime-prod`)
Base URL	`https://api.menotime-app.com`
Region	`us-west-1`
Deployment Type	Blue/Green
Health Check Interval	30 seconds

On-Call Procedures

If you're on-call and a production issue occurs:

Assess severity — Is the API down or degraded?
Alert team — Page relevant engineers
Check dashboards — CloudWatch, Datadog, PagerDuty
Review logs — Understand what's happening
Decide on rollback — If critical, initiate rollback
Post-incident — Document and discuss in retrospective