Deploy to Production

This guide covers the procedures for deploying code to production with approval gates, verification, and rollback strategies.

Production Deployment Workflow

develop branch → release/vX.X.X → PR to main → Approval
→ Merge to main → Tag release → Build & Deploy → Verification → Monitor

Prerequisites

Before deploying to production, ensure:

  • ✅ All code is merged to develop and tested in staging
  • ✅ Staging deployment has been verified for 24+ hours
  • ✅ No blocking issues reported
  • ✅ Database migrations are tested in staging
  • ✅ You have access to the production AWS account
  • ✅ You have admin access to the GitHub repository

Step 1: Create a Release Branch

Create a release branch from develop:

git checkout develop
git pull origin develop
git checkout -b release/v1.2.0

Follow semantic versioning: vMAJOR.MINOR.PATCH
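
A version string can be checked against this pattern before it ends up in a branch or tag name. The sketch below is one way to do that in POSIX shell; `is_release_version` and `bump_version` are hypothetical helper names, not part of the project's tooling.

```shell
# Hypothetical helpers (not part of the repository's tooling).

# Succeed only if the argument has the form vMAJOR.MINOR.PATCH,
# with no leading zeros and no pre-release suffix.
is_release_version() {
  printf '%s\n' "$1" |
    grep -Eq '^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}

# Print the next version for a bump type: major, minor, or patch.
# The body runs in a subshell so its variables stay local.
bump_version() (
  is_release_version "$1" || { echo "invalid version: $1" >&2; exit 1; }
  version=${1#v}
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$2" in
    major) echo "v$((major + 1)).0.0" ;;
    minor) echo "v${major}.$((minor + 1)).0" ;;
    patch) echo "v${major}.${minor}.$((patch + 1))" ;;
    *) echo "unknown bump type: $2" >&2; exit 1 ;;
  esac
)
```

For example, `bump_version v1.1.0 minor` prints `v1.2.0`, which could then be used as `release/v1.2.0`.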

Step 2: Pre-Deployment Checklist

Before creating the PR, complete this checklist:

  • ✅ Update version number in pyproject.toml or app/__init__.py
  • ✅ Update CHANGELOG.md with release notes
  • ✅ Run full test suite: pytest
  • ✅ Run security scan: bandit -r app/
  • ✅ Run linting: flake8 app/
  • ✅ Verify no hardcoded secrets: git log -p -S 'password=' --all
  • ✅ Test database migrations on staging data clone
  • ✅ Verify all critical endpoints in staging
  • ✅ Check monitoring and alerting setup
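
The version bump in the first checklist item is easy to miss. Below is a minimal sketch of a consistency check, assuming the version is declared as a `version = "X.Y.Z"` line in pyproject.toml. That layout is an assumption: a project that keeps the version in app/__init__.py would need a different pattern. `pyproject_version` and `check_release_version` are hypothetical names.

```shell
# Print the first `version = "..."` value found in a pyproject.toml-style file.
pyproject_version() {
  sed -n 's/^version[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed if branch release/vX.Y.Z matches the version declared in the file.
# The body runs in a subshell so its variables stay local.
check_release_version() (
  branch=$1
  file=$2
  expected=${branch#release/v}
  actual=$(pyproject_version "$file")
  if [ "$expected" = "$actual" ]; then
    echo "ok: $file declares $actual"
  else
    echo "mismatch: branch $branch vs $file version '$actual'" >&2
    exit 1
  fi
)

# Usage, from the repository root:
#   check_release_version "$(git rev-parse --abbrev-ref HEAD)" pyproject.toml
```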

Example CHANGELOG.md Update

## [1.2.0] - 2024-01-20

### Added
- User profile endpoints (GET, POST, PATCH)
- Patient symptom tracking feature
- Admin dashboard reporting

### Fixed
- Authorization header validation bug
- Database connection pool timeout

### Changed
- API response format now includes `meta` object
- Improved pagination limit to 100 items

### Security
- Updated dependencies to patch CVE-2024-1234

Step 3: Create Release PR

Push the release branch and create a PR to main:

git add .
git commit -m "chore: prepare release v1.2.0"
git push origin release/v1.2.0

Create the PR on GitHub:

  • Base: main
  • Compare: release/v1.2.0
  • Title: Release: v1.2.0
  • Description: Include release notes

PR Description Template

## Release Information

**Version**: v1.2.0
**Release Date**: 2024-01-20

## What's included

- User profile endpoints
- Patient symptom tracking
- Security patches

## Testing completed

- [x] Unit tests passed
- [x] Integration tests passed
- [x] Staging verified for 24 hours
- [x] Database migrations tested
- [x] No critical issues reported

## Deployment plan

- [ ] Staging deployment verified
- [ ] Production approval obtained
- [ ] Blue/green deployment ready
- [ ] Rollback plan reviewed

Step 4: Code Review and Approval

Code review process for production:

  1. Request review from at least 2 senior engineers
  2. Address all review comments
  3. Obtain explicit approval from engineering lead
  4. Obtain sign-off from product manager

Required Approvals

  • [ ] Engineering Lead approval
  • [ ] DevOps/Infrastructure approval
  • [ ] Product Manager approval
  • [ ] Security review (if changes affect auth/data)

Step 5: Merge to Main

Once all approvals are obtained:

# Option 1: Merge via GitHub UI
# Use "Create a merge commit" (do NOT squash for production)

Or via command line:

git checkout main
git pull origin main
git merge --no-ff release/v1.2.0  # --no-ff preserves history
git push origin main

Step 6: Create a Release Tag

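Tag the merge commit on main. If you merged through the GitHub UI, update your local main first so the tag does not land on a stale commit:

git checkout main
git pull origin main
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0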

Verify the tag:

git tag -l v1.2.0
git show v1.2.0

Step 7: Automatic Production Deployment

When you push to main, GitHub Actions triggers the production deployment pipeline.

The .github/workflows/deploy-production.yml workflow:

  1. Runs full test suite — Comprehensive testing
  2. Security scanning — Checks for vulnerabilities
  3. Builds Docker image — Creates production image
  4. Pushes to ECR — Uploads to registry
  5. Blue/Green Deployment — Deploys with zero downtime
  6. Health checks — Verifies new tasks are healthy
  7. Smoke tests — Tests critical endpoints
  8. Notification — Alerts team of completion
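
The ordering matters because each step is meant to gate the next. The workflow file itself is not reproduced in this guide, so the following is only a toy model of that fail-fast ordering, assuming a failed step stops everything after it. `run_stages` and the stage names in the usage comment are hypothetical.

```shell
# Run each stage in order and stop at the first one that fails.
# Each argument is a command or function name standing in for a pipeline step.
run_stages() {
  for stage in "$@"; do
    echo "running: $stage"
    if ! "$stage"; then
      echo "failed: $stage (later stages skipped)" >&2
      return 1
    fi
  done
  echo "all stages passed"
}

# Usage, with placeholder functions defined elsewhere:
#   run_stages run_tests security_scan build_image push_image deploy smoke_tests
```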

Monitor the Pipeline

# View GitHub Actions
# Go to: Actions → Select workflow run → View details

# Or via CLI
gh run list --branch main
gh run view RUN_ID --log

Understanding Blue/Green Deployment

MenoTime uses blue/green deployment for zero-downtime updates:

[Blue Task - v1.1.0]          [Green Task - v1.2.0]
         ↓                              ↓
    [Load Balancer Routes Traffic]
         ↓                              ↓
    [ALB Health Check - PASSED]  [ALB Health Check - PASSED]

When green task is healthy:
Load Balancer switches traffic → All traffic to v1.2.0
Blue task automatically stopped

How Blue/Green Works in ECS

  1. During a deployment, production runs tasks from two task definition revisions in parallel
  2. New version (green) launches alongside old version (blue)
  3. When green passes health checks, ALB routes traffic to it
  4. Old version (blue) is kept running for 5 minutes
  5. If green fails, traffic immediately reverts to blue
  6. After 5 minutes, blue task is stopped
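
The cutover rules above reduce to a small decision table. The function below is a toy model of those rules for illustration, not a description of what ECS does internally. `blue_green_action` and its output labels are hypothetical, and the 300-second window is the 5 minutes from the list.

```shell
# Toy model of the cutover rules above.
# Arguments: green health ("healthy" or "unhealthy") and the number of seconds
# since traffic shifted to green.
blue_green_action() {
  green_health=$1
  seconds_since_shift=$2
  if [ "$green_health" != "healthy" ]; then
    echo "revert-to-blue"          # step 5: green failed, blue takes traffic back
  elif [ "$seconds_since_shift" -lt 300 ]; then
    echo "keep-blue-on-standby"    # step 4: blue stays up during the window
  else
    echo "stop-blue"               # step 6: window elapsed, blue is retired
  fi
}
```

For example, `blue_green_action healthy 120` prints `keep-blue-on-standby`, and `blue_green_action unhealthy 10` prints `revert-to-blue`.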

Step 8: Production Verification

After deployment, verify production is healthy:

Health Check

curl https://api.menotime-app.com/health
# {"status": "healthy", "timestamp": "2024-01-20T14:30:00Z"}
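
Right after a cutover the endpoint may need a few seconds before it answers, so a single curl can be misleading. Below is a minimal retry sketch, assuming the response body has the flat shape shown above. `is_healthy_body`, `retry`, and `check_health` are hypothetical helpers, and the attempt count and delay in the usage comment are arbitrary.

```shell
# Succeed if a JSON body reports "status": "healthy".
# This is a plain pattern match, not a JSON parser; it suits a flat response.
is_healthy_body() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Run a command up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Fetch the health endpoint once and check the body.
check_health() {
  body=$(curl -fsS --max-time 5 "$1") && is_healthy_body "$body"
}

# Usage:
#   retry 10 6 check_health https://api.menotime-app.com/health
```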

Check Active Tasks

aws ecs list-tasks \
  --cluster production \
  --service-name menotime-api \
  --region us-west-1 \
  --desired-status RUNNING

# Should list 3 running task ARNs (the service's desired count) once the deployment has settled

Verify Deployment Status

aws ecs describe-services \
  --cluster production \
  --services menotime-api \
  --region us-west-1 | jq '.services[0].deployments'

Expected output (abridged to the relevant fields):

[
  {
    "status": "PRIMARY",
    "taskDefinition": "menotime-api-prod:42",
    "desiredCount": 3,
    "runningCount": 3,
    "pendingCount": 0
  }
]
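
The same check can be scripted instead of read by eye. This sketch asks the AWS CLI for a text table of the four fields shown above and treats the deployment as settled only when a single PRIMARY row remains with every desired task running. `deployment_settled` is a hypothetical helper; the `--query` expression in the usage comment is standard JMESPath over the `describe-services` output.

```shell
# Read deployment rows (status desired running pending) from stdin and succeed
# only if there is exactly one row, it is PRIMARY, running equals desired,
# and nothing is pending.
deployment_settled() {
  awk '
    { rows++ }
    $1 != "PRIMARY" || $2 != $3 || $4 != 0 { bad = 1 }
    END { exit (rows == 1 && !bad) ? 0 : 1 }
  '
}

# Usage:
#   aws ecs describe-services \
#     --cluster production --services menotime-api --region us-west-1 \
#     --query 'services[0].deployments[].[status,desiredCount,runningCount,pendingCount]' \
#     --output text | deployment_settled && echo "deployment settled"
```

A second row (for example an ACTIVE deployment that is still draining) makes the check fail, which is the intended signal to keep waiting.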

Check CloudWatch Metrics

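# Check error rate (the date -d syntax below requires GNU date)
# This metric is published per load balancer, so the TargetGroup dimension must
# be paired with LoadBalancer. Replace ALB_NAME, ALB_ID, and TARGET_GROUP_ID
# with the values from your ALB and target group ARNs.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name HTTPCode_Target_5XX_Count \
  --dimensions Name=LoadBalancer,Value=app/ALB_NAME/ALB_ID \
               Name=TargetGroup,Value=targetgroup/menotime-api-prod/TARGET_GROUP_ID \
  --start-time "$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Sum \
  --region us-west-1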

Test Key Endpoints

# Health check
curl https://api.menotime-app.com/health

# Authentication endpoint
curl -X POST https://api.menotime-app.com/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "password"}'

# Patient list endpoint
curl -H "Authorization: Bearer PROD_TOKEN" \
  "https://api.menotime-app.com/api/v1/patients?limit=10"

Monitor Logs

# View real-time logs
aws logs tail /ecs/menotime-api-production --follow --region us-west-1

# Search for errors
aws logs filter-log-events \
  --log-group-name /ecs/menotime-api-production \
  --filter-pattern "ERROR" \
  --region us-west-1 \
  --start-time $(date -d '15 minutes ago' +%s)000
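
The `date -d '15 minutes ago'` form is a GNU extension and fails with the BSD date shipped on macOS. A portable alternative is to do the arithmetic in the shell: `filter-log-events` expects milliseconds since the epoch, so subtract the offset in seconds and multiply by 1000. `epoch_ms_minutes_ago` is a hypothetical helper name.

```shell
# Print the epoch time in milliseconds for N minutes before "now".
# The optional second argument overrides "now" (in epoch seconds), which keeps
# the function testable; it defaults to the current time.
epoch_ms_minutes_ago() {
  minutes=$1
  now=${2:-$(date +%s)}
  echo $(( (now - minutes * 60) * 1000 ))
}

# Usage:
#   aws logs filter-log-events \
#     --log-group-name /ecs/menotime-api-production \
#     --filter-pattern "ERROR" \
#     --region us-west-1 \
#     --start-time "$(epoch_ms_minutes_ago 15)"
```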

Monitoring Checklist

After deployment, monitor these metrics for 1 hour:

  • [ ] Error rate (5XX) < 0.1%
  • [ ] Response time p99 < 500ms
  • [ ] Database connection pool healthy
  • [ ] No CloudWatch alarms triggered
  • [ ] CPU utilization normal
  • [ ] Memory utilization normal
  • [ ] User reports look normal
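
The first threshold is a ratio, not a raw count: divide the 5XX sum by the total request count for the same window. For instance, 5 errors out of 10,000 requests is 0.05%, which passes; 20 out of 10,000 is 0.2%, which does not. Below is a small sketch of that arithmetic, assuming you have already pulled both sums from CloudWatch (for an ALB, the `HTTPCode_Target_5XX_Count` and `RequestCount` metrics). `error_rate_ok` is a hypothetical helper.

```shell
# Succeed if errors/total, expressed as a percentage, is below the threshold.
# Arguments: 5XX count, total request count, threshold in percent (default 0.1).
error_rate_ok() {
  awk -v errors="$1" -v total="$2" -v limit="${3:-0.1}" 'BEGIN {
    if (total == 0) exit 0            # no traffic in the window, nothing to judge
    rate = 100 * errors / total
    printf "5XX rate: %.3f%%\n", rate
    exit (rate < limit) ? 0 : 1
  }'
}
```

For example, `error_rate_ok 5 10000` prints `5XX rate: 0.050%` and succeeds.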

Rollback Procedure

If critical issues occur in production, follow this rollback procedure.

Immediate Rollback (< 5 minutes)

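If the deployment hasn't fully completed, point the service back at the task definition revision that was live before it started. Do not set --desired-count 0 here: that scales the entire service to zero tasks, blue included, and takes the API offline.

# Supersede the in-progress deployment with the previous revision
# (replace PREVIOUS_REVISION with the revision number that was live before this deployment)
aws ecs update-service \
  --cluster production \
  --service menotime-api \
  --task-definition menotime-api-prod:PREVIOUS_REVISION \
  --region us-west-1

This starts a new deployment on the previous revision, and ECS stops the green tasks as it rolls out.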

Fast Rollback (5-30 minutes)

Rollback to the previous task definition:

# Get the task definition the service currently points at
CURRENT_TASK=$(aws ecs describe-services \
  --cluster production \
  --services menotime-api \
  --region us-west-1 \
  --query 'services[0].taskDefinition' \
  --output text)

# Derive the previous revision number.
# This assumes the revision immediately before the current one is the last
# known-good release and has not been deregistered.
CURRENT_REVISION=${CURRENT_TASK##*:}
PREVIOUS_REVISION=$((CURRENT_REVISION - 1))

# Revert to previous version
aws ecs update-service \
  --cluster production \
  --service menotime-api \
  --task-definition menotime-api-prod:$PREVIOUS_REVISION \
  --force-new-deployment \
  --region us-west-1

Wait for the service to stabilize:

aws ecs wait services-stable \
  --cluster production \
  --services menotime-api \
  --region us-west-1
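
The revision arithmetic in the rollback commands can be wrapped in a function that also guards the edge case where no earlier revision exists. This is a sketch under the same assumption as above, namely that "current minus one" is the release you want back. `previous_task_definition` is a hypothetical helper; before relying on its output, `aws ecs describe-task-definition` can confirm that the revision is still ACTIVE.

```shell
# Print FAMILY:(REVISION-1) for a task definition ARN or FAMILY:REVISION string.
# Fails when there is no earlier revision to fall back to.
# The body runs in a subshell so its variables stay local.
previous_task_definition() (
  current=${1##*/}            # strip the ARN prefix, if any
  family=${current%:*}
  revision=${current##*:}
  if [ "$revision" -le 1 ]; then
    echo "no revision before $current" >&2
    exit 1
  fi
  echo "${family}:$((revision - 1))"
)

# Usage:
#   TARGET=$(previous_task_definition "$CURRENT_TASK") || exit 1
#   aws ecs describe-task-definition --task-definition "$TARGET" \
#     --region us-west-1 --query 'taskDefinition.status' --output text
```

For example, an ARN ending in `task-definition/menotime-api-prod:42` yields `menotime-api-prod:41`.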

Tag a Rollback Release

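# Revert the release merge on main first.
# v1.2.0 points at a merge commit (Step 5 merges with --no-ff), so git revert
# needs -m 1 to keep the main-line parent.
git checkout main
git pull origin main
git revert -m 1 v1.2.0
git push origin main

# Then tag the revert commit so the rollback is traceable
git tag -a v1.1.0-hotfix -m "Rollback from v1.2.0"
git push origin v1.1.0-hotfix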

Post-Rollback Analysis

  1. Identify root cause — Review logs and error traces
  2. Create issue — Log the problem in GitHub
  3. Fix the bug — Develop a fix on a feature branch
  4. Test thoroughly — Run extensive tests and staging validation
  5. Re-deploy — Create a new release with the fix

Hotfix Process

For critical production bugs, use the hotfix workflow:

git checkout main
git pull origin main
git checkout -b hotfix/fix-critical-bug

# Make your fix
git add .
git commit -m "hotfix: critical bug in patient endpoint"
git push origin hotfix/fix-critical-bug

Create a PR to main using an expedited version of the approval process:

  • Label: hotfix
  • Require: Engineering Lead + DevOps approval
  • Quick turnaround (30 min approval target)

After merge and deployment:

# Also merge hotfix back to develop
git checkout develop
git pull origin develop
git merge hotfix/fix-critical-bug
git push origin develop

# Delete hotfix branch
git branch -d hotfix/fix-critical-bug
git push origin --delete hotfix/fix-critical-bug

Production Environment Details

Property                Value
ECS Cluster             production
ECS Service             menotime-api
Task Definition         menotime-api-prod
Desired Count           3 (high availability)
Container Port          8000
Load Balancer           Application Load Balancer (ALB)
Database                RDS PostgreSQL (menotime-prod)
Base URL                https://api.menotime-app.com
Region                  us-west-1
Deployment Type         Blue/Green
Health Check Interval   30 seconds

On-Call Procedures

If you're on-call and a production issue occurs:

  1. Assess severity — Is the API down or degraded?
  2. Alert team — Page relevant engineers
  3. Check dashboards — CloudWatch, Datadog, PagerDuty
  4. Review logs — Understand what's happening
  5. Decide on rollback — If critical, initiate rollback
  6. Post-incident — Document and discuss in retrospective