Deploy to Production
This guide covers the procedures for deploying code to production with approval gates, verification, and rollback strategies.
Production Deployment Workflow
develop branch → release/vX.X.X → PR to main → Approval
→ Merge to main → Tag release → Build & Deploy → Verification → Monitor
Prerequisites
Before deploying to production, ensure:
- ✅ All code is merged to develop and tested in staging
- ✅ Staging deployment has been verified for 24+ hours
- ✅ No blocking issues reported
- ✅ Database migrations are tested in staging
- ✅ You have access to the production AWS account
- ✅ You have admin access to the GitHub repository
Step 1: Create a Release Branch
Create a release branch from develop:
git checkout develop
git pull origin develop
git checkout -b release/v1.2.0
Follow semantic versioning: vMAJOR.MINOR.PATCH
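As a minimal sketch of the naming rule above, the following Python helper checks that a branch name has the form release/vMAJOR.MINOR.PATCH and computes the next version. The names parse_release_branch and bump are illustrative and are not part of the project.

```python
import re

# Matches branch names such as "release/v1.2.0" (hypothetical helper, not project code).
_RELEASE_RE = re.compile(r"^release/v(\d+)\.(\d+)\.(\d+)$")


def parse_release_branch(branch: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) or raise ValueError for a malformed name."""
    match = _RELEASE_RE.match(branch)
    if match is None:
        raise ValueError(f"not a release branch: {branch!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump(version: tuple[int, int, int], part: str) -> tuple[int, int, int]:
    """Increment one component and reset the lower ones, per semantic versioning."""
    major, minor, patch = version
    if part == "major":
        return major + 1, 0, 0
    if part == "minor":
        return major, minor + 1, 0
    if part == "patch":
        return major, minor, patch + 1
    raise ValueError(f"unknown part: {part!r}")
```

For example, parse_release_branch("release/v1.2.0") gives (1, 2, 0), and bumping its minor component gives (1, 3, 0).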
Step 2: Pre-Deployment Checklist
Before creating the PR, complete this checklist:
- ✅ Update version number in pyproject.toml or app/__init__.py
- ✅ Update CHANGELOG.md with release notes
- ✅ Run full test suite: pytest
- ✅ Run security scan: bandit -r app/
- ✅ Run linting: flake8 app/
- ✅ Verify no hardcoded secrets: git log -p -S 'password=' --all
- ✅ Test database migrations on staging data clone
- ✅ Verify all critical endpoints in staging
- ✅ Check monitoring and alerting setup
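The automated items in the checklist above can be chained in a small script so that none is skipped. This is only a sketch: the CHECKS list copies the commands from the checklist, while run_checks and its injectable run parameter are assumptions made for illustration.

```python
import subprocess

# Commands copied from the checklist; adjust paths to your repository layout.
CHECKS = [
    ["pytest"],
    ["bandit", "-r", "app/"],
    ["flake8", "app/"],
]


def run_checks(commands, run=subprocess.run):
    """Run every command in order and return the ones that failed.

    `run` defaults to subprocess.run; it is a parameter so the function
    can be exercised without the real tools installed.
    """
    failed = []
    for command in commands:
        try:
            ok = run(command, check=False).returncode == 0
        except FileNotFoundError:
            ok = False  # tool is not installed or not on PATH
        if not ok:
            failed.append(" ".join(command))
    return failed
```

Calling run_checks(CHECKS) from the repository root returns an empty list when all three tools exit with status 0. All checks run even if an early one fails, so a single pass reports every problem.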
Example CHANGELOG.md Update
## [1.2.0] - 2024-01-20
### Added
- User profile endpoints (GET, POST, PATCH)
- Patient symptom tracking feature
- Admin dashboard reporting
### Fixed
- Authorization header validation bug
- Database connection pool timeout
### Changed
- API response format now includes `meta` object
- Improved pagination limit to 100 items
### Security
- Updated dependencies to patch CVE-2024-1234
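Because the release PR description reuses these notes, it can help to extract one version's section from CHANGELOG.md programmatically. The sketch below assumes each release starts with a "## [x.y.z]" heading as in the example above; release_notes is a hypothetical helper name.

```python
def release_notes(changelog: str, version: str) -> str:
    """Return the body of the "## [version]" section, without its heading."""
    heading = f"## [{version}]"
    body = []
    inside = False
    for line in changelog.splitlines():
        if line.startswith("## ["):
            if inside:
                break  # reached the next release section
            inside = line.startswith(heading)
            continue
        if inside:
            body.append(line)
    return "\n".join(body).strip()
```

An unknown version yields an empty string, which a caller can treat as "changelog not updated yet".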
Step 3: Create Release PR
Push the release branch and create a PR to main:
git add .
git commit -m "chore: prepare release v1.2.0"
git push origin release/v1.2.0
Create PR on GitHub:
- Base: main
- Compare: release/v1.2.0
- Title: Release: v1.2.0
- Description: Include release notes
PR Description Template
## Release Information
**Version**: v1.2.0
**Release Date**: 2024-01-20
## What's included
- User profile endpoints
- Patient symptom tracking
- Security patches
## Testing completed
- [x] Unit tests passed
- [x] Integration tests passed
- [x] Staging verified for 24 hours
- [x] Database migrations tested
- [x] No critical issues reported
## Deployment plan
- [ ] Staging deployment verified
- [ ] Production approval obtained
- [ ] Blue/green deployment ready
- [ ] Rollback plan reviewed
Step 4: Code Review and Approval
Code review process for production:
- Request review from at least 2 senior engineers
- Address all review comments
- Obtain explicit approval from engineering lead
- Obtain sign-off from product manager
Required Approvals
- [ ] Engineering Lead approval
- [ ] DevOps/Infrastructure approval
- [ ] Product Manager approval
- [ ] Security review (if changes affect auth/data)
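The approval gate above, including the conditional security review, can be expressed as a small check. This is a sketch under assumptions: the role labels in REQUIRED are invented for illustration and do not correspond to real GitHub team names.

```python
# Hypothetical role labels standing in for the approvers listed above.
REQUIRED = {"engineering-lead", "devops", "product-manager"}


def missing_approvals(approved: set[str], touches_auth_or_data: bool) -> set[str]:
    """Return the roles that still need to approve the release PR."""
    required = set(REQUIRED)
    if touches_auth_or_data:
        required.add("security")  # security review applies only to auth/data changes
    return required - approved
```

The release is ready to merge only when the returned set is empty.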
Step 5: Merge to Main
Once all approvals are obtained:
# Option 1: Merge via GitHub UI
# Use "Create a merge commit" (do NOT squash for production)
Or via command line:
git checkout main
git pull origin main
git merge --no-ff release/v1.2.0 # --no-ff preserves history
git push origin main
Step 6: Create a Release Tag
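Tag the merge commit on main. If you merged through the GitHub UI, update your local main first so the tag lands on the right commit:
git checkout main
git pull origin main
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0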
Verify the tag:
git tag -l v1.2.0
git show v1.2.0
Step 7: Automatic Production Deployment
When you push to main, GitHub Actions triggers the production deployment pipeline.
The .github/workflows/deploy-production.yml workflow:
- Runs full test suite — Comprehensive testing
- Security scanning — Checks for vulnerabilities
- Builds Docker image — Creates production image
- Pushes to ECR — Uploads to registry
- Blue/Green Deployment — Deploys with zero downtime
- Health checks — Verifies new tasks are healthy
- Smoke tests — Tests critical endpoints
- Notification — Alerts team of completion
Monitor the Pipeline
# View GitHub Actions
# Go to: Actions → Select workflow run → View details
# Or via CLI
gh run list --branch main
gh run view RUN_ID --log
Understanding Blue/Green Deployment
MenoTime uses blue/green deployment for zero-downtime updates:
[Blue Task - v1.1.0] [Green Task - v1.2.0]
↓ ↓
[Load Balancer Routes Traffic]
↓ ↓
[ALB Health Check - PASSED] [ALB Health Check - PASSED]
When green task is healthy:
Load Balancer switches traffic → All traffic to v1.2.0
Blue task automatically stopped
How Blue/Green Works in ECS
- During a deployment, tasks from two task definition revisions run in parallel
- New version (green) launches alongside old version (blue)
- When green passes health checks, ALB routes traffic to it
- Old version (blue) is kept running for 5 minutes
- If green fails, traffic immediately reverts to blue
- After 5 minutes, blue task is stopped
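The sequence above can be read as a small decision rule. The sketch below models it in Python for illustration only; it is not how ECS or the ALB implement the switch, and the names Action and next_action are assumptions.

```python
from enum import Enum

DRAIN_SECONDS = 5 * 60  # blue is kept for 5 minutes after the switch, per the list above


class Action(Enum):
    HOLD = "keep both versions running"
    SWITCH = "route traffic to green"
    REVERT = "route traffic back to blue"
    STOP_BLUE = "stop blue tasks"


def next_action(green_healthy: bool, traffic_on_green: bool,
                seconds_since_switch: float = 0.0) -> Action:
    """Pick the next step of a blue/green rollout from its current state."""
    if not traffic_on_green:
        # Green receives traffic only after it passes health checks.
        return Action.SWITCH if green_healthy else Action.HOLD
    if seconds_since_switch < DRAIN_SECONDS:
        # Inside the 5-minute window blue is still available as a fallback.
        return Action.HOLD if green_healthy else Action.REVERT
    return Action.STOP_BLUE
```

The model makes one consequence explicit: once the 5-minute window has passed and blue is stopped, reverting traffic is no longer possible and the rollback procedure later in this guide applies.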
Step 8: Production Verification
After deployment, verify production is healthy:
Health Check
curl https://api.menotime-app.com/health
# {"status": "healthy", "timestamp": "2024-01-20T14:30:00Z"}
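For scripted verification, the response body can be checked rather than read by eye. A minimal sketch, assuming the /health endpoint always returns the JSON shape shown above; is_healthy is a hypothetical helper.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True only if the body is a JSON object with status == "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # e.g. an HTML error page from the load balancer
    return isinstance(payload, dict) and payload.get("status") == "healthy"
```

The body can come from curl or from urllib.request.urlopen in the standard library. Anything that is not valid JSON counts as unhealthy.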
Check Active Tasks
aws ecs list-tasks \
--cluster production \
--service-name menotime-api \
--region us-west-1 \
--desired-status RUNNING
# Should list 3 running tasks (the service's desired count)
Verify Deployment Status
aws ecs describe-services \
--cluster production \
--services menotime-api \
--region us-west-1 | jq '.services[0].deployments'
Expected output:
[
{
"status": "PRIMARY",
"taskDefinition": "menotime-api-prod:42",
"desiredCount": 3,
"runningCount": 3,
"pendingCount": 0
}
]
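The same expectation can be tested in code: one deployment, marked PRIMARY, with every desired task running. The sketch below works on the parsed JSON array produced by the jq command above; deployment_settled is an illustrative name.

```python
def deployment_settled(deployments: list[dict]) -> bool:
    """True when a single PRIMARY deployment has all desired tasks running."""
    if len(deployments) != 1:
        return False  # an older deployment is still draining, or none exists
    current = deployments[0]
    return (
        current.get("status") == "PRIMARY"
        and current.get("runningCount") == current.get("desiredCount")
        and current.get("pendingCount") == 0
    )
```

While a rollout is in progress the array holds two entries (the new PRIMARY and the old ACTIVE deployment), so the function returns False until the old one is gone.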
Check CloudWatch Metrics
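# Check error rate
# Note: CloudWatch expects the TargetGroup dimension as the ARN suffix
# (targetgroup/<name>/<id>), usually paired with a LoadBalancer dimension;
# a bare name such as the one below may return no datapoints.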
aws cloudwatch get-metric-statistics \
--namespace AWS/ApplicationELB \
--metric-name HTTPCode_Target_5XX_Count \
--dimensions Name=TargetGroup,Value=menotime-api-prod \
--start-time $(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Sum \
--region us-west-1
Test Key Endpoints
# Health check
curl https://api.menotime-app.com/health
# Authentication endpoint
curl -X POST https://api.menotime-app.com/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "test@example.com", "password": "password"}'
# Patient list endpoint
curl -H "Authorization: Bearer PROD_TOKEN" \
"https://api.menotime-app.com/api/v1/patients?limit=10"
Monitor Logs
# View real-time logs
aws logs tail /ecs/menotime-api-production --follow --region us-west-1
# Search for errors
aws logs filter-log-events \
--log-group-name /ecs/menotime-api-production \
--filter-pattern "ERROR" \
--region us-west-1 \
--start-time $(date -d '15 minutes ago' +%s)000
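The --start-time value above is an epoch timestamp in milliseconds, which is why the command appends 000 to the output of date +%s. Since date -d is a GNU extension, a portable way to compute the same value is sketched here; minutes_ago_ms is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def minutes_ago_ms(minutes: int, now: Optional[datetime] = None) -> int:
    """Epoch milliseconds for `minutes` before `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return int((now - timedelta(minutes=minutes)).timestamp() * 1000)
```

Passing the result of minutes_ago_ms(15) as --start-time selects the same 15-minute window as the shell expression.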
Monitoring Checklist
After deployment, monitor these metrics for 1 hour:
- [ ] Error rate (5XX) < 0.1%
- [ ] Response time p99 < 500ms
- [ ] Database connection pool healthy
- [ ] No CloudWatch alarms triggered
- [ ] CPU utilization normal
- [ ] Memory utilization normal
- [ ] User reports look normal
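The first checklist item is a ratio, while the CloudWatch query earlier returns raw 5XX counts. A minimal sketch of the arithmetic follows, assuming you also fetch the RequestCount metric for the same window and pass in the "Datapoints" lists from both responses; error_rate and within_budget are illustrative names.

```python
def error_rate(datapoints_5xx: list[dict], datapoints_requests: list[dict]) -> float:
    """Fraction of requests that returned 5XX, from CloudWatch "Sum" datapoints."""
    errors = sum(point["Sum"] for point in datapoints_5xx)
    requests = sum(point["Sum"] for point in datapoints_requests)
    if requests == 0:
        return 0.0  # no traffic in the window, so no failed requests
    return errors / requests


def within_budget(rate: float, threshold: float = 0.001) -> bool:
    """The checklist target is below 0.1%, i.e. a fraction below 0.001."""
    return rate < threshold
```

For instance, 3 errors across 10,000 requests is a rate of 0.0003 (0.03%), which is inside the budget.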
Rollback Procedure
If critical issues occur in production, follow this rollback procedure.
Immediate Rollback (< 5 minutes)
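If the deployment hasn't fully completed, point the service back at the task definition revision that was running before the release. Do not set --desired-count 0: that scales the whole service to zero and stops the blue tasks as well as the green ones.
# Abort the new deployment by redeploying the previous revision
# (replace PREVIOUS_REVISION with the revision number blue is running)
aws ecs update-service \
--cluster production \
--service menotime-api \
--task-definition menotime-api-prod:PREVIOUS_REVISION \
--region us-west-1
ECS then replaces the green tasks with tasks from the previous revision.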
Fast Rollback (5-30 minutes)
Rollback to the previous task definition:
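# Get the task definition the service is currently running
CURRENT_TASK=$(aws ecs describe-services \
--cluster production \
--services menotime-api \
--region us-west-1 \
--query 'services[0].taskDefinition' \
--output text)
# Extract the revision number and step back by one.
# This assumes the previous revision is the last known-good release;
# if intermediate revisions were registered, set PREVIOUS_REVISION by hand.
CURRENT_REVISION=${CURRENT_TASK##*:}
PREVIOUS_REVISION=$((CURRENT_REVISION - 1))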
# Revert to previous version
aws ecs update-service \
--cluster production \
--service menotime-api \
--task-definition menotime-api-prod:$PREVIOUS_REVISION \
--force-new-deployment \
--region us-west-1
Wait for the service to stabilize:
aws ecs wait services-stable \
--cluster production \
--services menotime-api \
--region us-west-1
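The revision arithmetic in the fast rollback is easy to get wrong at the edges (revision 1, or a value that is not family:revision). The sketch below shows the same computation with those cases handled. previous_task_definition is a hypothetical helper, and like the shell version it assumes the previous revision is the one to roll back to.

```python
def previous_task_definition(current: str) -> str:
    """Map ".../family:42" (full ARN or family:revision) to "family:41"."""
    name, _, revision = current.rpartition(":")
    if not name or not revision.isdigit():
        raise ValueError(f"unexpected task definition: {current!r}")
    number = int(revision)
    if number <= 1:
        raise ValueError("no earlier revision to roll back to")
    family = name.rsplit("/", 1)[-1]  # drop the ARN prefix if present
    return f"{family}:{number - 1}"
```

Given the ARN of menotime-api-prod:42, the function returns menotime-api-prod:41, which is the value passed to --task-definition above.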
Tag a Rollback Release
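# Revert the release merge on main first
# (-m 1 keeps main's side; required because v1.2.0 tags the merge commit from Step 5)
git checkout main
git pull origin main
git revert -m 1 v1.2.0
git push origin main
# Then tag the reverted state
git tag -a v1.1.0-hotfix -m "Rollback from v1.2.0"
git push origin v1.1.0-hotfix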
Post-Rollback Analysis
- Identify root cause — Review logs and error traces
- Create issue — Log the problem in GitHub
- Fix the bug — Develop a fix on a feature branch
- Test thoroughly — Run extensive tests and staging validation
- Re-deploy — Create a new release with the fix
Hotfix Process
For critical production bugs, use the hotfix workflow:
git checkout main
git pull origin main
git checkout -b hotfix/fix-critical-bug
# Make your fix
git add .
git commit -m "hotfix: critical bug in patient endpoint"
git push origin hotfix/fix-critical-bug
Create a PR to main using an expedited version of the approval process:
- Label: hotfix
- Require: Engineering Lead + DevOps approval
- Quick turnaround (30 min approval target)
After merge and deployment:
# Also merge hotfix back to develop
git checkout develop
git pull origin develop
git merge hotfix/fix-critical-bug
git push origin develop
# Delete hotfix branch
git branch -d hotfix/fix-critical-bug
git push origin --delete hotfix/fix-critical-bug
Production Environment Details
| Property | Value |
|---|---|
| ECS Cluster | production |
| ECS Service | menotime-api |
| Task Definition | menotime-api-prod |
| Desired Count | 3 (high availability) |
| Container Port | 8000 |
| Load Balancer | Application Load Balancer (ALB) |
| Database | RDS PostgreSQL (menotime-prod) |
| Base URL | https://api.menotime-app.com |
| Region | us-west-1 |
| Deployment Type | Blue/Green |
| Health Check Interval | 30 seconds |
On-Call Procedures
If you're on-call and a production issue occurs:
- Assess severity — Is the API down or degraded?
- Alert team — Page relevant engineers
- Check dashboards — CloudWatch, Datadog, PagerDuty
- Review logs — Understand what's happening
- Decide on rollback — If critical, initiate rollback
- Post-incident — Document and discuss in retrospective