Infrastructure as Code with Terraform on AWS: Patterns for Teams
Module structure, remote state management, workspace strategy, and CI/CD integration patterns that scale from a 3-person startup to a 50-engineer platform team.
Why Terraform Discipline Matters
Terraform is straightforward when one person manages one environment. It becomes a coordination problem — and a source of outages — when five engineers are managing eight environments across three AWS accounts.
These patterns come from hard lessons on teams where "just run terraform apply" caused production incidents.
Project Structure
The most important decision: one state file per logical unit of isolation.
infrastructure/
├── modules/
│   ├── networking/          # VPC, subnets, NAT, SGs
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── ecs-service/         # ECS task + service + ALB target group
│   ├── rds-postgres/        # RDS cluster + parameter group + SGs
│   └── s3-bucket/           # S3 + policy + lifecycle rules
│
├── environments/
│   ├── prod/
│   │   ├── networking/      # Separate state per layer
│   │   │   └── main.tf
│   │   ├── databases/
│   │   │   └── main.tf
│   │   └── services/
│   │       └── main.tf
│   ├── staging/
│   └── dev/
│
└── .github/
    └── workflows/
        └── terraform.yml
Never put networking, databases, and application services in the same state file. A failed app deploy should never be able to destroy your VPC.
Remote State with S3 and DynamoDB Locking
# environments/prod/networking/main.tf
terraform {
  backend "s3" {
    bucket         = "your-company-tfstate-prod"
    key            = "prod/networking/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
# environments/prod/services/main.tf
# Reference outputs from another state file (the networking layer)
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "your-company-tfstate-prod"
    key    = "prod/networking/terraform.tfstate"
    region = "us-east-1"
  }
}
module "api_service" {
  source          = "../../../modules/ecs-service"
  vpc_id          = data.terraform_remote_state.networking.outputs.vpc_id
  private_subnets = data.terraform_remote_state.networking.outputs.private_subnet_ids
  # ...
}
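A terraform_remote_state data source only exposes the root-level outputs of the state it reads, so the networking layer has to export vpc_id and private_subnet_ids explicitly. A minimal sketch of what that could look like follows. The outputs.tf file name, the module "networking" label, and the assumption that modules/networking itself declares these two outputs are illustrative, not taken from the original setup.

```hcl
# environments/prod/networking/outputs.tf (hypothetical file name)

# Assumption: this layer instantiates the shared networking module like so.
module "networking" {
  source = "../../../modules/networking"
  # ... module inputs omitted
}

# Root-level outputs are the only values other state files can read.
output "vpc_id" {
  description = "ID of the prod VPC, read by the services layer"
  value       = module.networking.vpc_id
}

output "private_subnet_ids" {
  description = "Private subnet IDs, read by the services layer"
  value       = module.networking.private_subnet_ids
}
```

Outputs declared only inside the module and not re-exported at the root would not be visible through data.terraform_remote_state.networking.outputs.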
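Before anything else, create the S3 bucket (with versioning enabled, so you can roll back to an earlier state file) and the DynamoDB lock table. These are the only resources you manage manually, because the backend has to exist before Terraform can store state in it.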
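If "manually" means a one-off Terraform run with local state instead of console clicks, the bootstrap could be sketched roughly as below. The bucket and table names come from the backend block above. The bootstrap/ location, the resource labels, the public access block, and the prevent_destroy guard are assumptions added for illustration.

```hcl
# bootstrap/main.tf (hypothetical location; applied once, with local state)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "tfstate" {
  bucket = "your-company-tfstate-prod"

  # Assumption: guard against an accidental destroy of the state bucket.
  lifecycle {
    prevent_destroy = true
  }
}

# Versioning keeps earlier copies of each state file for recovery.
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State files can contain secrets, so block every form of public access.
resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The S3 backend expects a string partition key named exactly "LockID".
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

This configuration deliberately has no backend block: its own state stays local (or is checked in somewhere safe), since the remote backend does not exist until it has been applied.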
Module Design Rules
A good Terraform module has:
- One clear purpose: rds-postgres creates exactly one RDS instance, not "the entire database layer."
- No hard-coded values: everything is a variable, with a sensible default wherever one exists
- Outputs for everything downstream might need
# modules/rds-postgres/variables.tf
variable "identifier" {
  description = "Unique identifier for this RDS instance"
  type        = string
}
variable "instance_class" {
  description = "RDS instance type"
  type        = string
  default     = "db.t3.medium"
}
variable "allocated_storage_gb" {
  description = "Allocated storage in GiB"
  type        = number
  default     = 20
}
variable "deletion_protection" {
  description = "Prevent accidental deletion. Set to true in production."
  type        = bool
  default     = true
}
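To show how these variables and the "outputs for everything downstream might need" rule fit together, here is a hedged sketch of the rest of the module. It relies on the variables declared in variables.tf above. The resource label this, the master_username variable, the use of manage_master_user_password, and the snapshot naming are assumptions for illustration, not the original module.

```hcl
# modules/rds-postgres/main.tf (sketch)

# Hypothetical extra variable so the username is not hard-coded.
variable "master_username" {
  description = "Master username for the instance"
  type        = string
  default     = "app_admin"
}

resource "aws_db_instance" "this" {
  identifier          = var.identifier
  engine              = "postgres"
  instance_class      = var.instance_class
  allocated_storage   = var.allocated_storage_gb
  deletion_protection = var.deletion_protection

  username = var.master_username
  # Assumption: let RDS create and store the password in Secrets Manager
  # so that no credential appears in variables or tfvars files.
  manage_master_user_password = true

  # Keep a final snapshot if the instance is ever destroyed on purpose.
  final_snapshot_identifier = "${var.identifier}-final"
}

# modules/rds-postgres/outputs.tf (sketch)

output "endpoint" {
  description = "Connection endpoint in host:port form"
  value       = aws_db_instance.this.endpoint
}

output "port" {
  description = "Port the database listens on"
  value       = aws_db_instance.this.port
}

output "arn" {
  description = "ARN of the instance, for IAM policies and alarms"
  value       = aws_db_instance.this.arn
}
```

A layer such as environments/prod/databases would then call the module with only identifier set and inherit deletion_protection = true from the default.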
CI/CD Integration with GitHub Actions
Never let engineers run terraform apply from their laptops against production.
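# .github/workflows/terraform.yml
name: Terraform
on:
  pull_request:
    paths: ["infrastructure/**"]
  push:
    branches: [main]
    paths: ["infrastructure/**"]
permissions:
  id-token: write       # OIDC token used by role-to-assume
  contents: read
  pull-requests: write  # lets the plan job comment on the PR
jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    defaults:
      run:
        working-directory: infrastructure/environments/prod/services
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          # Placeholder account ID (AWS account IDs have 12 digits)
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsReadOnly
          aws-region: us-east-1
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        id: plan  # required so steps.plan.outputs.stdout resolves below
        run: terraform plan -no-color -out=tfplan
      - name: Comment Plan on PR
        uses: actions/github-script@v7
        env:
          # Pass the plan through the environment so backticks or ${...}
          # in the plan text cannot break the script
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `### Terraform Plan\n\`\`\`\n${process.env.PLAN}\n\`\`\``
            });
  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production # Requires manual approval in GitHub
    defaults:
      run:
        working-directory: infrastructure/environments/prod/services
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          # Hypothetical role name: substitute a role allowed to apply changes
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsApply
          aws-region: us-east-1
      - name: Terraform Init
        run: terraform init
      - name: Terraform Apply
        run: terraform apply -auto-approve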
The Rules We Enforce
- Plan on PR, apply on merge: no exceptions for production
- deletion_protection = true on all stateful resources in prod
- Never use count for resources with identity (databases, IAM roles); use for_each with a map
- terraform fmt enforced in CI: inconsistent formatting is a merge blocker
- terraform validate and tfsec run on every PR
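The count versus for_each rule is easiest to see in a small example. With count, resources are addressed by position, so removing the first item shifts every later index and Terraform plans to replace resources that did not change. With for_each over a map, each resource is addressed by its key. The sketch below uses hypothetical role names and a hypothetical service_roles variable.

```hcl
# Hypothetical map of roles; the keys become stable resource addresses.
variable "service_roles" {
  description = "IAM roles to create, keyed by service name"
  type        = map(object({ trusted_service = string }))
  default = {
    api    = { trusted_service = "ecs-tasks.amazonaws.com" }
    worker = { trusted_service = "ecs-tasks.amazonaws.com" }
  }
}

resource "aws_iam_role" "service" {
  for_each = var.service_roles

  # Addresses look like aws_iam_role.service["api"], not aws_iam_role.service[0]
  name = "${each.key}-task-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = each.value.trusted_service }
    }]
  })
}
```

Deleting the api entry from the map removes only aws_iam_role.service["api"]; the worker role keeps its address and is left untouched.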
These constraints feel slow until the day they prevent a 2-hour production outage.