
Infrastructure as Code with Terraform: Beginner to Pro Guide

TopicTrick Team

Infrastructure as Code (IaC) means managing cloud infrastructure — servers, databases, networks, DNS records — through code files instead of through a web console. When your infrastructure is code, it is versionable, reviewable, repeatable, and testable. Terraform is the most widely used IaC tool, supporting all major cloud providers through a declarative, provider-based architecture.

This guide covers everything from your first Terraform file to production-grade module patterns and GitOps pipelines.


Why Infrastructure as Code?

The manual configuration problem:

  1. An engineer logs into the AWS console and manually configures a server
  2. Six months later, no one remembers exactly how it was configured
  3. The server fails and needs to be rebuilt — but the configuration is lost
  4. A slightly different manual rebuild causes mysterious bugs that take days to diagnose

With Terraform:

hcl
# This file IS the server configuration
resource "aws_instance" "web" {
  ami           = "ami-0c02fb55956c7d316"
  instance_type = "t3.medium"
  vpc_security_group_ids = [aws_security_group.web.id]
  tags = { Name = "web-server" }
}

The configuration is in git. Anyone can review it. If the server fails, terraform apply recreates it identically in minutes. If someone changes it manually, the drift is visible in terraform plan.
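The day-to-day loop described above is a handful of commands (run from the directory containing your .tf files):

```bash
# One-time per working directory: download providers and set up the backend
terraform init

# Preview changes without making them — manual drift shows up here
# as unexpected diffs against the configuration
terraform plan

# Apply the previewed changes (prompts for confirmation)
terraform apply

# Tear down everything the configuration manages (use with care)
terraform destroy
```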


Terraform Core Concepts

Provider

A provider is a plugin that lets Terraform interact with an API (AWS, GCP, Azure, Cloudflare, GitHub, Kubernetes, etc.).

hcl
# versions.tf
terraform {
  required_version = ">= 1.7.0"
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "cloudflare" {
  api_token = var.cloudflare_api_token
}

Resource

A resource is a piece of infrastructure you want to create or manage.

hcl
# A VPC
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name        = "main-vpc"
    Environment = "production"
  }
}

# A subnet inside the VPC
resource "aws_subnet" "public_a" {
  vpc_id            = aws_vpc.main.id    # Reference to the VPC above
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
  map_public_ip_on_launch = true
  
  tags = { Name = "public-subnet-a" }
}

Resources reference each other using resource_type.resource_name.attribute syntax. Terraform automatically infers dependencies from these references.
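Most ordering comes from those inferred references, like vpc_id above. For the rare case where a dependency is real but not expressed through any attribute, depends_on makes it explicit. A sketch — the S3 bucket and its name are made up for illustration:

```hcl
resource "aws_instance" "app" {
  ami           = "ami-0c02fb55956c7d316"
  instance_type = "t3.medium"

  # Implicit dependency: referencing the subnet's id orders creation automatically
  subnet_id = aws_subnet.public_a.id

  # Explicit dependency: the instance reads from this bucket at boot,
  # but nothing in its arguments references the bucket
  depends_on = [aws_s3_bucket.app_config]
}

resource "aws_s3_bucket" "app_config" {
  bucket = "my-app-config-bucket"   # hypothetical bucket name
}
```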

Variable

Variables make configurations reusable across environments.

hcl
# variables.tf
variable "environment" {
  type        = string
  description = "Deployment environment (production, staging, development)"
  validation {
    condition     = contains(["production", "staging", "development"], var.environment)
    error_message = "Environment must be production, staging, or development."
  }
}

variable "instance_type" {
  type    = string
  default = "t3.medium"
}

variable "db_password" {
  type      = string
  sensitive = true    # Never show in logs or plan output
}

Set variable values in a terraform.tfvars file (never commit secrets here) or via environment variables:

bash
# Environment variables prefixed with TF_VAR_
export TF_VAR_db_password="my-secret-password"
export TF_VAR_environment="production"

Output

Outputs expose values from your Terraform configuration — useful for passing data between modules or displaying important information after apply.

hcl
# outputs.tf
output "vpc_id" {
  value       = aws_vpc.main.id
  description = "ID of the main VPC"
}

output "load_balancer_dns" {
  value       = aws_lb.main.dns_name
  description = "DNS name of the application load balancer"
}

output "rds_endpoint" {
  value     = aws_db_instance.main.endpoint
  sensitive = true    # Don't show in console output
}

Complete AWS Infrastructure Example

A production-ready setup with VPC, load balancer, EC2 instances, and RDS:

hcl
# main.tf — Complete production infrastructure

# --- Networking ---
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags = { Name = "${var.environment}-vpc" }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
  tags = { Name = "${var.environment}-public-${count.index}" }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 10}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
  tags = { Name = "${var.environment}-private-${count.index}" }
}

# --- Security Groups ---
resource "aws_security_group" "alb" {
  name   = "${var.environment}-alb-sg"
  vpc_id = aws_vpc.main.id
  
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "app" {
  name   = "${var.environment}-app-sg"
  vpc_id = aws_vpc.main.id
  
  ingress {
    from_port       = 3000
    to_port         = 3000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]   # Only accept from ALB
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]   # Terraform removes AWS's default allow-all egress, so declare it explicitly
  }
}

# --- Application Load Balancer ---
resource "aws_lb" "main" {
  name               = "${var.environment}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id
}

resource "aws_lb_target_group" "app" {
  name     = "${var.environment}-app-tg"
  port     = 3000
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id
  
  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 30
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate_validation.main.certificate_arn   # ACM certificate resources defined elsewhere (not shown)
  
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

# --- RDS Database ---
resource "aws_db_subnet_group" "main" {
  name       = "${var.environment}-db-subnet-group"
  subnet_ids = aws_subnet.private[*].id
}

resource "aws_db_instance" "main" {
  identifier        = "${var.environment}-postgres"
  engine            = "postgres"
  engine_version    = "16.1"
  instance_class    = "db.t3.medium"
  allocated_storage = 100
  storage_encrypted = true
  
  db_name  = "myapp"
  username = "myapp_admin"
  password = var.db_password
  
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.db.id]
  
  backup_retention_period = 7
  deletion_protection     = true    # Prevent accidental deletion
  skip_final_snapshot     = false
  
  tags = { Environment = var.environment }
}

Terraform State Management

Terraform tracks the real-world state of your infrastructure in a state file. The state tells Terraform what resources it created and their current attributes.

Remote State (Required for Teams)

Never store state locally when working in a team. Use remote state with locking:

hcl
# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "production/main.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"  # Prevents concurrent applies
  }
}

Create the S3 bucket and DynamoDB table once (manually or with a bootstrap Terraform config) before using it as a backend.

State Commands

bash
# List all resources in state
terraform state list

# Show details of a specific resource
terraform state show aws_db_instance.main

# Remove a resource from state (doesn't destroy it — just stops tracking)
terraform state rm aws_db_instance.old

# Import an existing resource into state
terraform import aws_instance.web i-0123456789abcdef0

# Move resource to a different name (for refactoring)
terraform state mv aws_instance.app aws_instance.web
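Since Terraform 1.1, renames can also be declared in configuration with a moved block — reviewable in a PR rather than a one-off CLI command:

```hcl
# Equivalent to `terraform state mv`, but recorded in code
moved {
  from = aws_instance.app
  to   = aws_instance.web
}
```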

Terraform Modules

Modules are reusable packages of Terraform configuration. They promote DRY principles and let you parameterize common infrastructure patterns.

text
modules/
├── vpc/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── rds/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
└── ecs-service/
    ├── main.tf
    ├── variables.tf
    └── outputs.tf

Use a module:

hcl
# environments/production/main.tf
module "vpc" {
  source = "../../modules/vpc"
  
  environment     = "production"
  cidr_block      = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.11.0/24", "10.0.12.0/24"]
}

module "database" {
  source = "../../modules/rds"
  
  environment    = "production"
  vpc_id         = module.vpc.vpc_id
  subnet_ids     = module.vpc.private_subnet_ids
  instance_class = "db.r6g.large"
  db_password    = var.db_password
}

Use public modules from the Terraform Registry:

hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"
  
  cluster_name    = "my-eks-cluster"
  cluster_version = "1.29"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnet_ids
}

GitOps Workflow with GitHub Actions

The modern approach: terraform apply runs in CI, triggered by a merge to main.

yaml
# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: environments/production
    
    steps:
      - uses: actions/checkout@v4
      
      # Use OIDC — no stored AWS credentials
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-role
          aws-region: us-east-1
      
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
      
      - name: Terraform Init
        run: terraform init
      
      - name: Terraform Format Check
        run: terraform fmt -check
      
      - name: Terraform Validate
        run: terraform validate
      
      - name: Terraform Plan
        id: plan
        run: terraform plan -out=tfplan -no-color
        env:
          TF_VAR_db_password: ${{ secrets.DB_PASSWORD }}
      
      # Post plan as PR comment
      - uses: actions/github-script@v7
        if: github.event_name == 'pull_request'
        env:
          # Pass the plan via an env var — inlining it into the JS source
          # breaks on quotes and newlines in the plan output
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            const plan = process.env.PLAN;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
            });
      
      # Apply only on push to main (not on PRs)
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply tfplan   # A saved plan applies without a confirmation prompt

This workflow:

  1. Plans on every PR and posts the diff as a PR comment
  2. Applies automatically when PR is merged to main
  3. Uses OIDC (no static AWS credentials stored in GitHub secrets)

Frequently Asked Questions

Q: What is the Terraform state file and what happens if I lose it?

The state file (terraform.tfstate) records every resource Terraform created and its current attributes. If you lose it, Terraform thinks it has created nothing and will try to create everything again — potentially creating duplicate resources or conflicting with existing ones. Always use remote state (S3 + DynamoDB for AWS) and enable versioning on the S3 bucket so state is never permanently lost.

Q: What is the difference between terraform plan and terraform apply?

terraform plan shows what changes will be made without making them. It compares your HCL configuration against the current state and shows additions (green +), modifications (yellow ~), and deletions (red -). terraform apply executes those changes. Always review the plan output before applying, especially for deletions.

Q: Terraform vs. Pulumi vs. AWS CDK — which should I choose?

Terraform (HCL) is the industry standard with the largest ecosystem and widest cloud provider support. Pulumi and AWS CDK let you write infrastructure in TypeScript, Python, or Go — better for teams who prefer programming languages over DSLs. For most teams starting IaC in 2026, Terraform is the safest choice due to community size, module availability, and hiring pool.

Q: How do I manage different environments (dev, staging, production)?

Use separate state files per environment. Options: separate directories (environments/dev/, environments/staging/), Terraform workspaces (simpler but shares a codebase), or separate repositories per environment. For most teams, separate directories with shared modules gives the best balance of isolation and code reuse.
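If you choose workspaces, the CLI flow looks like this — each workspace gets its own state file under the same backend:

```bash
terraform workspace new staging        # Create and switch to a staging workspace
terraform workspace select production  # Switch back to production
terraform workspace show               # Print the current workspace name
```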


Key Takeaway

Terraform transforms cloud infrastructure from a fragile manual process into a version-controlled, peer-reviewed, repeatable engineering practice. Your entire production environment — VPCs, load balancers, databases, DNS records, IAM policies — lives in git as HCL files. Changes go through PR review. The plan shows exactly what will change before it changes. GitOps completes the picture by making terraform apply a CI step rather than a manual command. Start by codifying one existing environment, store state remotely, and build the GitOps workflow before you scale to multiple environments.

Read next: Kubernetes: Orchestrating the Container Fleet →


Part of the Software Architecture Hub — engineering the automation.