Infrastructure as Code: Automate and Manage Your Infrastructure Efficiently

Infrastructure as Code (IaC) has changed how modern organizations manage and deploy their infrastructure. It replaces error-prone manual administration with an automated, versioned, and repeatable approach that applies software development principles to infrastructure.

This guide covers how to implement IaC effectively, from fundamental concepts to advanced enterprise patterns, using widely adopted tools and practices from the ecosystem.

Fundamentals of Infrastructure as Code

Definition and Core Principles

Infrastructure as Code is a methodology that manages and provisions infrastructure through machine-readable definition files, rather than through physical hardware configuration or interactive configuration tools.

Core principles:

Declarative over imperative: You define the desired state, not the steps to reach it. The tool works out which actions are needed.

Immutability: Resources are not modified directly in production. Changes are made by releasing a new version of the definition.

Idempotence: Applying the same configuration several times produces the same result, with no additional side effects.

Version control: Every change is recorded, which gives full traceability and makes rollbacks safer.
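
The declarative and idempotent principles above can be illustrated with a toy reconciliation loop. This is a minimal sketch, not how any particular tool is implemented: the `plan` and `apply` functions and the dictionary-based "state" are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One step needed to converge on the desired state."""
    verb: str  # "create", "update", or "delete"
    name: str


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[Action]:
    """Diff the desired state against the actual state."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(Action("create", name))
        elif desired[name] != actual[name]:
            actions.append(Action("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def apply(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Return the new actual state after carrying out the plan."""
    new_state = dict(actual)
    for action in plan(desired, actual):
        if action.verb == "delete":
            del new_state[action.name]
        else:
            new_state[action.name] = dict(desired[action.name])
    return new_state


# The caller only describes the end state; the diff decides the steps.
desired = {"vpc": {"cidr": "10.0.0.0/16"}, "web": {"size": "small"}}
actual = {"web": {"size": "large"}, "old-db": {"size": "small"}}

first = apply(desired, actual)
second = apply(desired, first)  # a second run finds nothing left to do

print([f"{a.verb} {a.name}" for a in plan(desired, actual)])
print(first == desired, second == first, plan(desired, first))
```

Because `apply` always starts from a diff, running it again on an already converged state yields an empty plan, which is the property idempotence describes.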

Business Benefits

Lower operational costs: Automation removes repetitive manual tasks and frees up time for higher-value work.

Consistency across environments: Development, staging, and production use the same definitions, which eliminates "works on my machine" problems.

Deployment speed: Complete environments can be provisioned in minutes instead of days or weeks.

Compliance and auditing: Every change is documented and can be audited automatically.

Disaster recovery: The complete infrastructure can be recreated from source code.

Tools in the IaC Ecosystem

Terraform: The De Facto Standard

HashiCorp's Terraform has become one of the leading tools for multi-cloud IaC. Its declarative language, HCL (HashiCorp Configuration Language), lets you define resources in a readable way.

# Example: basic web infrastructure on AWS
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
  
  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    }
  }
}

# VPC and subnets
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-vpc-${var.environment}"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-igw-${var.environment}"
  }
}

resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-${var.availability_zones[count.index]}"
    Type = "public"
  }
}

resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + length(var.availability_zones))
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-private-${var.availability_zones[count.index]}"
    Type = "private"
  }
}

# Route table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count = length(aws_subnet.public)

  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Application Load Balancer
resource "aws_lb" "main" {
  name               = "${var.project_name}-alb-${var.environment}"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = var.environment == "production"

  tags = {
    Name = "${var.project_name}-alb-${var.environment}"
  }
}

# Security Groups
resource "aws_security_group" "alb" {
  name_prefix = "${var.project_name}-alb-"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-alb-sg"
  }
}

resource "aws_security_group" "app" {
  name_prefix = "${var.project_name}-app-"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "HTTP from ALB"
    from_port       = var.app_port
    to_port         = var.app_port
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-app-sg"
  }
}
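
The subnet blocks in the configuration above rely on `cidrsubnet(var.vpc_cidr, 8, count.index)` to carve /24 networks out of the VPC range. The following sketch reproduces that arithmetic with Python's standard `ipaddress` module so the resulting ranges are easy to inspect. The Python `cidrsubnet` function here is an illustrative re-implementation, not Terraform's own code, and it assumes the default values from this guide (a `10.0.0.0/16` VPC and three availability zones).

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Return subnet number `netnum` after extending `prefix` by `newbits` bits."""
    network = ipaddress.ip_network(prefix)
    subnets = list(network.subnets(prefixlen_diff=newbits))
    if not 0 <= netnum < len(subnets):
        raise ValueError(f"netnum {netnum} does not fit in {newbits} new bits")
    return str(subnets[netnum])


vpc_cidr = "10.0.0.0/16"
az_count = 3  # assumed: one subnet of each kind per availability zone

# Public subnets use indexes 0..2, private subnets are offset by the AZ count.
public = [cidrsubnet(vpc_cidr, 8, i) for i in range(az_count)]
private = [cidrsubnet(vpc_cidr, 8, i + az_count) for i in range(az_count)]

print("public: ", public)
print("private:", private)
```

Offsetting the private subnets by the number of availability zones is what keeps the two sets from overlapping; changing the number of zones later shifts the private ranges, which is worth keeping in mind before resizing.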

Variable and Output Configuration

# variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-west-2"
}

variable "environment" {
  description = "Environment name"
  type        = string
  validation {
    condition = contains(["development", "staging", "production"], var.environment)
    error_message = "Environment must be development, staging, or production."
  }
}

variable "project_name" {
  description = "Project name for resource naming"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "Availability zones"
  type        = list(string)
  default     = ["us-west-2a", "us-west-2b", "us-west-2c"]
}

variable "app_port" {
  description = "Application port"
  type        = number
  default     = 3000
}

# outputs.tf
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets"
  value       = aws_subnet.private[*].id
}

output "load_balancer_dns" {
  description = "DNS name of the load balancer"
  value       = aws_lb.main.dns_name
}

output "security_group_app_id" {
  description = "ID of the application security group"
  value       = aws_security_group.app.id
}

Remote State Management

# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "environments/production/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

Advanced IaC Patterns

Modularization and Reuse

# modules/web-app/main.tf
resource "aws_launch_template" "app" {
  name_prefix   = "${var.name}-"
  image_id      = var.ami_id
  instance_type = var.instance_type
  key_name      = var.key_name

  vpc_security_group_ids = var.security_group_ids

  user_data = base64encode(templatefile("${path.module}/user-data.sh", {
    app_port = var.app_port
    app_name = var.name
  }))

  tag_specifications {
    resource_type = "instance"
    tags = merge(var.common_tags, {
      Name = "${var.name}-instance"
    })
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  name                = "${var.name}-asg"
  vpc_zone_identifier = var.subnet_ids
  target_group_arns   = [aws_lb_target_group.app.arn]
  health_check_type   = "ELB"

  min_size         = var.min_size
  max_size         = var.max_size
  desired_capacity = var.desired_capacity

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }

  tag {
    key                 = "Name"
    value               = "${var.name}-asg"
    propagate_at_launch = false
  }

  dynamic "tag" {
    for_each = var.common_tags
    content {
      key                 = tag.key
      value               = tag.value
      propagate_at_launch = true
    }
  }
}

resource "aws_lb_target_group" "app" {
  name     = "${var.name}-tg"
  port     = var.app_port
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200"
    path                = var.health_check_path
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 2
  }

  tags = var.common_tags
}

# Auto Scaling Policies
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "${var.name}-scale-up"
  scaling_adjustment     = 1
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.app.name
}

resource "aws_autoscaling_policy" "scale_down" {
  name                   = "${var.name}-scale-down"
  scaling_adjustment     = -1
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.app.name
}

# CloudWatch Alarms
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "${var.name}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [aws_autoscaling_policy.scale_up.arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.app.name
  }
}

resource "aws_cloudwatch_metric_alarm" "cpu_low" {
  alarm_name          = "${var.name}-cpu-low"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "10"
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [aws_autoscaling_policy.scale_down.arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.app.name
  }
}
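
The two alarms and two scaling policies above form a simple feedback loop: add one instance when average CPU stays above 80% for two periods, remove one when it stays below 10%. The sketch below models only that decision logic. It is a simplification under stated assumptions: the helper names `alarm_fires` and `next_capacity` are hypothetical, the cooldown is ignored, and real CloudWatch alarm evaluation has more states (for example, missing data) than shown here.

```python
def alarm_fires(datapoints: list[float], threshold: float,
                comparison: str, evaluation_periods: int = 2) -> bool:
    """True when the last `evaluation_periods` datapoints all breach the threshold."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return False  # not enough data to evaluate yet
    if comparison == "GreaterThanThreshold":
        return all(value > threshold for value in recent)
    if comparison == "LessThanThreshold":
        return all(value < threshold for value in recent)
    raise ValueError(f"unsupported comparison: {comparison}")


def next_capacity(current: int, cpu_datapoints: list[float],
                  min_size: int, max_size: int) -> int:
    """Apply the scale-up (+1) or scale-down (-1) adjustment within group bounds."""
    if alarm_fires(cpu_datapoints, 80, "GreaterThanThreshold"):
        change = 1
    elif alarm_fires(cpu_datapoints, 10, "LessThanThreshold"):
        change = -1
    else:
        change = 0
    return max(min_size, min(max_size, current + change))


# Two consecutive high readings trigger a scale-up; one is not enough.
print(next_capacity(5, [50, 85, 90], min_size=3, max_size=10))
print(next_capacity(5, [85, 70], min_size=3, max_size=10))
# The group never shrinks below min_size, even when the low alarm fires.
print(next_capacity(3, [5, 4], min_size=3, max_size=10))
```

Requiring two consecutive breaches is what keeps a single CPU spike from adding capacity; the wide gap between the 80% and 10% thresholds plays a similar role against flapping.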

Multi-Environment Configuration

# environments/production/main.tf
module "vpc" {
  source = "../../modules/vpc"

  project_name       = "myapp"
  environment        = "production"
  vpc_cidr          = "10.0.0.0/16"
  availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]

  common_tags = local.common_tags
}

module "web_app" {
  source = "../../modules/web-app"

  name               = "myapp-web"
  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.private_subnet_ids
  security_group_ids = [module.vpc.security_group_app_id]
  
  instance_type     = "c5.large"
  min_size         = 3
  max_size         = 10
  desired_capacity = 5
  
  app_port           = 3000
  health_check_path  = "/health"
  
  common_tags = local.common_tags
}

locals {
  common_tags = {
    Environment = "production"
    Project     = "myapp"
    Owner       = "platform-team"
    CostCenter  = "engineering"
    ManagedBy   = "terraform"
  }
}

Complementary Tools

AWS CDK: Infrastructure with Programming Languages

// AWS CDK with TypeScript
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import * as logs from 'aws-cdk-lib/aws-logs';
import { Construct } from 'constructs';

export class WebApplicationStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // VPC
    const vpc = new ec2.Vpc(this, 'VPC', {
      maxAzs: 3,
      natGateways: 3,
      ipAddresses: ec2.IpAddresses.cidr('10.0.0.0/16'),
      subnetConfiguration: [
        {
          cidrMask: 24,
          name: 'public',
          subnetType: ec2.SubnetType.PUBLIC,
        },
        {
          cidrMask: 24,
          name: 'private',
          subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
        },
        {
          cidrMask: 28,
          name: 'database',
          subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
        },
      ],
    });

    // ECS Cluster
    const cluster = new ecs.Cluster(this, 'Cluster', {
      vpc,
      containerInsights: true,
      clusterName: 'web-application-cluster',
    });

    // Application Load Balancer
    const alb = new elbv2.ApplicationLoadBalancer(this, 'ALB', {
      vpc,
      internetFacing: true,
    });

    // Task Definition
    const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef', {
      memoryLimitMiB: 2048,
      cpu: 1024,
    });

    const logGroup = new logs.LogGroup(this, 'LogGroup', {
      retention: logs.RetentionDays.ONE_WEEK,
    });

    taskDefinition.addContainer('WebContainer', {
      image: ecs.ContainerImage.fromRegistry('nginx:alpine'),
      portMappings: [{ containerPort: 80 }],
      logging: ecs.LogDrivers.awsLogs({
        streamPrefix: 'web-app',
        logGroup,
      }),
    });

    // ECS Service
    const service = new ecs.FargateService(this, 'Service', {
      cluster,
      taskDefinition,
      desiredCount: 3,
      assignPublicIp: false,
    });

    // Target Group
    const targetGroup = new elbv2.ApplicationTargetGroup(this, 'TargetGroup', {
      port: 80,
      vpc,
      protocol: elbv2.ApplicationProtocol.HTTP,
      targetType: elbv2.TargetType.IP,
      healthCheck: {
        path: '/',
        healthyHttpCodes: '200',
      },
    });

    service.attachToApplicationTargetGroup(targetGroup);

    // Listener
    alb.addListener('Listener', {
      port: 80,
      defaultTargetGroups: [targetGroup],
    });

    // Auto Scaling
    const scalableTarget = service.autoScaleTaskCount({
      minCapacity: 2,
      maxCapacity: 10,
    });

    scalableTarget.scaleOnCpuUtilization('CpuScaling', {
      targetUtilizationPercent: 70,
    });

    // Outputs
    new cdk.CfnOutput(this, 'LoadBalancerDNS', {
      value: alb.loadBalancerDnsName,
    });
  }
}

Pulumi: IaC with Object-Oriented Programming

# Pulumi with Python
import pulumi
import pulumi_aws as aws
# `lb` is the current module for load balancers; `elasticloadbalancingv2` is the deprecated name
from pulumi_aws import ec2, lb as elbv2

# Configuration
config = pulumi.Config()
project_name = pulumi.get_project()
stack_name = pulumi.get_stack()

# VPC
vpc = ec2.Vpc("vpc",
    cidr_block="10.0.0.0/16",
    enable_dns_hostnames=True,
    enable_dns_support=True,
    tags={
        "Name": f"{project_name}-vpc-{stack_name}",
        "Project": project_name,
        "Stack": stack_name,
    })

# Internet Gateway
igw = ec2.InternetGateway("igw",
    vpc_id=vpc.id,
    tags={"Name": f"{project_name}-igw"})

# Subnets
availability_zones = aws.get_availability_zones().names

public_subnets = []
private_subnets = []

for i, az in enumerate(availability_zones[:3]):
    # Public subnet
    public_subnet = ec2.Subnet(f"public-subnet-{i}",
        vpc_id=vpc.id,
        cidr_block=f"10.0.{i}.0/24",
        availability_zone=az,
        map_public_ip_on_launch=True,
        tags={
            "Name": f"{project_name}-public-{az}",
            "Type": "public",
        })
    public_subnets.append(public_subnet)

    # Private subnet
    private_subnet = ec2.Subnet(f"private-subnet-{i}",
        vpc_id=vpc.id,
        cidr_block=f"10.0.{i + 10}.0/24",
        availability_zone=az,
        tags={
            "Name": f"{project_name}-private-{az}",
            "Type": "private",
        })
    private_subnets.append(private_subnet)

# Route Tables
public_rt = ec2.RouteTable("public-rt",
    vpc_id=vpc.id,
    routes=[{
        "cidr_block": "0.0.0.0/0",
        "gateway_id": igw.id,
    }],
    tags={"Name": f"{project_name}-public-rt"})

for i, subnet in enumerate(public_subnets):
    ec2.RouteTableAssociation(f"public-rt-assoc-{i}",
        subnet_id=subnet.id,
        route_table_id=public_rt.id)

# Security Groups
class SecurityGroupBuilder:
    @staticmethod
    def create_alb_sg(vpc_id: pulumi.Input[str]) -> ec2.SecurityGroup:
        return ec2.SecurityGroup("alb-sg",
            vpc_id=vpc_id,
            description="Security group for Application Load Balancer",
            ingress=[
                {"protocol": "tcp", "from_port": 80, "to_port": 80, "cidr_blocks": ["0.0.0.0/0"]},
                {"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
            ],
            egress=[{"protocol": "-1", "from_port": 0, "to_port": 0, "cidr_blocks": ["0.0.0.0/0"]}],
            tags={"Name": f"{project_name}-alb-sg"})

    @staticmethod
    def create_app_sg(vpc_id: pulumi.Input[str], alb_sg_id: pulumi.Input[str]) -> ec2.SecurityGroup:
        return ec2.SecurityGroup("app-sg",
            vpc_id=vpc_id,
            description="Security group for application",
            ingress=[{
                "protocol": "tcp",
                "from_port": 3000,
                "to_port": 3000,
                "security_groups": [alb_sg_id],
            }],
            egress=[{"protocol": "-1", "from_port": 0, "to_port": 0, "cidr_blocks": ["0.0.0.0/0"]}],
            tags={"Name": f"{project_name}-app-sg"})

alb_sg = SecurityGroupBuilder.create_alb_sg(vpc.id)
app_sg = SecurityGroupBuilder.create_app_sg(vpc.id, alb_sg.id)

# Application Load Balancer
alb = elbv2.LoadBalancer("alb",
    load_balancer_type="application",
    security_groups=[alb_sg.id],
    subnets=[subnet.id for subnet in public_subnets],
    tags={"Name": f"{project_name}-alb"})

# Exports
pulumi.export("vpc_id", vpc.id)
pulumi.export("public_subnet_ids", [subnet.id for subnet in public_subnets])
pulumi.export("private_subnet_ids", [subnet.id for subnet in private_subnets])
pulumi.export("alb_dns_name", alb.dns_name)

GitOps and CI/CD Integration

Terraform Pipeline with GitHub Actions

# .github/workflows/terraform.yml
name: Terraform CI/CD

on:
  push:
    branches: [main, develop]
    paths: ['terraform/**']
  pull_request:
    branches: [main]
    paths: ['terraform/**']

env:
  TF_VERSION: '1.6.0'
  AWS_REGION: 'us-west-2'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: ${{ env.TF_VERSION }}

    - name: Terraform Format Check
      run: terraform fmt -check -recursive terraform/

    - name: Terraform Init
      run: |
        cd terraform/environments/staging
        terraform init -backend=false

    - name: Terraform Validate
      run: |
        cd terraform/environments/staging
        terraform validate

    - name: Run TFSec Security Scan
      uses: aquasecurity/tfsec-action@v1.0.0
      with:
        working_directory: terraform/

    - name: Run Checkov Security Scan
      uses: bridgecrewio/checkov-action@master
      with:
        directory: terraform/
        quiet: true
        framework: terraform

  plan:
    runs-on: ubuntu-latest
    needs: validate
    if: github.event_name == 'pull_request'
    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: ${{ env.AWS_REGION }}

    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: ${{ env.TF_VERSION }}

    - name: Terraform Init
      run: |
        cd terraform/environments/staging
        terraform init

    - name: Terraform Plan
      run: |
        cd terraform/environments/staging
        terraform plan -no-color -out=tfplan
        
    - name: Save Plan
      uses: actions/upload-artifact@v4
      with:
        name: terraform-plan
        path: terraform/environments/staging/tfplan

    - name: Comment Plan on PR
      uses: actions/github-script@v7
      if: github.event_name == 'pull_request'
      with:
        script: |
          const fs = require('fs');
          const { execSync } = require('child_process');
          
          try {
            const planOutput = execSync('cd terraform/environments/staging && terraform show -no-color tfplan', 
              { encoding: 'utf-8', maxBuffer: 1024 * 1024 });
            
            const comment = `## Terraform Plan Results
            
            <details><summary>Show Plan</summary>
            
            \`\`\`hcl
            ${planOutput}
            \`\`\`
            
            </details>
            
            Plan generated for commit: ${context.sha}`;
            
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: comment
            });
          } catch (error) {
            console.error('Error posting plan comment:', error);
          }

  apply:
    runs-on: ubuntu-latest
    needs: validate
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    environment: production
    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: ${{ env.AWS_REGION }}

    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: ${{ env.TF_VERSION }}

    - name: Terraform Init
      run: |
        cd terraform/environments/production
        terraform init

    - name: Terraform Apply
      run: |
        cd terraform/environments/production
        terraform apply -auto-approve

    - name: Update Infrastructure Documentation
      run: |
        cd terraform/environments/production
        terraform output -json > ../../../docs/infrastructure-outputs.json
        
    - name: Notify Slack
      uses: 8398a7/action-slack@v3
      with:
        status: ${{ job.status }}
        text: |
          Infrastructure deployment ${{ job.status }}!
          Environment: Production
          Commit: ${{ github.sha }}
          Actor: ${{ github.actor }}
      env:
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
      if: always()
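
The pipeline above posts the full plan text on the pull request. For large plans, a short summary of what will change is often easier to review. The sketch below counts planned changes from the JSON form of a plan, as produced by `terraform show -json tfplan`. It assumes the `resource_changes[].change.actions` layout of that JSON format; the `summarize_plan` helper and the sample document are hypothetical, so check the field names against the output of your Terraform version.

```python
import json
from collections import Counter


def summarize_plan(plan: dict) -> Counter:
    """Count planned resource changes by kind (create, update, delete, replace)."""
    counts: Counter = Counter()
    for resource in plan.get("resource_changes", []):
        actions = resource["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing changes for this resource
        if set(actions) == {"create", "delete"}:
            counts["replace"] += 1  # destroy and recreate, in either order
        else:
            counts[actions[0]] += 1
    return counts


# Hand-written sample in the shape of the plan JSON (not real tool output).
sample = json.loads("""
{
  "resource_changes": [
    {"address": "aws_vpc.main", "change": {"actions": ["create"]}},
    {"address": "aws_lb.main", "change": {"actions": ["update"]}},
    {"address": "aws_subnet.public[0]", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_security_group.alb", "change": {"actions": ["no-op"]}}
  ]
}
""")

summary = summarize_plan(sample)
print(", ".join(f"{kind}: {count}" for kind, count in sorted(summary.items())))
```

A summary like this could sit at the top of the pull request comment, with the full plan kept inside the collapsible section.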

Best Practices and Enterprise Patterns

Infrastructure Testing

// Terratest with Go
package test

import (
    "testing"
    "time"

    "github.com/gruntwork-io/terratest/modules/aws"
    http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestTerraformInfrastructure(t *testing.T) {
    t.Parallel()

    // Terraform configuration
    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: "../terraform/environments/test",
        Vars: map[string]interface{}{
            "environment":    "test",
            "project_name":   "terratest",
            "instance_type":  "t3.micro",
        },
    })

    // Clean up resources after the test
    defer terraform.Destroy(t, terraformOptions)

    // Apply the Terraform configuration
    terraform.InitAndApply(t, terraformOptions)

    // Read outputs
    vpcId := terraform.Output(t, terraformOptions, "vpc_id")
    albDns := terraform.Output(t, terraformOptions, "load_balancer_dns")

    // Assertions
    assert.NotEmpty(t, vpcId)
    assert.NotEmpty(t, albDns)

    // Verify that the VPC exists in AWS
    awsRegion := "us-west-2"
    aws.GetVpcById(t, vpcId, awsRegion)

    // Verify that the ALB responds (HttpGetWithRetry expects the body to equal "nginx" exactly)
    url := "http://" + albDns
    http_helper.HttpGetWithRetry(t, url, nil, 200, "nginx", 30, 5*time.Second)
}

func TestSecurityGroups(t *testing.T) {
    t.Parallel()

    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: "../terraform/modules/security-groups",
        Vars: map[string]interface{}{
            "vpc_id": "vpc-12345678",
        },
    })

    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    // Verify the security group rules
    sgId := terraform.Output(t, terraformOptions, "app_security_group_id")
    
    awsRegion := "us-west-2"
    sg := aws.GetSecurityGroupById(t, sgId, awsRegion)
    
    // Verify that it only allows traffic from the ALB
    assert.Len(t, sg.GroupRules, 2) // ingress + egress
}

Policy as Code with Open Policy Agent

# policies/terraform.rego
package terraform.security

import future.keywords.in
import future.keywords.if

# Deny large instances in non-production environments
deny_large_instances[msg] {
    input.planned_values.root_module.resources[i].type == "aws_instance"
    instance := input.planned_values.root_module.resources[i]
    
    large_instance_types := {
        "m5.large", "m5.xlarge", "m5.2xlarge",
        "c5.large", "c5.xlarge", "c5.2xlarge"
    }
    
    instance.values.instance_type in large_instance_types
    
    # Read the environment from the tags
    environment := instance.values.tags.Environment
    environment != "production"
    
    msg := sprintf("Large instance type '%s' not allowed in '%s' environment", [
        instance.values.instance_type,
        environment
    ])
}

# Require encryption on EBS volumes
deny_unencrypted_ebs[msg] {
    input.planned_values.root_module.resources[i].type == "aws_ebs_volume"
    volume := input.planned_values.root_module.resources[i]
    
    not volume.values.encrypted
    
    msg := sprintf("EBS volume '%s' must be encrypted", [volume.address])
}

# Require mandatory tags
required_tags := {"Environment", "Project", "Owner", "CostCenter"}

deny_missing_tags[msg] {
    input.planned_values.root_module.resources[i].type in {
        "aws_instance", "aws_ebs_volume", "aws_s3_bucket"
    }
    
    resource := input.planned_values.root_module.resources[i]
    resource_tags := object.get(resource.values, "tags", {})
    
    missing_tag := required_tags[_]
    not missing_tag in object.keys(resource_tags)
    
    msg := sprintf("Resource '%s' missing required tag: '%s'", [
        resource.address,
        missing_tag
    ])
}

# Check S3 bucket configuration
deny_public_s3_buckets[msg] {
    input.planned_values.root_module.resources[i].type == "aws_s3_bucket_acl"
    acl := input.planned_values.root_module.resources[i]
    
    acl.values.acl in {"public-read", "public-read-write"}
    
    msg := sprintf("S3 bucket ACL '%s' allows public access", [acl.address])
}
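
To make the intent of the tag policy above easier to follow, here is the same check written as plain Python against a plan document. This is a sketch for reading and local experimentation, not a replacement for OPA: the function name `missing_tag_violations` and the sample plan are hypothetical, and, like the Rego rules, it only inspects resources in `planned_values.root_module`, so resources declared inside child modules are not covered.

```python
REQUIRED_TAGS = {"Environment", "Project", "Owner", "CostCenter"}
TAGGED_TYPES = {"aws_instance", "aws_ebs_volume", "aws_s3_bucket"}


def missing_tag_violations(plan: dict) -> list[str]:
    """Return one message per required tag missing from a covered resource."""
    messages = []
    root = plan.get("planned_values", {}).get("root_module", {})
    for resource in root.get("resources", []):
        if resource.get("type") not in TAGGED_TYPES:
            continue
        # Treat a null or absent tags attribute as "no tags".
        tags = resource.get("values", {}).get("tags") or {}
        for tag in sorted(REQUIRED_TAGS - set(tags)):
            messages.append(
                f"Resource '{resource['address']}' missing required tag: '{tag}'"
            )
    return messages


# Hand-written sample in the shape of the plan JSON (not real tool output).
sample_plan = {
    "planned_values": {
        "root_module": {
            "resources": [
                {
                    "address": "aws_instance.web",
                    "type": "aws_instance",
                    "values": {"tags": {"Environment": "staging", "Project": "myapp"}},
                },
                {"address": "aws_vpc.main", "type": "aws_vpc", "values": {}},
                {
                    "address": "aws_s3_bucket.logs",
                    "type": "aws_s3_bucket",
                    "values": {"tags": None},
                },
            ]
        }
    }
}

for message in missing_tag_violations(sample_plan):
    print(message)
```

Running the sample reports the two tags missing from the instance and all four for the bucket, while the VPC is skipped because its type is not in the covered set.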

Compliance and Governance

# .github/workflows/compliance.yml
name: Infrastructure Compliance

on:
  pull_request:
    paths: ['terraform/**']

jobs:
  compliance-scan:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Run Checkov
      uses: bridgecrewio/checkov-action@master
      with:
        directory: terraform/
        quiet: true
        output_format: sarif
        output_file_path: checkov-results.sarif

    - name: Upload Checkov results to GitHub
      uses: github/codeql-action/upload-sarif@v3
      if: always()
      with:
        sarif_file: checkov-results.sarif

    - name: Run TFSec
      uses: aquasecurity/tfsec-action@v1.0.0
      with:
        working_directory: terraform/
        format: sarif
        sarif_file: tfsec-results.sarif

    - name: Upload TFSec results
      uses: github/codeql-action/upload-sarif@v3
      if: always()
      with:
        sarif_file: tfsec-results.sarif

    - name: Run OPA Policy Check
      run: |
        # Install OPA
        curl -L -o opa https://openpolicyagent.org/downloads/latest/opa_linux_amd64_static
        chmod +x opa
        
        # Generate Terraform plan JSON
        # (plan needs an initialized backend and AWS credentials from an earlier step)
        cd terraform/environments/staging
        terraform init
        terraform plan -out=tfplan
        terraform show -json tfplan > plan.json
        
        # Run policy evaluation
        ../../../opa eval -d ../../../policies/ -i plan.json "data.terraform.security.deny_large_instances"

Infrastructure Monitoring and Observability

Metrics and Alerts

# monitoring.tf
resource "aws_cloudwatch_dashboard" "infrastructure" {
  dashboard_name = "${var.project_name}-infrastructure"

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6

        properties = {
          metrics = [
            ["AWS/ApplicationELB", "RequestCount", "LoadBalancer", aws_lb.main.arn_suffix],
            [".", "TargetResponseTime", ".", "."],
            [".", "HTTPCode_ELB_5XX_Count", ".", "."],
            [".", "HTTPCode_Target_2XX_Count", ".", "."]
          ]
          view    = "timeSeries"
          stacked = false
          region  = var.aws_region
          title   = "Application Load Balancer Metrics"
          period  = 300
        }
      },
      {
        type   = "metric"
        x      = 0
        y      = 6
        width  = 12
        height = 6

        properties = {
          metrics = [
            ["AWS/AutoScaling", "GroupDesiredCapacity", "AutoScalingGroupName", aws_autoscaling_group.app.name],
            [".", "GroupInServiceInstances", ".", "."],
            [".", "GroupPendingInstances", ".", "."],
            [".", "GroupTerminatingInstances", ".", "."]
          ]
          view   = "timeSeries"
          region = var.aws_region
          title  = "Auto Scaling Group Metrics"
          period = 300
        }
      }
    ]
  })
}

# SNS Topic for alerts
resource "aws_sns_topic" "infrastructure_alerts" {
  name = "${var.project_name}-infrastructure-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.infrastructure_alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}

# CloudWatch Alarms
resource "aws_cloudwatch_metric_alarm" "high_5xx_errors" {
  alarm_name          = "${var.project_name}-high-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "HTTPCode_ELB_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "This metric monitors 5xx errors from the load balancer"
  alarm_actions       = [aws_sns_topic.infrastructure_alerts.arn]

  dimensions = {
    LoadBalancer = aws_lb.main.arn_suffix
  }

  tags = var.common_tags
}

resource "aws_cloudwatch_metric_alarm" "high_response_time" {
  alarm_name          = "${var.project_name}-high-response-time"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "TargetResponseTime"
  namespace           = "AWS/ApplicationELB"
  period              = "300"
  statistic           = "Average"
  threshold           = "1"
  alarm_description   = "This metric monitors response time"
  alarm_actions       = [aws_sns_topic.infrastructure_alerts.arn]

  dimensions = {
    LoadBalancer = aws_lb.main.arn_suffix
  }

  tags = var.common_tags
}

Enterprise Use Cases

Legacy Infrastructure Migration

Context: A financial services company with more than 200 physical servers migrating to the cloud.

Strategy:

  • Automated inventory of the existing infrastructure
  • Creation of reusable Terraform modules
  • Phased migration with automated validation
  • Governance in place from day one

Results:

  • 60% reduction in provisioning time
  • 40% reduction in infrastructure costs
  • 99.9% uptime during the migration
  • Regulatory compliance maintained

Multi-Tenant SaaS Platform

Context: A startup scaling from a single-tenant to a multi-tenant architecture.

Implementation:

  • Terraform workspaces for tenant isolation
  • Reusable modules with dynamic configuration
  • Fully automated CI/CD
  • Per-tenant monitoring

Results:

  • New customers onboarded in 30 minutes
  • 80% reduction in operational effort
  • Automatic scaling based on demand

Recommended Resources and Tools

Official Documentation

Ecosystem Tools

Adopting Infrastructure as Code successfully requires cultural change as well as technical change. Organizations that invest in these practices see significant returns in speed, reliability, and operational cost. The path to IaC maturity is incremental, but each step delivers tangible value.