Scaling Infrastructure as Code with Terraform
Terraform has established itself as a leading tool for Infrastructure as Code (IaC), letting DevOps teams manage cloud resources declaratively, under version control, and repeatably. This guide shows how to scale your infrastructure efficiently with Terraform.
What is Terraform and why does it matter for scaling?
Terraform is a tool developed by HashiCorp (open source through version 1.5; later releases are source-available under the Business Source License) that lets you define infrastructure in declarative configuration files. Unlike imperative scripts, a Terraform configuration describes the desired state of your infrastructure, and Terraform creates, modifies, or destroys resources to reach that state.
Key advantages of Terraform for scaling:
- State management: Keeps a detailed record of the current state of your infrastructure
- Change planning: Shows the changes it intends to make before applying them
- Parallelization: Creates resources concurrently when they do not depend on each other
- Reuse: Lets you build reusable modules for common patterns
- Multi-cloud: Works with AWS, Azure, GCP, and more than 1,000 providers
Core concepts for scaling
1. Organizing Terraform code
How you structure your Terraform code is crucial for scaling. This guide uses a layout that separates reusable modules (modules/vpc, modules/compute) from one root configuration per environment (environments/production).
2. Remote state management
For teams and production environments, it is essential to store the state in a remote backend:
# backend.tf
terraform {
  backend "s3" {
    bucket         = "mi-empresa-terraform-state"
    key            = "environments/production/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
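The S3 bucket and DynamoDB table named in the backend block have to exist before terraform init can use them. A minimal sketch of a separate bootstrap configuration follows. It assumes the same names as the backend block and that it is applied once with local state; the resource labels are chosen for illustration.

```hcl
# Hypothetical bootstrap configuration, applied once before the backend is used.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mi-empresa-terraform-state"

  # Guard against accidentally deleting the bucket that holds the state.
  lifecycle {
    prevent_destroy = true
  }
}

# Versioning keeps earlier state files recoverable.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend expects a string hash key named LockID for state locking.
resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping these resources in their own small configuration avoids the circular problem of a backend that stores the state of the bucket it lives in.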
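3. Provider versioning and locking
Declare version constraints to keep runs reproducible. The .terraform.lock.hcl file that terraform init generates records the exact provider versions selected, so commit it to version control: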
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
Practical implementation: Scaling a web application
Step 1: Base VPC module
Create a reusable module for the network:
# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.project_name}-vpc"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "${var.project_name}-igw"
    Environment = var.environment
  }
}

resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name        = "${var.project_name}-public-${count.index + 1}"
    Type        = "public"
    Environment = var.environment
  }
}

resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name        = "${var.project_name}-private-${count.index + 1}"
    Type        = "private"
    Environment = var.environment
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count = length(aws_subnet.public)

  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
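The subnet ranges in this module come from cidrsubnet, which extends the prefix by the given number of bits and selects the Nth resulting network. As a rough illustration, assuming the module's default 10.0.0.0/16 and three availability zones (the local and output names here are invented for the example):

```hcl
locals {
  example_vpc_cidr = "10.0.0.0/16"
  example_az_count = 3

  # 8 extra bits turn the /16 into /24 networks; the last argument picks which one.
  example_public_cidrs = [
    for i in range(local.example_az_count) : cidrsubnet(local.example_vpc_cidr, 8, i)
  ]

  # The +10 offset mirrors the private subnets in the module.
  example_private_cidrs = [
    for i in range(local.example_az_count) : cidrsubnet(local.example_vpc_cidr, 8, i + 10)
  ]
}

output "example_public_cidrs" {
  value = local.example_public_cidrs # ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
}

output "example_private_cidrs" {
  value = local.example_private_cidrs # ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
}
```

With this scheme the public and private ranges stay disjoint as long as the module is given ten availability zones or fewer.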
Step 2: VPC module variables
# modules/vpc/variables.tf
variable "project_name" {
  description = "Project name"
  type        = string
}

variable "environment" {
  description = "Environment (dev, staging, prod)"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
}
Step 3: Module outputs
# modules/vpc/outputs.tf
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets"
  value       = aws_subnet.private[*].id
}

output "vpc_cidr_block" {
  description = "CIDR block of the VPC"
  value       = aws_vpc.main.cidr_block
}
Step 4: Auto Scaling module
# modules/compute/main.tf
resource "aws_launch_template" "app" {
  name_prefix   = "${var.project_name}-${var.environment}-"
  image_id      = var.ami_id
  instance_type = var.instance_type

  vpc_security_group_ids = [aws_security_group.app.id]

  user_data = base64encode(templatefile("${path.module}/user_data.sh", {
    app_name = var.project_name
  }))

  tag_specifications {
    resource_type = "instance"

    tags = {
      Name        = "${var.project_name}-${var.environment}"
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}
resource "aws_autoscaling_group" "app" {
  name                      = "${var.project_name}-${var.environment}-asg"
  vpc_zone_identifier       = var.subnet_ids
  target_group_arns         = [aws_lb_target_group.app.arn]
  health_check_type         = "ELB"
  health_check_grace_period = 300

  min_size         = var.min_instances
  max_size         = var.max_instances
  desired_capacity = var.desired_instances

  launch_template {
    id = aws_launch_template.app.id
    # Reference latest_version instead of "$Latest" so that a change to the
    # launch template triggers the instance refresh below.
    version = aws_launch_template.app.latest_version
  }

  instance_refresh {
    strategy = "Rolling"

    preferences {
      min_healthy_percentage = 50
    }
  }

  tag {
    key                 = "Name"
    value               = "${var.project_name}-${var.environment}-asg"
    propagate_at_launch = false
  }
}
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "${var.project_name}-scale-up"
  scaling_adjustment     = 2
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.app.name
}

resource "aws_autoscaling_policy" "scale_down" {
  name                   = "${var.project_name}-scale-down"
  scaling_adjustment     = -1
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.app.name
}

resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "${var.project_name}-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 75
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [aws_autoscaling_policy.scale_up.arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.app.name
  }
}

resource "aws_cloudwatch_metric_alarm" "low_cpu" {
  alarm_name          = "${var.project_name}-low-cpu"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 25
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [aws_autoscaling_policy.scale_down.arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.app.name
  }
}
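The pair of step policies and CPU alarms in this module can also be expressed as a single target tracking policy, where AWS creates and manages the alarms itself. The sketch below is an alternative and not part of the original module. The asg_name variable, the policy name, and the 50% target are assumptions chosen for illustration.

```hcl
# Assumption: the name of an existing Auto Scaling group is passed in.
variable "asg_name" {
  description = "Name of the Auto Scaling group to attach the policy to"
  type        = string
}

resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  policy_type            = "TargetTrackingScaling"
  autoscaling_group_name = var.asg_name

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }

    # Illustrative target; tune it to the workload.
    target_value = 50.0
  }
}
```

Inside the compute module, autoscaling_group_name could reference aws_autoscaling_group.app.name directly instead of a variable.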
Step 5: Load balancer
# modules/compute/load_balancer.tf
resource "aws_lb" "app" {
  name               = "${var.project_name}-${var.environment}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.subnet_ids

  enable_deletion_protection = var.environment == "production"

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_lb_target_group" "app" {
  name     = "${var.project_name}-${var.environment}-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 5
    interval            = 30
    path                = "/health"
    matcher             = "200"
    port                = "traffic-port"
    protocol            = "HTTP"
  }

  tags = {
    Environment = var.environment
  }
}

resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
resource "aws_security_group" "alb" {
  name_prefix = "${var.project_name}-alb-"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-alb-sg"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group" "app" {
  name_prefix = "${var.project_name}-app-"
  vpc_id      = var.vpc_id

  ingress {
    description     = "HTTP from ALB"
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-app-sg"
  }

  lifecycle {
    create_before_destroy = true
  }
}
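The production configuration in Step 6 reads module.compute.security_group_id, and the monitoring section uses the load balancer's arn_suffix, but the compute module's outputs are not shown. A possible outputs file is sketched below. The file name and every output other than security_group_id are assumptions.

```hcl
# modules/compute/outputs.tf (hypothetical; not shown in the original module)
output "security_group_id" {
  description = "ID of the application security group"
  value       = aws_security_group.app.id
}

output "alb_dns_name" {
  description = "Public DNS name of the load balancer"
  value       = aws_lb.app.dns_name
}

output "alb_arn_suffix" {
  description = "ARN suffix of the load balancer, as used in CloudWatch dimensions"
  value       = aws_lb.app.arn_suffix
}
```

Exposing only identifiers like these keeps the module boundary narrow: callers can wire security groups and dashboards without reaching into the module's resources.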
Step 6: Production deployment
# environments/production/main.tf
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = var.project_name
      Environment = "production"
      ManagedBy   = "terraform"
      Team        = "devops"
    }
  }
}

module "vpc" {
  source = "../../modules/vpc"

  project_name       = var.project_name
  environment        = "production"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]
}

module "compute" {
  source = "../../modules/compute"

  project_name      = var.project_name
  environment       = "production"
  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.public_subnet_ids
  ami_id            = var.ami_id
  instance_type     = "t3.medium"
  min_instances     = 2
  max_instances     = 10
  desired_instances = 3
}
# RDS database
resource "aws_db_subnet_group" "main" {
  name       = "${var.project_name}-db-subnet-group"
  subnet_ids = module.vpc.private_subnet_ids

  tags = {
    Name = "${var.project_name} DB subnet group"
  }
}

resource "aws_db_instance" "main" {
  identifier            = "${var.project_name}-db"
  engine                = "postgres"
  engine_version        = "14.9"
  instance_class        = "db.t3.micro"
  allocated_storage     = 20
  max_allocated_storage = 100
  storage_type          = "gp2"
  storage_encrypted     = true

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.db.id]

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "Mon:04:00-Mon:05:00"

  skip_final_snapshot = false
  # A fixed name avoids the diff that timestamp() would produce on every plan.
  final_snapshot_identifier = "${var.project_name}-db-final-snapshot"

  tags = {
    Name = "${var.project_name}-database"
  }
}
resource "aws_security_group" "db" {
  name_prefix = "${var.project_name}-db-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description     = "PostgreSQL from app servers"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [module.compute.security_group_id]
  }

  tags = {
    Name = "${var.project_name}-db-sg"
  }
}
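The production configuration refers to several input variables that are not declared in the listing. One way they might be declared is sketched here. The file name, descriptions, and the default region are assumptions; marking the database credentials as sensitive keeps them out of plan output.

```hcl
# environments/production/variables.tf (hypothetical sketch)
variable "aws_region" {
  description = "AWS region to deploy into"
  type        = string
  default     = "us-west-2"
}

variable "project_name" {
  description = "Project name, used as a prefix for resource names"
  type        = string
}

variable "ami_id" {
  description = "AMI used by the application launch template"
  type        = string
}

variable "db_name" {
  description = "Name of the initial database"
  type        = string
}

variable "db_username" {
  description = "Master username for the database"
  type        = string
  sensitive   = true
}

variable "db_password" {
  description = "Master password for the database"
  type        = string
  sensitive   = true
}
```

Sensitive values can then be supplied through environment variables such as TF_VAR_db_password rather than a committed .tfvars file.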
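Best practices for scaling
1. Secrets management
Never hardcode credentials in your Terraform code. Use AWS Secrets Manager or environment variables. Values read through a data source are still written to the state file, so restrict access to the state backend as well: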
data "aws_secretsmanager_secret_version" "db_credentials" {
  secret_id = "prod/database/credentials"
}

locals {
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db_credentials.secret_string)
}

resource "aws_db_instance" "main" {
  username = local.db_credentials.username
  password = local.db_credentials.password
  # ... other parameters
}
2. Input validation
Validate input variables to catch mistakes early:
variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"

  validation {
    condition     = can(regex("^t3\\.", var.instance_type))
    error_message = "The instance type must belong to the t3 family."
  }
}

variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "The environment must be dev, staging, or production."
  }
}
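The same pattern extends to values with more structure than a fixed list. As a sketch, the vpc_cidr variable from Step 2 could reject malformed CIDR blocks by wrapping a CIDR function in can(). The error message wording is an assumption, and this variant is not part of the original module.

```hcl
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost raises an error on malformed input; can() turns that error into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block, for example 10.0.0.0/16."
  }
}
```

With this in place, a typo such as 10.0.0.0/33 fails during terraform plan instead of surfacing later as a provider error.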
3. Consistent tagging
Adopt a consistent tagging strategy:
data "aws_caller_identity" "current" {}

locals {
  common_tags = {
    Project     = var.project_name
    Environment = var.environment
    ManagedBy   = "terraform"
    Team        = var.team_name
    CostCenter  = var.cost_center
    CreatedBy   = data.aws_caller_identity.current.user_id
    # A CreatedAt = timestamp() tag is omitted on purpose: its value changes
    # on every run, so every plan would report a tag update.
  }
}

resource "aws_instance" "example" {
  # ... instance configuration
  tags = merge(local.common_tags, {
    Name = "${var.project_name}-${var.environment}-web"
    Type = "web-server"
  })
}
4. Testing and validation
Write tests for your Terraform code with Terratest:
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestTerraformVpcModule(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../examples/vpc",
		Vars: map[string]interface{}{
			"project_name": "test-project",
			"environment":  "test",
		},
	}

	// Destroy the test infrastructure even if an assertion fails.
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcId)
}
Advanced scaling strategies
1. Blue/Green Deployments
resource "aws_autoscaling_group" "blue" {
  count = var.deployment_color == "blue" ? 1 : 0
  # ... ASG configuration
}

resource "aws_autoscaling_group" "green" {
  count = var.deployment_color == "green" ? 1 : 0
  # ... ASG configuration
}
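The count expressions above depend on a deployment_color variable that is not declared in the snippet. A minimal declaration might look like this; the default value and the validation rule are assumptions.

```hcl
variable "deployment_color" {
  description = "Which Auto Scaling group is currently active: blue or green"
  type        = string
  default     = "blue"

  validation {
    condition     = contains(["blue", "green"], var.deployment_color)
    error_message = "deployment_color must be either blue or green."
  }
}
```

Switching is then a matter of running terraform apply -var="deployment_color=green". Note that with this count-based toggle the inactive group is destroyed in the same apply, so both colors never run side by side; keeping both groups and shifting load balancer traffic is a common refinement.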
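2. Multi-region
# variables.tf
variable "regions" {
  description = "List of regions to deploy to"
  type        = list(string)
  default     = ["us-west-2", "us-east-1"]
}

# main.tf
# Note: region is passed only as an input variable here. Every module instance
# still uses the AWS provider configuration it inherits, so on its own this
# does not place resources in different regions.
module "infrastructure" {
  for_each = toset(var.regions)
  source   = "./modules/regional-infrastructure"

  region      = each.key
  environment = var.environment
}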
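A module called with for_each shares one provider configuration across all of its instances, so a common way to target several regions is one provider alias and one module block per region. The sketch below assumes the same module path and the two default regions from the regions variable; the alias and module names are invented for the example.

```hcl
provider "aws" {
  alias  = "us_west_2"
  region = "us-west-2"
}

provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

module "infrastructure_us_west_2" {
  source = "./modules/regional-infrastructure"

  # Route every aws resource in this module instance through the us-west-2 provider.
  providers = {
    aws = aws.us_west_2
  }

  region      = "us-west-2"
  environment = var.environment
}

module "infrastructure_us_east_1" {
  source = "./modules/regional-infrastructure"

  providers = {
    aws = aws.us_east_1
  }

  region      = "us-east-1"
  environment = var.environment
}
```

This is more verbose than for_each, but each region gets its own provider, and adding a region is an explicit, reviewable change.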
3. Workspaces for multiple environments
# Commands for managing workspaces
terraform workspace new development
terraform workspace new staging
terraform workspace new production

# Use the current workspace in the configuration
locals {
  environment = terraform.workspace

  config = {
    development = {
      instance_type = "t3.micro"
      min_size      = 1
    }
    staging = {
      instance_type = "t3.small"
      min_size      = 2
    }
    production = {
      instance_type = "t3.medium"
      min_size      = 3
    }
  }
}
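The map above only becomes useful once the entry for the active workspace is selected. One way to do that is sketched here with a self-contained copy of the map. The local names and the fallback to the development settings are assumptions.

```hcl
locals {
  # Hypothetical per-workspace settings, mirroring the config map in the text.
  workspace_config = {
    development = { instance_type = "t3.micro", min_size = 1 }
    staging     = { instance_type = "t3.small", min_size = 2 }
    production  = { instance_type = "t3.medium", min_size = 3 }
  }

  # Fall back to the development settings for workspaces without an entry,
  # such as the built-in "default" workspace.
  selected = try(
    local.workspace_config[terraform.workspace],
    local.workspace_config["development"]
  )
}

output "selected_instance_type" {
  value = local.selected.instance_type
}

output "selected_min_size" {
  value = local.selected.min_size
}
```

Resources can then read local.selected.instance_type and local.selected.min_size instead of hardcoding per-environment values.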
Monitoring and observability
1. Custom CloudWatch dashboard
resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = "${var.project_name}-${var.environment}"

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        width  = 12
        height = 6

        properties = {
          metrics = [
            ["AWS/ApplicationELB", "TargetResponseTime", "LoadBalancer", aws_lb.app.arn_suffix],
            ["AWS/ApplicationELB", "RequestCount", "LoadBalancer", aws_lb.app.arn_suffix],
          ]
          period = 300
          stat   = "Average"
          region = var.aws_region
          title  = "Application Load Balancer Metrics"
        }
      }
    ]
  })
}
2. Smart alerts
resource "aws_sns_topic" "alerts" {
  name = "${var.project_name}-alerts"
}

resource "aws_cloudwatch_metric_alarm" "high_error_rate" {
  alarm_name          = "${var.project_name}-high-error-rate"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 300
  statistic           = "Sum"
  threshold           = 10
  alarm_description   = "This metric monitors application error rate"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    LoadBalancer = aws_lb.app.arn_suffix
  }
}
CI/CD with Terraform
1. GitLab CI pipeline
# .gitlab-ci.yml
stages:
  - validate
  - plan
  - apply

variables:
  TF_ROOT: ${CI_PROJECT_DIR}/terraform
  TF_ADDRESS: ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${CI_ENVIRONMENT_NAME}

cache:
  key: "${CI_COMMIT_REF_SLUG}"
  paths:
    - ${TF_ROOT}/.terraform

before_script:
  - cd ${TF_ROOT}
  - terraform --version
  - terraform init

validate:
  stage: validate
  script:
    - terraform validate
    - terraform fmt -check

plan:
  stage: plan
  script:
    - terraform plan -out="planfile"
  artifacts:
    paths:
      - ${TF_ROOT}/planfile
    expire_in: 1 week

apply:
  stage: apply
  script:
    - terraform apply -input=false "planfile"
  dependencies:
    - plan
  when: manual
  only:
    - main
2. GitHub Actions
# .github/workflows/terraform.yml
name: 'Terraform'

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  terraform:
    name: 'Terraform'
    runs-on: ubuntu-latest
    environment: production

    defaults:
      run:
        shell: bash

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.5.0

      - name: Terraform Format
        id: fmt
        run: terraform fmt -check

      - name: Terraform Init
        id: init
        run: terraform init

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -input=false
        continue-on-error: true

      - name: Terraform Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve -input=false
Troubleshooting and optimization
1. Common debugging
# Enable detailed logs
export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform.log

# Import existing resources
terraform import aws_instance.example i-1234567890abcdef0

# Upgrade providers without changing infrastructure
terraform init -upgrade

# Validate the configuration without applying it
terraform validate
terraform plan -detailed-exitcode
2. Performance optimization
# Parallelism is a CLI flag, not a setting inside the terraform block (the default is 10).
# Lower it to limit concurrent operations in environments with many resources.
terraform apply -parallelism=5

# Use explicit depends_on when Terraform cannot infer the dependency
resource "aws_instance" "app" {
  # ... configuration
  depends_on = [
    aws_db_instance.main,
    aws_security_group.app
  ]
}
Additional resources and official documentation
To go deeper into Terraform and keep up with current best practices, see these official resources:
- Official Terraform documentation: Complete guides and resource reference
- Terraform Registry: Official and community modules and providers
- Best Practices Guide: Best practices recommended by HashiCorp
- AWS Provider Documentation: Documentation specific to the AWS provider
- Terraform Learn: Official interactive tutorials
Conclusion
Terraform is a powerful tool for scaling infrastructure as code, but success depends on applying best practices from the start. Modularization, automated testing, sound state management, and CI/CD integration are fundamental to projects that last.
Remember that scaling is not only technical: it also involves processes, documentation, and team training. Start with simple implementations and evolve gradually toward more complex architectures as your team gains experience.
The initial investment in structuring your Terraform code well pays off when you need to manage complex infrastructure across multiple environments and regions.