Kubernetes Production Guide
1. Resource Requests and Limits
Every container in production must have resource requests and limits defined. Requests drive scheduling; limits prevent a single container from consuming all node resources and triggering OOM kills on neighbors.
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
Set limits to 2× requests as a starting point. Monitor actual usage with CloudWatch Container Insights or Prometheus for 1–2 weeks and tune from observed P95 values, not estimates.
Never deploy without resource limits in production. A container with no memory limit can consume all memory on its node, and the resulting OOM kills or evictions can take down unrelated pods on the same node.
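One way to back this rule with a guardrail is a LimitRange, which applies default requests and limits to any container in the namespace that omits them. The sketch below assumes the webapp-prod namespace used later in this guide and reuses the starting values from the example above; the object name is hypothetical.

```yaml
# Sketch: namespace-wide defaults for containers that omit resources.
# The name "container-defaults" is hypothetical; values mirror the example above.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: webapp-prod
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container sets no requests
        memory: "256Mi"
        cpu: "250m"
      default:          # applied when a container sets no limits
        memory: "512Mi"
        cpu: "500m"
```

Defaults are applied at admission, so they affect only pods created after the LimitRange exists, and explicit per-container values still take precedence.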
2. Health Checks
Kubernetes offers three distinct probe types, each with a specific role. Configuring all three correctly is what enables zero-downtime deployments and automatic recovery.
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
      # Gives 300s (30 × 10s) for startup before liveness takes over
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 15
      failureThreshold: 3
      # Restarts container if unhealthy 3× in a row
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3
      # Removes pod from load balancer if not ready
startupProbe: Gives slow-starting apps time to initialize without being killed by liveness. Set failureThreshold high for JVM apps.
livenessProbe: Detects deadlocked or hung containers and restarts them. Should hit a lightweight endpoint that checks internal state.
readinessProbe: Controls whether the pod receives traffic from Services and load balancers. Useful for apps that need warmup time after a restart before serving production traffic.
3. Pod Security Standards
Every production deployment runs as a non-root user with a read-only root filesystem. These settings make it much harder for a compromised container to write malicious files or escape to the host.
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      automountServiceAccountToken: false
      containers:
        - name: webapp
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp  # writable temp dir
      volumes:
        - name: tmp
          emptyDir: {}
4. Network Policies
Start with a default-deny policy and explicitly allow only the required traffic paths. This implements microsegmentation: a compromised pod cannot reach services it has no reason to talk to.
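One path that is easy to overlook under default-deny egress is DNS: pods that resolve hostnames (a database endpoint, for example) still need to reach the cluster DNS service. A minimal sketch of an extra policy follows, assuming cluster DNS runs in kube-system and listens on port 53; the policy name is hypothetical, and the namespace and pod label match the other examples in this section.

```yaml
# Sketch: allow DNS lookups from webapp pods once egress is denied by default.
# Assumes cluster DNS (e.g. CoreDNS) runs in kube-system and listens on port 53.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: webapp-allow-dns   # hypothetical name
  namespace: webapp-prod
spec:
  podSelector:
    matchLabels:
      app: webapp
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```

Because NetworkPolicies are additive, a policy like this can sit alongside the other allow rules rather than replacing them.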
    # Default deny all ingress and egress
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all
      namespace: webapp-prod
    spec:
      podSelector: {}
      policyTypes:
        - Ingress
        - Egress
    ---
    # Allow only: ingress from ALB, egress to RDS
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: webapp-allow
      namespace: webapp-prod
    spec:
      podSelector:
        matchLabels:
          app: webapp
      policyTypes:
        - Ingress
        - Egress
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: kube-system
      egress:
        - ports:
            - port: 5432  # PostgreSQL
            - port: 443   # AWS API endpoints
5. HPA Configuration
We configure a Horizontal Pod Autoscaler (HPA) on all production deployments. Set the target CPU utilization below 80% so that pods keep headroom for the extra load they carry while a rolling update is in progress.
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: webapp-hpa
      namespace: webapp-prod
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: webapp
      minReplicas: 2
      maxReplicas: 20
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70  # scale before saturation
HPA requires resource requests to function. Utilization is measured as a percentage of the requested amount, so without CPU/memory requests the HPA controller cannot calculate it. Always set requests before enabling HPA. Also set maxUnavailable: 0 in your rolling update strategy so that available capacity does not drop below the desired replica count during a rollout.
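The rolling update setting mentioned above lives on the Deployment rather than on the HPA. A sketch of the relevant fragment follows, assuming the webapp Deployment that the HPA targets; the maxSurge value is an assumption, chosen only because maxSurge must be greater than zero when maxUnavailable is 0.

```yaml
# Sketch: Deployment fragment only (selector and pod template omitted).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  namespace: webapp-prod
spec:
  # replicas is left unset here so the HPA owns the replica count
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # keep the full desired count available during a rollout
      maxSurge: 1         # assumed value; allows one extra pod while rolling
```

A larger maxSurge speeds up rollouts at the cost of temporarily needing more spare node capacity.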