Kubernetes Production Guide

1. Resource Requests and Limits

Every container in production must have resource requests and limits defined. Requests drive scheduling; limits prevent a single container from consuming all node resources and triggering OOM kills on neighbors.

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Set limits to 2× requests as a starting point. Monitor actual usage with CloudWatch Container Insights or Prometheus for 1–2 weeks and tune from observed P95 values — not estimates.
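If Prometheus is the monitoring backend, the P95 values can be precomputed with recording rules. The following is a minimal sketch, assuming the standard cAdvisor/kubelet metrics are being scraped and that the workload runs in the `webapp-prod` namespace. The group and record names are hypothetical, and a 7-day window is only one reasonable choice within the 1–2 week range.

```yaml
# Hypothetical Prometheus rule file; group and record names are illustrative.
groups:
  - name: webapp-rightsizing
    rules:
      # P95 of the memory working set per container over the last 7 days.
      - record: container:memory_working_set_bytes:p95_7d
        expr: |
          quantile_over_time(
            0.95,
            container_memory_working_set_bytes{namespace="webapp-prod", container!=""}[7d]
          )
      # P95 of CPU usage in cores, sampled every 5 minutes over 7 days.
      - record: container:cpu_usage_cores:p95_7d
        expr: |
          quantile_over_time(
            0.95,
            rate(container_cpu_usage_seconds_total{namespace="webapp-prod", container!=""}[5m])[7d:5m]
          )
```

Compare the recorded values with the configured requests: a P95 well below the request suggests the request can be lowered, while a P95 close to the limit suggests the limit is too tight.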

Never deploy without resource limits in production. A container with no memory limit can consume all node memory, triggering node-level OOM kills or evictions that take down unrelated pods on the same node.
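One way to guard against manifests that omit these fields is a namespace-level LimitRange, which fills in defaults for any container that does not declare its own. This is a sketch, assuming the `webapp-prod` namespace and reusing the starting values from the example above. The object name is hypothetical.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults      # hypothetical name
  namespace: webapp-prod
spec:
  limits:
    - type: Container
      defaultRequest:           # applied when a container omits requests
        memory: "256Mi"
        cpu: "250m"
      default:                  # applied when a container omits limits
        memory: "512Mi"
        cpu: "500m"
```

Defaults are a safety net, not a substitute for per-workload values tuned from observed usage.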

2. Health Checks

Kubernetes offers three probe types, each with a specific role. Configuring all three correctly is what enables zero-downtime deployments and automatic recovery.

startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
  # Gives 300s for startup before liveness takes over

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 15
  failureThreshold: 3
  # Restarts container if unhealthy 3× in a row

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
  # Removes pod from Service endpoints (and load balancer targets) if not ready

startupProbe

Gives slow-starting apps time to initialize without being killed by liveness. Set failureThreshold high for JVM apps.

livenessProbe

Detects deadlocked or hung containers and restarts them. Should hit a lightweight endpoint that checks internal state.

readinessProbe

Controls whether the pod receives traffic from Services and load balancers. Useful for apps that need warmup time after restart before serving production traffic.

3. Pod Security Standards

Every production deployment runs as a non-root user with a read-only root filesystem. These settings make it much harder for a compromised container to write malicious files or escape to the host.

spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  automountServiceAccountToken: false
  containers:
    - name: webapp
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
      volumeMounts:
        - name: tmp
          mountPath: /tmp          # writable temp dir
  volumes:
    - name: tmp
      emptyDir: {}
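The settings above are applied per workload. Pod Security Admission can additionally check them for a whole namespace through labels. The sketch below assumes the `webapp-prod` namespace and a cluster with Pod Security Admission enabled. It enforces `baseline` and only warns and audits on `restricted`, because the `restricted` profile also requires a `seccompProfile` of type `RuntimeDefault` or `Localhost`, which the pod spec above does not set.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: webapp-prod
  labels:
    # Reject pods that violate the baseline profile.
    pod-security.kubernetes.io/enforce: baseline
    # Surface, but do not block, violations of the stricter profile.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Once workloads pass the `restricted` warnings cleanly, the `enforce` label can be raised to `restricted` as well.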

4. Network Policies

Start with a default-deny policy and explicitly allow only the required traffic paths. This implements microsegmentation — a compromised pod cannot reach other services it should not talk to.

# Default deny all ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: webapp-prod
spec:
  podSelector: {}   # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
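# Allow only: ingress from pods in kube-system (e.g. an in-cluster ingress controller),
# egress on ports 5432 and 443 to any destination.
# ALB traffic in IP target mode, or an RDS-only destination, would need ipBlock rules instead.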
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: webapp-allow
  namespace: webapp-prod
spec:
  podSelector:
    matchLabels:
      app: webapp
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
  egress:
    - ports:
        - port: 5432    # PostgreSQL
        - port: 443     # AWS API endpoints
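A default-deny egress policy also blocks DNS lookups, so the pod cannot resolve the database hostname unless DNS is allowed explicitly. The following is a sketch of an additional policy, assuming cluster DNS runs in `kube-system` with the conventional `k8s-app: kube-dns` label. The policy name is hypothetical, and the labels should be checked against the actual DNS pods in the cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: webapp-allow-dns        # hypothetical name
  namespace: webapp-prod
spec:
  podSelector:
    matchLabels:
      app: webapp
  policyTypes:
    - Egress
  egress:
    - to:
        # Both selectors in one list item: pods with this label AND in this namespace.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label on the DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive, so this policy combines with `webapp-allow` rather than replacing it.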

5. HPA Configuration

We configure HPA on all production deployments. Keep the target CPU utilization below 80% so that existing pods have headroom to absorb load while new replicas start, both during scale-up and during rolling updates.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
  namespace: webapp-prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale before saturation
⚠️

HPA requires resource requests to function. Utilization targets are a percentage of each container's requests, so without CPU/memory requests the HPA controller cannot calculate utilization. Always set requests before enabling HPA. Also set maxUnavailable: 0 in your rolling update strategy so that a rollout never drops capacity below the replica count the HPA has chosen.
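The rolling update setting can be sketched as the following fragment of the `webapp` Deployment spec. The `maxSurge` value is an assumed starting point, not a figure from this guide.

```yaml
# Fragment of the webapp Deployment spec; the maxSurge value is an assumption.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # never remove an old pod before its replacement is ready
      maxSurge: 25%        # extra pods allowed above the desired count during a rollout
```

With `maxUnavailable: 0`, the rollout can only proceed by surging, so the cluster needs enough spare capacity to schedule the extra pods.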