Engineering Methodology

How we work

Every engagement follows the same process: assess, design, build, validate, document, operate. The same principles apply whether the project is a Terraform module or a production EKS cluster.

The process

6-phase engagement model

Each phase has defined entry criteria, deliverables, and exit criteria. Nothing is skipped. Phases 1–3 are sequential; 4–6 are continuous.

01

Discovery & Assessment

1–2 weeks

Understand the current infrastructure, identify gaps, define success criteria. No assumptions.

  • Current architecture diagram
  • Gap analysis vs. target state
  • Risk register
  • Project scope document

02

Architecture Design

1–2 weeks

Design the target architecture with explicit trade-off documentation. Every decision is justified, not assumed.

  • Target architecture diagrams
  • Infrastructure module design
  • Security model (IAM, network, secrets)
  • Architecture Decision Records (ADRs)

03

Implementation

2–8 weeks

Build the infrastructure using IaC-first principles. Every resource is in Terraform. Every application deployment goes through GitOps.

  • Terraform modules (per component)
  • CI/CD pipelines (GitHub Actions + Jenkins)
  • Kubernetes manifests with GitOps delivery
  • Runbooks for operational procedures

04

Validation & Testing

Continuous

Every implementation is tested against the requirements defined in Phase 1. No deployment without a passing pipeline.

  • Automated test suite (pytest, ShellCheck)
  • SonarCloud quality gate results
  • Load test results for critical paths
  • Security scan reports

05

Documentation

Continuous

Documentation is written alongside code, not after. Architecture decisions, runbooks, and operational guides are deliverables, not afterthoughts.

  • README per module with input/output documentation
  • Runbooks for each operational scenario
  • Architecture Decision Records
  • On-call guide with escalation paths

06

Operations & Handoff

Ongoing

Production readiness review, observability validation, team enablement. The engagement ends when the client team can operate independently.

  • Production readiness checklist (47 points)
  • Observability dashboards (Grafana)
  • Alerting configured and tested
  • Team knowledge transfer sessions

Non-negotiables

Engineering standards

These are not preferences. They are the minimum bar for every deliverable.

IaC-first

  • Every AWS resource in a Terraform module
  • No ClickOps in production, ever
  • S3 remote state + DynamoDB lock on all projects
  • terraform plan required before every apply

Semantic commits & Git hygiene

  • Conventional commits: feat/fix/ci/docs/chore
  • Protected main branch — PRs required
  • CI must pass before merge
  • No force-push to main
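
The commit convention above can be checked mechanically in CI. The sketch below is one possible check, not the project's actual hook: the function name is hypothetical, and the optional "(scope)" and "!" parts are assumed from the Conventional Commits format rather than stated in the list.

```python
import re

# Types come from the list above. The optional "(scope)" and "!" marker
# are an assumption based on the Conventional Commits format.
COMMIT_RE = re.compile(r"^(feat|fix|ci|docs|chore)(\([a-z0-9._/-]+\))?!?: \S.*")


def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line matches the allowed pattern."""
    return COMMIT_RE.match(subject) is not None


if __name__ == "__main__":
    for subject in ["feat(eks): add managed node group", "updated stuff"]:
        print(subject, "->", is_conventional(subject))
```

A check like this would typically run against every commit in a pull request, so a malformed subject fails CI before merge.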

set -euo pipefail in all shell scripts

  • -e: exit as soon as a command returns a non-zero status
  • -u: treat unset variables as errors
  • -o pipefail: a pipeline fails if any command in it fails, not only the last
  • ShellCheck CI validation on all .sh files
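
ShellCheck does not require strict mode by default, so the rule above needs its own check. A minimal sketch, assuming strict mode must be the first command in each script; the helper name is hypothetical and this is not part of ShellCheck:

```python
def has_strict_mode(script_text: str) -> bool:
    """Return True if the first command in the script is 'set -euo pipefail'."""
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and the shebang
        # The first real command decides the result.
        return line == "set -euo pipefail"
    return False  # empty script or comments only


if __name__ == "__main__":
    script = "#!/usr/bin/env bash\nset -euo pipefail\necho deploying\n"
    print(has_strict_mode(script))
```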

Zero static credentials

  • GitHub Actions uses OIDC federation
  • EC2 uses instance profiles; EKS workloads use IAM roles for service accounts (IRSA)
  • SSM Session Manager replaces SSH
  • No AWS_ACCESS_KEY_ID in any secret store
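
One way to back the rule above with a check is to scan files for long-lived access key IDs. The sketch below is illustrative only: the function name is hypothetical, and it relies on the common scanner heuristic that IAM user key IDs start with "AKIA" followed by 16 uppercase letters or digits. A dedicated secret scanner covers far more cases.

```python
import re

# Long-lived IAM user access key IDs start with "AKIA" followed by 16
# uppercase letters or digits. Temporary (STS) keys use other prefixes
# and are not matched here.
ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_static_keys(text: str) -> list[str]:
    """Return every string in `text` that looks like a static access key ID."""
    return ACCESS_KEY_RE.findall(text)


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    print(find_static_keys("aws_access_key_id = AKIAIOSFODNN7EXAMPLE"))
```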

GitOps principles

Git is the source of truth. Always.

GitOps means the desired state of the system is declared in Git, and a controller continuously reconciles the actual state against it. This is enforced as a technical guarantee, not followed as a best practice.

  • No manual kubectl apply in production

    ArgoCD is the only mechanism for applying changes to a managed cluster. Automated sync with selfHeal: true reverts any manual change back to the state in Git.

  • Every infrastructure change is a commit

    terraform apply runs from CI only. Local applies are blocked on production backends.

  • Prune enabled on all ArgoCD applications

    Removing a manifest from Git removes the resource from the cluster. Git defines complete state.

  • Drift is an incident

    Any difference between Git and the cluster, which ArgoCD reports as OutOfSync, triggers an alert. A Degraded health status is paged, not ignored.
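
The reconcile, self-heal, and prune behavior described above can be modeled in a few lines. This is a toy model under stated assumptions, not ArgoCD's implementation: resources are plain dictionaries keyed by name, and the function name is hypothetical.

```python
def reconcile(desired: dict, actual: dict, prune: bool = True):
    """Return (actions, new_state) that move `actual` toward `desired`.

    Keys are resource names and values are their specs. `desired` stands
    in for what is declared in Git, `actual` for what runs in the cluster.
    """
    actions = []
    new_state = dict(actual)
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))  # drift is overwritten (self-heal)
        else:
            continue  # already in sync
        new_state[name] = spec
    if prune:
        for name in actual:
            if name not in desired:
                actions.append(("delete", name))  # removed from Git, so removed here
                del new_state[name]
    return actions, new_state


if __name__ == "__main__":
    git = {"web": {"replicas": 3}, "api": {"replicas": 2}}
    cluster = {"web": {"replicas": 5}, "legacy": {"replicas": 1}}
    print(reconcile(git, cluster))
```

Running the function a second time on its own output returns no actions, which is the property that makes drift detectable: any non-empty action list means the cluster differs from Git.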

GitOps delivery pipeline

1. git push (feature branch) → pull request + code review
2. CI: terraform plan + tests + docker build → merge to main
3. Jenkins: build artifact + push to ECR → update image tag in k8s manifest
4. ArgoCD detects diff (≤ 3 min) → kubectl apply (by ArgoCD only) → cluster reflects Git state

Security by default

Security is built in, not bolted on

Identity

  • OIDC federation for CI/CD
  • IAM roles with least-privilege
  • IRSA for Kubernetes workloads
  • MFA on all human accounts

Network

  • Compute in private subnets
  • No port 22 in security groups
  • SSM replaces SSH everywhere
  • VPC endpoints keep AWS service traffic on the private network, reducing exposure

Secrets

  • AWS Secrets Manager for runtime secrets
  • No secrets in environment variables
  • No secrets in code or Git history
  • Rotation policies on all credentials

Observability first

Prometheus + Grafana before go-live

No production deployment goes live without observability configured, dashboards built, and alerts tested. Observability is a gate, not an afterthought.

  • Prometheus metrics collection on all workloads
  • Grafana dashboards for cluster + application RED metrics
  • Alertmanager routes P1/P2 to on-call, P3/P4 to ticket
  • CloudWatch for AWS-level metrics, CloudTrail for audit logs
  • Structured JSON logging from all services
  • Alert testing: chaos drill before go-live
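
The severity routing rule above reduces to a lookup table. The sketch below shows the logic in plain Python rather than Alertmanager configuration. The "severity" label name and the fallback for unknown severities are assumptions, not something the list specifies.

```python
# P1/P2 page the on-call engineer; P3/P4 open a ticket (from the list above).
ROUTES = {"P1": "on-call", "P2": "on-call", "P3": "ticket", "P4": "ticket"}


def route_alert(labels: dict) -> str:
    """Pick a receiver for an alert based on its severity label."""
    severity = labels.get("severity", "")
    # Assumption: unknown or missing severities go to on-call so that a
    # mislabeled alert is never silently dropped.
    return ROUTES.get(severity, "on-call")


if __name__ == "__main__":
    print(route_alert({"alertname": "PodCrashLooping", "severity": "P3"}))
```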
  • RED (Rate · Errors · Duration): per-service metrics tracked from day one
  • USE (Utilization · Saturation · Errors): infrastructure metrics for every resource
  • < 3m alert response time: ArgoCD + Alertmanager default polling
  • 47 production readiness checks: every engagement validated before go-live
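
To make the RED terms concrete, the sketch below computes rate, error ratio, and a p95 duration from a batch of request records. In practice these would come from Prometheus queries; the record type, the function name, and the choice to count only 5xx responses as errors are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    duration_s: float  # request latency in seconds


def red_metrics(requests: list[Request], window_s: float) -> dict:
    """Compute Rate, Errors, and Duration for requests seen in one window."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx only
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank p95: the smallest sample with at least 95% of samples
    # at or below it. Integer arithmetic avoids float rounding in the rank.
    rank = (95 * total + 99) // 100
    p95 = durations[rank - 1] if total else 0.0
    return {
        "rate_per_s": total / window_s,
        "error_ratio": errors / total if total else 0.0,
        "duration_p95_s": p95,
    }


if __name__ == "__main__":
    sample = [Request(200, 0.1)] * 18 + [Request(500, 0.9), Request(503, 1.5)]
    print(red_metrics(sample, window_s=10.0))
```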

Red Hat engineering standards

Battle-tested open source practices

The k8s-on-premise project applies Red Hat engineering practices to bare-metal Kubernetes. These same standards govern all lra-cloud-ops projects.

Idempotent provisioning

Every script can run twice and produce the same result. Scripts start with set -euo pipefail and check current state before changing it.
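
Check-before-act is easiest to see in a small example. The sketch below appends a configuration line only if it is missing, so a second run changes nothing. The function name and the sysctl line are illustrative, not taken from the k8s-on-premise project.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if it was already correct.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():  # check ...
        return False
    # ... before act: keep existing content and terminate it with a newline
    # so the new line never gets glued onto the previous one.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(text + prefix + line + "\n")
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "sysctl.conf"
        print(ensure_line(conf, "net.ipv4.ip_forward = 1"))  # first run changes the file
        print(ensure_line(conf, "net.ipv4.ip_forward = 1"))  # second run is a no-op
```

Returning whether anything changed lets a caller report "changed" versus "ok", the same signal configuration management tools use.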

Documentation in code

No separate docs directory. README files live next to the code they document.

Everything auditable

CloudTrail for AWS, ArgoCD history for K8s, git log for code. No change without a record.

Minimal dependencies

Use the standard library. Use the managed service. Add a dependency only when there is no alternative.

Explicit over implicit

All Terraform resources are explicit. No default values that hide security settings. No magic.

Automation over runbooks

If a runbook step can be automated, automate it. Runbooks are for decisions, not commands.

See the methodology in practice

Every project in the portfolio is built using this process. Review the case studies or schedule a conversation to discuss how it applies to your infrastructure.