Engineering Methodology

How we work

Every engagement follows the same process: assess, design, build, validate, document, operate. The same principles apply whether the project is a Terraform module or a production EKS cluster.

The process

6-phase engagement model

Each phase has defined entry criteria, deliverables, and exit criteria. Nothing is skipped. Phases 1–3 are sequential; 4–6 are continuous.

Discovery & Assessment

1–2 weeks

Understand the current infrastructure, identify gaps, define success criteria. No assumptions.

Current architecture diagram
Gap analysis vs. target state
Risk register
Project scope document

Architecture Design

1–2 weeks

Design the target architecture with explicit trade-off documentation. Every decision is justified, not assumed.

Target architecture diagrams
Infrastructure module design
Security model (IAM, network, secrets)
Architecture Decision Records (ADRs)

Implementation

2–8 weeks

Build the infrastructure using IaC-first principles. Every resource is in Terraform. Every application deployment is GitOps.

Terraform modules (per component)
CI/CD pipelines (GitHub Actions + Jenkins)
Kubernetes manifests with GitOps delivery
Runbooks for operational procedures

Validation & Testing

Continuous

Every implementation is tested against the requirements defined in Phase 1. No deployment without a passing pipeline.

Automated test suite (pytest, ShellCheck)
SonarCloud quality gate results
Load test results for critical paths
Security scan reports

Documentation

Continuous

Documentation is written alongside code, not after. Architecture decisions, runbooks, and operational guides are deliverables, not afterthoughts.

README per module with input/output documentation
Runbooks for each operational scenario
Architecture Decision Records
On-call guide with escalation paths

Operations & Handoff

Ongoing

Production readiness review, observability validation, team enablement. The engagement ends when the client team can operate independently.

Production readiness checklist (47 points)
Observability dashboards (Grafana)
Alerting configured and tested
Team knowledge transfer sessions

Non-negotiables

Engineering standards

These are not preferences. They are the minimum bar for every deliverable.

IaC-first

Every AWS resource in a Terraform module
No ClickOps in production, ever
S3 remote state + DynamoDB lock on all projects
terraform plan required before every apply
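
The plan-before-apply rule can be enforced in CI by saving the plan and reading its exit status. The sketch below is a minimal illustration, not the pipeline used in these engagements: the function names and the `tfplan` file name are assumptions, while `-detailed-exitcode`, `-input=false`, and `-out` are standard `terraform plan` options.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_OUTCOMES = {0: "no-changes", 1: "error", 2: "changes-pending"}


def classify_plan_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` status to a readable label."""
    return PLAN_OUTCOMES.get(code, "unknown")


def run_plan(workdir: str, plan_file: str = "tfplan") -> str:
    """Run the plan in `workdir` and save it, so a later apply can use exactly this plan.

    Requires the terraform binary on PATH; nothing runs at import time.
    """
    result = subprocess.run(
        ["terraform", "plan", "-input=false", "-detailed-exitcode", f"-out={plan_file}"],
        cwd=workdir,
    )
    return classify_plan_exit_code(result.returncode)
```

A CI job could stop on "error", skip the apply step on "no-changes", and request review on "changes-pending".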

Semantic commits & Git hygiene

Conventional commits: feat/fix/ci/docs/chore
Protected main branch — PRs required
CI must pass before merge
No force-push to main
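
A commit-message check is one way to make the conventional-commit rule mechanical. The sketch below assumes only the five types listed above are allowed; the regex and function name are illustrative, and a real project might allow additional types.

```python
import re

# Types taken from the list above; extend as the project requires.
ALLOWED_TYPES = ("feat", "fix", "ci", "docs", "chore")

# type, optional (scope), optional "!" for a breaking change, then ": " and a description.
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<desc>\S.*)$"
)


def is_valid_commit_header(header: str) -> bool:
    """Return True if the first line of a commit message follows the convention."""
    match = HEADER_RE.match(header)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

For example, `feat(eks): add node group` passes, while `update stuff` and `feature: add node group` are rejected.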

set -euo pipefail in all shell scripts

-e: exit immediately when a command returns a non-zero status
-u: treat references to unset variables as errors
-o pipefail: a pipeline fails if any command in it fails, not only the last
ShellCheck CI validation on all .sh files
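
ShellCheck analyses script content; a small complementary check can confirm that each script actually enables strict mode near the top. This is a hedged sketch: the function name and the ten-line window are assumptions, not part of ShellCheck.

```python
import re

# Matches a line consisting of `set -euo pipefail`, optionally followed by a comment.
STRICT_MODE_RE = re.compile(r"^\s*set\s+-euo\s+pipefail\s*(#.*)?$")


def has_strict_mode(script_text: str, max_lines: int = 10) -> bool:
    """Return True if `set -euo pipefail` is among the first `max_lines` command lines."""
    checked = 0
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines, the shebang, and comments
        if STRICT_MODE_RE.match(line):
            return True
        checked += 1
        if checked >= max_lines:
            break
    return False
```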

Zero static credentials

GitHub Actions uses OIDC federation
EC2 uses instance profiles; EKS workloads use IAM roles for service accounts (IRSA)
SSM Session Manager replaces SSH
No AWS_ACCESS_KEY_ID in any secret store
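
A simple scan can back up the no-static-credentials rule by flagging strings shaped like AWS access key IDs. The sketch below is illustrative and assumes the common 20-character format with an `AKIA` (long-term) or `ASIA` (temporary) prefix; dedicated secret scanners cover far more cases.

```python
import re

# Long-term keys start with AKIA, temporary STS keys with ASIA; both are 20 characters.
ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> list[str]:
    """Return every substring of `text` that looks like an AWS access key ID."""
    return ACCESS_KEY_RE.findall(text)
```

The key used in the AWS documentation, `AKIAIOSFODNN7EXAMPLE`, is a convenient test value.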

GitOps principles

Git is the source of truth. Always.

GitOps means the desired state of the system is declared in Git, and a controller continuously reconciles the actual state against it. Not as a best practice — as a technical guarantee.
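
The reconciliation idea can be sketched in a few lines. This is a simplified model, not how ArgoCD is implemented: resources are plain dictionaries keyed by name, and `plan_reconcile` is a hypothetical helper.

```python
def plan_reconcile(desired: dict, actual: dict, prune: bool = True) -> list[tuple[str, str]]:
    """Return the (action, resource_name) steps that make `actual` match `desired`."""
    steps = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != spec:
            steps.append(("update", name))  # drift: revert to the version in Git
    if prune:
        for name in sorted(actual.keys() - desired.keys()):
            steps.append(("delete", name))  # not declared in Git, so removed
    return steps
```

A controller runs this comparison on a loop; when the result is empty, the cluster matches Git.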

No manual kubectl apply in production

ArgoCD is the only mechanism for applying changes to a managed cluster. Automated sync with selfHeal=true reverts any out-of-band change back to the state in Git.

Every infrastructure change is a commit

terraform apply runs from CI only. Local applies are blocked on production backends.

Prune enabled on all ArgoCD applications

Removing a manifest from Git removes the resource from the cluster. Git defines complete state.

Drift is an incident

Any difference between Git and the cluster triggers an alert. An ArgoCD application that is OutOfSync or Degraded is paged, not ignored.
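
The prune and self-heal settings live in the sync policy of each ArgoCD Application. The sketch below builds such a manifest as a Python dictionary; the field names follow the `argoproj.io/v1alpha1` Application resource, while the function name, the `default` project, the `main` revision, and the in-cluster destination are assumptions chosen for illustration.

```python
import json


def argocd_application(name: str, repo_url: str, path: str, namespace: str) -> dict:
    """Build an Application manifest with automated sync, prune, and self-heal enabled."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "path": path, "targetRevision": "main"},
            "destination": {"server": "https://kubernetes.default.svc", "namespace": namespace},
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest; Kubernetes accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```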

GitOps delivery pipeline

1 git push (feature branch)

↓ Pull request + code review

2 CI: terraform plan + tests + docker build

↓ Merge to main

3 Jenkins: build artifact + push to ECR

↓ Update image tag in k8s manifest

4 ArgoCD detects diff (≤ 3 min)

↓ kubectl apply (by ArgoCD only)

✓ Cluster reflects Git state
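
The "update image tag" step between Jenkins and ArgoCD is typically a small text edit committed back to Git. The sketch below shows one way to do it with a regular expression; `set_image_tag` is a hypothetical helper, the repository and tags in the usage note are made up, and tools such as Kustomize or a YAML parser are common alternatives.

```python
import re


def set_image_tag(manifest: str, image: str, new_tag: str) -> str:
    """Replace the tag on every `image: <image>:<tag>` line; other images are untouched."""
    pattern = re.compile(
        rf"^([ \t]*(?:-[ \t]+)?image:[ \t]*{re.escape(image)}):[\w.\-]+[ \t]*$",
        re.MULTILINE,
    )
    # Keep the matched prefix (group 1) and swap only the tag.
    return pattern.sub(rf"\g<1>:{new_tag}", manifest)
```

Calling it with an image of `example/api` and a tag of `1.5.0` rewrites `image: example/api:1.4.2` and leaves every other line as it was. Running it twice gives the same result, so the step is safe to retry.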

Security by default

Security is built in, not bolted on

Identity

OIDC federation for CI/CD
IAM roles with least-privilege
IRSA for Kubernetes workloads
MFA on all human accounts
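
OIDC federation for CI/CD rests on an IAM role whose trust policy accepts tokens from the CI provider. The sketch below builds a trust policy for GitHub Actions as a Python dictionary; the account ID, repository, and branch are placeholders, and restricting the `sub` claim to a single branch is an assumption about how tightly a given project scopes access.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Trust policy letting one repository branch assume the role through OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                    f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                },
            },
        }],
    }


def policy_json(policy: dict) -> str:
    """Serialize the policy for use in Terraform or the AWS CLI."""
    return json.dumps(policy, indent=2)
```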

Network

Compute in private subnets
No port 22 in security groups
SSM replaces SSH everywhere
VPC endpoints keep AWS service traffic off the public internet
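
The "no port 22" rule can be checked automatically against security group definitions. The sketch below assumes rules in the shape returned by the EC2 `DescribeSecurityGroups` API (`IpProtocol`, `FromPort`, `ToPort`); the function name is illustrative.

```python
def exposes_ssh(ip_permissions: list[dict]) -> bool:
    """Return True if any ingress rule covers TCP port 22."""
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            return True  # "-1" means all protocols and all ports
        if protocol != "tcp":
            continue
        if rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", 65535):
            return True
    return False
```

Note that a wide port range such as 0 to 1024 is flagged as well, since it includes port 22.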

Secrets

AWS Secrets Manager for runtime secrets
No secrets in environment variables
No secrets in code or Git history
Rotation policies on all credentials
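
Reading secrets at runtime, instead of from environment variables, usually comes down to one API call. The sketch below assumes the secret is stored as a JSON string and takes the client as a parameter, so it can be a real `boto3.client("secretsmanager")` in production or a stub in tests; the function name and the secret ID are hypothetical.

```python
import json


def load_secret(client, secret_id: str) -> dict:
    """Fetch a JSON secret at call time; `client` is a Secrets Manager client."""
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


# Example wiring (requires boto3 and an IAM role with secretsmanager:GetSecretValue):
#   import boto3
#   creds = load_secret(boto3.client("secretsmanager"), "app/db")
```

Because the value is fetched on demand, a rotated credential is picked up on the next call without redeploying.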

Observability first

Prometheus + Grafana before go-live

No production deployment goes live without observability configured, dashboards built, and alerts tested. Observability is a gate, not an afterthought.

Prometheus metrics collection on all workloads

Grafana dashboards for cluster + application RED metrics

Alertmanager routes P1/P2 to on-call, P3/P4 to the ticket queue

CloudWatch for AWS-level metrics; CloudTrail for the audit trail

Structured JSON logging from all services

Alert testing: chaos drill before go-live
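
Structured JSON logging needs nothing beyond the standard library. The formatter below is a minimal sketch; the field names (`ts`, `level`, `logger`, `message`) are an assumption, and a team would align them with whatever its log pipeline expects.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_json_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger
```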

RED

Rate · Errors · Duration

Per-service metrics tracked from day one

USE

Utilization · Saturation · Errors

Infrastructure metrics for every resource

< 3m

Drift detection time

ArgoCD default 3-minute reconciliation interval

Production readiness checks

Every engagement validated before go-live
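
The RED metrics named above (rate, errors, duration) can be computed from raw request records. In practice Prometheus derives them from counters and histograms; the sketch below only illustrates the definitions, treating HTTP 5xx responses as errors and using a nearest-rank 95th percentile, both of which are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int


def red_summary(requests: list[Request], window_seconds: float) -> dict:
    """Summarize rate, error ratio, and p95 duration for one time window."""
    if not requests:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    # Nearest-rank p95: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * len(durations) + 99) // 100
    return {
        "rate_per_s": len(requests) / window_seconds,
        "error_ratio": errors / len(requests),
        "p95_ms": durations[rank - 1],
    }
```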

Red Hat engineering standards

Battle-tested open source practices

The k8s-on-premise project applies Red Hat engineering practices to bare-metal Kubernetes. These same standards govern all lra-cloud-ops projects.

Idempotent provisioning

Every script can run twice and produce the same result. Scripts start with set -euo pipefail and check the current state before acting.
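
"Check before act" is the core of idempotent provisioning: inspect the current state and change it only when it differs from the target. The sketch below applies the pattern to a configuration file; `ensure_line` is a hypothetical helper, and the sysctl setting in the usage note is only an example.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file. Returns True if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # check before act: already in the desired state
    # Rewrite the file with the new line appended and a single trailing newline.
    path.write_text("\n".join(existing + [line]) + "\n")
    return True
```

The first call with `net.ipv4.ip_forward = 1` adds the line and returns True; a second call finds it and returns False, leaving the file unchanged.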

Documentation in code

No separate docs directory. README files live next to the code they document.

Everything auditable

CloudTrail for AWS, ArgoCD history for K8s, git log for code. No change without a record.

Minimal dependencies

Use the standard library. Use the managed service. Add a dependency only when there is no alternative.

Explicit over implicit

All Terraform resources are explicit. No default values that hide security settings. No magic.

Automation over runbooks

If a runbook step can be automated, automate it. Runbooks are for decisions, not commands.

See the methodology in practice

Every project in the portfolio is built using this process. Review the case studies or schedule a conversation to discuss how it applies to your infrastructure.
