Aaron Mark · DevOps / Platform Engineer · Open-Source Contributor

I build infrastructure that doesn't break.

Platform reliability engineer. I automate the fragile parts, harden the security boundaries, and keep Kubernetes workloads running without surprises.

Python · Kubernetes · Vault · CI/CD · OpenTelemetry

Flagship project

PatchPulse

Open-source, self-hosted AI-powered pre-flight risk analysis for Kubernetes.

100% Open Source

About

Calm systems thinking.

I design platform foundations with failure modes in mind. My work spans Python automation, Kubernetes operations, Vault-driven secrets management, and observability patterns that keep production stable under pressure. I work extensively across AWS (EC2, EKS, S3, IAM, VPC, and CloudWatch). AI tooling is part of my daily workflow: Cursor, Claude Code, and GitHub Copilot are core to how I build. I care about platforms where deployments are low-risk events rather than high-stress ones, and where on-call is boring because the alerting is precise and the runbooks are current. I also build in the open: PatchPulse is my self-hosted, AI-powered pre-flight risk analysis platform for Kubernetes, built to catch risky changes before they reach production.

Tech Stack

Platform & Ops

Kubernetes · Helm · Docker · Linux

Cloud (AWS)

EC2 · EKS · S3 · IAM · VPC · CloudWatch · Lightsail

Scripting & APIs

Python · FastAPI · Bash

IaC & GitOps

Terraform · OpenTofu · GitOps

CI/CD

GitLab CI · Jenkins · GitHub Actions

Security

Vault · RBAC · Secret Rotation · Policy Gates

Observability

OpenSearch · OpenTelemetry

AI Tooling

Cursor · Claude Code · GitHub Copilot

Proof

One outcome, real impact.

Representative example of how I approach platform reliability.

PatchPulse: From idea to production

Problem
Risky Kubernetes changes caused frequent production incidents and were hard to catch before deploy. Teams were firefighting after merge instead of preventing issues before it.
Solution
Built an open-source, self-hosted platform that runs AI-powered pre-flight risk analysis on every PR/MR, with policy gates and guardrails so teams can block or review before merge.
Outcome
Risky changes are caught at the PR stage with explainable AI evidence — turning pre-deployment review from a manual bottleneck into a fast, automated safety gate. Teams ship faster because the confidence is built into the pipeline.
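
To make the idea of a policy gate concrete, here is a minimal sketch in Python. It is not PatchPulse's actual code: the `Finding` shape, the `gate_decision` function, and both thresholds are hypothetical names and values chosen only for illustration. It assumes each PR/MR has already been given a list of scored risk findings, and shows how a gate could turn them into an allow, review, or block decision.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
REVIEW_THRESHOLD = 0.4
BLOCK_THRESHOLD = 0.8


@dataclass(frozen=True)
class Finding:
    """One risk finding attached to a PR/MR (hypothetical shape)."""
    rule: str      # short identifier, e.g. "privileged-container"
    score: float   # 0.0 (harmless) to 1.0 (very risky)
    evidence: str  # human-readable explanation shown to the reviewer


def gate_decision(findings: list[Finding]) -> str:
    """Return "allow", "review", or "block" based on the riskiest finding."""
    worst = max((f.score for f in findings), default=0.0)
    if worst >= BLOCK_THRESHOLD:
        return "block"
    if worst >= REVIEW_THRESHOLD:
        return "review"
    return "allow"


if __name__ == "__main__":
    findings = [
        Finding("missing-resource-limits", 0.5, "Deployment has no memory limit."),
        Finding("privileged-container", 0.9, "securityContext.privileged is true."),
    ]
    print(gate_decision(findings))  # prints "block"
```

Deciding on the single riskiest finding keeps the gate easy to explain to a reviewer; a real gate would likely weigh findings per rule and per environment.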

Aaron's the person we call when the platform has to stay up. He doesn't just fix problems — he builds the systems that prevent them.

Engineering lead, platform team

Technical Expertise

Depth where reliability matters.

Depth in platform reliability, security, and Python automation.

  • Kubernetes cluster operations and workload hardening
  • AWS infrastructure across EC2, EKS, S3, IAM, VPC, and CloudWatch
  • Infrastructure patterns that prioritize uptime and blast-radius control

Current Tech Radar

Up to date on where DevOps and AI are going.

I continuously track and test emerging tooling so platform decisions stay modern, practical, and production-safe.

OpenClaw-style agent workflows

Evaluating AI agent orchestration patterns for operational automation with guardrails.

OpenTelemetry

Staying current on unified traces, metrics, and logs for faster incident diagnosis.

OpenTofu

Tracking modern IaC workflows and ecosystem evolution for secure, auditable provisioning.

AI-augmented CI/CD

Applying AI-assisted risk checks and policy gates to reduce deployment regressions.

Featured Projects

Operational architecture, built to endure.

PatchPulse.dev

Open Source · Self-Hosted

Open-source, self-hosted AI-powered pre-flight risk analysis for Kubernetes that blocks risky changes before they reach production.

FastAPI · Kubernetes · PostgreSQL · GitHub/GitLab · Policy Engine

Kubernetes Automation Framework

A Python-driven orchestration layer for safe cluster operations, policy enforcement, and repeatable rollout workflows.

Python · Kubernetes · GitLab CI · Helm

Internal project

Vault Dynamic Secret Orchestration

An automation service that provisions short-lived credentials per workload and enforces role-based access boundaries.

Vault · Python · Kubernetes · Security

Internal project

DevOps CLI Toolkit

A composable Python CLI that standardizes incident tasks, deployment workflows, and environment diagnostics.

Python · Bash · Automation · Platform Ops

Internal project

OpenSearch Deployment Architecture

A resilient observability deployment blueprint focused on log ingestion integrity, query performance, and retention strategy.

OpenSearch · Kubernetes · Observability · IaC

Internal project

Philosophy

Engineering Principles

Automate everything.

Reduce human error.

Design for failure.

Observability first.

Security is not optional.

Simplicity scales.

FAQ

Common questions.

  • What kind of problems do you solve?

    Platform reliability gaps — risky deployments, fragile secrets management, observability blind spots, CI/CD that slows teams down. I build systems that remove those problems permanently.

  • How do you approach a new platform?

    I start by mapping the failure modes — what breaks, how often, and why. From there I identify the automation gaps, security boundaries, and observability blind spots. The goal is always a platform where deployments are low-risk and on-call is boring.

  • How can I reach you?

    Use the contact section below — email or LinkedIn. I respond quickly.

Contact

Let's build something reliable.

If you're building a platform team or hiring for DevOps and want someone who can own infra, automation, and security end-to-end — reach out.

I respond quickly. Email me or connect on LinkedIn — happy to jump on a call.

amarkdotdev@gmail.com

Building a platform team? Let’s talk.
