DevOps/Platform Engineer

Stelia
1 day ago
Full-time
Remote
United Kingdom
DevOps
Your role
Overview:
  • Build and run a reliable platform for services and data workflows across Kubernetes and Prefect.
  • Own CI/CD, observability, security, and developer experience for Python/Go/Rust services.
Responsibilities:
  • Design, provision, and operate Kubernetes workloads (deployments, networking, autoscaling, storage).
  • Build and maintain GitLab CI/CD pipelines for Python, Go, and Rust services (build, test, scan, release).
  • Operate Prefect (agents, work queues, deployments, concurrency limits, task execution environments).
  • Implement environment strategy and promotion flow (dev/staging/prod) with clear release gates.
  • Create golden paths and templates for FastAPI microservices and Prefect flows.
  • Manage secrets, configuration, and access (e.g., GitLab variables, K8s secrets).
  • Establish observability: logging, metrics, traces, alerting, runbooks, and SLOs.
  • Operate data stores (MySQL, PostgreSQL, Redis): provisioning, backups, migration execution, monitoring, and capacity planning.
  • Optimise build and runtime costs (container images, caching, autoscaling, resource requests/limits).
  • Lead incident response, postmortems, and reliability improvements.
Your profile
You have:
  • 4+ years in DevOps/SRE/Platform roles with production Kubernetes.
  • Strong GitLab CI/CD experience (pipelines, runners, caching, artifact management).
  • Proficiency with containers and image optimisation; comfortable with Linux internals and networking.
  • Hands-on experience with Prefect in production (deployments, flow orchestration, storage, results).
  • Familiarity with operating MySQL/PostgreSQL/Redis in production (availability, performance, backups).
  • Scripting/automation with Python or Go; ability to read Rust build pipelines.
  • Solid understanding of security fundamentals (least privilege, image scanning, SBOM, secret hygiene).
  • Experience instrumenting systems and creating actionable alerts.
Nice to have:
  • Helm/Kustomize, policy-as-code (OPA), and basic gRPC.
  • Performance tuning for high-throughput data or API services.
  • Experience in multi-tenant or multi-cluster environments.
About us
At Stelia, we are building the AI Operating System for a distributed, intelligent world. Our mission is to dismantle the boundaries between humanity and technology by creating an Enterprise AI designed for trust, resilience, and scale.