Stratos

Stratos is a Kubernetes operator that maintains pools of pre-warmed, reusable Kubernetes nodes. Nodes are launched once, fully initialized, then stopped and restarted on demand — giving you instant capacity with warm caches, pre-pulled images, and zero cold-start overhead.

The Problem

When Kubernetes needs more capacity, every existing autoscaler gives you a brand new machine. That means:

  1. Provisioning — Wait for the cloud provider to allocate and launch an instance
  2. Booting — OS initialization, kubelet startup, cluster join
  3. Networking — CNI plugin initialization, IP allocation
  4. Image pulls — Every DaemonSet image downloaded from scratch
  5. Application startup — Your workload's images pulled, caches empty, no local state

Cluster Autoscaler takes 3-8 minutes. Karpenter brought this down to ~40-50 seconds. But even at Karpenter speed, you still get a cold node every time — empty caches, no pre-pulled images, no local state. For workloads like CI/CD pipelines, LLM inference, or bursty applications, the cold environment is just as painful as the wait.

The Solution

Stratos takes a fundamentally different approach: nodes are initialized once and reused.

  1. Warmup — Stratos launches instances that join the cluster, initialize CNI, pull all DaemonSet images, run any custom setup, then self-stop
  2. Standby — Stopped instances sit in a pool, costing only EBS storage. The disk retains everything: images, caches, local state
  3. Scale-up — When pods are pending, Stratos starts a standby instance. Since the node is already initialized, it's ready in ~20 seconds
  4. Scale-down — Empty nodes are drained and stopped (not terminated), returning to standby with all their state intact

The key insight: Stratos stops and starts nodes instead of terminating and recreating them. This means every scale-up benefits from everything the node has accumulated — Docker layer caches, package manager caches, pre-pulled images, downloaded models, and any other local state.

Not Just Faster Boot — Faster Everything

Traditional autoscalers measure success by "time to node ready." Stratos is faster there too (~20 seconds vs ~40-50 seconds). But the real advantage is what happens after the node is ready:

| | Traditional Autoscaler | Stratos |
|---|---|---|
| Node provisioning | Launch new instance every time | Start existing instance (~20s) |
| DaemonSet images | Pull from registry every time | Already on disk |
| Application images | Pull from registry every time | Already on disk (if previously run) |
| Docker build cache | Empty | Warm from previous runs |
| Package manager cache | Empty (npm install from scratch) | Warm (node_modules cached) |
| Model weights | Download every time (10+ min for LLMs) | Already on disk |
| OS/system caches | Cold | Warm |

A Karpenter node is ready in ~40 seconds, then your CI pipeline spends another 5 minutes pulling images and rebuilding dependencies. A Stratos node is ready in ~20 seconds, and your pipeline starts with warm caches from the last run.

Use Cases

CI/CD Pipelines

CI agents on Kubernetes typically get a fresh node with empty caches. Every docker build, npm install, or go mod download starts from scratch. Stratos nodes retain build caches across runs — your second pipeline is dramatically faster than the first, and every run after that benefits from the warm cache.
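
As an illustrative sketch, a CI agent pod can pin itself to the warm pool and mount a hostPath directory that survives node stop/start. The `stratos.sh/pool: workers` label comes from the Quick Start below; the runner image and cache path are placeholders:

```yaml
# Hypothetical CI runner pod. Only the stratos.sh/pool label is from the
# Quick Start example; the image and cache path are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: ci-runner
spec:
  nodeSelector:
    stratos.sh/pool: workers              # schedule onto the pre-warmed pool
  containers:
    - name: runner
      image: example.com/ci-runner:latest # placeholder image
      volumeMounts:
        - name: build-cache
          mountPath: /cache               # e.g. npm / Go module caches
  volumes:
    - name: build-cache
      hostPath:
        path: /var/cache/ci               # lives on the node's EBS volume
        type: DirectoryOrCreate
```

Because the node is stopped rather than terminated, `/var/cache/ci` is still populated the next time the pool scales up.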

LLM / AI Model Serving

Model images are often 10-50GB+. Downloading them on every scale-up makes autoscaling impractical. With Stratos, the model image is pre-pulled during warmup and persists on the node's EBS volume. Scaling out goes from 15+ minutes to under 2 minutes.
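
Building on the `preWarm.imagesToPull` field mentioned under Key Features, a sketch of pre-pulling a large model image during warmup might look like the following. The exact placement of `preWarm` in the NodePool spec, and the image name, are assumptions for illustration:

```yaml
# Sketch: pre-pull a large serving image during warmup so scale-ups
# skip the download. The preWarm placement and image are assumptions.
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: llm-serving
spec:
  poolSize: 5
  minStandby: 2
  preWarm:
    imagesToPull:
      - ghcr.io/example/llama-serving:latest  # placeholder multi-GB image
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: workers
```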

Scale-to-Zero

Stratos's ~20-second startup makes true scale-to-zero viable. Pair it with an ingress doorman that holds requests for up to 30 seconds — when traffic hits a scaled-down service, a standby node starts and begins serving before the timeout. No idle compute, no cold-start frustration.
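
The "doorman" itself is not part of Stratos. As one partial illustration using standard ingress-nginx annotations, you can stretch upstream timeouts so an in-flight request can wait out the ~20-second node start; note these only help once a backend endpoint exists, so truly holding requests at zero endpoints still needs a dedicated doorman component:

```yaml
# Illustrative only: standard ingress-nginx timeout annotations, sized to
# outlast a standby-node start. Hostname and service are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bursty-service
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bursty-service
                port:
                  number: 80
```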

See the Use Cases guide for detailed configurations.

Key Features

  • Instant capacity: Start pre-warmed nodes in ~20 seconds
  • Warm caches: Nodes retain Docker layers, build caches, downloaded models, and local state across restarts
  • Pre-pulled images: DaemonSet images pulled automatically during warmup; configure additional images via preWarm.imagesToPull
  • Cost-efficient: Stopped instances only incur EBS storage costs
  • CNI-aware: Handles startup taints for VPC CNI, Cilium, and Calico
  • Kubernetes-native: Declarative NodePool and AWSNodeClass CRDs
  • Cloud-agnostic design: Built with a provider abstraction layer (AWS supported)

Quick Start

1. Install with Helm

helm install stratos oci://ghcr.io/stratos-sh/charts/stratos \
  --namespace stratos-system --create-namespace \
  --set clusterName=my-cluster

2. Create an AWSNodeClass and NodePool

awsnodeclass.yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: workers
spec:
  instanceType: m5.large
  ami: ami-0123456789abcdef0
  subnetIds: ["subnet-12345678"]
  securityGroupIds: ["sg-12345678"]
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/node-role
  userData: |
    #!/bin/bash
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz; do sleep 5; done
    sleep 30
    poweroff
nodepool.yaml
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: workers
spec:
  poolSize: 10
  minStandby: 3
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: workers
    labels:
      stratos.sh/pool: workers
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule

kubectl apply -f awsnodeclass.yaml
kubectl apply -f nodepool.yaml

3. Watch Nodes Scale

kubectl get nodes -l stratos.sh/pool=workers -w
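
To exercise a scale-up, create pods that target the pool and can't fit on existing capacity. The deployment below is a placeholder workload (name, image, and resource requests are arbitrary); once its pods are Pending, Stratos should start a standby instance:

```yaml
# Placeholder workload to trigger a scale-up. Once these pods are
# Pending, Stratos starts a standby instance from the pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-test
spec:
  replicas: 5
  selector:
    matchLabels:
      app: scale-test
  template:
    metadata:
      labels:
        app: scale-test
    spec:
      nodeSelector:
        stratos.sh/pool: workers
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "500m"
```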

How It Works

+---------+
| warmup  |  Launch, join cluster, pull images, run setup
+----+----+
     |
     |  self-stop
     v
+---------+
| standby |  Stopped — disk retains all state
+----+----+
     |
     |  scale-up
     |  (start instance)
     v
+---------+
| running |  Serving pods, accumulating caches
+----+----+
     |
     |  scale-down
     |  (drain & stop)
     v
+---------+
| standby |  Back to pool — caches preserved
+---------+

Next Steps