Use Cases
Stratos is designed for workloads where cold-start latency is a bottleneck. Below are three scenarios where pre-warmed nodes make a significant difference.
CI/CD Pipelines
The Problem
CI/CD agents running on Kubernetes (Jenkins agents, GitHub Actions runners, GitLab runners, etc.) are typically ephemeral pods that scale based on pipeline demand. When a pipeline triggers and no capacity is available, traditional autoscalers spin up a fresh node. This creates a cascade of delays:
- Node provisioning — The autoscaler requests a new instance from the cloud provider (2-5 minutes)
- DaemonSet image pulls — Every DaemonSet (logging, monitoring, CNI, etc.) pulls its images from scratch on the new node
- CI agent image pull — The runner/agent container image is pulled
- Pipeline cold cache — Since the node is brand new, there are no Docker layer caches, no `node_modules` caches, no Go module caches — every `docker build`, `npm install`, or `go mod download` starts from zero
With Karpenter this is faster (~40-50 seconds to node ready), but you still get a completely cold environment every time.
How Stratos Helps
Stratos fundamentally changes this in two ways:
Pre-pulled images. During the warmup phase, Stratos automatically pulls all DaemonSet images. You can also configure additional images to pre-pull (like your CI agent image) via `preWarm.imagesToPull`. When a pipeline triggers, the node starts in ~20 seconds with images already on disk.
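For example, the runner image can be listed in the NodePool's `preWarm` block alongside whatever the DaemonSets need (the registry path below is a placeholder for your own agent image):

```yaml
preWarm:
  imagesToPull:
    - "my-registry.com/ci/github-runner:2.319.1" # hypothetical CI agent image
```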
Persistent caches. Unlike traditional autoscalers that terminate instances after use, Stratos stops instances and returns them to standby. The EBS volume — with all its caches — survives across runs. Your Docker layer cache, package manager caches, and build artifacts persist. The second pipeline run on a Stratos node is dramatically faster than the first, and every subsequent run benefits from the warm cache.
Example Configuration
```yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: ci-runners
spec:
  instanceType: m5.2xlarge
  ami: ami-0123456789abcdef0
  subnetIds: ["subnet-12345678"]
  securityGroupIds: ["sg-12345678"]
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/ci-node-role
  blockDeviceMappings:
    - deviceName: /dev/xvda
      volumeSize: 200 # Large disk for Docker/build caches
      volumeType: gp3
      encrypted: true
  userData: |
    #!/bin/bash
    set -e
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do sleep 5; done
    sleep 30
    poweroff
---
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: ci-runners
spec:
  poolSize: 20
  minStandby: 5
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: ci-runners
    labels:
      stratos.sh/pool: ci-runners
      node-role.kubernetes.io/ci: ""
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
  startupTaintRemoval: WhenNetworkReady
  preWarm:
    timeout: 15m
    timeoutAction: terminate
  scaleDown:
    enabled: true
    emptyNodeTTL: 10m
    drainTimeout: 5m
```
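To land CI jobs on these nodes and benefit from the persistent caches, runner pods only need to select the pool label. A minimal sketch, assuming a GitHub Actions-style runner image and a `/var/cache/build` host directory (both placeholders; most CI operators template the pod spec for you):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ci-runner-example
spec:
  nodeSelector:
    stratos.sh/pool: ci-runners # matches the NodePool label above
  containers:
    - name: runner
      image: my-registry.com/ci/github-runner:2.319.1 # placeholder; ideally the same image listed in preWarm.imagesToPull
      volumeMounts:
        - name: build-cache
          mountPath: /var/cache/build
  volumes:
    - name: build-cache
      hostPath:
        path: /var/cache/build # lives on the node's EBS volume, so it survives stop/start cycles
        type: DirectoryOrCreate
```

Because Stratos removes the startup taint once networking is ready, the pod needs no extra tolerations.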
LLM / AI Model Serving
The Problem
Large language models and AI inference workloads have notoriously long startup times. A typical cold start involves:
- Node provisioning — Standard cloud instance launch (2-5 minutes)
- Model image pull — Model container images are often 10-50GB+, taking 5-15 minutes to download
- Model loading — Loading model weights into memory (or GPU VRAM) takes additional minutes
- Health check warm-up — The model needs to process a few warm-up requests before serving at full speed
Total cold-start time can exceed 15-20 minutes. When demand spikes, users wait in queues or requests time out. Scaling becomes impractical if every new replica takes that long to become ready.
How Stratos Helps
Stratos addresses the most time-consuming part of this process: the image pull. During the warmup phase, the model container image is pre-pulled onto the node's EBS volume. Since Stratos reuses nodes (stop/start rather than terminate/launch), the image persists on disk across scale events.
When demand spikes and a new replica is needed:
- Stratos starts a standby node in ~20 seconds
- The model image is already on disk — no download needed
- Only model loading into memory remains, cutting total startup from 15+ minutes to under 2 minutes
You can also use the user data script during warmup to pre-download model weights to a persistent volume, further reducing startup time.
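A minimal sketch of that idea, extending the warmup script used in the example below: it assumes the weights live in an S3 bucket the node's instance profile can read, that the AMI ships the AWS CLI, and that `/opt/models` sits on the node's EBS volume (all placeholder names):

```yaml
userData: |
  #!/bin/bash
  set -e
  /etc/eks/bootstrap.sh my-cluster \
    --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
  until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do sleep 5; done
  # Pre-download model weights during warmup; they persist across stop/start cycles
  mkdir -p /opt/models/llama-3-70b
  aws s3 sync s3://my-model-bucket/llama-3-70b /opt/models/llama-3-70b
  sleep 30
  poweroff
```

The serving pod can then mount `/opt/models` via a `hostPath` volume instead of downloading weights at startup.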
Example Configuration
```yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: inference
spec:
  instanceType: g5.2xlarge # GPU instance
  ami: ami-0123456789abcdef0 # GPU-optimized AMI
  subnetIds: ["subnet-12345678"]
  securityGroupIds: ["sg-12345678"]
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/inference-node-role
  blockDeviceMappings:
    - deviceName: /dev/xvda
      volumeSize: 500 # Large disk for model images
      volumeType: gp3
      iops: 16000
      throughput: 1000
      encrypted: true
  userData: |
    #!/bin/bash
    set -e
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do sleep 5; done
    sleep 30
    poweroff
---
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: inference
spec:
  poolSize: 10
  minStandby: 2
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: inference
    labels:
      stratos.sh/pool: inference
      node-role.kubernetes.io/inference: ""
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
  startupTaintRemoval: WhenNetworkReady
  preWarm:
    timeout: 20m
    timeoutAction: terminate
    imagesToPull:
      - "my-registry.com/llama-3:70b" # Pre-pull the model image during warmup
  scaleDown:
    enabled: true
    emptyNodeTTL: 30m # Keep nodes longer to avoid re-pulling models
    drainTimeout: 5m
```
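A model-serving Deployment can then pin itself to this pool and request the GPU. A minimal sketch, assuming the NVIDIA device plugin is installed and using a placeholder serving port:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-3-70b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-3-70b
  template:
    metadata:
      labels:
        app: llama-3-70b
    spec:
      nodeSelector:
        stratos.sh/pool: inference # matches the NodePool label above
      containers:
        - name: server
          image: my-registry.com/llama-3:70b # already on disk thanks to preWarm.imagesToPull
          resources:
            limits:
              nvidia.com/gpu: 1 # g5.2xlarge exposes a single GPU
          ports:
            - containerPort: 8080 # placeholder serving port
```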
Scale-to-Zero Applications
The Problem
Scale-to-zero is the holy grail of cost optimization — pay nothing when there's no traffic. But traditional Kubernetes autoscaling makes this impractical for latency-sensitive services:
- Karpenter/Cluster Autoscaler: 40 seconds to 5+ minutes to provision a new node
- KEDA + HPA: Can scale pods to zero, but if no node is available, you're back to waiting for node provisioning
- Knative/Serverless: Designed for scale-to-zero but still bound by underlying node availability
Most teams settle for "scale-to-one" instead, keeping at least one replica running 24/7 to avoid cold starts. This wastes resources for services with low or sporadic traffic.
How Stratos Helps
Stratos's ~20-second pending-to-running time (when properly configured) makes true scale-to-zero viable. The pattern is straightforward:
- Scale pods to zero when idle using KEDA, CronJobs, or custom logic
- Use an ingress doorman (a lightweight proxy that holds incoming requests)
- When a request arrives at a scaled-down service, the doorman triggers pod creation
- The pod becomes Pending, and Stratos starts a standby node in ~20 seconds
- The pod schedules and starts serving — all before a typical 30-second request timeout expires
This works because Stratos has already done all the heavy lifting during warmup. The node is fully initialized — kubelet running, CNI configured, DaemonSet images pulled. Starting it from standby is just a cloud API call to resume a stopped instance.
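The first step of that pattern can be handled by KEDA, for example. The sketch below scales a hypothetical `sporadic-api` Deployment (running on the `on-demand` pool defined in the example below) down to zero when a Prometheus request-rate query goes quiet, and back up when traffic returns; the Prometheus address, query, and thresholds are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sporadic-api
spec:
  scaleTargetRef:
    name: sporadic-api # placeholder Deployment
  minReplicaCount: 0 # allow scale-to-zero
  maxReplicaCount: 5
  cooldownPeriod: 300 # seconds of inactivity before scaling to zero
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090 # placeholder
        query: 'sum(rate(http_requests_total{service="sporadic-api"}[2m]))'
        threshold: "1"
```

When the first replica comes back, its Pending pod is what prompts Stratos to start a standby node.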
Example Configuration
```yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: on-demand
spec:
  instanceType: m5.large
  ami: ami-0123456789abcdef0
  subnetIds: ["subnet-12345678"]
  securityGroupIds: ["sg-12345678"]
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/node-role
  userData: |
    #!/bin/bash
    set -e
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do sleep 5; done
    sleep 30
    poweroff
---
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: on-demand
spec:
  poolSize: 10
  minStandby: 2 # Keep 2 nodes ready for instant scale-up
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: on-demand
    labels:
      stratos.sh/pool: on-demand
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
  startupTaintRemoval: WhenNetworkReady
  scaleDown:
    enabled: true
    emptyNodeTTL: 2m # Return nodes to standby quickly
    drainTimeout: 30s
```
With this setup, you can confidently scale services to zero knowing that when traffic arrives, a node will be available in seconds — not minutes.
Next Steps
- Quickstart - Set up your first NodePool
- Scaling Policies - Fine-tune scale-up and scale-down behavior
- Monitoring - Track pool health and scale-up latency