Use Cases
Stratos is designed for workloads where cold-start latency is a bottleneck. Below are three scenarios where pre-warmed nodes make a significant difference.
CI/CD Pipelines
The Problem
CI/CD agents running on Kubernetes (Jenkins agents, GitHub Actions runners, GitLab runners, etc.) are typically ephemeral pods that scale based on pipeline demand. When a pipeline triggers and no capacity is available, traditional autoscalers spin up a fresh node. This creates a cascade of delays:
- Node provisioning — The autoscaler requests a new instance from the cloud provider (2-5 minutes)
- DaemonSet image pulls — Every DaemonSet (logging, monitoring, CNI, etc.) pulls its images from scratch on the new node
- CI agent image pull — The runner/agent container image is pulled
- Pipeline cold cache — Since the node is brand new, there are no Docker layer caches, no `node_modules` caches, no Go module caches — every `docker build`, `npm install`, or `go mod download` starts from zero
With Karpenter this is faster (~40-50 seconds to node ready), but you still get a completely cold environment every time.
How Stratos Helps
Stratos fundamentally changes this in two ways:
Pre-pulled images. During the warmup phase, Stratos automatically pulls all DaemonSet images. You can also configure additional images to pre-pull (like your CI agent image) via `preWarm.imagesToPull`. When a pipeline triggers, the node starts in ~20 seconds with images already on disk.
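For example, the runner image can be listed in the NodePool's `preWarm` block alongside whatever the DaemonSets need (the registry path below is a placeholder for your own agent image):

```yaml
preWarm:
  imagesToPull:
    - "my-registry.com/ci/github-runner:2.319.1" # hypothetical CI agent image
```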
Persistent caches. Unlike traditional autoscalers that terminate instances after use, Stratos stops instances and returns them to standby. The EBS volume — with all its caches — survives across runs. Your Docker layer cache, package manager caches, and build artifacts persist. The second pipeline run on a Stratos node is dramatically faster than the first, and every subsequent run benefits from the warm cache.
Example Configuration
```yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: ci-runners
spec:
  instanceType: m5.2xlarge
  ami: ami-0123456789abcdef0
  subnetIds: ["subnet-12345678"]
  securityGroupIds: ["sg-12345678"]
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/ci-node-role
  blockDeviceMappings:
    - deviceName: /dev/xvda
      volumeSize: 200 # Large disk for Docker/build caches
      volumeType: gp3
      encrypted: true
  userData: |
    #!/bin/bash
    set -e
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do sleep 5; done
    sleep 30
    poweroff
---
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: ci-runners
spec:
  poolSize: 20
  minStandby: 5
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: ci-runners
    labels:
      stratos.sh/pool: ci-runners
      node-role.kubernetes.io/ci: ""
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
  startupTaintRemoval: WhenNetworkReady
  preWarm:
    timeout: 15m
    timeoutAction: terminate
  scaleDown:
    enabled: true
    emptyNodeTTL: 10m
    drainTimeout: 5m
```
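To land CI jobs on these nodes and benefit from the persistent caches, runner pods only need to select the pool label. A minimal sketch, assuming a GitHub Actions-style runner image and a `/var/cache/build` host directory (both placeholders; most CI operators template the pod spec for you):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ci-runner-example
spec:
  nodeSelector:
    stratos.sh/pool: ci-runners # matches the NodePool label above
  containers:
    - name: runner
      image: my-registry.com/ci/github-runner:2.319.1 # placeholder; ideally the same image listed in preWarm.imagesToPull
      volumeMounts:
        - name: build-cache
          mountPath: /var/cache/build
  volumes:
    - name: build-cache
      hostPath:
        path: /var/cache/build # lives on the node's EBS volume, so it survives stop/start cycles
        type: DirectoryOrCreate
```

Because Stratos removes the startup taint once networking is ready, the pod needs no extra tolerations.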
LLM / AI Model Serving
The Problem
Large language models and AI inference workloads have notoriously long startup times. A typical cold start involves:
- Node provisioning — Standard cloud instance launch (2-5 minutes)
- Model image pull — Model container images are often 10-50GB+, taking 5-15 minutes to download
- Model loading — Loading model weights into memory (or GPU VRAM) takes additional minutes
- Health check warm-up — The model needs to process a few warm-up requests before serving at full speed
Total cold-start time can exceed 15-20 minutes. When demand spikes, users wait in queues or requests time out. Scaling becomes impractical if every new replica takes that long to become ready.
How Stratos Helps
Stratos addresses the most time-consuming part of this process: the image pull. During the warmup phase, the model container image is pre-pulled onto the node's EBS volume. Since Stratos reuses nodes (stop/start rather than terminate/launch), the image persists on disk across scale events.
When demand spikes and a new replica is needed:
- Stratos starts a standby node in ~20 seconds
- The model image is already on disk — no download needed
- Only model loading into memory remains, cutting total startup from 15+ minutes to under 2 minutes
You can also use the user data script during warmup to pre-download model weights to a persistent volume, further reducing startup time.
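A minimal sketch of that idea, extending the warmup script used in the example below: it assumes the weights live in an S3 bucket the node's instance profile can read, that the AMI ships the AWS CLI, and that `/opt/models` sits on the node's EBS volume (all placeholder names):

```yaml
userData: |
  #!/bin/bash
  set -e
  /etc/eks/bootstrap.sh my-cluster \
    --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
  until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do sleep 5; done
  # Pre-download model weights during warmup; they persist across stop/start cycles
  mkdir -p /opt/models/llama-3-70b
  aws s3 sync s3://my-model-bucket/llama-3-70b /opt/models/llama-3-70b
  sleep 30
  poweroff
```

The serving pod can then mount `/opt/models` via a `hostPath` volume instead of downloading weights at startup.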
Example Configuration
```yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: inference
spec:
  instanceType: g5.2xlarge # GPU instance
  ami: ami-0123456789abcdef0 # GPU-optimized AMI
  subnetIds: ["subnet-12345678"]
  securityGroupIds: ["sg-12345678"]
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/inference-node-role
  blockDeviceMappings:
    - deviceName: /dev/xvda
      volumeSize: 500 # Large disk for model images
      volumeType: gp3
      iops: 16000
      throughput: 1000
      encrypted: true
  userData: |
    #!/bin/bash
    set -e
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do sleep 5; done
    sleep 30
    poweroff
---
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: inference
spec:
  poolSize: 10
  minStandby: 2
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: inference
    labels:
      stratos.sh/pool: inference
      node-role.kubernetes.io/inference: ""
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
  startupTaintRemoval: WhenNetworkReady
  preWarm:
    timeout: 20m
    timeoutAction: terminate
    imagesToPull:
      - "my-registry.com/llama-3:70b" # Pre-pull the model image during warmup
  scaleDown:
    enabled: true
    emptyNodeTTL: 30m # Keep nodes longer to avoid re-pulling models
    drainTimeout: 5m
```
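A model-serving Deployment can then pin itself to this pool and request the GPU. A minimal sketch, assuming the NVIDIA device plugin is installed and using a placeholder serving port:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-3-70b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-3-70b
  template:
    metadata:
      labels:
        app: llama-3-70b
    spec:
      nodeSelector:
        stratos.sh/pool: inference # matches the NodePool label above
      containers:
        - name: server
          image: my-registry.com/llama-3:70b # already on disk thanks to preWarm.imagesToPull
          resources:
            limits:
              nvidia.com/gpu: 1 # g5.2xlarge exposes a single GPU
          ports:
            - containerPort: 8080 # placeholder serving port
```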
Scale-to-Zero Applications
The Problem
Scale-to-zero is the holy grail of cost optimization — pay nothing when there's no traffic. But traditional Kubernetes autoscaling makes this impractical for latency-sensitive services:
- Karpenter/Cluster Autoscaler: 40 seconds to 5+ minutes to provision a new node
- KEDA + HPA: Can scale pods to zero, but if no node is available, you're back to waiting for node provisioning
- Knative/Serverless: Designed for scale-to-zero but still bound by underlying node availability
Most teams settle for "scale-to-one" instead, keeping at least one replica running 24/7 to avoid cold starts. This wastes resources for services with low or sporadic traffic.
How Stratos Helps
Stratos's ~20-second pending-to-running time (when properly configured) makes true scale-to-zero viable. The pattern is straightforward:
- Scale pods to zero when idle using KEDA, CronJobs, or custom logic
- Use an ingress doorman (a lightweight proxy that holds incoming requests)
- When a request arrives at a scaled-down service, the doorman triggers pod creation
- The pod becomes Pending, and Stratos starts a standby node in ~20 seconds
- The pod schedules and starts serving — all before a typical 30-second request timeout expires
This works because Stratos has already done all the heavy lifting during warmup. The node is fully initialized — kubelet running, CNI configured, DaemonSet images pulled. Starting it from standby is just a cloud API call to resume a stopped instance.
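The first step of that pattern can be handled by KEDA, for example. The sketch below scales a hypothetical `sporadic-api` Deployment (running on the `on-demand` pool defined in the example below) down to zero when a Prometheus request-rate query goes quiet, and back up when traffic returns; the Prometheus address, query, and thresholds are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sporadic-api
spec:
  scaleTargetRef:
    name: sporadic-api # placeholder Deployment
  minReplicaCount: 0 # allow scale-to-zero
  maxReplicaCount: 5
  cooldownPeriod: 300 # seconds of inactivity before scaling to zero
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090 # placeholder
        query: 'sum(rate(http_requests_total{service="sporadic-api"}[2m]))'
        threshold: "1"
```

When the first replica comes back, its Pending pod is what prompts Stratos to start a standby node.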
Example Configuration
```yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: on-demand
spec:
  instanceType: m5.large
  ami: ami-0123456789abcdef0
  subnetIds: ["subnet-12345678"]
  securityGroupIds: ["sg-12345678"]
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/node-role
  userData: |
    #!/bin/bash
    set -e
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do sleep 5; done
    sleep 30
    poweroff
---
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: on-demand
spec:
  poolSize: 10
  minStandby: 2 # Keep 2 nodes ready for instant scale-up
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: on-demand
    labels:
      stratos.sh/pool: on-demand
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
  startupTaintRemoval: WhenNetworkReady
  scaleDown:
    enabled: true
    emptyNodeTTL: 2m # Return nodes to standby quickly
    drainTimeout: 30s
```
With this setup, you can confidently scale services to zero knowing that when traffic arrives, a node will be available in seconds — not minutes.
Next Steps
- Quickstart - Set up your first NodePool
- Scaling Policies - Fine-tune scale-up and scale-down behavior
- Monitoring - Track pool health and scale-up latency