
Quickstart

This guide walks you through creating your first NodePool and testing scale-up.

Prerequisites

Before creating a NodePool, ensure you have:

  • Stratos installed via Helm (see Installation)
  • AWS credentials configured (see AWS Setup)
  • EKS-optimized AMI ID for your region
  • Subnet and security group IDs
  • IAM instance profile for worker nodes

Understanding the NodeClass Pattern

Stratos separates AWS-specific configuration from pool management:

  • AWSNodeClass: Defines EC2 instance configuration (instance type, AMI, networking, IAM, user data)
  • NodePool: Defines pool sizing, scaling behavior, and node template (labels, taints)

You create an AWSNodeClass first, then create a NodePool that references it.

+-------------------+     references      +-------------------+
|     NodePool      | ------------------> |   AWSNodeClass    |
+-------------------+                     +-------------------+
| - poolSize: 10    |                     | - instanceType    |
| - minStandby: 3   |                     | - ami             |
| - labels, taints  |                     | - subnetIds       |
+-------------------+                     | - userData        |
                                          +-------------------+
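
For orientation, the reference in the diagram corresponds to this fragment of the NodePool manifest built in Step 3:

apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: workers
spec:
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: workers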

Step 1: Prepare the User Data Script

The user data script runs during the warmup phase. It must:

  1. Join the Kubernetes cluster
  2. Register with startup taints
  3. Wait for the node to be healthy
  4. Self-stop (poweroff) when ready
Important

The startup taint in --register-with-taints must match the startupTaints field in the NodePool spec. Mismatched taints will cause scheduling issues.
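
With the taint used in this guide, the two values that must stay in sync look like this:

# In userData (kubelet flag)
--kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'

# In the NodePool spec
startupTaints:
  - key: node.eks.amazonaws.com/not-ready
    value: "true"
    effect: NoSchedule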

Example for EKS:

user-data.sh
#!/bin/bash
set -e

# Join the EKS cluster with a startup taint
# The taint prevents pod scheduling until CNI is ready
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'

# Wait for kubelet to be healthy
until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do
  sleep 5
done

# Give time for node to fully register
sleep 30

# Note: Stratos automatically pre-pulls DaemonSet images during warmup.
# You can configure additional images via preWarm.imagesToPull in the NodePool spec.

# Signal warmup complete by stopping the instance
poweroff
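
If you want additional images cached beyond the DaemonSet images, a sketch of the preWarm.imagesToPull field mentioned above might look like this in the NodePool spec (the image references are illustrative):

preWarm:
  imagesToPull:
    - nginx:latest
    - 123456789012.dkr.ecr.us-east-1.amazonaws.com/app-base:v1 # hypothetical image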
Using Bottlerocket?

Bottlerocket uses TOML configuration and doesn't support shell scripts. Use ControllerStop completion mode instead:

preWarm:
  completionMode: ControllerStop

In this mode, Stratos stops the instance when the node becomes Ready, eliminating the need for a poweroff script. See Bottlerocket Setup for details.

Step 2: Create the AWSNodeClass

Create a file named awsnodeclass-workers.yaml:

awsnodeclass-workers.yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: workers
spec:
  region: us-east-1
  instanceType: m5.large
  ami: ami-0123456789abcdef0 # Your EKS-optimized AMI
  subnetIds:
    - subnet-12345678
    - subnet-87654321
  securityGroupIds:
    - sg-12345678
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/eks-node-role

  # User data script that joins cluster and self-stops
  userData: |
    #!/bin/bash
    set -e
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do sleep 5; done
    sleep 30
    poweroff

  blockDeviceMappings:
    - deviceName: /dev/xvda
      volumeSize: 50
      volumeType: gp3
      encrypted: true

  tags:
    Environment: production
    ManagedBy: stratos

Apply the AWSNodeClass:

kubectl apply -f awsnodeclass-workers.yaml

Verify it was created:

kubectl get awsnodeclasses
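
Optionally, wait for the class to be validated before continuing (this assumes the Valid condition described in the Troubleshooting section):

kubectl wait --for=condition=Valid awsnodeclass/workers --timeout=60s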

Step 3: Create the NodePool

Create a file named nodepool-workers.yaml:

nodepool-workers.yaml
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: workers
spec:
  # Maximum nodes in the pool (standby + running)
  poolSize: 10

  # Minimum standby nodes to maintain
  minStandby: 3

  # Reconciliation interval
  reconciliationInterval: 30s

  template:
    # Reference to AWSNodeClass for EC2 configuration
    nodeClassRef:
      kind: AWSNodeClass
      name: workers

    # Labels applied to managed nodes
    labels:
      stratos.sh/pool: workers
      node-role.kubernetes.io/worker: ""

    # Startup taints - MUST match --register-with-taints in userData
    # Stratos removes these when the CNI is ready
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule

    # How startup taints are removed
    startupTaintRemoval: WhenNetworkReady

  # Pre-warm configuration
  preWarm:
    timeout: 15m
    timeoutAction: terminate

  # Scale-down configuration
  scaleDown:
    enabled: true
    emptyNodeTTL: 5m
    drainTimeout: 5m

Apply the NodePool:

kubectl apply -f nodepool-workers.yaml
Order Matters

The AWSNodeClass must exist before creating the NodePool. If the NodePool references a non-existent AWSNodeClass, it will be marked as Degraded with reason NodeClassNotFound.
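
If you hit this, create the missing AWSNodeClass and wait for the next reconciliation. Assuming conditions are published under .status.conditions, you can check that the Degraded state clears with:

kubectl get nodepool workers -o jsonpath='{.status.conditions}'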

Step 4: Verify the NodePool

Check the NodePool status:

kubectl get nodepools

Expected output:

NAME      POOLSIZE   MINSTANDBY   STANDBY   RUNNING   READY   AGE
workers   10         3            0         0         False   10s

Watch nodes being created:

kubectl get nodes -l stratos.sh/pool=workers -w

View detailed status:

kubectl describe nodepool workers

Check the AWSNodeClass status:

kubectl describe awsnodeclass workers
Note

Initial warmup takes several minutes. Nodes will transition through warmup to standby state.
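
To watch the transition, show the state label alongside each node (this assumes the stratos.sh/state label used later in the Troubleshooting section):

kubectl get nodes -l stratos.sh/pool=workers -L stratos.sh/state -w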

After warmup completes (3-5 minutes), you should see:

NAME      POOLSIZE   MINSTANDBY   STANDBY   RUNNING   READY   AGE
workers   10         3            3         0         True    5m

Step 5: Test Scale-Up

Create a deployment whose resource requests exceed the cluster's spare capacity:

test-workload.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: test-workload
spec:
replicas: 5
selector:
matchLabels:
app: test
template:
metadata:
labels:
app: test
spec:
containers:
- name: nginx
image: nginx:latest
resources:
requests:
cpu: "500m"
memory: "512Mi"
kubectl apply -f test-workload.yaml
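
If every pod schedules immediately, the existing nodes still have spare capacity; raise replicas until some pods are left Pending. You can list them with:

kubectl get pods -l app=test --field-selector=status.phase=Pending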

When pods become pending due to insufficient capacity, Stratos will:

  1. Detect the unschedulable pods
  2. Start standby nodes (pre-warmed instances resume in ~15-20 seconds)
  3. Nodes become Ready in ~20-25 seconds total
  4. Pods are scheduled on the newly available nodes

This is roughly half the time of Karpenter (~40-50 seconds) because Stratos nodes have already completed boot, cluster join, CNI initialization, and DaemonSet image pulls during the warmup phase.

Monitor the process:

# Watch pods
kubectl get pods -w

# Watch nodes
kubectl get nodes -l stratos.sh/pool=workers -w

# Check NodePool events
kubectl describe nodepool workers

Step 6: Test Scale-Down

Delete the test workload:

kubectl delete deployment test-workload

After emptyNodeTTL expires (5 minutes, as configured above), Stratos will:

  1. Detect empty nodes
  2. Cordon and drain the nodes
  3. Stop the instances
  4. Return them to standby state
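
To observe this, watch the nodes and the pool counts; RUNNING should drop back to 0 as instances return to standby:

kubectl get nodes -l stratos.sh/pool=workers -w
kubectl get nodepool workers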

Troubleshooting

NodePool Shows Degraded with NodeClassNotFound

The AWSNodeClass doesn't exist or has a different name:

# Check if AWSNodeClass exists
kubectl get awsnodeclasses

# Check the NodePool's nodeClassRef
kubectl get nodepool workers -o jsonpath='{.spec.template.nodeClassRef}'

Nodes Stuck in Warmup

Check the user data script output:

aws ec2 get-console-output --instance-id <instance-id>

Common issues:

  • User data script failing
  • Missing poweroff command
  • Network issues preventing cluster join
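
You can also confirm whether the instance ever reached the stopped state that signals warmup completion (this assumes the ManagedBy tag from the AWSNodeClass example):

aws ec2 describe-instances \
  --filters "Name=tag:ManagedBy,Values=stratos" \
  --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
  --output table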

Scale-Up Not Triggering

Verify standby nodes are available:

kubectl get nodes -l stratos.sh/pool=workers,stratos.sh/state=standby

Check controller logs:

kubectl -n stratos-system logs deployment/stratos

AWSNodeClass Status Shows Issues

Check the AWSNodeClass conditions:

kubectl describe awsnodeclass workers

Look for:

  • Valid condition: Is the spec valid?
  • InUse condition: Is it referenced by NodePools?
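
To see only the conditions, a jsonpath query works here too (assuming they are published under .status.conditions):

kubectl get awsnodeclass workers -o jsonpath='{.status.conditions}'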

Next Steps