
NodePool API Reference

The NodePool resource is a cluster-scoped custom resource that defines a pool of pre-warmed instances.

Resource Definition

apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: <pool-name>
spec:
  poolSize: <int32>
  minStandby: <int32>
  reconciliationInterval: <duration>
  maxNodeRuntime: <duration>
  template:
    labels: <map[string]string>
    taints: <[]Taint>
    startupTaints: <[]Taint>
    startupTaintRemoval: <string>
    nodeClassRef:
      kind: <string>
      name: <string>
  preWarm:
    timeout: <duration>
    timeoutAction: <string>
    completionMode: <string>
  scaleUp:
    defaultPodResources: <ResourceRequirements>
  scaleDown:
    enabled: <bool>
    emptyNodeTTL: <duration>
    drainTimeout: <duration>
status:
  conditions: <[]Condition>
  observedGeneration: <int64>
  warmup: <int32>
  standby: <int32>
  running: <int32>
  total: <int32>
  lastReconcileTime: <Time>

Spec Fields

Core Configuration

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| poolSize | int32 | Yes | - | Maximum total nodes in the pool (standby + running, excluding warmup). Range: 1-1000. |
| minStandby | int32 | Yes | - | Minimum number of nodes to maintain in standby (stopped) state. Must be ≤ poolSize. |
| reconciliationInterval | duration | No | 30s | How often to run the maintenance reconciliation loop. |
| maxNodeRuntime | duration | No | disabled | Maximum time a node can run before being recycled. Set to 0 or omit to disable. |
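
As a rough sketch, a pool that keeps two stopped instances warm, caps out at ten nodes, and recycles any node after 12 hours might look like the following. The pool name and field values here are illustrative, not recommendations:

example-pool.yaml
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: example-pool          # illustrative name
spec:
  poolSize: 10                # at most 10 standby + running nodes
  minStandby: 2               # keep at least 2 stopped instances ready
  reconciliationInterval: 1m
  maxNodeRuntime: 12h         # recycle nodes after 12 hours
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: standard-nodes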

Template

The template field defines the configuration for nodes in this pool.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| nodeClassRef | NodeClassRef | Yes | - | Reference to the cloud-specific NodeClass (e.g., AWSNodeClass) containing instance configuration. |
| labels | map[string]string | No | - | Labels to apply to managed nodes. The stratos.sh/pool label is added automatically. |
| taints | []Taint | No | - | Permanent taints for workload isolation. These persist throughout the node lifecycle. |
| startupTaints | []Taint | No | - | Taints applied during startup and removed when the CNI is ready. Must match --register-with-taints in userData. |
| startupTaintRemoval | string | No | WhenNetworkReady | How startup taints are removed: WhenNetworkReady or External. |
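
To illustrate how permanent taints and labels work together, a pool that taints its nodes with dedicated=workers can be targeted by a workload that both tolerates the taint and selects the pool label. The excerpts below use illustrative names; the pod-side fields are standard Kubernetes scheduling configuration, not Stratos-specific:

# NodePool template excerpt (illustrative)
template:
  labels:
    stratos.sh/pool: workers
  taints:
    - key: dedicated
      value: workers
      effect: NoSchedule
---
# Pod spec excerpt that lands only on this pool's nodes
spec:
  nodeSelector:
    stratos.sh/pool: workers
  tolerations:
    - key: dedicated
      operator: Equal
      value: workers
      effect: NoSchedule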

NodeClassRef

The nodeClassRef field references a cloud-specific NodeClass resource that defines instance configuration.

| Field | Type | Required | Description |
|---|---|---|---|
| kind | string | Yes | NodeClass kind. Currently only AWSNodeClass is supported. |
| name | string | Yes | Name of the NodeClass resource to reference. |
template:
  nodeClassRef:
    kind: AWSNodeClass
    name: standard-nodes
Note: The referenced NodeClass must exist before the NodePool is created. If the NodeClass is not found, the NodePool is marked as Degraded with reason NodeClassNotFound.
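
One way to check for this state is to read the Degraded condition directly. The command below is standard kubectl jsonpath and assumes a pool named workers:

# Inspect the Degraded condition; its reason should show NodeClassNotFound if the reference is broken
kubectl get nodepool workers -o jsonpath='{.status.conditions[?(@.type=="Degraded")]}'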

Startup Taint Removal Modes

| Mode | Description |
|---|---|
| WhenNetworkReady | Stratos monitors network conditions and removes the taints when the CNI is ready. Supports EKS VPC CNI, Cilium, and Calico. |
| External | Stratos waits for an external controller (such as the CNI plugin) to remove the taints. |
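
Because startupTaints must match the taints the kubelet registers with (see the Template table above), the two halves of the configuration pair up roughly as shown below. These are excerpts only, using the same EKS taint key as the full examples later on this page:

# AWSNodeClass userData excerpt: the kubelet registers the startup taint
userData: |
  #!/bin/bash
  /etc/eks/bootstrap.sh my-cluster \
    --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
---
# NodePool template excerpt: the same taint declared as a startup taint
startupTaints:
  - key: node.eks.amazonaws.com/not-ready
    value: "true"
    effect: NoSchedule
startupTaintRemoval: WhenNetworkReady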

PreWarm Configuration

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| timeout | duration | No | 10m | Maximum time to wait for warmup to complete. In SelfStop mode, how long to wait for the instance to self-stop; in ControllerStop mode, how long to wait for the node to become Ready. |
| timeoutAction | string | No | stop | Action when warmup times out: stop (force stop) or terminate (terminate the instance). |
| completionMode | string | No | SelfStop | How warmup completes: SelfStop or ControllerStop. |

Warmup Completion Modes

| Mode | Description |
|---|---|
| SelfStop | The instance stops itself via the userData script after joining the cluster (default). |
| ControllerStop | Stratos stops the instance when the node becomes Ready. Use for OS images that cannot run shutdown scripts (e.g., Bottlerocket). |
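
A preWarm block for an image that cannot run a shutdown script might look like the sketch below; the values are illustrative, and the Bottlerocket example later on this page shows the same settings in context:

preWarm:
  completionMode: ControllerStop   # Stratos stops the instance once the node is Ready
  timeout: 10m                     # give warmup up to 10 minutes
  timeoutAction: stop              # force-stop rather than terminate on timeout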

ScaleUp Configuration

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| defaultPodResources | ResourceRequirements | No | - | Default resource requests assumed for pods without explicit requests. Used in scale-up calculations. |
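
For example, to have scale-up calculations assume 500m CPU and 1Gi of memory for any pod that declares no requests (values illustrative, matching the full example below):

scaleUp:
  defaultPodResources:
    requests:
      cpu: "500m"
      memory: "1Gi"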

ScaleDown Configuration

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | true | Whether automatic scale-down is enabled. |
| emptyNodeTTL | duration | No | 5m | How long a node must be empty before it is scaled down. |
| drainTimeout | duration | No | 5m | Maximum time to wait for a node to drain before force-stopping it. |
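
A scale-down block that waits ten minutes before reclaiming empty nodes might look like this (durations are illustrative):

scaleDown:
  enabled: true
  emptyNodeTTL: 10m    # node must be empty this long before scale-down
  drainTimeout: 5m     # force-stop if draining takes longer than this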

Status Fields

The NodePool status is updated by the controller:

| Field | Type | Description |
|---|---|---|
| conditions | []Condition | Current conditions (Ready, Degraded, etc.) |
| observedGeneration | int64 | Last observed spec generation |
| warmup | int32 | Count of nodes in the warmup state |
| standby | int32 | Count of nodes in the standby state |
| running | int32 | Count of nodes in the running state |
| total | int32 | Total managed node count |
| lastReconcileTime | Time | When the pool was last reconciled |
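
The per-state counts can be read directly with kubectl; the jsonpath below is standard kubectl syntax and assumes a pool named workers:

# Show node counts per state for one pool
kubectl get nodepool workers \
  -o jsonpath='warmup={.status.warmup} standby={.status.standby} running={.status.running} total={.status.total}{"\n"}'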

Conditions

| Condition | Description |
|---|---|
| Ready | The pool has at least minStandby nodes in the standby state |
| Degraded | The pool cannot meet minStandby (validation errors, cloud issues, NodeClass not found) |
| Reconciling | The pool is being reconciled |
| ScaleUpInProgress | A scale-up operation is in progress |
| ScaleDownInProgress | A scale-down operation is in progress |
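
Because Ready is reported as a standard condition, kubectl wait can block until the pool reaches its minimum standby capacity. The pool name and timeout below are illustrative:

# Wait until the pool reports Ready (minStandby satisfied)
kubectl wait --for=condition=Ready nodepool/workers --timeout=10m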

Examples

Minimal Configuration

First, create an AWSNodeClass:

awsnodeclass.yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: standard-nodes
spec:
  instanceType: m5.large
  ami: ami-0123456789abcdef0
  subnetIds:
    - subnet-12345678
  securityGroupIds:
    - sg-12345678
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/node-role
  userData: |
    #!/bin/bash
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz; do sleep 5; done
    sleep 30
    poweroff

Then create the NodePool referencing it:

nodepool-minimal.yaml
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: workers
spec:
  poolSize: 5
  minStandby: 2
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: standard-nodes
    labels:
      stratos.sh/pool: workers
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
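
Applying the two manifests and watching the pool come up is plain kubectl; the file names match the examples above:

kubectl apply -f awsnodeclass.yaml
kubectl apply -f nodepool-minimal.yaml

# Watch the pool until the standby count reaches minStandby
kubectl get nodepool workers -w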

Full Configuration

awsnodeclass-full.yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: production-nodes
spec:
  region: us-east-1
  instanceType: m5.large
  ami: ami-0123456789abcdef0
  subnetIds:
    - subnet-12345678
    - subnet-87654321
  securityGroupIds:
    - sg-12345678
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/node-role
  userData: |
    #!/bin/bash
    set -e
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz >/dev/null 2>&1; do
      sleep 5
    done
    sleep 30
    poweroff
  blockDeviceMappings:
    - deviceName: /dev/xvda
      volumeSize: 100
      volumeType: gp3
      encrypted: true
      iops: 3000
      throughput: 125
  tags:
    Environment: production
    Team: platform
    CostCenter: engineering

nodepool-full.yaml
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: workers
spec:
  poolSize: 20
  minStandby: 5
  reconciliationInterval: 30s
  maxNodeRuntime: 24h

  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: production-nodes
    labels:
      stratos.sh/pool: workers
      node-role.kubernetes.io/worker: ""
      environment: production
    taints:
      - key: dedicated
        value: workers
        effect: NoSchedule
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
    startupTaintRemoval: WhenNetworkReady

  preWarm:
    timeout: 15m
    timeoutAction: terminate

  scaleUp:
    defaultPodResources:
      requests:
        cpu: "500m"
        memory: "1Gi"

  scaleDown:
    enabled: true
    emptyNodeTTL: 5m
    drainTimeout: 5m

Cilium CNI Configuration

nodepool-cilium.yaml
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: cilium-workers
spec:
  poolSize: 10
  minStandby: 3
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: standard-nodes
    labels:
      stratos.sh/pool: cilium-workers
    startupTaints:
      - key: node.cilium.io/agent-not-ready
        value: "true"
        effect: NoSchedule
    startupTaintRemoval: External # Cilium removes the taint

Bottlerocket with ControllerStop Mode

Bottlerocket uses TOML configuration and doesn't support shell scripts in user data. Use ControllerStop mode so Stratos manages the warmup-to-standby transition.

awsnodeclass-bottlerocket.yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: bottlerocket-nodes
spec:
  instanceType: m5.large
  ami: ami-bottlerocket-xxxx # Bottlerocket EKS-optimized AMI
  subnetIds:
    - subnet-12345678
  securityGroupIds:
    - sg-12345678
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/node-role
  blockDeviceMappings:
    - deviceName: /dev/xvda
      volumeSize: 8
      volumeType: gp3
      encrypted: true
    - deviceName: /dev/xvdb # Bottlerocket data volume
      volumeSize: 20
      volumeType: gp3
      encrypted: true
  # Bottlerocket TOML user data - no shutdown script needed
  userData: |
    [settings.kubernetes]
    cluster-name = "my-cluster"
    api-server = "https://my-cluster.region.eks.amazonaws.com"
    cluster-certificate = "base64-encoded-ca-cert"

    [settings.kubernetes.node-taints]
    "node.eks.amazonaws.com/not-ready" = "true:NoSchedule"

    [settings.kubernetes.node-labels]
    "stratos.sh/pool" = "bottlerocket-workers"

nodepool-bottlerocket.yaml
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: bottlerocket-workers
spec:
  poolSize: 10
  minStandby: 3
  preWarm:
    # ControllerStop: Stratos stops the instance when the node is Ready
    completionMode: ControllerStop
    timeout: 10m
    timeoutAction: stop
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: bottlerocket-nodes
    labels:
      stratos.sh/pool: bottlerocket-workers
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
    startupTaintRemoval: WhenNetworkReady

Multiple NodePools Sharing an AWSNodeClass

One AWSNodeClass can be shared by multiple NodePools with different scaling configurations:

shared-awsnodeclass.yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: shared-ec2-config
spec:
  instanceType: m5.large
  ami: ami-0123456789abcdef0
  subnetIds:
    - subnet-12345678
    - subnet-87654321
  securityGroupIds:
    - sg-12345678
  iamInstanceProfile: arn:aws:iam::123456789012:instance-profile/node-role
  userData: |
    #!/bin/bash
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz; do sleep 5; done
    sleep 30
    poweroff
---
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: workers-low
spec:
  poolSize: 5
  minStandby: 1
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: shared-ec2-config
    labels:
      stratos.sh/pool: workers-low
      tier: low-priority
---
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: workers-high
spec:
  poolSize: 20
  minStandby: 5
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: shared-ec2-config
    labels:
      stratos.sh/pool: workers-high
      tier: high-priority

Kubectl Commands

# List all NodePools
kubectl get nodepools

# Short name
kubectl get np

# Get detailed status
kubectl describe nodepool workers

# Get YAML output
kubectl get nodepool workers -o yaml

# Watch NodePool status
kubectl get nodepools -w

# See which AWSNodeClass each NodePool references
kubectl get nodepools -o custom-columns='NAME:.metadata.name,NODECLASS:.spec.template.nodeClassRef.name'

Next Steps