
I was eager to explore OpenShift’s autoscaling capabilities, specifically the Horizontal Pod Autoscaler (HPA).
But I did not just want to watch replicas scale up and down in the cluster view. I wanted clear, observable proof that those replicas were alive, serving traffic, and generating logs that could be traced back to individual pods and nodes.
The goal was simple: build a demo that behaves the way a real deployment should, including production-correct storage using dynamically provisioned PersistentVolumeClaims (PVCs).
The intent was to validate autoscaling in a way that is observable, repeatable, and aligned with how workloads should be built from the start, not something bolted on afterward.
After I finished this up, I realized it would make a great first real AWX play, or a good candidate for Helm, so I will probably roll that up into a post at some point.
oc new-project grafana-demo
oc project grafana-demo
oc apply -f - <<'YAML'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-data
  namespace: grafana-demo
spec:
  accessModes: [ReadWriteMany]
  storageClassName: nfs-dynamic
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data
  namespace: grafana-demo
spec:
  accessModes: [ReadWriteMany]
  storageClassName: nfs-dynamic
  resources:
    requests:
      storage: 5Gi
YAML
Wait until both are Bound:
oc get pvc -n grafana-demo
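If you would rather block until the claims bind than poll by hand, `oc wait` can do it. This is a sketch; the jsonpath form assumes a reasonably recent `oc` client (it landed in `kubectl` 1.23):

```shell
# Block until both PVCs report phase=Bound, or give up after 2 minutes
oc wait --for=jsonpath='{.status.phase}'=Bound \
  pvc/loki-data pvc/grafana-data \
  -n grafana-demo --timeout=120s
```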
oc apply -f - <<'YAML'
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
  namespace: grafana-demo
data:
  loki.yaml: |
    auth_enabled: false
    server:
      http_listen_port: 3100
    common:
      path_prefix: /loki
      replication_factor: 1
      ring:
        kvstore:
          store: inmemory
    ingester:
      lifecycler:
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
      chunk_idle_period: 1h
      chunk_retain_period: 30s
    schema_config:
      configs:
        - from: 2024-01-01
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      filesystem:
        directory: /loki/chunks
    compactor:
      working_directory: /loki/compactor
      shared_store: filesystem
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki
  namespace: grafana-demo
spec:
  replicas: 1
  selector:
    matchLabels: {app: loki}
  template:
    metadata:
      labels: {app: loki}
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: loki
          image: docker.io/grafana/loki:2.9.3
          args: ["-config.file=/etc/loki/loki.yaml"]
          ports: [{containerPort: 3100}]
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              cpu: 300m
              memory: 512Mi
          securityContext:
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: config
              mountPath: /etc/loki
            - name: data
              mountPath: /loki
      volumes:
        - name: config
          configMap: {name: loki-config}
        - name: data
          persistentVolumeClaim: {claimName: loki-data}
---
apiVersion: v1
kind: Service
metadata:
  name: loki
  namespace: grafana-demo
spec:
  selector: {app: loki}
  ports:
    - port: 3100
      targetPort: 3100
YAML
Verify Loki is ready. Note that it takes a minute or two to become ready; trust me, I can be impatient.
oc exec -n grafana-demo deploy/loki -- wget -qO- http://localhost:3100/ready
# expected: ready
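If you would rather script the wait than rerun the check by hand, a small retry loop does it. This is a sketch built around the same `oc exec` readiness check as above:

```shell
# Poll Loki's /ready endpoint every 5 seconds, for up to ~2 minutes
for i in $(seq 1 24); do
  if oc exec -n grafana-demo deploy/loki -- \
      wget -qO- http://localhost:3100/ready 2>/dev/null | grep -q ready; then
    echo "Loki is ready"
    break
  fi
  sleep 5
done
```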
oc apply -f - <<'YAML'
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-sidecar-config
  namespace: grafana-demo
data:
  promtail.yaml: |
    server:
      http_listen_port: 9080
      grpc_listen_port: 0
    positions:
      filename: /tmp/positions.yaml
    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
      - job_name: grafana
        static_configs:
          - targets: [localhost]
            labels:
              app: grafana
              namespace: ${POD_NAMESPACE}
              pod: ${POD_NAME}
              pod_ip: ${POD_IP}
              node: ${NODE_NAME}
              __path__: /var/log/grafana/grafana.log
YAML
Critical lesson here: when I first deployed this, I could not get Grafana to autoscale. For the HPA to compute CPU utilization, every container in the pod must define CPU requests, and I had missed that for the promtail sidecar. Once both containers had CPU requests defined, things started to work as expected.
This is the chunk of YAML that got it done.
resources:
  requests:
    cpu: 10m
    memory: 32Mi
  limits:
    cpu: 50m
    memory: 128Mi
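A quick way to confirm that every container in the deployment actually carries a CPU request (the HPA prerequisite above) is a jsonpath query; a blank value next to any container name means the HPA will report unknown:

```shell
# Print each container name alongside its CPU request
oc get deploy grafana -n grafana-demo \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources.requests.cpu}{"\n"}{end}'
```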
oc apply -f - <<'YAML'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: grafana-demo
spec:
  replicas: 1
  selector:
    matchLabels: {app: grafana}
  template:
    metadata:
      labels: {app: grafana}
    spec:
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: grafana
          image: quay.io/openshift/origin-grafana:latest
          ports: [{containerPort: 3000}]
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: admin
            - name: GF_LOG_MODE
              value: file
            - name: GF_LOG_LEVEL
              value: info
            - name: GF_LOG_FILE
              value: /var/log/grafana/grafana.log
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              cpu: 300m
              memory: 512Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: grafana-data
              mountPath: /var/lib/grafana
            - name: grafana-logs
              mountPath: /var/log/grafana
        - name: promtail
          image: docker.io/grafana/promtail:2.9.3
          args:
            - -config.expand-env=true
            - -config.file=/etc/promtail/promtail.yaml
          env:
            - name: POD_NAME
              valueFrom: {fieldRef: {fieldPath: metadata.name}}
            - name: POD_NAMESPACE
              valueFrom: {fieldRef: {fieldPath: metadata.namespace}}
            - name: POD_IP
              valueFrom: {fieldRef: {fieldPath: status.podIP}}
            - name: NODE_NAME
              valueFrom: {fieldRef: {fieldPath: spec.nodeName}}
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
            limits:
              cpu: 50m
              memory: 128Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: grafana-logs
              mountPath: /var/log/grafana
            - name: promtail-config
              mountPath: /etc/promtail
      volumes:
        - name: grafana-data
          persistentVolumeClaim: {claimName: grafana-data}
        - name: grafana-logs
          emptyDir: {}
        - name: promtail-config
          configMap: {name: promtail-sidecar-config}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: grafana-demo
spec:
  selector: {app: grafana}
  ports:
    - port: 3000
      targetPort: 3000
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: grafana
  namespace: grafana-demo
spec:
  to:
    kind: Service
    name: grafana
  port:
    targetPort: 3000
YAML
Verify Grafana pod is 2/2 Running:
oc get pods -n grafana-demo
Get URL:
oc get route grafana -n grafana-demo
Login: admin / admin
oc get route grafana
NAME      HOST/PORT                                  PATH   SERVICES   PORT   TERMINATION   WILDCARD
grafana   grafana-grafana-demo.apps.okd.vv-int.io          grafana    3000                 None
Grafana → Data sources → Add → Loki
URL:
http://loki:3100
Save & Test.
Explore query:
{app="grafana"}
You should see logs immediately.
And you can prove routing/placement:
sum by (pod, node, pod_ip) (count_over_time({app="grafana"}[1m]))
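You can also run the same check outside Grafana by hitting Loki's HTTP query API from inside the cluster. This is a sketch using the standard Loki `/loki/api/v1/query` endpoint, with the LogQL URL-encoded:

```shell
# Ask Loki directly which pods produced Grafana logs in the last 5 minutes
oc exec -n grafana-demo deploy/loki -- \
  wget -qO- 'http://localhost:3100/loki/api/v1/query?query=sum%20by%20(pod)%20(count_over_time({app=%22grafana%22}[5m]))'
```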
I am intentionally aggressive here because I want to see it scale. Kubernetes scales up quickly and scales down slowly by design. I can tune stabilization windows and policies later. For now, the objective is visibility. I want to watch the replicas spin up.
The advantage of OpenShift is that the metrics stack is already wired in. There is no metrics server installation and no adapter plumbing. You are simply defining the HPA and letting the controller do its job.
oc autoscale deployment grafana -n grafana-demo \
--cpu-percent=20 \
--min=1 \
--max=6
oc patch hpa grafana -n grafana-demo -p '{
  "spec": {
    "behavior": {
      "scaleUp": {
        "stabilizationWindowSeconds": 0,
        "policies": [{
          "type": "Percent",
          "value": 100,
          "periodSeconds": 15
        }]
      }
    }
  }
}'
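If you would rather manage this declaratively than with `oc autoscale` plus a patch, the equivalent `autoscaling/v2` manifest, with the same numbers as above, looks like this:

```shell
oc apply -f - <<'YAML'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: grafana
  namespace: grafana-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: grafana
  minReplicas: 1
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 20
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
YAML
```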
Confirm the HPA metric is no longer unknown:
oc get hpa grafana -n grafana-demo
You should see cpu: X%/20% (a number, not unknown).
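To see the raw numbers the HPA is working from, `oc adm top` reports per-pod CPU and memory straight from the built-in metrics stack:

```shell
# Live CPU/memory per pod, from the OpenShift metrics stack
oc adm top pods -n grafana-demo
```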
This is basically a CLI pod that hammers Grafana with curl:
oc apply -f - <<'YAML'
apiVersion: v1
kind: Pod
metadata:
  name: grafana-load
  namespace: grafana-demo
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: loadgen
      image: quay.io/openshift/origin-cli:latest
      command:
        - sh
        - -c
        - |
          while true; do
            for i in $(seq 1 50); do
              curl -s http://grafana:3000 >/dev/null &
            done
            wait
          done
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      resources:
        requests:
          cpu: 200m
          memory: 64Mi
        limits:
          cpu: 500m
          memory: 128Mi
YAML
Watch scaling:
oc get hpa grafana -n grafana-demo -w
and:
oc get pods -n grafana-demo -w
You should see replicas climb: 1 → 2 → 4 → 6 (or similar).
In Grafana, go to Dashboards → Import.
For this step I am including an importable Grafana dashboard:
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "custom": {
            "align": "auto",
            "displayMode": "auto"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "green", "value": null },
              { "color": "red", "value": 80 }
            ]
          }
        },
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "Time" },
            "properties": [
              { "id": "custom.width", "value": 235 }
            ]
          }
        ]
      },
      "gridPos": { "h": 18, "w": 22, "x": 0, "y": 0 },
      "id": 2,
      "options": {
        "footer": {
          "fields": "",
          "reducer": ["sum"],
          "show": false
        },
        "showHeader": true,
        "sortBy": []
      },
      "targets": [
        {
          "datasource": "Loki",
          "expr": "sum by (node, pod, pod_ip) (\n  count_over_time({app=\"grafana\"}[15m])\n)",
          "refId": "A"
        }
      ],
      "title": "Autoscale-Validation",
      "transformations": [
        { "id": "labelsToFields", "options": {} },
        {
          "id": "reduce",
          "options": {
            "reducers": ["last"]
          }
        }
      ],
      "type": "table"
    }
  ],
  "refresh": "",
  "schemaVersion": 34,
  "style": "dark",
  "tags": [],
  "templating": { "list": [] },
  "time": { "from": "now-6h", "to": "now" },
  "timepicker": {},
  "timezone": "",
  "title": "Autoscale-Validation",
  "uid": "eAHCVDvDz",
  "version": 1,
  "weekStart": ""
}
Kubernetes scales up fast and scales down slowly, and I did not want to wait around for the scale-down in my video above 😀
oc delete pod grafana-load -n grafana-demo
The one truth command:
oc describe hpa grafana -n grafana-demo
That's about it. Again, pretty cool stuff. None of this is new in Kubernetes land, but not having to wire in all the pieces to make something like this work really is a selling point for OpenShift.
Thanks for reading, -Christian