Storage and State

Misconfigured storage is one of the few Kubernetes failure modes that can cause irreversible data loss. Compute issues usually resolve by restarting pods and network issues by fixing policies, but deleting a PersistentVolumeClaim whose volume has reclaimPolicy: Delete destroys the underlying disk permanently. Every storage decision must account for data durability.

The PV/PVC Model

Kubernetes abstracts storage through three resources:

PersistentVolume (PV): Represents a piece of provisioned storage -- a cloud disk, an NFS share, or a local SSD. PVs are cluster-scoped, not namespaced.
PersistentVolumeClaim (PVC): A namespaced request for storage. Specifies size, access mode, and StorageClass. The control plane binds the PVC to a PV that satisfies its requirements.
StorageClass: Defines how PVs are dynamically provisioned. Specifies the CSI driver, parameters (disk type, encryption), reclaim policy, and binding mode.

Dynamic provisioning is the default workflow: a PVC references a StorageClass, the CSI driver provisions a volume, and the control plane creates a PV and binds it to the PVC automatically.
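As a minimal sketch of the claim side of this workflow, the manifest below requests a volume from a StorageClass. The names app-data and fast-ssd are hypothetical and stand in for whatever your cluster defines; the size is arbitrary.

```yaml
# A namespaced request for 20Gi of storage.
# "fast-ssd" is a hypothetical StorageClass name; it must already exist in the cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data            # hypothetical claim name
  namespace: default
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce         # mounted read-write by a single node
  resources:
    requests:
      storage: 20Gi
```

Once the claim is bound, kubectl get pvc app-data shows the name of the PV that was created for it.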

StorageClass: Critical Fields

Two StorageClass defaults are dangerous for production data:

reclaimPolicy: Delete (the default) destroys the underlying volume when the PVC is deleted. A single kubectl delete pvc command permanently deletes the data. Production StorageClasses must use Retain, which preserves the volume for manual recovery.

volumeBindingMode: Immediate (the default) provisions the volume as soon as the PVC is created, before any pod is scheduled. This can place the volume in a different availability zone than the pod, causing the pod to stay Pending indefinitely. WaitForFirstConsumer delays provisioning until a pod that uses the PVC is scheduled, so the volume is created in that pod's zone.

Always set allowVolumeExpansion: true so PVCs can be resized without recreation. PVCs can be expanded but never shrunk.
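The three settings above can be combined in one StorageClass. This is a sketch that assumes the AWS EBS CSI driver; the class name prod-retain-ssd is hypothetical, and the parameters block is driver-specific, so other drivers will need different keys.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prod-retain-ssd                     # hypothetical name
provisioner: ebs.csi.aws.com                # assumption: AWS EBS CSI driver is installed
parameters:
  type: gp3                                 # EBS-specific disk type
  encrypted: "true"                         # EBS-specific encryption flag
reclaimPolicy: Retain                       # keep the disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer     # provision in the zone where the pod lands
allowVolumeExpansion: true                  # allow PVCs to grow in place
```

With Retain, deleting a PVC leaves the PV and its disk behind, so someone has to clean them up (or reuse them) by hand.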

Access Modes

Mode	Abbreviation	Meaning	Supported by
`ReadWriteOnce`	RWO	One node mounts read-write	All block storage (EBS, PD, Azure Disk)
`ReadOnlyMany`	ROX	Many nodes mount read-only	NFS, CephFS, cloud file storage
`ReadWriteMany`	RWX	Many nodes mount read-write	NFS, CephFS, EFS, Azure Files
`ReadWriteOncePod`	RWOP	Exactly one pod mounts read-write	CSI drivers supporting RWOP (1.29+ GA)

The most common mistake is requesting ReadWriteMany from a block storage provisioner. A cloud block volume normally attaches to a single node, so it cannot serve RWX filesystem mounts. The PVC stays Pending, and the reason usually surfaces only in the PVC's events (kubectl describe pvc), not in its status. Use a file storage solution (EFS, Filestore, Azure Files) for shared access.

For databases, prefer ReadWriteOncePod over ReadWriteOnce. RWO allows multiple pods on the same node to mount the volume, which can cause data corruption. RWOP restricts access to exactly one pod.
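The two recommendations above might look like the following pair of claims. Both StorageClass names are hypothetical: shared-files is assumed to be backed by a file storage CSI driver, and prod-retain-ssd by a block storage driver that supports ReadWriteOncePod.

```yaml
# Shared volume: RWX only works if the StorageClass is backed by file storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets                 # hypothetical
spec:
  storageClassName: shared-files      # hypothetical file-storage class
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
---
# Database volume: RWOP guarantees that only one pod in the cluster can mount it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data                       # hypothetical
spec:
  storageClassName: prod-retain-ssd   # hypothetical block-storage class
  accessModes:
    - ReadWriteOncePod
  resources:
    requests:
      storage: 100Gi
```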

Dynamic Provisioning and CSI Drivers

Each cloud provider and storage platform has a CSI driver:

Environment	Block storage CSI	File storage CSI
AWS EKS	`ebs.csi.aws.com`	`efs.csi.aws.com`
GKE	`pd.csi.storage.gke.io`	`filestore.csi.storage.gke.io`
Azure AKS	`disk.csi.azure.com`	`file.csi.azure.com`
Bare metal	Longhorn, Rook-Ceph, OpenEBS	Rook-CephFS, NFS provisioner

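Most major block storage CSI drivers support snapshots, volume expansion, and encryption; check the driver's documentation before relying on these features, especially for file storage drivers. Always enable encryption for production StorageClasses. The parameter name is driver-specific: the AWS EBS driver, for example, uses parameters.encrypted: "true".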

VolumeSnapshot for Backup and Restore

VolumeSnapshots provide point-in-time copies of PVCs. They are the primary mechanism for data protection before destructive operations:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-snapshot-2025-04-12
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-postgres-0

To restore, create a new PVC with dataSource referencing the snapshot. The CSI driver provisions a new volume from the snapshot data.
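A restore claim for the snapshot above could look like this sketch. The claim name, StorageClass name, and size are assumptions; the claim must live in the same namespace as the VolumeSnapshot, and the requested size must be at least as large as the snapshot's source volume.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-0-restored      # hypothetical name for the restored claim
spec:
  storageClassName: prod-retain-ssd   # hypothetical; must use the same CSI driver as the snapshot
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: db-snapshot-2025-04-12      # the snapshot created above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi                  # assumed size; not smaller than the original volume
```

Restoring into a new claim instead of overwriting the original keeps the source volume available for comparison until the restored data has been verified.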

Critical rules for snapshots:

Always snapshot before PVC deletion, StorageClass migration, or major upgrades.
Snapshots may be crash-consistent, not application-consistent. For databases, run logical backups (pg_dump, mysqldump) alongside snapshots.
Test restore procedures regularly. A backup never restored is not a backup.

Ephemeral Storage: emptyDir

emptyDir volumes are tied to the pod lifecycle -- deleted when the pod is removed. Use them for scratch space, caches, and the writable paths a container still needs when it runs with readOnlyRootFilesystem: true.

Always set sizeLimit on emptyDir volumes. Without it, a runaway process can fill the node's disk and trigger node-pressure eviction of other pods on that node. For in-memory emptyDirs (medium: Memory), the data written counts against the container's memory limit.
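A sketch of both variants in one pod follows. The pod name, image, mount paths, and sizes are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo                        # hypothetical
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:
        readOnlyRootFilesystem: true        # root filesystem is read-only; /tmp comes from emptyDir
      resources:
        limits:
          memory: 512Mi                     # the in-memory volume below counts against this
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /var/cache/app
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 1Gi                      # disk-backed scratch space, capped
    - name: cache
      emptyDir:
        medium: Memory                      # tmpfs-backed
        sizeLimit: 128Mi
```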

StatefulSet volumeClaimTemplates

StatefulSets create one PVC per replica automatically. PVCs created by volumeClaimTemplates are intentionally not deleted when the StatefulSet is deleted or scaled down -- this protects data. To reclaim storage, delete the PVCs manually after verifying the data is no longer needed.

The persistentVolumeClaimRetentionPolicy field (1.27+) can configure automatic PVC deletion on scale-down or StatefulSet deletion, but use it with extreme caution in production.
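The skeleton below shows where volumeClaimTemplates and the retention policy sit in a StatefulSet. It is a sketch, not a working database deployment: the image, mount path, and StorageClass name are placeholders. Both policy fields are set to Retain, which matches the default behaviour described above and makes the intent explicit.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres                 # headless Service assumed to exist
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain                 # keep PVCs when the StatefulSet is deleted
    whenScaled: Retain                  # keep PVCs when replicas are scaled down
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: db
          image: registry.example.com/db:1.0   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/data         # placeholder path
  volumeClaimTemplates:
    - metadata:
        name: data                      # PVCs are named data-postgres-0, data-postgres-1, ...
      spec:
        storageClassName: prod-retain-ssd      # hypothetical
        accessModes:
          - ReadWriteOncePod
        resources:
          requests:
            storage: 100Gi
```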

fsGroup and Permissions

When running containers as non-root with readOnlyRootFilesystem: true, mounted PVCs may not be writable because the volume's filesystem ownership does not match the container's user. Set fsGroup in the pod security context to ensure the mounted volume is writable by the pod's group:

securityContext:
  runAsUser: 10000
  runAsGroup: 10000
  fsGroup: 10000

Without fsGroup, the pod mounts the volume but cannot write to it. The resulting errors look like permission problems inside the container, but the cause is the ownership of the volume's filesystem.
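For context, fsGroup is only valid in the pod-level securityContext, not the container-level one. The sketch below shows the placement in a full pod spec; the pod name, image, mount path, and claim name are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-writer                      # hypothetical
spec:
  securityContext:                          # pod-level: applies to all containers and volumes
    runAsUser: 10000
    runAsGroup: 10000
    fsGroup: 10000                          # volume files are made group-owned by GID 10000
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:                      # container-level settings
        runAsNonRoot: true
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data                 # hypothetical existing PVC
```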
