# Backups
This page will walk you through the main steps required to manually back up the state of a Pachyderm cluster in production. Details on how to perform those steps might vary depending on your infrastructure and setup. Refer to your provider’s documentation when applicable.
## Before You Start
- Make sure to retain a copy of the Helm values used to deploy your cluster
- Suspend any state-mutating operations
- Make sure that you have a bucket for backup use, separate from the object store used by your cluster
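The separate-bucket requirement is worth enforcing mechanically. A minimal sketch, assuming an S3-compatible provider; both bucket names are hypothetical placeholders, and the `aws` call is guarded so the snippet is a no-op on machines without the AWS CLI:

```shell
# Sanity check: the backup bucket must not be the cluster's own storage bucket.
# Both names here are hypothetical -- substitute your own.
CLUSTER_BUCKET="s3://pachyderm-data"     # bucket backing the running cluster
BACKUP_BUCKET="s3://pachyderm-backups"   # dedicated backup bucket

if [ "$CLUSTER_BUCKET" = "$BACKUP_BUCKET" ]; then
    echo "error: backup bucket must be separate from the cluster bucket" >&2
    exit 1
fi

# Create the backup bucket if needed (AWS example).
if command -v aws >/dev/null 2>&1; then
    aws s3 mb "$BACKUP_BUCKET" || echo "could not create $BACKUP_BUCKET" >&2
fi
```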
## Downtime Considerations
- Backups incur downtime until operations are resumed
- Operational best practices include notifying Pachyderm users of the outage and providing an estimated time when downtime will cease
- Downtime duration is dependent on the size of the data to be backed up and the networks involved
- Testing backups before going into production, and monitoring backup times on an ongoing basis, will help you predict the outage window accurately
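A back-of-the-envelope estimate for the outage window is data size divided by effective network throughput. A sketch with hypothetical figures (substitute your own measurements):

```shell
# Rough downtime estimate: data size divided by sustained throughput.
# Both values below are hypothetical -- substitute your own measurements.
DATA_GB=500            # total size of the object store plus database dumps
THROUGHPUT_MBPS=400    # sustained network throughput, in megabits per second

# 1 GB is 8000 megabits, so transfer seconds = GB * 8000 / Mbps.
EST_SECONDS=$(( DATA_GB * 8000 / THROUGHPUT_MBPS ))
echo "Rough transfer time: $(( EST_SECONDS / 60 )) minutes"
# prints: Rough transfer time: 166 minutes
```

Real backups rarely saturate the link, so treat this as a lower bound and pad the announced window accordingly.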
## How to Create a Backup
Pachyderm state is stored in two main places:
- An object store holding Pachyderm's data.
- A PostgreSQL instance made up of one or two databases:
  - `pachyderm`, holding Pachyderm's metadata
  - `dex`, holding authentication data
Backing up a Pachyderm cluster involves snapshotting both the object store and the PostgreSQL database(s), in a consistent state, at a given point in time. Restoring a cluster involves re-populating the database(s) and the object store using those backups, then recreating a Pachyderm cluster.
- Review any cloud-specific backup and restore procedures for your PostgreSQL instance.
- Retain a copy of the Helm values file used to deploy your cluster.
      helm get values <release-name> > /path/to/values.yaml
- Pause or queue/divert any external automated process ingressing data to Pachyderm input repos.
- Suspend all mutation of state.
  - Enterprise: pause the cluster.

        pachctl enterprise pause

  - CE: scale `pachd` and the worker pods down.
    - Ensure you are using the right context.

          kubectl config get-contexts
          kubectl config use-context <context-name>

    - Scale down the `pachd` deployment and the worker pods.

          kubectl scale deployment pachd --replicas 0
          kubectl scale rc --replicas 0 -l suite=pachyderm,component=worker

    - Monitor the state of `pachd` and the worker pods.

          watch -n 5 kubectl get pods
- Dump your PostgreSQL state using `pg_dumpall` or `pg_dump`, depending on whether the instance is used solely by Pachyderm or shared with other applications.
  - Sole use:

        pg_dumpall -U postgres > /path/to/backup.sql

  - Shared instance (dump each database to its own file):

        pg_dump -U postgres -d pachyderm > /path/to/pachyderm-backup.sql
        pg_dump -U postgres -d dex > /path/to/dex-backup.sql
- Back up your object store. Refer to your cloud provider's documentation for details.
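The database dump and object store copy might be combined into one script along the following lines. This is a sketch, not a definitive implementation: the staging path and bucket names are hypothetical, and the external commands are guarded so the script degrades gracefully where the tools are not installed.

```shell
# Stage a timestamped backup: PostgreSQL dump plus object store copy.
STAMP=$(date +%Y-%m-%dT%H-%M-%S)
BACKUP_DIR="${TMPDIR:-/tmp}/pachyderm-backup-$STAMP"   # hypothetical staging path
mkdir -p "$BACKUP_DIR"

# Dump PostgreSQL state (pg_dumpall shown; use per-database pg_dump instead
# if the instance is shared with other applications).
if command -v pg_dumpall >/dev/null 2>&1; then
    pg_dumpall -U postgres > "$BACKUP_DIR/postgres.sql" \
        || echo "pg_dumpall failed -- is PostgreSQL reachable?" >&2
fi

# Copy the object store into the dedicated backup bucket (AWS example;
# bucket names are hypothetical).
if command -v aws >/dev/null 2>&1; then
    aws s3 sync "s3://pachyderm-data" "s3://pachyderm-backups/$STAMP/" \
        || echo "object store sync failed" >&2
fi

echo "Backup staged under $BACKUP_DIR"
```

Keeping the same timestamp on the database dump and the object store copy makes it obvious which pair belongs together when you later restore.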
## How to Resume Operations
Once your backup is complete, resume your normal operations by unpausing the cluster or scaling `pachd` back up. It will take care of restoring the worker pods:

- Enterprise:

      pachctl enterprise unpause

- CE:

      kubectl scale deployment pachd --replicas 1
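Before announcing that the outage is over, it can be worth confirming that the pods are back. A sketch of such a check; the timeout values are hypothetical, and the calls are guarded so the snippet is a no-op without a reachable cluster:

```shell
# Verify the cluster is healthy before resuming writes.
MAX_WAIT=300   # hypothetical: seconds to wait for pods before giving up
INTERVAL=5     # hypothetical: seconds between polls
ATTEMPTS=$(( MAX_WAIT / INTERVAL ))

if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
    i=0
    while [ "$i" -lt "$ATTEMPTS" ]; do
        # Stop polling once the Pachyderm pods report Running.
        if kubectl get pods -l suite=pachyderm 2>/dev/null | grep -q Running; then
            echo "pachd is back up"
            break
        fi
        sleep "$INTERVAL"
        i=$(( i + 1 ))
    done
fi

# Confirm pachctl itself still works (client-side check, no cluster required).
if command -v pachctl >/dev/null 2>&1; then
    pachctl version --client-only
fi
```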