On Premises
Learn how to deploy Pachyderm on-premises.
March 30, 2023
This page walks you through the fundamentals of what you need to know about Kubernetes, persistent volumes, and object stores to deploy Pachyderm on-premises.
- Read our infrastructure recommendations. You will find instructions on how to set up an ingress controller, a load balancer, or connect an Identity Provider for access control.
- If you are planning to install Pachyderm UI. Read our Console deployment instructions. Note that, unless your deployment is
LOCAL
(i.e., on a local machine for development only, for example, on Minikube or Docker Desktop), the deployment of Console requires, at a minimum, the set up of an Ingress. - Troubleshooting a deployment? Check out Troubleshooting Deployments.
We are now shipping Pachyderm with an optional embedded proxy allowing your cluster to expose one single port externally. This deployment setup is optional.
If you choose to deploy Pachyderm with a Proxy, check out our new recommended architecture and deployment instructions.
Introduction #
Deploying Pachyderm successfully on-premises requires a few prerequisites. Pachyderm is built on Kubernetes. Before you can deploy Pachyderm, you will need to perform the following actions:
- Deploy Kubernetes on-premises.
- Deploy two Kubernetes persistent volumes that Pachyderm will use to store its metadata.
- Deploy an on-premises object store using a storage provider like MinIO, EMC’s ECS, or SwiftStack to provide S3-compatible access to your data storage.
- Finally, Deploy Pachyderm using Helm by running the
helm install
command with the appropriate values configured in your values.yaml. We recommend reading these generic deployment steps if you are unfamiliar with Helm.
Prerequisites #
Before you start, you will need the following clients installed:
Setting Up To Deploy On-Premises #
Deploying Kubernetes #
The Kubernetes docs have instructions for deploying Kubernetes in a variety of on-premise scenarios. We recommend following one of these guides to get Kubernetes running.
Pachyderm recommends running your cluster on Kubernetes 1.19.0 and above.
Storage Classes #
Once you deploy Kubernetes, you will also need to configure storage classes to consume persistent volumes for etcd
and postgresql
.
The database and metadata service (Persistent disks) generally requires a small persistent volume size (i.e. 10GB) but high IOPS (1500), therefore, depending on your storage provider, you may need to oversize the volume significantly to ensure enough IOPS.
Once you have determined the name of the storage classes you are going to use and the sizes, you can add them to your helm values file, specifically:
etcd:
storageClass: MyStorageClass
size: 10Gi
postgresql:
persistence:
storageClass: MyStorageClass
size: 10Gi
Deploying An Object Store #
An object store is used by Pachyderm’s pachd
for storing all your data.
The object store you use must be accessible via a low-latency, high-bandwidth connection.
For an on-premises deployment, it is not advisable to use a cloud-based storage mechanism. Do not deploy an on-premises Pachyderm cluster against cloud-based object stores (such as S3, GCS, Azure Blob Storage).
You will, however, access your Object Store using the S3 protocol.
Storage providers like MinIO (the most common and officially supported option), EMC’s ECS, Ceph, or SwiftStack provide S3-compatible access to enterprise storage for on-premises deployment.
Sizing And Configuring The Object Store #
Start with a large multiple of your current data set size.
You will need four items to configure the object store. We are prefixing each item with how we will refer to it in the helm values file.
endpoint
: The access endpoint. For example, MinIO’s endpoints are usually something likeminio-server:9000
.Do not begin it with the protocol; it is an endpoint, not an url. Also, check if your object store (e.g. MinIO) is using SSL/TLS. If not, disable it using
secure: false
.bucket
: The bucket name you are dedicating to Pachyderm. Pachyderm will need exclusive access to this bucket.id
: The access key id for the object store.secret
: The secret key for the object store.
pachd:
storage:
backend: minio
minio:
bucket: ""
endpoint: ""
id: ""
secret: ""
secure: ""
Next Step: Proceed to your Helm installation #
Once you have Kubernetes deployed, your storage classes setup, and your object store configured, follow those steps to Helm install Pachyderm on your cluster.