Pachyderm Deployment Manifest

This section provides an overview of the Kubernetes manifest that you use to deploy your Pachyderm cluster. This section is provided for your reference and does not include configuration steps. If you are familiar with Kubernetes or do not have immediate questions about the configuration parameters, you can skip this section and proceed to Configuring Persistent Disk Parameters.

When you run the pachctl deploy command, Pachyderm generates a JSON-encoded Kubernetes manifest that consists of sections describing a Pachyderm deployment.
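To see which sections a generated manifest contains, you can list each object's kind and name. The following is a minimal sketch, not a Pachyderm tool. It assumes the manifest is either a stream of concatenated JSON objects or a single object, and it expands a Kubernetes "List" into its items. The function names iter_manifest_objects and summarize are hypothetical.

```python
import json


def iter_manifest_objects(text):
    """Yield each Kubernetes object in a JSON-encoded manifest.

    Assumption: the manifest is a stream of concatenated JSON objects
    or a single object; a Kubernetes "List" is expanded into its items.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        if obj.get("kind") == "List":
            yield from obj.get("items", [])
        else:
            yield obj


def summarize(text):
    """Return (kind, metadata.name) pairs in manifest order."""
    return [
        (obj.get("kind"), obj.get("metadata", {}).get("name"))
        for obj in iter_manifest_objects(text)
    ]


if __name__ == "__main__":
    # Illustrative input only; a real manifest contains many more objects.
    sample = """
    {"kind": "ServiceAccount", "apiVersion": "v1", "metadata": {"name": "pachyderm"}}
    {"kind": "Service", "apiVersion": "v1", "metadata": {"name": "pachd"}}
    """
    for kind, name in summarize(sample):
        print(f"{kind}\t{name}")
```

Because the decoder reads one object at a time, the sketch handles manifests that contain no enclosing array.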

Pachyderm deploys the following sets of application components:

  • pachd: The main Pachyderm pod.
  • etcd: The administrative datastore for pachd.
  • dash: The web-based UI for Pachyderm Enterprise Edition.

Example:

pachctl deploy custom --persistent-disk <persistent disk backend> --object-store <object store backend> \
    <persistent disk arg 1> <persistent disk arg 2> \
    <object store arg 1> <object store arg 2> <object store arg 3> <object store arg 4> \
    [[--dynamic-etcd-nodes n] | [--static-etcd-volume <volume name>]] \
    [optional flags]
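The synopsis above puts the persistent-disk arguments before the object-store arguments and treats --dynamic-etcd-nodes and --static-etcd-volume as alternatives. The sketch below assembles such an argument list from your own values. The helper build_deploy_args is hypothetical. It does not check how many positional arguments each backend needs, because that count depends on the backend.

```python
import shlex


def build_deploy_args(persistent_disk, object_store, disk_args, store_args,
                      dynamic_etcd_nodes=None, static_etcd_volume=None,
                      extra_flags=()):
    """Assemble argv for `pachctl deploy custom` in the synopsis order.

    The number and meaning of disk_args and store_args depend on the
    chosen backends; this sketch does not validate them.
    """
    if dynamic_etcd_nodes is not None and static_etcd_volume is not None:
        raise ValueError(
            "--dynamic-etcd-nodes and --static-etcd-volume are mutually exclusive")
    args = [
        "pachctl", "deploy", "custom",
        "--persistent-disk", persistent_disk,
        "--object-store", object_store,
        *disk_args,
        *store_args,
    ]
    if dynamic_etcd_nodes is not None:
        args += ["--dynamic-etcd-nodes", str(dynamic_etcd_nodes)]
    if static_etcd_volume is not None:
        args += ["--static-etcd-volume", static_etcd_volume]
    args += list(extra_flags)
    return args


if __name__ == "__main__":
    # Placeholder values only; substitute your real backend arguments.
    argv = build_deploy_args(
        "<disk backend>", "<store backend>",
        ["<disk arg 1>", "<disk arg 2>"],
        ["<store arg 1>", "<store arg 2>", "<store arg 3>", "<store arg 4>"],
        static_etcd_volume="<volume name>",
    )
    print(shlex.join(argv))
```

Building the command as a list and quoting it with shlex.join keeps arguments that contain spaces intact when you copy the printed command into a shell.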

As the example command shows, you pass different flags to pachctl deploy custom to generate a manifest that fits your infrastructure. The flags fall broadly into the following categories:

Category            Description
--persistent-disk   Configures the storage resource for etcd. Pachyderm uses etcd
                    to manage administrative metadata. User data is stored in an
                    object store, not in etcd.
--object-store      Configures the object store that Pachyderm uses to store all
                    user data that you want to be versioned and managed.
Optional flags      Flags that are not required to deploy Pachyderm but that let
                    you configure access, output format, logging verbosity, and
                    other parameters.

Kubernetes Manifest Parameters

Your Kubernetes manifest includes sections that describe the configuration of your Pachyderm cluster.

The manifest includes the following sections:

Roles and permissions manifests

Manifest              Description
ServiceAccount        Typically at the top of the manifest file that Pachyderm
                      produces, a roles and permissions manifest has the kind key
                      set to ServiceAccount. Kubernetes uses ServiceAccounts to
                      assign namespace-specific privileges to applications in a
                      lightweight way. Pachyderm's service account is called
                      pachyderm.
Role or ClusterRole   The next manifest kind is Role if you used the --local-roles
                      flag, or ClusterRole if you did not.
RoleBinding or        Binds the Role or ClusterRole to the ServiceAccount created
ClusterRoleBinding    above.
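The roles and permissions manifests follow a fixed pattern: a ServiceAccount named pachyderm, then a Role or ClusterRole, then a binding between them. The sketch below encodes that pattern and checks a binding object against it. It uses the standard Kubernetes binding fields roleRef and subjects. The function names are hypothetical. The check is stricter than Kubernetes requires, because a RoleBinding may also reference a ClusterRole; here it pairs RoleBinding with Role and ClusterRoleBinding with ClusterRole, as the table describes.

```python
SERVICE_ACCOUNT = "pachyderm"  # service account name given in the table above


def expected_rbac_kinds(local_roles):
    """Kinds of the roles and permissions manifests, in order."""
    role = "Role" if local_roles else "ClusterRole"
    return ["ServiceAccount", role, role + "Binding"]


def binding_targets_pachyderm(binding):
    """Check that a binding grants its role to the pachyderm ServiceAccount.

    Simplification: requires RoleBinding -> Role and
    ClusterRoleBinding -> ClusterRole, matching the pairing above.
    """
    kind = binding.get("kind")
    if kind not in ("RoleBinding", "ClusterRoleBinding"):
        return False
    # Strip the "Binding" suffix to get the role kind this binding should reference.
    role_ok = binding.get("roleRef", {}).get("kind") == kind[: -len("Binding")]
    subject_ok = any(
        s.get("kind") == "ServiceAccount" and s.get("name") == SERVICE_ACCOUNT
        for s in binding.get("subjects", [])
    )
    return role_ok and subject_ok
```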

Application-related manifests

Manifest                Description
PersistentVolume        If you used --static-etcd-volume to deploy Pachyderm, the
                        value that you specify for --persistent-disk causes pachctl
                        to write a manifest for creating a PersistentVolume that
                        Pachyderm's etcd uses in its PersistentVolumeClaim. A
                        persistent volume commonly used in enterprises is an NFS
                        mount backed by a storage fabric. In this case, a
                        StorageClass for an NFS mount is made available for
                        consumption. Consult your Kubernetes administrators to
                        learn what resources are available for your deployment.
PersistentVolumeClaim   If you deployed Pachyderm by using --static-etcd-volume,
                        Pachyderm's etcd store uses this PersistentVolumeClaim.
                        The name of this claim is referenced in the Deployment
                        manifest for the etcd pod, described below in Pachyderm
                        pods manifests.
StorageClass            If you used the --dynamic-etcd-nodes flag to deploy
                        Pachyderm, this manifest specifies the kind of storage and
                        the provisioner that are appropriate for the value of the
                        --persistent-disk flag.
                        Note: You will not see this manifest if you specified
                        azure as the argument to --persistent-disk, because Azure
                        has its own provisioner.
Service                 In a typical Pachyderm deployment, you see three Service
                        manifests. A Service is a Kubernetes abstraction that
                        exposes Pods to the network. If you use StatefulSets to
                        deploy Pachyderm, that is, if you used the
                        --dynamic-etcd-nodes flag, Pachyderm deploys one Service
                        for etcd-headless, one for pachd, and one for dash. A
                        static deployment has Services for etcd, pachd, and dash.
                        If you use the --no-dashboard flag, Pachyderm does not
                        create a Service and Deployment for the dashboard.
                        Similarly, if you specify --dashboard-only, Pachyderm
                        generates the manifests for the Pachyderm Enterprise UI
                        only. The items that you most commonly edit are the
                        NodePort values in Service manifests and the
                        containerPort values in Deployment manifests. For your
                        containerPort values to work properly, you need to add
                        environment variables to a Deployment or StatefulSet
                        object. To learn which environment variables to add, see
                        the OpenShift example.
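The rules in the table above determine which storage and Service manifests to expect, based on how etcd is deployed and whether the dashboard is included. The sketch below restates those rules as a function so that you can compare them with a generated manifest. The function name and the "static"/"dynamic" labels are hypothetical. The component labels come from the table and are not necessarily the exact metadata.name values in the manifest. The sketch does not model --dashboard-only.

```python
def application_manifests(etcd_mode, persistent_disk, no_dashboard=False):
    """List (kind, component) pairs for the storage and Service manifests.

    etcd_mode: "static" for --static-etcd-volume, "dynamic" for
    --dynamic-etcd-nodes. --dashboard-only is not modeled.
    """
    if etcd_mode == "static":
        storage = [("PersistentVolume", "etcd"), ("PersistentVolumeClaim", "etcd")]
        etcd_service = "etcd"
    elif etcd_mode == "dynamic":
        # Azure has its own provisioner, so no StorageClass is written.
        storage = [] if persistent_disk == "azure" else [("StorageClass", "etcd")]
        etcd_service = "etcd-headless"
    else:
        raise ValueError(f"unknown etcd_mode: {etcd_mode!r}")
    services = [etcd_service, "pachd"]
    if not no_dashboard:
        # --no-dashboard drops the dash Service (and its Deployment).
        services.append("dash")
    return storage + [("Service", name) for name in services]
```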

Pachyderm pods manifests

Manifest      Description
Deployment    Declares the desired state of application pods to Kubernetes.
              If you configure a static deployment, Pachyderm deploys
              Deployment manifests for etcd, pachd, and dash. If you specify
              --dynamic-etcd-nodes, Pachyderm deploys pachd and dash as
              Deployments and etcd as a StatefulSet. If you run the deploy
              command with the --no-dashboard flag, Pachyderm omits the dash
              Service and Deployment.
StatefulSet   For a --dynamic-etcd-nodes deployment, Pachyderm replaces the
              etcd Deployment manifest with a StatefulSet.
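The workload kind for each component depends only on the etcd mode and the --no-dashboard flag. The sketch below maps components to workload kinds and reports which components in a list of (kind, name) pairs do not match. The function names and mode labels are hypothetical. The comparison assumes that each workload's metadata.name equals its component name, which may not hold for your manifest.

```python
def workload_kinds(etcd_mode, no_dashboard=False):
    """Map each Pachyderm component to the workload kind described above."""
    if etcd_mode not in ("static", "dynamic"):
        raise ValueError(f"unknown etcd_mode: {etcd_mode!r}")
    workloads = {
        # --dynamic-etcd-nodes replaces the etcd Deployment with a StatefulSet.
        "etcd": "StatefulSet" if etcd_mode == "dynamic" else "Deployment",
        "pachd": "Deployment",
    }
    if not no_dashboard:
        workloads["dash"] = "Deployment"
    return workloads


def check_workloads(objects, etcd_mode, no_dashboard=False):
    """Return the sorted components whose workload kind differs from the expectation.

    objects: iterable of (kind, name) pairs. Assumption: each workload's
    metadata.name equals the component name.
    """
    expected = workload_kinds(etcd_mode, no_dashboard)
    actual = {
        name: kind
        for kind, name in objects
        if kind in ("Deployment", "StatefulSet")
    }
    return sorted(c for c, k in expected.items() if actual.get(c) != k)
```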

Pachyderm Kubernetes secrets manifests

Manifest      Description
Secret        Pachyderm uses the Kubernetes Secret manifest to store the
              credentials that are necessary to access object storage. The
              final manifest uses the command-line arguments that you pass to
              the pachctl deploy command to store parameters, such as region,
              secret, token, and endpoint, that are used to access the object
              store. The exact values in the Secret depend on the kind of
              object store that you configure for your deployment. You can
              update the values after the deployment either by using kubectl
              to deploy a new Secret or by running the pachctl deploy storage
              command.
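Kubernetes stores the values under a Secret's data field base64-encoded. To inspect the values in a generated Secret, or to prepare a replacement Secret for kubectl, you have to decode or encode them. The following is a minimal sketch. The Secret name and key in the usage block are hypothetical placeholders; take the real name and keys from your generated manifest. The Opaque type is an assumption of this sketch.

```python
import base64
import json


def decode_secret(secret):
    """Return a Secret's values as plain strings.

    Kubernetes base64-encodes values under `data`; values under
    `stringData` are plain text and take precedence for duplicate keys.
    """
    if secret.get("kind") != "Secret":
        raise ValueError("not a Secret object")
    values = {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }
    values.update(secret.get("stringData", {}))
    return values


def build_secret(name, values):
    """Build an Opaque Secret object with base64-encoded data values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


if __name__ == "__main__":
    # Hypothetical name and key; use the ones from your generated manifest.
    secret = build_secret("example-storage-secret", {"example-region": "us-east-1"})
    print(json.dumps(secret, indent=2))
```

You could save the printed JSON to a file and apply it with kubectl apply -f <file>. For the new Secret to replace the existing one, its name and keys must match those in your deployment.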