OpenShift is a popular enterprise Kubernetes distribution. Pachyderm can run on OpenShift with a few small changes to the deployment process, which are outlined below.

Deploy Pachyderm

  1. How you deploy Pachyderm on OpenShift depends largely on where OpenShift is deployed:
  • OpenShift Deployed on AWS
  • OpenShift Deployed on GCP
  • OpenShift Deployed on Azure
  • OpenShift Deployed on-premise
  2. Replace hostPath with emptyDir in your cluster manifest. The manifest is generated by the pachctl deploy ... command, or you can write it manually. To generate the manifest without deploying it, run pachctl deploy ... with the --dry-run flag.

          "spec": {
            "volumes": [
              {
                "name": "pach-disk",
                "emptyDir": {}
              }
     ... <snip>  ...
          "spec": {
            "volumes": [
              {
                "name": "etcd-storage",
                "emptyDir": {}
              }

    Note that emptyDir does not persist your data. To persist your data, configure a persistent volume or hostPath instead.

  3. Deploy the Pachyderm manifest that you modified:

    $ oc create -f pachyderm.json

    You can check the cluster status with oc get pods, just as you would in upstream Kubernetes:

    $ oc get pods
    NAME                     READY     STATUS    RESTARTS   AGE
    dash-6c9dc97d9c-89dv9    2/2       Running   0          1m
    etcd-0                   1/1       Running   0          4m
    pachd-65fd68d6d4-8vjq7   1/1       Running   0          4m
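
The hostPath replacement described above can also be scripted instead of edited by hand. The following is a minimal sketch, not part of pachctl: it assumes jq is installed and that the manifest is JSON, and the function name hostpath_to_emptydir and the output file name in the usage note are hypothetical. It rewrites every object that has a hostPath key, so review the result before you deploy it.

```shell
# Sketch only: assumes jq is installed and the manifest is JSON
# (for example, the output of pachctl deploy ... --dry-run).
# hostpath_to_emptydir is a hypothetical helper name.
hostpath_to_emptydir() {
  manifest="$1"
  # For every object that has a "hostPath" key, drop that key and
  # put an empty "emptyDir" volume source in its place.
  jq '(.. | objects | select(has("hostPath"))) |= (del(.hostPath) + {"emptyDir": {}})' "$manifest"
}
```

For example, hostpath_to_emptydir pachyderm.json > pachyderm-openshift.json writes the modified manifest to a new file, which you can then pass to oc create -f. As with the manual edit, emptyDir volumes do not persist data.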

Configure your cluster to run pipelines

  1. Add the cluster-reader and edit roles to the pachyderm service account:

    $ oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:<PROJECT_NAME>:pachyderm
    $ oadm policy add-cluster-role-to-user edit system:serviceaccount:<PROJECT_NAME>:pachyderm
  2. Add the pachyderm service account to the pipeline Pod (ReplicationController).

    $ oc patch rc pipeline-edges-v1 -p 'spec:
      template:
        spec:
          serviceAccount: pachyderm
          serviceAccountName: pachyderm'

    Alternatively, edit the ReplicationController manually with oc edit rc <RC_PIPELINE> -o json and set the service account fields in the pod spec:

                "dnsPolicy": "ClusterFirst",
                "serviceAccountName": "pachyderm",
                "serviceAccount": "pachyderm",
                "securityContext": {}
  3. Replace hostPath with emptyDir. Again, note that emptyDir does not persist your data. To persist your data, configure a persistent volume or hostPath instead.

  4. Redeploy the updated Pods.

    $ oc scale rc pipeline-edges-v1 --replicas=0
    $ oc scale rc pipeline-edges-v1 --replicas=4

    You can then verify that the pipeline pods are running and that the job succeeded:

    $ oc get pods
    NAME                      READY     STATUS    RESTARTS   AGE
    etcd-kbi4n                1/1       Running   0          1h
    pachd-z3b7y               1/1       Running   0          1h
    pipeline-edges-v1-28vdj   1/1       Running   0          12s
    pipeline-edges-v1-fpa8v   1/1       Running   0          12s
    pipeline-edges-v1-mshi0   1/1       Running   0          12s
    pipeline-edges-v1-yx2wa   1/1       Running   0          12s
    $ pachctl list-job
    ID                                   OUTPUT COMMIT                          STARTED        DURATION   RESTART PROGRESS STATE
    1b2c1b49-f536-484f-b0e3-07b3906572be edges/006f0aecb2b048d5b5edee0cdb766879 55 minutes ago 51 minutes 0       1 / 1    success
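
The role grants and the redeploy from the steps above can be collected into a small script. This is a sketch under assumptions: the run and configure_pipeline_rc helpers and the DRY_RUN variable are hypothetical names, and the project, ReplicationController name, and replica count are supplied by the caller. With DRY_RUN left at its default of 1, the commands are only printed, so you can check them before running anything against the cluster.

```shell
# Sketch only: run, configure_pipeline_rc and DRY_RUN are hypothetical names.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

configure_pipeline_rc() {
  project="$1"   # OpenShift project that contains the pachyderm service account
  rc="$2"        # pipeline ReplicationController, e.g. pipeline-edges-v1
  replicas="$3"  # number of pipeline pods to run after the redeploy
  sa="system:serviceaccount:${project}:pachyderm"

  # Grant both cluster roles to the pachyderm service account.
  for role in cluster-reader edit; do
    run oadm policy add-cluster-role-to-user "$role" "$sa"
  done

  # Scale to zero and back up so that the pods are recreated.
  run oc scale rc "$rc" --replicas=0
  run oc scale rc "$rc" --replicas="$replicas"
}
```

Calling configure_pipeline_rc with a project name, pipeline-edges-v1, and 4 prints the four commands shown in the steps above. Setting DRY_RUN=0 beforehand executes them instead. The service account patch is left out of the sketch, so apply it separately before the redeploy.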

Problems related to OpenShift deployment are tracked in this issue. If you have additional related questions, please ask them on Pachyderm's Slack channel or via email.