Deploy Target HCVs

About #

The Deploy Target section defines where you’re deploying Pachyderm; this is typically located at the top of your values.yaml file.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
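For example, a minimal sketch (assuming the chart's top-level deployTarget key; the accepted values listed in the comment are an assumption to verify against your chart version):

deployTarget: "LOCAL" # where Pachyderm is deployed; e.g. AMAZON, GOOGLE, MICROSOFT, MINIO, CUSTOM, or LOCAL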
Global HCVs

About #

The Global section configures the connection to the PostgreSQL database. By default, it uses the included Postgres service.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
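A minimal sketch, assuming the global.postgresql keys used by the chart's bundled database (key names to verify against your chart version):

global:
  postgresql:
    postgresqlHost: "postgres" # hostname of the PostgreSQL service
    postgresqlPort: "5432"
    postgresqlUsername: "pachyderm"
    postgresqlPassword: "" # assumption: set this only when pointing at your own database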
Console HCVs

About #

Console is the Graphical User Interface (GUI) for Pachyderm. Users who prefer to navigate and manage their project resources visually can connect to Console by authenticating against your configured OIDC provider. For personal-machine installations of Pachyderm, a user may access Console via localhost without authentication.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
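A minimal sketch, assuming the chart's console.enabled key:

console:
  enabled: true # deploys Console alongside pachd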
Enterprise Server HCVs

About #

Enterprise Server is a production management layer that centralizes license registration for multiple Pachyderm clusters and the setup of user authorization/authentication via OIDC.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
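A minimal sketch, assuming the chart's enterpriseServer block (key name is an assumption to verify against your chart version):

enterpriseServer:
  enabled: true # deploys the Enterprise Server management layer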
ETCD HCVs

About #

The ETCD section configures the ETCD cluster in the deployment.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.

etcd:
  affinity: {}
  annotations: {}
  dynamicNodes: 1 # sets the number of nodes in the etcd StatefulSet; analogous to the --dynamic-etcd-nodes argument to pachctl
  image:
    repository: "pachyderm/etcd"
    tag: "v3.5.1"
    pullPolicy: "IfNotPresent"
  maxTxnOps: 10000 # sets the --max-txn-ops in the container args
  priorityClassName: ""
  nodeSelector: {}
  podLabels: {} # specifies labels to add to the etcd pod.
  
  resources: # specifies the resource request and limits
    {}

    #limits:
    #  cpu: "1"
    #  memory: "2G"
    #requests:
    #  cpu: "1"
    #  memory: "2G"

  storageClass: "" # defines what existing storage class to use; analogous to the --etcd-storage-class argument to pachctl
  storageSize: 10Gi # specifies the size of the volume to use for etcd.
  service:
    annotations: {} # specifies annotations to add to the etcd service.
    labels: {} # specifies labels to add to the etcd service.
    type: ClusterIP # specifies the Kubernetes type of the etcd service.
  tolerations: []
Ingress HCVs

About #

⚠️

ingress will be removed from the Helm chart once deploying Pachyderm with a proxy becomes mandatory.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
Loki HCVs

About #

Loki Stack contains values that are passed to the loki-stack subchart. For more details on each service, see its official documentation.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.

loki-stack:
  loki:
    serviceAccount:
      automountServiceAccountToken: false
    persistence:
      enabled: true
      accessModes:
        - ReadWriteOnce
      size: 10Gi
      # More info for setting up storage classes on various cloud providers:
      # AWS: https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html
      # GCP: https://cloud.google.com/compute/docs/disks/performance#disk_types
      # Azure: https://docs.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes
      storageClassName: ""
      annotations: {}
    priorityClassName: ""
    nodeSelector: {}
    tolerations: []
    config:
      limits_config:
        retention_period: 24h
        retention_stream:
          - selector: '{suite="pachyderm"}'
            priority: 1
            period: 168h # = 1 week
  grafana:
    enabled: false
  promtail:
    config:
      clients:
        - url: "http://{{ .Release.Name }}-loki:3100/loki/api/v1/push"
      snippets:
        # The scrapeConfigs section is copied from loki-stack-2.6.4
        # The pipeline_stages.match stanza has been added to prevent multiple lokis in a cluster from mixing their logs.
        scrapeConfigs: |
          - job_name: kubernetes-pods
            pipeline_stages:
              {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
              - match:
                  selector: '{namespace!="{{ .Release.Namespace }}"}'
                  action: drop
            kubernetes_sd_configs:
              - role: pod
            relabel_configs:
              - source_labels:
                  - __meta_kubernetes_pod_controller_name
                regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
                action: replace
                target_label: __tmp_controller_name
              - source_labels:
                  - __meta_kubernetes_pod_label_app_kubernetes_io_name
                  - __meta_kubernetes_pod_label_app
                  - __tmp_controller_name
                  - __meta_kubernetes_pod_name
                regex: ^;*([^;]+)(;.*)?$
                action: replace
                target_label: app
              - source_labels:
                  - __meta_kubernetes_pod_label_app_kubernetes_io_instance
                  - __meta_kubernetes_pod_label_release
                regex: ^;*([^;]+)(;.*)?$
                action: replace
                target_label: instance
              - source_labels:
                  - __meta_kubernetes_pod_label_app_kubernetes_io_component
                  - __meta_kubernetes_pod_label_component
                regex: ^;*([^;]+)(;.*)?$
                action: replace
                target_label: component
              {{- if .Values.config.snippets.addScrapeJobLabel }}
              - replacement: kubernetes-pods
                target_label: scrape_job
              {{- end }}
              {{- toYaml .Values.config.snippets.common | nindent 4 }}
              {{- with .Values.config.snippets.extraRelabelConfigs }}
              {{- toYaml . | nindent 4 }}
              {{- end }}
        pipelineStages:
          - cri: {}
        common:
          # This is copy and paste of existing actions, so we don't lose them.
          # Cf. https://github.com/grafana/loki/issues/3519#issuecomment-1125998705
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_node_name
            target_label: node_name
          - action: replace
            source_labels:
              - __meta_kubernetes_namespace
            target_label: namespace
          - action: replace
            replacement: $1
            separator: /
            source_labels:
              - namespace
              - app
            target_label: job
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_name
            target_label: pod
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_container_name
            target_label: container
          - action: replace
            replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
              - __meta_kubernetes_pod_uid
              - __meta_kubernetes_pod_container_name
            target_label: __path__
          - action: replace
            regex: true/(.*)
            replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
              - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
              - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
              - __meta_kubernetes_pod_container_name
            target_label: __path__
          - action: keep
            regex: pachyderm
            source_labels:
              - __meta_kubernetes_pod_label_suite
          # this gets all kubernetes labels as well
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
    livenessProbe:
      failureThreshold: 5
      tcpSocket:
        port: http-metrics
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
PachD HCVs

About #

The PachD section configures pachd, Pachyderm's main daemon and API server.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
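A minimal sketch, assuming the pachd block's standard Kubernetes resources key:

pachd:
  resources: # resource requests/limits for the pachd pod, in standard Kubernetes format
    requests:
      cpu: "1"
      memory: "2G"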
PachW HCVs

About #

PachW enables fine-grained control of where compaction and object-storage interaction occur by running storage tasks in a dedicated Kubernetes deployment. Users can configure PachW’s min and max replicas as well as define nodeSelectors, tolerations, and resource requests. Using PachW allows power users to save on costs by claiming fewer resources and running storage tasks on less expensive nodes.

⚠️

If you are upgrading to 2.5.0+ for the first time and you wish to use PachW, you must calculate how many maxReplicas you need. By default, PachW is set to maxReplicas: 1; that is not sufficient for production runs.

maxReplicas #

You should set the maxReplicas value to at least match the number of pipeline replicas that you have. For high performance, we suggest taking the following approach:

number of pipelines * highest parallelism spec * 1.5 = maxReplicas

Let’s say you have 6 pipelines. One of these pipelines has a parallelism spec value of 6, and the rest are 5 or fewer.

6 * 6 * 1.5 = 54

In this case, you would set maxReplicas: 54.

minReplicas #

Workloads that constantly process storage and compaction tasks because they are committing rapidly may want to increase minReplicas to have instances on standby.

nodeSelectors #

Workloads that utilize GPUs and other expensive resources may want to add a node selector to scope PachW instances to less expensive nodes.

Values #

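A minimal sketch, assuming the pachw block and key names from the 2.5+ chart (the node label shown is a hypothetical placeholder):

pachw:
  minReplicas: 0  # instances kept on standby for storage and compaction tasks
  maxReplicas: 54 # from the sizing formula above
  nodeSelector:
    node-pool: cheap-pool # hypothetical label scoping PachW to less expensive nodes
  tolerations: []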
Kube Event Tail HCVs

About #

Kube Event Tail deploys a lightweight app that watches Kubernetes events and echoes them into logs.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
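A minimal sketch, assuming the chart's kubeEventTail.enabled key:

kubeEventTail:
  enabled: true # watches Kubernetes events and echoes them into logs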
PGBouncer HCVs

About #

The PGBouncer section configures a PGBouncer Postgres connection pooler.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.

pgbouncer:
  service:
    type: ClusterIP # defines the Kubernetes service type.
  annotations: {}
  priorityClassName: ""
  nodeSelector: {}
  tolerations: []
  image:
    repository: pachyderm/pgbouncer
    tag: 1.16.1-debian-10-r82
  resources: # defines resources in standard kubernetes format; unset by default.
    {}

    #limits:
    #  cpu: "1"
    #  memory: "2G"
    #requests:
    #  cpu: "1"
    #  memory: "2G"

  maxConnections: 1000 # defines the maximum number of concurrent connections into pgbouncer.
  defaultPoolSize: 20 # specifies the maximum number of concurrent connections from pgbouncer to the postgresql database.
PostgreSQL Subchart HCVs

About #

The PostgreSQL section controls the Bitnami PostgreSQL subchart. Pachyderm runs on Kubernetes, is backed by an object store of your choice, and comes with a bundled version of PostgreSQL (metadata storage) by default.

We recommend disabling this bundled PostgreSQL and using a managed database instance (such as RDS, CloudSQL, or PostgreSQL Server) for production environments.

See storage class details for your provider:

  • AWS | Min: 500Gi (GP2) / 1,500 IOPS
  • GCP | Min: 50Gi / 1,500 IOPS
  • Azure | Min: 256Gi / 1,100 IOPS

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
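A minimal sketch for production, disabling the bundled subchart and pointing the Global section at a managed database (the hostname is a hypothetical placeholder; verify key names against your chart version):

postgresql:
  enabled: false # disable the bundled Bitnami PostgreSQL
global:
  postgresql:
    postgresqlHost: "my-db.example.com" # hypothetical managed-database endpoint
    postgresqlPort: "5432"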
CloudSQL Auth Proxy HCVs

About #

The CloudSQL Auth Proxy section configures the CloudSQL Auth Proxy for deploying Pachyderm on GCP with CloudSQL.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
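A minimal sketch, assuming the chart's cloudsqlAuthProxy block (the connection name and service account are hypothetical placeholders):

cloudsqlAuthProxy:
  enabled: true
  connectionName: "my-project:us-central1:my-instance" # hypothetical CloudSQL instance
  serviceAccount: "my-sa@my-project.iam.gserviceaccount.com" # hypothetical GCP service account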
OpenID Connect HCVs

About #

The OIDC section of the Helm chart enables you to set up authentication through upstream IDPs. To use authentication, you must have an Enterprise license.

We recommend setting up this section alongside the Enterprise Server section of your Helm chart so that you can easily scale multiple clusters using the same authentication configurations.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
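A minimal sketch, assuming the oidc.upstreamIDPs list of Dex connector definitions (the connector shown is a hypothetical example; issuer, client, and redirect values are placeholders):

oidc:
  upstreamIDPs:
    - id: okta # hypothetical upstream IDP
      name: okta
      type: oidc
      jsonConfig: >-
        {
          "issuer": "https://dev-123456.okta.com",
          "clientID": "client_id",
          "clientSecret": "notsecret",
          "redirectURI": "http://localhost:30658/dex/callback"
        }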
Test Connection HCVs

About #

The Test Connection section is used by Pachyderm to test the connection during installation. This config is used by organizations that do not have permission to pull Docker images directly from the Internet and instead need to mirror them locally.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.

testConnection:
  image:
    repository: alpine
    tag: latest
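
For organizations mirroring images locally, a sketch pointing at a local registry (the registry hostname is a hypothetical placeholder):

testConnection:
  image:
    repository: registry.example.com/mirror/alpine # hypothetical local mirror
    tag: latest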
Proxy HCVs

About #

Proxy is a service that handles all Pachyderm traffic (S3, Console, OIDC, Dex, gRPC) on a single port; it’s great for exposing directly to the Internet.

Values #

The following tabs show commonly used configurations for this section of your values.yaml Helm chart.
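A minimal sketch, assuming the proxy.enabled, proxy.host, and proxy.service.type keys (the hostname is a hypothetical placeholder):

proxy:
  enabled: true
  host: "pachyderm.example.com" # hypothetical external hostname
  service:
    type: LoadBalancer # expose the proxy externally; use ClusterIP to keep it in-cluster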
