Helm Chart Values
Learn about the configurable helm chart attributes for Pachyderm.
March 24, 2023
This document discusses each of the fields present in the values.yaml
that can be used to deploy with Helm.
To see how to use a helm values files to customize your deployment, refer to our Helm Deployment Documentation section.
You rarely need to specify all the fields. Most fields either come with sensible defaults or can be empty.
Values that are unchanged from the defaults can be omitted from the values file you supply at installation.
Take a look at our deployment instructions locally or in the cloud to identify which of those are required for your deployment target.
Values.yaml #
The following section displays the complete list of fields available in the values.yaml. Each section is further detailed in its own sub-chapter.
# SPDX-FileCopyrightText: Pachyderm, Inc. <info@pachyderm.com>
# SPDX-License-Identifier: Apache-2.0
# Deploy Target configures the storage backend to use and cloud provider
# settings (storage classes, etc). It must be one of GOOGLE, AMAZON,
# MINIO, MICROSOFT, CUSTOM or LOCAL.
deployTarget: ""
global:
postgresql:
# The auth type to use with postgres and pg-bouncer; md5 is the default.
# Options: "scram-sha-256"
postgresqlAuthType: "md5"
# postgresqlUsername is the username to access the pachyderm and dex databases
postgresqlUsername: "pachyderm"
# postgresqlPassword to access the postgresql database. We set a default well-known password to
# facilitate easy upgrades when testing locally. Any sort of install that needs to be secure
# must specify a secure password here, or provide the postgresqlExistingSecretName and
# postgresqlExistingSecretKey secret. If using an external Postgres instance (CloudSQL / RDS /
# etc.), this is the password that Pachyderm will use to connect to it.
postgresqlPassword: "insecure-user-password"
# When installing a local Postgres instance, postgresqlPostgresPassword defines the root
# ('postgres') user's password. It must remain consistent between upgrades, and must be
# explicitly set to a value if security is desired. Pachyderm does not use this account; this
# password is only required so that administrators can manually perform administrative tasks.
postgresqlPostgresPassword: "insecure-root-password"
# If you want to supply the postgresql password in an existing secret, leave Password blank and
# Supply the name of the existing secret in the namespace and the key in that secret with the password
postgresqlExistingSecretName: ""
postgresqlExistingSecretKey: ""
# postgresqlDatabase is the database name where pachyderm data will be stored
postgresqlDatabase: "pachyderm"
# The postgresql database host to connect to. Defaults to postgres service in subchart
postgresqlHost: "postgres"
# The postgresql database port to connect to. Defaults to postgres server in subchart
postgresqlPort: "5432"
# postgresqlSSL is the SSL mode to use for pg-bouncer connecting to Postgres, for the default local postgres it is disabled
postgresqlSSL: "disable"
# CA Certificate required to connect to Postgres
postgresqlSSLCACert: ""
# TLS Secret with cert/key to connect to Postgres
postgresqlSSLSecret: ""
# Indicates the DB name that dex connects to
# Indicates the DB name that dex connects to. Defaults to "Dex" if not set.
identityDatabaseFullNameOverride: ""
# imagePullSecrets allow you to pull images from private repositories, these will also be added to pipeline workers
# https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
# Example:
# imagePullSecrets:
# - regcred
imagePullSecrets: []
# when set, the certificate file in pachd-tls-cert will be loaded as the root certificate for pachd, console, and enterprise-server pods
customCaCerts: false
# Sets the HTTP/S proxy server address for console, pachd, and enterprise server. (This is for
# traffic leaving the cluster, not traffic coming into the cluster.)
proxy: ""
# If proxy is set, this allows you to set a comma-separated list of destinations that bypass the proxy
noProxy: ""
# Set security context runAs users. If running on openshift, set enabled to false as openshift creates its own contexts.
securityContexts:
enabled: true
console:
# enabled controls whether the console manifests are created or not.
enabled: true
annotations: {}
image:
# repository is the image repo to pull from; together with tag it
# replicates the --console-image & --registry arguments to pachctl
# deploy.
repository: "pachyderm/haberdashery"
pullPolicy: "IfNotPresent"
# tag is the image repo to pull from; together with repository it
# replicates the --console-image argument to pachctl deploy.
tag: "2.3.3-1"
priorityClassName: ""
nodeSelector: {}
tolerations: []
# podLabels specifies labels to add to the console pod.
podLabels: {}
# resources specifies the resource request and limits.
resources:
{}
#limits:
# cpu: "1"
# memory: "2G"
#requests:
# cpu: "1"
# memory: "2G"
config:
reactAppRuntimeIssuerURI: "" # Inferred if running locally or using ingress
oauthRedirectURI: "" # Infered if running locally or using ingress
oauthClientID: "console"
oauthClientSecret: "" # Autogenerated on install if blank
# oauthClientSecretSecretName is used to set the OAuth Client Secret via an existing k8s secret.
# The value is pulled from the key, "OAUTH_CLIENT_SECRET".
oauthClientSecretSecretName: ""
graphqlPort: 4000
pachdAddress: "pachd-peer:30653"
disableTelemetry: false # Disables analytics and error data collection
service:
annotations: {}
# labels specifies labels to add to the console service.
labels: {}
# type specifies the Kubernetes type of the console service.
type: ClusterIP
etcd:
affinity: {}
annotations: {}
# dynamicNodes sets the number of nodes in the etcd StatefulSet. It
# is analogous to the --dynamic-etcd-nodes argument to pachctl
# deploy.
dynamicNodes: 1
image:
repository: "pachyderm/etcd"
tag: "v3.5.1"
pullPolicy: "IfNotPresent"
# maxTxnOps sets the --max-txn-ops in the container args
maxTxnOps: 10000
priorityClassName: ""
nodeSelector: {}
# podLabels specifies labels to add to the etcd pod.
podLabels: {}
# resources specifies the resource request and limits
resources:
{}
#limits:
# cpu: "1"
# memory: "2G"
#requests:
# cpu: "1"
# memory: "2G"
# storageClass indicates the etcd should use an existing
# StorageClass for its storage. It is analogous to the
# --etcd-storage-class argument to pachctl deploy.
# More info for setting up storage classes on various cloud providers:
# AWS: https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html
# GCP: https://cloud.google.com/compute/docs/disks/performance#disk_types
# Azure: https://docs.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes
storageClass: ""
# storageSize specifies the size of the volume to use for etcd.
# Recommended Minimum Disk size for Microsoft/Azure: 256Gi - 1,100 IOPS https://azure.microsoft.com/en-us/pricing/details/managed-disks/
# Recommended Minimum Disk size for Google/GCP: 50Gi - 1,500 IOPS https://cloud.google.com/compute/docs/disks/performance
# Recommended Minimum Disk size for Amazon/AWS: 500Gi (GP2) - 1,500 IOPS https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html
storageSize: 10Gi
service:
# annotations specifies annotations to add to the etcd service.
annotations: {}
# labels specifies labels to add to the etcd service.
labels: {}
# type specifies the Kubernetes type of the etcd service.
type: ClusterIP
tolerations: []
enterpriseServer:
enabled: false
affinity: {}
annotations: {}
tolerations: []
priorityClassName: ""
nodeSelector: {}
service:
type: ClusterIP
apiGRPCPort: 31650
prometheusPort: 31656
oidcPort: 31657
identityPort: 31658
s3GatewayPort: 31600
# There are three options for TLS:
# 1. Disabled
# 2. Enabled, existingSecret, specify secret name
# 3. Enabled, newSecret, must specify cert, key and name
tls:
enabled: false
secretName: ""
newSecret:
create: false
crt: ""
key: ""
resources:
{}
#limits:
# cpu: "1"
# memory: "2G"
#requests:
# cpu: "1"
# memory: "2G"
# podLabels specifies labels to add to the pachd pod.
podLabels: {}
clusterDeploymentID: ""
image:
repository: "pachyderm/pachd"
pullPolicy: "IfNotPresent"
# tag defaults to the chart’s specified appVersion.
tag: ""
ingress:
enabled: false
annotations: {}
host: ""
# when set to true, uriHttpsProtoOverride will add the https protocol to the ingress URI routes without configuring certs
uriHttpsProtoOverride: false
# There are three options for TLS:
# 1. Disabled
# 2. Enabled, existingSecret, specify secret name
# 3. Enabled, newSecret, must specify cert, key, secretName and set newSecret.create to true
tls:
enabled: false
secretName: ""
newSecret:
create: false
crt: ""
key: ""
# loki-stack contains values that will be passed to the loki-stack subchart
loki-stack:
loki:
serviceAccount:
automountServiceAccountToken: false
persistence:
enabled: true
accessModes:
- ReadWriteOnce
size: 10Gi
# More info for setting up storage classes on various cloud providers:
# AWS: https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html
# GCP: https://cloud.google.com/compute/docs/disks/performance#disk_types
# Azure: https://docs.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes
storageClassName: ""
annotations: {}
priorityClassName: ""
nodeSelector: {}
tolerations: []
config:
limits_config:
retention_period: 24h
retention_stream:
- selector: '{suite="pachyderm"}'
priority: 1
period: 168h # = 1 week
grafana:
enabled: false
promtail:
config:
clients:
- url: "http://{{ .Release.Name }}-loki:3100/loki/api/v1/push"
snippets:
# The scrapeConfigs section is copied from loki-stack-2.6.4
# The pipeline_stages.match stanza has been added to prevent multiple lokis in a cluster from mixing their logs.
scrapeConfigs: |
- job_name: kubernetes-pods
pipeline_stages:
{{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
- match:
selector: '{namespace!="{{ .Release.Namespace }}"}'
action: drop
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels:
- __meta_kubernetes_pod_controller_name
regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
action: replace
target_label: __tmp_controller_name
- source_labels:
- __meta_kubernetes_pod_label_app_kubernetes_io_name
- __meta_kubernetes_pod_label_app
- __tmp_controller_name
- __meta_kubernetes_pod_name
regex: ^;*([^;]+)(;.*)?$
action: replace
target_label: app
- source_labels:
- __meta_kubernetes_pod_label_app_kubernetes_io_instance
- __meta_kubernetes_pod_label_release
regex: ^;*([^;]+)(;.*)?$
action: replace
target_label: instance
- source_labels:
- __meta_kubernetes_pod_label_app_kubernetes_io_component
- __meta_kubernetes_pod_label_component
regex: ^;*([^;]+)(;.*)?$
action: replace
target_label: component
{{- if .Values.config.snippets.addScrapeJobLabel }}
- replacement: kubernetes-pods
target_label: scrape_job
{{- end }}
{{- toYaml .Values.config.snippets.common | nindent 4 }}
{{- with .Values.config.snippets.extraRelabelConfigs }}
{{- toYaml . | nindent 4 }}
{{- end }}
pipelineStages:
- cri: {}
common:
# This is copy and paste of existing actions, so we don't lose them.
# Cf. https://github.com/grafana/loki/issues/3519#issuecomment-1125998705
- action: replace
source_labels:
- __meta_kubernetes_pod_node_name
target_label: node_name
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- action: replace
replacement: $1
separator: /
source_labels:
- namespace
- app
target_label: job
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: pod
- action: replace
source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- action: replace
replacement: /var/log/pods/*$1/*.log
separator: /
source_labels:
- __meta_kubernetes_pod_uid
- __meta_kubernetes_pod_container_name
target_label: __path__
- action: replace
regex: true/(.*)
replacement: /var/log/pods/*$1/*.log
separator: /
source_labels:
- __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
- __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
- __meta_kubernetes_pod_container_name
target_label: __path__
- action: keep
regex: pachyderm
source_labels:
- __meta_kubernetes_pod_label_suite
# this gets all kubernetes labels as well
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
livenessProbe:
failureThreshold: 5
tcpSocket:
port: http-metrics
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
pachd:
enabled: true
preflightChecks:
# if enabled runs kube validation preflight checks.
enabled: true
affinity: {}
annotations: {}
# clusterDeploymentID sets the Pachyderm cluster ID.
clusterDeploymentID: ""
configJob:
annotations: {}
# goMaxProcs is passed as GOMAXPROCS to the pachd container.
goMaxProcs: 0
image:
repository: "pachyderm/pachd"
pullPolicy: "IfNotPresent"
# tag defaults to the chart’s specified appVersion.
# This sets the worker image tag as well (they should be kept in lock step)
tag: ""
logFormat: "json"
logLevel: "info"
# If lokiDeploy is true, a Pachyderm-specific instance of Loki will
# be deployed.
lokiDeploy: true
# lokiLogging enables Loki logging if set.
lokiLogging: true
metrics:
# enabled sets the METRICS environment variable if set.
enabled: true
# endpoint should be the URL of the metrics endpoint.
endpoint: ""
priorityClassName: ""
nodeSelector: {}
# podLabels specifies labels to add to the pachd pod.
podLabels: {}
# resources specifies the resource requests and limits
# replicas sets the number of pachd running pods
replicas: 1
resources:
{}
#limits:
# cpu: "1"
# memory: "2G"
#requests:
# cpu: "1"
# memory: "2G"
# requireCriticalServersOnly only requires the critical pachd
# servers to startup and run without errors. It is analogous to the
# --require-critical-servers-only argument to pachctl deploy.
requireCriticalServersOnly: false
# If enabled, External service creates a service which is safe to
# be exposed externally
externalService:
enabled: false
# (Optional) specify the existing IP Address of the load balancer
loadBalancerIP: ""
apiGRPCPort: 30650
s3GatewayPort: 30600
annotations: {}
service:
# labels specifies labels to add to the pachd service.
labels: {}
# type specifies the Kubernetes type of the pachd service.
type: "ClusterIP"
annotations: {}
apiGRPCPort: 30650
prometheusPort: 30656
oidcPort: 30657
identityPort: 30658
s3GatewayPort: 30600
#apiGrpcPort:
# expose: true
# port: 30650
# DEPRECATED: activateEnterprise is no longer used.
activateEnterprise: false
## if pachd.activateEnterpriseMember is set, enterprise will be activated and connected to an existing enterprise server.
## if pachd.enterpriseLicenseKey is set, enterprise will be activated.
activateEnterpriseMember: false
## if pachd.activateAuth is set, auth will be bootstrapped by the config-job.
activateAuth: true
## the license key used to activate enterprise features
enterpriseLicenseKey: ""
# enterpriseLicenseKeySecretName is used to pass the enterprise license key value via an existing k8s secret.
# The value is pulled from the key, "enterprise-license-key".
enterpriseLicenseKeySecretName: ""
# if a token is not provided, a secret will be autogenerated on install and stored in the k8s secret 'pachyderm-bootstrap-config.rootToken'
rootToken: ""
# rootTokenSecretName is used to pass the rootToken value via an existing k8s secret
# The value is pulled from the key, "root-token".
rootTokenSecretName: ""
# if a secret is not provided, a secret will be autogenerated on install and stored in the k8s secret 'pachyderm-bootstrap-config.enterpriseSecret'
enterpriseSecret: ""
# enterpriseSecretSecretName is used to pass the enterprise secret value via an existing k8s secret.
# The value is pulled from the key, "enterprise-secret".
enterpriseSecretSecretName: ""
# if a secret is not provided, a secret will be autogenerated on install and stored in the k8s secret 'pachyderm-bootstrap-config.authConfig.clientSecret'
oauthClientID: pachd
oauthClientSecret: ""
# oauthClientSecretSecretName is used to set the OAuth Client Secret via an existing k8s secret.
# The value is pulled from the key, "pachd-oauth-client-secret".
oauthClientSecretSecretName: ""
oauthRedirectURI: ""
# DEPRECATED: enterpriseRootToken is deprecated, in favor of enterpriseServerToken
# NOTE only used if pachd.activateEnterpriseMember == true
enterpriseRootToken: ""
# DEPRECATED: enterpriseRootTokenSecretName is deprecated in favor of enterpriseServerTokenSecretName
# enterpriseRootTokenSecretName is used to pass the enterpriseRootToken value via an existing k8s secret.
# The value is pulled from the key, "enterprise-root-token".
enterpriseRootTokenSecretName: ""
# enterpriseServerToken represents a token that can authenticate to a separate pachyderm enterprise server,
# and is used to complete the enterprise member registration process for this pachyderm cluster.
# The user backing this token should have either the licenseAdmin & identityAdmin roles assigned, or
# the clusterAdmin role.
# NOTE: only used if pachd.activateEnterpriseMember == true
enterpriseServerToken: ""
# enterpriseServerTokenSecretName is used to pass the enterpriseServerToken value via an existing k8s secret.
# The value is pulled from the key, "enterprise-server-token".
enterpriseServerTokenSecretName: ""
# only used if pachd.activateEnterpriseMember == true
enterpriseServerAddress: ""
enterpriseCallbackAddress: ""
# Indicates to pachd whether dex is embedded in its process.
localhostIssuer: "" # "true", "false", or "" (used string as bool doesn't support empty value)
# set the initial pachyderm cluster role bindings, mapping a user to their list of roles
# ex.
# pachAuthClusterRoleBindings:
# robot:wallie:
# - repoReader
# robot:eve:
# - repoWriter
pachAuthClusterRoleBindings: {}
# additionalTrustedPeers is used to configure the identity service to recognize additional OIDC clients as trusted peers of pachd.
# For example, see the following example or the dex docs (https://dexidp.io/docs/custom-scopes-claims-clients/#cross-client-trust-and-authorized-party).
# additionalTrustedPeers:
# - example-app
additionalTrustedPeers: []
serviceAccount:
create: true
additionalAnnotations: {}
name: "pachyderm" #TODO Set default in helpers / Wire up in templates
storage:
# backend configures the storage backend to use. It must be one
# of GOOGLE, AMAZON, MINIO, MICROSOFT or LOCAL. This is set automatically
# if deployTarget is GOOGLE, AMAZON, MICROSOFT, or LOCAL
backend: ""
amazon:
# bucket sets the S3 bucket to use.
bucket: ""
# cloudFrontDistribution sets the CloudFront distribution in the
# storage secrets. It is analogous to the
# --cloudfront-distribution argument to pachctl deploy.
cloudFrontDistribution: ""
customEndpoint: ""
# disableSSL disables SSL. It is analogous to the --disable-ssl
# argument to pachctl deploy.
disableSSL: false
# id sets the Amazon access key ID to use. Together with secret
# and token, it implements the functionality of the
# --credentials argument to pachctl deploy.
id: ""
# logOptions sets various log options in Pachyderm’s internal S3
# client. Comma-separated list containing zero or more of:
# 'Debug', 'Signing', 'HTTPBody', 'RequestRetries',
# 'RequestErrors', 'EventStreamBody', or 'all'
# (case-insensitive). See 'AWS SDK for Go' docs for details.
# logOptions is analogous to the --obj-log-options argument to
# pachctl deploy.
logOptions: ""
# maxUploadParts sets the maximum number of upload parts. It is
# analogous to the --max-upload-parts argument to pachctl
# deploy.
maxUploadParts: 10000
# verifySSL performs SSL certificate verification. It is the
# inverse of the --no-verify-ssl argument to pachctl deploy.
verifySSL: true
# partSize sets the part size for object storage uploads. It is
# analogous to the --part-size argument to pachctl deploy. It
# has to be a string due to Helm and YAML parsing integers as
# floats. Cf. https://github.com/helm/helm/issues/1707
partSize: "5242880"
# region sets the AWS region to use.
region: ""
# retries sets the number of retries for object storage
# requests. It is analogous to the --retries argument to
# pachctl deploy.
retries: 10
# reverse reverses object storage paths. It is analogous to the
# --reverse argument to pachctl deploy.
reverse: true
# secret sets the Amazon secret access key to use. Together with id
# and token, it implements the functionality of the
# --credentials argument to pachctl deploy.
secret: ""
# timeout sets the timeout for object storage requests. It is
# analogous to the --timeout argument to pachctl deploy.
timeout: "5m"
# token optionally sets the Amazon token to use. Together with
# id and secret, it implements the functionality of the
# --credentials argument to pachctl deploy.
token: ""
# uploadACL sets the upload ACL for object storage uploads. It
# is analogous to the --upload-acl argument to pachctl deploy.
uploadACL: "bucket-owner-full-control"
google:
bucket: ""
# cred is a string containing a GCP service account private key,
# in object (JSON or YAML) form. A simple way to pass this on
# the command line is with the set-file flag, e.g.:
#
# helm install pachyderm -f my-values.yaml --set-file storage.google.cred=creds.json pachyderm/pachyderm
cred: ""
# Example:
# cred: |
# {
# "type": "service_account",
# "project_id": "…",
# "private_key_id": "…",
# "private_key": "-----BEGIN PRIVATE KEY-----\n…\n-----END PRIVATE KEY-----\n",
# "client_email": "…@….iam.gserviceaccount.com",
# "client_id": "…",
# "auth_uri": "https://accounts.google.com/o/oauth2/auth",
# "token_uri": "https://oauth2.googleapis.com/token",
# "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
# "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/…%40….iam.gserviceaccount.com"
# }
local:
# hostPath indicates the path on the host where the PFS metadata
# will be stored. It must end in /. It is analogous to the
# --host-path argument to pachctl deploy.
hostPath: ""
requireRoot: true #Root required for hostpath, but we run rootless in CI
microsoft:
container: ""
id: ""
secret: ""
minio:
# minio bucket name
bucket: ""
# the minio endpoint. Should only be the hostname:port, no http/https.
endpoint: ""
# the username/id with readwrite access to the bucket.
id: ""
# the secret/password of the user with readwrite access to the bucket.
secret: ""
# enable https for minio with "true" defaults to "false"
secure: ""
# Enable S3v2 support by setting signature to "1". This feature is being deprecated
signature: ""
# putFileConcurrencyLimit sets the maximum number of files to
# upload or fetch from remote sources (HTTP, blob storage) using
# PutFile concurrently. It is analogous to the
# --put-file-concurrency-limit argument to pachctl deploy.
putFileConcurrencyLimit: 100
# uploadConcurrencyLimit sets the maximum number of concurrent
# object storage uploads per Pachd instance. It is analogous to
# the --upload-concurrency-limit argument to pachctl deploy.
uploadConcurrencyLimit: 100
# The shard size corresponds to the total size of the files in a shard.
# The shard count corresponds to the total number of files in a shard.
# If either criteria is met, a shard will be created.
compactionShardSizeThreshold: 0
compactionShardCountThreshold: 0
ppsWorkerGRPCPort: 1080
# the number of seconds between pfs's garbage collection cycles.
# if this value is set to 0, it will default to pachyderm's internal configuration.
# if this value is less than 0, it will turn off garbage collection.
storageGCPeriod: 0
# the number of seconds between chunk garbage colletion cycles.
# if this value is set to 0, it will default to pachyderm's internal configuration.
# if this value is less than 0, it will turn off chunk garbage collection.
storageChunkGCPeriod: 0
# There are three options for TLS:
# 1. Disabled
# 2. Enabled, existingSecret, specify secret name
# 3. Enabled, newSecret, must specify cert, key and name
tls:
enabled: false
secretName: ""
newSecret:
create: false
crt: ""
key: ""
tolerations: []
worker:
image:
repository: "pachyderm/worker"
pullPolicy: "IfNotPresent"
# Worker tag is set under pachd.image.tag (they should be kept in lock step)
serviceAccount:
create: true
additionalAnnotations: {}
# name sets the name of the worker service account. Analogous to
# the --worker-service-account argument to pachctl deploy.
name: "pachyderm-worker" #TODO Set default in helpers / Wire up in templates
rbac:
# create indicates whether RBAC resources should be created.
# Setting it to false is analogous to passing --no-rbac to pachctl
# deploy.
create: true
kubeEventTail:
# Deploys a lightweight app that watches kubernetes events and echos them to logs.
enabled: true
# clusterScope determines whether kube-event-tail should watch all events or just events in its namespace.
clusterScope: false
image:
repository: pachyderm/kube-event-tail
pullPolicy: "IfNotPresent"
tag: "v0.0.6"
resources:
limits:
cpu: "1"
memory: 100Mi
requests:
cpu: 100m
memory: 45Mi
pgbouncer:
service:
type: ClusterIP
annotations: {}
priorityClassName: ""
nodeSelector: {}
tolerations: []
image:
repository: pachyderm/pgbouncer
tag: 1.16.1-debian-10-r82
resources:
{}
#limits:
# cpu: "1"
# memory: "2G"
#requests:
# cpu: "1"
# memory: "2G"
# maxConnections specifies the maximum number of concurrent connections into pgbouncer.
maxConnections: 1000
# defaultPoolSize specifies the maximum number of concurrent connections from pgbouncer to the postgresql database.
defaultPoolSize: 20
# Note: Postgres values control the Bitnami Postgresql Subchart
postgresql:
# enabled controls whether to install postgres or not.
# If not using the built in Postgres, you must specify a Postgresql
# database server to connect to in global.postgresql
# The enabled value is watched by the 'condition' set on the Postgresql
# dependency in Chart.yaml
enabled: true
image:
tag: "13.3.0"
# DEPRECATED from pachyderm 2.1.5
initdbScripts:
dex.sh: |
#!/bin/bash
set -e
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
CREATE DATABASE dex;
GRANT ALL PRIVILEGES ON DATABASE dex TO "$POSTGRES_USER";
EOSQL
fullnameOverride: postgres
persistence:
# Specify the storage class for the postgresql Persistent Volume (PV)
# See notes in Bitnami chart values.yaml file for more information.
# More info for setting up storage classes on various cloud providers:
# AWS: https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html
# GCP: https://cloud.google.com/compute/docs/disks/performance#disk_types
# Azure: https://docs.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes
storageClass: ""
# storageSize specifies the size of the volume to use for postgresql
# Recommended Minimum Disk size for Microsoft/Azure: 256Gi - 1,100 IOPS https://azure.microsoft.com/en-us/pricing/details/managed-disks/
# Recommended Minimum Disk size for Google/GCP: 50Gi - 1,500 IOPS https://cloud.google.com/compute/docs/disks/performance
# Recommended Minimum Disk size for Amazon/AWS: 500Gi (GP2) - 1,500 IOPS https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html
size: 10Gi
labels:
suite: pachyderm
primary:
priorityClassName: ""
nodeSelector: {}
tolerations: []
readReplicas:
priorityClassName: ""
nodeSelector: {}
tolerations: []
cloudsqlAuthProxy:
# connectionName may be found by running `gcloud sql instances describe INSTANCE_NAME --project PROJECT_ID`
connectionName: ""
serviceAccount: ""
iamLogin: false
port: 5432
enabled: false
image:
# repository is the image repo to pull from; together with tag it
# replicates the --dash-image & --registry arguments to pachctl
# deploy.
repository: "gcr.io/cloudsql-docker/gce-proxy"
pullPolicy: "IfNotPresent"
# tag is the image repo to pull from; together with repository it
# replicates the --dash-image argument to pachctl deploy.
tag: "1.23.0"
priorityClassName: ""
nodeSelector: {}
tolerations: []
# podLabels specifies labels to add to the dash pod.
podLabels: {}
# resources specifies the resource request and limits.
resources: {}
# requests:
# # The proxy's memory use scales linearly with the number of active
# # connections. Fewer open connections will use less memory. Adjust
# # this value based on your application's requirements.
# memory: ""
# # The proxy's CPU use scales linearly with the amount of IO between
# # the database and the application. Adjust this value based on your
# # application's requirements.
# cpu: ""
service:
# labels specifies labels to add to the cloudsql auth proxy service.
labels: {}
# type specifies the Kubernetes type of the cloudsql auth proxy service.
type: ClusterIP
oidc:
issuerURI: "" #Inferred if running locally or using ingress
requireVerifiedEmail: false
# IDTokenExpiry is parsed into golang's time.Duration: https://pkg.go.dev/time#example-ParseDuration
IDTokenExpiry: 24h
# (Optional) If set, enables OIDC rotation tokens, and specifies the duration where they are valid.
# RotationTokenExpiry is parsed into golang's time.Duration: https://pkg.go.dev/time#example-ParseDuration
RotationTokenExpiry: 48h
# (Optional) Only set in cases where the issuerURI is not user accessible (ie. localhost install)
userAccessibleOauthIssuerHost: ""
## to set up upstream IDPs, set pachd.mockIDP to false,
## and populate the pachd.upstreamIDPs with an array of Dex Connector configurations.
## See the example below or https://dexidp.io/docs/connectors/
# upstreamIDPs:
# - id: idpConnector
# config:
# issuer: ""
# clientID: ""
# clientSecret: ""
# redirectURI: "http://localhost:30658/callback"
# insecureEnableGroups: true
# insecureSkipEmailVerified: true
# insecureSkipIssuerCallbackDomainCheck: true
# forwardedLoginParams:
# - login_hint
# name: idpConnector
# type: oidc
#
# - id: okta
# config:
# issuer: "https://dev-84362674.okta.com"
# clientID: "client_id"
# clientSecret: "notsecret"
# redirectURI: "http://localhost:30658/callback"
# insecureEnableGroups: true
# insecureSkipEmailVerified: true
# insecureSkipIssuerCallbackDomainCheck: true
# forwardedLoginParams:
# - login_hint
# name: okta
# type: oidc
upstreamIDPs: []
# upstreamIDPsSecretName is used to pass the upstreamIDPs value via an existing k8s secret.
# The value is pulled from the secret key, "upstream-idps".
upstreamIDPsSecretName: ""
# Some dex configurations (like Google) require a credential file. Whatever secret is included in this
# below secret will be mounted to the pachd pod at /dexcreds/ so for example serviceAccountFilePath: /dexcreds/googleAuth.json
dexCredentialSecretName: ""
mockIDP: true
# additionalClients specifies a list of clients for the cluster to recognize
# See the ecample below or the dex docs (https://dexidp.io/docs/using-dex/#configuring-your-app).
# additionalOIDCClient:
# - id: example-app
# secret: example-app-secret
# name: 'Example App'
# redirectURIs:
# - 'http://127.0.0.1:5555/callback'
additionalClients: []
additionalClientsSecretName: ""
#TODO scopes:
testConnection:
image:
repository: alpine
tag: latest
# The proxy is a service to handle all Pachyderm traffic (S3, Console, OIDC, Dex, GRPC) on a single
# port; good for exposing directly to the Internet.
proxy:
# If enabled, create a proxy deployment (based on the Envoy proxy) and a service to expose it. If
# ingress is also enabled, any Ingress traffic will be routed through the proxy before being sent
# to pachd or Console.
enabled: false
# The external hostname (including port if nonstandard) that the proxy will be reachable at.
# If you have ingress enabled and an ingress hostname defined, the proxy will use that.
# Ingress will be deprecated in the future so configuring the proxy host instead is recommended.
host: ""
# The number of proxy replicas to run. 1 should be fine, but if you want more for higher
# availability, that's perfectly reasonable. Each replica can handle 50,000 concurrent
# connections. There is an affinity rule to prefer scheduling the proxy pods on the same node as
# pachd, so a number here that matches the number of pachd replicas is a fine configuration.
# (Note that we don't guarantee to keep the proxy<->pachd traffic on-node or even in-region.)
replicas: 1
# The envoy image to pull.
image:
repository: "envoyproxy/envoy"
tag: "v1.22.0"
pullPolicy: "IfNotPresent"
# Set up resources. The proxy is configured to shed traffic before using 500MB of RAM, so that's
# a resonable memory limit. It doesn't need much CPU.
resources:
requests:
cpu: 100m
memory: 512Mi
limits:
memory: 512Mi
# Any additional labels to add to the pods. These are also added to the deployment and service
# selectors.
labels: {}
# Any additional annotations to add to the pods.
annotations: {}
# Configure the service that routes traffic to the proxy.
service:
# The type of service can be ClusterIP, NodePort, or LoadBalancer.
type: ClusterIP
# If the service is a LoadBalancer, you can specify the IP address to use.
loadBalancerIP: ""
# The port to serve plain HTTP traffic on.
httpPort: 80
# The port to serve HTTPS traffic on, if enabled below.
httpsPort: 443
# If the service is a NodePort, you can specify the port to receive HTTP traffic on.
httpNodePort: 30080
httpsNodePort: 30443
# Any additional annotations to add.
annotations: {}
# Any additional labels to add to the service itself (not the selector!).
labels: {}
# The proxy can also serve each backend service on a numbered port, and will do so for any port
# not numbered 0 here. If this service is of type NodePort, the port numbers here will be used
# for the node port, and will need to be in the node port range.
legacyPorts:
console: 0 # legacy 30080, conflicts with default httpNodePort
grpc: 0 # legacy 30650
s3Gateway: 0 # legacy 30600
oidc: 0 # legacy 30657
identity: 0 # legacy 30658
metrics: 0 # legacy 30656
# Configuration for TLS (SSL, HTTPS).
tls:
# If true, enable TLS serving. Enabling TLS is incompatible with support for legacy ports (you
# can't get a generally-trusted certificate for port numbers), and disables support for
# cleartext communication (cleartext requests will redirect to the secure server, and HSTS
# headers are set to prevent downgrade attacks).
#
# Note that if you are planning on putting the proxy behind an ingress controller, you probably
# want to configure TLS for the ingress controller, not the proxy. This is intended for the
# case where the proxy is exposed directly to the Internet. (It is possible to have your
# ingress controller talk to the proxy over TLS, in which case, it's fine to enable TLS here in
# addition to in the ingress section above.)
enabled: false
# The secret containing "tls.key" and "tls.crt" keys that contain PEM-encoded private key and
# certificate material. Generate one with "kubectl create secret tls <name> --key=tls.key
# --cert=tls.cert". This format is compatible with the secrets produced by cert-manager, and
# the proxy will pick up new data when cert-manager rotates the certificate.
secretName: ""
# If set, generate the secret from values here. This is intended only for unit tests.
secret: {}
deployTarget #
deployTarget
is where you’re deploying pachyderm. It configures the storage backend to use and cloud provider settings.
It must be one of:
GOOGLE
AMAZON
MINIO
MICROSOFT
CUSTOM
LOCAL
global #
global.postgreSQL #
This section is to configure the connection to the postgresql database. By default, it uses the included postgres service.
postgresqlUsername
is the username to access the pachyderm and dex databasespostgresqlPassword
to access the postgresql database. If blank, a value will be generated by the postgres subchartpostgresqlExistingSecretKey
the secret name of the secret containingpostgresqlPassword
(key:postgresql-postgres-password
) When using autogenerated value for the initial install, it must be pulled from the postgres secret and added to values.yaml for future helm upgrades.postgresqlDatabase
is the database name where pachyderm data will be storedpostgresqlHost
is the postgresql database host to connect to.postgresqlPort
is the postgresql database port to connect to.postgresqlSSL
is the SSL mode to use for connecting to Postgres, for the default local postgres it is disabled
global.imagePullSecrets #
imagePullSecrets
allow you to pull images from private repositories, these will also be added to pipeline workers https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
Example:
imagePullSecrets:
- regcred
console #
This section is to configure the Pachyderm UI (console
). It is enabled by default.
-
console.enabled
turns on the deployment of the UI. -
console.image
sets the image to use for the console. This can be left at the defaults unless instructed. -
console.podLabels
specifies labels to add to the console pod. -
console.resources
specifies resources and limits in standard kubernetes format. It is left unset by default. -
console.service.labels
specifies labels to add to the console service. -
console.service.type
specifies the Kubernetes type of the console service. The default isClusterIP
.
console.config #
This is where the primary configuration settings for the console are configured, including authentication.
-
config.reactAppRuntimeIssuerURI
this is the pachd oauth address thats accesible to clients outside of the cluster itself. When running local withkubectl port-forward
this would be set to localhost ("http://localhost:30658/"
). Otherwiswe this has to be an address acessible to clients. -
config.oauthRedirectURI
this is the oauth callback address within console that the pachd oauth service would redirect to. It’s the URL of console with/oauth/callback/?inline=true
appended. Running locally its therefore"http://localhost:4000/oauth/callback/?inline=true"
. -
config.oauthClientID
the client identifier for the Console with pachd -
config.oauthClientSecret
the secret configured for the client with pachd -
config.oauthClientSecretSecretName
the secret name of the secret containingoauthClientSecret
(key:OAUTH_CLIENT_SECRET
) -
config.graphqlPort
the http port that the console service will be accessible on. -
config.disableTelemetry
this can be set to true to opt out of console’s analytics and error data collection.
etcd #
This section is to configure the etcd cluster in the deployment.
-
etcd.image
sets the image to use for the etcd. This can be left at the defaults unless instructed. -
etcd.podLabels
specifies labels to add to the etcd pods. -
etcd.resources
specifies resources and limits in standard kubernetes format. It is left unset by default. -
etcd.dynamicNodes
sets the number of nodes in the etcd StatefulSet. The default is 1. -
etcd.storageClass
indicates the etcd should use an existing StorageClass for its storage. If left blank, a storageClass will be created. -
etcd.storageSize
specifies the size of the volume to use for etcd. Etcd does not require much space. For storage that scales IOPs with size, the size must be set large enought to provide at least 1000 IOPs for performance. If you do not specify, it will default to 256Gi on Azure and 100Gi on GCP/AWS for that reason. -
etcd.service.labels
specifies labels to add to the console service. -
etcd.service.annotations
specifies annotations to add to the etcd service. -
etcd.service.type
specifies the Kubernetes type of the etcd service. The default isClusterIP
.
enterpriseServer #
This section is to configure the Enterprise Server deployment (if desired).
-
enterpriseServer.enabled
turns on the deployment of the Enterprise Server. It is disabled by default. -
enterpriseServer.service.type
specifies the Kubernetes type of the console service. The default isClusterIP
. -
enterpriseServer.resources
specifies resources and limits in standard kubernetes format. It is left unset by default. -
enterpriseServer.podLabels
specifies labels to add to the enterpriseServer pod. -
enterpriseServer.image
sets the image to use for the etcd. This can be left at the defaults unless instructed.
enterpriseServer.tls #
There are three options for configuring TLS on the Enterprise Server under enterpriseServer.tls
.
disabled
. TLS is not used.enabled
, using an existing secret. You must set enabled to true and provide a secret name where the exiting cert and key are stored.enabled
, using a new secret. You must set enabled to true andnewSecret.create
to true and specify a secret name, and a cert and key in string format
ingress #
ingress
will be removed from the helm chart once the deployment of Pachyderm with a proxy becomes mandatory.
This section is to configure an ingress resource for an existing ingress controller.
-
ingress.enabled
turns on the creation of the ingress for the UI. -
ingress.annotations
specifies annotations to add to the ingress resource. -
host
your domain name, external IP address, or localhost. -
uriHttpsProtoOverride
when set to true, uriHttpsProtoOverride will add the https protocol to the ingress URI routes without configuring certs
ingress.tls #
There are three options for configuring TLS on the ingress under ingress.tls
.
disabled
. TLS is not used.enabled
, using an existing secret. You must set enabled to true and provide a secret name where the exiting cert and key are stored.enabled
, using a new secret. You must set enabled to true andnewSecret.create
to true and specify a secret name, and a cert and key in string format
loki-stack #
The loki-stack
section contains all the values passed to the loki-stack
subchart.
loki #
See the official Loki storage documentation for the most up-to-date information.
grafana #
See the official Grafana documentation for the most up-to-date information.
promtail #
See the official Promtail documentation for the most up-to-date information.
pachd #
This section is to configure the pachd deployment.
-
pachd.enabled
turns on the deployment of pachd. -
pachd.image
sets the image to use for pachd. This can be left at the defaults unless instructed. -
pachd.logFormat
sets the logging format (text
orjson
).json
is default. -
pachd.logLevel
sets the logging level.info
is default. -
pachd.lokiLogging
enables Loki logging if set. -
pachd.podLabels
specifies labels to add to the pachd pod. -
pachd.resources
specifies resources and limits in standard kubernetes format. It is left unset by default. -
pachd.requireCriticalServersOnly
only requires the critical pachd servers to startup and run without errors. -
pachd.service.labels
specifies labels to add to the pachd service. -
pachd.service.type
specifies the Kubernetes type of the pachd service. The default isClusterIP
.
pachd.externalService
will be removed from the helm chart once the deployment of Pachyderm with a proxy becomes mandatory.
-
pachd.externalService.enabled
creates a kubernetes service of typeloadBalancer
that is safe to expose externally. -
pachd.externalService.loadBalancerIP
optionally supply the existing IP address of the load balancer. -
pachd.externalService.apiGRPCPort
is the desired api GRPC port (30650 is default). -
pachd.externalService.s3GatewayPort
is the desired s3 gateway port (30600 is default). -
pachd.externalService.annotations
add your service annotations. -
pachd.activateEnterpriseMember
specifies whether to activate with an enterprise server. If pachd.activateEnterpriseMember is set, enterprise will be activated and connected to an existing enterprise server. -
activateAuth
If pachd.activateAuth is set, auth will be bootstrapped by the config-job. Defaults to true. -
pachd.enterpriseLicenseKey
specifies the enterprise license key if you have one. If pachd.enterpriseLicenseKey is set, enterprise will be activated. Alternatively, you can pass this value in a secret. -
pachd.enterpriseLicenseKeySecretName
specifies the secret name containing pachd License Key. -
pachd.rootToken
is the auth token used to communicate with the cluster as the root user. If a secret name is not provided inrootTokenSecretName
, a secret containingrootToken
(or a randomly generated value if empty) will be created on install and stored in the k8s secretpachyderm-auth
under the keyrootToken
-
pachd.rootTokenSecretName
specifies the secret name containing pachd root token secret (key:rootToken
). -
pachd.enterpriseSecret
specifies the enterprise cluster secret. If a secret name is not provided inenterpriseSecretSecretName
, a secret containingenterpriseSecret
(or a randomly generated value if empty) will be created on install and stored in the k8s secretpachyderm-enterprise
under the keyenterprise-secret
-
pachd.enterpriseSecretSecretName
specifies the secret name containing pachd enterpriseSecret (key:enterprise-secret
). -
pachd.oauthClientID
specifies the Oauth client ID representing pachd. Defaults to “pachd”. -
pachd.oauthClientSecret
specifies the Oauth client secret. If a secret name is not provided inoauthClientSecretSecretName
, a secret containingoauthClientSecret
(or a randomly generated value if empty) will be created on install and stored in the k8s secretpachyderm-auth
under the keyauth-config
-
pachd.oauthClientSecretSecretName
specifies the secret name containing pachd Oauth client secret (key:auth-config
). -
pachd.oauthRedirectURI
specifies the Oauth redirect URI served by pachd. Examplehttp://<PACHD-IP>:30657/authorization-code/callback
. -
pachd.enterpriseRootToken
only used if pachd.activateEnterpriseMember == true -
pachd.enterpriseRootTokenSecretName
specifies the secret name containingenterpriseRootToken
-
pachd.enterpriseServerAddress
only used if pachd.activateEnterpriseMember == true -
pachd.enterpriseCallbackAddress
only used if pachd.activateEnterpriseMember == true -
pachd.localhostIssuer
specifies to pachd whether dex is embedded in its process. This value can be set to “true”, “false”, or “”.
If any of rootToken
,enterpriseSecret
, or oauthClientSecret
are blank, a value will be generated automatically.
-
pachd.serviceAccount.create
creates a kubernetes service account for pachd. Default is true. -
pachd.rbac.create
indicates whether RBAC resources should be created. Default is true.
pachd.storage #
This section of pachd
configures the back end storage for pachyderm.
-
storage.backend
configures the storage backend to use. It must be one of GOOGLE, AMAZON, MINIO, MICROSOFT or LOCAL. This is set automatically if deployTarget is GOOGLE, AMAZON, MICROSOFT, or LOCAL. -
storage.putFileConcurrencyLimit
sets the maximum number of files to upload or fetch from remote sources (HTTP, blob storage) using PutFile concurrently. -
storage.uploadConcurrencyLimit
sets the maximum number of concurrent object storage uploads per Pachd instance.
pachd.storage.amazon #
If you’re using Amazon S3 as your storage backend, configure it here.
-
storage.amazon.bucket
sets the S3 bucket to use. -
storage.amazon.cloudFrontDistribution
sets the CloudFront distribution in the storage secrets. -
storage.amazon.customEndpoint
sets a custom s3 endpoint. -
storage.amazon.disableSSL
disables SSL. -
storage.amazon.id
sets the Amazon access key ID to use. -
storage.amazon.logOptions
sets various log options in Pachyderm’s internal S3 client. Comma-separated list containing zero or more of: ‘Debug’, ‘Signing’, ‘HTTPBody’, ‘RequestRetries’,‘RequestErrors’, ‘EventStreamBody’, or ‘all’ (case-insensitive). See ‘AWS SDK for Go’ docs for details. -
storage.amazon.maxUploadParts
sets the maximum number of upload parts. Default is10000
. -
storage.amazon.verifySSL
performs SSL certificate verification. -
storage.amazon.partSize
sets the part size for object storage uploads. It has to be a string due to Helm and YAML parsing integers as floats. -
storage.amazon.region
sets the AWS region to use. -
storage.amazon.retries
sets the number of retries for object storage requests.. -
storage.amazon.reverse
reverses object storage paths. -
storage.amazon.secret
sets the Amazon secret access key to use. -
storage.amazon.timeout
sets the timeout for object storage requests. -
storage.amazon.token
optionally sets the Amazon token to use. -
storage.amazon.uploadACL
sets the upload ACL for object storage uploads.
pachd.storage.google #
If you’re using Google Storage Buckets as your storage backend, configure it here.
-
storage.google.bucket
sets the object bucket to use. -
storage.google.cred
is a string containing a GCP service account private key, in object (JSON or YAML) form. A simple way to pass this on the command line is with the set-file flag, e.g.:helm install pachyderm -f my-values.yaml --set-file storage.google.cred=creds.json pach/pachyderm
Example:
cred: | { "type": "service_account", "project_id": "…", "private_key_id": "…", "private_key": "-----BEGIN PRIVATE KEY-----\n…\n-----END PRIVATE KEY-----\n", "client_email": "…@….iam.gserviceaccount.com", "client_id": "…", "auth_uri": "https://accounts.google.com/o/oauth2/auth", "token_uri": "https://oauth2.googleapis.com/token", "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs", "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/…%40….iam.gserviceaccount.com" }
-
storage.local.hostpath
indicates the path on the host where the PFS metadata will be stored. -
storage.local.requireRoot
pachd.storage.microsoft #
If you’re using Microsoft Blob Storage as your storage backend, configure it here.
-
storage.microsoft.container
sets the blob storage container. -
storage.microsoft.id
sets the access key ID to use. -
storage.microsoft.secret
sets the secret access key to use.
pachd.storage.minio #
If you’re using MinIO as your storage backend, configure it here.
-
storage.minio.bucket
sets the bucket to use. -
storage.minio.endpoint
sets the object endpoint. -
storage.minio.id
sets the access key ID to use. -
storage.minio.secret
sets the secret access key to use. -
storage.minio.secure
set to true for a secure connection. -
storage.minio.signature
sets the signature version to use.
pachd.tls #
There are three options for configuring TLS on pachd under pachd.tls
.
disabled
. TLS is not used.enabled
, using an existing secret. You must set enabled to true and provide a secret name where the exiting cert and key are stored.enabled
, using a new secret. You must set enabled to true andnewSecret.create
to true and specify a secret name, and a cert and key in string format.
pgbouncer #
This section is to configure the PGBouncer Postgres connection pooler.
-
service.type
specifies the Kubernetes type of the pgbouncer service. The default isClusterIP
. -
resources
specifies resources and limits in standard kubernetes format. It is left unset by default. -
maxConnections
defaults to 1000
postgresql #
This section is to configure the PostgresQL Subchart, if used.
-
enabled
controls whether to install postgres or not. If not using the built in Postgres, you must specify a Postgresql database server to connect to inglobal.postgresql
. The enabled value is watched by the ‘condition’ set on the Postgresql dependency in Chart.yaml -
image.tag
sets the postgres version. Leave at the default unless instructed otherwise. -
initdbScripts
creates the initaldex
database that’s needed for pachyderm. Leave at the default unless instructed otherwise. -
persistence.storageClass
specifies the storage class for the postgresql Persistent Volume (PV)
storageSize
specifies the size of the volume to use for postgresql.
- Recommended Minimum Disk size for Microsoft/Azure: 256Gi - 1,100 IOPS https://azure.microsoft.com/en-us/pricing/details/managed-disks/
- Recommended Minimum Disk size for Google/GCP: 50Gi - 1,500 IOPS https://cloud.google.com/compute/docs/disks/performance
- Recommended Minimum Disk size for Amazon/AWS: 500Gi (GP2) - 1,500 IOPS https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html
cloudsqlAuthProxy #
This section is to configure the CloudSQL Auth Proxy for deploying Pachyderm on GCP with CloudSQL.
-
connectionName
may be found by runninggcloud sql instances describe INSTANCE_NAME --project PROJECT_ID
-
serviceAccount
is the account to use to connect to the cloudSql instance. -
enabled
controls whether to deploy the cloudsqlAuthProxy. Default is false. -
port
is the cloudql database port to expose. The default is5432
-
service.type
specifies the Kubernetes type of the cloudsqlAuthProxy service. The default isClusterIP
.
oidc #
This section is to configure the oidc settings within pachyderm.
-
oidc.issuerURI
specifies the Oauth Issuer. Inferred if running locally or using ingress. -
oidc.requireVerifiedEmail
specifies whether email verification is required for authentication. -
oidc.IDTokenExpiry
specifies the duration where OIDC ID Tokens are valid. -
oidc.RotationTokenExpiry
if set, enables OIDC Rotation Tokens and specifies the duration where they are valid. -
oidc.upstreamIDPs
specifies a list of Identity Providers to use for authentication. -
oidc.upstreamIDPsSecretName
specifies the secret name of the secret containingupstreamIDPs
(key:upstream-idps
) -
oidc.mockIDP
when set totrue
, specifes to ignoreupstreamIDPs
in favor of a placeholder IDP with a username/password preset to “admin” and “password”. -
oidc.userAccessibleOauthIssuerHost
specifies the Oauth issuer’s address host that’s used in the Oauth authorization redirect URI. This value is only necessary in local settings or anytime the registered Issuer address isn’t accessible outside the cluster.
proxy #
The proxy is a service to handle all Pachyderm traffic (S3, Console, OIDC, Dex, GRPC) on a single port exposed directly to the Internet.
-
proxy.enabled
when set totrue
, create a proxy deployment and a service to expose it. If ingress is also enabled, any ingress traffic will be routed through the proxy before being sent. -
proxy.replicas
: The number of proxy replicas to run. 1 is a default but can be increased for higher availability. Each replica can handle 50,000 concurrent connections. There is an affinity rule to prefer scheduling the proxy pods on the same node as pachd, so a number here that matches the number of pachd replicas is a fine configuration. Use this setting to scale pachd instances horizontally; horizontal scaling spreads the load from PFS across each instance.
We don’t guarantee to keep the proxy<->pachd traffic on-node or even in-region.
-
proxy.image
specifies the details of the proxy image to pull. THe following fields can be left at the defaults unless instructed.-
proxy.image.repository
the proxy image repository. Defaults to “envoyproxy/envoy”. -
proxy.image. tag
the tag of the proxy image. -
proxy.image.pullPolicy
the proxy image pull policy. Defaults to “IfNotPresent”. -
proxy.resources
specifies the proxy resources. The proxy is configured to shed traffic before using 500MB of RAM, so that’s a resonable memory limit. It doesn’t need much CPU.-
proxy.resources.requests
proxy.resources.requests.cpu
defaults to 100mproxy.resources.requests.memory
defaults to 512Mi
-
proxy.resources.limits.memory
sets the proxy memory limit. defaults to 512Mi.
-
-
proxy.labels
list any additional labels to add to the pods. These are also added to the deployment and service selectors. -
proxy.annotations
list any additional annotations to add to the pods. -
proxy.service
this section configure the service that routes traffic to the proxy.-
proxy.service.type
The type of service can be ClusterIP, NodePort, or LoadBalancer. Defaults to ClusterIP.- If the service is a LoadBalancer, you can specify the IP
proxy.service.loadBalancerIP
address to use, the port to serve plain HTTP traffic onproxy.service.httpPort
(defaults to 80), the port to serve HTTPS traffic onproxy.service.httpsPort
(defaults to 443), if enabled below. - If the service is a NodePort, you can specify the port to receive HTTP traffic on
proxy.service.httpNodePort
(defaults to 30080), andproxy.service.httpsNodePort
(defaults to 30443).
- If the service is a LoadBalancer, you can specify the IP
-
proxy.service.annotations
list any additional annotations to add. -
proxy.service.labels
list any additional labels to add to the service itself (not the selector!). -
proxy.service.legacyPorts
The proxy can also serve each backend service on a numbered port, and will do so for any port not numbered 0. If this service is of type NodePort, the port numbers will be used for the node port, and will need to be in the node port range.proxy.service.legacyPorts.console
defaults to 0, legacy 30080, conflicts with default httpNodePortproxy.service.legacyPorts.grpc
defaults to 0, legacy 30650proxy.service.legacyPorts.s3Gateway
defaults to 0, legacy 30600proxy.service.legacyPorts.oidc
defaults to 0, legacy 30657proxy.service.legacyPorts.identity
defaults to 0, legacy 30658proxy.service.legacyPorts.metrics
defaults to 0, legacy 30656
-
-
proxy.tls.enabled
enable TLS (SSL, HTTPS) serving whentrue
. Enabling TLS is incompatible with support for legacy ports (you can’t get a generally-trusted certificate for port numbers), and disables support for cleartext communication (cleartext requests will redirect to the secure server, and HSTS headers are set to prevent downgrade attacks). -
proxy.tls.secretName
The secret containing “tls.key” and “tls.crt” keys that contain PEM-encoded private key and certificate material. Generate one with “kubectl create secret tls–key=tls.key –cert=tls.cert”. This format is compatible with the secrets produced by cert-manager.
-