Quickstart
Learn how to deploy the latest version of Pachyderm quickly with simplified instructions and pre-set Helm values.
March 30, 2023
On this page, you will find simplified deployment instructions and Helm values to get you started with the latest release of Pachyderm on the Kubernetes engine of your choice: AWS (EKS), Google (GKE), or Azure (AKS).
For each cloud provider, we will give you the option to “quick deploy” Pachyderm with or without an enterprise key. A quick deployment allows you to experiment with Pachyderm without having to go through any infrastructure setup. In particular, you do not need to set up any object store or PostgreSQL instance.
The deployment steps highlighted in this document are not intended for production. For production settings, please read our infrastructure recommendations. In particular, we recommend:
- the use of a managed PostgreSQL server (RDS, CloudSQL, or PostgreSQL Server) rather than Pachyderm’s default bundled PostgreSQL.
- the setup of a TCP Load Balancer in front of your pachd service.
- the setup of an Ingress Controller in front of Console.
Then find your targeted Cloud provider in the Deploy and Manage section of this documentation.
Pachyderm now ships with an optional embedded proxy that lets your cluster expose a single port externally.
If you choose to deploy Pachyderm with a Proxy, check out our new recommended architecture and deployment instructions.
Deploying with a proxy presents a couple of advantages:
- You only need to set up one TCP Load Balancer (No more Ingress in front of Console).
- You need only one DNS name.
- It simplifies the deployment of Console.
- No more port-forward.
1. Prerequisites #
Pachyderm is deployed on a Kubernetes Cluster.
Install the following clients on your machine before you start creating your cluster. Use the latest available version of the components listed below.
- kubectl: the CLI to interact with your cluster.
- pachctl: the CLI to interact with Pachyderm.
- Helm: install it for your deployment.
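Before creating a cluster, it can help to confirm that the three clients are actually on your PATH. The snippet below is a minimal sketch of such a check; the function name `missing_clients` is ours and is not part of any Pachyderm tooling.

```shell
#!/usr/bin/env bash
# Print the name of every required client that is not on the PATH.
# Prints nothing when all of them are found.
missing_clients() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Clients listed in the prerequisites above.
missing="$(missing_clients kubectl pachctl helm)"
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "kubectl, pachctl, and helm are all installed."
fi
```

The check only looks for the executables; it does not verify that you are running the latest versions.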
Get an Enterprise key
To get a free-trial token, fill in this form, get in touch with us at sales@pachyderm.io, or on our Slack.
Select your favorite cloud provider.
Note that we often use the acronym CE for Community Edition.
2. Create Your Values.yaml #
Pachyderm comes with a Web UI (Console) by default.
AWS #
Additional client installation: Install AWS CLI
Create an S3 bucket for your data
Create a values.yaml
Deploy Pachyderm CE (includes Console CE) #
deployTarget: "AMAZON"
pachd:
  storage:
    amazon:
      bucket: "bucket_name"
      # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      id: "AKIAIOSFODNN7EXAMPLE"
      # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-2"
  externalService:
    enabled: true
console:
  enabled: true
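If you would rather not paste credentials into the file by hand, the values above can be generated from shell variables. This is a sketch only: the function name `write_aws_values` is hypothetical, it assumes your credentials are exported as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, and it assumes the nesting in which `storage` and `externalService` sit under `pachd`.

```shell
#!/usr/bin/env bash
# Write a Community Edition values file for AWS.
# Arguments: bucket name, region, output file (all chosen by you).
# Credentials are read from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
write_aws_values() {
  local bucket="$1" region="$2" out="$3"
  cat > "$out" <<EOF
deployTarget: "AMAZON"
pachd:
  storage:
    amazon:
      bucket: "${bucket}"
      id: "${AWS_ACCESS_KEY_ID}"
      secret: "${AWS_SECRET_ACCESS_KEY}"
      region: "${region}"
  externalService:
    enabled: true
console:
  enabled: true
EOF
}
```

For example, `write_aws_values my-bucket us-east-2 values.yaml` writes the file into the current directory. The sketch does no escaping, so a value containing a double quote or a backslash would need to be handled separately.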
Deploy Enterprise with Console #
Note that when deploying Enterprise with Console, we create a default mock user (username: admin, password: password) so that you can authenticate to Console without connecting an Identity Provider. The mock user is a Cluster Admin by default.
deployTarget: "AMAZON"
pachd:
  storage:
    amazon:
      bucket: "bucket_name"
      # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      id: "AKIAIOSFODNN7EXAMPLE"
      # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-2"
  # Enterprise key
  enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
console:
  enabled: true
Jump to Helm install
Google #
Additional client installation: Install Google Cloud SDK
Create a GKE cluster. Note: Add --scopes storage-rw to your gcloud container clusters create command.
Create a GCS Bucket for your data
Create a values.yaml
Deploy Pachyderm CE (includes Console CE) #
deployTarget: "GOOGLE"
pachd:
  storage:
    google:
      bucket: "bucket_name"
      cred: |
        INSERT JSON CONTENT HERE
  externalService:
    enabled: true
console:
  enabled: true
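The `cred: |` entry is a YAML block scalar, so every line of the JSON key has to be indented deeper than `cred:` itself. Pasting a multi-line key by hand is an easy place to break the file. The sketch below builds the values file from a key file on disk; the function name `write_google_values` and its arguments are hypothetical, and the nesting under `pachd` is an assumption about the chart layout.

```shell
#!/usr/bin/env bash
# Write a Community Edition values file for Google Cloud.
# Arguments: bucket name, path to a service-account JSON key, output file.
write_google_values() {
  local bucket="$1" key_file="$2" out="$3"
  {
    printf 'deployTarget: "GOOGLE"\n'
    printf 'pachd:\n'
    printf '  storage:\n'
    printf '    google:\n'
    printf '      bucket: "%s"\n' "$bucket"
    printf '      cred: |\n'
    # Indent every line of the key by 8 spaces so it stays inside the
    # block scalar; awk also adds a final newline if the key file lacks one.
    awk '{ print "        " $0 }' "$key_file"
    printf '  externalService:\n'
    printf '    enabled: true\n'
    printf 'console:\n'
    printf '  enabled: true\n'
  } > "$out"
}
```

For example, `write_google_values my-bucket ./key.json values.yaml`, where `key.json` stands for wherever you saved your service-account key.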
Deploy Enterprise with Console #
Note that when deploying Enterprise with Console, we create a default mock user (username: admin, password: password) so that you can authenticate to Console without connecting an Identity Provider. The mock user is a Cluster Admin by default.
deployTarget: "GOOGLE"
pachd:
  storage:
    google:
      bucket: "bucket_name"
      cred: |
        INSERT JSON CONTENT HERE
  # Enterprise key
  enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
console:
  enabled: true
Jump to Helm install
Azure #
- This section assumes that you have an Azure Subscription.
Additional client installation: Install Azure CLI 2.0.1 or later.
Create a Storage Container for your data
Create a values.yaml
Deploy Pachyderm CE (includes Console CE) #
deployTarget: "MICROSOFT"
pachd:
  storage:
    microsoft:
      # storage container name
      container: "blah"
      # storage account name
      id: "AKIAIOSFODNN7EXAMPLE"
      # storage account key
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  externalService:
    enabled: true
console:
  enabled: true
Deploy Enterprise with Console #
Note that when deploying Enterprise with Console, we create a default mock user (username: admin, password: password) so that you can authenticate to Console without connecting an Identity Provider. The mock user is a Cluster Admin by default.
deployTarget: "MICROSOFT"
pachd:
  storage:
    microsoft:
      # storage container name
      container: "blah"
      # storage account name
      id: "AKIAIOSFODNN7EXAMPLE"
      # storage account key
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  # Enterprise key
  enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
console:
  enabled: true
Jump to Helm install
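Whichever provider you chose, the sample values on this page (bucket name, example keys, enterprise token) must be replaced with your own before you install. As a rough safeguard, the sketch below lists any line of a values file that still contains one of those samples. The function name `find_placeholders` is hypothetical, and the pattern only covers the sample strings shown on this page.

```shell
#!/usr/bin/env bash
# List the lines of a values file that still contain a sample value
# from this page. Exit status 0 means at least one placeholder was found;
# exit status 1 means none were found.
find_placeholders() {
  grep -n -E 'bucket_name|AKIAIOSFODNN7EXAMPLE|bPxRfiCYEXAMPLEKEY|YOUR_ENTERPRISE_TOKEN|INSERT JSON CONTENT HERE|"blah"' "$1"
}

# Typical use, with values.yaml standing for the file you created:
#   if find_placeholders values.yaml; then
#     echo "Replace the values listed above before installing."
#   fi
```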
3. Helm Install #
You will be deploying the latest GA release of Pachyderm. Replace my_pachyderm_values.yaml with the name of the values file you created in step 2:
helm repo add pach https://helm.pachyderm.com
helm repo update
helm install pachd pach/pachyderm -f my_pachyderm_values.yaml
Check your deployment:
kubectl get pods
The deployment takes some time. You can run kubectl get pods periodically to check the status of your deployment. Once all the pods are up, you should see a pod for pachd running (alongside etcd, pg-bouncer or postgres, console, depending on your installation). If you are curious about the architecture of Pachyderm, take a look at our high-level architecture diagram.
System Response:
NAME                           READY   STATUS    RESTARTS   AGE
console-7b69ddf66d-bxmg5       1/1     Running   0          18h
etcd-0                         1/1     Running   0          18h
pachd-5db79fb9dd-b2gdq         1/1     Running   2          18h
pg-bouncer-55d9c86768-g8lx7    1/1     Running   0          18h
postgres-0                     1/1     Running   0          18h
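Rather than reading the pod table by eye on each run, you can filter it for pods that are not ready yet. The following is a minimal sketch: `not_ready_pods` is a hypothetical helper that relies only on the column layout shown in the system response above (NAME, READY, STATUS).

```shell
#!/usr/bin/env bash
# Read `kubectl get pods` output on stdin and print the name of every pod
# that is not fully ready (READY is not n/n) or whose STATUS is not Running.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Typical use:
#   kubectl get pods | not_ready_pods
```

An empty result means every listed pod is Running and ready. Pods that legitimately finish, such as completed jobs, would also be printed, so treat the output as a hint rather than a verdict.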
4. Have ‘pachctl’ And Your Cluster Communicate #
You have deployed Pachyderm without Console #
Retrieve the external IP address of pachd service:
kubectl get services | grep pachd-lb | awk '{print $4}'
Then update your context for pachctl to point at your cluster:
pachctl connect grpc://localhost:80
If authentication is activated (for example, when you deploy with an enterprise key already set), you need to run pachctl auth login, then authenticate to Pachyderm with your mock user (username: admin, password: password), before you use pachctl.
You have deployed Pachyderm with Console #
To connect to your new Pachyderm instance, run:
pachctl config import-kube local --overwrite
pachctl config set active-context local
Then run pachctl port-forward (background this process in a new tab of your terminal).
Check that your cluster is up and running #
pachctl version
System Response:
COMPONENT VERSION
pachctl 2.5.3
pachd 2.5.3
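In the response above, pachctl and pachd report the same version. If you want to check this in a script, the sketch below parses that two-column output. The helper name `versions_match` is hypothetical, and the assumption that both rows should carry identical versions comes from the sample output, not from a stated compatibility rule.

```shell
#!/usr/bin/env bash
# Read `pachctl version` output on stdin. Succeeds (exit 0) only when both
# the pachctl and pachd rows are present and report the same version.
versions_match() {
  awk '$1 == "pachctl" { client = $2 }
       $1 == "pachd"   { server = $2 }
       END { exit !(client != "" && client == server) }'
}

# Typical use:
#   pachctl version | versions_match && echo "client and cluster agree"
```

A missing pachd row, for instance when pachctl cannot reach the cluster, also makes the check fail.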
5. Connect to Console #
To connect to your Console (Pachyderm UI):
- Point your browser to http://localhost:4000
- If authentication is activated (for example, when you deploy with an enterprise key already set), you will be prompted to authenticate. Use your mock user (username: admin, password: password).
You are all set!
6. Try our beginner tutorial. #
7. NOTEBOOKS USERS: Install Pachyderm JupyterLab Mount Extension #
Once your cluster is up and running, you can helm install JupyterHub on your Pachyderm cluster and experiment with your data in Pachyderm from your Notebook cells.
Check out our JupyterHub and Pachyderm Mount Extension page for installation instructions.
Use Pachyderm’s default image and values.yaml (jupyterhub-ext-values.yaml), or follow the instructions to update your own.
Make sure to check our data science notebook examples running on Pachyderm, from a market sentiment NLP implementation using a FinBERT model to pipelines training a regression model on the Boston Housing Dataset.