AWS + Pachyderm

Learn how to deploy Pachyderm to the cloud with AWS.

March 24, 2023

Before You Start #

This guide assumes that you have already tried Pachyderm locally and have the command-line tools used in the steps below installed: kubectl, pachctl, Helm, the AWS CLI, and eksctl.
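
Before starting, it can help to confirm that each tool is on your PATH. The following is a minimal sketch; the tool list is taken from the commands used later in this guide, and the function name check_tools is only illustrative.

```shell
# Report which of the given command-line tools are missing from PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all tools found"
}

# Tools invoked in this guide; "|| true" keeps the script going if some are absent.
check_tools kubectl pachctl helm aws eksctl || true
```

Any name printed after "missing:" needs to be installed before you continue.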


1. Create an EKS Cluster #

  1. Use the eksctl tool to deploy an EKS Cluster:
eksctl create cluster --name pachyderm-cluster --region <region> --profile <your named profile>
  2. Verify deployment:
kubectl get all

2. Create an S3 Bucket #

  1. Run the following command:
aws s3api create-bucket --bucket ${BUCKET_NAME} --region ${AWS_REGION}
  2. Verify that the bucket was created:
aws s3 ls
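
One detail that often trips up bucket creation: outside us-east-1, aws s3api create-bucket also expects a --create-bucket-configuration LocationConstraint=<region> argument. The sketch below only prints the command it would run, so it can be reviewed first. The function name and the placeholder bucket name and region are hypothetical.

```shell
# Print the create-bucket command for a given bucket and region.
# us-east-1 takes no LocationConstraint; other regions need one.
create_bucket_cmd() {
  bucket="$1"
  region="$2"
  cmd="aws s3api create-bucket --bucket $bucket --region $region"
  if [ "$region" != "us-east-1" ]; then
    cmd="$cmd --create-bucket-configuration LocationConstraint=$region"
  fi
  echo "$cmd"
}

# Placeholder values (hypothetical); replace with your own.
BUCKET_NAME="my-pachyderm-bucket"
AWS_REGION="us-east-2"
create_bucket_cmd "$BUCKET_NAME" "$AWS_REGION"
```

Copy the printed command into your terminal once it looks right.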

3. Enable Persistent Volumes Creation #

  1. Create an IAM OIDC provider for your cluster.
  2. Install the Amazon EBS Container Storage Interface (CSI) driver on your cluster.
  3. Create a gp3 storage class manifest file (e.g., gp3-storageclass.yaml):
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: gp3
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"
    provisioner: kubernetes.io/aws-ebs
    parameters:
      type: gp3
      fsType: ext4
  4. Apply the manifest to create the gp3 storage class and set it as your default.
    kubectl apply -f gp3-storageclass.yaml
  5. Verify that it has been set as your default.
    kubectl get storageclass
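
In the kubectl get storageclass listing, the default class is marked with "(default)" after its name. As a rough sketch, the helper below (a hypothetical name) reads that listing on standard input and prints only the classes marked as default.

```shell
# Read `kubectl get storageclass` output on stdin and
# print the name of every class marked "(default)".
default_storage_classes() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}
```

With a live cluster, run: kubectl get storageclass | default_storage_classes. You should see gp3. If more than one name is printed, more than one class carries the default annotation.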

4. Create a Values.yaml #
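
The contents of the values file are not shown in this guide, so the following is only a sketch of what a minimal AWS values file might look like. The keys (deployTarget, proxy, pachd.storage.amazon) are assumptions about the Pachyderm Helm chart and should be checked against the chart's own values for your version. The bucket name, region, and credential placeholders are hypothetical. The file name matches the one passed to helm install in the next step.

```shell
# Write a minimal Helm values file for an AWS deployment.
# Every key below is an assumption; verify it against the chart's values.yaml.
BUCKET_NAME="${BUCKET_NAME:-my-pachyderm-bucket}"   # hypothetical placeholder
AWS_REGION="${AWS_REGION:-us-east-2}"               # hypothetical placeholder

cat > my_pachyderm_values.yaml <<EOF
deployTarget: "AMAZON"
proxy:
  enabled: true
  service:
    type: LoadBalancer
pachd:
  storage:
    amazon:
      bucket: "${BUCKET_NAME}"
      region: "${AWS_REGION}"
      # Credentials for the bucket; left as placeholders on purpose.
      id: "<aws-access-key-id>"
      secret: "<aws-secret-access-key>"
EOF
```

Review the generated file and fill in the placeholders before installing.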


5. Configure Helm #

Run the following to add the Pachyderm repo to Helm and install Pachyderm with your values file:

helm repo add pach https://helm.pachyderm.com
helm repo update
helm install pachd pach/pachyderm -f my_pachyderm_values.yaml 

6. Verify Installation #

  1. In a new terminal, run the following command to check the status of your pods:
kubectl get pods
NAME                                           READY   STATUS      RESTARTS   AGE
pod/console-5b67678df6-s4d8c                   1/1     Running     0          2m8s
pod/etcd-0                                     1/1     Running     0          2m8s
pod/pachd-c5848b5c7-zwb8p                      1/1     Running     0          2m8s
pod/pg-bouncer-7b855cb797-jqqpx                1/1     Running     0          2m8s
pod/postgres-0                                 1/1     Running     0          2m8s
  2. Re-run this command after a few minutes if pachd is not ready.
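
To make the readiness check less manual, the pod listing can be filtered for anything that is not yet up. This is a minimal sketch with a hypothetical helper name. It reads kubectl get pods output on standard input and prints each pod whose STATUS is not Running or whose READY count is incomplete.

```shell
# Read `kubectl get pods` output on stdin; print pods that are not fully ready.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, r, "/")   # READY column, e.g. "1/1"
    if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3
  }'
}
```

With a live cluster, run: kubectl get pods | not_ready_pods. Empty output means every listed pod is running and ready. Note that this simple filter also reports pods in other states, such as Completed.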

7. Connect to Cluster #

pachctl connect grpc://localhost:80 
ℹ️

If the connection commands did not work together, run each separately.

Optionally open your browser and navigate to the Console UI.

💡

You can check your Pachyderm version and connection to pachd at any time with the following command:

pachctl version
COMPONENT           VERSION  

pachctl             2.5.2  
pachd               2.5.2
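
If you want to check the version output automatically, the two rows can be compared directly. The helper below is a sketch with a hypothetical name. It reads pachctl version output on standard input and reports whether the pachctl and pachd versions are identical.

```shell
# Read `pachctl version` output on stdin and compare the two version numbers.
versions_match() {
  awk '$1 == "pachctl" { c = $2 }
       $1 == "pachd"   { s = $2 }
       END {
         if (c != "" && c == s) { print "match: " c; exit 0 }
         print "mismatch: pachctl=" c " pachd=" s; exit 1
       }'
}

# Sample input copied from the output above; stands in for a live `pachctl version`.
printf 'COMPONENT VERSION\npachctl 2.5.2\npachd 2.5.2\n' | versions_match
```

With a live cluster, run: pachctl version | versions_match. A missing pachd row usually means pachctl could not reach the cluster.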