Reference
PachCTL

Infrastructure Recommendations

Learn about some of our infrastructure recommendations.

May 26, 2023

In the simplest case, such as running a Pachyderm cluster locally, implicit and explicit port-forwarding enables you to communicate with pachd, the Pachyderm API pod, and console, the Pachyderm UI. Port-forwarding can be used in cloud environments as well, but a production environment might require you to define additional inbound connection rules.

Before we dive into the delivery of external traffic to Pachyderm, read the following recommendations to set up your infrastructure in production.

ℹ️

Refer to our generic “Helm Install” page for more information on how to install and get started with Helm.

Pachyderm Infrastructure Recommendations #

⚠️

We are now shipping Pachyderm with an embedded proxy allowing your cluster to expose one single port externally. This deployment setup is optional.

If you choose to deploy Pachyderm with a Proxy, our new recommended architecture and deployment instructions overwrite the following instructions.

For production deployments, we recommend that you:

Once you have your networking infrastructure setup, check the deployment page of your cloud provider. The following section comes back to the setup of an Ingress and a TCP Load Balancer in details.

Deliver External Traffic To Pachyderm #

Pachyderm provides multiple ways to deliver external traffic to services.

However, we recommend to set up the following resources in a production environment:

The diagram below gives a quick overview of those recommendations on AWS EKS:

Infrastruture Recommendation

NodePort #

By default, the local deployment of Pachyderm deploys the pachd service as type:NodePort. However, NodePort is a limited solution that is not recommended in production deployments. Therefore, Pachyderm services are otherwise exposed on the cluster internal IP (ClusterIP) instead of each node’s IP (Nodeport).

Ingress #

An Ingress exposes HTTP and HTTPS routes from outside the cluster to services in the cluster such as Console or Authentication services.

To configure the Ingress, enable the ingress field in your values.yaml, and chose one of the following:

If your ingress is enabled:

    - host: <your_domain_name>
    http:
        paths:
        - path: "/dex"
            backend:
            serviceName: "pachd"
            servicePort: "identity-port"
        - path: "/authorization-code/callback"
            backend:
            serviceName: "pachd"
            servicePort: "oidc-port"
        - path: "/*"
            backend:
            serviceName: "console"
            servicePort: "console-http"

See our reference values.yaml for all available fields.

📖

You might choose to deploy your preferred Ingress Controller (Traefik, NGINX). Read about the installation and configuration of Traefik on a cluster.

⚠️

To have the ingress routes use the https protocol without enabling the cert secret configuration, set ingress.uriHttpsProtoOverride to true in your values.yaml.

Example on AWS EKS #

In the example below, we are opening the HTTPS port and enabling TLS.

ingress:
    enabled: true
    annotations: 
        alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:region:account-id:certificate/aaaa-bbbb-cccc
        alb.ingress.kubernetes.io/group.name: pachyderm # lets multiple ingress resources be configured into one load balancer
        alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
        alb.ingress.kubernetes.io/scheme: internal
        alb.ingress.kubernetes.io/security-groups: sg-aaaa
        alb.ingress.kubernetes.io/subnets: subnet-aaaa, subnet-bbbb, subnet-cccc
        alb.ingress.kubernetes.io/target-type: ip
        kubernetes.io/ingress.class: alb
    host: "your_domain_name"
    tls:
        enabled: true
        secretName: "pach-tls"

Example on GCP GKE #

In the example below using the ingress controller Traefik, we are opening the HTTPS port and enabling TLS.

ingress:
    enabled: true
    annotations: 
      kubernetes.io/ingress.clas: traefik
    host: "your_domain_name"
    tls:
        enabled: true
        secretName: "pach-tls"

Example on Azure AKS #

In the example below, we are using the ingress controller Nginx, and opening the HTTP port.

ingress:
    enabled: true
    annotations:
        kubernetes.io/ingress.class: "nginx"
    host: "your_domain_name" 
    tls:
        enabled: true
        secretName: "pach-tls" 

ATTENTION: You must use TLS when deploying on Azure.

As of today, few Ingress Controller offer full support of the gRPC protocol. To access pachd over gRPC (for example, when using pachctl or the s3Gateway, we recommend using a Load Balancer instead.

📖

See Also:

LoadBalancer #

You should load balance all gRPC and S3 incoming traffic to a TCP LB (load balanced at L4 of the OSI model) deployed in front of the pachd service. To automatically provision an external load balancer in your current cloud (if supported), enable the externalService field of the pachd service in your values.yaml as follow:

# If enabled, External service creates a service which is safe to
# be exposed externally
pachd:
  externalService:
    enabled: true
    apiGRPCPort: 30650
    s3GatewayPort: 30600
    annotations: {see example below}

See our reference values.yaml for all available fields.

ℹ️

When externalService is enabled, Pachyderm creates a corresponding pachd-lb service of type:LoadBalancer allowing your cloud platform (AWS, GKE…) to provision a TCP Load Balancer automatically.

Add the appropriate annotations to attach any Load Balancer configuration information to the metadata of your service.

Example on AWS EKS #

In the following example, we deploy an NLB and enable TLS on AWS EKS:

pachd:
  externalService:
    enabled: true
    apiGRPCPort: 30650
    s3GatewayPort: 30600
    annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: "external"
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
        service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
        service.beta.kubernetes.io/aws-load-balancer-subnets: "subnet-aaaaa,subnet-bbbbb,subnet-ccccc"
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:region:account-id:certificate/aaa-bbb-cccc"
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "30600,30650,30657,30658"

Example on GCP GKE #

In the following example, we pre created a static IP by running gcloud compute addresses create ADDRESS_NAME --global --ip-version IPV4, then passed this external IP to the values.yaml as follow:

pachd:
  externalService:
    enabled: true
    apiGRPCPort: 30650
    s3GatewayPort: 30600
    loadBalancerIP: ${ADDRESS_NAME}

Example on Azure AKS #

This example is identical to the example on Google GKE.

pachd:
  externalService:
    enabled: true
    apiGRPCPort: 30650
    s3GatewayPort: 30600
    loadBalancerIP: ${ADDRESS_NAME}

Next: Find the deployment page that matches your cloud provider