Infrastructure Recommendations
Learn about some of our infrastructure recommendations.
May 26, 2023
In the simplest case, such as running a Pachyderm cluster locally, implicit and
explicit port-forwarding enables you to communicate with pachd, the Pachyderm
API pod, and console, the Pachyderm UI. Port-forwarding can be used in
cloud environments as well, but a production environment might require you to
define additional inbound connection rules.
Before we dive into the delivery of external traffic to Pachyderm, read the following recommendations to set up your infrastructure in production.
Refer to our generic “Helm Install” page for more information on how to install and get started with Helm.
Pachyderm Infrastructure Recommendations #
We are now shipping Pachyderm with an embedded proxy allowing your cluster to expose one single port externally. This deployment setup is optional.
If you choose to deploy Pachyderm with a proxy, our new recommended architecture and deployment instructions supersede the instructions below.
For production deployments, we recommend that you:
- Use a secure connection. Make sure that you have Transport Layer Security (TLS) enabled for Ingress connections. You can deploy pachd and console with different certificates if required. Self-signed certificates might require additional configuration. For instructions on deployment with TLS, see Deploy Pachyderm with TLS.
- Use Pachyderm authentication/authorization. Pachyderm authentication is an additional security layer to protect your data from unauthorized access. See the authentication and authorization section to activate access control and set up an IdP (Identity Provider).
- Add an Ingress Controller to your cluster for HTTP/HTTPS incoming traffic.
- Provision a TCP load balancer for gRPC incoming traffic, with ports 30650 (gRPC port) and 30600 (s3gateway port) forwarding to pachd.
- Configure access to your external IP addresses through firewalls or your Cloud Provider Network Security.
- (Optional) Create a DNS entry for each public IP (each Load Balancer).
Once you have set up your networking infrastructure, check the deployment page of your cloud provider. The following section returns to the setup of an Ingress and a TCP Load Balancer in detail.
Deliver External Traffic To Pachyderm #
Pachyderm provides multiple ways to deliver external traffic to services.
However, we recommend setting up the following resources in a production environment:
- An Ingress Controller to manage HTTP/HTTPS external access to the Console and authentication services (oidc and identity services).
- A TCP Load Balancer to manage gRPC external access to pachd.
The diagram below gives a quick overview of those recommendations on AWS EKS:
NodePort #
By default, the local deployment of Pachyderm deploys the pachd service as type:NodePort. However, NodePort is a limited solution that is not recommended in production deployments. In all other deployments, Pachyderm services are therefore exposed on the cluster-internal IP (ClusterIP) instead of on each node’s IP (NodePort).
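As a minimal sketch of the distinction above, the service type could be pinned explicitly in your values.yaml. The key path pachd.service.type is an assumption about the chart layout, not something stated on this page, so check the reference values.yaml before relying on it.

```yaml
pachd:
  service:
    # Assumption: the chart reads the pachd service type from this key.
    # ClusterIP keeps pachd reachable only from inside the cluster;
    # NodePort would expose it on a port of every node.
    type: ClusterIP
```

With ClusterIP, external access has to come through an Ingress or a load balancer, which is what the rest of this page sets up.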
Ingress #
An Ingress exposes HTTP and HTTPS routes from outside the cluster to services in the cluster such as Console or Authentication services.
To configure the Ingress, enable the ingress field in your values.yaml, and choose one of the following:
- Deploy your preferred Ingress Controller (Traefik, NGINX).
- Or, provide any specific Kubernetes Ingress annotations to customize your ingress controller behavior.
If your ingress is enabled:
- Cloud providers may provision a load balancer automatically. For example, AWS will provision an Application Load Balancer (ALB) in front of Console.
- The deployment of Pachyderm (check our Helm documentation) automatically creates the following set of rules:
- host: <your_domain_name>
  http:
    paths:
      - path: "/dex"
        backend:
          serviceName: "pachd"
          servicePort: "identity-port"
      - path: "/authorization-code/callback"
        backend:
          serviceName: "pachd"
          servicePort: "oidc-port"
      - path: "/*"
        backend:
          serviceName: "console"
          servicePort: "console-http"
See our reference values.yaml for all available fields.
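The rules listed above use the older serviceName/servicePort backend syntax. For readers more familiar with the networking.k8s.io/v1 Ingress API, the same routing could be sketched roughly as follows. This is an illustration only, not the manifest the chart renders: the resource name and host are hypothetical, and the pathType values are assumptions (some controllers, such as ALB, expect ImplementationSpecific for wildcard paths).

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pachyderm-example            # hypothetical name
spec:
  rules:
    - host: pachyderm.example.com    # hypothetical host; use your domain name
      http:
        paths:
          # Identity service
          - path: /dex
            pathType: Prefix         # assumption
            backend:
              service:
                name: pachd
                port:
                  name: identity-port
          # OIDC callback
          - path: /authorization-code/callback
            pathType: Prefix         # assumption
            backend:
              service:
                name: pachd
                port:
                  name: oidc-port
          # Everything else goes to Console
          - path: /
            pathType: Prefix         # assumption
            backend:
              service:
                name: console
                port:
                  name: console-http
```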
You might choose to deploy your preferred Ingress Controller (Traefik, NGINX). Read about the installation and configuration of Traefik on a cluster.
To have the ingress routes use the https protocol without enabling the cert secret configuration, set ingress.uriHttpsProtoOverride to true in your values.yaml.
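A minimal values.yaml sketch of that override, using only the fields named on this page. The host is a placeholder, and the scenario in the comment (TLS handled somewhere other than the ingress cert secret) is an assumption about when you might want this.

```yaml
ingress:
  enabled: true
  host: "your_domain_name"
  # Routes use https even though ingress.tls is not configured here,
  # for example when TLS is handled elsewhere (assumption).
  uriHttpsProtoOverride: true
```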
Example on AWS EKS #
In the example below, we are opening the HTTPS port and enabling TLS.
ingress:
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:region:account-id:certificate/aaaa-bbbb-cccc
    alb.ingress.kubernetes.io/group.name: pachyderm # lets multiple ingress resources be configured into one load balancer
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/security-groups: sg-aaaa
    alb.ingress.kubernetes.io/subnets: subnet-aaaa, subnet-bbbb, subnet-cccc
    alb.ingress.kubernetes.io/target-type: ip
    kubernetes.io/ingress.class: alb
  host: "your_domain_name"
  tls:
    enabled: true
    secretName: "pach-tls"
Example on GCP GKE #
In the example below using the ingress controller Traefik, we are opening the HTTPS port and enabling TLS.
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: traefik
  host: "your_domain_name"
  tls:
    enabled: true
    secretName: "pach-tls"
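The secretName above points at a Kubernetes secret holding the certificate and key. Assuming it is a standard kubernetes.io/tls secret that you create yourself, a sketch could look like the following. The namespace is an assumption, and the data values are placeholders to replace with your own base64-encoded material.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: pach-tls
  namespace: default        # assumption: the namespace where Pachyderm is deployed
type: kubernetes.io/tls
data:
  # Placeholders: base64-encoded PEM certificate and private key
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
```

See Deploy Pachyderm with TLS for the supported ways to provide certificates.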
Example on Azure AKS #
In the example below, we are using the ingress controller NGINX, opening the HTTPS port, and enabling TLS.
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: "nginx"
  host: "your_domain_name"
  tls:
    enabled: true
    secretName: "pach-tls"
ATTENTION: You must use TLS when deploying on Azure.
As of today, few Ingress Controllers offer full support for the gRPC protocol. To access pachd over gRPC (for example, when using pachctl or the s3Gateway), we recommend using a Load Balancer instead.
See Also:
- Kubernetes Ingress.
- Kubernetes Ingress Controller.
LoadBalancer #
You should route all incoming gRPC and S3 traffic through a TCP load balancer (load balancing at layer 4 of the OSI model) deployed in front of the pachd service. To automatically provision an external load balancer in your current cloud (if supported), enable the externalService field of the pachd service in your values.yaml as follows:
# If enabled, externalService creates a service which is safe to
# be exposed externally
pachd:
  externalService:
    enabled: true
    apiGRPCPort: 30650
    s3GatewayPort: 30600
    annotations: {} # see the examples below
See our reference values.yaml for all available fields.
When externalService is enabled, Pachyderm creates a corresponding pachd-lb service of type:LoadBalancer, allowing your cloud platform (AWS, GKE…) to provision a TCP Load Balancer automatically.
Add the appropriate annotations to attach any Load Balancer configuration information to the metadata of your service.
Example on AWS EKS #
In the following example, we deploy an NLB and enable TLS on AWS EKS:
pachd:
  externalService:
    enabled: true
    apiGRPCPort: 30650
    s3GatewayPort: 30600
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "external"
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
      service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
      service.beta.kubernetes.io/aws-load-balancer-subnets: "subnet-aaaaa,subnet-bbbbb,subnet-ccccc"
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:region:account-id:certificate/aaa-bbb-cccc"
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "30600,30650,30657,30658"
Example on GCP GKE #
In the following example, we pre-created a static IP by running gcloud compute addresses create ADDRESS_NAME --global --ip-version IPV4, then passed this external IP to the values.yaml as follows:
pachd:
  externalService:
    enabled: true
    apiGRPCPort: 30650
    s3GatewayPort: 30600
    loadBalancerIP: ${ADDRESS_NAME}
Example on Azure AKS #
This example is identical to the example on Google GKE.
pachd:
  externalService:
    enabled: true
    apiGRPCPort: 30650
    s3GatewayPort: 30600
    loadBalancerIP: ${ADDRESS_NAME}
Next: Find the deployment page that matches your cloud provider