Learn about the types of pipelines used in Pachyderm, including spout, cron, and service pipelines.

March 30, 2023

A pipeline is a Pachyderm primitive that is responsible for reading data from a specified source, such as a Pachyderm repo, transforming it according to the pipeline configuration, and writing the result to an output repo.

A pipeline subscribes to a branch in one or more input repositories. Every time the branch receives a new commit, the pipeline executes a job that runs your code to completion and writes the results to a commit in the output repository. Every pipeline automatically creates an output repository with the same name as the pipeline. For example, a pipeline named model writes all of its results to the model output repo.

In Pachyderm, a pipeline is an individual execution step. You can chain multiple pipelines together to create a directed acyclic graph (DAG).
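To make chaining concrete, here is a minimal JSON sketch of a downstream pipeline that reads from the output repo of the model pipeline mentioned above. The pipeline name report, the image report-image, and the script path /report.py are hypothetical placeholders chosen for illustration; only the input repo name, model, comes from the text.

```json
{
  "pipeline": {
    "name": "report"
  },
  "transform": {
    "image": "report-image",
    "cmd": ["python3", "/report.py"]
  },
  "input": {
    "pfs": {
      "repo": "model",
      "glob": "/*"
    }
  }
}
```

Under these assumptions, each new commit in the model repo would trigger a report job, and the results would land in a report output repo. The two pipelines together form a two-step DAG.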

You define a pipeline declaratively, using a JSON or YAML file. Pipeline specification files follow the format described in Pachyderm’s pipeline specification reference.

A minimum pipeline specification must include the following parameters: the pipeline name, a transform (the image and the command to run), and an input.

Example

{
  "pipeline": {
    "name": "wordcount"
  },
  "transform": {
    "image": "wordcount-image",
    "cmd": ["python3", "/"]
  },
  "input": {
    "pfs": {
      "repo": "data",
      "glob": "/*"
    }
  }
}