Pipeline Specification (PPS)
Learn about the different attributes of a pipeline spec.
March 24, 2023
This document discusses each of the fields present in a pipeline specification. To see how to use a pipeline spec to create a pipeline, refer to the create pipeline section.
Before You Start #
- Pachyderm’s pipeline specifications can be written in JSON or YAML.
- Pachyderm uses its JSON parser if the first character is {.
- A pipeline specification file can contain multiple pipeline declarations at once.
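The two points above can be illustrated with a small Python sketch. It is not Pachyderm's own parser: the helper names `looks_like_json` and `split_json_declarations` are hypothetical, skipping leading whitespace before checking for `{` is an assumption, and only JSON files are handled. It shows how a file holding several back-to-back JSON pipeline declarations could be decoded with the standard library.

```python
import json


def looks_like_json(text: str) -> bool:
    """Return True when the first non-whitespace character is '{'.

    Assumption: leading whitespace is ignored before the check.
    """
    return text.lstrip().startswith("{")


def split_json_declarations(text: str) -> list:
    """Decode every top-level JSON object that appears back to back in text."""
    decoder = json.JSONDecoder()
    specs = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the decoded object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        specs.append(obj)
    return specs
```

For example, a file containing two declarations one after the other yields a list of two dictionaries, each of which can then be inspected on its own.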
Minimal Spec #
Generally speaking, the only attributes that are strictly required for all scenarios are pipeline.name and transform. Beyond those, other attributes are conditionally required based on your pipeline’s use case. The following are a few examples of common use cases along with their minimally required attributes.
Cron input:
{
    "pipeline": {
        "project": 1,
        "name": "wordcount"
    },
    "transform": {
        "image": "wordcount-image",
        "cmd": ["/binary", "/pfs/data", "/pfs/out"]
    },
    "input": {
        "cron": {
            "name": string,
            "spec": string,
            "repo": string,
            "start": time,
            "overwrite": bool
        }
    }
}
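The cron example above uses type placeholders rather than values. As a rough illustration, the following Python sketch fills them in and prints valid JSON. The function `cron_pipeline_spec` is hypothetical, the schedule string `@every 60s` and the RFC 3339 formatting of `start` are assumptions about what the `spec` and `start` fields accept, and the `project` and `repo` attributes are left out unless supplied.

```python
import json
from datetime import datetime, timezone


def cron_pipeline_spec(name, image, cmd, cron_name, schedule,
                       repo=None, start=None, overwrite=False):
    """Build a pipeline spec dictionary with a cron input (illustrative only)."""
    cron = {"name": cron_name, "spec": schedule, "overwrite": overwrite}
    if repo is not None:
        cron["repo"] = repo
    if start is not None:
        # Assumption: "start" is written as an RFC 3339 UTC timestamp.
        cron["start"] = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "pipeline": {"name": name},
        "transform": {"image": image, "cmd": cmd},
        "input": {"cron": cron},
    }


if __name__ == "__main__":
    spec = cron_pipeline_spec(
        "wordcount",
        "wordcount-image",
        ["/binary", "/pfs/data", "/pfs/out"],
        cron_name="tick",          # hypothetical input name
        schedule="@every 60s",     # assumed schedule syntax
        start=datetime(2023, 3, 24, 12, 0, 0, tzinfo=timezone.utc),
    )
    print(json.dumps(spec, indent=4))
```

Building the spec as a dictionary and serializing it with `json.dumps` avoids hand-written syntax slips such as trailing commas or unbalanced braces.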
Egress to a SQL database:
{
    "pipeline": {
        "project": 1,
        "name": "wordcount"
    },
    "transform": {
        "image": "wordcount-image",
        "cmd": ["/binary", "/pfs/data", "/pfs/out"]
    },
    "input": {
        "pfs": {
            "repo": "data",
            "glob": "/*"
        }
    },
    "egress": {
        "sql_database": {
            "url": string,
            "file_format": {
                "type": string,
                "columns": [string]
            },
            "secret": {
                "name": string,
                "key": "PACHYDERM_SQL_PASSWORD"
            }
        }
    }
}
Egress to object storage:
{
    "pipeline": {
        "project": 1,
        "name": "wordcount"
    },
    "transform": {
        "image": "wordcount-image",
        "cmd": ["/binary", "/pfs/data", "/pfs/out"]
    },
    "input": {
        "pfs": {
            "repo": "data",
            "glob": "/*"
        }
    },
    "egress": {
        "URL": "s3://bucket/dir"
    }
}
PFS input:
{
    "pipeline": {
        "project": 1,
        "name": "wordcount"
    },
    "transform": {
        "image": "wordcount-image",
        "cmd": ["/binary", "/pfs/data", "/pfs/out"]
    },
    "input": {
        "pfs": {
            "repo": "data",
            "glob": "/*"
        }
    }
}
Service:
{
    "pipeline": {
        "project": 1,
        "name": "wordcount"
    },
    "transform": {
        "image": "wordcount-image",
        "cmd": ["/binary", "/pfs/data", "/pfs/out"]
    },
    "input": {
        "pfs": {
            "repo": "data",
            "glob": "/*"
        }
    },
    "service": {
        "internal_port": int,
        "external_port": int
    }
}
Spout:
{
    "pipeline": {
        "project": 1,
        "name": "wordcount"
    },
    "transform": {
        "image": "wordcount-image",
        "cmd": ["/binary", "/pfs/data", "/pfs/out"]
    },
    "spout": {
    }
}
S3 output (s3_out):
{
    "pipeline": {
        "project": 1,
        "name": "wordcount"
    },
    "transform": {
        "image": "wordcount-image",
        "cmd": ["/binary", "/pfs/data", "/pfs/out"]
    },
    "input": {
        "pfs": {
            "repo": "data",
            "glob": "/*"
        }
    },
    "s3_out": true
}
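Every example above shares the two attributes described as strictly required: pipeline.name and transform. A minimal Python sketch of a pre-flight check for those two attributes might look like the following. The function `missing_required` is hypothetical and is not part of any Pachyderm tooling; it does not check the conditionally required attributes for a given use case.

```python
def missing_required(spec: dict) -> list:
    """Return the names of the always-required attributes absent from spec."""
    missing = []
    pipeline = spec.get("pipeline")
    # pipeline.name must be present and non-empty.
    if not isinstance(pipeline, dict) or not pipeline.get("name"):
        missing.append("pipeline.name")
    # transform must be present as an object.
    if not isinstance(spec.get("transform"), dict):
        missing.append("transform")
    return missing


if __name__ == "__main__":
    spec = {
        "pipeline": {"name": "wordcount"},
        "transform": {
            "image": "wordcount-image",
            "cmd": ["/binary", "/pfs/data", "/pfs/out"],
        },
        "input": {"pfs": {"repo": "data", "glob": "/*"}},
    }
    print(missing_required(spec))
```

An empty list means both attributes are present; anything else names what still needs to be added before the spec is submitted.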
For a single-page view of all PPS options, go to the PPS series page.