Spout PPS

Ingest streaming data using a spout pipeline.

Spec

spout is a top-level attribute of the pipeline spec.

{
  "pipeline": {...},
  "transform": {...},
  "spout": {
    // Optionally, you can combine a spout with a service:
    "service": {
      "internal_port": int,
      "external_port": int
    }
  },
  ...
}

Attributes

Attribute | Description
service | Optional. Specifies how to expose the spout as a Kubernetes service.
internal_port | The port that the spout's container listens on.
external_port | The port on which the Kubernetes service exposes the spout.

Behavior

Diagram

[Diagram: spout-tldr]

When to Use

Use the spout field when a pipeline needs to ingest data from an external source rather than from a Pachyderm repository, for example a message queue or an external API that is not integrated with Pachyderm. A spout pipeline has no input repository: its user code runs continuously, pulls data from the external source, and writes it to the pipeline's output repository.
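Conceptually, a spout's user code is a long-running loop: pull a record from the external source, write it to the output repository, repeat. The Go sketch below shows only that loop. How records actually reach the output repository depends on your Pachyderm version and client library, so both ends sit behind hypothetical Source and Sink interfaces (not Pachyderm APIs), and the loop stops on io.EOF only so that the in-memory demo terminates.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Source is a hypothetical stand-in for an external system such as a
// message queue or API client. Next returns io.EOF when no data remains.
type Source interface {
	Next() ([]byte, error)
}

// Sink is a hypothetical stand-in for whatever writes files into the
// spout's output repository; it is NOT a Pachyderm client API.
type Sink interface {
	Put(name string, data []byte) error
}

// runSpout copies records from src to dst until src is exhausted and
// returns how many records it wrote. A real spout usually never stops;
// returning on io.EOF keeps this sketch testable.
func runSpout(src Source, dst Sink) (int, error) {
	n := 0
	for {
		rec, err := src.Next()
		if errors.Is(err, io.EOF) {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		// Name each output file after its sequence number.
		if err := dst.Put(fmt.Sprintf("record-%06d", n), rec); err != nil {
			return n, err
		}
		n++
	}
}

// sliceSource and mapSink are in-memory fakes used for the demo.
type sliceSource struct{ recs [][]byte }

func (s *sliceSource) Next() ([]byte, error) {
	if len(s.recs) == 0 {
		return nil, io.EOF
	}
	r := s.recs[0]
	s.recs = s.recs[1:]
	return r, nil
}

type mapSink map[string][]byte

func (m mapSink) Put(name string, data []byte) error {
	m[name] = data
	return nil
}

func main() {
	src := &sliceSource{recs: [][]byte{[]byte("a"), []byte("b")}}
	dst := mapSink{}
	n, err := runSpout(src, dst)
	fmt.Println(n, err, len(dst))
}
```

Keeping the external client and the output writer behind interfaces lets the loop be exercised with fakes, as in main, before it is wired to a real queue.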

Example scenarios:

Example

{
  "pipeline": {
    "name": "my-spout"
  },
  "spout": {},
  "transform": {
    "cmd": [ "go", "run", "./main.go" ],
    "image": "myaccount/myimage:0.1",
    "env": {
      "HOST": "kafkahost",
      "TOPIC": "mytopic",
      "PORT": "9092"
    }
  }
}
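This example passes the Kafka connection details to the user code through transform.env. A minimal sketch of how ./main.go might read them, using only Go's standard library; kafkaConfig and loadConfig are hypothetical names, and the actual Kafka consumer is omitted.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// kafkaConfig holds the settings the pipeline passes through transform.env.
// The struct and field names are illustrative.
type kafkaConfig struct {
	Host  string
	Port  string
	Topic string
}

// Addr joins host and port into a "host:port" broker address.
func (c kafkaConfig) Addr() string {
	return net.JoinHostPort(c.Host, c.Port)
}

// loadConfig reads HOST, PORT, and TOPIC through lookup (os.LookupEnv in
// the container) and fails if any of them is missing or empty.
func loadConfig(lookup func(string) (string, bool)) (kafkaConfig, error) {
	var cfg kafkaConfig
	fields := []struct {
		key string
		dst *string
	}{{"HOST", &cfg.Host}, {"PORT", &cfg.Port}, {"TOPIC", &cfg.Topic}}
	for _, f := range fields {
		v, ok := lookup(f.key)
		if !ok || v == "" {
			return kafkaConfig{}, fmt.Errorf("environment variable %s is not set", f.key)
		}
		*f.dst = v
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		// A real spout would likely exit with a non-zero status here.
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("consuming topic %q from %s\n", cfg.Topic, cfg.Addr())
}
```

Passing the lookup function in, rather than calling os.Getenv directly, makes the configuration step easy to check without setting real environment variables.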

For an introduction to how spouts work, see the spout101 example.