A

Ancestry Syntax

Learn about the concept of Ancestry Syntax, which is used to reference earlier commits in the history of a branch or commit within a repository.
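For intuition, here is a minimal Python sketch (not Pachyderm's implementation) of what caret-style references such as `master^` or `master^2` resolve to; the commit IDs and branch name are illustrative.

```python
# Toy commit graph: child -> parent, plus a branch head.
parents = {"c3": "c2", "c2": "c1", "c1": None}
branches = {"master": "c3"}

def resolve(ref: str) -> str:
    """Resolve 'master', 'master^', or 'master^N' to a commit ID."""
    name, caret, hops = ref.partition("^")
    steps = int(hops) if hops else (1 if caret else 0)
    commit = branches[name]
    for _ in range(steps):
        commit = parents[commit]
    return commit

print(resolve("master"))    # c3, the branch head
print(resolve("master^"))   # c2, its parent
print(resolve("master^2"))  # c1, two commits back
```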

B

Branch

A pointer to a commit that advances automatically as new commits are added.

C

Commit

An atomic operation that snapshots and preserves the state of files/directories within a repository.

Commit Set

Learn about the concept of a commit set, which is an immutable set of all the commits that resulted from a modification to the system.

Cron

Learn about the concept of a cron, which triggers a pipeline to run on a time-based schedule rather than when new data is committed.
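As a quick illustration, the sketch below (plain Python, no cron library) enumerates the trigger times implied by the common spec `*/10 * * * *`, meaning "every 10 minutes"; the date used is arbitrary.

```python
from datetime import datetime, timedelta

# "*/10 * * * *" means "every 10 minutes": list the trigger times in one hour.
hour_start = datetime(2024, 1, 1, 12, 0)
triggers = [hour_start + timedelta(minutes=m) for m in range(0, 60, 10)]
print([t.strftime("%H:%M") for t in triggers])
# ['12:00', '12:10', '12:20', '12:30', '12:40', '12:50']
```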

D

DAG

Learn about DAGs, the Directed Acyclic Graphs that define the order in which pipelines are executed and how data flows between them.
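As a sketch, the snippet below models a tiny DAG in Python and topologically sorts it so that each pipeline runs only after its inputs exist; the repo and pipeline names (`images`, `edges`, `montage`) are illustrative.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set: edges reads images,
# montage reads both images and edges.
dag = {
    "edges": {"images"},
    "montage": {"images", "edges"},
}
print(list(TopologicalSorter(dag).static_order()))
# ['images', 'edges', 'montage']
```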

Data Parallelism

Learn about the concept of data parallelism, in which the same computation runs simultaneously on different subsets of the data.
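A minimal Python sketch of the idea: the same `transform` function is applied to different records at the same time, spread across worker processes. The function body is a placeholder.

```python
from multiprocessing import Pool

def transform(record: int) -> int:
    # Stand-in for real per-record work.
    return record * record

if __name__ == "__main__":
    records = list(range(10))
    with Pool(processes=4) as pool:      # four parallel workers
        print(pool.map(transform, records))
```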

Datum

Learn about datums, the smallest indivisible unit of computation within a job.
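The sketch below (plain Python, made-up file paths) shows how the choice of glob pattern changes how an input is carved into datums: `/` treats the whole input as one datum, while `/*` makes each top-level directory its own datum.

```python
files = ["/a/1.png", "/a/2.png", "/b/3.png"]

# Glob "/": the entire input is a single datum.
datums_whole = [files]

# Glob "/*": one datum per top-level entry.
datums_per_dir: dict[str, list[str]] = {}
for path in files:
    top = "/" + path.split("/")[1]
    datums_per_dir.setdefault(top, []).append(path)

print(len(datums_whole))     # 1 datum
print(len(datums_per_dir))   # 2 datums: /a and /b
```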

Deferred Processing

Learn about the concept of deferred processing, which allows you to commit data more frequently than you process it.
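A rough Python sketch of the pattern (dicts standing in for real branches): commits land frequently on a staging branch, and processing only happens when the watched branch's head is deliberately moved.

```python
branches = {"staging": None, "master": None}  # branch -> head commit ID

def commit_to(branch: str, commit_id: str) -> None:
    branches[branch] = commit_id              # cheap, can happen constantly

def process(branch: str) -> None:
    print(f"pipeline runs against {branches[branch]}")

commit_to("staging", "c1")
commit_to("staging", "c2")
commit_to("staging", "c3")                    # data piles up, nothing runs
branches["master"] = branches["staging"]      # move the head on purpose...
process("master")                             # ...and only now does work start
```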

Distributed Computing

Learn about the concept of distributed computing, which allows you to split your jobs across multiple workers.

E

F

File

A Unix filesystem object (directory or file) that stores data.

G

Glob Pattern

Learn about the concept of a glob pattern, which is a string of characters that specifies a set of filenames or paths in a file system.
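A small Python illustration using the standard library's `fnmatch` (shell-style matching; note that its `*` also crosses `/`); the paths are made up.

```python
from fnmatch import fnmatch

paths = ["/images/cat.png", "/images/dog.jpg", "/labels/cat.txt"]

print([p for p in paths if fnmatch(p, "/images/*.png")])
# ['/images/cat.png']
print([p for p in paths if fnmatch(p, "/*/cat.*")])
# ['/images/cat.png', '/labels/cat.txt']
```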

Global Identifier

Learn about the concept of a global identifier, which is a single ID shared by all the commits and jobs produced by one change, letting you trace that change across a DAG.

H

History

The collective, version-controlled record of past commits, pipeline versions, and jobs.

I

Input Repository

Learn about the concept of an input repository, which is a repository whose data is read as input by a pipeline.

J

Job

Learn about the concept of a Job, which is a unit of work that is created by a pipeline.

K

L

M

N

NLP

Learn about the concept of NLP, which is a subfield of artificial intelligence that focuses on teaching machines to understand and generate human language.

O

Output Repository

Learn about the concept of an output repository, which is the repository where a pipeline stores its results after the user code has transformed the input data.

P

Pachyderm Worker

Learn about the concept of a Pachyderm worker, which is the process that executes a pipeline's user code against its datums.

Pipeline

Learn about the concept of a pipeline, which is a primitive responsible for reading data from a specified source, transforming it according to the pipeline specification, and writing the result to an output repo.

Pipeline Inputs

Learn about the concept of a pipeline input, which is the source of the data that the pipeline reads and processes.

Pipeline Specification

Learn about the concept of a pipeline specification, which is a declarative configuration file used to define the behavior of a pipeline.
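As a hedged sketch, here is what a minimal spec might look like, written as a Python dict and serialized to JSON; the field layout (`pipeline`, `input.pfs`, `transform`) follows the common Pachyderm spec shape, but the names, glob, image, and command are placeholders.

```python
import json

spec = {
    "pipeline": {"name": "edges"},
    "input": {"pfs": {"repo": "images", "glob": "/*"}},
    "transform": {
        "image": "example/edges:1.0",     # placeholder image
        "cmd": ["python3", "/edges.py"],  # placeholder entrypoint
    },
}
print(json.dumps(spec, indent=2))
```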

Project

Learn about the concept of a project, which is a workspace that groups related repositories and pipelines.

Provenance

The recorded data lineage that tracks the dependencies and relationships between datasets.

Q

R

S

T

Task Parallelism

Learn about the concept of task parallelism, in which different tasks run concurrently, often over different parts of a workload.
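A minimal Python sketch of the contrast with data parallelism: two different tasks run at the same time, rather than the same task over different slices of data. The task bodies are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def resize_images() -> str:
    return "images resized"

def extract_metadata() -> str:
    return "metadata extracted"

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(resize_images), pool.submit(extract_metadata)]
    print([f.result() for f in futures])
```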

U

User Code

Learn about the concept of User Code, which is custom code that users write to process their data in pipelines.
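A minimal sketch of what user code often looks like inside a pipeline container: it reads whatever is mounted under the input path, does some work, and writes results to the output path. The `/pfs/images` input and the copy-only "transform" are illustrative placeholders.

```python
import os
import shutil

INPUT_DIR = "/pfs/images"   # assumed input mount for an "images" repo
OUTPUT_DIR = "/pfs/out"     # conventional output mount

for name in os.listdir(INPUT_DIR):
    src = os.path.join(INPUT_DIR, name)
    dst = os.path.join(OUTPUT_DIR, name)
    shutil.copy(src, dst)   # replace with real per-file processing
```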

V

W

X

Y

Z