Branch

Learn about the concept of a branch in Pachyderm.

December 2, 2022

A Pachyderm branch is a pointer to a commit that moves along with new commits as they are submitted. By default, when you create a repository, Pachyderm does not create any branches. Most users prefer to create a master branch by initiating the first commit and specifying the master branch in the put file command.

Branches enable collaboration between teams of data scientists. However, many users find it sufficient to use the master branch for all their work. Although the concept of a branch is similar to Git branches, in most cases, branches are not used as extensively as in source code version-control systems.

A branch also stores information about provenance, the other branches it uses as input and which rely on its output. These branch relationships are how Pachyderm knows which data each pipeline relies on, and which branches and repos should be included in each commit.

Each branch has a HEAD which references the latest commit in the branch. Pachyderm pipelines look at the HEAD of the branch for changes and, if they detect new changes, trigger a job. When you commit a new change, the HEAD of the branch moves to the latest commit.

You can create additional branches to experiment with the data (pachctl create branch <myrepo>@<branchname>. Optionally, you can add --head <myrepo>@<master> for the head of the new branch to reference the head commit on master).

To view a list of branches in a repo, run the pachctl list branch <myrepo> command.

example #

pachctl list branch images

System Response:

BRANCH HEAD
master c32879ae0e6f4b629a43429b7ec10ccc
⚠️
  • Deleting a branch (pachctl delete branch <myrepo>@<branchname>) does not delete the commits on it.
  • All branches must have a head commit.