Understanding Pachyderm access controls
If access controls are activated, each data repository, or repo, in Pachyderm has an Access Control List (ACL) associated with it. The ACL includes:
- READERs: users who can read the data versioned in the repo.
- WRITERs: users with READER access who can also submit additions, deletions, or modifications of data into the repo.
- OWNERs: users with READER and WRITER access who can also modify the repo’s ACL.
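The three levels nest: each level includes the permissions of the one below it. The following is a minimal shell sketch of that ordering, not Pachyderm's implementation. The function names `scope_rank` and `has_access`, the numeric ranks, and the `NONE` placeholder for a user with no ACL entry are all illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: map an access level name to a numeric rank.
# READER, WRITER, and OWNER come from the ACL description; NONE and the
# numbers themselves are illustrative assumptions.
scope_rank() {
  case "$1" in
    NONE)   echo 0 ;;
    READER) echo 1 ;;
    WRITER) echo 2 ;;
    OWNER)  echo 3 ;;
    *)      echo "unknown access level: $1" >&2; return 1 ;;
  esac
}

# Succeeds (exit status 0) when the level a user holds covers the level
# that an operation requires. For example, a WRITER may also read.
has_access() {
  held=$(scope_rank "$1") || return 2
  required=$(scope_rank "$2") || return 2
  [ "$held" -ge "$required" ]
}

has_access WRITER READER && echo "WRITER can read"
has_access READER WRITER || echo "READER cannot write"
has_access OWNER WRITER && echo "OWNER can write"
```

Modeling the levels as ranks keeps the check to a single comparison, which matches the way the list defines each level in terms of the previous one.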
Pachyderm defines the following account types:
- GitHub user is a user account that is associated with a GitHub account and logs in through the GitHub OAuth flow. If you do not use any third-party identity provider, you use this option. When a user tries to log in with a GitHub account, Pachyderm verifies the identity and sends a Pachyderm token for that account.
- Robot user is a user account that logs in with a Pachyderm-generated authentication token. Typically, you create a robot user in simplified workflow scenarios, such as initial SAML configuration.
- Pipeline is an account that Pachyderm creates for data pipelines. A pipeline inherits access control from its creator.
- SAML user is a user account that is associated with a Security Assertion Markup Language (SAML) identity provider. When a user tries to log in through a SAML ID provider, the system confirms the identity, associates that identity with a SAML identity provider account, and responds with the SAML identity provider token for that user. Pachyderm verifies the token, drops it, and creates a new internal token that encapsulates the information about the user.
By default, Pachyderm defines one hardcoded group called admin.
Users in the admin group can perform any action on the cluster, including appointing other admins.
Furthermore, only the cluster admins can manage a repository
Enabling access control
Before you enable access controls, make sure that you have activated Pachyderm Enterprise Edition as described in this guide.
To enable access controls, complete the following steps:
Verify the status of the Enterprise features by opening the Pachyderm dashboard in your browser or by running the following command:
$ pachctl enterprise get-state
ACTIVE
Activate the Enterprise access control features by completing the steps in one of these sections:
Activating access controls with the dashboard
To activate access controls in the Pachyderm dashboard, complete the following steps:
- Go to the Settings page.
- Click the Activate Access Controls button. After you click the button, Pachyderm enables you to add GitHub users as cluster admins and activate access control:
After activating access controls, you should see the following screen that asks you to log in to Pachyderm:
Activating access controls with pachctl
To activate access controls with pachctl, choose one of these options:
Activate access controls by specifying an initial admin user:
$ pachctl auth activate --initial-admin=<prefix>:<user>
Note: You must prefix the username with the appropriate account type, either github:<user> or robot:<user>. If you select the latter, Pachyderm generates and returns a Pachyderm auth token that you can use to authenticate as the initial robot admin by running pachctl auth use-auth-token. You can use this option when you cannot use GitHub as an identity provider.
Activate access controls with a GitHub account:
$ pachctl auth activate
Pachyderm prompts you to log in with your GitHub account. The GitHub account that you sign in with is the only admin until you add more by running pachctl auth modify-admins.
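The pachctl activation steps can be sketched as a small dry-run script. This is only an illustration: the helper names `enterprise_is_active` and `initial_admin_flag` and the user name `admin` are hypothetical, the `github` prefix is assumed alongside `robot`, and the script prints the activation command rather than running it. Real `pachctl enterprise get-state` output may contain more than the bare state word, so treat the exact string comparison as an assumption too.

```shell
#!/bin/sh
# Succeeds only when the reported Enterprise state is exactly ACTIVE,
# the value shown in the verification step. ASSUMPTION: the state is
# passed in as a single word.
enterprise_is_active() {
  [ "$1" = "ACTIVE" ]
}

# Hypothetical helper: build the --initial-admin argument in the
# <prefix>:<user> form. ASSUMPTION: github and robot are the accepted
# prefixes.
initial_admin_flag() {
  prefix=$1
  user=$2
  case "$prefix" in
    github|robot) ;;
    *) echo "unsupported account prefix: $prefix" >&2; return 1 ;;
  esac
  [ -n "$user" ] || { echo "user name must not be empty" >&2; return 1; }
  echo "--initial-admin=${prefix}:${user}"
}

# In practice the state would come from `pachctl enterprise get-state`;
# it is hard-coded here so that the sketch runs without a cluster.
state="ACTIVE"

if enterprise_is_active "$state"; then
  # Dry run: print the command instead of executing it.
  flag=$(initial_admin_flag robot admin) && echo "pachctl auth activate $flag"
else
  echo "Enterprise features are not active; activate them first" >&2
fi
```

Validating the prefix before calling pachctl surfaces a mistyped account type early, before any change is made to the cluster.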
Logging in to Pachyderm
After you activate access controls, log in to your cluster either through the dashboard or CLI. The CLI and the dashboard have independent login workflows:
Log in to the dashboard
After you have activated access controls for Pachyderm, you need to log in to use the Pachyderm dashboard as shown above in this section.
To log in to the dashboard, complete the following steps:
- Click the Get GitHub token button. If you have not previously authorized Pachyderm on GitHub, an option to Authorize Pachyderm appears. After you authorize Pachyderm, a Pachyderm user token appears:
- Copy and paste this token back into the Pachyderm login screen and press Enter. You are now logged in to Pachyderm, and you should see your GitHub avatar and an indication of your user in the upper left-hand corner of the dashboard:
Log in to the CLI
To log in to pachctl, complete the following steps:
Type the following command:
$ pachctl auth login
When you run this command, pachctl provides you with a GitHub link to authenticate as a GitHub user.
If you have not previously authorized Pachyderm on GitHub, an option to Authorize Pachyderm appears. After you authorize Pachyderm, a Pachyderm user token appears:
Copy and paste this token back into the terminal and press Enter.
You are now logged in to Pachyderm!
Alternatively, you can run the following command:
$ pachctl auth use-auth-token
Then paste an authentication token received from pachctl auth activate --initial-admin=robot:<user> or pachctl auth get-auth-token.
Manage and update user access
You can manage user access in the UI and CLI.
For example, you are logged in to Pachyderm as the user dwhitena and have a repository called test. Because the user dwhitena created this repository, dwhitena has full OWNER-level access to the repo.
You can confirm this in the dashboard by navigating to or clicking on the test repo.
Alternatively, you can confirm your access by running the pachctl auth get ... command.
$ pachctl auth get dwhitena test
OWNER
An OWNER of test or a cluster admin can then set other users’ level of access to the repo by using the pachctl auth set ... command or through the dashboard.
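As a rough sketch of the CLI route, the script below prints the `pachctl auth set` commands that would grant access to several users on the test repo. The helper `auth_set_cmd` and the user names `alice`, `bob`, and `carol` are hypothetical. The argument order `<user> <level> <repo>` and the lowercase level names are assumptions, so check `pachctl auth set --help` on your version before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a `pachctl auth set` command.
# ASSUMPTION: the argument order is <user> <level> <repo> and the level
# is one of none, reader, writer, owner.
auth_set_cmd() {
  user=$1
  level=$2
  repo=$3
  case "$level" in
    none|reader|writer|owner) ;;
    *) echo "unknown access level: $level" >&2; return 1 ;;
  esac
  echo "pachctl auth set $user $level $repo"
}

# Placeholder account names; replace them with real users.
for name in alice bob; do
  auth_set_cmd "$name" reader test
done
auth_set_cmd carol writer test
```

Printing the commands first gives an OWNER or cluster admin a chance to review the planned ACL changes before applying them.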
For example, to give some GitHub users READER, but not WRITER or OWNER, access to test and to give another user WRITER, but not OWNER, access, click Modify access controls under the repo details in the dashboard. This functionality allows you to add the users one by one.
Manage the Activation Code
When an enterprise activation code expires, an auth-activated
Pachyderm cluster goes into an
admin-only state. In this
state, only admins have access to data that is in Pachyderm.
This safety measure keeps sensitive data protected, even when
an enterprise subscription becomes stale. As soon as the enterprise
activation code is updated by using the dashboard or CLI, the
Pachyderm cluster returns to its previous state.
When you deactivate access controls on a Pachyderm cluster by running pachctl auth deactivate, the cluster returns to its original state, including the following:
- All ACLs are deleted.
- The cluster returns to being a blank slate with regard to access control. Everyone who can connect to Pachyderm can access and modify the data in all repos.
- No users are present in Pachyderm, and no one can log in to Pachyderm.