Beginner Tutorial: Explore Resources, Data, & Logs

Part 5: Explore Resources, Data, & Logs

A big part of building and maintaining a DAG of pipelines is knowing how to explore the resources, input/output data, and logs associated with your pipelines. Doing so lets you troubleshoot issues, validate resource creation, and better understand how your pipelines process data.


List, Inspect, & Troubleshoot

Congratulations! You’ve successfully created a DAG of pipelines that process video files into a collage. However, we’ve only just scratched the surface of what you can do with Pachyderm. Now that you have a working pipeline, try out some of these commands to explore all of the details associated with the DAG.

You can quickly take stock of all the resources you’ve created by listing them in the terminal.

pachctl list projects
pachctl list repos
pachctl list pipelines
pachctl list commits
pachctl list jobs --pipeline content_collager
pachctl list files content_collager@master
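
If you run these listings often, a small wrapper can save typing. The following is a minimal sketch, not part of Pachyderm: the function name `list_tutorial_resources` and the `PACHCTL` override variable are hypothetical, and the commands inside are the same ones shown above.

```shell
# Sketch: run every "list" command from this tutorial in one pass.
# PACHCTL is a hypothetical override variable (not a pachctl feature);
# set PACHCTL=echo to print the arguments instead of contacting a cluster.
PACHCTL="${PACHCTL:-pachctl}"

list_tutorial_resources() {
  local pipeline="$1"   # pipeline (and output repo) name, e.g. content_collager
  local branch="$2"     # branch to list files from, e.g. master

  "$PACHCTL" list projects
  "$PACHCTL" list repos
  "$PACHCTL" list pipelines
  "$PACHCTL" list commits
  "$PACHCTL" list jobs --pipeline "$pipeline"
  "$PACHCTL" list files "${pipeline}@${branch}"
}
```

To use it, call `list_tutorial_resources content_collager master` after defining the function in your shell.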

You can inspect resources to get key details from within the terminal. This is a fast and easy way to validate resource creation and config.

pachctl inspect project video-to-frame-traces
pachctl inspect repo content_collager
pachctl inspect commit content_collager@<commit-id>
pachctl inspect pipeline content_collager
pachctl inspect files content_collager@master

Let’s say you’ve uploaded corrupted data, or realized that your pipeline’s glob pattern or user code is flawed, and you want to stop job processing and look at your logs. There are several commands you can run to get to the bottom of the issue.

pachctl stop pipeline content_collager
pachctl stop job content_collager@<job-id>
pachctl logs --pipeline content_collager
pachctl debug dump debug_dump.tar.gz
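
When troubleshooting, it can help to keep the logs and the debug dump together in one directory. The following is a minimal sketch under stated assumptions: `collect_diagnostics`, the `PACHCTL` override variable, and the output file names are hypothetical, while the `pachctl` commands are the ones listed above.

```shell
# Sketch: stop a pipeline and gather its logs plus a debug dump in one directory.
# PACHCTL is a hypothetical override variable (not a pachctl feature);
# set PACHCTL=echo to see what would run without contacting a cluster.
PACHCTL="${PACHCTL:-pachctl}"

collect_diagnostics() {
  local pipeline="$1"   # e.g. content_collager
  local out_dir="$2"    # directory that will hold the log file and the dump

  mkdir -p "$out_dir" || return 1

  # Stop the pipeline so no new jobs start while you investigate.
  "$PACHCTL" stop pipeline "$pipeline" || return 1

  # Save the pipeline's logs to a file for later review.
  "$PACHCTL" logs --pipeline "$pipeline" > "$out_dir/${pipeline}.log" || return 1

  # Write the debug dump next to the logs.
  "$PACHCTL" debug dump "$out_dir/debug_dump.tar.gz"
}
```

For example, `collect_diagnostics content_collager ./diagnostics` leaves `content_collager.log` and `debug_dump.tar.gz` in `./diagnostics`. The pipeline stays stopped afterwards, so remember to start it again once you have fixed the problem.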

Once you’ve updated your pipeline spec/user code, you’ll want to reprocess the data for one of your pipelines. Here’s how you can do that:

pachctl update pipeline -f content_collager.yaml --reprocess
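
After reprocessing, you will usually want to confirm that a new job was created. The following is a minimal sketch that chains the update with the job listing shown earlier; the function name `reprocess_and_check` and the `PACHCTL` override variable are hypothetical helpers, not Pachyderm features.

```shell
# Sketch: reprocess a pipeline from its spec file, then list its jobs.
# PACHCTL is a hypothetical override variable (not a pachctl feature);
# set PACHCTL=echo to print the arguments instead of contacting a cluster.
PACHCTL="${PACHCTL:-pachctl}"

reprocess_and_check() {
  local spec="$1"       # path to the pipeline spec, e.g. content_collager.yaml
  local pipeline="$2"   # pipeline name, e.g. content_collager

  # Fail early if the spec file is missing, before touching the cluster.
  if [ ! -f "$spec" ]; then
    echo "spec not found: $spec" >&2
    return 1
  fi

  "$PACHCTL" update pipeline -f "$spec" --reprocess || return 1

  # The reprocess should show up as a new job for this pipeline.
  "$PACHCTL" list jobs --pipeline "$pipeline"
}
```

For example: `reprocess_and_check content_collager.yaml content_collager`.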

For a comprehensive list of operations, check out the Build DAGs section of the documentation or browse the Command Library.

Bonus Exercise

  • How would you update the glob pattern in the video converter pipeline spec (video_mp4_converter.yaml) to only process video files in the raw_videos_and_images repo? That would enable you to reduce the complexity of the user code in def process_video_files and make the pipeline more efficient.

Congratulations!

You've successfully completed Part 5: Explore Resources, Data, & Logs.

🎉