Projects¶
Prefect projects are the recommended way to organize and manage Prefect deployments. Projects provide a minimally opinionated, transparent way to organize your code and configuration so that it's easy to debug.
A project is a directory of code and configuration for your workflows that can be customized for portability.
The main components of a project are two files and one directory:
- deployment.yaml: a YAML file that can be used to specify settings for one or more flow deployments
- prefect.yaml: a YAML file that contains procedural instructions for building artifacts for this project's deployments, pushing those artifacts, and retrieving them at runtime by a Prefect worker
- .prefect/: a hidden directory that designates the root for your project; basic metadata about the workflows within this project is stored here
Projects require workers
Note that using a project to manage your deployments requires the use of workers.
This tutorial assumes that you have already set up two work pools, each with a worker, which only requires a single CLI command for each:
- Local:
prefect worker start -t process -p local-work
- Docker:
prefect worker start -t docker -p docker-work
Each command will automatically create an appropriately typed work pool with default settings.
Initializing a project¶
Initializing a project is simple: within any directory that you plan to develop flow code, run:
$ prefect project init
Note that you can safely run this command in a non-empty directory where you already have work, as well.
This command will create your .prefect/ directory along with the two YAML files deployment.yaml and prefect.yaml; if any of these files or directories already exist, they will not be altered or overwritten.
Project Recipes
Prefect ships with multiple project recipes, which allow you to initialize a project with a more opinionated structure suited to a particular use. You can see all available recipes by running:
$ prefect project recipe ls
You can use a recipe by passing the --recipe flag:
$ prefect project init --recipe docker
Providing this flag will prompt you for required variables needed to make the recipe work properly. If you want to run this CLI programmatically, these required fields can be provided via the --field flag: prefect project init --recipe docker --field image_name=my-image/foo --field tag=dev.
If no recipe is provided, the init command makes an intelligent choice of recipe based on local configuration; for example, if you initialize a project within a git repository, Prefect will automatically use the git recipe.
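For scripted setups, the --field form described above can be assembled in code. The following is a minimal sketch, assuming only the --recipe and --field flags shown in this section; the helper name build_init_command is hypothetical and not part of Prefect.

```python
import shlex


def build_init_command(recipe, fields):
    """Return the argv list for a non-interactive `prefect project init` call.

    Hypothetical helper: it only assembles the flags shown in this tutorial.
    """
    cmd = ["prefect", "project", "init", "--recipe", recipe]
    for key, value in fields.items():
        # Each required recipe variable becomes one `--field key=value` pair.
        cmd += ["--field", f"{key}={value}"]
    return cmd


cmd = build_init_command("docker", {"image_name": "my-image/foo", "tag": "dev"})
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would execute it without any interactive prompts, provided every field the recipe requires has been supplied.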
Creating a basic deployment¶
Projects are most useful for creating deployments; let's walk through some examples.
Local deployment¶
In this example, we'll create a project from scratch that runs locally. Let's start by creating a new directory, making that our working directory, and initializing a project:
$ mkdir my-first-project
$ cd my-first-project
$ prefect project init --recipe local
Next, let's create a flow by saving the following code in a new file called api_flow.py:
# contents of my-first-project/api_flow.py
import requests
from prefect import flow


@flow(name="Call API", log_prints=True)
def call_api(url: str = "http://time.jsontest.com/"):
    """Sends a GET request to the provided URL and returns the JSON response"""
    resp = requests.get(url).json()
    print(resp)
    return resp
You can experiment by importing and running this flow in your favorite REPL. Let's now elevate this flow to a deployment via the prefect deploy CLI command:
$ prefect deploy ./api_flow.py:call_api \
-n my-first-deployment \
-p local-work
This command will create a new deployment for your "Call API" flow with the name "my-first-deployment" that is attached to the local-work work pool.
Note that Prefect has automatically done a few things for you:
- registered the existence of this flow with your local project
- created a description for this deployment based on the docstring of your flow function
- parsed the parameter schema for this flow function in order to expose an API for running this flow
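To make the last two points concrete, here is a rough sketch of how a description and a parameter schema can be derived from a plain Python function using the standard library. It illustrates the idea only and is not Prefect's actual implementation; the describe helper and the shape of its output are assumptions, and the function body is omitted because only the signature and docstring matter here.

```python
import inspect


def call_api(url: str = "http://time.jsontest.com/"):
    """Sends a GET request to the provided URL and returns the JSON response"""


def describe(fn):
    """Hypothetical helper: summarize a function's docstring and parameters."""
    params = {}
    for name, p in inspect.signature(fn).parameters.items():
        has_default = p.default is not inspect.Parameter.empty
        has_annotation = p.annotation is not inspect.Parameter.empty
        params[name] = {
            "type": getattr(p.annotation, "__name__", None) if has_annotation else None,
            "default": p.default if has_default else None,
            "required": not has_default,
        }
    return {"description": inspect.getdoc(fn), "parameters": params}


info = describe(call_api)
print(info["description"])
print(info["parameters"])
```

Because url has a default value, it shows up as optional, which is why the first ad-hoc run in this tutorial can omit --param.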
You can customize all of this either by manually editing deployment.yaml or by providing more flags to the prefect deploy CLI command; CLI inputs will be prioritized over hard-coded values in your deployment's YAML file when creating or updating a single deployment.
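The precedence rule can be pictured as a simple merge in which CLI values win whenever they are provided. This is a sketch of the rule only, not Prefect's code; the function name resolve_settings and the setting keys below are hypothetical.

```python
def resolve_settings(yaml_values, cli_values):
    """Merge settings so that any CLI value that was actually given wins."""
    merged = dict(yaml_values)
    for key, value in cli_values.items():
        # A flag that was not passed (None) leaves the YAML value untouched.
        if value is not None:
            merged[key] = value
    return merged


# Hypothetical keys, chosen only for illustration.
yaml_values = {"name": "my-first-deployment", "work_pool": "local-work"}
cli_values = {"name": None, "work_pool": "docker-work"}

settings = resolve_settings(yaml_values, cli_values)
print(settings)
```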
Let's create two ad-hoc runs for this deployment and confirm things are healthy:
$ prefect deployment run 'Call API/my-first-deployment'
$ prefect deployment run 'Call API/my-first-deployment' \
--param url=https://cat-fact.herokuapp.com/facts/
You should now be able to confirm in the UI that these runs were created and executed.
Flow registration
prefect deploy will automatically register your flow with your local project; you can also register flows explicitly with the prefect project register-flow command:
$ prefect project register-flow ./api_flow.py:call_api
This pre-registration allows you to deploy based on name instead of entrypoint path:
$ prefect deploy -f 'Call API' \
-n my-first-deployment \
-p local-work
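Conceptually, registration records which entrypoint a flow name refers to, so that either form can be resolved to the same code. The sketch below models this with an in-memory dictionary; how Prefect actually stores this metadata under .prefect/ is not covered here, and parse_entrypoint, register_flow, and resolve_entrypoint are hypothetical names.

```python
from pathlib import Path


def parse_entrypoint(entrypoint):
    """Split 'path/to/file.py:function_name' into its two parts."""
    path, sep, func = entrypoint.rpartition(":")
    if not sep or not path or not func:
        raise ValueError(f"expected 'path:function', got {entrypoint!r}")
    return Path(path), func


registered_flows = {}  # flow name -> entrypoint (illustrative storage only)


def register_flow(name, entrypoint):
    parse_entrypoint(entrypoint)  # validate before recording
    registered_flows[name] = entrypoint


def resolve_entrypoint(flow_name=None, entrypoint=None):
    """Prefer an explicit entrypoint; otherwise look the flow up by name."""
    return entrypoint if entrypoint else registered_flows[flow_name]


register_flow("Call API", "./api_flow.py:call_api")
print(resolve_entrypoint(flow_name="Call API"))
print(parse_entrypoint("./api_flow.py:call_api"))
```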
Git-based deployment¶
In this example, we'll initialize a project from a pre-built GitHub repository and see how it is automatically portable across machines.
We start by cloning the remote repository and initializing a project within the root of the repo directory:
$ git clone https://github.com/PrefectHQ/hello-projects
$ cd hello-projects
$ prefect project init --recipe git
We can now proceed with the same steps as above to create a new deployment:
$ prefect deploy -f 'log-flow' \
-n my-git-deployment \
-p local-work
Notice that we were able to deploy based on flow name alone; this is because the repository owner pre-registered the log-flow flow for us. Alternatively, if we knew the full entrypoint path, we could run prefect deploy ./flows/log_flow.py:log_flow.
Let's run this flow and discuss its output:
$ prefect deployment run 'log-flow/my-git-deployment'
In your worker process, you should see output that looks something like this:
Cloning into 'hello-projects'...
...
12:01:43.188 | INFO | Task run 'log_task-0' - Hello Marvin!
12:01:43.189 | INFO | Task run 'log_task-0' - Prefect Version = 2.8.7+84.ge479b48b6.dirty 🚀
12:01:43.189 | INFO | Task run 'log_task-0' - Hello from another file
...
12:01:43.236 | INFO | Task run 'log_config-0' - Found config {'some-piece-of-config': 100}
...
12:01:43.266 | INFO | Flow run 'delicate-labrador' - Finished in state Completed('All states completed.')
A few important notes on what we're looking at here:
- You'll notice the message "Hello from another file"; this flow imports code from other related files within the project. Prefect takes care of migrating the entire project directory for you, which includes files that you may import from.
- Similarly, the configuration that is logged is located within the root directory of this project; you can always consider this root directory your working directory both locally and when this deployment is executed remotely.
- Lastly, note the top line "Cloning into 'hello-projects'..."; because this project is based out of a GitHub repository, it is automatically portable to any remote location where both git and prefect are configured! You can convince yourself of this by either running a new local worker on a different machine, or by switching this deployment to run with your docker work pool (more on this shortly).
prefect.yaml
The above process worked out of the box because of the information stored within prefect.yaml; if you open this file in a text editor, you'll find that it is not empty. Specifically, it contains the following pull step that was automatically populated when you first ran prefect project init:
pull:
- prefect.projects.steps.git_clone_project:
    repository: https://github.com/PrefectHQ/hello-projects.git
    branch: main
    access_token: null
pull steps are the instructions sent to your worker's runtime environment that allow it to clone your project in remote locations. For more information, see the project concept documentation.
For more examples of configuration options available for cloning projects, see the git_clone_project step documentation.
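As a rough mental model, the inputs of the pull step shown above carry what a worker needs to produce the "Cloning into 'hello-projects'..." line from the logs. The sketch below turns those inputs into a git command line; it is an assumption about the general shape of the operation, not the step's real implementation, and it deliberately ignores access_token.

```python
pull_step_inputs = {
    "repository": "https://github.com/PrefectHQ/hello-projects.git",
    "branch": "main",
    "access_token": None,  # not handled in this sketch
}


def clone_command(inputs):
    """Hypothetical helper: build a `git clone` argv from pull-step inputs."""
    argv = ["git", "clone"]
    if inputs.get("branch"):
        # Check out the configured branch instead of the remote default.
        argv += ["--branch", inputs["branch"]]
    argv.append(inputs["repository"])
    return argv


print(" ".join(clone_command(pull_step_inputs)))
```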
Dockerized deployment¶
In this example, we extend the examples above by dockerizing our setup and executing runs with a Docker worker. Building off the git-based example above, let's switch our deployment to submit work to the docker-work work pool that we started at the beginning:
$ prefect deploy -f 'log-flow' \
-n my-docker-git-deployment \
-p docker-work
$ prefect deployment run 'log-flow/my-docker-git-deployment'
As promised above, this worked out of the box!
Let's deploy a new flow from this project that requires additional dependencies that might not be available in the default image our work pool is using; this flow requires both pandas and numpy as dependencies, which we will install locally first to confirm the flow is working:
$ pip install -r requirements.txt
$ python flows/pandas_flow.py
We now have two options for how to manage these dependencies in our worker's environment:
- setting the EXTRA_PIP_PACKAGES environment variable or using another hook to install the dependencies at runtime
- building a custom Docker image with the dependencies baked in
In this tutorial, we will focus on building a custom Docker image. First, we need to configure a build step within our prefect.yaml file as follows (note: if starting from scratch, we could use the docker-git recipe):
# partial contents of prefect.yaml
build:
- prefect_docker.projects.steps.build_docker_image:
    image_name: local-only/testing
    tag: dev
    dockerfile: auto
    push: false
pull:
- prefect.projects.steps.git_clone_project:
    repository: https://github.com/PrefectHQ/hello-projects.git
    branch: main
    access_token: null
A few notes:
- each step references a function with inputs and outputs
- in this case, we are using dockerfile: auto to tell Prefect to automatically create a Dockerfile for us; otherwise we could write our own and pass its location as a path to the dockerfile kwarg
- to avoid dealing with real image registries, we are not pushing this image; in most use cases you will want push: true (which is the default)
- to see all available configuration options for building Docker images, see the build_docker_image step documentation
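The first note, that each step references a function with inputs and outputs, can be illustrated with a tiny step runner: the key of a step is treated as a dotted path to a callable and the nested mapping as its keyword arguments. This is only a sketch of the idea; how Prefect resolves and executes steps internally is an assumption here, and a standard-library function stands in for a real step so the example runs anywhere.

```python
import importlib


def run_step(step):
    """Hypothetical runner: call the function named by the step's single key."""
    ((dotted_name, inputs),) = step.items()
    module_name, _, func_name = dotted_name.rpartition(".")
    func = getattr(importlib.import_module(module_name), func_name)
    # The nested mapping supplies keyword arguments; the return value is the output.
    return func(**(inputs or {}))


# Stand-in step: os.path.basename takes a single argument named `p`.
step = {"os.path.basename": {"p": "/opt/prefect/hello-projects"}}
print(run_step(step))
```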
All that's left to do is create our deployment and specify our image name to instruct the worker what image to pull:
$ prefect deploy -f 'pandas-flow' \
-n docker-build-deployment \
-p docker-work \
-v image=local-only/testing:dev
$ prefect deployment run 'pandas-flow/docker-build-deployment'
Your run should complete successfully, logs and all! Note that the -v flag sets a job variable; job variables are the pieces of infrastructure configuration that a given work pool allows you to set. Each work pool can customize the fields it accepts here.
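Note that the value passed to -v has to match what the build step produced: image_name and tag joined by a colon. A small sketch of keeping the two in sync follows; image_reference and job_variable_flags are hypothetical helpers, and only the -v key=value form shown above is assumed.

```python
def image_reference(image_name, tag):
    """Join the build step's image_name and tag into a full image reference."""
    return f"{image_name}:{tag}"


def job_variable_flags(variables):
    """Render a mapping of job variables as repeated `-v key=value` flags."""
    flags = []
    for key, value in variables.items():
        flags += ["-v", f"{key}={value}"]
    return flags


build_inputs = {"image_name": "local-only/testing", "tag": "dev"}
image = image_reference(build_inputs["image_name"], build_inputs["tag"])
print(job_variable_flags({"image": image}))
```

Deriving the flag from the same values used in the build step avoids the situation where the worker is told to pull an image that was never built.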
Templating values
As a matter of best practice, you should avoid hardcoding the image name and tag in both your prefect.yaml and CLI. Instead, you should use variable templating.
Dockerizing a local deployment¶
Revisiting our local deployment above, let's begin by switching it to submit work to our docker-work work pool by re-running prefect deploy to see what happens:
$ prefect deploy ./api_flow.py:call_api \
-n my-second-deployment \
-p docker-work
$ prefect deployment run 'Call API/my-second-deployment'
This fails with the following error:
ERROR: FileNotFoundError: [Errno 2] No such file or directory: '/Users/chris/dev/my-first-project'
This occurs because our deployment has a fundamentally local pull step; inspecting prefect.yaml, we find:
pull:
- prefect.projects.steps.set_working_directory:
    directory: /Users/chris/dev/my-first-project
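The failure can be reproduced in miniature without Docker: changing into a directory that does not exist on the current filesystem raises the same FileNotFoundError. In the sketch below, a temporary directory plays the role of the container's filesystem and enter_project_directory is a hypothetical stand-in for the pull step, not Prefect's implementation.

```python
import errno
import os
import tempfile


def enter_project_directory(directory):
    """Hypothetical stand-in for a step that switches into the project directory."""
    os.chdir(directory)
    return os.getcwd()


error = None
with tempfile.TemporaryDirectory() as container_root:
    # The "container" filesystem is empty: the host path was never copied in.
    missing = os.path.join(container_root, "my-first-project")
    try:
        enter_project_directory(missing)
    except FileNotFoundError as exc:
        error = exc

print(type(error).__name__, error.errno == errno.ENOENT)
```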
In order to successfully submit such a project to a dockerized environment, we need to either:
- push this project to a remote location (such as a cloud storage bucket)
- build this project into a Docker image artifact
Advanced: push steps
Populating a push step is considered an advanced feature of projects that requires additional considerations to ensure the pull step is compatible with the push step; as such, it is out of scope for this tutorial.
Following the same structure as above, we will include a new build step as well as alter our pull step to be compatible with the built image's filesystem:
# partial contents of prefect.yaml
build:
- prefect_docker.projects.steps.build_docker_image:
    image_name: local-only/testing
    tag: dev2
    dockerfile: auto
    push: false
pull:
- prefect.projects.steps.set_working_directory:
    directory: /opt/prefect/hello-projects
Rerunning the same deploy command above now makes this a healthy deployment!
Customizing the steps¶
For more information on what can be customized with prefect.yaml, check out the Projects concept doc.