In this tutorial, we will create groups of workflow tasks and enable parallel execution.
Grouping tasks helps organize the business logic of a workflow so that teammates can more easily understand your intent. It is also what allows certain tasks in a workflow to run in parallel.
If you haven’t already, please start by reading the TD Workflows introductory tutorial. It will give you the context needed to understand the lesson below.
As a reminder, here is the workflow from the introductory tutorial. In this workflow, task1 executes first, followed by task2.
_export:
  td:
    database: workflow_temp

+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
This workflow executes the tasks sequentially, from top to bottom. But what if we wanted to run these tasks in parallel?
Step 1: Grouping Tasks
Here, we will create a “group task”, a task that consists of other sub-tasks. In the example below, it is named my_group_task, with the original tasks, task1 and task2, indented 2 spaces to indicate they are within this new group task.
We’ve also added a final task, output, which is not part of the group task.
_export:
  td:
    database: workflow_temp

+my_group_task:
  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open

  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open

+output:
  td_run>: <place the name of a saved query here>
Grouping tasks is useful both for enabling parallel execution and for organizing a workflow into steps that each represent one stage of your data flow.
For example, you might organize many of your workflows into the following groups:
- Data Preparation
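As a rough sketch of this kind of organization, the workflow from Step 1 could be arranged into stage-named groups. The group names data_preparation and reporting are hypothetical, chosen only for illustration. The queries and the saved-query placeholder are the ones used earlier in this tutorial.

```yaml
_export:
  td:
    database: workflow_temp

# Hypothetical stage name: builds the intermediate tables
+data_preparation:
  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open

  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open

# Hypothetical stage name: starts only after every task in +data_preparation has finished
+reporting:
  +output:
    td_run>: <place the name of a saved query here>
```

Naming groups after stages like this keeps the file readable as it grows, since each top-level entry describes one part of the data flow.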
Step 2: Enable Parallel Execution
Every “group task” has a hidden digdag parameter called _parallel. This parameter is set to False by default, but can be set to True as shown below.
_export:
  td:
    database: workflow_temp

+my_group_task:
  _parallel: True

  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open

  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open

+output:
  td_run>: <place the name of a saved query here>
And that’s it! If you change your initial workflow as shown above, all the tasks in my_group_task will now run in parallel, followed by the output task.
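One way to observe the effect of _parallel without touching any TD tables is a small local check. The sketch below assumes a local digdag installation in which the sh> and echo> operators are available. The file name parallel_check.dig, the 5-second sleeps, and the echoed messages are illustrative assumptions, not part of the original workflow.

```yaml
# parallel_check.dig (hypothetical file name); run locally with: digdag run parallel_check.dig
+my_group_task:
  _parallel: True

  # Each sub-task just sleeps, so the total run time shows whether they overlapped
  +task1:
    sh>: sleep 5 && echo "task1 done"

  +task2:
    sh>: sleep 5 && echo "task2 done"

# Starts only after every task in my_group_task has finished
+output:
  echo>: "my_group_task finished"
```

With _parallel enabled, the two sleeps should overlap, so the group should take roughly 5 seconds instead of roughly 10. Removing the _parallel line should bring back sequential timing. The digdag documentation describes _parallel as applying to the direct children of the group, so nested groups would need their own setting.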
Note: Because our YAML-like workflow configuration files use the '.dig' suffix, **your text editor may not automatically color and indent your workflow file correctly**. Workflow files use 2-space indentation, while many editors default to 4. Most text editors let you associate '.dig' files with YAML so that they are written and read like '.yml' files. We recommend making that change.
We would love to hear your feedback! Please share your thoughts on our TD Workflows ideas forum.
Also, if you have any ideas or feedback on the tutorial itself, we’d welcome them here!