In this tutorial, you'll create groups of workflow tasks and enable parallel execution.
Grouping tasks helps you organize the business logic of a workflow so that teammates can more easily understand your intent. It is also what enables parallel execution of certain tasks in a workflow.
Start by reading the TD Workflows introductory tutorial in order to get the context that you need to understand the lesson below.
As a reminder, here is the workflow from the introductory tutorial. In this workflow, task1 executes first, followed by task2.
    _export:
      td:
        database: workflow_temp

    +task1:
      td>: queries/daily_open.sql
      create_table: daily_open

    +task2:
      td>: queries/monthly_open.sql
      create_table: monthly_open
This workflow executes the tasks sequentially, from top to bottom. But what if we wanted to run these tasks in parallel?
Step 1: Grouping Tasks
Let's create a “group task”, a task that consists of other sub-tasks. In the example below, you see my_group_task, with the original tasks, task1 and task2, indented 2 spaces to indicate that they belong to this new group task.
We’ve also added a final task, output, which is not part of the group task.
    _export:
      td:
        database: workflow_temp

    +my_group_task:
      +task1:
        td>: queries/daily_open.sql
        create_table: daily_open

      +task2:
        td>: queries/monthly_open.sql
        create_table: monthly_open

    +output:
      td_run>: <place the name of a saved query here>
Grouping tasks is useful both for enabling parallel execution and for organizing a workflow into stages, each representing one part of your data flow.
For example, you might organize many of your workflows into the following groups:
- Data Preparation
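As a rough sketch of that kind of organization, the workflow below wraps the tutorial's two queries in a group named after the Data Preparation stage. The second group, +reporting, along with its query file queries/report.sql and its table name, is hypothetical and was invented purely for illustration; it is not part of the tutorial's project.

```yaml
_export:
  td:
    database: workflow_temp

# Stage 1: build the intermediate tables (group name mirrors "Data Preparation")
+data_preparation:
  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open

  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open

# Stage 2: hypothetical group, shown only to illustrate a second stage
+reporting:
  +build_report:
    td>: queries/report.sql   # hypothetical query file
    create_table: report      # hypothetical table name
```

Because tasks run from top to bottom by default, every task in +data_preparation finishes before +reporting starts.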
Step 2: Enable Parallel Execution
Every “group task” accepts an optional Digdag parameter called _parallel. This parameter is set to False by default, but can be set to True as shown:
    _export:
      td:
        database: workflow_temp

    +my_group_task:
      _parallel: True

      +task1:
        td>: queries/daily_open.sql
        create_table: daily_open

      +task2:
        td>: queries/monthly_open.sql
        create_table: monthly_open

    +output:
      td_run>: <place the name of a saved query here>
And that’s it! All the tasks in my_group_task now run in parallel, followed by the output task.
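One detail worth illustrating: as far as the Digdag documentation describes it, _parallel affects only the direct children of the group where it is set, not their own sub-tasks. The sketch below assumes that behavior. The subgroup names +daily and +monthly and the two echo> steps are hypothetical additions, not part of the tutorial's workflow.

```yaml
_export:
  td:
    database: workflow_temp

+my_group_task:
  _parallel: True            # applies to +daily and +monthly only

  +daily:                    # hypothetical subgroup
    +task1:
      td>: queries/daily_open.sql
      create_table: daily_open
    +daily_done:             # hypothetical follow-up step
      echo>: daily_open is ready

  +monthly:                  # hypothetical subgroup
    +task2:
      td>: queries/monthly_open.sql
      create_table: monthly_open
    +monthly_done:           # hypothetical follow-up step
      echo>: monthly_open is ready

+output:
  td_run>: <place the name of a saved query here>
```

Under that assumption, +daily and +monthly start together, the steps inside each subgroup still run in order, and +output waits for both subgroups to finish.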
Note: Because our YAML-like workflow configuration files use the '.dig' suffix, your text editor may not automatically highlight and indent your workflow file correctly. Workflow files use 2-space indentation, while many editors default to 4. Most text editors let you associate '.dig' with '.yml' so that the files are read and written as YAML. We recommend making that change.
We would love to hear your feedback! Please share your thoughts on our TD Workflows ideas forum.
We also welcome any ideas or feedback on the tutorial itself.