In this tutorial we will create a workflow that runs Presto and Hive queries in coordination with one another.
If you haven’t already, start by going through the TD Workflows introductory tutorial. We will use the workflow project downloaded in that tutorial.
As a reminder, here is the workflow from the introductory tutorial that we will be modifying.
timezone: UTC

schedule:
  daily>: 07:00:00

_export:
  td:
    database: workflow_temp

+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
Run the workflow as-is
First, go to your Treasure Data jobs page, so you can follow along as you run the workflows below. You can access the page here: https://console.treasuredata.com/jobs
Now, go into the nasdaq_analysis directory and run the following command.
$ td wf run --all
Why should you include `--all`? If you have run this workflow before, TD Workflows remembers that it completed successfully, so tasks that already succeeded would normally be skipped. The `--all` flag overrides this and runs every task again.
You should see the following job runs in your account.
Modify the workflow so the dependent task executes a Hive job
Having a query run on Hive is as simple as adding an extra parameter. Modify your workflow file by adding engine: hive as a parameter to +task2.
+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
  engine: hive
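For reference, here is what the full workflow file should look like after this change, combining the schedule and _export block from the introductory tutorial with the new engine parameter; everything except the last line of +task2 is unchanged:

  timezone: UTC

  schedule:
    daily>: 07:00:00

  _export:
    td:
      database: workflow_temp

  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open

  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open
    engine: hive

With this file in place, +task1 still runs on Presto (the default), while +task2 runs on Hive once +task1 completes.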
Run the modified workflow
$ td wf run --all
That’s it! It’s now simple to run Presto & Hive jobs in coordination on Treasure Data using Workflows.
If you have any feedback we welcome hearing your thoughts on our TD Workflows ideas forum.
Also, if you have any ideas or feedback on the tutorial itself, we’d welcome them here!